We are excited to introduce CapRL-3B, a lightweight 3B image captioner that achieves perception capabilities comparable to Qwen2.5-VL-72B.
This is the first study of applying Reinforcement Learning with Verifiable Rewards for the
open-ended and subjective image captioning task. Unlike traditional Supervised Fine-Tuning, which
can lead to models memorizing a limited set of annotated captions, our method allows the model to
explore and generate a broader range of creative and general descriptions.
CapRL is a new training paradigm featuring a decoupled two-stage pipeline. The initial
stage uses LVLMs to generate rich and accurate captions. Subsequently, the second stage evaluates
caption quality by using a vision-only LLM to perform the QA task. We also created a specific QA
curation pipeline to ensure the quality of the questions and answers used for the second stage.
By employing CapRL training framework, initializing with the Qwen2.5-VL-3B model, and using a carefully
filtered 75K QA dataset as the training set, we obtained a highly capable captioner, CapRL-3B.
Key Features
Remarkable visual understanding for Chart, Infographics and Document
: CapRL-3B achieves perception accuracy and visual information coverage comparable to Qwen2.5-VL-72B.
Well-organized output
: The outputs of CapRL-3B are relatively well-structured, making them clear and easy to understand.
Detailed description for natural images
: The outputs of CapRL-3B can perfectly cover all valid visual information while containing fewer hallucinations.
Usage
If you want to use
CapRL-3B
for captioning, you can directly follow the exact same inference approach as in
Qwen2.5-VL-series
.
We recommend using
vLLM
to speed up inference.
Start an OpenAI API Service
Run the command below to start an OpenAI-compatible API service:
CapRL-3B huggingface.co is an AI model on huggingface.co that provides CapRL-3B's model effect (), which can be used instantly with this internlm CapRL-3B model. huggingface.co supports a free trial of the CapRL-3B model, and also provides paid use of the CapRL-3B. Support call CapRL-3B model through api, including Node.js, Python, http.
CapRL-3B huggingface.co is an online trial and call api platform, which integrates CapRL-3B's modeling effects, including api services, and provides a free online trial of CapRL-3B, you can try CapRL-3B online for free by clicking the link below.
internlm CapRL-3B online free url in huggingface.co:
CapRL-3B is an open source model from GitHub that offers a free installation service, and any user can find CapRL-3B on GitHub to install. At the same time, huggingface.co provides the effect of CapRL-3B install, users can directly use CapRL-3B installed effect in huggingface.co for debugging and trial. It also supports api for free installation.