Skywork-R1V4 is a 30B (A3B) multimodal agent that unifies:
• Multimodal task planning
• Active image manipulation (“thinking with images”)
• Deep multimodal search (text × image)
• Interleaved tool-grounded reasoning
Skywork-R1V4 is trained purely via supervised finetuning on fewer than 30k high-quality, execution-consistent trajectories.
At inference time, the model exhibits emergent long-horizon reasoning, executing 10+ tool calls across visual operations and web search to solve complex real-world tasks.
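To make the idea of interleaved, tool-grounded reasoning concrete, here is a minimal sketch of an agent loop that alternates between choosing an action and observing a tool result. Everything in it is an assumption for illustration: the tool names (`crop_image`, `web_search`), the `scripted_policy` stand-in for the model, and the step budget are hypothetical and are not taken from the Skywork-R1V4 codebase.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tools: placeholders that only return descriptive strings.
def crop_image(arg: str) -> str:
    return f"cropped region {arg}"

def web_search(arg: str) -> str:
    return f"search results for '{arg}'"

TOOLS: Dict[str, Callable[[str], str]] = {
    "crop_image": crop_image,
    "web_search": web_search,
}

def scripted_policy(history: List[str]) -> Tuple[str, str]:
    """Stand-in for the model: pick the next (action, argument) from the trajectory so far."""
    plan = [
        ("crop_image", "0.2,0.2,0.8,0.8"),
        ("web_search", "landmark in crop"),
        ("answer", "done"),
    ]
    return plan[min(len(history), len(plan) - 1)]

def run_agent(
    policy: Callable[[List[str]], Tuple[str, str]],
    max_steps: int = 16,
) -> Tuple[str, List[str]]:
    """Alternate between policy decisions and tool observations until an answer is given."""
    history: List[str] = []
    for _ in range(max_steps):
        action, arg = policy(history)
        if action == "answer":
            return arg, history
        observation = TOOLS[action](arg)  # ground the next step in a tool result
        history.append(f"{action} -> {observation}")
    return "", history  # step budget exhausted without a final answer
```

Calling `run_agent(scripted_policy)` walks through one visual operation and one search before answering. A real deployment would replace the scripted policy with model calls and the stubs with actual image and search tools.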
Skywork-R1V4 achieves state-of-the-art performance on multimodal search benchmarks:
• MMSearch: 66.1
• FVQA: 67.2
• Beats Gemini 2.5 Flash on all 11 comparable metrics
2. Feature
🔍 “Thinking With Images”
Skywork-R1V4 actively manipulates images through:
• Multi-stage cropping
• Local detail extraction
• Region attention
• Visual clue refinement
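As an illustration of multi-stage cropping, the sketch below composes a chain of nested crops into a single box in the original image's coordinates. It assumes each stage is a normalized `(left, top, right, bottom)` box expressed relative to the previous crop. That convention, and the helper names `compose_crops` and `to_pixels`, are hypothetical and not drawn from the model's actual tool interface.

```python
from typing import List, Tuple

# (left, top, right, bottom), each normalized to [0, 1] relative to the current view.
Box = Tuple[float, float, float, float]

def compose_crops(stages: List[Box]) -> Box:
    """Map a chain of nested crops back to one box in the original image."""
    left, top, right, bottom = 0.0, 0.0, 1.0, 1.0
    for l, t, r, b in stages:
        width, height = right - left, bottom - top
        # The right-hand side is evaluated with the previous view's values.
        left, top, right, bottom = (
            left + l * width,
            top + t * height,
            left + r * width,
            top + b * height,
        )
    return left, top, right, bottom

def to_pixels(box: Box, width: int, height: int) -> Tuple[int, int, int, int]:
    """Convert a normalized box to integer pixel coordinates."""
    l, t, r, b = box
    return round(l * width), round(t * height), round(r * width), round(b * height)
```

For example, taking the right half of an image and then the lower-left quadrant of that crop, `compose_crops([(0.5, 0.0, 1.0, 1.0), (0.0, 0.5, 0.5, 1.0)])` yields `(0.5, 0.5, 0.75, 1.0)`, which `to_pixels` maps to `(400, 300, 600, 600)` on an 800×600 image. Tracking the composed box lets a detail found in a deep crop be located in the original image.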
If you use Skywork-R1V4 in your research, please cite:
@misc{zhang2025skyworkr1v4agenticmultimodalintelligence,
title={Skywork-R1V4: Toward Agentic Multimodal Intelligence through Interleaved Thinking with Images and DeepResearch},
author={Yifan Zhang and Liang Hu and Haofeng Sun and Peiyu Wang and Yichen Wei and Shukang Yin and Jiangbo Pei and Wei Shen and Peng Xia and Yi Peng and Tianyidan Xie and Eric Li and Yang Liu and Xuchen Song and Yahui Zhou},
year={2025},
eprint={2512.02395},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2512.02395},
}
@misc{peng2025skyworkr1vpioneeringmultimodal,
title={Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought},
author={Yi Peng and Peiyu Wang and Xiaokun Wang and Yichen Wei and Jiangbo Pei and Weijie Qiu and Ai Jian and Yunzhuo Hao and Jiachun Pan and Tianyidan Xie and Li Ge and Rongxian Zhuang and Xuchen Song and Yang Liu and Yahui Zhou},
year={2025},
eprint={2504.05599},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2504.05599},
}