Trinity-Large-Thinking is a reasoning-optimized variant of Arcee AI's Trinity-Large family — a 398B-parameter sparse Mixture-of-Experts (MoE) model with approximately 13B active parameters per token, post-trained with extended chain-of-thought reasoning and agentic RL.
This repository contains the NVFP4 quantized weights of Trinity-Large-Thinking for deployment on NVIDIA Blackwell GPUs.
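To fetch the weights locally, a minimal sketch using the standard Hugging Face CLI; the repository id is assumed from this card's title, so verify it on the Hub before running:

```shell
# Download the NVFP4 weights from the Hub.
# NOTE: repo id assumed from the card title — confirm on huggingface.co.
huggingface-cli download arcee-ai/Trinity-Large-Thinking-NVFP4 \
    --local-dir ./trinity-large-thinking-nvfp4
```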
For full model details, benchmarks, and usage guidance, see the main Trinity-Large-Thinking model card.
Note (Blackwell pip installs):
If you install vLLM via pip on Blackwell rather than using the Docker image, the native FP4 kernels may produce incorrect output due to package version mismatches. As a workaround, force the Marlin backend:
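A sketch of what the workaround invocation could look like. The environment-variable name below is a hypothetical placeholder (the main Trinity-Large-Thinking model card documents the actual override flag); `vllm serve` and `--tensor-parallel-size` are standard vLLM usage, and the tensor-parallel degree is illustrative:

```shell
# Serve the model with vLLM, forcing Marlin decompression instead of
# the native FP4 kernels.
# NOTE: VLLM_FORCE_MARLIN is a HYPOTHETICAL placeholder — consult the
# main model card or vLLM docs for the exact override.
VLLM_FORCE_MARLIN=1 vllm serve arcee-ai/Trinity-Large-Thinking-NVFP4 \
    --tensor-parallel-size 8
```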
Marlin decompresses FP4 weights to BF16 for compute, providing the full memory compression benefit but not native FP4 compute speedup. On Hopper GPUs (H100/H200), Marlin is selected automatically and no extra flags are needed.
Works out of the box on OpenRouter as arcee-ai/trinity-large-thinking.
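Since OpenRouter exposes an OpenAI-compatible chat completions endpoint, a call can be sketched with only the Python standard library; the prompt is illustrative, and an OPENROUTER_API_KEY environment variable is assumed for the live request:

```python
# Minimal sketch of querying Trinity-Large-Thinking via OpenRouter's
# OpenAI-compatible API. The request is only sent if an API key is set.
import json
import os
import urllib.request

API_URL = "https://openrouter.ai/api/v1/chat/completions"

payload = {
    "model": "arcee-ai/trinity-large-thinking",
    "messages": [{"role": "user", "content": "Summarize what a Mixture-of-Experts model is."}],
}

api_key = os.environ.get("OPENROUTER_API_KEY")
if api_key:
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
else:
    # No key set: just show which model the request would target.
    print(payload["model"])
```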
License
Trinity-Large-Thinking-NVFP4 is released under the Apache License, Version 2.0.
Citation
If you use this model, please cite:
@misc{singh2026arceetrinity,
  title         = {Arcee Trinity Large Technical Report},
  author        = {Varun Singh and Lucas Krauss and Sami Jaghouar and Matej Sirovatka and Charles Goddard and Fares Obied and Jack Min Ong and Jannik Straube and Fern and Aria Harley and Conner Stewart and Colin Kealty and Maziyar Panahi and Simon Kirsten and Anushka Deshpande and Anneketh Vij and Arthur Bresnu and Pranav Veldurthi and Raghav Ravishankar and Hardik Bishnoi and DatologyAI Team and Arcee AI Team and Prime Intellect Team and Mark McQuade and Johannes Hagemann and Lucas Atkins},
  year          = {2026},
  eprint        = {2602.17004},
  archivePrefix = {arXiv},
  primaryClass  = {cs.LG},
  doi           = {10.48550/arXiv.2602.17004},
  url           = {https://arxiv.org/abs/2602.17004}
}