igorktech / grpo

huggingface.co
Total runs: 0
24-hour runs: 0
7-day runs: 0
30-day runs: 0
Model's Last Updated: October 30 2025

Model Card for grpo

This model is a fine-tuned version of unsloth/Llama-3.2-1B-Instruct. It has been trained using TRL.

Quick start
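from transformers import pipeline

# Prompt sent to the model as a single user turn in chat format.
question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
# Load the fine-tuned model on the GPU; pass device="cpu" if no CUDA device is available.
generator = pipeline("text-generation", model="igorktech/grpo", device="cuda")
# Generate up to 128 new tokens and return only the reply, not the prompt.
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]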
print(output["generated_text"])
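
The quick start hard-codes device="cuda", which fails on machines without a CUDA GPU. The following is a minimal sketch of a fallback, assuming PyTorch is the backend; pick_device is a hypothetical helper name, not part of transformers.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # assumed backend; treated as optional here
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    from transformers import pipeline

    # Same call as the quick start, with the device chosen at runtime.
    generator = pipeline("text-generation", model="igorktech/grpo", device=pick_device())
    messages = [{"role": "user", "content": "Say hello in one sentence."}]
    reply = generator(messages, max_new_tokens=32, return_full_text=False)[0]
    print(reply["generated_text"])
```

On CPU, generation with a 1B-parameter model still works but is noticeably slower.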
Training procedure

This model was trained with GRPO, a method introduced in "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models".
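
The core idea of GRPO is to score each sampled completion relative to the other completions drawn for the same prompt, rather than against a learned value function. The sketch below illustrates only that group-relative normalization step. It is not the training code used for this model: the function name, the eps value, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize the rewards of one prompt's completions within the group.

    Each advantage is (reward - group mean) / (group std + eps).
    eps (an assumed value) avoids division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an illustrative choice
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four completions for one prompt, scored by a hypothetical reward function.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Completions that score above the group mean receive a positive advantage and are reinforced; those below the mean receive a negative one. When every completion in a group gets the same reward, all advantages are zero and the group contributes no learning signal.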

Framework versions
  • TRL: 0.23.0
  • Transformers: 4.56.2
  • PyTorch: 2.9.0a0+git1c57644
  • Datasets: 4.3.0
  • Tokenizers: 0.22.1
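
To compare a local environment against the versions listed above, a small check with the standard library can help. This is a sketch: the distribution names (for example "torch" for PyTorch) are the usual PyPI names and are assumed here, and compare_versions is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above; keys are assumed PyPI distribution names.
EXPECTED = {
    "trl": "0.23.0",
    "transformers": "4.56.2",
    "torch": "2.9.0a0+git1c57644",
    "datasets": "4.3.0",
    "tokenizers": "0.22.1",
}


def compare_versions(expected=EXPECTED):
    """Map each package to (expected, installed or None, exact match?)."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed, installed == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, installed, ok) in compare_versions().items():
        status = "ok" if ok else "differs"
        print(f"{name}: expected {wanted}, installed {installed} ({status})")
```

A mismatch does not necessarily mean the model will fail to load. The PyTorch entry in particular is a development build, so an exact match is unlikely on a stock installation.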
Citations

Cite GRPO as:

@article{shao2024deepseekmath,
    title        = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
    author       = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
    year         = 2024,
    eprint       = {arXiv:2402.03300},
}

Cite TRL as:

@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}


More Information About grpo

grpo is hosted on huggingface.co under the igorktech account. The model page is:

https://huggingface.co/igorktech/grpo

Provider of grpo

igorktech
