We introduce dParallel, a simple and effective method that unlocks the inherent parallelism of dLLMs for fast sampling. We identify that the key bottleneck to parallel decoding is the sequential convergence of certainty for masked tokens. Building on this insight, we introduce the core of our approach: certainty-forcing distillation, a novel training strategy that distills the model to follow its original sampling trajectories while forcing it to reach high certainty on masked tokens more rapidly and in parallel. Extensive experiments across various benchmarks demonstrate that our method dramatically reduces the number of decoding steps while maintaining performance. Applied to the LLaDA-8B-Instruct model, dParallel reduces decoding steps from 256 to 30 on GSM8K, achieving an 8.5x speedup without performance degradation. On the MBPP benchmark, it cuts decoding steps from 256 to 24, a 10.5x speedup while maintaining accuracy.
Overview of the proposed certainty-forcing distillation.
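Certainty-forcing distillation is described above only at a high level. As a rough reading of that description, one can picture the objective as a trajectory-matching cross-entropy plus a low-entropy (high-certainty) term over masked positions. The sketch below is a minimal illustration of that reading, not the released training code: the name certainty_forcing_loss, the weight lam, and the tensor shapes are all assumptions.

import torch
import torch.nn.functional as F

def certainty_forcing_loss(logits, trajectory_tokens, mask_positions, lam=1.0):
    # logits: (batch, seq_len, vocab) student predictions on a masked input.
    # trajectory_tokens: (batch, seq_len) tokens from the model's own
    # original sampling trajectory, used as distillation targets.
    # mask_positions: (batch, seq_len) boolean mask of still-masked positions.
    masked_logits = logits[mask_positions]        # (n_masked, vocab)
    targets = trajectory_tokens[mask_positions]   # (n_masked,)

    # Trajectory matching: keep the student on its original sampling path.
    ce = F.cross_entropy(masked_logits, targets)

    # Certainty forcing: penalize predictive entropy on all masked positions
    # at once, so certainty converges in parallel rather than sequentially,
    # token by token.
    probs = masked_logits.softmax(dim=-1)
    entropy = -(probs * probs.clamp_min(1e-9).log()).sum(dim=-1).mean()

    return ce + lam * entropy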
import torch
import types
from transformers import AutoModel, AutoTokenizer
model_path = "Zigeng/dParallel_Dream_7B_Instruct"
model = AutoModel.from_pretrained(model_path, torch_dtype=torch.bfloat16, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = model.to("cuda").eval()
# Patch dParallel's parallel-decoding routines (from the dParallel repo) onto the model
from model.generation_utils_semiar import DreamGenerationMixin
model.diffusion_generate = types.MethodType(DreamGenerationMixin.diffusion_generate, model)
model._sample = types.MethodType(DreamGenerationMixin._sample, model)
messages = [
    {"role": "user", "content": "Toulouse has twice as many sheep as Charleston. Charleston has 4 times as many sheep as Seattle. How many sheep do Toulouse, Charleston, and Seattle have together if Seattle has 20 sheep? Let's think step by step."}
]
# Build the prompt with the chat template and move the tensors to the GPU
inputs = tokenizer.apply_chat_template(
    messages, return_tensors="pt", return_dict=True, add_generation_prompt=True
)
input_ids = inputs.input_ids.to(device="cuda")
attention_mask = inputs.attention_mask.to(device="cuda")
output, nfe = model.diffusion_generate(
    input_ids,
    attention_mask=attention_mask,
    max_new_tokens=256,
    output_history=False,
    return_dict_in_generate=True,
    steps=256,                # upper bound on decoding steps; the actual NFE is far lower
    temperature=0.,
    top_p=None,
    alg="entropy_threshold",  # decode all tokens whose entropy clears the threshold in parallel
    alg_temp=0.1,
    top_k=None,
    block_length=32,          # semi-autoregressive block size
    threshold=0.5,            # certainty threshold for parallel unmasking
)
generations = [
    tokenizer.decode(g[len(p):].tolist())  # decode only the newly generated tokens
    for p, g in zip(input_ids, output.sequences)
]
print(generations[0].split(tokenizer.eos_token)[0])
print("NFE:", nfe)
📖 Experimental Results
Results on LLaDA-8B-Instruct:
Results on Dream-7B-Instruct:
Better Speed-Accuracy Trade-off:
☀️ Acknowledgement
Our code builds on LLaDA, Dream, Fast-dLLM, and dKV-Cache, and we acknowledge these great works for laying the groundwork that made our approach possible.
Citation
If our research assists your work, please give us a star ⭐ or cite us using:
@article{chen2025dparallel,
  title={dParallel: Learnable Parallel Decoding for dLLMs},
  author={Chen, Zigeng and Fang, Gongfan and Ma, Xinyin and Yu, Ruonan and Wang, Xinchao},
  journal={arXiv preprint arXiv:2509.26488},
  year={2025}
}