AffineQuant huggingface.co api & ByteDance AffineQuant github AI Model

Introduction of AffineQuant

Model Details of AffineQuant

AffineQuant Model Zoo

AffineQuant is a novel quantization method that uses an affine transformation matrix to change the distribution of weights and activations, aimed at optimizing the distribution of weight activations and reducing quantization errors. By introducing an affine transformation matrix, AffineQuant can better align the data distribution with the quantization function, thereby reducing quantization errors. The matrix optimization objective is to minimize the mean squared error between pre- and post-quantization feature map, while introducing the Gradual Mask (GM) method to maintain the strictly diagonal dominance of the affine matrix, ensuring the matrix's invertibility and stable convergence. Experimental results show that AffineQuant performs better than existing quantization methods, such as OmniQuant and SmoothQuant, achieving consistent performance improvements across different quantization configurations and datasets.

Code: https://github.com/bytedance/AffineQuant

Paper: https://arxiv.org/abs/2403.12544

How to use

This repository contains models with various quantization configurations. The types of models include: OPT, LLaMA1&2.

Fake Quantization Accuracy

To reproduce the accuracy reported in the paper, we need to use the --model parameter to load the fake-quantized model. At the same time, we need to specify the bit parameter as 16 to skip the quantization step. For example:

CUDA_VISIBLE_DEVICES=0 python main.py \
--model /path/to/llama-13b-w2a16g128 --eval_ppl \
--output_dir ./log/llama-13b-w2a16g128 \
--wbits 16 --abits 16

It is worth noting that if your quantization model is trained using the --let parameter, you need to enable the bias in the layernorm layers and specific linear layers within the transformer repository to load the shift parameters. For instance, for the llama model, we make the following modifications in modeling_llama.py :

Set the bias of the q,k,v,o,up,gate linear layer to True.

self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=True)
self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=True)
self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=True)
self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=True)
self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=True)

Enable the bias in RMSNorm. We directly replace the original RMSNorm with AffineLlamaRMSNorm from AffineQuant.

Inference Overhead

To reproduce the accuracy described in the paper, our weight-only quantization configuration imposes no restrictions on the affine matrices after layernorm. For the weight-activation configuration, such as 4/4 bits, we only update the diagonal elements of the affine matrices after layernorm. Therefore, the model inference with merged parameters incurs no additional overhead.

Benchmarks

We evaluate the quantization performance of LLaMA-7B, 13B, 30B on six zero-shot datasets using 4/4 bit quantization in the following table.

	PIQA($\uparrow$)	ARC-e($\uparrow$)	WinoGrande($\uparrow$)	BoolQ($\uparrow$)	ARC-c($\uparrow$)	HellaSwag($\uparrow$)	Avg.($\uparrow$)
LLaMA-7B, OmniQuant	66.15	45.20	53.43	63.51	31.14	56.44	52.65
LLaMA-7B, AffineQuant	69.37	42.55	55.33	63.73	31.91	57.65	53.42
LLaMA-13B, OmniQuant	69.69	47.39	55.80	62.84	33.10	58.96	54.37
LLaMA-13B, AffineQuant	66.32	43.90	54.70	64.10	29.61	56.88	52.58
LLaMA-30B, OmniQuant	71.21	49.45	59.19	65.33	34.47	64.65	56.63
LLaMA-30B, AffineQuant	70.84	49.41	58.64	70.12	37.12	65.53	58.61

Meanwhile, we compare the 4/4 bit quantization performance of LLaMA1&2 models on WikiText2 and C4 datasets in the following table.

	Methods	WikiText2	C4
LLaMA-7B	OmniQuant	11.26	14.51
	AffineQuant	10.28	13.64
LLaMA-13B	OmniQuant	10.87	13.78
	AffineQuant	10.32	13.44
LLaMA-30B	OmniQuant	10.33	12.49
	AffineQuant	9.35	11.58
LLaMA2-7B	OmniQuant	14.26	18.02
	AffineQuant	12.69	15.76
LLaMA2-13B	OmniQuant	12.30	14.55
	AffineQuant	11.45	13.97

AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

GPTQ: Accurate Post-training Compression for Generative Pretrained Transformers

RPTQ: Reorder-Based Post-Training Quantization for Large Language Models

OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models

MLC LLM

AutoGPTQ

Citation

@inproceedings{
ma2024affinequant,
title={AffineQuant: Affine Transformation Quantization for Large Language Models},
author={Yuexiao Ma and Huixia Li and Xiawu Zheng and Feng Ling and Xuefeng Xiao and Rui Wang and Shilei Wen and Fei Chao and Rongrong Ji},
booktitle={The Twelfth International Conference on Learning Representations},
year={2024},
url={https://openreview.net/forum?id=of2rhALq8l}
}

Runs of ByteDance AffineQuant on huggingface.co

Total runs

24-hour runs

3-day runs

7-day runs

30-day runs

More Information About AffineQuant huggingface.co Model

More AffineQuant license Visit here:

https://choosealicense.com/licenses/apache-2.0

AffineQuant huggingface.co

AffineQuant huggingface.co is an AI model on huggingface.co that provides AffineQuant's model effect (), which can be used instantly with this ByteDance AffineQuant model. huggingface.co supports a free trial of the AffineQuant model, and also provides paid use of the AffineQuant. Support call AffineQuant model through api, including Node.js, Python, http.

AffineQuant huggingface.co Url

https://huggingface.co/ByteDance/AffineQuant

ByteDance AffineQuant online free

AffineQuant huggingface.co is an online trial and call api platform, which integrates AffineQuant's modeling effects, including api services, and provides a free online trial of AffineQuant, you can try AffineQuant online for free by clicking the link below.

ByteDance AffineQuant online free url in huggingface.co:

https://huggingface.co/ByteDance/AffineQuant

AffineQuant install

AffineQuant is an open source model from GitHub that offers a free installation service, and any user can find AffineQuant on GitHub to install. At the same time, huggingface.co provides the effect of AffineQuant install, users can directly use AffineQuant installed effect in huggingface.co for debugging and trial. It also supports api for free installation.

AffineQuant install url in huggingface.co:

https://huggingface.co/ByteDance/AffineQuant

huggingface.co

ByteDance/SDXL-Lightning

Total runs: 188.4K

Run Growth: 122.1K

Growth Rate: 64.84%

Updated:April 03 2024

huggingface.co

ByteDance/Hyper-SD

Total runs: 46.2K

Run Growth: -12.6K

Growth Rate: -27.03%

Updated:December 05 2024

huggingface.co

ByteDance/Ouro-1.4B

Total runs: 38.4K

Run Growth: 14.9K

Growth Rate: 38.89%

Updated:January 19 2026

huggingface.co

ByteDance/Ouro-2.6B

Total runs: 34.6K

Run Growth: 31.6K

Growth Rate: 91.21%

Updated:January 19 2026

huggingface.co

ByteDance/LatentSync-1.6

Total runs: 19.1K

Run Growth: -181.4K

Growth Rate: -951.72%

Updated:June 12 2025

huggingface.co

ByteDance/AnimateDiff-Lightning

Total runs: 10.8K

Run Growth: 4.9K

Growth Rate: 45.18%

Updated:January 06 2025

huggingface.co

ByteDance/LatentSync-1.5

Total runs: 9.2K

Run Growth: 4.2K

Growth Rate: 45.69%

Updated:June 12 2025

huggingface.co

ByteDance/Ouro-2.6B-Thinking

Total runs: 8.1K

Run Growth: -2.8K

Growth Rate: -34.90%

Updated:February 27 2026

huggingface.co

ByteDance/Dolphin-v2

Total runs: 4.5K

Run Growth: 2.5K

Growth Rate: 55.04%

Updated:December 12 2025

huggingface.co

ByteDance/Ouro-1.4B-Thinking

Total runs: 3.5K

Run Growth: -1.4K

Growth Rate: -40.97%

Updated:February 27 2026

huggingface.co

ByteDance/Sa2VA-Qwen3-VL-4B

Total runs: 3.3K

Run Growth: -916

Growth Rate: -27.51%

Updated:October 21 2025

huggingface.co

ByteDance/Sa2VA-8B

Total runs: 2.8K

Run Growth: 2.0K

Growth Rate: 70.40%

Updated:September 08 2025

huggingface.co

ByteDance/Dolphin-1.5

Total runs: 1.4K

Run Growth: -24

Growth Rate: -1.70%

Updated:November 12 2025

huggingface.co

ByteDance/Sa2VA-4B

Total runs: 1.2K

Run Growth: -4.1K

Growth Rate: -331.42%

Updated:September 08 2025

huggingface.co

ByteDance/InfiniteYou

Total runs: 913

Run Growth: 211

Growth Rate: 23.11%

Updated:July 25 2025

huggingface.co

ByteDance/Sa2VA-Qwen3-VL-2B

Total runs: 714

Run Growth: 476

Growth Rate: 66.67%

Updated:November 27 2025

huggingface.co

ByteDance/Sa2VA-1B

Total runs: 596

Run Growth: -91

Growth Rate: -15.27%

Updated:September 08 2025

huggingface.co

ByteDance/BindWeave

Total runs: 573

Run Growth: -1.4K

Growth Rate: -236.30%

Updated:November 28 2025

huggingface.co

ByteDance/Dolphin

Total runs: 323

Run Growth: -888

Growth Rate: -274.92%

Updated:July 16 2025

huggingface.co

ByteDance/Sa2VA-InternVL3-8B

Total runs: 177

Run Growth: 50

Growth Rate: 28.25%

Updated:October 16 2025

huggingface.co

ByteDance/Video-As-Prompt-CogVideoX-5B

Total runs: 165

Run Growth: 24

Growth Rate: 14.55%

Updated:October 27 2025

huggingface.co

ByteDance/Video-As-Prompt-Wan2.1-14B

Total runs: 151

Run Growth: 11

Growth Rate: 7.28%

Updated:October 27 2025

huggingface.co

ByteDance/MegaTTS3

Total runs: 126

Run Growth: 4

Growth Rate: 3.17%

Updated:April 04 2025

huggingface.co

ByteDance/Sa2VA-InternVL3-2B

Total runs: 121

Run Growth: -147

Growth Rate: -121.49%

Updated:October 16 2025

huggingface.co

ByteDance/Sa2VA-Qwen2_5-VL-7B

Total runs: 93

Run Growth: -23

Growth Rate: -24.73%

Updated:October 16 2025

huggingface.co

ByteDance/ListConRanker

Total runs: 83

Run Growth: 53

Growth Rate: 63.86%

Updated:June 20 2025

huggingface.co

ByteDance/Sa2VA-InternVL3-14B

Total runs: 56

Run Growth: 8

Growth Rate: 14.29%

Updated:October 16 2025

huggingface.co

ByteDance/sd2.1-base-zsnr-laionaes5

Total runs: 49

Run Growth: 16

Growth Rate: 32.65%

Updated:January 18 2024

huggingface.co

ByteDance/Sa2VA-Qwen2_5-VL-3B

Total runs: 47

Run Growth: -34

Growth Rate: -72.34%

Updated:October 16 2025

huggingface.co

ByteDance/sd2.1-base-zsnr-laionaes6

Total runs: 45

Run Growth: 41

Growth Rate: 91.11%

Updated:January 18 2024

huggingface.co

ByteDance/Sa2VA-26B

Total runs: 39

Run Growth: -87

Growth Rate: -223.08%

Updated:September 08 2025

huggingface.co

ByteDance/XVerse

Total runs: 31

Run Growth: -11

Growth Rate: -35.48%

Updated:July 01 2025

huggingface.co

ByteDance/ContentV-8B

Total runs: 23

Run Growth: 9

Growth Rate: 39.13%

Updated:June 24 2025

huggingface.co

ByteDance/ID-Patch

Total runs: 15

Run Growth: 4

Growth Rate: 26.67%

Updated:April 22 2025

huggingface.co

ByteDance/sd2.1-base-zsnr-laionaes6-perceptual

Total runs: 4

Run Growth: 1

Growth Rate: 25.00%

Updated:January 18 2024

huggingface.co

ByteDance/HLLM

Total runs: 0

Run Growth: 0

Growth Rate: 0.00%

Updated:August 26 2025

huggingface.co

ByteDance/feature-preserve-portrait-editing

Total runs: 0

Run Growth: 0

Growth Rate: 0.00%

Updated:October 26 2024

huggingface.co

ByteDance/FaceCLIP

Total runs: 0

Run Growth: 0

Growth Rate: 0.00%

Updated:November 11 2025

huggingface.co

ByteDance/Attention2Probability

Total runs: 0

Run Growth: 0

Growth Rate: 0.00%

Updated:August 27 2025

huggingface.co

ByteDance/CascadeV

Total runs: 0

Run Growth: 0

Growth Rate: 0.00%

Updated:September 02 2024

huggingface.co

ByteDance/LatentSync

Total runs: 0

Run Growth: 0

Growth Rate: 0.00%

Updated:February 25 2025

huggingface.co

ByteDance/lynx

Total runs: 0

Run Growth: 0

Growth Rate: 0.00%

Updated:September 27 2025

huggingface.co

ByteDance/shot2story

Total runs: 0

Run Growth: 0

Growth Rate: 0.00%

Updated:June 17 2024

huggingface.co

ByteDance/DreamO

Total runs: 0

Run Growth: 0

Growth Rate: 0.00%

Updated:June 24 2025

huggingface.co

ByteDance/NEVC1.0

Total runs: 0

Run Growth: 0

Growth Rate: 0.00%

Updated:September 05 2025

huggingface.co

ByteDance/Q-Insight

Total runs: 0

Run Growth: 0

Growth Rate: 0.00%

Updated:February 13 2026

huggingface.co

ByteDance/Make-An-Audio-2

Total runs: 0

Run Growth: 0

Growth Rate: 0.00%

Updated:May 22 2024

ByteDance / AffineQuant

Introduction of AffineQuant

Model Details of AffineQuant

AffineQuant Model Zoo

How to use

Fake Quantization Accuracy

Inference Overhead

Benchmarks

Related Project

Citation

Runs of ByteDance AffineQuant on huggingface.co

More Information About AffineQuant huggingface.co Model

More AffineQuant license Visit here:

AffineQuant huggingface.co

AffineQuant huggingface.co Url

ByteDance AffineQuant online free

ByteDance AffineQuant online free url in huggingface.co:

AffineQuant install

AffineQuant install url in huggingface.co:

Url of AffineQuant

AffineQuant huggingface.co Url

Provider of AffineQuant huggingface.co

Other API from ByteDance