It achieves an average score of 50.70 on the HumanEval+ benchmark, whereas the unquantized model achieves 50.25.
Model Optimizations
This model was obtained by quantizing the weights and activations of starcoder2-15b to the FP8 data type. It is ready for inference with vLLM >= 0.5.2.
This optimization reduces the number of bits per parameter from 16 to 8, which cuts the disk size and GPU memory requirements by approximately 50%.
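The roughly 50% figure follows directly from the bit widths. The arithmetic below is a rough sketch, assuming about 15 billion parameters (taken from the model name, not from a measured count) and ignoring the layers that stay in 16-bit, so the real saving is slightly smaller.

```python
def weight_bytes(num_params: int, bits_per_param: int) -> int:
    """Storage needed for the weights alone, in bytes."""
    return num_params * bits_per_param // 8


# Assumption: ~15e9 parameters, inferred from the "15b" in the model name.
NUM_PARAMS = 15_000_000_000

size_16bit = weight_bytes(NUM_PARAMS, 16)
size_fp8 = weight_bytes(NUM_PARAMS, 8)

print(f"16-bit weights: {size_16bit / 1e9:.1f} GB")
print(f"FP8 weights:    {size_fp8 / 1e9:.1f} GB")
print(f"FP8 / 16-bit:   {size_fp8 / size_16bit:.2f}")
```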
Only the weights and activations of the linear operators within transformer blocks are quantized. Symmetric per-tensor quantization is applied, in which a single linear scale per tensor maps between the FP8 representation and the original values of the quantized weights and activations.
AutoFP8 is used for quantization with 512 sequences of UltraChat.
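To illustrate symmetric per-tensor quantization, here is a minimal pure-Python sketch. It is not the AutoFP8 or LLM Compressor implementation: the function names are hypothetical, the FP8 format is assumed to be E4M3 (maximum finite value 448), and the scale convention (original value ≈ FP8 value × scale) is one common choice rather than a confirmed detail of this model.

```python
import math

# Assumption: FP8 E4M3 format, whose largest finite magnitude is 448.
FP8_E4M3_MAX = 448.0


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest value on the E4M3 grid (3 mantissa bits)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), FP8_E4M3_MAX)      # saturate instead of overflowing
    _, e = math.frexp(mag)               # mag = m * 2**e with 0.5 <= m < 1
    exponent = max(e - 1, -6)            # -6 is the smallest normal exponent
    step = 2.0 ** (exponent - 3)         # grid spacing within this binade
    return sign * min(round(mag / step) * step, FP8_E4M3_MAX)


def quantize_per_tensor(values):
    """Symmetric per-tensor quantization: one scale for the whole tensor."""
    amax = max(abs(v) for v in values)
    scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
    return [round_to_e4m3(v / scale) for v in values], scale


def dequantize(quantized, scale):
    """Map FP8 values back to the original range with the same scale."""
    return [q * scale for q in quantized]


weights = [0.5, -1.0, 0.25, 0.03]        # toy tensor, not real model weights
q, scale = quantize_per_tensor(weights)
restored = dequantize(q, scale)

for w, r in zip(weights, restored):
    print(f"{w:+.4f} -> {r:+.4f}")
```

Because the scale is derived from the largest absolute value in the tensor, that value maps to the edge of the FP8 range and zero maps to zero, which is what makes the scheme symmetric. Smaller values pick up a rounding error bounded by the 3-bit mantissa.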
Creation
This model was created by applying LLM Compressor with calibration samples from UltraChat, as presented in the code snippet below.
A slight modification to the code was needed because of the parameters of this model. Running the code as written throws an index error; replacing the offending line with
max_quant_shape = param.shape[0]
resolves the issue.