Skywork-MoE is a high-performance mixture-of-experts (MoE) model with 146 billion parameters, 16 experts, and 22 billion activated parameters. This model is initialized from the pre-existing dense checkpoints of our Skywork-13B model.
We introduce two innovative techniques: Gating Logit Normalization, which enhances expert diversification, and Adaptive Auxiliary Loss Coefficients, which allow for layer-specific adjustment of auxiliary loss coefficients.
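As a minimal sketch of the first technique, the snippet below standardizes the router logits per token and rescales them by a factor before the softmax; the function name normalized_gating and the hyperparameter lam are illustrative assumptions, not the exact formulation or notation from the paper.

import torch
import torch.nn.functional as F

def normalized_gating(logits: torch.Tensor, lam: float = 1.0, eps: float = 1e-6) -> torch.Tensor:
    # logits: (num_tokens, num_experts) raw router outputs.
    # Standardize per token, then rescale by lam so the sharpness of the
    # expert distribution is controlled explicitly rather than by the raw
    # logit magnitude. (Illustrative sketch, not the paper's exact recipe.)
    mean = logits.mean(dim=-1, keepdim=True)
    std = logits.std(dim=-1, keepdim=True)
    return F.softmax(lam * (logits - mean) / (std + eps), dim=-1)

# Example: route 4 tokens across 16 experts, keeping the top-2 experts per token.
gate_probs = normalized_gating(torch.randn(4, 16), lam=1.0)
topk_probs, topk_experts = gate_probs.topk(2, dim=-1)

The adaptive auxiliary loss coefficients are complementary: rather than a single global weight, each MoE layer's load-balancing (auxiliary) loss receives its own coefficient, tuned per layer during training.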
Skywork-MoE demonstrates comparable or superior performance to models with more parameters or more activated parameters, such as Grok-1, DBRX, Mixtral 8x22B, and DeepSeek-V2.
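The base FP8 checkpoint can be run with vLLM; a minimal quickstart follows.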
from vllm import LLM, SamplingParams

model_path = 'Skywork/Skywork-MoE-Base-FP8'

prompts = [
    "The president of the United States is",
    "The capital of France is",
]

sampling_params = SamplingParams(temperature=0.3, max_tokens=256)

llm = LLM(
    model=model_path,
    kv_cache_dtype='auto',
    tensor_parallel_size=8,
    gpu_memory_utilization=0.95,
    enforce_eager=True,
    trust_remote_code=True,
)

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
Declaration and License Agreement
Declaration
We hereby declare that the Skywork model should not be used for any activities that pose a threat to national or societal security or engage in unlawful actions. Additionally, we request users not to deploy the Skywork model for internet services without appropriate security reviews and records. We hope that all users will adhere to this principle to ensure that technological advancements occur in a regulated and lawful environment.
We have done our utmost to ensure the compliance of the data used during the model's training process. However, despite our extensive efforts, due to the complexity of the model and data, there may still be unpredictable risks and issues. Therefore, if any problems arise as a result of using the Skywork open-source model, including but not limited to data security issues, public opinion risks, or any risks and problems arising from the model being misled, abused, disseminated, or improperly utilized, we will not assume any responsibility.
License Agreement
The community usage of the Skywork model requires the Skywork Community License. The Skywork model supports commercial use. If you plan to use the Skywork model or its derivatives for commercial purposes, you must abide by the terms and conditions within the Skywork Community License.
Contact Us and Citation
If you find our work helpful, please feel free to cite our paper:
@misc{wei2024skywork,
      title={Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models},
      author={Tianwen Wei and Bo Zhu and Liang Zhao and Cheng Cheng and Biye Li and Weiwei Lü and Peng Cheng and Jianhao Zhang and Xiaoyu Zhang and Liang Zeng and Xiaokun Wang and Yutuan Ma and Rui Hu and Shuicheng Yan and Han Fang and Yahui Zhou},
      year={2024},
      eprint={2406.06563},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/pdf/2406.06563}
}
@article{zhao2024longskywork,
title={LongSkywork: A Training Recipe for Efficiently Extending Context Length in Large Language Models},
author={Zhao, Liang and Wei, Tianwen and Zeng, Liang and Cheng, Cheng and Yang, Liu and Cheng, Peng and Wang, Lijie and Li, Chenxia and Wu, Xuejie and Zhu, Bo and others},
journal={arXiv preprint arXiv:2406.00605},
url={https://arxiv.org/abs/2406.00605},
year={2024}
}