DeepSeek-Coder-V2-Lite-Instruct-FlashAttnPatch huggingface.co api & BirdL DeepSeek-Coder-V2-Lite-Instruct-FlashAttnPatch github AI Model

Introduction of DeepSeek-Coder-V2-Lite-Instruct-FlashAttnPatch

Model Details of DeepSeek-Coder-V2-Lite-Instruct-FlashAttnPatch

API Platform | How to Use | License |

Paper Link 👁️

DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

All this fork does is take away the flash-attn so that it runs with ZeroGPU UPDATE: might be unneeded now because they fixed it in the unstream repo.

1. Introduction

We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathematical reasoning capabilities of DeepSeek-V2, while maintaining comparable performance in general language tasks. Compared to DeepSeek-Coder-33B, DeepSeek-Coder-V2 demonstrates significant advancements in various aspects of code-related tasks, as well as reasoning and general capabilities. Additionally, DeepSeek-Coder-V2 expands its support for programming languages from 86 to 338, while extending the context length from 16K to 128K.

In standard benchmark evaluations, DeepSeek-Coder-V2 achieves superior performance compared to closed-source models such as GPT4-Turbo, Claude 3 Opus, and Gemini 1.5 Pro in coding and math benchmarks. The list of supported programming languages can be found here .

2. Model Downloads

We release the DeepSeek-Coder-V2 with 16B and 236B parameters based on the DeepSeekMoE framework, which has actived parameters of only 2.4B and 21B , including base and instruct models, to the public.

Model	#Total Params	#Active Params	Context Length	Download
DeepSeek-Coder-V2-Lite-Base	16B	2.4B	128k	🤗 HuggingFace
DeepSeek-Coder-V2-Lite-Instruct	16B	2.4B	128k	🤗 HuggingFace
DeepSeek-Coder-V2-Base	236B	21B	128k	🤗 HuggingFace
DeepSeek-Coder-V2-Instruct	236B	21B	128k	🤗 HuggingFace

3. Chat Website

You can chat with the DeepSeek-Coder-V2 on DeepSeek's official website: coder.deepseek.com

4. API Platform

We also provide OpenAI-Compatible API at DeepSeek Platform: platform.deepseek.com . Sign up for over millions of free tokens. And you can also pay-as-you-go at an unbeatable price.

5. How to run locally

Here, we provide some examples of how to use DeepSeek-Coder-V2-Lite model. If you want to utilize DeepSeek-Coder-V2 in BF16 format for inference, 80GB*8 GPUs are required.

Inference with Huggingface's Transformers

You can directly employ Huggingface's Transformers for model inference.

Code Completion

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Base", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Base", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
input_text = "#write a quick sort algorithm"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Code Insertion

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Base", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Base", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
input_text = """<｜fim▁begin｜>def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[0]
    left = []
    right = []
<｜fim▁hole｜>
        if arr[i] < pivot:
            left.append(arr[i])
        else:
            right.append(arr[i])
    return quick_sort(left) + [pivot] + quick_sort(right)<｜fim▁end｜>"""
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True)[len(input_text):])

Chat Completion

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
messages=[
    { 'role': 'user', 'content': "write a quick sort algorithm in python."}
]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
# tokenizer.eos_token_id is the id of <|EOT|> token
outputs = model.generate(inputs, max_new_tokens=512, do_sample=False, top_k=50, top_p=0.95, num_return_sequences=1, eos_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True))

The complete chat template can be found within tokenizer_config.json located in the huggingface model repository.

An example of chat template is as belows:

<｜begin▁of▁sentence｜>User: {user_message_1}

Assistant: {assistant_message_1}<｜end▁of▁sentence｜>User: {user_message_2}

Assistant:

You can also add an optional system message:

<｜begin▁of▁sentence｜>{system_message}

User: {user_message_1}

Assistant: {assistant_message_1}<｜end▁of▁sentence｜>User: {user_message_2}

Assistant:

Inference with vLLM (recommended)

To utilize vLLM for model inference, please merge this Pull Request into your vLLM codebase: https://github.com/vllm-project/vllm/pull/4650 .

from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

max_model_len, tp_size = 8192, 1
model_name = "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
llm = LLM(model=model_name, tensor_parallel_size=tp_size, max_model_len=max_model_len, trust_remote_code=True, enforce_eager=True)
sampling_params = SamplingParams(temperature=0.3, max_tokens=256, stop_token_ids=[tokenizer.eos_token_id])

messages_list = [
    [{"role": "user", "content": "Who are you?"}],
    [{"role": "user", "content": "write a quick sort algorithm in python."}],
    [{"role": "user", "content": "Write a piece of quicksort code in C++."}],
]

prompt_token_ids = [tokenizer.apply_chat_template(messages, add_generation_prompt=True) for messages in messages_list]

outputs = llm.generate(prompt_token_ids=prompt_token_ids, sampling_params=sampling_params)

generated_text = [output.outputs[0].text for output in outputs]
print(generated_text)

6. License

This code repository is licensed under the MIT License . The use of DeepSeek-Coder-V2 Base/Instruct models is subject to the Model License . DeepSeek-Coder-V2 series (including Base and Instruct) supports commercial use.

7. Contact

If you have any questions, please raise an issue or contact us at [email protected] .

Runs of BirdL DeepSeek-Coder-V2-Lite-Instruct-FlashAttnPatch on huggingface.co

Total runs

24-hour runs

-1

3-day runs

-1

7-day runs

-9

30-day runs

More Information About DeepSeek-Coder-V2-Lite-Instruct-FlashAttnPatch huggingface.co Model

More DeepSeek-Coder-V2-Lite-Instruct-FlashAttnPatch license Visit here:

https://choosealicense.com/licenses/deepseek-license

DeepSeek-Coder-V2-Lite-Instruct-FlashAttnPatch huggingface.co

DeepSeek-Coder-V2-Lite-Instruct-FlashAttnPatch huggingface.co is an AI model on huggingface.co that provides DeepSeek-Coder-V2-Lite-Instruct-FlashAttnPatch's model effect (), which can be used instantly with this BirdL DeepSeek-Coder-V2-Lite-Instruct-FlashAttnPatch model. huggingface.co supports a free trial of the DeepSeek-Coder-V2-Lite-Instruct-FlashAttnPatch model, and also provides paid use of the DeepSeek-Coder-V2-Lite-Instruct-FlashAttnPatch. Support call DeepSeek-Coder-V2-Lite-Instruct-FlashAttnPatch model through api, including Node.js, Python, http.

DeepSeek-Coder-V2-Lite-Instruct-FlashAttnPatch huggingface.co Url

https://huggingface.co/BirdL/DeepSeek-Coder-V2-Lite-Instruct-FlashAttnPatch

BirdL DeepSeek-Coder-V2-Lite-Instruct-FlashAttnPatch online free

DeepSeek-Coder-V2-Lite-Instruct-FlashAttnPatch huggingface.co is an online trial and call api platform, which integrates DeepSeek-Coder-V2-Lite-Instruct-FlashAttnPatch's modeling effects, including api services, and provides a free online trial of DeepSeek-Coder-V2-Lite-Instruct-FlashAttnPatch, you can try DeepSeek-Coder-V2-Lite-Instruct-FlashAttnPatch online for free by clicking the link below.

BirdL DeepSeek-Coder-V2-Lite-Instruct-FlashAttnPatch online free url in huggingface.co:

https://huggingface.co/BirdL/DeepSeek-Coder-V2-Lite-Instruct-FlashAttnPatch

DeepSeek-Coder-V2-Lite-Instruct-FlashAttnPatch install

DeepSeek-Coder-V2-Lite-Instruct-FlashAttnPatch is an open source model from GitHub that offers a free installation service, and any user can find DeepSeek-Coder-V2-Lite-Instruct-FlashAttnPatch on GitHub to install. At the same time, huggingface.co provides the effect of DeepSeek-Coder-V2-Lite-Instruct-FlashAttnPatch install, users can directly use DeepSeek-Coder-V2-Lite-Instruct-FlashAttnPatch installed effect in huggingface.co for debugging and trial. It also supports api for free installation.

DeepSeek-Coder-V2-Lite-Instruct-FlashAttnPatch install url in huggingface.co:

https://huggingface.co/BirdL/DeepSeek-Coder-V2-Lite-Instruct-FlashAttnPatch

huggingface.co

BirdL/HFPaperData

Total runs: 0

Run Growth: 0

Growth Rate: 0.00%

Updated:October 02 2024

huggingface.co

BirdL/HFPapersCountries

Total runs: 0

Run Growth: 0

Growth Rate: 0.00%

Updated:September 26 2024

huggingface.co

BirdL/FancyVideo

Total runs: 0

Run Growth: 0

Growth Rate: 0.00%

Updated:August 21 2024

BirdL / DeepSeek-Coder-V2-Lite-Instruct-FlashAttnPatch

Introduction of DeepSeek-Coder-V2-Lite-Instruct-FlashAttnPatch

Model Details of DeepSeek-Coder-V2-Lite-Instruct-FlashAttnPatch

DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

1. Introduction

2. Model Downloads

3. Chat Website

4. API Platform

5. How to run locally

Inference with Huggingface's Transformers

Code Completion

Code Insertion

Chat Completion

Inference with vLLM (recommended)

6. License

7. Contact

Runs of BirdL DeepSeek-Coder-V2-Lite-Instruct-FlashAttnPatch on huggingface.co

More Information About DeepSeek-Coder-V2-Lite-Instruct-FlashAttnPatch huggingface.co Model

More DeepSeek-Coder-V2-Lite-Instruct-FlashAttnPatch license Visit here:

DeepSeek-Coder-V2-Lite-Instruct-FlashAttnPatch huggingface.co

DeepSeek-Coder-V2-Lite-Instruct-FlashAttnPatch huggingface.co Url

BirdL DeepSeek-Coder-V2-Lite-Instruct-FlashAttnPatch online free

BirdL DeepSeek-Coder-V2-Lite-Instruct-FlashAttnPatch online free url in huggingface.co:

DeepSeek-Coder-V2-Lite-Instruct-FlashAttnPatch install

DeepSeek-Coder-V2-Lite-Instruct-FlashAttnPatch install url in huggingface.co:

Url of DeepSeek-Coder-V2-Lite-Instruct-FlashAttnPatch

DeepSeek-Coder-V2-Lite-Instruct-FlashAttnPatch huggingface.co Url

Provider of DeepSeek-Coder-V2-Lite-Instruct-FlashAttnPatch huggingface.co

Other API from BirdL