openbmb / MiniCPM-Embedding

Model last updated: January 23, 2025
feature-extraction

MiniCPM-Embedding

MiniCPM-Embedding is a bilingual and cross-lingual (Chinese and English) text embedding model developed jointly by ModelBest Inc. and the Tsinghua University Natural Language Processing Lab (THUNLP), featuring:

  • Exceptional Chinese and English retrieval capabilities.
  • Outstanding cross-lingual retrieval capabilities between Chinese and English.

MiniCPM-Embedding is trained from MiniCPM-2B-sft-bf16 and uses bidirectional attention with Weighted Mean Pooling [1] in its architecture. Training was multi-stage and used approximately 6 million examples drawn from open-source, synthetic, and proprietary data.

We also invite you to explore the RAG toolkit series:

[1] Muennighoff, N. (2022). SGPT: GPT Sentence Embeddings for Semantic Search. arXiv preprint arXiv:2202.08904.
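
As a reading aid, the position-weighted mean pooling referenced in [1] can be written out as follows. This is a sketch inferred from the `weighted_mean_pooling` function in the demo script of this card, not an official specification. The symbols are assumptions introduced here: h_i is the last-layer hidden state of token i, m_i is the attention mask (1 for real tokens, 0 for padding), L is the padded sequence length, and e is the pooled embedding before L2 normalization.

```latex
c_i = \sum_{j=1}^{i} m_j,
\qquad
e = \frac{\sum_{i=1}^{L} m_i \, c_i \, h_i}{\sum_{i=1}^{L} m_i \, c_i}
```

For a sequence of n real tokens and no padding, m_i = 1 and c_i = i, so token i receives weight i / (n(n+1)/2): later tokens contribute more to the embedding than earlier ones.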

Model Information
  • Model Size: 2.4B

  • Embedding Dimension: 2304

  • Max Input Tokens: 512
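
The embedding dimension determines how much space a vector index needs. The sketch below is a back-of-the-envelope estimate only: the helper `index_size_bytes` is hypothetical, and it assumes vectors are stored as raw, uncompressed float32 (4 bytes) or float16 (2 bytes) values with no index overhead.

```python
EMBEDDING_DIM = 2304  # from the model information above


def index_size_bytes(num_vectors: int, bytes_per_value: int = 4,
                     dim: int = EMBEDDING_DIM) -> int:
    """Raw storage for num_vectors embeddings (hypothetical helper).

    Assumes a flat, uncompressed layout with no index overhead.
    """
    return num_vectors * dim * bytes_per_value


# One float32 vector: 2304 * 4 bytes.
print(index_size_bytes(1))
# One million float32 vectors, in GiB.
print(round(index_size_bytes(1_000_000) / 1024**3, 2))
# Same corpus stored as float16.
print(round(index_size_bytes(1_000_000, bytes_per_value=2) / 1024**3, 2))
```

Under these assumptions a single float32 vector takes 9216 bytes, so a corpus of one million passages needs roughly 8.6 GiB before any compression or quantization.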

Usage
Input Format

MiniCPM-Embedding supports query-side instructions in the following format:

Instruction: {{ instruction }} Query: {{ query }}

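For example:

Instruction: 为这个医学问题检索相关回答。Query: 咽喉癌的成因是什么?
(In English: "Instruction: Retrieve relevant answers to this medical question. Query: What causes throat cancer?")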
Instruction: Given a claim about climate change, retrieve documents that support or refute the claim. Query: However the warming trend is slower than most climate models have forecast.

MiniCPM-Embedding also works in instruction-free mode, using the following format:

Query: {{ query }}

When evaluating on BEIR and C-MTEB/Retrieval, we use the instructions in instructions.json. Other evaluations do not use instructions. On the document side, the raw document text is used directly as input.
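
The two query formats above can be produced with a small helper. This is a minimal sketch: `format_query` is a hypothetical name and is not part of the model repository. It follows the template shown above, with a single space between the instruction and `Query:`. Documents are passed to the model unchanged, so they need no helper.

```python
from typing import Optional


def format_query(query: str, instruction: Optional[str] = None) -> str:
    """Build the query-side input string (hypothetical helper).

    With an instruction: "Instruction: {instruction} Query: {query}"
    Without one:         "Query: {query}"
    """
    if instruction:
        return f"Instruction: {instruction} Query: {query}"
    return f"Query: {query}"


# Instruction mode.
print(format_query(
    "However the warming trend is slower than most climate models have forecast.",
    "Given a claim about climate change, retrieve documents that support or refute the claim.",
))
# Instruction-free mode.
print(format_query("What is the capital of China?"))
```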

Requirements
transformers==4.37.2
flash-attn>2.3.5
Demo

from transformers import AutoModel, AutoTokenizer
import torch
import torch.nn.functional as F

model_name = "openbmb/MiniCPM-Embedding"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True, attn_implementation="flash_attention_2", torch_dtype=torch.float16).to("cuda")
model.eval()

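def weighted_mean_pooling(hidden, attention_mask):
    # Position weights: the i-th real token gets weight i (1-based); padding gets 0.
    attention_mask_ = attention_mask * attention_mask.cumsum(dim=1)
    # Weighted sum of token hidden states over the sequence dimension.
    s = torch.sum(hidden * attention_mask_.unsqueeze(-1).float(), dim=1)
    # Normalizer: the sum of the position weights for each sequence.
    d = attention_mask_.sum(dim=1, keepdim=True).float()
    reps = s / d
    return reps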

@torch.no_grad()
def encode(input_texts):
    # Tokenize and truncate to the model's 512-token input limit.
    batch_dict = tokenizer(input_texts, max_length=512, padding=True, truncation=True, return_tensors='pt', return_attention_mask=True).to("cuda")

    outputs = model(**batch_dict)
    attention_mask = batch_dict["attention_mask"]
    hidden = outputs.last_hidden_state

    # Pool token states, then L2-normalize so that dot products are cosine similarities.
    reps = weighted_mean_pooling(hidden, attention_mask)
    embeddings = F.normalize(reps, p=2, dim=1).detach().cpu().numpy()
    return embeddings

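queries = ["中国的首都是哪里?"]  # "What is the capital of China?"
passages = ["beijing", "shanghai"]

# Instruction-free mode: only the "Query: " prefix is added on the query side.
INSTRUCTION = "Query: "
queries = [INSTRUCTION + query for query in queries]

embeddings_query = encode(queries)
embeddings_doc = encode(passages)

# Rows correspond to queries, columns to passages.
scores = (embeddings_query @ embeddings_doc.T)
print(scores.tolist())  # [[0.3535913825035095, 0.18596848845481873]]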
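
The demo stops at the raw score matrix. As a possible next step, the sketch below turns one row of scores into a ranked passage list. `rank_passages` is a hypothetical helper, not part of the model repository; it uses only the standard library and takes plain Python lists, such as one row of `scores.tolist()`. The sample numbers are the ones printed by the demo above.

```python
from typing import List, Tuple


def rank_passages(score_row: List[float], passages: List[str],
                  top_k: int = 10) -> List[Tuple[str, float]]:
    """Return up to top_k (passage, score) pairs, highest score first."""
    if len(score_row) != len(passages):
        raise ValueError("need exactly one score per passage")
    ranked = sorted(zip(passages, score_row), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


# Scores and passages copied from the demo output above.
demo_scores = [[0.3535913825035095, 0.18596848845481873]]
demo_passages = ["beijing", "shanghai"]

for passage, score in rank_passages(demo_scores[0], demo_passages):
    print(f"{score:.4f}  {passage}")
```

Because the embeddings are L2-normalized, a higher dot product means a higher cosine similarity, so sorting in descending order puts the best match first.
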
Evaluation Results
CN/EN Retrieval Results

| Model | C-MTEB/Retrieval (NDCG@10) | BEIR (NDCG@10) |
|---|---|---|
| bge-large-zh-v1.5 | 70.46 | - |
| gte-large-zh | 72.49 | - |
| Zhihui_LLM_Embedding | 76.74 | - |
| bge-large-en-v1.5 | - | 54.29 |
| gte-en-large-v1.5 | - | 57.91 |
| NV-Retriever-v1 | - | 60.9 |
| bge-en-icl | - | 62.16 |
| NV-Embed-v2 | - | 62.65 |
| me5-large | 63.66 | 51.43 |
| bge-m3(Dense) | 65.43 | 48.82 |
| gte-multilingual-base(Dense) | 71.95 | 51.08 |
| gte-Qwen2-1.5B-instruct | 71.86 | 58.29 |
| gte-Qwen2-7B-instruct | 76.03 | 60.25 |
| bge-multilingual-gemma2 | 73.73 | 59.24 |
| MiniCPM-Embedding | 76.76 | 58.56 |
| MiniCPM-Embedding+MiniCPM-Reranker | 77.08 | 61.61 |

CN-EN Cross-lingual Retrieval Results

| Model | MKQA En-Zh_CN (Recall@20) | NeuCLIR22 (NDCG@10) | NeuCLIR23 (NDCG@10) |
|---|---|---|---|
| me5-large | 44.3 | 9.01 | 25.33 |
| bge-m3(Dense) | 66.4 | 30.49 | 41.09 |
| gte-multilingual-base(Dense) | 68.2 | 39.46 | 45.86 |
| gte-Qwen2-1.5B-instruct | 68.52 | 49.11 | 45.05 |
| gte-Qwen2-7B-instruct | 68.27 | 49.14 | 49.6 |
| MiniCPM-Embedding | 72.95 | 52.65 | 49.95 |
| MiniCPM-Embedding+MiniCPM-Reranker | 74.33 | 53.21 | 54.12 |

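
Most columns in these tables report NDCG@10. For readers unfamiliar with the metric, here is a minimal sketch of the common linear-gain formulation. It is not the evaluation code behind the numbers above: the benchmark toolkits may differ in details such as gain function and tie handling, and the function names are hypothetical. The table values appear to be on a 0 to 100 scale, whereas this sketch returns a value between 0 and 1.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain: relevance at rank r is divided by log2(r + 1)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(ranked_relevances: Sequence[float],
              all_relevances: Sequence[float], k: int = 10) -> float:
    """NDCG@k for one query.

    ranked_relevances: relevance labels of the retrieved documents, in ranked order.
    all_relevances: relevance labels of every judged document for the query,
                    used to build the ideal ranking.
    """
    ideal = dcg_at_k(sorted(all_relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant documents: define the score as 0
    return dcg_at_k(ranked_relevances, k) / ideal


# One relevant document, retrieved at rank 2 instead of rank 1.
print(round(ndcg_at_k([0, 1, 0], [1]), 4))
```

A benchmark score is then the mean of the per-query values over all queries in the test set.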
License
  • The code in this repo is released under the Apache-2.0 License.
  • Use of the MiniCPM-Embedding model weights must strictly follow the MiniCPM Model License.md.
  • The models and weights of MiniCPM-Embedding are completely free for academic research. After filling out a registration questionnaire, the MiniCPM-Embedding weights are also available for free commercial use.

Model page: https://huggingface.co/openbmb/MiniCPM-Embedding
