MiniCPM-Embedding is a bilingual and cross-lingual text embedding model developed by ModelBest Inc. and THUNLP, featuring:
Exceptional Chinese and English retrieval capabilities.
Outstanding cross-lingual retrieval capabilities between Chinese and English.
MiniCPM-Embedding is built on MiniCPM-2B-sft-bf16 and incorporates bidirectional attention and Weighted Mean Pooling [1] in its architecture. The model underwent multi-stage training using approximately 6 million training examples, including open-source, synthetic, and proprietary data.
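To make the pooling step concrete, here is a minimal pure-Python sketch of position-weighted mean pooling in the spirit of [1], where the i-th real token receives a weight proportional to i. The function name `weighted_mean_pooling` and its list-based inputs are illustrative assumptions. The model's actual implementation is not shown in this document and would normally operate on batched tensors.

```python
from typing import List, Sequence


def weighted_mean_pooling(
    hidden: Sequence[Sequence[float]],
    attention_mask: Sequence[int],
) -> List[float]:
    """Position-weighted mean over one sequence's token vectors.

    hidden: one vector per token position (seq_len x dim).
    attention_mask: 1 for real tokens, 0 for padding.
    """
    if not hidden:
        raise ValueError("hidden must contain at least one token vector")
    if len(hidden) != len(attention_mask):
        raise ValueError("hidden and attention_mask must have the same length")

    dim = len(hidden[0])
    pooled = [0.0] * dim
    total_weight = 0.0
    position = 0  # running count of real (non-padding) tokens seen so far

    for vector, keep in zip(hidden, attention_mask):
        if not keep:
            continue  # padding positions contribute nothing
        position += 1
        weight = float(position)  # later tokens get linearly larger weights
        total_weight += weight
        for j in range(dim):
            pooled[j] += weight * vector[j]

    if total_weight == 0.0:
        raise ValueError("attention_mask has no real tokens")
    # Dividing by the sum of weights normalizes them to sum to 1.
    return [value / total_weight for value in pooled]


# Two real tokens followed by one padding position.
embedding = weighted_mean_pooling(
    [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]],
    [1, 1, 0],
)
```

In this toy input the weights are 1 and 2, so the second token counts twice as much as the first and the padded position is ignored.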
We also invite you to explore the RAG toolkit series:
[1] Muennighoff, N. (2022). SGPT: GPT Sentence Embeddings for Semantic Search. arXiv preprint arXiv:2202.08904.
Model Information
Model Size: 2.4B
Embedding Dimension: 2304
Max Input Tokens: 512
Usage
Input Format
MiniCPM-Embedding supports query-side instructions in the following format:
Instruction: {{ instruction }} Query: {{ query }}
For example:
Instruction: 为这个医学问题检索相关回答。Query: 咽喉癌的成因是什么?
Instruction: Given a claim about climate change, retrieve documents that support or refute the claim. Query: However the warming trend is slower than most climate models have forecast.
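The query-side format above can be produced with a small helper. This is only a sketch: the function name `format_query` is hypothetical, and the single space before `Query:` follows the template rather than any confirmed tokenization requirement.

```python
def format_query(instruction: str, query: str) -> str:
    """Build a query-side input in the 'Instruction: ... Query: ...' form."""
    # Strip stray whitespace so the template controls the spacing.
    return f"Instruction: {instruction.strip()} Query: {query.strip()}"


example = format_query(
    "Given a claim about climate change, retrieve documents that support or refute the claim.",
    "However the warming trend is slower than most climate models have forecast.",
)
print(example)
```

Keeping the template in one place makes it harder for queries at indexing and search time to drift into slightly different formats.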
MiniCPM-Embedding also works in instruction-free mode in the following format:
When running evaluation on BEIR and C-MTEB/Retrieval, we use instructions in instructions.json. For other evaluations, we do not use instructions. On the document side, we directly use the bare document as the input.
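The evaluation setup described above might be sketched as follows. Several parts are assumptions and not taken from the source: the layout of instructions.json is not shown here, so `EXAMPLE_INSTRUCTIONS` is a hypothetical dataset-to-instruction mapping with a made-up key. The instruction-free form `Query: {query}` is likewise assumed, since the format is not reproduced in this text. The helper names are illustrative.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical stand-in for instructions.json: dataset name -> instruction.
# The real file's structure and keys are not shown in this document.
EXAMPLE_INSTRUCTIONS: Dict[str, str] = {
    "climate-fever": (
        "Given a claim about climate change, retrieve documents "
        "that support or refute the claim."
    ),
}


def build_query_input(query: str, instruction: Optional[str]) -> str:
    """Format one query, with an instruction when one is available."""
    if instruction:
        return f"Instruction: {instruction} Query: {query}"
    # Assumed instruction-free form; not confirmed by the text above.
    return f"Query: {query}"


def prepare_inputs(
    dataset: str,
    queries: Sequence[str],
    documents: Sequence[str],
    instructions: Dict[str, str],
) -> Tuple[List[str], List[str]]:
    """Return (query inputs, document inputs) for one evaluation dataset."""
    instruction = instructions.get(dataset)  # None means instruction-free
    query_inputs = [build_query_input(q, instruction) for q in queries]
    document_inputs = list(documents)  # documents are passed through bare
    return query_inputs, document_inputs


query_inputs, document_inputs = prepare_inputs(
    "climate-fever",
    ["However the warming trend is slower than most climate models have forecast."],
    ["An example document about observed warming trends."],
    EXAMPLE_INSTRUCTIONS,
)
```

Only the query side is ever rewritten. Documents go to the encoder unchanged, so an index built once can serve queries with different instructions.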
The models and weights of MiniCPM-Embedding are completely free for academic research. After you fill out a "questionnaire" for registration, the MiniCPM-Embedding weights are also available for free commercial use.