potion-code-16M is a fast static code embedding model optimized for code retrieval tasks. It is distilled from nomic-ai/CodeRankEmbed and trained on the CornStack code corpus using Tokenlearn and contrastive fine-tuning. It uses static embeddings, allowing text and code embeddings to be computed orders of magnitude faster than with transformer-based models, on both GPU and CPU.
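The speed comes from the architecture: a static model's encode step is just an embedding-table lookup followed by pooling, with no attention layers. A minimal sketch of the idea (the vocabulary, dimensions, and values below are illustrative toys, not the model's; model2vec also applies token weighting, which is omitted here):

```python
import numpy as np

# Toy vocabulary and embedding table. The real model ships ~62.5k tokens
# with 256-dimensional vectors; 5 tokens x 8 dims keeps the sketch small.
vocab = {"def": 0, "read": 1, "file": 2, "(": 3, ")": 4}
rng = np.random.default_rng(0)
table = rng.normal(size=(len(vocab), 8)).astype(np.float32)

def encode(tokens):
    """Static embedding: look up each token's vector and mean-pool."""
    ids = [vocab[t] for t in tokens if t in vocab]
    return table[ids].mean(axis=0)

emb = encode(["def", "read", "file"])
```

Because there is no forward pass, encoding cost is linear in the number of tokens and trivially parallel, which is what makes CPU inference practical.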
Installation

```bash
pip install model2vec
```
Usage
```python
from model2vec import StaticModel

model = StaticModel.from_pretrained("minishlab/potion-code-16M")

# Embed natural language queries
query_embeddings = model.encode(["How to read a file in Python?"])

# Embed code documents
code_embeddings = model.encode(
    ["def read_file(path):\n    with open(path) as f:\n        return f.read()"]
)
```
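The returned embeddings are plain vectors, so retrieval reduces to a nearest-neighbor search. A minimal sketch of ranking code documents against a query with cosine similarity (the dummy vectors stand in for `model.encode(...)` output to keep the example self-contained):

```python
import numpy as np

def cosine_sim(a, b):
    # Row-wise cosine similarity between two sets of embeddings.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

# In practice these come from model.encode(...); hardcoded toy vectors here.
query_embeddings = np.array([[1.0, 0.0, 0.0]])
code_embeddings = np.array([[0.9, 0.1, 0.0],   # relevant snippet
                            [0.0, 1.0, 0.0]])  # unrelated snippet

scores = cosine_sim(query_embeddings, code_embeddings)
best = int(scores.argmax())  # index of the best-matching code document
```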
How it works

potion-code-16M is created using the following pipeline:

1. Vocabulary mining: code-specific tokens are mined from CornStack and added to the base CodeRankEmbed tokenizer (42k extra tokens → ~62.5k total).
2. Distillation: the extended vocabulary is distilled from CodeRankEmbed using Model2Vec (256-dimensional embeddings, PCA whitening).
3. Tokenlearn: the distilled model is fine-tuned on 240k (query, document) pairs from CornStack using a cosine similarity loss.
4. Contrastive fine-tuning: the model is further fine-tuned using MultipleNegativesRankingLoss on 120k CornStack query-document pairs.
5. Post-SIF re-regularization: token weights are re-regularized using SIF weighting after each training stage.
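SIF (smooth inverse frequency) weighting scales each token's vector by a / (a + p(t)), where p(t) is the token's corpus probability, so frequent tokens contribute less to mean-pooled embeddings. A sketch of the re-weighting step (the function name, the a = 1e-3 default, and the toy counts are assumptions for illustration, not the repo's code):

```python
import numpy as np

def sif_reweight(embeddings, token_counts, a=1e-3):
    """Scale each token's embedding row by the SIF weight a / (a + p(t)),
    where p(t) is the token's corpus probability. Frequent tokens get
    small weights, so they dominate pooled sentence vectors less."""
    counts = np.asarray(token_counts, dtype=np.float64)
    p = counts / counts.sum()
    weights = a / (a + p)
    return embeddings * weights[:, None]

# Toy example: 3 tokens, 4-dim embeddings; token 0 is very frequent,
# token 2 is rare, so token 2 keeps a larger vector after re-weighting.
emb = np.ones((3, 4))
reweighted = sif_reweight(emb, token_counts=[9000, 900, 100])
```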
CoIR covers a broad range of code retrieval scenarios. For the use case of finding code given a natural language query, CosQA and CodeFeedback (ST/MT) are the most relevant tasks. Others are less so: COIRCodeSearchNetRetrieval retrieves text given a code query (the reverse direction), and the CodeTransOcean tasks target cross-language code translation. The hybrid row combines dense retrieval with BM25 using min-max score normalization and equal weighting (alpha = 0.5).
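The hybrid scoring described above can be sketched as follows: each retriever's raw scores are min-max normalized into [0, 1], then mixed with weight alpha (the function names and toy scores are illustrative):

```python
import numpy as np

def minmax(scores):
    # Rescale scores into [0, 1]; constant score lists map to zeros.
    s = np.asarray(scores, dtype=np.float64)
    span = s.max() - s.min()
    return (s - s.min()) / span if span > 0 else np.zeros_like(s)

def hybrid_scores(dense, bm25, alpha=0.5):
    """Min-max normalize each retriever's scores, then mix with weight
    alpha (alpha = 0.5 gives dense and BM25 equal weight)."""
    return alpha * minmax(dense) + (1 - alpha) * minmax(bm25)

# Toy scores for 3 candidate documents; note the raw scales differ,
# which is why normalization is needed before mixing.
dense = [0.82, 0.40, 0.75]
bm25 = [12.0, 30.0, 8.0]
combined = hybrid_scores(dense, bm25)
```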
The full training pipeline (distill → tokenlearn → contrastive) is in train.py. It requires minishlab/tokenlearn-cornstack-docs-coderankembed and minishlab/tokenlearn-cornstack-queries-coderankembed (20k samples per language used).
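The contrastive stage's MultipleNegativesRankingLoss treats each query's paired document as its positive and every other document in the batch as a negative, i.e. cross-entropy over the scaled similarity matrix with the diagonal as targets. A numpy sketch of the objective (the scale value and toy data are assumptions; the actual training uses sentence-transformers' implementation):

```python
import numpy as np

def mnr_loss(q, d, scale=20.0):
    """MultipleNegativesRankingLoss sketch: cosine similarities between
    every query and every document in the batch, scaled, then softmax
    cross-entropy with the matching (diagonal) document as the target."""
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    logits = scale * (q @ d.T)                   # (batch, batch)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs.diagonal().mean()

# Toy batch: each document is a small perturbation of its query, so the
# aligned pairing should score a much lower loss than a shuffled one.
rng = np.random.default_rng(0)
queries = rng.normal(size=(4, 16))
docs = queries + 0.1 * rng.normal(size=(4, 16))
loss = mnr_loss(queries, docs)
```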