rasyosef / splade-mini

huggingface.co
Total runs: 659
24-hour runs: 0
7-day runs: 94
30-day runs: 598
Last updated: October 07, 2025
feature-extraction


SPLADE-BERT-Mini-Distil

This is a SPLADE sparse retrieval model based on BERT-Mini (11M parameters), trained by distilling the ms-marco-MiniLM-L6-v2 cross-encoder on the MSMARCO dataset.

This mini SPLADE model is 6x smaller than Naver's official splade-v3-distilbert while retaining 85% of its performance on the MSMARCO benchmark. It is small enough to run without a GPU on a corpus of a few thousand documents.

Performance

The SPLADE models were evaluated on 55 thousand queries and 8 million documents from the MSMARCO dataset.

| Model | Size (# Params) | MRR@10 (MS MARCO dev) |
|---|---|---|
| BM25 | - | 18.6 |
| rasyosef/splade-tiny | 4.4M | 30.8 |
| rasyosef/splade-mini | 11.2M | 32.8 |
| naver/splade-v3-distilbert | 67.0M | 38.7 |
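MRR@10, the metric in the table above, can be sketched in a few lines: for each query, take the reciprocal of the (1-based) rank of the first relevant document among the top 10 retrieved results, or 0 if none appears, then average over queries. The function and toy data below are illustrative, not the actual evaluation code.

```python
def mrr_at_10(ranked_ids_per_query, relevant_ids_per_query):
    """Mean Reciprocal Rank at cutoff 10 over a batch of queries."""
    total = 0.0
    for ranked, relevant in zip(ranked_ids_per_query, relevant_ids_per_query):
        for rank, doc_id in enumerate(ranked[:10], start=1):
            if doc_id in relevant:
                total += 1.0 / rank  # reciprocal rank of first hit
                break                # only the first relevant hit counts
    return total / len(ranked_ids_per_query)

# Toy example: first query hits at rank 1, second at rank 2
print(mrr_at_10([["d1", "d2"], ["d3", "d4"]], [{"d1"}, {"d4"}]))  # 0.75
```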
Usage
Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SparseEncoder

# Download from the 🤗 Hub
model = SparseEncoder("rasyosef/splade-mini")
# Run inference
queries = [
    "definition of fermentation in the lab",
]
documents = [
    'Fermentation is a metabolic pathway that produce ATP molecules under anaerobic conditions (only undergoes glycolysis), NAD+ is used directly in glycolysis to form ATP molecules, which is not as efficient as cellular respiration because only 2ATP molecules are formed during the glycolysis.',
    'Essay on Yeast Fermentation ... Yeast Fermentation Lab Report The purpose of this experiment was to observe the process in which cells must partake in a respiration process called anaerobic fermentation and as the name suggests, oxygen is not required.',
    '\ufeffYeast Fermentation Lab Report The purpose of this experiment was to observe the process in which cells must partake in a respiration process called anaerobic fermentation and as the name suggests, oxygen is not required.',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# [1, 30522] [3, 30522]

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[20.0220, 17.1372, 15.9159]])
Model Details
Model Description
  • Model Type: SPLADE Sparse Encoder
  • Base model: prajjwal1/bert-mini
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 30522 dimensions
  • Similarity Function: Dot Product
  • Language: en
  • License: mit
Full Model Architecture
SparseEncoder(
  (0): MLMTransformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertForMaskedLM'})
  (1): SpladePooling({'pooling_strategy': 'max', 'activation_function': 'relu', 'word_embedding_dimension': 30522})
)
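The SpladePooling module in the architecture above can be sketched as follows: each token's MLM logits over the 30522-term vocabulary are passed through log(1 + relu(x)) and max-pooled over the sequence, giving one non-negative weight per vocabulary term. This is a minimal illustration of the pooling step, not the library's implementation.

```python
import torch

def splade_pool(mlm_logits: torch.Tensor) -> torch.Tensor:
    """Max-pooled SPLADE weights from per-token MLM logits.

    mlm_logits: (seq_len, vocab_size) tensor of masked-LM logits.
    Returns a (vocab_size,) sparse-by-construction term-weight vector.
    """
    return torch.log1p(torch.relu(mlm_logits)).amax(dim=0)

logits = torch.randn(8, 30522)   # hypothetical logits for an 8-token input
emb = splade_pool(logits)
print(emb.shape)                 # torch.Size([30522])
print((emb >= 0).all().item())   # True: relu keeps all weights non-negative
```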
Evaluation
Metrics
Sparse Information Retrieval
| Metric | Value |
|---|---|
| dot_accuracy@1 | 0.4828 |
| dot_accuracy@3 | 0.8052 |
| dot_accuracy@5 | 0.9046 |
| dot_accuracy@10 | 0.9666 |
| dot_precision@1 | 0.4828 |
| dot_precision@3 | 0.2757 |
| dot_precision@5 | 0.1879 |
| dot_precision@10 | 0.1016 |
| dot_recall@1 | 0.4673 |
| dot_recall@3 | 0.792 |
| dot_recall@5 | 0.8949 |
| dot_recall@10 | 0.9624 |
| dot_ndcg@10 | 0.7302 |
| dot_mrr@10 | 0.658 |
| dot_map@100 | 0.6535 |
| query_active_dims | 19.524 |
| query_sparsity_ratio | 0.9994 |
| corpus_active_dims | 113.4705 |
| corpus_sparsity_ratio | 0.9963 |
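The sparsity ratios reported above follow directly from the active-dimension counts and the 30522-dimensional vocabulary: ratio = 1 - active_dims / 30522. A quick check:

```python
VOCAB_SIZE = 30522  # BERT WordPiece vocabulary size (output dimensionality)

def sparsity_ratio(active_dims: float) -> float:
    """Fraction of embedding dimensions that are zero on average."""
    return 1.0 - active_dims / VOCAB_SIZE

print(round(sparsity_ratio(19.524), 4))    # 0.9994 (query embeddings)
print(round(sparsity_ratio(113.4705), 4))  # 0.9963 (corpus embeddings)
```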
Training Details
Training Dataset
Unnamed Dataset
  • Size: 250,000 training samples
  • Columns: query , positive , negative_1 , negative_2 , negative_3 , negative_4 , and label
  • Approximate statistics based on the first 1000 samples:
| | query | positive | negative_1 | negative_2 | negative_3 | negative_4 | label |
|---|---|---|---|---|---|---|---|
| type | string | string | string | string | string | string | list |
| min | 4 tokens | 24 tokens | 20 tokens | 20 tokens | 18 tokens | 18 tokens | size: 4 elements |
| mean | 8.87 tokens | 81.23 tokens | 79.21 tokens | 77.89 tokens | 76.38 tokens | 75.46 tokens | |
| max | 43 tokens | 259 tokens | 197 tokens | 207 tokens | 271 tokens | 214 tokens | |
  • Samples:
    query positive negative_1 negative_2 negative_3 negative_4 label
    heart specialists in ridgeland ms Dr. George Reynolds Jr, MD is a cardiology specialist in Ridgeland, MS and has been practicing for 35 years. He graduated from Vanderbilt University School Of Medicine in 1977 and specializes in cardiology and internal medicine. Dr. James Kramer is a Internist in Ridgeland, MS. Find Dr. Kramer's phone number, address and more. Dr. James Kramer is an internist in Ridgeland, Mississippi. He received his medical degree from Loma Linda University School of Medicine and has been in practice for more than 20 years. Dr. James Kramer's Details Chronic Pulmonary Heart Diseases (incl. Pulmonary Hypertension) Coarctation of the Aorta; Congenital Aortic Valve Disorders; Congenital Heart Defects; Congenital Heart Disease; Congestive Heart Failure; Coronary Artery Disease (CAD) Endocarditis; Heart Attack (Acute Myocardial Infarction) Heart Disease; Heart Murmur; Heart Palpitations; Hyperlipidemia; Hypertension A growing shortage of primary care doctors means you might have to look harder for ongoing care. How to Read an OTC Medication Label Purvi Parikh, M.D. Feb. 12, 2018
    does baytril otic require a prescription Baytril Otic Ear Drops-Enrofloxacin/Silver Sulfadiazine-Prices & Information. A prescription is required for this item. A prescription is required for this item. Brand medication is not available at this time. RX required for this item. Click here for our full Prescription Policy and Form. Baytril Otic (enrofloxacin/silver sulfadiazine) Emulsion from Bayer is the first fluoroquinolone approved by the Food and Drug Administration for the topical treatment of canine otitis externa. Product Details. Baytril Otic is a highly effective treatment prescribed by many veterinarians when your pet has an ear infection caused by susceptible bacteria or fungus. Baytril Otic is: a liquid emulsion that is used topically directly in the ear or on the skin in order to treat susceptible bacterial and yeast infections. Baytril for dogs is an antibiotic often prescribed for bacterial infections, particularly those involving the ears. Ear infections are rare in many animals, but quite common in dogs. This is particularly true for dogs with long droopy ears, where it will stay very warm and moist. Administer 5-10 Baytril ear drops per treatment in dogs 35 lbs or less and 10-15 drops per treatment in dogs more than 35 lbs. [1.0, 3.640146493911743, 6.450072288513184, 11.96937084197998]
    what is on a gyro Report Abuse. Gyros or gyro (giros) (pronounced /ˈjɪəroʊ/ or /ˈdʒaɪroʊ/, Greek: γύρος turn) is a Greek dish consisting of meat (typically lamb and/or beef), tomato, onion, and tzatziki sauce, and is served with pita bread. Chicken and pork meat can be used too. A gyroscope (from Ancient Greek γῦρος gûros, circle and σκοπέω skopéō, to look) is a spinning wheel or disc in which the axis of rotation is free to assume any orientation by itself. When rotating, the orientation of this axis is unaffected by tilting or rotation of the mounting, according to the conservation of angular momentum. Diagram of a gyro wheel. Reaction arrows about the output axis (blue) correspond to forces applied about the input axis (green), and vice versa. A gyroscope is a wheel mounted in two or three gimbals, which are a pivoted supports that allow the rotation of the wheel about a single axis. A fair number of our users are unsure of how to pronounce gyro. This isn't surprising, since there are two different gyros and they have two different pronunciations. The earlier gyro is the one that is a shortened form of gyrocompass or gyroscope, and it has a pronunciation that conforms to one's expectations: /JEYE-roh/. Vibration Gyro Sensors. Vibration gyro sensors sense angular velocity from the Coriolis force applied to a vibrating element. For this reason, the accuracy with which angular velocity is measured differs significantly depending on element material and structural differences. [2.1750364303588867, 2.634796142578125, 4.30520486831665, 6.382436752319336]
  • Loss: SpladeLoss with these parameters:
    {
        "loss": "SparseMarginMSELoss",
        "document_regularizer_weight": 0.3,
        "query_regularizer_weight": 0.5
    }
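The SparseMarginMSELoss component of the loss above follows the MarginMSE idea: the student's score margin between a positive and a negative document is regressed onto the cross-encoder teacher's margin with a mean-squared-error objective. A minimal sketch with toy scores (not the library's implementation):

```python
import torch

def margin_mse(student_pos, student_neg, teacher_pos, teacher_neg):
    """MSE between the student's and the teacher's (pos - neg) score margins."""
    student_margin = student_pos - student_neg
    teacher_margin = teacher_pos - teacher_neg
    return torch.nn.functional.mse_loss(student_margin, teacher_margin)

# Hypothetical scores: student margin is 5, teacher margin is 7
s_pos, s_neg = torch.tensor([20.0]), torch.tensor([15.0])
t_pos, t_neg = torch.tensor([11.0]), torch.tensor([4.0])
print(margin_mse(s_pos, s_neg, t_pos, t_neg))  # tensor(4.) -> (5 - 7)^2
```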
    
Training Hyperparameters
Non-Default Hyperparameters
  • eval_strategy : epoch
  • per_device_train_batch_size : 48
  • per_device_eval_batch_size : 48
  • learning_rate : 8e-05
  • num_train_epochs : 6
  • lr_scheduler_type : cosine
  • warmup_ratio : 0.025
  • fp16 : True
  • load_best_model_at_end : True
  • optim : adamw_torch_fused
All Hyperparameters
  • overwrite_output_dir : False
  • do_predict : False
  • eval_strategy : epoch
  • prediction_loss_only : True
  • per_device_train_batch_size : 48
  • per_device_eval_batch_size : 48
  • per_gpu_train_batch_size : None
  • per_gpu_eval_batch_size : None
  • gradient_accumulation_steps : 1
  • eval_accumulation_steps : None
  • torch_empty_cache_steps : None
  • learning_rate : 8e-05
  • weight_decay : 0.0
  • adam_beta1 : 0.9
  • adam_beta2 : 0.999
  • adam_epsilon : 1e-08
  • max_grad_norm : 1.0
  • num_train_epochs : 6
  • max_steps : -1
  • lr_scheduler_type : cosine
  • lr_scheduler_kwargs : {}
  • warmup_ratio : 0.025
  • warmup_steps : 0
  • log_level : passive
  • log_level_replica : warning
  • log_on_each_node : True
  • logging_nan_inf_filter : True
  • save_safetensors : True
  • save_on_each_node : False
  • save_only_model : False
  • restore_callback_states_from_checkpoint : False
  • no_cuda : False
  • use_cpu : False
  • use_mps_device : False
  • seed : 42
  • data_seed : None
  • jit_mode_eval : False
  • use_ipex : False
  • bf16 : False
  • fp16 : True
  • fp16_opt_level : O1
  • half_precision_backend : auto
  • bf16_full_eval : False
  • fp16_full_eval : False
  • tf32 : None
  • local_rank : 0
  • ddp_backend : None
  • tpu_num_cores : None
  • tpu_metrics_debug : False
  • debug : []
  • dataloader_drop_last : False
  • dataloader_num_workers : 0
  • dataloader_prefetch_factor : None
  • past_index : -1
  • disable_tqdm : False
  • remove_unused_columns : True
  • label_names : None
  • load_best_model_at_end : True
  • ignore_data_skip : False
  • fsdp : []
  • fsdp_min_num_params : 0
  • fsdp_config : {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap : None
  • accelerator_config : {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed : None
  • label_smoothing_factor : 0.0
  • optim : adamw_torch_fused
  • optim_args : None
  • adafactor : False
  • group_by_length : False
  • length_column_name : length
  • ddp_find_unused_parameters : None
  • ddp_bucket_cap_mb : None
  • ddp_broadcast_buffers : False
  • dataloader_pin_memory : True
  • dataloader_persistent_workers : False
  • skip_memory_metrics : True
  • use_legacy_prediction_loop : False
  • push_to_hub : False
  • resume_from_checkpoint : None
  • hub_model_id : None
  • hub_strategy : every_save
  • hub_private_repo : None
  • hub_always_push : False
  • hub_revision : None
  • gradient_checkpointing : False
  • gradient_checkpointing_kwargs : None
  • include_inputs_for_metrics : False
  • include_for_metrics : []
  • eval_do_concat_batches : True
  • fp16_backend : auto
  • push_to_hub_model_id : None
  • push_to_hub_organization : None
  • mp_parameters :
  • auto_find_batch_size : False
  • full_determinism : False
  • torchdynamo : None
  • ray_scope : last
  • ddp_timeout : 1800
  • torch_compile : False
  • torch_compile_backend : None
  • torch_compile_mode : None
  • include_tokens_per_second : False
  • include_num_input_tokens_seen : False
  • neftune_noise_alpha : None
  • optim_target_modules : None
  • batch_eval_metrics : False
  • eval_on_start : False
  • use_liger_kernel : False
  • liger_kernel_config : None
  • eval_use_gather_object : False
  • average_tokens_across_devices : False
  • prompts : None
  • batch_sampler : batch_sampler
  • multi_dataset_batch_sampler : proportional
  • router_mapping : {}
  • learning_rate_mapping : {}
Training Logs
| Epoch | Step | Training Loss | dot_ndcg@10 |
|---|---|---|---|
| 1.0 | 5209 | 30541.8683 | 0.6969 |
| 2.0 | 10418 | 13.3966 | 0.7167 |
| 3.0 | 15627 | 11.6531 | 0.7262 |
| 4.0 | 20836 | 9.9781 | 0.7280 |
| 5.0 | 26045 | 8.881 | 0.7289 |
| **6.0** | **31254** | **8.3454** | **0.7302** |
  • The bold row denotes the saved checkpoint.
Framework Versions
  • Python: 3.11.13
  • Sentence Transformers: 5.0.0
  • Transformers: 4.53.2
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.8.1
  • Datasets: 4.0.0
  • Tokenizers: 0.21.2
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
SpladeLoss
@misc{formal2022distillationhardnegativesampling,
      title={From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective},
      author={Thibault Formal and Carlos Lassance and Benjamin Piwowarski and Stéphane Clinchant},
      year={2022},
      eprint={2205.04733},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2205.04733},
}
SparseMarginMSELoss
@misc{hofstätter2021improving,
    title={Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation},
    author={Sebastian Hofstätter and Sophia Althammer and Michael Schröder and Mete Sertkan and Allan Hanbury},
    year={2021},
    eprint={2010.02666},
    archivePrefix={arXiv},
    primaryClass={cs.IR}
}
FlopsLoss
@article{paria2020minimizing,
    title={Minimizing flops to learn efficient sparse representations},
    author={Paria, Biswajit and Yeh, Chih-Kuan and Yen, Ian EH and Xu, Ning and Ravikumar, Pradeep and P{\'o}czos, Barnab{\'a}s},
    journal={arXiv preprint arXiv:2004.05665},
    year={2020}
}
