sCellTransformer (sCT) is a long-range foundation model designed for zero-shot
prediction tasks on single-cell RNA-seq and spatial transcriptomics data. It processes
raw gene expression profiles across multiple cells to predict discretized gene
expression levels for unseen cells without retraining. The model handles up to 20,000
protein-coding genes over a bag of 50 cells from the same sample (around one million
gene-expression tokens), which allows it to learn cross-cell relationships, capture
long-range dependencies in gene expression data, and mitigate the sparsity typical
of single-cell datasets.
sCT is trained on a large dataset of single-cell RNA-seq and finetuned on spatial
transcriptomics data. Evaluation tasks include zero-shot imputation of masked gene
expression, and zero-shot prediction of cell types.
Until its next release, the transformers library needs to be installed from source
in order to use the model. PyTorch should also be installed.
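The exact install command is not included in this copy of the card; a typical source install of transformers (shown here as an assumption, the original card or release notes may pin a specific commit) looks like:

```shell
# Install transformers from source (assumed form of the command referenced above)
pip install git+https://github.com/huggingface/transformers.git

# PyTorch is also required
pip install torch
```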
A more concrete example is provided in the example notebook on one of the downstream
evaluation datasets.
Training data
The model was trained following a two-step procedure:
pre-training on single-cell data, then fine-tuning on spatial transcriptomics data.
The single-cell data used for pre-training comes from the Cellxgene Census
collection datasets used to train the scGPT models. It consists of around 50 million
cells and approximately 60,000 genes. The spatial data comes from both the human
breast cell atlas and the human heart atlas.
Training procedure
As detailed in the paper, gene expression values are first binned into a pre-defined
number of bins. This helps the model learn the distribution of gene expression by
mitigating sparsity, reducing noise, and handling extreme values. The training
objective is then to predict the masked gene expression bins in a cell, following a
BERT-style masked-language-modeling setup.
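The binning and masking steps above can be sketched as follows. This is a minimal illustration, not the model's actual preprocessing: the bin edges, mask rate, and token ids here are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy expression matrix: 4 cells x 10 genes, sparse counts typical of scRNA-seq.
expr = rng.poisson(0.8, size=(4, 10)).astype(float)

# Bin non-zero expression values into a fixed number of quantile bins; zero
# counts keep their own bin 0, reflecting the sparsity of the data.
n_bins = 5
nonzero = expr[expr > 0]
edges = np.quantile(nonzero, np.linspace(0, 1, n_bins))
binned = np.where(expr > 0, np.digitize(expr, edges[1:], right=True) + 1, 0)

# BERT-style masking: replace a fraction of bin tokens with a MASK id; the
# training target is to recover the original bin at the masked positions.
MASK_ID = n_bins + 1
mask = rng.random(binned.shape) < 0.15
inputs = np.where(mask, MASK_ID, binned)
targets = np.where(mask, binned, -100)  # -100 = ignore index, as in PyTorch losses
```

Only the masked positions contribute to the loss; all other targets carry the ignore index, as in standard masked-language-model training.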
BibTeX entry and citation info
@misc{joshi2025a,
  title={A long range foundation model for zero-shot predictions in single-cell and spatial transcriptomics data},
  author={Ameya Joshi and Raphael Boige and Lee Zamparo and Ugo Tanielian and Juan Jose Garau-Luis and Michail Chatzianastasis and Priyanka Pandey and Janik Sielemann and Alexander Seifert and Martin Brand and Maren Lang and Karim Beguir and Thomas PIERROT},
  year={2025},
  url={https://openreview.net/forum?id=VdX9tL3VXH}
}