SigLIP 2 extends the pretraining objective of SigLIP with prior, independently developed techniques, combined into a unified recipe, for improved semantic understanding, localization, and dense features.
Intended uses
You can use the raw model for tasks like zero-shot image classification and
image-text retrieval, or as a vision encoder for VLMs (and other vision tasks).
Here is how to use this model to perform zero-shot image classification:
from transformers import pipeline
from transformers.image_utils import load_image

# load pipeline
ckpt = "google/siglip2-base-patch16-224"
image_classifier = pipeline(model=ckpt, task="zero-shot-image-classification")

# load image and candidate labels
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = load_image(url)
candidate_labels = ["2 cats", "a plane", "a remote"]

# run inference
outputs = image_classifier(image, candidate_labels=candidate_labels)
print(outputs)
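Under the hood, the pipeline scores each image-label pair independently with a sigmoid rather than a softmax over the labels. Here is a rough, hand-rolled sketch of the same computation using the model directly (the raw labels are passed as text prompts here; the pipeline's exact prompt templating may differ):

import torch
from transformers import AutoModel, AutoProcessor
from transformers.image_utils import load_image

# load the model and processor
ckpt = "google/siglip2-base-patch16-224"
model = AutoModel.from_pretrained(ckpt).eval()
processor = AutoProcessor.from_pretrained(ckpt)

# load the image and candidate labels
image = load_image("http://images.cocodataset.org/val2017/000000039769.jpg")
texts = ["2 cats", "a plane", "a remote"]

# SigLIP-family processors expect fixed-length padding
inputs = processor(text=texts, images=image, padding="max_length", return_tensors="pt")

# run inference
with torch.no_grad():
    outputs = model(**inputs)

# one independent probability per label (sigmoid, not softmax)
probs = torch.sigmoid(outputs.logits_per_image)
print(probs)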
You can encode an image using the vision tower like so:
import torch
from transformers import AutoModel, AutoProcessor
from transformers.image_utils import load_image
# load the model and processor
ckpt = "google/siglip2-base-patch16-224"
model = AutoModel.from_pretrained(ckpt, device_map="auto").eval()
processor = AutoProcessor.from_pretrained(ckpt)
# load the image
image = load_image("https://huggingface.co/datasets/merve/coco/resolve/main/val2017/000000000285.jpg")
inputs = processor(images=[image], return_tensors="pt").to(model.device)
# run inference
with torch.no_grad():
    image_embeddings = model.get_image_features(**inputs)
print(image_embeddings.shape)
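The same model also exposes a text tower, so you can sketch image-text retrieval by encoding candidate captions and ranking them by cosine similarity against the image embedding computed above. The captions and the ranking loop below are illustrative, not part of the original card:

# encode candidate captions with the text tower
texts = ["a photo of a bear", "a photo of a cat", "a photo of a city street"]
text_inputs = processor(text=texts, padding="max_length", return_tensors="pt").to(model.device)
with torch.no_grad():
    text_embeddings = model.get_text_features(**text_inputs)

# normalize, then rank captions by cosine similarity to the image
image_emb = image_embeddings / image_embeddings.norm(dim=-1, keepdim=True)
text_emb = text_embeddings / text_embeddings.norm(dim=-1, keepdim=True)
scores = (image_emb @ text_emb.T).squeeze(0)
for text, score in sorted(zip(texts, scores.tolist()), key=lambda x: -x[1]):
    print(f"{score:.3f}  {text}")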
Training procedure
The model was trained on up to 2048 TPU-v5e chips.
Evaluation results
For evaluation results of SigLIP 2, refer to the tables and figures in the paper.
BibTeX entry and citation info
@misc{tschannen2025siglip2multilingualvisionlanguage,
  title={SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features},
  author={Michael Tschannen and Alexey Gritsenko and Xiao Wang and Muhammad Ferjad Naeem and Ibrahim Alabdulmohsin and Nikhil Parthasarathy and Talfan Evans and Lucas Beyer and Ye Xia and Basil Mustafa and Olivier Hénaff and Jeremiah Harmsen and Andreas Steiner and Xiaohua Zhai},
  year={2025},
  eprint={2502.14786},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2502.14786},
}