Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders
📰 News
2025.03: PenguinVL-Encoder is now available for general use.
2025.03: Released PenguinVL-2B and PenguinVL-8B.
🌟 Model Overview
PenguinVL is a compact vision-language model (VLM) designed to explore the efficiency limits of small-scale VLMs.
Unlike most existing VLMs that rely on contrastive-pretrained vision encoders (e.g., CLIP/SigLIP), Penguin-VL initializes its vision encoder directly from a text-only LLM. This design avoids the objective mismatch between contrastive learning and autoregressive language modeling, enabling tighter alignment between visual representations and the language backbone.
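To make the "objective mismatch" concrete, the toy sketch below computes the two training signals side by side in plain Python. The model card does not spell out either loss; these are the standard textbook forms (CLIP-style symmetric InfoNCE and next-token cross-entropy), used here only for illustration.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over a list of floats.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def contrastive_loss(sim):
    # CLIP-style symmetric InfoNCE: sim[i][j] is the similarity between the
    # pooled embedding of image i and caption j; matching pairs are on the diagonal.
    n = len(sim)
    img_to_txt = -sum(log_softmax(sim[i])[i] for i in range(n)) / n
    cols = [[sim[i][j] for i in range(n)] for j in range(n)]
    txt_to_img = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return (img_to_txt + txt_to_img) / 2

def next_token_loss(step_logits, targets):
    # Autoregressive LM loss: mean negative log-likelihood of each target
    # token under the logits predicted at that step.
    nll = [-log_softmax(logits)[t] for logits, t in zip(step_logits, targets)]
    return sum(nll) / len(nll)
```

The contrastive loss sees one similarity score per image-caption pair, so it only shapes a pooled embedding that separates pairs within a batch; the next-token loss scores every position, which is the signal the language backbone itself is trained on.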
Key Characteristics
🧠 LLM-based Vision Encoder
The vision encoder is adapted from a pretrained text LLM (Qwen3-0.6B), modified with bidirectional attention and 2D-RoPE for spatial modeling.
This provides strong semantic priors and native compatibility with the downstream LLM.
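The model card names two changes to the Qwen3-0.6B backbone but not their exact form. The plain-Python sketch below illustrates both under common conventions: a causal LM masks future positions, whereas a bidirectional encoder lets every patch attend to every other patch; and 2D-RoPE rotates query/key channels by a patch's row and column instead of a single 1D token index. Splitting the channels in half between row and column is an assumption (one widely used variant), not confirmed Penguin-VL code.

```python
import math

def attention_mask(n, causal):
    # mask[i][j] is True when query i may attend to key j.
    # A causal LM only looks backwards; a bidirectional encoder sees all positions.
    return [[(j <= i) if causal else True for j in range(n)] for i in range(n)]

def rope_1d(vec, pos, base=10000.0):
    # Standard rotary embedding: rotate channel pairs (x0, x1), (x2, x3), ...
    # by angle pos * base**(-2i/d), where 2i is the pair's first index.
    d = len(vec)
    out = []
    for k in range(0, d, 2):
        theta = pos * base ** (-k / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[k], vec[k + 1]
        out += [x * c - y * s, x * s + y * c]
    return out

def rope_2d(vec, row, col, base=10000.0):
    # Assumed 2D-RoPE variant: the first half of the channels encodes the
    # patch row, the second half encodes the patch column.
    assert len(vec) % 4 == 0, "each half needs an even number of channels"
    half = len(vec) // 2
    return rope_1d(vec[:half], row, base) + rope_1d(vec[half:], col, base)
```

Because the same rotation scheme is applied to queries and keys, the dot product between two rotated patches depends only on their row and column offsets, which is the spatial bias 2D-RoPE is meant to add on top of the unmasked attention.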
🧪 Quick Start — Transformers Inference
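import torch
from transformers import AutoModel, AutoImageProcessor
from transformers.image_utils import load_image
model_name = "tencent/Penguin-Encoder"
image_path = "your_img.jpg"  # local path or URL of the input image
images = load_image(image_path)
# trust_remote_code=True loads the encoder's custom modeling code from the Hub;
# flash_attention_2 requires the flash-attn package and a supported CUDA GPU.
model = AutoModel.from_pretrained(
    model_name,
    trust_remote_code=True,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)
processor = AutoImageProcessor.from_pretrained(model_name, trust_remote_code=True)
inputs = processor(images=images, merge_size=1)
# Convert every processor output to a tensor on the GPU
# (as_tensor also accepts values that are already tensors).
inputs = {k: torch.as_tensor(v).cuda() for k, v in inputs.items()}
# Match the pixel dtype to the bfloat16 model weights.
if "pixel_values" in inputs:
    inputs["pixel_values"] = inputs["pixel_values"].to(torch.bfloat16)
# Inference only: skip gradient tracking.
with torch.no_grad():
    image_features = model(**inputs)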
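The `merge_size` argument above is handled by the model's custom processor, whose behavior the model card does not document. Assuming it works like Qwen2-VL-style image processors, where each `merge_size` x `merge_size` block of patches becomes one visual token, the hypothetical helper below estimates how many tokens an image yields. The 14-pixel patch size and the resizing rule are assumptions; check the processor config for the real values.

```python
def visual_token_count(height, width, patch_size=14, merge_size=1):
    # Hypothetical helper (not part of the Penguin-Encoder API), assuming a
    # Qwen2-VL-style processor: each side is resized to a multiple of
    # patch_size * merge_size, the image is cut into patch_size x patch_size
    # patches, and every merge_size x merge_size block of patches is merged
    # into one token. Min/max pixel limits are ignored.
    factor = patch_size * merge_size
    h = max(factor, round(height / factor) * factor)
    w = max(factor, round(width / factor) * factor)
    patches = (h // patch_size) * (w // patch_size)
    return patches // (merge_size ** 2)

print(visual_token_count(448, 448))                # merge_size=1: one token per patch
print(visual_token_count(448, 448, merge_size=2))  # 2x2 merge: a quarter as many
```

Under these assumptions, `merge_size=1` keeps one token per patch, so the encoder returns the densest feature map, while larger values trade spatial detail for a shorter visual sequence.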