```python
# from transformers import AutoTokenizer
from hf_hub_ctranslate2 import EncoderCT2fromHfHub

model_name = "michaelfeil/ct2fast-LaBSE"
model_name_orig = "setu4993/LaBSE"

# load in int8 on CUDA
model = EncoderCT2fromHfHub(
    model_name_or_path=model_name,
    device="cuda",
    compute_type="int8_float16",
)
outputs = model.generate(
    text=["I like soccer", "I like tennis", "The eiffel tower is in Paris"],
    max_length=64,
)
# perform downstream tasks on outputs
outputs["pooler_output"]
outputs["last_hidden_state"]
outputs["attention_mask"]
```
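Besides `pooler_output`, sentence embeddings are commonly built by mean-pooling `last_hidden_state` with `attention_mask` so that padding tokens are excluded from the average. A minimal sketch of that pooling step, using random stand-in arrays shaped like the outputs above (the real values require running the CUDA model):

```python
import numpy as np

# Stand-in shapes for the outputs above; real LaBSE hidden size is 768.
rng = np.random.default_rng(0)
batch, seq_len, hidden = 3, 8, 768
last_hidden_state = rng.normal(size=(batch, seq_len, hidden))
attention_mask = np.ones((batch, seq_len))
attention_mask[:, 5:] = 0  # pretend the last tokens are padding

# Zero out padding positions, then average over real tokens only.
mask = attention_mask[:, :, None]
sentence_embeddings = (last_hidden_state * mask).sum(axis=1) / mask.sum(axis=1)

print(sentence_embeddings.shape)  # (3, 768)
```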
```python
# Alternative: use the SentenceTransformer mix-in
# for end-to-end sentence-embedding generation
# (not pulling from this CT2fast-HF repo).
from hf_hub_ctranslate2 import CT2SentenceTransformer

model = CT2SentenceTransformer(
    model_name_orig, compute_type="int8_float16", device="cuda"
)
embeddings = model.encode(
    ["I like soccer", "I like tennis", "The eiffel tower is in Paris"],
    batch_size=32,
    convert_to_numpy=True,
    normalize_embeddings=True,
)
print(embeddings.shape, embeddings)
scores = (embeddings @ embeddings.T) * 100

# Hint: you can also host this code as a REST API
# via github.com/michaelfeil/infinity
```
This is just a quantized version. Licence conditions are intended to be identical to those of the original Hugging Face repository.
## Original description: LaBSE

### Model description
Language-agnostic BERT Sentence Encoder (LaBSE) is a BERT-based model trained to produce sentence embeddings for 109 languages. The pre-training process combines masked language modeling with translation language modeling. The model is useful for getting multilingual sentence embeddings and for bi-text retrieval.
This is migrated from the v2 model on the TF Hub, which uses dict-based input. The embeddings produced by both versions of the model are equivalent.
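Bi-text retrieval with LaBSE reduces to nearest-neighbour search over the sentence embeddings: for each source sentence, the target-language sentence with the highest cosine similarity is taken as its translation. A toy sketch of that matching step, with hand-made 2-D vectors standing in for real 768-dimensional embeddings:

```python
import numpy as np

# Toy stand-ins for LaBSE embeddings of 3 English and 3 Italian sentences;
# real LaBSE embeddings would be 768-dimensional.
english = np.array([[1.0, 0.1], [0.1, 1.0], [0.7, 0.7]])
italian = np.array([[0.1, 1.1], [0.8, 0.8], [1.0, 0.0]])

def normalize(x):
    # L2-normalize rows so that a dot product equals cosine similarity.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# Cosine similarity matrix, then best Italian match per English sentence.
scores = normalize(english) @ normalize(italian).T
matches = scores.argmax(axis=1)
print(matches)  # → [2 0 1]
```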
### Usage

Using the model:
```python
import torch
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("setu4993/LaBSE")
model = BertModel.from_pretrained("setu4993/LaBSE")
model = model.eval()

english_sentences = [
    "dog",
    "Puppies are nice.",
    "I enjoy taking long walks along the beach with my dog.",
]
english_inputs = tokenizer(english_sentences, return_tensors="pt", padding=True)

with torch.no_grad():
    english_outputs = model(**english_inputs)
```
To get the sentence embeddings, use the pooler output:
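The code this sentence introduces appears to have been cut off. In the standard `transformers` API, the pooled sentence embeddings are `english_outputs.pooler_output`, and sentences are typically compared via cosine similarity after L2 normalization. A hedged sketch of those two steps, using random stand-in arrays since the real tensors require running the model above:

```python
import numpy as np

# Stand-in for `english_outputs.pooler_output` from the snippet above;
# real LaBSE pooler outputs have shape (batch_size, 768).
rng = np.random.default_rng(0)
english_embeddings = rng.normal(size=(3, 768))

# L2-normalize, then cosine similarity is a plain matrix product.
normalized = english_embeddings / np.linalg.norm(
    english_embeddings, axis=1, keepdims=True
)
similarity = normalized @ normalized.T

print(similarity.shape)  # (3, 3); each sentence matches itself on the diagonal
```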
Details about data, training, evaluation and performance metrics are available in the original paper.
### BibTeX entry and citation info

```bibtex
@misc{feng2020languageagnostic,
    title={Language-agnostic BERT Sentence Embedding},
    author={Fangxiaoyu Feng and Yinfei Yang and Daniel Cer and Naveen Arivazhagan and Wei Wang},
    year={2020},
    eprint={2007.01852},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```