General Text Embeddings (GTE) model.
The GTE models are trained by Alibaba DAMO Academy. They are mainly based on the BERT framework and currently come in three sizes: GTE-large, GTE-base, and GTE-small. The GTE models are trained on a large-scale corpus of relevance text pairs, covering a wide range of domains and scenarios. This enables the GTE models to be applied to various downstream text embedding tasks, including information retrieval, semantic textual similarity, text reranking, etc.
We compared the performance of the GTE models with other popular text embedding models on the MTEB benchmark. For more detailed comparison results, please refer to the MTEB leaderboard.
| Model Name | Model Size (GB) | Dimension | Sequence Length | Average (56) | Clustering (11) | Pair Classification (3) | Reranking (4) | Retrieval (15) | STS (10) | Summarization (1) | Classification (12) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| gte-large | 0.67 | 1024 | 512 | 63.13 | 46.84 | 85.00 | 59.13 | 52.22 | 83.35 | 31.66 | 73.33 |
| gte-base | 0.22 | 768 | 512 | 62.39 | 46.2 | 84.57 | 58.61 | 51.14 | 82.3 | 31.17 | 73.01 |
| e5-large-v2 | 1.34 | 1024 | 512 | 62.25 | 44.49 | 86.03 | 56.61 | 50.56 | 82.05 | 30.19 | 75.24 |
| e5-base-v2 | 0.44 | 768 | 512 | 61.5 | 43.80 | 85.73 | 55.91 | 50.29 | 81.05 | 30.28 | 73.84 |
| gte-small | 0.07 | 384 | 512 | 61.36 | 44.89 | 83.54 | 57.7 | 49.46 | 82.07 | 30.42 | 72.31 |
| text-embedding-ada-002 | - | 1536 | 8192 | 60.99 | 45.9 | 84.89 | 56.32 | 49.25 | 80.97 | 30.8 | 70.93 |
| e5-small-v2 | 0.13 | 384 | 512 | 59.93 | 39.92 | 84.67 | 54.32 | 49.04 | 80.39 | 31.16 | 72.94 |
| sentence-t5-xxl | 9.73 | 768 | 512 | 59.51 | 43.72 | 85.06 | 56.42 | 42.24 | 82.63 | 30.08 | 73.42 |
| all-mpnet-base-v2 | 0.44 | 768 | 514 | 57.78 | 43.69 | 83.04 | 59.36 | 43.81 | 80.28 | 27.49 | 65.07 |
| sgpt-bloom-7b1-msmarco | 28.27 | 4096 | 2048 | 57.59 | 38.93 | 81.9 | 55.65 | 48.22 | 77.74 | 33.6 | 66.19 |
| all-MiniLM-L12-v2 | 0.13 | 384 | 512 | 56.53 | 41.81 | 82.41 | 58.44 | 42.69 | 79.8 | 27.9 | 63.21 |
| all-MiniLM-L6-v2 | 0.09 | 384 | 512 | 56.26 | 42.35 | 82.37 | 58.04 | 41.95 | 78.9 | 30.81 | 63.05 |
| contriever-base-msmarco | 0.44 | 768 | 512 | 56.00 | 41.1 | 82.54 | 53.14 | 41.88 | 76.51 | 30.36 | 66.68 |
| sentence-t5-base | 0.22 | 768 | 512 | 55.27 | 40.21 | 85.18 | 53.09 | 33.63 | 81.14 | 31.39 | 69.81 |
Code example
```python
import torch.nn.functional as F
from torch import Tensor
from transformers import AutoTokenizer, AutoModel

def average_pool(last_hidden_states: Tensor,
                 attention_mask: Tensor) -> Tensor:
    # Mean-pool the token embeddings, ignoring padding positions.
    last_hidden = last_hidden_states.masked_fill(~attention_mask[..., None].bool(), 0.0)
    return last_hidden.sum(dim=1) / attention_mask.sum(dim=1)[..., None]

input_texts = [
    "what is the capital of China?",
    "how to implement quick sort in python?",
    "Beijing",
    "sorting algorithms"
]

tokenizer = AutoTokenizer.from_pretrained("thenlper/gte-large")
model = AutoModel.from_pretrained("thenlper/gte-large")

# Tokenize the input texts
batch_dict = tokenizer(input_texts, max_length=512, padding=True, truncation=True, return_tensors='pt')

outputs = model(**batch_dict)
embeddings = average_pool(outputs.last_hidden_state, batch_dict['attention_mask'])

# (Optionally) normalize embeddings
embeddings = F.normalize(embeddings, p=2, dim=1)

# Similarity of the first text (the query) against the remaining texts
scores = (embeddings[:1] @ embeddings[1:].T) * 100
print(scores.tolist())
```
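Here the first entry in input_texts serves as the query and the remaining three as candidates, so scores holds one relevance score per candidate; the semantically matching text ("Beijing" for the capital-of-China query) should receive the highest score.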
Use with sentence-transformers:
```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

sentences = ['That is a happy person', 'That is a very happy person']

model = SentenceTransformer('thenlper/gte-large')
embeddings = model.encode(sentences)
print(cos_sim(embeddings[0], embeddings[1]))
```
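The same sentence-transformers wrapper also covers the information-retrieval use case mentioned above. Below is a minimal sketch (the corpus sentences, query, and top_k value are illustrative, not from the model card) that ranks a small corpus against a query with sentence_transformers.util.semantic_search:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('thenlper/gte-large')

# Illustrative mini-corpus; in practice this would be your document collection.
corpus = [
    "Beijing is the capital of China.",
    "Quick sort is a divide-and-conquer sorting algorithm.",
    "The Great Wall is located in northern China."
]
query = "what is the capital of China?"

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Retrieve the top-2 corpus entries by cosine similarity.
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(corpus[hit['corpus_id']], hit['score'])
```

semantic_search scores the query embedding against every corpus embedding and returns the best matches per query, each as a dict with a corpus_id and a score.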
This model supports English texts only, and any input longer than 512 tokens will be truncated.
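Because of that 512-token limit, longer documents have to be split before embedding. The sketch below shows one common workaround, not something prescribed by the model card: chunk the text by token count and average the chunk embeddings (the chunk size, the mean pooling, and the embed_long_text helper are all illustrative assumptions).

```python
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer('thenlper/gte-large')

def embed_long_text(text: str, chunk_tokens: int = 500) -> np.ndarray:
    # Hypothetical helper, not part of the model card.
    # 500 tokens per chunk leaves headroom for special tokens within the 512 limit.
    tokenizer = model.tokenizer  # assumes the underlying HF tokenizer is exposed here
    ids = tokenizer.encode(text, add_special_tokens=False)
    chunks = [tokenizer.decode(ids[i:i + chunk_tokens])
              for i in range(0, len(ids), chunk_tokens)]
    # Embed each chunk and mean-pool; other aggregation schemes are possible.
    chunk_embeddings = model.encode(chunks, normalize_embeddings=True)
    return chunk_embeddings.mean(axis=0)
```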
