A pretrained bidirectional encoder for the Russian language.
The model was trained with the standard masked language modeling (MLM) objective on large text corpora, including open social data.
See the Training Details section for more information.
⚠️ This model contains only the encoder, without any pretrained head.
Languages:
Mostly Russian, with a small fraction of other languages
License:
Apache 2.0
How to Get Started with the Model
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("deepvk/deberta-v1-base")
model = AutoModel.from_pretrained("deepvk/deberta-v1-base")

text = "Привет, мир!"  # "Hello, world!"
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)  # outputs.last_hidden_state: (1, seq_len, 768)
```
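Since the checkpoint ships without a pretrained head, downstream use typically pools the token embeddings into a single vector. A minimal sketch continuing from the snippet above; mean pooling over non-padding tokens is a common convention, not something the card prescribes:

```python
import torch

with torch.no_grad():
    outputs = model(**inputs)

# Average the token embeddings, masking out padding positions.
mask = inputs["attention_mask"].unsqueeze(-1)           # (1, seq_len, 1)
summed = (outputs.last_hidden_state * mask).sum(dim=1)  # (1, 768)
embedding = summed / mask.sum(dim=1)                    # mean-pooled sentence vector
```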
Training Details
Training Data
400 GB of filtered and deduplicated texts in total.
A mix of the following data: Wikipedia, Books, Twitter comments, Pikabu, Proza.ru, Film subtitles, News websites, and Social corpus.
Deduplication procedure
1. Compute shingles of size 5 for every sample (text).
2. Compute MinHash with 100 seeds, so every sample gets a signature of 100 hash values.
3. Split each signature into 10 bands; each band of 100 / 10 = 10 values is hashed into a single value, giving 10 band hashes per sample.
4. Within each band, find candidate duplicates: collect samples that share a band hash, then compute pairwise Jaccard similarity; if the similarity is above 0.7, the pair is a duplicate.
5. Gather the duplicates from all bands and filter them out.
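A minimal, self-contained sketch of this MinHash-LSH pipeline. The card does not publish the actual implementation, so the details here are assumptions: character-level shingles (word shingles would work the same way), MD5-based seeded hashing, and exact Jaccard similarity computed on the original shingle sets.

```python
import hashlib
from itertools import combinations

NUM_SEEDS = 100      # MinHash signature length
NUM_BANDS = 10       # LSH bands
ROWS_PER_BAND = NUM_SEEDS // NUM_BANDS  # 100 / 10 = 10 values per band
THRESHOLD = 0.7      # Jaccard similarity above which a pair is a duplicate


def shingles(text: str, k: int = 5) -> set:
    """Character shingles of size k (assumption: the card does not specify the unit)."""
    return {text[i:i + k] for i in range(max(len(text) - k + 1, 1))}


def seeded_hash(shingle: str, seed: int) -> int:
    """Deterministic 64-bit hash of a shingle under a given seed."""
    digest = hashlib.md5(f"{seed}:{shingle}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash(shingle_set: set) -> list:
    """Signature of NUM_SEEDS values: the minimum hash per seed."""
    return [min(seeded_hash(s, seed) for s in shingle_set)
            for seed in range(NUM_SEEDS)]


def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b)


def find_duplicates(texts):
    shingle_sets = [shingles(t) for t in texts]
    signatures = [minhash(s) for s in shingle_sets]

    # Band each signature into 10 buckets of 10 values each.
    buckets = {}
    for idx, sig in enumerate(signatures):
        for band in range(NUM_BANDS):
            chunk = tuple(sig[band * ROWS_PER_BAND:(band + 1) * ROWS_PER_BAND])
            buckets.setdefault((band, hash(chunk)), []).append(idx)

    # Candidates share a bucket; confirm with exact Jaccard similarity.
    duplicates = set()
    for members in buckets.values():
        for i, j in combinations(members, 2):
            if jaccard(shingle_sets[i], shingle_sets[j]) > THRESHOLD:
                duplicates.add((i, j))
    return duplicates
```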
Training Hyperparameters
| Argument           | Value               |
|--------------------|---------------------|
| Training regime    | fp16 mixed precision |
| Optimizer          | AdamW               |
| Adam betas         | 0.9, 0.98           |
| Adam eps           | 1e-6                |
| Weight decay       | 1e-2                |
| Batch size         | 2,240               |
| Num training steps | 1M                  |
| Num warm-up steps  | 10k                 |
| LR scheduler       | Linear              |
| LR                 | 2e-5                |
| Gradient norm      | 1.0                 |
The model was trained for approximately 30 days on a machine with 8×A100 GPUs.
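These hyperparameters map directly onto a standard PyTorch/transformers setup. A sketch of the correspondence; this is not the authors' training code, and the use of `get_linear_schedule_with_warmup` is an assumption consistent with the "Linear" scheduler entry:

```python
import torch
from transformers import get_linear_schedule_with_warmup

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=2e-5,
    betas=(0.9, 0.98),
    eps=1e-6,
    weight_decay=1e-2,
)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=10_000,
    num_training_steps=1_000_000,
)

# Inside the training loop, gradients are clipped to the reported norm:
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
```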
Architecture details
| Argument                | Value         |
|-------------------------|---------------|
| Encoder layers          | 12            |
| Encoder attention heads | 12            |
| Encoder embed dim       | 768           |
| Encoder FFN embed dim   | 3,072         |
| Activation function     | GeLU          |
| Attention dropout       | 0.1           |
| Dropout                 | 0.1           |
| Max positions           | 512           |
| Vocab size              | 50,266        |
| Tokenizer type          | Byte-level BPE |
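For reference, these values correspond roughly to a transformers `DebertaConfig` as sketched below; the `config.json` shipped with the checkpoint is authoritative, and this mapping is only illustrative:

```python
from transformers import DebertaConfig

config = DebertaConfig(
    num_hidden_layers=12,           # Encoder layers
    num_attention_heads=12,         # Encoder attention heads
    hidden_size=768,                # Encoder embed dim
    intermediate_size=3072,         # Encoder FFN embed dim
    hidden_act="gelu",              # Activation function
    attention_probs_dropout_prob=0.1,
    hidden_dropout_prob=0.1,
    max_position_embeddings=512,
    vocab_size=50266,
)
```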
Evaluation
We evaluated the model on the Russian SuperGLUE dev set.
The best result in each task is marked in bold.
All models have the same size except the distilled version of DeBERTa.