VMware / vbert-2021-large

huggingface.co
Total runs: 16
24-hour runs: 0
7-day runs: 6
30-day runs: 10
Model's Last Updated: June 28 2023
fill-mask

Model Details of vbert-2021-large

vBERT-2021-LARGE

Model Info:
  • Authors: R&D AI Lab, VMware Inc.
  • Model date: April 2022
  • Model version: 2021-large
  • Model type: Pretrained language model
  • License: Apache 2.0
Motivation

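Traditional BERT models struggle with VMware-specific words (Tanzu, vSphere, etc.), technical terms, and compound words, because WordPiece tokenization breaks words missing from the vocabulary into smaller, less meaningful pieces. (See: Weaknesses of WordPiece Tokenization.)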
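To make the tokenization issue concrete, here is a minimal sketch of greedy longest-match-first WordPiece splitting. The two vocabularies are small hypothetical examples, not BERT's real vocabulary, and the function is a simplified illustration rather than the tokenizer shipped with the model.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first WordPiece split of a single lowercased word."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first and shrink until a vocab hit.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a ## prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk_token]  # no piece fits, so the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabularies (assumption, for illustration only).
generic_vocab = {"v", "##sp", "##here", "tan", "##zu", "cloud"}
domain_vocab = generic_vocab | {"vsphere", "tanzu"}

print(wordpiece_tokenize("vsphere", generic_vocab))  # ['v', '##sp', '##here']
print(wordpiece_tokenize("vsphere", domain_vocab))   # ['vsphere']
```

With the generic vocabulary the product name is split into three fragments; once the term is in the vocabulary it stays a single token.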

We pretrained our vBERT model to address the aforementioned issues using our BERT Pretraining Library.
We replaced the first 1k unused tokens of BERT's vocabulary with VMware-specific terms to create a modified vocabulary. We then pretrained the 'bert-large-uncased' model for an additional 66k steps on VMware domain data: 60k steps at a maximum sequence length (MSL) of 128 and 6k steps at an MSL of 512.
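The vocabulary modification can be sketched roughly as follows. This is an illustrative assumption about the procedure, not the code used to build vBERT: the helper name `replace_unused_tokens` and the toy vocabulary are hypothetical. Overwriting `[unusedN]` placeholder slots keeps the vocabulary size and every existing token id unchanged, so the pretrained embedding matrix can be reused as-is.

```python
def replace_unused_tokens(vocab, new_terms, max_replacements=1000):
    """Return a copy of vocab with the first [unusedN] slots overwritten by new_terms.

    vocab is an ordered list of tokens, where a token's index is its id.
    """
    existing = set(vocab)
    seen = set()
    pending = []
    # Skip terms already in the vocabulary and drop duplicates, keeping order.
    for term in new_terms:
        if term not in existing and term not in seen:
            seen.add(term)
            pending.append(term)
    pending = pending[:max_replacements]

    updated = list(vocab)
    remaining = iter(pending)
    for index, token in enumerate(updated):
        if token.startswith("[unused") and token.endswith("]"):
            term = next(remaining, None)
            if term is None:
                break  # all new terms have been placed
            updated[index] = term
    return updated


# Hypothetical toy vocabulary (assumption, for illustration only).
toy_vocab = ["[PAD]", "[unused0]", "[unused1]", "[unused2]", "[UNK]", "the", "cloud"]
new_vocab = replace_unused_tokens(toy_vocab, ["vsphere", "tanzu", "cloud"])
print(new_vocab)
# ['[PAD]', 'vsphere', 'tanzu', '[unused2]', '[UNK]', 'the', 'cloud']
```

Because "cloud" is already present it is skipped, and the leftover placeholder stays available for later additions.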

Intended Use

The model functions as a VMware-specific Language Model.

How to Use

Here is how to use this model to get the features of a given text in PyTorch:

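from transformers import BertTokenizer, BertModel

# Load the tokenizer and encoder weights from the Hugging Face Hub.
tokenizer = BertTokenizer.from_pretrained("VMware/vbert-2021-large")
model = BertModel.from_pretrained("VMware/vbert-2021-large")

text = "Replace me by any text you'd like."
# Tokenize into PyTorch tensors and run a forward pass.
encoded_input = tokenizer(text, return_tensors="pt")
output = model(**encoded_input)
# output.last_hidden_state holds one feature vector per input token.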

and in TensorFlow:

from transformers import BertTokenizer, TFBertModel

# Load the tokenizer and the TensorFlow version of the encoder.
tokenizer = BertTokenizer.from_pretrained("VMware/vbert-2021-large")
model = TFBertModel.from_pretrained("VMware/vbert-2021-large")

text = "Replace me by any text you'd like."
# Tokenize into TensorFlow tensors and run a forward pass.
encoded_input = tokenizer(text, return_tensors="tf")
output = model(encoded_input)

Training
- Datasets

The pretraining corpus was built from publicly available VMware text data, such as VMware Docs and blogs, sourced in May 2021 (~320,000 documents).

- Preprocessing
  • Decoding HTML
  • Decoding Unicode
  • Stripping repeated characters
  • Splitting compound words
  • Spelling correction
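
The preprocessor itself is not public, so the following is only a rough sketch of what the first three steps might look like using the Python standard library. The function name `basic_clean`, the choice of NFKC normalization for "decoding Unicode", and the repeat threshold are all assumptions. Compound-word splitting and spelling correction are left out because their rules are not described here.

```python
import html
import re
import unicodedata


def basic_clean(text, max_repeat=2):
    """Approximate HTML decoding, Unicode normalization, and repeat stripping."""
    # Decode HTML entities, e.g. "&amp;" -> "&".
    text = html.unescape(text)
    # Fold compatibility characters (full-width letters, non-breaking spaces).
    # Assumption: NFKC stands in for the "decoding Unicode" step.
    text = unicodedata.normalize("NFKC", text)
    # Collapse runs of a repeated non-digit, non-space character to max_repeat copies.
    pattern = r"([^\d\s])\1{%d,}" % max_repeat
    text = re.sub(pattern, lambda m: m.group(1) * max_repeat, text)
    # Normalize whitespace.
    return re.sub(r"\s+", " ", text).strip()


print(basic_clean("vSphere &amp; Tanzu 1000 are greaaaat!!!!"))
# vSphere & Tanzu 1000 are greaat!!
```

Digits are excluded from the repeat rule so that numbers such as 1000 survive unchanged; a real pipeline would likely need further exceptions, for example for ellipses.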
- Model performance measures

We benchmarked vBERT on various VMware-specific downstream NLP tasks (information retrieval, classification, etc.). The model scored higher than the 'bert-base-uncased' model on all benchmarks.

Limitations and bias

Because the model is further pretrained from BERT, it may carry the same biases as the original BERT model.

For best performance, input data should be preprocessed with our internal vNLP Preprocessor, which is not available to the public.

More Information

License details (Apache 2.0):

https://choosealicense.com/licenses/apache-2.0

Model page on huggingface.co:

https://huggingface.co/VMware/vbert-2021-large
