google-bert/bert-base-german-cased (German BERT) on huggingface.co

Model details of bert-base-german-cased

German BERT

Overview

Language model: bert-base-cased
Language: German
Training data: Wiki, OpenLegalData, News (~12 GB)
Eval data: CoNLL03 (NER), GermEval14 (NER), GermEval18 (Classification), GNAD (Classification)
Infrastructure: 1x TPU v2
Published: Jun 14th, 2019

Update April 3rd, 2020: we updated the vocabulary file on deepset's s3 to conform with the default tokenization of punctuation tokens. For details, see the related FARM issue. If you want to use the old vocab, we have also uploaded a "deepset/bert-base-german-cased-oldvocab" model.
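The card itself does not show how to load either vocabulary variant. The following is a minimal sketch, assuming the Hugging Face transformers library is installed and the Hub is reachable. The first model ID is taken from this page's URL, the second from the update note above; the helper names load_fill_mask and top_predictions are hypothetical.

```python
# Hub IDs: the first comes from this page's URL, the second from the update note.
MODEL_ID = "google-bert/bert-base-german-cased"
OLD_VOCAB_MODEL_ID = "deepset/bert-base-german-cased-oldvocab"


def load_fill_mask(model_id: str = MODEL_ID):
    """Build a fill-mask pipeline; weights are downloaded on first use."""
    # Imported lazily so this module can be imported without transformers.
    from transformers import pipeline

    return pipeline("fill-mask", model=model_id)


def top_predictions(text: str, model_id: str = MODEL_ID, k: int = 3):
    """Return (token, score) pairs for the single [MASK] token in `text`."""
    fill_mask = load_fill_mask(model_id)
    return [(p["token_str"], p["score"]) for p in fill_mask(text, top_k=k)]
```

For example, top_predictions("Berlin ist die [MASK] von Deutschland.") would return the three most likely fillers; passing model_id=OLD_VOCAB_MODEL_ID selects the variant with the old vocabulary instead.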

Details

We trained using Google's TensorFlow code on a single cloud TPU v2 with standard settings.
We trained 810k steps with a batch size of 1024 for sequence length 128 and 30k steps with sequence length 512. Training took about 9 days.
As training data we used the latest German Wikipedia dump (6 GB of raw text files), the OpenLegalData dump (2.4 GB) and news articles (3.6 GB).
We cleaned the data dumps with tailored scripts and segmented sentences with spaCy v2.1. To create TensorFlow records, we used the recommended SentencePiece library to build the WordPiece vocabulary and TensorFlow scripts to convert the text into data usable by BERT.
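The figures above can be cross-checked with a little arithmetic. This sketch sums the corpus sizes and estimates the training budget. It assumes the 512-token phase reused the batch size of 1024, which the text only states explicitly for the 128-token phase; the function names are illustrative.

```python
# Figures quoted in the Details section.
CORPUS_GB = {"Wikipedia": 6.0, "OpenLegalData": 2.4, "News": 3.6}
BATCH_SIZE = 1024

# (steps, max_seq_len) per phase. Assumption: both phases use BATCH_SIZE.
PHASES = [(810_000, 128), (30_000, 512)]


def total_corpus_gb(sizes):
    """Sum the raw corpus sizes in GB."""
    return round(sum(sizes.values()), 1)


def total_steps(phases):
    """Total number of optimizer steps across all phases."""
    return sum(steps for steps, _ in phases)


def sequences_seen(phases, batch_size):
    """Number of training sequences processed across all phases."""
    return sum(steps * batch_size for steps, _ in phases)


def token_positions(phases, batch_size):
    """Upper bound on token positions processed (padding included)."""
    return sum(steps * batch_size * seq_len for steps, seq_len in phases)


print(total_corpus_gb(CORPUS_GB))           # 12.0
print(total_steps(PHASES))                  # 840000
print(sequences_seen(PHASES, BATCH_SIZE))   # 860160000
print(token_positions(PHASES, BATCH_SIZE))  # 121896960000
```

The corpus total agrees with the "~12 GB" in the overview, and the 840k total steps match the last checkpoint mentioned in the performance section.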

See https://deepset.ai/german-bert for more details

Hyperparameters

batch_size = 1024
n_steps = 810_000
max_seq_len = 128 (and 512 later)
learning_rate = 1e-4
lr_schedule = LinearWarmup
num_warmup_steps = 10_000
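To make the schedule concrete, here is a small sketch of the learning rate as a function of the training step. The warmup part follows the hyperparameters above. The linear decay to zero after warmup is an assumption modeled on the reference BERT optimizer, since the card only names "LinearWarmup"; learning_rate_at is a hypothetical helper.

```python
LEARNING_RATE = 1e-4
NUM_WARMUP_STEPS = 10_000
N_STEPS = 810_000


def learning_rate_at(step, base_lr=LEARNING_RATE,
                     warmup=NUM_WARMUP_STEPS, total=N_STEPS):
    """Learning rate at a given optimizer step.

    During warmup the rate rises linearly from 0 to base_lr.
    Assumption: afterwards it follows a linear decay that reaches 0 at
    `total` steps, as in the reference BERT implementation.
    """
    if step < warmup:
        return base_lr * step / warmup
    step = min(step, total)
    return base_lr * (1.0 - step / total)


for s in (0, 5_000, 10_000, 405_000, 810_000):
    print(s, learning_rate_at(s))
```

Under this assumption the peak rate right after warmup sits slightly below 1e-4, because the decay is measured from step 0 rather than from the end of warmup.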

Performance

During training we monitored the loss and evaluated different model checkpoints on the following German datasets:

GermEval18 Fine: macro F1 score for multiclass sentiment classification
GermEval18 Coarse: macro F1 score for binary sentiment classification
GermEval14: seq F1 score for NER (file names deuutf.*)
CoNLL03: seq F1 score for NER
10kGNAD: accuracy for document classification
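For readers unfamiliar with the classification metrics listed above, the following sketch computes accuracy and macro F1 from scratch. It is not the evaluation code used for these results; the labels "A", "B" and "C" are placeholders, and the sequence-level F1 used for NER is not covered here.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # gold class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Placeholder labels, not taken from any of the datasets above.
gold = ["A", "A", "B", "C"]
pred = ["A", "B", "B", "C"]
print(accuracy(gold, pred))            # 0.75
print(round(macro_f1(gold, pred), 4))  # 0.7778
```

Macro averaging weights every class equally, so rare classes influence the score as much as frequent ones.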

Even without thorough hyperparameter tuning, we observed quite stable learning, especially for our German model. Multiple restarts with different seeds produced quite similar results.

We also evaluated checkpoints from different points during the 9 days of pre-training and were astonished at how fast the model converges to its maximum reachable performance. We ran all 5 downstream tasks on 7 model checkpoints taken between 0 and 840k training steps (the x-axis of the original figure). Most checkpoints come from early training, where we expected the largest performance changes. Surprisingly, even a randomly initialized BERT can be trained on labeled downstream data alone and reach good performance (the blue line in that figure: GermEval 2018 Coarse task, 795 kB training set).

Authors

Branden Chan: branden.chan [at] deepset.ai
Timo Möller: timo.moeller [at] deepset.ai
Malte Pietsch: malte.pietsch [at] deepset.ai
Tanay Soni: tanay.soni [at] deepset.ai

About us

We bring NLP to the industry via open source!
Our focus: Industry specific language models & large scale QA systems.

Some of our work:

German BERT (aka "bert-base-german-cased")
FARM
Haystack

Get in touch: Twitter | LinkedIn | Website

Runs of google-bert/bert-base-german-cased on huggingface.co

Total runs: 479.2K
24-hour runs: 2.5K
3-day runs: 1.1K
7-day runs: 8.0K
30-day runs: 3.0K

More information about bert-base-german-cased

License: MIT (https://choosealicense.com/licenses/mit)

bert-base-german-cased on huggingface.co

The bert-base-german-cased model is hosted on huggingface.co under the google-bert organization. It can be tried online for free from the model page, is also offered for paid use, and can be called through an API from Node.js, Python, or plain HTTP.

The model is open source and can also be found on GitHub, so any user can install it for free. Alternatively, it can be tried and debugged directly on huggingface.co without a local installation.

Model page: https://huggingface.co/google-bert/bert-base-german-cased

Other models from google-bert on huggingface.co

google-bert/bert-base-uncased: 59.7M total runs, run growth -13.9M, growth rate -23.90%, updated February 19 2024
google-bert/bert-base-cased: 4.3M total runs, run growth -552.9K, growth rate -13.18%, updated February 19 2024
google-bert/bert-base-multilingual-uncased: 4.2M total runs, run growth -798.8K, growth rate -19.72%, updated February 19 2024
google-bert/bert-base-multilingual-cased: 3.2M total runs, run growth -895.2K, growth rate -30.49%, updated February 19 2024
google-bert/bert-base-chinese: 1.2M total runs, run growth -2.3M, growth rate -203.52%, updated July 03 2025
google-bert/bert-large-uncased: 1.1M total runs, run growth 438.0K, growth rate 38.58%, updated February 19 2024
google-bert/bert-large-uncased-whole-word-masking-finetuned-squad: 330.6K total runs, run growth 169.6K, growth rate 50.95%, updated February 19 2024
google-bert/bert-large-cased: 115.1K total runs, run growth 41.1K, growth rate 38.28%, updated February 19 2024
google-bert/bert-large-cased-whole-word-masking-finetuned-squad: 39.5K total runs, run growth -1.6K, growth rate -4.30%, updated February 19 2024
google-bert/bert-base-cased-finetuned-mrpc: 33.1K total runs, run growth 2.6K, growth rate 8.04%, updated February 19 2024
google-bert/bert-base-german-dbmdz-uncased: 10.2K total runs, run growth -2.4K, growth rate -27.36%, updated February 19 2024
google-bert/bert-large-uncased-whole-word-masking: 7.4K total runs, run growth -414, growth rate -5.66%, updated February 19 2024
google-bert/bert-large-cased-whole-word-masking: 589 total runs, run growth 60, growth rate 10.03%, updated April 10 2024
google-bert/bert-base-german-dbmdz-cased: 272 total runs, run growth 190, growth rate 69.34%, updated February 19 2024
