The output from Automatic Speech Recognition (ASR) software is usually uncased and lacks punctuation, which makes the text hard to read.
The DeUnCaser is a sequence-to-sequence model that reverses this process. It adds punctuation and capitalises the correct words. In some languages this means capitalising the start of each sentence and all proper nouns; in others, such as German, it means capitalising the first letter of every noun. It also attempts to add hyphens and parentheses where this makes the meaning clearer.
It is based on the multilingual T5 (mT5) model and was fine-tuned for 100,000 steps. The fine-tuning script uses 100,000 training examples from each of the 44 Latin-alphabet languages that are part of both OSCAR and the mT5 training set: Afrikaans, Albanian, Basque, Catalan, Cebuano, Czech, Danish, Dutch, English, Esperanto, Estonian, Finnish, French, Galician, German, Haitian Creole, Hungarian, Icelandic, Indonesian, Irish, Italian, Kurdish, Latin, Latvian, Lithuanian, Luxembourgish, Malagasy, Malay, Maltese, Norwegian Bokmål, Norwegian Nynorsk, Polish, Portuguese, Romanian, Slovak, Spanish, Sundanese, Swahili, Swedish, Turkish, Uzbek, Vietnamese, Welsh, West Frisian.
A notebook for creating the training corpus is available here.
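To make the idea of a training corpus concrete, the sketch below shows one way such source/target pairs could be built: the clean sentence is the target, and a lowercased, punctuation-free copy stands in for ASR output as the source. This is not the author's notebook. The function names `uncase` and `make_pair`, the `source`/`target` keys, and the choice to keep apostrophes and turn hyphens into spaces are all assumptions made for illustration.

```python
import re
import unicodedata

# Assumption: apostrophes are kept, since ASR output usually retains them in contractions.
KEEP = {"'", "\u2019"}


def uncase(text: str) -> str:
    """Lowercase the text and drop punctuation, approximating raw ASR output."""
    lowered = text.lower()
    # Replace every Unicode punctuation character (category P*) with a space,
    # so "oslo-based" becomes two words instead of being glued together.
    chars = [
        " " if unicodedata.category(ch).startswith("P") and ch not in KEEP else ch
        for ch in lowered
    ]
    # Collapse the runs of whitespace left behind by removed punctuation.
    return re.sub(r"\s+", " ", "".join(chars)).strip()


def make_pair(sentence: str) -> dict:
    """Build one hypothetical training example: degraded input, clean target."""
    return {"source": uncase(sentence), "target": sentence}


if __name__ == "__main__":
    for sentence in [
        "Hello, Mr. Smith! Is the Oslo-based broadcaster (NRK) online?",
        "Der Hund läuft im Garten.",
    ]:
        print(make_pair(sentence))
```

A model trained on pairs like these learns the reverse mapping, from the `source` string back to the `target` string, which is the task described above.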