BSC-LT / PL-BERT-ca

Last updated: March 06 2026

Model Details

Model Description

PL-BERT-ca is a phoneme-level masked language model trained on Catalan text with diverse regional accents. It is based on the PL-BERT architecture, which learns phoneme representations via a BERT-style masked language modeling objective.

This model is designed to support phoneme-based text-to-speech (TTS) systems, including but not limited to StyleTTS2. Thanks to its Catalan-specific phoneme vocabulary and contextual embeddings, it can serve as a phoneme encoder for any TTS architecture requiring phoneme-level features.

Features of our PL-BERT:

  • It is trained exclusively on Catalan phonemized text.
  • It uses a reduced phoneme vocabulary of 178 tokens.
  • It uses a simple whitespace tokenizer for words.
  • It includes a custom token_maps.pkl and an adapted util.py.
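The token_maps.pkl file stores the phoneme-to-ID mapping. A minimal sketch of how such a mapping can be loaded and applied is shown below; the toy table and its flat dict structure are assumptions for illustration, and the real file shipped with the model may store richer entries.

```python
import pickle

# Illustrative only: a toy phoneme-to-ID table in the spirit of token_maps.pkl.
# The real file shipped with PL-BERT-ca may use a different structure.
toy_map = {"ˈa": 5, "k": 12, "s": 31, " ": 102}  # 102 = word separator ID

with open("token_maps_demo.pkl", "wb") as f:
    pickle.dump(toy_map, f)

with open("token_maps_demo.pkl", "rb") as f:
    token_map = pickle.load(f)

def phonemes_to_ids(phonemes, table):
    """Map a list of phoneme strings to integer IDs."""
    return [table[p] for p in phonemes]

ids = phonemes_to_ids(["k", "ˈa", "s"], token_map)
print(ids)  # [12, 5, 31]
```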

Intended Uses and Limitations
Intended uses
  • Integration into phoneme-based TTS pipelines such as StyleTTS2, Matxa-TTS, or custom diffusion-based synthesizers.
  • Accent-aware synthesis and phoneme embedding extraction for Catalan.
  • Research on phoneme-level language modeling in low-resource or multi-accent settings.
Limitations
  • Not designed for general NLP tasks like classification or sentiment analysis.
  • Only supports Catalan phoneme tokens.
  • Some accents may be underrepresented in the training data.

How to Get Started with the Model

Here is an example of how to use this model within the StyleTTS2 framework:

  1. Clone the StyleTTS2 repository: https://github.com/yl4579/StyleTTS2

  2. Inside the Utils directory, create a new folder, for example: PLBERT_cat_multiaccent.

  3. Copy the following files into that folder:

    • config.yml (training configuration)
    • step_1000000.t7 (trained checkpoint)
    • token_maps.pkl (phoneme to ID mapping)
    • util.py (modified to fix position ID loading)
  4. In your StyleTTS2 configuration file, update the PLBERT_dir entry to:

    PLBERT_dir: Utils/PLBERT_cat_multiaccent

  5. Update the import statement in your code to:

    from Utils.PLBERT_cat_multiaccent.util import load_plbert

  6. Use espeak-ng with the language code ca to phonemize your Catalan text files for training and validation.

Note: Although this example uses StyleTTS2, the model is compatible with other TTS architectures that operate on phoneme sequences. You can use the contextualized phoneme embeddings from PL-BERT in any compatible synthesis system.


Training Details
Training data

The model was trained on a Catalan corpus phonemized using espeak-ng. The dataset includes sentences from speakers across Catalonia, Balearic Islands, and Valencia. It uses a consistent phoneme token set with boundary markers and masking tokens.

  • Tokenizer: custom (whitespace split)
  • Phoneme masking strategy: word-level and phoneme-level masking and replacement
  • Training steps: 1,000,000
  • Precision: mixed (fp16)

Training configuration

Model parameters:

  • Vocabulary size: 178
  • Hidden size: 768
  • Attention heads: 12
  • Intermediate size: 2048
  • Number of layers: 12
  • Max position embeddings: 512
  • Dropout: 0.1
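Expressed as a configuration fragment, these model parameters might look like the following; the key names are an assumption for illustration, so check the shipped config.yml for the exact fields.

```yaml
# Hypothetical config.yml fragment; key names may differ in the shipped file.
model_params:
  vocab_size: 178
  hidden_size: 768
  num_attention_heads: 12
  intermediate_size: 2048
  num_hidden_layers: 12
  max_position_embeddings: 512
  dropout: 0.1
```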

Other parameters:

  • Batch size: 8
  • Max mel length: 512
  • Word mask probability: 0.15
  • Phoneme mask probability: 0.1
  • Replacement probability: 0.2
  • Token separator: space
  • Token mask: M
  • Word separator ID: 102
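The masking strategy can be sketched in a few lines using the probabilities listed above. The exact sampling procedure used during training is not published here, so treat this as an illustrative approximation rather than the actual training code.

```python
import random

# Illustrative approximation of the word- and phoneme-level masking described
# above; the actual training code may differ in sampling details.
WORD_MASK_P = 0.15     # probability of masking a whole word
PHONEME_MASK_P = 0.1   # probability of masking an individual phoneme
REPLACE_P = 0.2        # probability of replacing instead of masking
MASK_TOKEN = "M"

def mask_sentence(words, vocab, rng):
    """words: list of phoneme lists, one per word. Returns a masked copy."""
    out = []
    for word in words:
        if rng.random() < WORD_MASK_P:
            # word-level: mask every phoneme in the word
            out.append([MASK_TOKEN] * len(word))
            continue
        masked = []
        for ph in word:
            if rng.random() < PHONEME_MASK_P:
                if rng.random() < REPLACE_P:
                    masked.append(rng.choice(vocab))  # random replacement
                else:
                    masked.append(MASK_TOKEN)
            else:
                masked.append(ph)
        out.append(masked)
    return out

rng = random.Random(1)
sentence = [["k", "a"], ["z", "a"]]  # toy phonemized words
print(mask_sentence(sentence, ["k", "a", "z", "s"], rng))
# [['M', 'M'], ['z', 'a']]
```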

Evaluation

The model has not been benchmarked via perplexity or extrinsic evaluation, but has been successfully integrated into TTS pipelines such as StyleTTS2, where it enables the synthesis of Catalan with regional accent variation.


Citation

If this code contributes to your research, please cite the work:

@misc{zevallos2025plbertca,
      title={PL-BERT-ca},
      author={Rodolfo Zevallos and Jose Giraldo and Carme Armentano-Oller},
      organization={Barcelona Supercomputing Center},
      url={https://huggingface.co/langtech-veu/PL-BERT-ca},
      year={2025}
}

Additional Information
Author

This model was developed by Rodolfo Zevallos at the Language Technologies Laboratory of the Barcelona Supercomputing Center.

Contact

For further information, please send an email to [email protected].

Copyright

Copyright (c) 2025 by the Language Technologies Laboratory, Barcelona Supercomputing Center.

License

Apache-2.0

Funding

This work is funded by the Ministerio para la Transformación Digital y de la Función Pública and by the EU – NextGenerationEU, within the framework of the project Desarrollo de Modelos ALIA.
