mmukh / SOBertLarge

huggingface.co
Model last updated: March 10, 2026

SOBertLarge

Model Description

SOBertLarge is a 762M-parameter BERT model trained on 27 billion tokens of StackOverflow answer and comment text using the Megatron toolkit.

SOBert is pre-trained on 19 GB of data presented as 15 million samples, where each sample contains an entire post and all of its corresponding comments. We also include all code in each answer, so the model is bimodal in nature. We use a SentencePiece tokenizer trained with Byte-Pair Encoding, which has the benefit over WordPiece of never labeling tokens as "unknown". Additionally, SOBert is trained with a maximum sequence length of 2048, based on the empirical length distribution of StackOverflow posts, and a relatively large batch size of 0.5M tokens. A smaller 109 million parameter model can also be found here. More details can be found in the paper Stack Over-Flowing with Results: The Case for Domain-Specific Pre-Training Over One-Size-Fits-All Models.
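As a rough sanity check on the pre-training configuration above, the quoted figures can be combined in a back-of-the-envelope calculation. This is purely illustrative arithmetic from the numbers in this card, not values reported by the authors:

```python
# Back-of-the-envelope numbers for SOBert's pre-training setup.
# Illustrative only; derived from the figures quoted above.

batch_tokens = 500_000          # "relatively large batch size of 0.5M tokens"
max_seq_len = 2048              # maximum sequence length
total_tokens = 27_000_000_000   # 27B pre-training tokens

# If every sequence were packed to the full 2048 tokens, one batch
# would hold about this many sequences:
seqs_per_batch = batch_tokens // max_seq_len
print(seqs_per_batch)  # 244

# Optimizer steps needed for one pass over the 27B tokens:
steps_per_pass = total_tokens // batch_tokens
print(steps_per_pass)  # 54000
```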

How to use

from transformers import MegatronBertModel, PreTrainedTokenizerFast

# Load the pre-trained SOBertLarge model and its SentencePiece tokenizer
model = MegatronBertModel.from_pretrained("mmukh/SOBertLarge")
tokenizer = PreTrainedTokenizerFast.from_pretrained("mmukh/SOBertLarge")

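The model returns contextual token embeddings; a common way to reduce them to a single vector per post is masked mean pooling over the last hidden state. Below is a minimal sketch of that pooling step using NumPy stand-in arrays (in real use, the inputs would be the model's `last_hidden_state` and the tokenizer's `attention_mask`; this is a generic technique, not something prescribed by the SOBert authors):

```python
import numpy as np

def mean_pool(last_hidden_state: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings, ignoring padding positions.

    last_hidden_state: (batch, seq_len, hidden) float array
    attention_mask:    (batch, seq_len) 0/1 array from the tokenizer
    """
    mask = attention_mask[:, :, None].astype(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(axis=1)
    counts = mask.sum(axis=1)
    return summed / counts

# Toy stand-in: batch of 1, seq_len 4 (last position is padding), hidden size 2.
hidden = np.array([[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [9.0, 9.0]]])
mask = np.array([[1, 1, 1, 0]])
print(mean_pool(hidden, mask))  # [[3. 4.]]
```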
BibTeX entry and citation info
@article{mukherjee2023stack,
  title={Stack Over-Flowing with Results: The Case for Domain-Specific Pre-Training Over One-Size-Fits-All Models},
  author={Mukherjee, Manisha and Hellendoorn, Vincent J},
  journal={arXiv preprint arXiv:2306.03268},
  year={2023}
}


Model URL

https://huggingface.co/mmukh/SOBertLarge
