Madras1 / RobertaBioClass

Model's Last Updated: December 01 2025
text-classification

Model Details of RobertaBioClass

License: MIT | Framework: PyTorch | Task: Text Classification | Language: Python

RobertaBioClass 🧬

RobertaBioClass is a fine-tuned RoBERTa model that distinguishes biological texts from general-topic text. It is intended for filtering large datasets and prioritizes high recall so that relevant biological content is not missed.

Model Details
  • Model Architecture: RoBERTa Base
  • Task: Binary Text Classification
  • Language: English (with some Portuguese capability, depending on the training data mix)
  • Author: Madras1
Performance Metrics 📊

The model was evaluated on a held-out validation set of ~16k samples. It is optimized for high recall, which makes it well suited to filtering pipelines where missing a biological text is worse than including a false positive.

Metric       | Score | Description
Accuracy     | 86.8% | Overall correctness
F1-Score     | 78.5% | Harmonic mean of precision and recall
Recall (Bio) | 83.1% | Ability to find biological texts (sensitivity)
Precision    | 74.4% | Correctness when predicting "Bio"
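
As a quick consistency check on the scores above, the F1 score can be recomputed from the reported precision and recall. This is a minimal sketch: the variable names are illustrative, and the inputs are the rounded figures from the table, so agreement is only expected to about one decimal place.

```python
# Reported metrics from the table above (rounded percentages as fractions).
precision = 0.744
recall = 0.831

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"F1 = {f1:.3f}")  # F1 = 0.785
```

The recomputed value agrees with the reported 78.5%.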
Label Mapping

The model outputs the following labels:

  • LABEL_0: Non-Biology (General text, News, Finance, Sports, etc.)
  • LABEL_1: Biology (Genetics, Medicine, Anatomy, Ecology, etc.)
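
Downstream code is often easier to read with descriptive names than with raw label IDs. The helper below is a small sketch of one way to translate them; the short names "non-bio" and "bio" and the function name `readable` are illustrative choices, not something the model itself provides.

```python
# Human-readable names for the raw labels listed above.
# The short names are illustrative; the model only emits LABEL_0 / LABEL_1.
LABEL_NAMES = {"LABEL_0": "non-bio", "LABEL_1": "bio"}


def readable(prediction: dict) -> dict:
    """Return a copy of a pipeline-style prediction with a readable label."""
    return {**prediction, "label": LABEL_NAMES[prediction["label"]]}


# Hand-written example in the shape returned by a text-classification pipeline.
example = {"label": "LABEL_1", "score": 0.97}
print(readable(example))  # {'label': 'bio', 'score': 0.97}
```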
Training Data & Procedure
Data Overview

The dataset consists of approximately 80,000 text samples aggregated from multiple sources.

  • Total Samples: ~79,700
  • Class Balance: The dataset was imbalanced, with ~71% belonging to the "Non-Bio" class and ~29% to the "Bio" class.
  • Preprocessing: Scripts were used to clean delimiter issues in CSVs, remove duplicates, and perform a stratified split for validation.
Training Procedure

To address the class imbalance without discarding valuable data (as undersampling would), we employed a custom weighted cross-entropy loss.

  • Class Weights: Calculated using sklearn.utils.class_weight. The model was penalized significantly more for missing a Biology sample than for misclassifying a general text, which directly contributed to the high recall score.
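
The card does not state the exact call or the resulting weights, so the following is only a sketch under stated assumptions. It assumes the "balanced" heuristic, n_samples / (n_classes * class_count), which is the formula scikit-learn's `compute_class_weight` applies for `class_weight="balanced"`, and it uses approximate class counts derived from the ~79,700 samples and the ~71% / ~29% split. It is written in plain Python to stay dependency-free; the loss follows the weighted-mean convention (dividing by the sum of the weights) that PyTorch's `CrossEntropyLoss` uses when given class weights.

```python
import math

# Approximate class counts implied by the card (~79,700 samples, ~71% / ~29%).
# The exact counts are an assumption made for illustration.
counts = {0: 56_587, 1: 23_113}  # 0 = Non-Biology, 1 = Biology
n_samples = sum(counts.values())
n_classes = len(counts)

# "Balanced" heuristic: n_samples / (n_classes * count_for_class).
weights = {c: n_samples / (n_classes * k) for c, k in counts.items()}


def weighted_cross_entropy(probs, labels, class_weights):
    """Weighted mean of -log p(true class), normalised by the sum of weights."""
    total = sum(class_weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    return total / sum(class_weights[y] for y in labels)


# Two equally confident mistakes: the missed Biology sample costs more.
missed_bio = weights[1] * -math.log(0.2)
false_alarm = weights[0] * -math.log(0.2)
print(weights, missed_bio / false_alarm)

# Loss for a small batch of (P(non-bio), P(bio)) predictions.
print(weighted_cross_entropy([(0.8, 0.2), (0.8, 0.2)], [1, 0], weights))
```

Under these assumptions the Biology class receives a weight of roughly 1.72 against roughly 0.70 for Non-Biology, so a missed Biology sample is penalized about 2.4 times as heavily as an equally confident false alarm.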
Hyperparameters

The model was fine-tuned using the Hugging Face Trainer with the following configuration:

  • Optimizer: AdamW
  • Learning Rate: 2e-5
  • Batch Size: 16
  • Epochs: 2
  • Weight Decay: 0.01
  • Hardware: Trained on an NVIDIA T4 GPU
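
The configuration above can be written as keyword arguments named after `transformers.TrainingArguments` parameters. This is a hedged sketch, not the author's training script: the `"adamw_torch"` optimizer variant and the training-set size used for the step estimate are assumptions, and the estimate further assumes a single GPU with no gradient accumulation.

```python
import math

# Keyword arguments named after transformers.TrainingArguments parameters.
training_kwargs = {
    "optim": "adamw_torch",              # AdamW (the exact variant is an assumption)
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,
    "num_train_epochs": 2,
    "weight_decay": 0.01,
}

# Rough step count. The training-set size is an assumption: ~79,700 total
# samples minus the ~16k held-out validation set.
assumed_train_samples = 79_700 - 16_000
steps_per_epoch = math.ceil(
    assumed_train_samples / training_kwargs["per_device_train_batch_size"]
)
total_steps = steps_per_epoch * training_kwargs["num_train_epochs"]
print(steps_per_epoch, total_steps)  # 3982 7964
```

With the transformers library installed, these could be passed as `TrainingArguments(output_dir="out", **training_kwargs)`, where `"out"` is a placeholder directory name.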
How to Use

You can use this model directly with the Hugging Face pipeline:

from transformers import pipeline

# Load the pipeline
classifier = pipeline("text-classification", model="Madras1/RobertaBioClass")

# Test strings
examples = [
    "The mitochondria is the powerhouse of the cell.",
    "The stock market crashed yesterday due to inflation."
]

# Get predictions
predictions = classifier(examples)
print(predictions)
# Output:
# [{'label': 'LABEL_1', 'score': 0.99...},  <- Biology
#  {'label': 'LABEL_0', 'score': 0.98...}]  <- Non-Biology
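
For filtering, it can help to turn each prediction into a single Biology probability and apply a threshold. The sketch below works on pipeline-style output and needs no model download. It assumes the scores are softmax probabilities over the two labels, so that P(Biology) is 1 minus the score when the top label is LABEL_0. The helper names and the 0.3 threshold are illustrative only; lowering the threshold trades precision for recall.

```python
def bio_probability(prediction: dict) -> float:
    """Convert a top-label prediction into P(Biology) for a binary classifier."""
    score = prediction["score"]
    return score if prediction["label"] == "LABEL_1" else 1.0 - score


def keep_biology(texts, predictions, threshold=0.5):
    """Keep the texts whose estimated P(Biology) meets the threshold."""
    return [t for t, p in zip(texts, predictions) if bio_probability(p) >= threshold]


# Hand-written predictions in the pipeline's output shape (not real model output).
texts = ["text about cells", "text about markets", "borderline text"]
preds = [
    {"label": "LABEL_1", "score": 0.99},
    {"label": "LABEL_0", "score": 0.98},
    {"label": "LABEL_0", "score": 0.60},
]
print(keep_biology(texts, preds, threshold=0.3))
# ['text about cells', 'borderline text']
```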

Intended Use

This model is ideal for:

  • Filtering biological data from Common Crawl or other web datasets.
  • Categorizing academic papers.
  • Tagging educational content.
Limitations

Since the model prioritizes recall (83%), it may generate some false positives (precision ~74%). It might occasionally classify related scientific fields (like Chemistry or Physics) as Biology, depending on the context.
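
To make the recall and precision trade-off concrete, the reported figures can be turned into approximate confusion-matrix counts. This is a back-of-the-envelope sketch, not a reported result: it assumes a validation set of exactly 16,000 samples and that the ~29% Bio share carries over to validation through the stratified split.

```python
# Reported figures (rounded) plus the Bio share from the data overview.
n_val = 16_000        # "~16k" validation samples
bio_share = 0.29      # assumed to carry over via the stratified split
recall, precision = 0.831, 0.744

bio = n_val * bio_share
tp = recall * bio                 # Biology texts correctly kept
fn = bio - tp                     # Biology texts missed
fp = tp / precision - tp          # general texts wrongly kept
tn = n_val - bio - fp             # general texts correctly dropped

accuracy = (tp + tn) / n_val
print(round(tp), round(fp), round(fn), round(tn), round(accuracy, 3))
# 3856 1327 784 10033 0.868
```

Under these assumptions, roughly 1,300 general texts are kept as false positives while roughly 800 Biology texts are missed, and the implied accuracy matches the reported 86.8%.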

License details (MIT):

https://choosealicense.com/licenses/mit

Model page on huggingface.co:

https://huggingface.co/Madras1/RobertaBioClass
