kenhktsui / math-fasttext-classifier

huggingface.co
Total runs: 6.4K
24-hour runs: 0
7-day runs: 1.9K
30-day runs: 6.4K
Model's Last Updated: July 03 2025
text-classification

maths-fasttext-classifier

Dataset

This is part of my fasttext classifier collection for curating pretraining datasets. This classifier labels a text as either Maths or Others.
The model is trained on 1.6M records, a 50:50 mix of maths and non-maths web text, and achieves a test F1 score of 0.99 (too good to be true?). The 50:50 mix is an intentional upsampling of maths data. The classifier can be used to curate LLM pretraining data and so enhance capability in mathematics. It is ultra fast ⚡, with a throughput of ~2000 doc/s on CPU.
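The throughput figure depends on hardware and document length, so it can be worth checking on your own machine. Below is a minimal sketch, not part of the original card: `measure_throughput` is a hypothetical helper that times a single call to any batch prediction function, such as the `predict` function defined under Usage.

```python
import time
from typing import Callable, List


def measure_throughput(predict_fn: Callable[[List[str]], list], docs: List[str]) -> float:
    """Return documents processed per second for one call to predict_fn."""
    start = time.perf_counter()
    predict_fn(docs)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very fast calls or coarse clocks.
    return len(docs) / max(elapsed, 1e-9)
```

Calling `measure_throughput(predict, docs)` on a few thousand representative documents gives a rough doc/s estimate; a single timed call is noisy, so averaging several runs is advisable.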

Don't underestimate the "old" fasttext classifier! It is indeed a good and scalable practice. For example, Qwen2.5-Math leverages fasttext to curate pretraining data, although its classifier is not open-sourced.

🛠️Usage
from typing import List
import re
from huggingface_hub import hf_hub_download
import fasttext


model_hf = fasttext.load_model(hf_hub_download("kenhktsui/maths-fasttext-classifier", "model.bin"))


def replace_newlines(text: str) -> str:
  return re.sub("\n+", " ", text)


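def predict(text_list: List[str]) -> List[dict]:
  # fasttext rejects input containing newlines, so flatten each text first
  text_list = [replace_newlines(text) for text in text_list]
  pred = model_hf.predict(text_list)
  # removeprefix (Python 3.9+) drops only the literal "__label__" prefix;
  # lstrip would strip any leading run of those characters instead
  return [{"label": l[0].removeprefix("__label__"), "score": s[0]}
           for l, s in zip(*pred)]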


predict([
  """This is a lightning fast model, which can classify at throughtput of 2000 doc/s with CPU""",
  """Differential geometry is a mathematical discipline that studies the geometry of smooth shapes and smooth spaces, otherwise known as smooth manifolds. It uses the techniques of single variable calculus, vector calculus, linear algebra and multilinear algebra.""",
  r"""Given $p$: $|4x-3|\leqslant 1$ and $q$: $x^{2}-(2a+1)x+a^{2}+a\leqslant 0$, find the range of values for $a$ if $p$ is a necessary but not sufficient condition for $q$."""
])
# [{'label': 'Others', 'score': 1.00000834},
# {'label': 'Maths', 'score': 0.99995351},
# {'label': 'Maths', 'score': 0.99801832}]
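For data curation, the predictions can drive a simple filter over a corpus. The following is a minimal sketch under stated assumptions, not the author's pipeline: `keep_maths` and `_select` are hypothetical helpers, `predict_fn` is any function with the same input and output shape as `predict` above, and the `threshold` and `batch_size` defaults are illustrative values that are not taken from the model card.

```python
from typing import Callable, Dict, Iterable, Iterator, List


def keep_maths(
    docs: Iterable[str],
    predict_fn: Callable[[List[str]], List[Dict]],
    threshold: float = 0.9,  # hypothetical cut-off, tune on your own data
    batch_size: int = 1024,  # hypothetical batch size
) -> Iterator[str]:
    """Yield documents labelled "Maths" with a score at or above threshold."""
    batch: List[str] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            yield from _select(batch, predict_fn, threshold)
            batch = []
    # Flush the final, possibly partial, batch.
    if batch:
        yield from _select(batch, predict_fn, threshold)


def _select(batch: List[str], predict_fn, threshold: float) -> Iterator[str]:
    # Pair each document with its prediction and keep confident Maths hits.
    for doc, pred in zip(batch, predict_fn(batch)):
        if pred["label"] == "Maths" and pred["score"] >= threshold:
            yield doc
```

Because `keep_maths` is a generator, it can stream over a corpus that does not fit in memory, for example `for doc in keep_maths(read_corpus(), predict): ...`, where `read_corpus` stands in for your own document iterator.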
📊Evaluation

full version

              precision    recall  f1-score   support

       Maths       0.99      0.98      0.99    200000
      Others       0.98      0.99      0.99    200000

    accuracy                           0.99    400000
   macro avg       0.99      0.99      0.99    400000
weighted avg       0.99      0.99      0.99    400000
⚠️Known Limitation

The classifier does not handle short text well, which might not be surprising.
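One way to work around this limitation is to skip classification for very short inputs and treat them as undecided. The sketch below is an assumption-laden illustration, not a recommendation from the model card: `classify_or_skip` is a hypothetical wrapper, `predict_fn` is any function shaped like the `predict` function in Usage, and `min_chars=200` is an arbitrary placeholder, since the card does not say how short is too short.

```python
from typing import Callable, Dict, List


def classify_or_skip(
    texts: List[str],
    predict_fn: Callable[[List[str]], List[Dict]],
    min_chars: int = 200,  # hypothetical cut-off, not from the model card
) -> List[Dict]:
    """Classify only texts with at least min_chars characters.

    Shorter texts get {"label": None, "score": None} so the caller can
    route them to a fallback rule or drop them.
    """
    results: List[Dict] = [{"label": None, "score": None} for _ in texts]
    long_idx = [i for i, t in enumerate(texts) if len(t.strip()) >= min_chars]
    if long_idx:
        preds = predict_fn([texts[i] for i in long_idx])
        # Write each prediction back to its original position.
        for i, pred in zip(long_idx, preds):
            results[i] = pred
    return results
```

The output list has the same length and order as the input, which keeps it easy to join back onto document metadata.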


License

MIT: https://choosealicense.com/licenses/mit

math-fasttext-classifier huggingface.co Url

https://huggingface.co/kenhktsui/math-fasttext-classifier

Provider of math-fasttext-classifier huggingface.co

kenhktsui