BEE-spoke-data / BeeTokenizer

BeeTokenizer

Note: this is literally a tokenizer trained on beekeeping text.

After minutes of hard work, it is now available.

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("BEE-spoke-data/BeeTokenizer")

test_string = "When dealing with Varroa destructor mites, it's crucial to administer the right acaricides during the late autumn months, but only after ensuring that the worker bee population is free from pesticide contamination."

output = tokenizer(test_string)
print(f"Test string: {test_string}")
print(f"Tokens ({len(output.input_ids)}):\n\t{output.input_ids}")

Notes

The default tokenizer (on branch main) has a vocab size of 32000.
It is based on the SentencePieceBPETokenizer class from the tokenizers library.
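
To give a feel for what a BPE-based tokenizer learns, here is a toy sketch of byte-pair-encoding merges in plain Python. It is not the tokenizers library implementation and not how BeeTokenizer was actually trained: the function names (count_pairs, merge_pair, train_bpe), the tiny corpus, and the tie-breaking rule are all assumptions made for illustration. As a simplification, every word is prefixed with the "▁" word-boundary marker.

```python
from collections import Counter


def count_pairs(words):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = merged.get(tuple(out), 0) + freq
    return merged


def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merges from a whitespace-split corpus."""
    word_counts = Counter("▁" + w for w in corpus.split())
    words = {tuple(w): f for w, f in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pairs = count_pairs(words)
        if not pairs:
            break
        # Most frequent pair wins; ties are broken by comparing the pairs
        # themselves (an arbitrary choice that keeps the result deterministic).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        words = merge_pair(words, best)
    return merges, words


merges, words = train_bpe("bee bee bee hive", num_merges=3)
print(merges)
print(words)
```

With this toy corpus, three merges are enough for the frequent word "bee" to collapse into a single symbol while the rarer "hive" stays split into characters. The same effect, at a much larger scale, is why a tokenizer trained on beekeeping text tends to keep beekeeping terms whole.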

How to Tokenize Text and Retrieve Offsets

To tokenize a complex sentence and also retrieve the offsets mapping, you can use the following Python code snippet:

from transformers import AutoTokenizer

# Initialize the tokenizer
tokenizer = AutoTokenizer.from_pretrained("BEE-spoke-data/BeeTokenizer")

# Sample complex sentence related to beekeeping
test_string = "When dealing with Varroa destructor mites, it's crucial to administer the right acaricides during the late autumn months, but only after ensuring that the worker bee population is free from pesticide contamination."

# Tokenize the input string and get the offsets mapping
output = tokenizer.encode_plus(test_string, return_offsets_mapping=True)

print(f"Test string: {test_string}")

# Tokens
tokens = tokenizer.convert_ids_to_tokens(output['input_ids'])
print(f"Tokens: {tokens}")

# Offsets
offsets = output['offset_mapping']
print(f"Offsets: {offsets}")

This should produce the following output (Feb '24 version):

>>> print(f"Test string: {test_string}")
Test string: When dealing with Varroa destructor mites, it's crucial to administer the right acaricides during the late autumn months, but only after ensuring that the worker bee population is free from pesticide contamination.
>>>
>>> # Tokens
>>> tokens = tokenizer.convert_ids_to_tokens(output['input_ids'])
>>> print(f"Tokens: {tokens}")
Tokens: ['When', '▁dealing', '▁with', '▁Varroa', '▁destructor', '▁mites,', "▁it's", '▁cru', 'cial', '▁to', '▁administer', '▁the', '▁right', '▁acar', 'icides', '▁during', '▁the', '▁late', '▁autumn', '▁months,', '▁but', '▁only', '▁after', '▁ensuring', '▁that', '▁the', '▁worker', '▁bee', '▁population', '▁is', '▁free', '▁from', '▁pesticide', '▁contam', 'ination.']
>>>
>>> # Offsets
>>> offsets = output['offset_mapping']
>>> print(f"Offsets: {offsets}")
Offsets: [(0, 4), (4, 12), (12, 17), (17, 24), (24, 35), (35, 42), (42, 47), (47, 51), (51, 55), (55, 58), (58, 69), (69, 73), (73, 79), (79, 84), (84, 90), (90, 97), (97, 101), (101, 106), (106, 113), (113, 121), (121, 125), (125, 130), (130, 136), (136, 145), (145, 150), (150, 154), (154, 161), (161, 165), (165, 176), (176, 179), (179, 184), (184, 189), (189, 199), (199, 206), (206, 214)]
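
Each offset pair is a (start, end) character range into the original string, so slicing the string with it recovers the text a token covers. The sketch below illustrates this with the first six tokens and offsets copied from the BeeTokenizer output above, using only a prefix of the test string so that it runs without downloading anything. The helper names (spans_from_offsets, token_index_at) are hypothetical and not part of transformers.

```python
# Prefix of the test string plus the first six tokens and offsets,
# copied from the BeeTokenizer output shown above.
text = "When dealing with Varroa destructor mites,"
tokens = ["When", "▁dealing", "▁with", "▁Varroa", "▁destructor", "▁mites,"]
offsets = [(0, 4), (4, 12), (12, 17), (17, 24), (24, 35), (35, 42)]


def spans_from_offsets(text, offsets):
    """Slice the original text with each (start, end) offset pair."""
    return [text[start:end] for start, end in offsets]


def token_index_at(char_pos, offsets):
    """Return the index of the token covering char_pos, or None."""
    for i, (start, end) in enumerate(offsets):
        if start <= char_pos < end:
            return i
    return None


spans = spans_from_offsets(text, offsets)
for token, span in zip(tokens, spans):
    # The "▁" marker in a token corresponds to the leading space in the span.
    print(f"{token!r} -> {span!r}")

# Find which token covers the first character of "Varroa".
varroa_index = token_index_at(text.index("Varroa"), offsets)
print(tokens[varroa_index])
```

Note that in this output the offsets include the leading space of each word, which is why the span for '▁dealing' is ' dealing' rather than 'dealing'.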

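Compare the BeeTokenizer output above with the output of the Llama tokenizer (below) to see which is better suited to beekeeping-related language modeling: BeeTokenizer keeps domain terms such as Varroa and pesticide as single tokens, while the Llama tokenizer splits them into several pieces.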

>>> print(f"Test string: {test_string}")
Test string: When dealing with Varroa destructor mites, it's crucial to administer the right acaricides during the late autumn months, but only after ensuring that the worker bee population is free from pesticide contamination.
>>> # Tokens
>>> tokens = tokenizer.convert_ids_to_tokens(output['input_ids'])
>>> print(f"Tokens: {tokens}")
Tokens: ['<s>', '▁When', '▁dealing', '▁with', '▁Var', 'ro', 'a', '▁destruct', 'or', '▁mit', 'es', ',', '▁it', "'", 's', '▁cru', 'cial', '▁to', '▁admin', 'ister', '▁the', '▁right', '▁ac', 'ar', 'ic', 'ides', '▁during', '▁the', '▁late', '▁aut', 'umn', '▁months', ',', '▁but', '▁only', '▁after', '▁ens', 'uring', '▁that', '▁the', '▁worker', '▁be', 'e', '▁population', '▁is', '▁free', '▁from', '▁p', 'estic', 'ide', '▁cont', 'am', 'ination', '.']
>>> offsets = output['offset_mapping']
>>> print(f"Offsets: {offsets}")
Offsets: [(0, 0), (0, 4), (4, 12), (12, 17), (17, 21), (21, 23), (23, 24), (24, 33), (33, 35), (35, 39), (39, 41), (41, 42), (42, 45), (45, 46), (46, 47), (47, 51), (51, 55), (55, 58), (58, 64), (64, 69), (69, 73), (73, 79), (79, 82), (82, 84), (84, 86), (86, 90), (90, 97), (97, 101), (101, 106), (106, 110), (110, 113), (113, 120), (120, 121), (121, 125), (125, 130), (130, 136), (136, 140), (140, 145), (145, 150), (150, 154), (154, 161), (161, 164), (164, 165), (165, 176), (176, 179), (179, 184), (184, 189), (189, 191), (191, 196), (196, 199), (199, 204), (204, 206), (206, 213), (213, 214)]
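
The difference between the two outputs can be put into numbers by counting tokens per whitespace-separated word. The sketch below does this with the two token lists copied from the outputs above, so it needs no model download. The helper name tokens_per_word and the choice to exclude the '<s>' start-of-sequence token are assumptions made for this illustration.

```python
test_string = (
    "When dealing with Varroa destructor mites, it's crucial to administer "
    "the right acaricides during the late autumn months, but only after "
    "ensuring that the worker bee population is free from pesticide "
    "contamination."
)

# Token lists copied from the two outputs above.
bee_tokens = [
    "When", "▁dealing", "▁with", "▁Varroa", "▁destructor", "▁mites,", "▁it's",
    "▁cru", "cial", "▁to", "▁administer", "▁the", "▁right", "▁acar", "icides",
    "▁during", "▁the", "▁late", "▁autumn", "▁months,", "▁but", "▁only",
    "▁after", "▁ensuring", "▁that", "▁the", "▁worker", "▁bee", "▁population",
    "▁is", "▁free", "▁from", "▁pesticide", "▁contam", "ination.",
]
llama_tokens = [
    "<s>", "▁When", "▁dealing", "▁with", "▁Var", "ro", "a", "▁destruct", "or",
    "▁mit", "es", ",", "▁it", "'", "s", "▁cru", "cial", "▁to", "▁admin",
    "ister", "▁the", "▁right", "▁ac", "ar", "ic", "ides", "▁during", "▁the",
    "▁late", "▁aut", "umn", "▁months", ",", "▁but", "▁only", "▁after", "▁ens",
    "uring", "▁that", "▁the", "▁worker", "▁be", "e", "▁population", "▁is",
    "▁free", "▁from", "▁p", "estic", "ide", "▁cont", "am", "ination", ".",
]


def tokens_per_word(tokens, text, special=("<s>", "</s>")):
    """Average number of non-special tokens per whitespace-separated word."""
    content = [t for t in tokens if t not in special]
    return len(content) / len(text.split())


bee_ratio = tokens_per_word(bee_tokens, test_string)
llama_ratio = tokens_per_word(llama_tokens, test_string)

print(f"BeeTokenizer: {len(bee_tokens)} tokens, {bee_ratio:.2f} tokens per word")
print(f"Llama:        {len(llama_tokens)} tokens, {llama_ratio:.2f} tokens per word")
```

A lower tokens-per-word figure means the same sentence fits into fewer tokens. A single sentence is only an illustration, so a real comparison would average this over a larger sample of beekeeping text.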

License of BeeTokenizer

BeeTokenizer is released under the Apache 2.0 license:

https://choosealicense.com/licenses/apache-2.0

BeeTokenizer huggingface.co Url

https://huggingface.co/BEE-spoke-data/BeeTokenizer

Other models from BEE-spoke-data on huggingface.co

Model | Total runs | Run growth | Growth rate | Updated
BEE-spoke-data/smol_llama-220M-GQA | 874 | -1.4K | -143.37% | December 29 2025
BEE-spoke-data/smol_llama-101M-GQA | 717 | -1.0K | -132.33% | December 29 2025
BEE-spoke-data/smol_llama-81M-tied | 406 | -439 | -102.57% | December 29 2025
BEE-spoke-data/verysmol_llama-v11-KIx2 | 352 | -582 | -156.45% | December 29 2025
BEE-spoke-data/smol_llama-220M-openhermes | 331 | -469 | -133.24% | December 29 2025
BEE-spoke-data/Mixtral-GQA-400m-v2 | 324 | -445 | -127.87% | December 29 2025
BEE-spoke-data/mega-ar-126m-4k | 277 | -481 | -160.33% | December 29 2025
BEE-spoke-data/zephyr-220m-sft-full | 266 | -475 | -167.84% | December 29 2025
BEE-spoke-data/tFINE-900m-e16-d32-instruct | 143 | 141 | 96.58% | December 29 2025
BEE-spoke-data/pegasus-x-base-synthsumm_open-16k | 96 | 55 | 57.29% | December 29 2025
BEE-spoke-data/smol_llama-220M-GQA-fineweb_edu | 95 | 87 | 90.63% | December 29 2025
BEE-spoke-data/NVIDIA-Nemotron-Parse-v1.2 | 77 | 77 | 100.00% | February 23 2026
BEE-spoke-data/tFINE-900m-instruct-orpo | 61 | 58 | 98.31% | December 29 2025
BEE-spoke-data/tFINE-900m-e16-d32-flan | 53 | 47 | 92.16% | December 29 2025
BEE-spoke-data/tFINE-900m-e16-d32-instruct_2e | 52 | 51 | 98.08% | December 29 2025
BEE-spoke-data/Meta-Llama-3-8Bee | 42 | 29 | 69.05% | December 29 2025
BEE-spoke-data/tFINE-900m-e16-d32-flan-infinity-instruct-7m-T2T_en-1024 | 33 | 0 | 0.00% | September 14 2024
BEE-spoke-data/smol_llama-101M-GQA-python | 22 | -13 | -68.42% | December 29 2025
BEE-spoke-data/roberta-base-description2genre | 20 | -4 | -20.00% | December 29 2025
BEE-spoke-data/phi-1bee5 | 20 | 16 | 84.21% | December 29 2025
BEE-spoke-data/beecoder-220M-python | 15 | -21 | -150.00% | December 29 2025
BEE-spoke-data/smol_llama-220M-open_instruct | 12 | 3 | 25.00% | December 29 2025
BEE-spoke-data/zephyr-220m-dpo-full | 11 | 5 | 45.45% | December 29 2025
BEE-spoke-data/bert-plus-L8-v1.0-allNLI_matryoshka | 11 | 11 | 100.00% | December 29 2025
BEE-spoke-data/NanoLlama-GQA-L10-A32_KV8-v13-KI | 11 | 5 | 45.45% | December 29 2025
BEE-spoke-data/smol_llama-220M-bees-internal | 10 | 7 | 77.78% | December 29 2025
BEE-spoke-data/Mixtral-GQA-400m-v4-4096 | 10 | 7 | 77.78% | December 29 2025
BEE-spoke-data/Mixtral-GQA-400m-v3 | 8 | 4 | 57.14% | December 29 2025
BEE-spoke-data/tiny-random-MPNetForMaskedLM | 8 | 7 | 87.50% | December 29 2025
BEE-spoke-data/TinyLlama-3T-1.1bee | 7 | -5 | -71.43% | December 29 2025
BEE-spoke-data/mobilebert-uncased-title2genre | 7 | 1 | 14.29% | December 29 2025
BEE-spoke-data/TinyLlama-1.1bee | 7 | -7 | -100.00% | December 29 2025
BEE-spoke-data/tFINE-680m-e32-d16-gqa-flan | 6 | 5 | 100.00% | December 29 2025
BEE-spoke-data/smol_llama-101M-midjourney-messages | 6 | -1 | -20.00% | December 29 2025
BEE-spoke-data/mega-small-embed-synthSTS-16384-v1 | 5 | -8 | -160.00% | December 29 2025
BEE-spoke-data/Mistral-7B-v0.3-stepbasin-books-20k | 5 | 2 | 50.00% | December 29 2025
BEE-spoke-data/albert-xxlarge-v2-description2genre | 5 | -2 | -40.00% | December 29 2025
BEE-spoke-data/roberta-large-title2genre | 4 | 0 | 0.00% | December 29 2025
BEE-spoke-data/mega-ar-350m-L3t-v0.08-ultraTBfw | 4 | -6 | -150.00% | December 29 2025
BEE-spoke-data/bert-plus-L8-v1.0-synthSTSv3-4k | 4 | 4 | 80.00% | December 29 2025
BEE-spoke-data/bert-plus-L8-4096-v1.0 | 3 | -1 | -33.33% | December 29 2025
BEE-spoke-data/bert-plus-L8-v1.0-syntheticSTS-4k | 3 | 0 | 0.00% | December 29 2025
BEE-spoke-data/language-perceiver-title2genre | 3 | -6 | -300.00% | December 29 2025
BEE-spoke-data/neobert-100k-test | 3 | 2 | 66.67% | December 29 2025
BEE-spoke-data/Jamba-900M-doc-writer | 3 | -7 | -350.00% | December 29 2025
BEE-spoke-data/mega-encoder-small-16k-v1 | 3 | -3 | -75.00% | December 29 2025
BEE-spoke-data/Qwen2-1.5B-stepbasin-books | 2 | -4 | -200.00% | December 29 2025
BEE-spoke-data/tFINE-680m-e32-d16-infinity_instruct-L2 | 1 | -2 | -200.00% | December 29 2025
BEE-spoke-data/llama3-t5-tokenizer | 0 | 0 | 0.00% | July 07 2024
BEE-spoke-data/bpe-tokenizer-32k-smolNeoX | 0 | 0 | 0.00% | December 29 2025
BEE-spoke-data/tiny-random-MPNetForMaskedLM-padded | 0 | 0 | 0.00% | June 23 2025
BEE-spoke-data/cl100k_base-mlm | 0 | 0 | 0.00% | December 29 2025
BEE-spoke-data/wordpiece-tokenizer-32k-en_code-orig | 0 | 0 | 0.00% | December 29 2025
BEE-spoke-data/slimpajama_tok-48128-BPE-forT5 | 0 | 0 | 0.00% | December 29 2025
BEE-spoke-data/cl100k_base | 0 | 0 | 0.00% | December 29 2025
BEE-spoke-data/wordpiece-tokenizer-32k-en_code-msp | 0 | 0 | 0.00% | December 29 2025
BEE-spoke-data/MiniTokenizer-20480 | 0 | 0 | 0.00% | December 29 2025
BEE-spoke-data/tFINE-900m-e16-d32-flan-infinity-instruct-7m-T2T_en-1024-infinity-instruct-7m-T2T_en-1024-v2 | 0 | 0 | 0.00% | September 20 2024
BEE-spoke-data/claude-tokenizer | 0 | 0 | 0.00% | December 29 2025
BEE-spoke-data/claude-tokenizer-forT5 | 0 | 0 | 0.00% | December 29 2025
BEE-spoke-data/verysmol_llama-v8-minipile_x2 | 0 | 0 | 0.00% | December 29 2025
