NuExtract-tiny-v1.5 is a fine-tuning of Qwen/Qwen2.5-0.5B, trained on a private high-quality dataset for structured information extraction. It supports long documents and several languages (English, French, Spanish, German, Portuguese, and Italian).
To use the model, provide an input text and a JSON template describing the information you need to extract.
Note: This model is trained to prioritize pure extraction, so in most cases all text generated by the model is present as is in the original text.
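A template is a JSON skeleton in which empty strings mark single-value fields and empty lists mark fields that may collect several values (this reading follows the example template in the Usage section below). A minimal illustrative template, with field names invented for the example, could be:

{
    "Person": {
        "Name": "",
        "Roles": []
    }
}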
We also provide a 3.8B version, NuExtract-v1.5, which is based on Phi-3.5-mini-instruct.
⚠️ We recommend using NuExtract with a temperature at or very close to 0. Some inference frameworks, such as Ollama, use a default of 0.7 which is not well suited to pure extraction tasks.
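With Hugging Face transformers, the most direct way to get temperature-0 behaviour is greedy decoding; a minimal sketch, reusing the generate call from the usage code below:

# Greedy decoding: with do_sample=False, generate() ignores temperature
# and always picks the most likely token, i.e. temperature 0.
pred_ids = model.generate(**batch_encodings, max_new_tokens=max_new_tokens, do_sample=False)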
Benchmark
[Figure: Zero-shot performance (English)]
[Figure: Few-shot fine-tuning]
Usage
To use the model:
import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def predict_NuExtract(model, tokenizer, texts, template, batch_size=1, max_length=10_000, max_new_tokens=4_000):
    # Normalize the template so the model sees consistently indented JSON
    template = json.dumps(json.loads(template), indent=4)
    prompts = [f"""<|input|>\n### Template:\n{template}\n### Text:\n{text}\n\n<|output|>""" for text in texts]

    outputs = []
    with torch.no_grad():
        for i in range(0, len(prompts), batch_size):
            batch_prompts = prompts[i:i+batch_size]
            batch_encodings = tokenizer(batch_prompts, return_tensors="pt", truncation=True, padding=True, max_length=max_length).to(model.device)

            pred_ids = model.generate(**batch_encodings, max_new_tokens=max_new_tokens)
            outputs += tokenizer.batch_decode(pred_ids, skip_special_tokens=True)

    # The generated JSON is everything after the <|output|> marker
    return [output.split("<|output|>")[1] for output in outputs]
model_name = "numind/NuExtract-tiny-v1.5"
device = "cuda"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, trust_remote_code=True).to(device).eval()
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
text = """We introduce Mistral 7B, a 7–billion-parameter language model engineered forsuperior performance and efficiency. Mistral 7B outperforms the best open 13Bmodel (Llama 2) across all evaluated benchmarks, and the best released 34Bmodel (Llama 1) in reasoning, mathematics, and code generation. Our modelleverages grouped-query attention (GQA) for faster inference, coupled with slidingwindow attention (SWA) to effectively handle sequences of arbitrary length with areduced inference cost. We also provide a model fine-tuned to follow instructions,Mistral 7B – Instruct, that surpasses Llama 2 13B – chat model both on human andautomated benchmarks. Our models are released under the Apache 2.0 license.Code: <https://github.com/mistralai/mistral-src>Webpage: <https://mistral.ai/news/announcing-mistral-7b/>"""
template = """{ "Model": { "Name": "", "Number of parameters": "", "Number of max token": "", "Architecture": [] }, "Usage": { "Use case": [], "Licence": "" }}"""
prediction = predict_NuExtract(model, tokenizer, [text], template)[0]
print(prediction)
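Since the model prioritizes pure extraction, the output mirrors the template and copies spans from the input text, leaving fields it cannot fill empty. An illustrative (not guaranteed) result for this example:

{
    "Model": {
        "Name": "Mistral 7B",
        "Number of parameters": "7-billion",
        "Number of max token": "",
        "Architecture": [
            "grouped-query attention (GQA)",
            "sliding window attention (SWA)"
        ]
    },
    "Usage": {
        "Use case": [],
        "Licence": "Apache 2.0"
    }
}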
Sliding window prompting: for documents longer than the model's effective context, split the text into overlapping chunks and carry the extraction from each chunk forward as the "Current" state for the next one.
import json

MAX_INPUT_SIZE = 20_000
MAX_NEW_TOKENS = 6_000

def clean_json_text(text):
    # Strip whitespace and undo escaping the model sometimes emits
    text = text.strip()
    text = text.replace("\\#", "#").replace("\\&", "&")
    return text

def predict_chunk(text, template, current, model, tokenizer):
    current = clean_json_text(current)

    # "### Current:" carries the extraction state from previous chunks;
    # the trailing "{" forces the model to continue a JSON object
    input_llm = f"<|input|>\n### Template:\n{template}\n### Current:\n{current}\n### Text:\n{text}\n\n<|output|>" + "{"
    input_ids = tokenizer(input_llm, return_tensors="pt", truncation=True, max_length=MAX_INPUT_SIZE).to("cuda")
    output = tokenizer.decode(model.generate(**input_ids, max_new_tokens=MAX_NEW_TOKENS)[0], skip_special_tokens=True)

    return clean_json_text(output.split("<|output|>")[1])
def split_document(document, window_size, overlap):
    tokens = tokenizer.tokenize(document)
    print(f"\tLength of document: {len(tokens)} tokens")

    chunks = []
    if len(tokens) > window_size:
        # Step through the document in strides of (window_size - overlap)
        for i in range(0, len(tokens), window_size - overlap):
            print(f"\t{i} to {i + len(tokens[i:i + window_size])}")
            chunk = tokenizer.convert_tokens_to_string(tokens[i:i + window_size])
            chunks.append(chunk)

            if i + len(tokens[i:i + window_size]) >= len(tokens):
                break
    else:
        chunks.append(document)
    print(f"\tSplit into {len(chunks)} chunks")

    return chunks
def handle_broken_output(pred, prev):
    try:
        if all([(v in ["", []]) for v in json.loads(pred).values()]):
            # if empty json, return previous
            pred = prev
    except:
        # if broken json, return previous
        pred = prev

    return pred
def sliding_window_prediction(text, template, model, tokenizer, window_size=4000, overlap=128):
    # split text into overlapping chunks of window_size tokens
    chunks = split_document(text, window_size, overlap)

    # iterate over text chunks, threading the prediction through as state
    prev = template
    for i, chunk in enumerate(chunks):
        print(f"Processing chunk {i}...")
        pred = predict_chunk(chunk, template, prev, model, tokenizer)

        # fall back to the previous state if the output is empty or broken
        pred = handle_broken_output(pred, prev)

        # iterate
        prev = pred

    return pred
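Putting it together on a long document (the file name here is hypothetical; model, tokenizer, and template are the ones defined above):

# Hypothetical long input; any document beyond ~4,000 tokens benefits from chunking
long_text = open("long_report.txt", encoding="utf-8").read()

prediction = sliding_window_prediction(long_text, template, model, tokenizer, window_size=4000, overlap=128)
print(prediction)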