This model is fine-tuned from `google-bert/bert-base-uncased` for token classification, specifically Named Entity Recognition (NER) in the electrical engineering domain. It has been optimized to extract entities such as components, materials, standards, and design parameters from technical texts with high precision and recall.
## Training Data

The model was trained on the `disham993/ElectricalNER` dataset, a GPT-4o-mini-generated dataset curated for the electrical engineering domain. It covers diverse technical contexts, such as circuit design, testing, maintenance, installation, troubleshooting, and research.
The model was fine-tuned using the following hyperparameters:

- **Evaluation strategy:** epoch
- **Learning rate:** 1e-5
- **Batch size:** 64 (for both training and evaluation)
- **Number of epochs:** 5
- **Weight decay:** 0.01
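Expressed as a `transformers` `TrainingArguments` configuration, these settings map roughly onto the following sketch. The argument names follow the standard `transformers` API, but `output_dir` and anything not listed on the card (optimizer, scheduler, seed) are assumptions, not details from the original training run.

```python
from transformers import TrainingArguments

# Sketch of the fine-tuning configuration reported on this card.
# Only the listed hyperparameters are taken from the card; the rest
# fall back to transformers defaults.
training_args = TrainingArguments(
    output_dir="electrical-ner-bert-base",  # assumed output path
    evaluation_strategy="epoch",
    learning_rate=1e-5,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    num_train_epochs=5,
    weight_decay=0.01,
)
```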
## Evaluation Results

The following metrics were achieved during evaluation:

- **Precision:** 0.9193
- **Recall:** 0.9303
- **F1 score:** 0.9247
- **Accuracy:** 0.9660
- **Evaluation runtime:** 2.2917 s
- **Samples per second:** 658.454
- **Steps per second:** 10.472
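The reported numbers are internally consistent, which is easy to verify: F1 is the harmonic mean of precision and recall, and runtime times throughput recovers the approximate evaluation set size. (The tiny F1 discrepancy in the last digit is expected, since the card's precision and recall are themselves rounded.)

```python
# Consistency check on the reported evaluation metrics.
precision = 0.9193
recall = 0.9303

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # 0.9248, matching the reported 0.9247 up to rounding

# Runtime * throughput gives the approximate number of evaluation samples.
eval_samples = 2.2917 * 658.454
print(round(eval_samples))  # 1509
```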
## Usage

You can use this model for Named Entity Recognition as follows:
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

model_name = "disham993/electrical-ner-bert-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

text = "The Xilinx Vivado development suite was used to program the Artix-7 FPGA."
ner_results = nlp(text)


def clean_and_group_entities(ner_results, min_score=0.40):
    """Clean and group named entity recognition (NER) results based on a
    minimum score threshold.

    Args:
        ner_results (list of dict): NER results, each with the keys
            "word" (str), "entity_group" (str), "start" (int), "end" (int),
            and "score" (float).
        min_score (float, optional): Minimum score threshold for keeping
            an entity. Defaults to 0.40.

    Returns:
        list of dict: Grouped entities that meet the minimum score
        threshold, each with "entity_group", "word", "start", "end", and
        "score" (the minimum confidence score of the grouped entity).
    """
    grouped_entities = []
    current_entity = None

    for result in ner_results:
        # Skip entities with score below threshold
        if result["score"] < min_score:
            if current_entity:
                # Add current entity if it meets threshold
                if current_entity["score"] >= min_score:
                    grouped_entities.append(current_entity)
                current_entity = None
            continue

        word = result["word"].replace("##", "")  # Remove subword token markers

        if (current_entity
                and result["entity_group"] == current_entity["entity_group"]
                and result["start"] == current_entity["end"]):
            # Continue the current entity
            current_entity["word"] += word
            current_entity["end"] = result["end"]
            current_entity["score"] = min(current_entity["score"], result["score"])

            # If the combined score drops below threshold, discard the entity
            if current_entity["score"] < min_score:
                current_entity = None
        else:
            # Finalize the current entity if it meets threshold
            if current_entity and current_entity["score"] >= min_score:
                grouped_entities.append(current_entity)
            # Start a new entity
            current_entity = {
                "entity_group": result["entity_group"],
                "word": word,
                "start": result["start"],
                "end": result["end"],
                "score": result["score"],
            }

    # Add the last entity if it meets threshold
    if current_entity and current_entity["score"] >= min_score:
        grouped_entities.append(current_entity)

    return grouped_entities


cleaned_results = clean_and_group_entities(ner_results)
```
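The cleaned results can then be iterated like any list of dicts. A minimal sketch of printing them, using hypothetical output values for the example sentence (the entity labels and scores shown here are illustrative assumptions; the actual label set comes from the model's configuration):

```python
# Hypothetical cleaned output for the example sentence; labels and scores
# are illustrative only, not actual model predictions.
cleaned_results = [
    {"entity_group": "COMPONENT", "word": "Artix-7", "start": 60, "end": 67, "score": 0.97},
]

# Print each entity with its label and confidence score.
for entity in cleaned_results:
    print(f"{entity['word']:<15} {entity['entity_group']:<12} score={entity['score']:.2f}")
```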
## Limitations and Bias

While this model performs well in the electrical engineering domain, it is not designed for use in other domains. Additionally, it may:

- Misclassify entities due to potential inaccuracies in the GPT-4o-mini-generated dataset.
- Struggle with ambiguous contexts or low-confidence predictions; this is mitigated by the `clean_and_group_entities` helper function.

This model is intended for research and educational purposes only, and users are encouraged to validate results before applying them to critical applications.
## Training Infrastructure

For a complete guide covering the entire process, from data tokenization to pushing the model to the Hugging Face Hub, please refer to the GitHub repository.
## Last Update

2024-12-31