This repository automates Named Entity Recognition (NER) for electrical engineering texts using transformer-based encoder models. BERT-family models are fine-tuned on the Electrical Engineering NER dataset (ElectricalNER), and the resulting models are pushed to the Hugging Face Hub. The project enables efficient entity extraction from technical texts, streamlining tasks like document analysis, data organization, and semantic search.
- Fine-Tuning Pipeline: Implements a complete pipeline for fine-tuning models like BERT and ModernBERT.
- Model Evaluation: Includes detailed metrics like precision, recall, F1, and accuracy.
- NER Utilities: Provides tools for post-processing NER results.
- Hugging Face Integration: Pushes fine-tuned models to the Hugging Face Hub with detailed model cards.
├── data/ # Contains tokenized datasets
├── models/ # Fine-tuned models
├── logs/ # Training and evaluation logs
├── notebooks/ # Jupyter notebooks for various stages of the pipeline
│ ├── 01_data_tokenization.ipynb # Tokenizing and preparing the dataset
│ ├── 02_model_training.ipynb # Fine-tuning transformer models
│ ├── 03_evaluation.ipynb # Evaluating model performance
│ ├── 04_inference_local.ipynb # Performing inference on unseen data - local models
│ ├── 05_push_model_to_hub.ipynb # Pushing models and cards to Hugging Face Hub
│ ├── 06_inference_hf.ipynb # Performing inference on unseen data - models that have been pushed to the Hub
├── utilities/ # Helper scripts and constants
│ ├── __init__.py # Initialize utilities as a package
│ ├── constants.py # Configuration and constants
│ ├── helper.py # Utility functions for NER and Hugging Face integration
├── README.md # Project documentation (this file)
- Python 3.10+
- Create a conda environment:
  - `conda create -n electrical_ner python=3.11`
  - `conda activate electrical_ner`
- Install the required libraries:
  - `pip install -r requirements.txt`
- Hugging Face Token:
  - Create an account on Hugging Face.
  - Generate a personal access token from your account settings.
  - Add the Hugging Face token to a `.env` file:
    - `HF_TOKEN=your_hugging_face_token`
  - Ensure the repository is structured as described above.
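At runtime the token is read from the `.env` file. Conceptually, a `.env` loader does something like the following minimal sketch (a hypothetical helper for illustration; in practice you would likely use the `python-dotenv` package's `load_dotenv`):

```python
import os

def load_env(path=".env"):
    """Minimal .env loader: export KEY=VALUE pairs into the process environment.

    Hypothetical stand-in for python-dotenv's load_dotenv; skips blank lines
    and comments, and never overwrites variables that are already set.
    """
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip())
```

Once loaded, the token is available as `os.environ["HF_TOKEN"]` for the Hub upload step.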
- Dataset Preparation: Run `01_data_tokenization.ipynb` to tokenize and prepare the dataset.
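Tokenization for NER typically has to align word-level labels with sub-word tokens so that only the first sub-token of each word carries a label. A minimal sketch of that alignment, built on the `word_ids()` mapping that Hugging Face fast tokenizers return (helper name and exact behavior are assumptions; the notebook's implementation may differ):

```python
def align_labels_with_tokens(word_ids, word_labels):
    """Map word-level NER labels onto sub-word tokens.

    word_ids: for each sub-token, the index of its source word
              (None for special tokens such as [CLS]/[SEP]).
    word_labels: one integer label per original word.
    """
    aligned, previous = [], None
    for word_id in word_ids:
        if word_id is None:
            aligned.append(-100)                  # -100 is ignored by the loss
        elif word_id != previous:
            aligned.append(word_labels[word_id])  # label the first sub-token
        else:
            aligned.append(-100)                  # mask continuation sub-tokens
        previous = word_id
    return aligned
```

For example, a two-word sentence whose second word splits into two sub-tokens keeps one label per word and masks everything else with `-100`.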
- Model Training: Execute `02_model_training.ipynb` to fine-tune models on the electrical NER dataset.
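The fine-tuning step follows the standard Hugging Face `Trainer` recipe for token classification. A configuration sketch of that setup (the checkpoint, paths, label count, and hyperparameters here are illustrative assumptions, not the notebook's exact settings):

```python
from datasets import load_from_disk
from transformers import (AutoModelForTokenClassification, AutoTokenizer,
                          DataCollatorForTokenClassification, Trainer,
                          TrainingArguments)

model_name = "bert-base-uncased"          # any BERT-family checkpoint
num_labels = 9                            # size of the BIO label set (illustrative)
tokenized_datasets = load_from_disk("data")  # path from step 01 (assumption)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name, num_labels=num_labels)

args = TrainingArguments(
    output_dir="models/electrical-ner-bert-base",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    data_collator=DataCollatorForTokenClassification(tokenizer),
)
trainer.train()
```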
- Model Evaluation: Use `03_evaluation.ipynb` to evaluate model performance.
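Entity-level precision, recall, and F1 (as computed by libraries such as `seqeval`) compare predicted entity spans rather than individual tags: a prediction counts only if both the span boundaries and the entity type match. A simplified self-contained sketch of the idea (the notebook most likely relies on an existing metrics library rather than hand-rolled code):

```python
def bio_spans(tags):
    """Extract (entity_type, start, end) spans from a BIO tag sequence."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(tags + ["O"]):   # sentinel "O" closes a trailing span
        inside = start is not None and tag.startswith("I-") and tag[2:] == etype
        if start is not None and not inside:
            spans.append((etype, start, i))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans

def entity_f1(true_tags, pred_tags):
    """Entity-level F1: a span is correct only if type and boundaries match."""
    true_spans, pred_spans = set(bio_spans(true_tags)), set(bio_spans(pred_tags))
    tp = len(true_spans & pred_spans)
    precision = tp / len(pred_spans) if pred_spans else 0.0
    recall = tp / len(true_spans) if true_spans else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```

This strict span matching is why entity-level F1 is usually lower than token-level accuracy on the same predictions.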
- Model Inference - Local Models: Use `04_inference_local.ipynb` to test the fine-tuned, locally saved models on unseen data.
- Model Upload to Hugging Face Hub: Use `05_push_model_to_hub.ipynb` to push models and model cards to the Hugging Face Hub.
- Model Inference - Hugging Face Models: Use `06_inference_hf.ipynb` to test the fine-tuned models that have been pushed to the Hub on unseen data.
Evaluation metric plots and a final metrics comparison are provided as figures. Refer to the Medium article for an in-depth analysis of these results.
After deploying a model to the Hugging Face Hub, use the following code snippet for inference:

    from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline
    from utilities import clean_and_group_entities

    model_name = "disham993/electrical-ner-ModernBERT-large"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForTokenClassification.from_pretrained(model_name)

    nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

    text = "The Xilinx Vivado development suite was used to program the Artix-7 FPGA."
    ner_results = nlp(text)
    cleaned_results = clean_and_group_entities(ner_results)
    print(cleaned_results)
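`clean_and_group_entities` comes from this repository's `utilities` package (`helper.py`). As an illustration only, a post-processing helper of this kind might filter low-confidence predictions and merge adjacent same-type entities from the pipeline's `aggregation_strategy="simple"` output; the sketch below is a hypothetical re-implementation, not the repo's actual code:

```python
def clean_and_group_entities(ner_results, min_score=0.5):
    """Illustrative sketch (the repo's utilities.helper version may differ):
    drop entities below min_score and merge adjacent entities of the same type.

    Each input item follows the HF pipeline's aggregated format:
    {"entity_group", "score", "word", "start", "end"}.
    """
    grouped = []
    for entity in ner_results:
        if entity["score"] < min_score:
            continue                       # discard low-confidence predictions
        previous = grouped[-1] if grouped else None
        if (previous is not None
                and previous["entity_group"] == entity["entity_group"]
                and previous["end"] == entity["start"]):
            # Contiguous spans of the same type: merge into one entity.
            previous["word"] += entity["word"]
            previous["end"] = entity["end"]
            previous["score"] = min(previous["score"], entity["score"])
        else:
            grouped.append(dict(entity))
    return grouped
```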
The following models are fine-tuned and available on the Hugging Face Hub:
| Model | Repository | Description |
| --- | --- | --- |
| BERT Base | Link | Lightweight model for NER. |
| BERT Large | Link | High-accuracy model for NER. |
| DistilBERT Base | Link | Efficient model for quick tasks. |
| ModernBERT Base | Link | Advanced base model. |
| ModernBERT Large | Link | High-performance NER model. |
Contributions are welcome! To contribute:
- Fork the repository.
- Create a new branch (`git checkout -b feature-name`).
- Commit your changes (`git commit -m 'Add feature'`).
- Push to the branch (`git push origin feature-name`).
- Open a pull request.
This project is licensed under the MIT License. See the LICENSE file for details.
For questions or suggestions, feel free to reach out:
- Name: Isham Rashik
- Email: d.isham.993@gmail.com
- Hugging Face Profile: disham993
Let’s revolutionize electrical engineering with state-of-the-art NLP! ⚡