GitHub - Sadat-Shahriyar/Bangla-Named-Entity-Recognition: Implementation of Bangla named entity recognition using BERT, SVM and CRF

NLP Hackathon on Named Entity Recognition

The token-classification submodule of BanglaBERT was used for the fine-tuning task.
An SVM classifier was used as the hand-crafted ML model.

Description

During the Bangladesh National NLP Hackathon, we built a system that detects named entities in Bangla sentences using three models: BanglaBERT, a Support Vector Machine (SVM), and Conditional Random Fields (CRF). The dataset was augmented and preprocessed with word swapping, token replacement by label, synonym replacement, random insertion and deletion, stemming, lemmatization, and stopword removal. The CRF model used features such as context words, word suffixes, named entity information, and digit features. The models achieved macro-averaged F1 scores of 0.7938 (BanglaBERT), 0.68 (SVM), and 0.34 (CRF).
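The CRF feature types mentioned above (context words, word suffixes, digit features) can be sketched as a per-token feature dictionary, which is the input format commonly used by CRF toolkits. This is a minimal illustration and not the repository's actual code: the function names `token_features` and `sentence_features`, the feature keys, and the suffix lengths are all assumptions. The "named entity information" feature is left out because the description does not say what form it took.

```python
from typing import Dict, List


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Build a feature dict for tokens[i]. Feature names are hypothetical."""
    word = tokens[i]
    features: Dict[str, object] = {
        "bias": 1.0,
        "word": word,
        # Word suffixes, sliced by Unicode code point (not grapheme cluster).
        # Words shorter than the suffix length yield the whole word.
        "suffix2": word[-2:],
        "suffix3": word[-3:],
        # Digit features. str.isdigit() is Unicode-aware, so Bangla digits
        # such as "২০২৩" count as digits too.
        "is_digit": word.isdigit(),
        "has_digit": any(ch.isdigit() for ch in word),
    }
    # Context words: the immediate left and right neighbours, with
    # sentence-boundary flags when a neighbour does not exist.
    if i > 0:
        features["prev_word"] = tokens[i - 1]
    else:
        features["BOS"] = True
    if i < len(tokens) - 1:
        features["next_word"] = tokens[i + 1]
    else:
        features["EOS"] = True
    return features


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    """Feature dicts for every token in one sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    for feats in sentence_features(["ঢাকা", "২০২৩", "সালে"]):
        print(feats)
```

A list of such dictionaries per sentence, paired with a list of labels, is the shape that libraries like sklearn-crfsuite expect. Whether the project used that library is not stated in the description.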

Data_Preprocessing.ipynb
MLmodel.ipynb
NER.ipynb
README.md
Report.pdf
finalized_model.sav
token_classification.py
trainer.sh
