Skip to content

Implementation of Bangla named entity recognition using BERT, SVM and CRF

Notifications You must be signed in to change notification settings

Sadat-Shahriyar/Bangla-Named-Entity-Recognition

Repository files navigation

NLP Hackathon on Named Entity Recognition

  • banglabert token-classification submodule was used for the fine tuning task
  • SVM classifier was used as the hand-crafted ML model

Description

During the Bangladesh National NLP Hackathon, we developed a project to detect named entities in Bangla sentences using BanglaBERT, Support Vector Machine (SVM), and Conditional Random Fields (CRF). The dataset was augmented with techniques like word swapping, token replacement by label, synonym replacement, random insertion/deletion, stemming, lemmatization, and stopword removal. CRF was enhanced with features such as context words, word suffixes, named entity information, and digit features. The models achieved a macro average F1 score of 0.7938 with BanglaBERT, 0.68 with SVM, and 0.34 with CRF.

About

Implementation of Bangla named entity recognition using BERT, SVM and CRF

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published