This repository contains the official implementation of the paper "Named entity recognition in Turkish: A comparative study with detailed error analysis". Additionally, detailed evaluation results supported by statistical tests are provided.
This study provides a comparative analysis of the performance of state-of-the-art approaches for Turkish named entity recognition, using existing datasets from varying domains. The study includes a detailed error analysis that examines both quantitative (entity types, varying entity lengths, and changing word orders) and qualitative (ambiguous entities and noisy texts) factors that can affect model performance.
- Python 3.8.11
- PyTorch 1.11.0
- TensorFlow 2.6.0
To install the environment using Conda:
$ conda env create -f requirements.yml
This command creates a Conda environment named ner_tr. The environment includes all packages needed to train the models in the study. After installing the environment, activate it using the command below:
$ conda activate ner_tr
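After activating the environment, it can be useful to confirm that the listed framework versions were actually installed. The snippet below is a minimal sketch and not part of the repository: the helper names `installed_version` and `check_environment` are hypothetical, and the distribution names `torch` and `tensorflow` are assumed to be the ones registered by the Conda packages.

```python
import sys
from importlib import metadata

# Versions taken from the requirements list above.
# The distribution names are an assumption, not confirmed by the repository.
EXPECTED = {"torch": "1.11.0", "tensorflow": "2.6.0"}


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_environment(expected=EXPECTED):
    """Map each distribution name to a (expected, found) pair of version strings."""
    return {name: (want, installed_version(name)) for name, want in expected.items()}


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, (want, found) in check_environment().items():
        # startswith tolerates local build suffixes such as "+cu113"
        ok = found is not None and found.startswith(want)
        print(f"{name}: expected {want}, found {found} ({'ok' if ok else 'mismatch'})")
```

A mismatch does not necessarily mean the code will fail; it only signals that the environment differs from the one listed above.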
To train the models in this study, run main.py in the general form below, with [R_MODE] set to 'train'.
$ python main.py [R_MODE] [D_PATH] [M_PATH] [M_NAME] -r
| Parameter Name | Type | Definition |
|---|---|---|
| [R_MODE] | str | Run mode: 'train' or 'test' |
| [D_PATH] | str | Path of the data folder containing train.tsv and test.tsv files |
| [M_PATH] | str | Path for the model (save model when R_MODE='train', load when R_MODE='test') |
| [M_NAME] | str | The name of the model (berturk_crf, bilstm, etc.) |
| -r | str | Path for the evaluation report (use only in test mode) |
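To make the argument layout concrete, the sketch below shows one way a command line with these parameters could be parsed using `argparse`. It is an illustration under assumptions, not the actual contents of main.py: the function names `build_parser` and `parse_cli`, the attribute names (`r_mode`, `report_path`, and so on), and the strict rejection of `-r` in train mode are all hypothetical.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented parameters (hypothetical layout)."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Train or test a Turkish NER model (illustrative sketch).",
    )
    parser.add_argument("r_mode", choices=["train", "test"], help="Run mode")
    parser.add_argument("d_path", help="Data folder with train.tsv and test.tsv")
    parser.add_argument("m_path", help="Where the model is saved (train) or loaded (test)")
    parser.add_argument("m_name", help="Model name, e.g. berturk_crf or bilstm")
    parser.add_argument("-r", dest="report_path", default=None,
                        help="Path for the evaluation report (test mode only)")
    return parser


def parse_cli(argv):
    """Parse argv and reject -r in train mode (this strictness is an assumption)."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.r_mode == "train" and args.report_path is not None:
        parser.error("-r is only used in test mode")
    return args


if __name__ == "__main__":
    # Example argument list matching the documented test invocation
    args = parse_cli(["test", "/src/data/atisner/", "/models/berturk_crf/",
                      "berturk_crf", "-r", "/results/berturk_crf/"])
    print(args.r_mode, args.m_name, args.report_path)
```

Declaring `r_mode` with `choices` means a mistyped mode is rejected before any data or model is loaded.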
An example command to train the BERTurk-CRF model is below.
$ python main.py train '/src/data/atisner/' '/models/berturk_crf/' berturk_crf
To test the fine-tuned models, run the same script in test mode and pass a report path with -r.
An example command to test the BERTurk-CRF model is below.
$ python main.py test '/src/data/atisner/' '/models/berturk_crf/' berturk_crf -r '/results/berturk_crf/'
If you make use of this code, please cite the following paper:
@article{OZCELIK2022103065,
title = {Named entity recognition in Turkish: A comparative study with detailed error analysis},
journal = {Information Processing & Management},
volume = {59},
number = {6},
pages = {103065},
year = {2022},
issn = {0306-4573},
doi = {https://doi.org/10.1016/j.ipm.2022.103065},
url = {https://www.sciencedirect.com/science/article/pii/S0306457322001674},
author = {Oguzhan Ozcelik and Cagri Toraman},
keywords = {Comparative analysis, Error analysis, Named entity recognition, Deep learning model, Turkish text, Transformer-based language model}
}