HASOC 2021

Offensive and problematic content, including insulting, hurtful, derogatory, or obscene user contributions, is pervasive on social media. Societies need to develop adequate response mechanisms that balance freedom of expression against the ability to live without oppressive remarks. Any such response requires robust technology for identifying problematic content automatically. HASOC provides a forum for developing and testing text classification systems for various languages.

Sub-task A: Identifying Hate, offensive and profane content

This task focuses on hate speech and offensive language identification, offered for English, German, and Hindi. Sub-task A is a coarse-grained binary classification in which participating systems are required to classify tweets into two classes: Hate and Offensive (HOF) and Non Hate-Offensive (NOT).

  • NOT : Non Hate-Offensive - This post does not contain any hate speech, profane, or offensive content.
  • HOF : Hate and Offensive - This post contains hate, offensive, or profane content.

| Model | Accuracy |
| --- | --- |
| Gaussian NM | 50% |
| Logistic Regression | 80% |
| KNN | 78% |
| SVC | 84% |
| Random Forest | 82% |
| LSTM | 78% |
| BERT | 78% |

Sub-task B: Discrimination between Hate, profane and offensive posts

This sub-task is a fine-grained classification, offered for English, German, and Hindi. Hate speech and offensive posts from Sub-task A are further classified into three categories:

  • HATE : Hate speech - Posts under this class contain hate speech content.
  • OFFN : Offensive - Posts under this class contain offensive content.
  • PRFN : Profane - Posts under this class contain profane words.

| Model | Accuracy |
| --- | --- |
| Gaussian NM | 45% |
| KNN | 64% |
| SVC | 66% |
| Decision Tree | 53% |
| Random Forest | 69% |
| LSTM | 60% |
| BERT | 61% |
| Proposed Model | 63% |
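
The relationship between the two label sets can be sketched in a few lines of Python. This is only an illustration of the label hierarchy described for Sub-task B, not code from this repository. The function name `to_coarse` is hypothetical, and using `NOT` as the label for posts outside the three fine-grained classes is an assumption (the dataset may name that class differently).

```python
# Fine-grained Sub-task B labels; each one counts as HOF in Sub-task A.
FINE_LABELS = ("HATE", "OFFN", "PRFN")


def to_coarse(label: str) -> str:
    """Collapse a Sub-task B label into its Sub-task A label.

    Assumption: posts outside the three fine-grained classes carry
    the label "NOT"; the real dataset may use a different name.
    """
    if label in FINE_LABELS:
        return "HOF"
    if label == "NOT":
        return "NOT"
    raise ValueError(f"unknown label: {label!r}")


print([to_coarse(label) for label in ["HATE", "PRFN", "NOT"]])
# prints ['HOF', 'HOF', 'NOT']
```

A mapping like this makes it possible to score a Sub-task B system on Sub-task A as well, since every fine-grained prediction implies a coarse one.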

Proposed Model:

| Model | Accuracy |
| --- | --- |
| BERT_A | 78% |
| BERT_OFFN/HATE | 79% (2 epochs) |
| Profanity | 92% |
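
The README does not say how these three components are combined. One plausible reading is a cascade: a HOF/NOT classifier first, then a profanity check, then a HATE-versus-OFFN classifier. The sketch below illustrates that reading only. The function `classify_fine`, the ordering of the stages, and the toy keyword detectors (which stand in for the trained models and match placeholder tokens rather than real words) are all assumptions made for illustration.

```python
from typing import Callable

# Hypothetical component type: takes a post and returns a yes/no decision.
BinaryClassifier = Callable[[str], bool]


def classify_fine(
    text: str,
    is_hof: BinaryClassifier,
    is_profane: BinaryClassifier,
    is_hate: BinaryClassifier,
) -> str:
    """Assumed cascade: HOF/NOT, then profanity, then HATE vs. OFFN."""
    if not is_hof(text):
        return "NOT"
    if is_profane(text):
        return "PRFN"
    return "HATE" if is_hate(text) else "OFFN"


# Toy stand-ins so the sketch runs without model weights.
# They match placeholder tokens, not real vocabulary.
def toy_is_hof(text: str) -> bool:
    return any(tok in text for tok in ("<swear>", "<insult>", "<slur>"))


def toy_is_profane(text: str) -> bool:
    return "<swear>" in text


def toy_is_hate(text: str) -> bool:
    return "<slur>" in text


posts = [
    "have a nice day",
    "well <swear> that was close",
    "you are an <insult>",
    "<slur> should leave",
]
for post in posts:
    print(post, "->", classify_fine(post, toy_is_hof, toy_is_profane, toy_is_hate))
```

In a cascade of this kind, errors compound: a post that the first stage wrongly labels NOT never reaches the later stages. That would be consistent with the combined Sub-task B accuracy being lower than the accuracy of any single component, though the README does not confirm this design.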

Results

| Sub-Task | Classifier | Macro F1-score |
| --- | --- | --- |
| A | DistilBERT | 75% |
| B | DistilBERT | 57% |
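
These results are reported as macro F1 rather than accuracy, so they are not directly comparable to the accuracy figures in the earlier tables. Macro F1 is the unweighted mean of the per-class F1 scores, which penalizes a system that ignores a minority class. The pure-Python sketch below shows the computation on made-up labels; the function name `macro_f1` and the toy data are illustrative and are not taken from this repository.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data: a system that always predicts the majority class.
y_true = ["NOT"] * 8 + ["HOF"] * 2
y_pred = ["NOT"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy = {accuracy:.2f}")                   # accuracy = 0.80
print(f"macro F1 = {macro_f1(y_true, y_pred):.2f}")   # macro F1 = 0.44
```

On this toy data the always-majority system reaches 80% accuracy but only about 44% macro F1, because its F1 on the minority class is zero.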

Publication

Cite our paper

S. Saseendran, S. R, S. V, S. Giri, Classification of Hate Speech and Offensive Content 
using an approach based on DistilBERT, in: Forum for Information Retrieval Evaluation
(Working Notes) (FIRE), CEUR-WS.org, 2021.

Code Files: