swetha4444/HASOC2021

HASOC 2021

Offensive and problematic content, including insulting, hurtful, derogatory, or obscene user contributions, is pervasive on social media. Societies need to develop adequate response mechanisms that balance freedom of expression against the ability to live free from oppressive remarks. Any such response requires robust technology for identifying problematic content automatically. HASOC provides a forum for developing and testing text classification systems for various languages.

Sub-task A: Identifying Hate, offensive and profane content

This task focuses on hate speech and offensive language identification, offered for English, German, and Hindi. Sub-task A is a coarse-grained binary classification in which participating systems are required to classify tweets into two classes: Hate and Offensive (HOF) and Non Hate-Offensive (NOT).

  • NOT : Non Hate-Offensive - This post does not contain any hate speech, profane, or offensive content.
  • HOF : Hate and Offensive - This post contains hate, offensive, or profane content.

| Model | Accuracy |
|---|---|
| Gaussian NM | 50% |
| Logistic Regression | 80% |
| KNN | 78% |
| SVC | 84% |
| Random Forest | 82% |
| LSTM | 78% |
| BERT | 78% |

Sub-task B: Discrimination between Hate, profane and offensive posts

This sub-task is a fine-grained classification offered for English, German, and Hindi. Posts labelled Hate and Offensive (HOF) in Sub-task A are further classified into three categories:

  • HATE : Hate speech - Posts in this class contain hate speech.
  • OFFN : Offensive - Posts in this class contain offensive content.
  • PRFN : Profane - Posts in this class contain profane words.

| Model | Accuracy |
|---|---|
| Gaussian NM | 45% |
| KNN | 64% |
| SVC | 66% |
| Decision Tree | 53% |
| Random Forest | 69% |
| LSTM | 60% |
| BERT | 61% |
| Proposed Model | 63% |
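
The relationship between the two label sets can be sketched in a few lines of Python. This is only an illustration of the scheme described above: the function name `coarse_label` and the placeholder label `NONE` (for posts that are NOT in Sub-task A and so have no fine-grained class) are assumptions made for this sketch, not names taken from this repository.

```python
# Fine-grained Sub-task B labels, as listed above.
FINE_LABELS = ("HATE", "OFFN", "PRFN")

# Assumption: posts labelled NOT in Sub-task A carry a placeholder
# fine-grained label. "NONE" is a hypothetical name used in this sketch.
NO_FINE_LABEL = "NONE"


def coarse_label(fine_label: str) -> str:
    """Map a Sub-task B label to the corresponding Sub-task A label."""
    if fine_label in FINE_LABELS:
        return "HOF"  # every fine-grained class is a kind of HOF post
    if fine_label == NO_FINE_LABEL:
        return "NOT"
    raise ValueError(f"unknown fine-grained label: {fine_label!r}")
```

For example, `coarse_label("PRFN")` returns `"HOF"`, since a profane post belongs to the Hate and Offensive class in Sub-task A.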

Proposed Model:

| Model | Accuracy |
|---|---|
| BERT_A | 78% |
| BERT_OFFN/HATE | 79% (2 epochs) |
| Profanity | 92% |
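
The three components in this table suggest a staged pipeline, but the README does not state how they are chained. The sketch below shows one plausible arrangement, under the assumption that the Sub-task A classifier runs first, a profanity check runs second, and the OFFN/HATE classifier decides the remaining posts. The function `classify_post` and its three callable parameters are hypothetical stand-ins for the trained models, not the repository's actual code.

```python
from typing import Callable


def classify_post(
    text: str,
    is_hof: Callable[[str], bool],      # stand-in for BERT_A (HOF vs NOT)
    is_profane: Callable[[str], bool],  # stand-in for the profanity component
    is_hate: Callable[[str], bool],     # stand-in for BERT_OFFN/HATE
) -> str:
    """Return NOT, PRFN, HATE, or OFFN for a single post.

    Assumed stage order (not confirmed by the source):
    1. Sub-task A decides whether the post is HOF at all.
    2. A profanity check labels HOF posts as PRFN.
    3. The remaining HOF posts are split into HATE and OFFN.
    """
    if not is_hof(text):
        return "NOT"
    if is_profane(text):
        return "PRFN"
    return "HATE" if is_hate(text) else "OFFN"
```

Passing the stages in as callables keeps the sketch independent of any particular model library; each stage could wrap a fine-tuned transformer, a word list, or a classical classifier.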

Results

| Sub-Task | Classifier | Macro F1-score |
|---|---|---|
| A | DistilBERT | 75% |
| B | DistilBERT | 57% |
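
Macro F1 is the unweighted mean of the per-class F1 scores, so every class counts equally regardless of how many posts it contains. The following is a minimal pure-Python sketch of that standard definition; the function name `macro_f1` is chosen for illustration and this is not the evaluation script used for the results above.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

With `y_true = ["HOF", "HOF", "HOF", "NOT"]` and every post predicted as `"HOF"`, accuracy is 0.75 but macro F1 is only 3/7 (about 0.43), because the missed NOT class contributes an F1 of zero.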

Publication

Cite our paper

S. Saseendran, S. R, S. V, S. Giri, Classification of Hate Speech and Offensive Content 
using an approach based on DistilBERT, in: Forum for Information Retrieval Evaluation
(Working Notes) (FIRE), CEUR-WS.org, 2021.

Code Files:
