Offensive and problematic content, including insulting, hurtful, derogatory, or obscene user contributions, is pervasive on social media. Societies need to develop adequate response mechanisms in order to strike a balance between freedom of expression on one side and the ability to live without oppressive remarks on the other. A prerequisite for any response is robust technology for identifying problematic content automatically. HASOC provides a forum for developing and testing text classification systems for various languages.
This task focuses on hate speech and offensive language identification and is offered for English, German, and Hindi. Sub-task A is a coarse-grained binary classification task in which participating systems are required to classify tweets into two classes: Hate and Offensive (HOF) and Non-Hate and Offensive (NOT).
- NOT : Non Hate-Offensive - This post does not contain any hate speech or profane/offensive content.
- HOF : Hate and Offensive - This post contains hate speech, offensive, or profane content.
Model | Accuracy |
---|---|
Gaussian NB | 50% |
Logistic Regression | 80% |
KNN | 78% |
SVC | 84% |
Random Forest | 82% |
LSTM | 78% |
BERT | 78% |
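The classical models in the table above can be approximated with a standard TF-IDF + scikit-learn pipeline. The snippet below is only a minimal sketch of that setup, not the notebook code: the file path (`english_dataset.tsv`) and the column names (`text`, `task_1`) are assumptions, and the full experiments live in `ml_techniques_A.ipynb`.

```python
# Minimal sketch of a classical baseline for Sub-task A (HOF vs. NOT).
# The path and column names below are assumptions; the real preprocessing
# and experiments are in Preprocess.ipynb / ml_techniques_A.ipynb.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

df = pd.read_csv("english_dataset.tsv", sep="\t")        # hypothetical path
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["task_1"], test_size=0.2, random_state=42, stratify=df["task_1"]
)

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=2),  # word uni- and bi-grams
    LinearSVC(),                                    # comparable to the SVC row above
)
model.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))
```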
Sub-task B is a fine-grained classification task, also offered for English, German, and Hindi. Hate speech and offensive posts from sub-task A are further classified into three categories:
- HATE : Hate speech - Posts under this class contain hate speech content.
- OFFN : Offensive - Posts under this class contain offensive content.
- PRFN : Profane - These posts contain profane words.
Model | Accuracy |
---|---|
Gaussian NB | 45% |
KNN | 64% |
SVC | 66% |
Decision Tree | 53% |
Random Forest | 69% |
LSTM | 60% |
BERT | 61% |
Proposed Model | 63% |
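For the LSTM row above, a compact Keras sketch of a three-class classifier is shown below. The placeholder texts, the label encoding, and the hyperparameters are assumptions for illustration only; the model actually trained is in `DL_LSTM for task B.ipynb`.

```python
# Minimal Keras LSTM sketch for Sub-task B (HATE / OFFN / PRFN).
# texts and labels are placeholders; the notebook uses the real HASOC data.
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

texts = ["example offensive tweet", "example hateful tweet", "example profane tweet"]
labels = np.array([1, 0, 2])  # 0 = HATE, 1 = OFFN, 2 = PRFN (assumed encoding)

tokenizer = Tokenizer(num_words=20000, oov_token="<unk>")
tokenizer.fit_on_texts(texts)
X = pad_sequences(tokenizer.texts_to_sequences(texts), maxlen=60)

model = Sequential([
    Embedding(input_dim=20000, output_dim=100),  # learned word embeddings
    LSTM(64),                                    # sequence encoder
    Dense(3, activation="softmax"),              # three fine-grained classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(X, labels, epochs=3, batch_size=32)
```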
Component | Accuracy |
---|---|
BERT_A | 78% |
BERT_OFFN/HATE | 79% (2 epochs) |
Profanity | 92% |
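The rows above are the building blocks of the proposed model: a binary HOF/NOT classifier (BERT_A), a classifier that separates OFFN from HATE, and a profanity check. The sketch below shows one plausible way to cascade them; this is an assumption about the composition, the actual implementation is in `ProposedModel.ipynb`, and `better_profanity` merely stands in for whichever checker the survey in `profanity_check.ipynb` selected.

```python
# Hedged sketch of a cascade built from the three components above.
# predict_hof() and predict_offn_hate() are hypothetical wrappers around the
# fine-tuned DistilBERT models; better_profanity stands in for the profanity check.
from better_profanity import profanity

profanity.load_censor_words()

def classify(tweet: str, predict_hof, predict_offn_hate) -> str:
    """Return one of NOT, HATE, OFFN, PRFN for a single tweet."""
    if predict_hof(tweet) == "NOT":          # Sub-task A: HOF vs. NOT
        return "NOT"
    if profanity.contains_profanity(tweet):  # profane wording -> PRFN
        return "PRFN"
    return predict_offn_hate(tweet)          # remaining HOF posts: HATE vs. OFFN
```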
Sub-Task | Classifier | Macro F1-score |
---|---|---|
A | DistilBERT | 75% |
B | DistilBERT | 57% |
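The submitted runs for both sub-tasks used DistilBERT. The snippet below is a minimal Hugging Face fine-tuning sketch for Sub-task A under assumed settings (checkpoint name, label encoding, placeholder data); the configuration that produced the scores above is in `BERT_A.ipynb`.

```python
# Minimal sketch of fine-tuning DistilBERT for Sub-task A with Hugging Face.
# train_texts / train_labels are placeholders for the preprocessed HASOC data.
import torch
from torch.utils.data import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

train_texts = ["example tweet one", "example tweet two"]
train_labels = [0, 1]                         # 0 = NOT, 1 = HOF (assumed encoding)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

class HasocDataset(Dataset):
    """Wraps tokenized tweets and labels for the Trainer."""
    def __init__(self, texts, labels):
        self.enc = tokenizer(texts, truncation=True, padding=True, max_length=128)
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

args = TrainingArguments(output_dir="bert_a", num_train_epochs=2,
                         per_device_train_batch_size=16)
Trainer(model=model, args=args,
        train_dataset=HasocDataset(train_texts, train_labels)).train()
```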
S. Saseendran, S. R, S. V, S. Giri, Classification of Hate Speech and Offensive Content
using an approach based on DistilBERT, in: Forum for Information Retrieval Evaluation
(Working Notes) (FIRE), CEUR-WS.org, 2021.
- Preprocess.ipynb : Preprocessing the data (a minimal cleaning sketch is shown after this list).
- dataAnalysis.ipynb : Analysing the data.
- ml_techniques_A.ipynb : Applying ML models to classify the data and find the accuracy for Subtask A.
- ml_techniques_B.ipynb : Applying ML models to classify the data and find the accuracy for Subtask B.
- DL_LSTM for task A.ipynb : Applying DL LSTM model to classify the data and find the accuracy for Subtask A.
- DL_LSTM for task B.ipynb : Applying DL LSTM model to classify the data and find the accuracy for Subtask B.
- BERT_A.ipynb : Applying DistilBERT model to classify the data and find the accuracy for Subtask A.
- BERT_B.ipynb : Applying DistilBERT model to classify the data and find the accuracy for Subtask B.
- Proposed Model (Folder) : Files related to the Proposed Model:
  - profanity_check.ipynb : Survey of various pre-trained models/libraries for profanity check.
  - ProposedModel.ipynb : Implementation of the proposed model.
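As referenced above for `Preprocess.ipynb`, the sketch below illustrates typical tweet cleaning for this kind of task. It is only an assumed, minimal version; the exact steps used in the repository may differ.

```python
# Hedged sketch of typical tweet cleaning; the steps actually applied
# to the HASOC data are defined in Preprocess.ipynb.
import re

def clean_tweet(text: str) -> str:
    """Lowercase and strip URLs, mentions, hashtag symbols, and extra spaces."""
    text = text.lower()
    text = re.sub(r"http\S+|www\.\S+", " ", text)   # URLs
    text = re.sub(r"@\w+", " ", text)               # user mentions
    text = re.sub(r"#", " ", text)                  # keep hashtag words, drop '#'
    text = re.sub(r"[^a-z0-9\s]", " ", text)        # punctuation / emoji
    return re.sub(r"\s+", " ", text).strip()

print(clean_tweet("@user Check this out!!! https://t.co/xyz #HateSpeech"))
# -> "check this out hatespeech"
```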