GitHub - franciellevargas/SSA: SSA is a counterfactual explanation approach to assess social bias in hate speech classifiers by stereotypes and counter-stereotypes

SSA: A Counterfactual Explanation Approach to Assess Social Bias in Hate Speech Classifiers

SSA (Social Stereotype Bias Analysis) is a counterfactual explanation approach for assessing social bias in hate speech classifiers using stereotypes and counter-stereotypes. SSA evaluates the potential of hate speech classifiers to reflect social stereotypes by contrasting stereotypical beliefs with counter-stereotypes. We empirically measure the distribution of stereotypical beliefs in hate speech classifiers by analyzing how differently they classify tuples containing stereotypes versus counter-stereotypes. Experimental results show that hate speech classifiers attribute unfounded or negligent offensiveness to social group identifiers (e.g., women, gay) by reflecting and reinforcing stereotypical beliefs about minorities.
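As a rough illustration of the idea of contrasting stereotype and counter-stereotype tuples, the sketch below computes how often a classifier assigns different labels to the two sentences of a pair. It is not the code from this repository: the function `label_flip_rate`, the keyword-based `toy_classifier`, and the example pairs are all hypothetical stand-ins, and the flip rate is only one possible way to quantify the contrast.

```python
from typing import Callable, Iterable, Tuple

# Assumed representation: (stereotype sentence, counter-stereotype sentence)
Pair = Tuple[str, str]


def label_flip_rate(classify: Callable[[str], int], pairs: Iterable[Pair]) -> float:
    """Fraction of pairs whose two sentences receive different labels."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one pair")
    flips = sum(1 for stereo, counter in pairs if classify(stereo) != classify(counter))
    return flips / len(pairs)


def toy_classifier(text: str) -> int:
    """Hypothetical stand-in for a real model: returns 1 (offensive)
    if the text contains a trigger word, else 0."""
    triggers = {"emotional", "weak"}
    return int(any(word in triggers for word in text.lower().split()))


# Hypothetical example pairs sharing the same group identifier.
pairs = [
    ("women are emotional", "women are rational"),
    ("women are weak", "women are strong"),
    ("women are teachers", "women are engineers"),
]

rate = label_flip_rate(toy_classifier, pairs)
print(f"label flip rate: {rate:.2f}")
```

In practice, `toy_classifier` would be replaced by a trained hate speech classifier wrapped as a function from text to label. A high flip rate on pairs that differ only in the stereotype versus the counter-stereotype suggests the classifier's decision depends on the stereotypical belief itself.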

CITING

Vargas, F., Carvalho, I., Hürriyetoğlu, A., Pardo, T.A.S., Benevenuto, F. (2023). Socially Responsible Hate Speech Detection: Can Classifiers Reflect Social Stereotypes? In Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing (RANLP 2023), pp. 1187-1196. Varna, Bulgaria. https://aclanthology.org/2023.ranlp-1.126

BIBTEX

@inproceedings{vargas-etal-2023-socially,
    title = "Socially Responsible Hate Speech Detection: Can Classifiers Reflect Social Stereotypes?",
    author = {Vargas, Francielle and Carvalho, Isabelle and H{\"u}rriyeto{\u{g}}lu, Ali and Pardo, Thiago and Benevenuto, Fabr{\'\i}cio},
    editor = "Mitkov, Ruslan and Angelova, Galia",
    booktitle = "Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing",
    year = "2023",
    address = "Varna, Bulgaria",
    publisher = "INCOMA Ltd., Shoumen, Bulgaria",
    url = "https://aclanthology.org/2023.ranlp-1.126",
    pages = "1187--1196",
}

Folders and files

datasets
models
tuples
LICENSE
README.md
