SSA (Social Stereotype Bias Analysis) is a counterfactual explanation approach for assessing social bias in hate speech classifiers using stereotypes and counter-stereotypes. SSA evaluates the potential of hate speech classifiers to reflect social stereotypes by contrasting stereotypical beliefs with counter-stereotypes. We empirically measure the distribution of stereotypical beliefs in hate speech classifiers by analyzing how differently they classify tuples containing stereotypes versus counter-stereotypes. Experimental results show that hate speech classifiers attribute unreal or negligent offensiveness to social group identifiers (e.g., women, gay) by reflecting and reinforcing stereotypical beliefs about minorities.
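The contrast between stereotype and counter-stereotype tuples can be illustrated with a minimal sketch. This is not the authors' implementation: `toy_classifier`, `divergence_rate`, the term lists, and the example pairs are all hypothetical and invented here for illustration only. The sketch counts how often a binary classifier assigns different labels to the two members of a pair, which is one simple way to surface the kind of distinctive classification described above.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical term lists, invented for this sketch (not from the paper).
IDENTITY_TERMS = {"women", "gay"}
TRAIT_TERMS = {"sensitive"}


def toy_classifier(text: str) -> int:
    """Stand-in for a real hate speech classifier (assumption).

    Returns 1 ("offensive") only when an identity term co-occurs with a
    stereotypical trait word, mimicking a biased model.
    """
    tokens = set(text.lower().split())
    return int(bool(tokens & IDENTITY_TERMS) and bool(tokens & TRAIT_TERMS))


def divergence_rate(
    classify: Callable[[str], int],
    pairs: Iterable[Tuple[str, str]],
) -> float:
    """Fraction of (stereotype, counter-stereotype) pairs labeled differently."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("pairs must be non-empty")
    differing = sum(1 for s, c in pairs if classify(s) != classify(c))
    return differing / len(pairs)


# Invented example pairs: (stereotype, counter-stereotype).
PAIRS = [
    ("women are sensitive", "women are rational"),
    ("gay men are sensitive", "gay men are tough"),
    ("engineers are sensitive", "engineers are rational"),
]

if __name__ == "__main__":
    rate = divergence_rate(toy_classifier, PAIRS)
    print(f"Pairs labeled differently: {rate:.2f}")
```

In practice, `toy_classifier` would be replaced by the prediction function of the classifier under study, and the pairs by a curated set of stereotype and counter-stereotype tuples.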
Vargas, F., Carvalho, I., Hürriyetoğlu, A., Pardo, T.A.S., Benevenuto, F. (2023). Socially Responsible Hate Speech Detection: Can Classifiers Reflect Social Stereotypes? In Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing (RANLP 2023), pp. 1187-1196. Varna, Bulgaria. https://aclanthology.org/2023.ranlp-1.126.
@inproceedings{vargas-etal-2023-socially,
    title = "Socially Responsible Hate Speech Detection: Can Classifiers Reflect Social Stereotypes?",
    author = {Vargas, Francielle and Carvalho, Isabelle and H{\"u}rriyeto{\u{g}}lu, Ali and Pardo, Thiago and Benevenuto, Fabr{\'\i}cio},
    editor = "Mitkov, Ruslan and Angelova, Galia",
    booktitle = "Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing",
    year = "2023",
    address = "Varna, Bulgaria",
    publisher = "INCOMA Ltd., Shoumen, Bulgaria",
    url = "https://aclanthology.org/2023.ranlp-1.126",
    pages = "1187--1196",
}