Spanish Seed Words

The corpus is a curated dataset of Spanish words categorized by grammatical gender, intended for use in bias analysis and related fields. The words are organized into feminine and masculine categories, each containing multiple subcategories of lexical items.

Data files are in .csv format, located in the data/ directory, and are separated into subcategories as follows:

Dataset Structure

Feminine Words (data/Feminine/)

Masculine Words (data/Masculine/)

Notes

Each .csv file lists the words that fall into its category, one word per line. This structure provides a foundation for exploring linguistic patterns and gender-specific word usage, or for adapting models for inclusive language processing.
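
As a minimal sketch of how the files might be read, the Python snippet below walks data/Feminine/ and data/Masculine/ and collects each file's words under its file name. The function name load_seed_words is hypothetical and not part of this repository, and the code assumes UTF-8 files with one word per line and no header row.

```python
from pathlib import Path


def load_seed_words(data_dir="data"):
    """Return {gender: {subcategory: [words]}} read from <data_dir>/<Gender>/*.csv.

    Assumption: each file is UTF-8 text with one word per line and no header.
    """
    corpus = {}
    for gender in ("Feminine", "Masculine"):
        gender_dir = Path(data_dir) / gender
        corpus[gender] = {}
        # sorted() keeps the subcategory order stable across platforms.
        for csv_path in sorted(gender_dir.glob("*.csv")):
            # utf-8-sig drops a byte-order mark if one is present.
            text = csv_path.read_text(encoding="utf-8-sig")
            # Keep one entry per non-empty line, trimming stray whitespace.
            words = [line.strip() for line in text.splitlines() if line.strip()]
            # The file name without ".csv" serves as the subcategory key.
            corpus[gender][csv_path.stem] = words
    return corpus
```

Calling load_seed_words("data") from the repository root would then give a nested dictionary keyed first by gender and then by subcategory, which is convenient for counting entries or sampling seed words per category.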

Applications

This corpus was created to support research in gender-aware language modeling and is a resource for linguists, developers, and researchers striving for inclusivity and precision in Spanish-language computational applications.

License

Spanish Seed Words Corpus is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.