Claude is a bot framework to automatically add IDs from dictionaries to French lexemes on Wikidata.
Features:
- Crawls dictionaries, following redirects when relevant, and caching results.
- Matches lexemes using lemma, lexical category, and grammatical gender.
- Tracks the IDs it has added, so that an ID later removed from Wikidata is not added again.
- Generates reports for unmatched lexemes (example).
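The matching and tracking features above can be pictured with a minimal sketch. It is not the project's actual code: the `Entry` class, the `match_entries` function, the identifiers, and the rule that only unambiguous candidates are accepted are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    """A lexeme or dictionary entry reduced to the three matching criteria."""
    lemma: str
    category: str            # e.g. "noun", "verb"
    gender: Optional[str]    # e.g. "masculine", "feminine", or None


def match_entries(lexemes, dictionary, already_added):
    """Pair lexeme IDs with dictionary IDs sharing lemma, category and gender.

    lexemes: dict mapping lexeme ID -> Entry
    dictionary: dict mapping dictionary ID -> Entry
    already_added: set of (lexeme ID, dictionary ID) pairs added in a past run
    Returns (matches, unmatched): new pairs to add, and lexeme IDs without
    exactly one candidate (hypothetical rule, to feed a report).
    """
    # Index dictionary entries by their matching key (the frozen Entry itself).
    index = {}
    for dict_id, entry in dictionary.items():
        index.setdefault(entry, []).append(dict_id)

    matches, unmatched = [], []
    for lexeme_id, entry in lexemes.items():
        candidates = index.get(entry, [])
        if len(candidates) != 1:
            # No candidate, or several: leave it for the report.
            unmatched.append(lexeme_id)
            continue
        pair = (lexeme_id, candidates[0])
        if pair in already_added:
            # Added before (and possibly removed since): do not add it again.
            continue
        matches.append(pair)
    return matches, unmatched


# Made-up identifiers, for illustration only.
lexemes = {
    "L1": Entry("chat", "noun", "masculine"),
    "L2": Entry("tour", "noun", "feminine"),
    "L3": Entry("zzz", "noun", None),
}
dictionary = {
    "chat": Entry("chat", "noun", "masculine"),
    "tour-1": Entry("tour", "noun", "feminine"),
    "tour-2": Entry("tour", "noun", "masculine"),
}
matches, unmatched = match_entries(lexemes, dictionary, {("L2", "tour-1")})
print(matches, unmatched)
```

Using grammatical gender in the key is what separates the two "tour" entries here; the pair recorded in `already_added` is skipped even though it still matches.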
Dictionaries:
- Le Robert (file lerobert.py)
- Littré (file littre.py)
- TLFi (file tlfi.py, work in progress)
Requirements:
- Python 3
- MySQL 5.7+
Install Python. Example on a Debian-like system:
apt install php python3 python3-pip
Download the project:
git clone "https://github.com/envlh/claude.git"
Install the Python requirements, from the root of the project:
pip3 install -r requirements.txt
You can install the MySQL server using the official repositories.
Create a new database on your MySQL server:
CREATE DATABASE `claude` DEFAULT CHARACTER SET 'utf8mb4';
Create a user (change the password):
CREATE USER 'claude'@'localhost' IDENTIFIED BY 'xxxxxxx';
Grant the user all privileges on the database:
GRANT ALL ON `claude`.* TO 'claude'@'localhost';
Grant the user the right to access files:
GRANT FILE ON *.* TO 'claude'@'localhost';
Initialize the schema with the script sql/schema.sql.
Edit the file conf/general.json.
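After editing, a quick syntax check can catch a stray comma or quote before the bot starts. The sketch below is not part of the project: `load_config` is a hypothetical helper, and it assumes the file holds a JSON object at the top level. It deliberately names no configuration keys, since those are defined by the project.

```python
import json
from pathlib import Path


def load_config(path="conf/general.json"):
    """Parse the configuration file and return it as a dict.

    Raises ValueError with the file name and position if the JSON is invalid.
    """
    text = Path(path).read_text(encoding="utf-8")
    try:
        config = json.loads(text)
    except json.JSONDecodeError as error:
        raise ValueError(
            f"{path}: invalid JSON at line {error.lineno}, column {error.colno}"
        ) from error
    # Assumption: the configuration is a JSON object, not a list or a scalar.
    if not isinstance(config, dict):
        raise ValueError(f"{path}: expected a JSON object at the top level")
    return config
```

Calling `load_config()` from the root of the project either returns the parsed settings or points at the line to fix.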
The bot uses Pywikibot. One way to log in to Wikidata is to use a bot password.
Download Pywikibot:
git clone "https://gerrit.wikimedia.org/r/pywikibot/core"
After creating your bot password, generate configuration files:
python3 pwb.py generate_user_files.py
Copy the generated files user-config.py and user-password.py to the root of the claude project.
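A small pre-flight check can confirm that both Pywikibot files landed in the right place. This is only a sketch using the standard library; the function name `missing_pywikibot_files` is made up for illustration and is not part of the project or of Pywikibot.

```python
from pathlib import Path

# The two files produced by generate_user_files.py, as named above.
REQUIRED_FILES = ("user-config.py", "user-password.py")


def missing_pywikibot_files(project_root="."):
    """Return the names of the Pywikibot files absent from project_root."""
    root = Path(project_root)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_pywikibot_files()
    if missing:
        print("Missing Pywikibot files:", ", ".join(missing))
    else:
        print("Pywikibot configuration files found.")
```

Run from the root of the project, it lists whichever file still needs to be copied.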
Run the bot:
python3 bot.py
This project, by Envel Le Hir (@envlh), is licensed under the AGPLv3. See the LICENSE, NOTICE, and CONTRIBUTORS files for complete credits.