SDGclassy

SDG classification of texts using LDA topic modeling

This tool classifies texts based on the 17 Sustainable Development Goals (SDGs). Each SDG is defined by training texts derived from official UN publications. Since February 2024, the training set also includes synthetic texts generated by ChatGPT, enhancing the classifier's coverage.

To learn more about the methodology, see:

Note: This tool does not determine if a text is SDG-related. Instead, it calculates scores based on how well the text fits within the SDG vocabulary. For details, see the section "Interpreting the Results."

Requirements

Mallet 2.0.8 (Download here)
Text files to classify (in .txt format)

Text Preparation:

Ensure your text files are cleaned to exclude irrelevant material (e.g., front matter). Cleaned data yields better classification results.

Supported Platforms:

Mac OS X (Zsh shell recommended for newer macOS versions)
Windows (requires additional configuration, see below)
Linux

On Windows, the Mallet bigrams command may need fixing. Refer to this GitHub issue.

Installation

For Mac OS and UNIX:

Clone the repository and make the scripts executable:
```
git clone https://github.com/SeaCelo/SDGclassy.git SDGclassy
cd SDGclassy
chmod +x infer-scores.sh infer-scores2.sh
```

Install Mallet inside the cloned SDGclassy directory. macOS does not ship with wget, so on a Mac you can use curl -LO with the same URL instead:
```
wget http://mallet.cs.umass.edu/dist/mallet-2.0.8.zip
unzip mallet-2.0.8.zip
rm mallet-2.0.8.zip
```
Keeping Mallet in the same directory ensures all dependencies remain self-contained.
Download the required model files:
- sdgclassy.mallet: Download from Google Drive
- inferring-temp.mallet: Download from Google Drive
After downloading, place both files in the /SDGclassy/classifier/ directory. The files are too large to be included in the repository, which is why they have to be downloaded and added by hand.
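Before the first run, it can help to confirm that every file from the installation steps is where the scripts expect it. Below is a minimal sketch. It assumes Mallet was unzipped inside the clone as described above. The function name check_setup is hypothetical and not part of the repository.

```shell
# Hypothetical helper, not part of SDGclassy: report which of the files named
# in the installation steps are missing. The relative paths assume Mallet was
# unzipped inside the clone; edit them if you installed Mallet elsewhere.
check_setup() {
  repo_dir="${1:-.}"   # path to the SDGclassy clone (default: current directory)
  missing=0
  for f in mallet-2.0.8/bin/mallet \
           classifier/sdgclassy.mallet \
           classifier/inferring-temp.mallet \
           infer-scores.sh; do
    if [ ! -e "$repo_dir/$f" ]; then
      echo "missing: $f" >&2
      missing=1
    fi
  done
  if [ "$missing" -eq 0 ]; then
    echo "all required files found"
  fi
  return "$missing"
}

# Usage, from inside the SDGclassy directory:
#   check_setup .
```

The function returns a non-zero status when something is missing. That makes it easy to chain in front of the classification script with &&.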

Usage

For Mac OS and UNIX:

Option 1: Use Predefined Directories

Prepare your text files:
- Ensure all files are in plain .txt format (no PDFs or other formats, and no subdirectories).
- Clean the files by removing irrelevant material (e.g., front matter or unrelated content).
Place the files in the predefined input directory:
```
/SDGclassy/target/input/
```
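If your texts sit in nested folders, you need to gather them into the flat input directory first. The sketch below uses a hypothetical helper, flatten_txt, which is not part of SDGclassy. It copies every .txt file under a source folder into the input directory. Subfolder names are folded into each file name, so files that share a name do not overwrite each other.

```shell
# Hypothetical helper, not part of SDGclassy: copy every .txt file found under
# SRC (including subfolders) into the flat directory DEST. Subfolder names are
# folded into the file name (a/b/doc.txt -> a__b__doc.txt); other files are skipped.
# Keep DEST outside SRC, otherwise copied files may be picked up again.
flatten_txt() {
  src="${1%/}"   # drop a trailing slash so the prefix strip below works
  dest="$2"
  mkdir -p "$dest"
  find "$src" -type f -name '*.txt' | while IFS= read -r path; do
    rel="${path#"$src"/}"   # path relative to SRC
    cp "$path" "$dest/$(printf '%s' "$rel" | sed 's|/|__|g')"
  done
}

# Usage, from inside the SDGclassy directory:
#   flatten_txt ~/my-corpus target/input
```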
Run the classification script:
```
./infer-scores.sh
```
Find the results in the predefined output directory:
```
/SDGclassy/target/output/SDG-scores-out.txt
```

Option 2: Specify Custom Directories

If your input files are stored elsewhere, or you want to save the results in a different location, use the alternative script infer-scores2.sh instead.

Run the script with custom paths:
```
./infer-scores2.sh -i /path/to/your/input -o /path/to/your/output
```

Check the specified output directory for the results.
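The input directory must contain plain .txt files only. A quick check before running either script can catch stray PDFs, subdirectories, or hidden files such as the macOS .DS_Store. Below is a sketch with a hypothetical check_input function, which is not shipped with SDGclassy.

```shell
# Hypothetical helper, not part of SDGclassy: list entries in an input
# directory that are not regular .txt files (subdirectories, PDFs, hidden
# files such as .DS_Store). Returns 0 when the directory is clean, 1 otherwise.
check_input() {
  dir="$1"
  bad=0
  for entry in "$dir"/* "$dir"/.[!.]*; do
    [ -e "$entry" ] || continue   # skip glob patterns that matched nothing
    if [ -f "$entry" ] && [ "${entry%.txt}" != "$entry" ]; then
      continue                    # a regular file ending in .txt is fine
    fi
    echo "not a plain .txt file: $entry" >&2
    bad=1
  done
  return "$bad"
}

# Usage before a custom-directory run:
#   check_input /path/to/your/input && ./infer-scores2.sh -i /path/to/your/input -o /path/to/your/output
```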

For Windows:

Prepare your text files:
- Convert files to plain .txt format.
- Clean up irrelevant content.
Place the text files in the /SDGclassy/target/input/ directory.
Run the script:
- Right-click infer-scores.ps1 and select "Run with PowerShell".

Results will be saved in:
```
/SDGclassy/target/output/SDG-scores-out.txt
```

Interpreting the Results

The output file SDG-scores-out.txt lists topics (0–18) and their corresponding scores. Each topic maps to an SDG, except one filter topic, which should be ignored.
Use /classifier/topic-sdg_mapping.csv to match topics with SDGs.
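Because the filter topic takes up part of each document's score, the SDG scores alone do not sum to 100%. If your analysis needs shares that sum to 100%, drop the filter topic and divide each remaining score by the sum of the remaining scores.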
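The rescaling can be done with a short awk script. This sketch assumes a specific file layout. Each non-comment line of SDG-scores-out.txt is taken to be tab-separated: document index, document name, then one score per topic in topic order. That is the layout Mallet 2.0.8 uses for its document-topic output, but check your own file before relying on it. You pass in the filter topic number after looking it up in /classifier/topic-sdg_mapping.csv. The name rescale_scores is hypothetical.

```shell
# Sketch: drop the filter topic and rescale the remaining topic scores so that
# each document's scores sum to 1.
# ASSUMPTION: each data line is tab-separated as
#   <doc index> <doc name> <score topic 0> <score topic 1> ...
# $1 = filter topic number (from topic-sdg_mapping.csv), $2 = scores file.
rescale_scores() {
  awk -F '\t' -v OFS='\t' -v filter="$1" '
    /^#/ || NF < 3 { next }                      # skip header or malformed lines
    {
      last = ($NF == "") ? NF - 1 : NF           # ignore a trailing tab, if present
      total = 0
      for (i = 3; i <= last; i++)                # field i holds the score of topic i-3
        if (i - 3 != filter) total += $i
      line = $2                                  # start with the document name
      for (i = 3; i <= last; i++)
        if (i - 3 != filter)
          line = line OFS (total > 0 ? sprintf("%.4f", $i / total) : "NA")
      print line
    }' "$2"
}

# Usage, from inside the SDGclassy directory:
#   rescale_scores <filter-topic-number> target/output/SDG-scores-out.txt > SDG-scores-rescaled.txt
```

Each output row holds the document name followed by the rescaled scores of the remaining topics, in their original order. Each row therefore sums to 1, up to rounding.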

Additional Notes

You can install Mallet elsewhere and adjust the scripts accordingly. Alternatively, add Mallet's bin directory to your $PATH variable.
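In Zsh or Bash, for example, the lines below add Mallet's bin directory to PATH for the current session. To make the change permanent, append the MALLET_HOME and export lines to ~/.zshrc or ~/.bashrc. The install location shown is an assumption. Whether the scripts then pick up mallet from PATH depends on how they call it.

```shell
# Assumed install location: Mallet unzipped inside an SDGclassy clone in your
# home directory. Change MALLET_HOME if yours lives elsewhere.
MALLET_HOME="$HOME/SDGclassy/mallet-2.0.8"

# Prepend Mallet's bin directory (which holds the mallet launcher) to PATH
# for the current shell session.
export PATH="$MALLET_HOME/bin:$PATH"

# Print where the shell now finds mallet, or warn if it cannot find it.
command -v mallet || echo "mallet not found; check MALLET_HOME" >&2
```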
If Mallet runs out of memory during processing, allocate more memory:
1. Navigate to the Mallet installation directory:
```
cd /path/to/mallet-2.0.8/bin
```
2. Open the mallet launcher (a shell script, not a binary) in a text editor:
```
nano mallet
```
3. Change the MEMORY line to the amount you want to allocate, for example:
```
MEMORY=8g
```
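You can also make this change without opening an editor. Below is a minimal sketch. It assumes the launcher contains a line starting with MEMORY=, as in the step above. set_mallet_memory is a hypothetical helper. The -i.bak option works with both GNU and macOS sed and leaves a backup named mallet.bak. Pick a value below the RAM available on your machine.

```shell
# Hypothetical helper: rewrite the MEMORY= line of the mallet launcher in place.
# Assumes the launcher has a line starting with MEMORY=; a backup is written
# next to it with a .bak suffix.
set_mallet_memory() {
  launcher="$1"   # e.g. /path/to/mallet-2.0.8/bin/mallet
  size="$2"       # e.g. 8g
  sed -i.bak "s/^MEMORY=.*/MEMORY=$size/" "$launcher"
  grep '^MEMORY=' "$launcher"   # show the new setting
}

# Usage:
#   set_mallet_memory /path/to/mallet-2.0.8/bin/mallet 8g
```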
