Welcome! This page accompanies the paper "Is your code harmful too? Understanding harmful code through transfer learning". Here we present the results and provide the datasets used.

Installation Guide:

pip install -r requirements.txt

Study Design:

Harmful code results for transfer learning, combined across all code smells and broken down by language:

main.py script:

In this repository, the main file opens "project_patches.csv", whose header is "project,commit,file_path,patch", and compares its data with a local PostgreSQL database. For each row, it checks whether the file path from the CSV is present in the database. If it is, the script checks whether the code contains any smells and saves the result as jsonb in the database. It then checks whether the line numbers of that class and its methods fall within the hunk intervals of the CSV patch and, if so, marks them in the database as a bug fix.
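The hunk-interval check described above can be sketched as follows. This is a minimal illustration, not the repository's code: it assumes the `patch` column holds unified-diff text with standard `@@ -a,b +c,d @@` hunk headers, and the function names `hunk_intervals` and `touches_hunks` are hypothetical. Which side of the diff (pre- or post-change lines) main.py compares against is not stated here, so it is left as a parameter.

```python
import re

# Matches unified-diff hunk headers such as "@@ -10,7 +10,8 @@ def foo():".
# A missing count (e.g. "@@ -3 +3 @@") means a count of 1.
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@", re.MULTILINE)


def hunk_intervals(patch: str, side: str = "new") -> list[tuple[int, int]]:
    """Return inclusive (start, end) line intervals covered by each hunk.

    side="old" uses the pre-change ("-") ranges, side="new" the post-change ("+") ranges.
    """
    if side not in ("old", "new"):
        raise ValueError("side must be 'old' or 'new'")
    intervals = []
    for m in HUNK_HEADER.finditer(patch):
        if side == "old":
            start, count = int(m.group(1)), m.group(2)
        else:
            start, count = int(m.group(3)), m.group(4)
        length = 1 if count is None else int(count)
        if length == 0:
            # Empty range: this hunk only adds or removes lines on the other side.
            continue
        intervals.append((start, start + length - 1))
    return intervals


def touches_hunks(first_line: int, last_line: int, intervals) -> bool:
    """True if the inclusive range [first_line, last_line] overlaps any hunk interval."""
    return any(first_line <= end and start <= last_line for start, end in intervals)
```

For a patch starting with `@@ -10,3 +10,4 @@`, `hunk_intervals(patch)` yields `[(10, 13)]`, so a method spanning lines 12 to 20 would be flagged by `touches_hunks(12, 20, intervals)`, while one spanning lines 1 to 9 would not.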

get_tokens_csv.py script:

This script reads a CSV file and writes the code text to temporary files, from which it generates tokens. It requires the tokenizer in order to run.
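The temporary-file workflow can be sketched roughly as below. This is an assumption-laden illustration, not the script itself: the interface of the repository's tokenizer is not documented here, so it is modeled as a caller-supplied function `tokenize_file(path)` that returns a list of tokens, and the `code` column name and `tokenize_rows` helper are hypothetical. Rows could come from `csv.DictReader`.

```python
import os
import tempfile
from typing import Callable, Iterable


def tokenize_rows(
    rows: Iterable[dict],
    tokenize_file: Callable[[str], list[str]],
    code_column: str = "code",  # hypothetical column holding the code text
    suffix: str = ".txt",
) -> list[list[str]]:
    """Write each row's code text to a temporary file and tokenize that file.

    `tokenize_file` stands in for whatever invokes the external tokenizer;
    it receives the temporary file's path and returns a list of tokens.
    """
    results = []
    # All temporary files live in one directory that is deleted on exit.
    with tempfile.TemporaryDirectory() as tmp_dir:
        for i, row in enumerate(rows):
            path = os.path.join(tmp_dir, f"snippet_{i}{suffix}")
            with open(path, "w", encoding="utf-8") as fh:
                fh.write(row[code_column])
            # Tokenize while the file still exists.
            results.append(tokenize_file(path))
    return results
```

Passing the tokenizer in as a function keeps the sketch independent of how the real tokenizer is launched (for example, through a subprocess call).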

make_tokenizer_csv.py script:

This script retrieves all relevant data from the PostgreSQL database and saves it to a CSV file. This step was necessary to run the get_tokens_csv.py script in a Linux virtual machine.
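A database-to-CSV export of this kind can be sketched with any Python DB-API 2.0 driver. The sketch below is not the repository's code. For PostgreSQL one would typically pass a psycopg2 connection, but the example uses the standard-library `sqlite3` module as a stand-in so that it runs without a server. The query, table, and column names are hypothetical.

```python
import csv
import sqlite3


def export_query_to_csv(conn, query: str, csv_path: str, params=None) -> int:
    """Run `query` on a DB-API 2.0 connection and write the result set to `csv_path`.

    The first CSV row holds the column names. Returns the number of data rows written.
    """
    cur = conn.cursor()
    try:
        # Only pass parameters when given, so drivers do not try to format the query.
        if params is None:
            cur.execute(query)
        else:
            cur.execute(query, params)
        header = [col[0] for col in cur.description]
        count = 0
        with open(csv_path, "w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            writer.writerow(header)
            for row in cur:
                writer.writerow(row)
                count += 1
    finally:
        cur.close()
    return count


if __name__ == "__main__":
    # Demonstration with an in-memory SQLite database and a hypothetical table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE classes (language TEXT, smell TEXT, code TEXT)")
    conn.execute("INSERT INTO classes VALUES ('java', 'GodClass', 'class A {}')")
    print(export_query_to_csv(conn, "SELECT * FROM classes", "tokenizer_input.csv"))
```

The placeholder style in `query` depends on the driver: `sqlite3` uses `?`, whereas psycopg2 uses `%s`.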

make_train_test.py script:

This script takes a CSV file and, for each language and code smell, performs an 80/20 split, creating train and test files.
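A per-group 80/20 split can be sketched as follows. This is an illustration under stated assumptions, not the repository's implementation. It assumes each row has `language` and `smell` fields and that rows are shuffled with a fixed seed before splitting; the original script may split differently. The function names and the output file naming pattern are hypothetical.

```python
import csv
import os
import random
from collections import defaultdict


def split_by_language_and_smell(rows, train_ratio=0.8, seed=42):
    """Group rows by (language, smell) and split each group into train/test.

    Returns {(language, smell): (train_rows, test_rows)}.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[(row["language"], row["smell"])].append(row)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    splits = {}
    for key, group in groups.items():
        shuffled = group[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_ratio)  # floor: 10 rows -> 8 train, 2 test
        splits[key] = (shuffled[:cut], shuffled[cut:])
    return splits


def write_splits(splits, out_dir, fieldnames):
    """Write one <language>_<smell>_{train,test}.csv pair per group (hypothetical naming)."""
    os.makedirs(out_dir, exist_ok=True)
    for (language, smell), (train, test) in splits.items():
        for name, part in (("train", train), ("test", test)):
            path = os.path.join(out_dir, f"{language}_{smell}_{name}.csv")
            with open(path, "w", newline="", encoding="utf-8") as fh:
                writer = csv.DictWriter(fh, fieldnames=fieldnames)
                writer.writeheader()
                writer.writerows(part)
```

Splitting inside each (language, smell) group, rather than over the whole file, keeps every language and smell represented in both the train and test sets.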

plot_test.py script:

This script contains all the code needed to run the transfer learning experiments; however, only the RQ3 plot function is left uncommented.

Repository files:

.gitignore
1st_find_bugs_and_smells.ipynb
2nd_classify_and_make_train_test.ipynb
3rd_transfer_learning_experiments.ipynb
README.md
database_example_file.sql
projects_patches.csv
requirements.txt
tokenizer

opus-research/sbqs2023_harmful_code_transfer_learning

Folders and files

Latest commit

History

Repository files navigation

Welcome, this page is auxiliary for the paper: "Is your code harmful too? Understanding harmful code through transfer learning", here we display the results as well provide the datasets used.

Installation Guide:

Study Design:

Harmful Code results for the transfer learning combined for all code smells and divided by each language:

main.py script:

get_tokens_csv.py script:

make_tokenizer_csv.py script:

make_train_test.py script:

plot_test.py script:

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages