Welcome! This page accompanies the paper "Is your code harmful too? Understanding harmful code through transfer learning". Here we present the results and provide the datasets used.
- pip install -r requirements.txt
Harmful code results for transfer learning, combined across all code smells and broken down by language:
In this repository, the main file opens a "project_patches.csv", with the header "project,commit,file_path,patch", and compares its data with a local PostgreSQL database. It checks whether the file path from the csv is present in the database. If so, it checks the code for smells and saves the result to the database as a jsonb. It then checks whether the line numbers of that class and its methods fall within the hunk intervals of the csv patch, and if they do, marks the entry in the database as a bug fix.
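The hunk-interval check described above can be sketched as follows. This is only an illustration, not the repository's actual code: the function names `hunk_intervals` and `line_in_hunks` are hypothetical, and it assumes the "patch" column holds a unified diff and that the intervals are taken from the new-file side of each hunk header.

```python
import re

# Matches unified-diff hunk headers such as "@@ -10,3 +12,4 @@".
# The length after the comma is optional and defaults to 1.
HUNK_HEADER = re.compile(
    r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@", re.MULTILINE
)


def hunk_intervals(patch):
    """Return inclusive (start, end) line intervals for each hunk.

    Assumption: intervals come from the new-file side ("+start,length").
    """
    intervals = []
    for match in HUNK_HEADER.finditer(patch):
        start = int(match.group(3))
        length = int(match.group(4)) if match.group(4) is not None else 1
        if length > 0:  # a length of 0 means the hunk only deletes lines
            intervals.append((start, start + length - 1))
    return intervals


def line_in_hunks(intervals, line_number):
    """True if line_number falls inside any of the hunk intervals."""
    return any(start <= line_number <= end for start, end in intervals)


# Hypothetical patch text with two hunks.
example_patch = (
    "@@ -10,3 +12,4 @@ def foo():\n"
    " context\n"
    "+added line\n"
    "@@ -40 +45,2 @@\n"
    "+another line\n"
)
example_intervals = hunk_intervals(example_patch)
```

A class or method whose line number satisfies `line_in_hunks(example_intervals, line_number)` would be the one marked as a bug fix under these assumptions.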
This script generates temporary files from a csv file and uses them to generate tokens from the code text. It requires tokenizer to run.
This script will get all relevant data from the PostgreSQL database and save it in a csv file. This was necessary to run the get_tokens.py script in a Linux virtual machine.
This script receives a csv file and, for each language and code smell, performs an 80/20 split, creating train and test files.
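A minimal sketch of such a per-group 80/20 split is shown below. It is not the script itself: the column names "language" and "smell", the function name, and the fixed seed are assumptions made for illustration, and it works on rows already loaded into dictionaries (for example with `csv.DictReader`) rather than writing files.

```python
import random
from collections import defaultdict


def split_by_language_and_smell(rows, train_fraction=0.8, seed=0):
    """Split rows into train/test lists for each (language, smell) pair.

    Assumption: each row is a dict with "language" and "smell" keys.
    Returns {(language, smell): (train_rows, test_rows)}.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[(row["language"], row["smell"])].append(row)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    splits = {}
    for key, members in groups.items():
        shuffled = members[:]  # copy so the caller's list is not reordered
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_fraction)
        splits[key] = (shuffled[:cut], shuffled[cut:])
    return splits


# Hypothetical input: 10 rows for each of two (language, smell) groups.
example_rows = [
    {"language": lang, "smell": smell, "tokens": f"sample {i}"}
    for lang, smell in [("java", "GodClass"), ("cpp", "LongMethod")]
    for i in range(10)
]
example_splits = split_by_language_and_smell(example_rows)
```

Each train and test list could then be written to its own csv file, one pair per language and code smell.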
This script contains all the code needed to run the transfer-learning experiments; however, everything except the RQ3: Plot function is commented out.