The most important Python packages are:
- python == 3.6.7
- pytorch == 1.2.0
- torch == 0.4.1
- tensorboard == 1.13.1
- rdkit == 2019.09.3
- scikit-learn == 0.22.2.post1
- hyperopt == 0.2.5
- numpy == 1.18.2
To make the model easier to use, we provide the environment file <environment.txt>, from which the environment can be installed directly.
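After installing, it can help to confirm that the installed versions match the pinned list. The sketch below is not part of the repository: the helpers installed_version and check_environment are hypothetical, and it assumes each package exposes a __version__ attribute (packages that do not are reported as None). PyTorch is left for you to add under its import name torch.

```python
import importlib
import sys

# Pinned versions from the package list above, keyed by import name
# (scikit-learn is imported as "sklearn"). PyTorch is imported as "torch";
# add it here with the version you installed.
PINNED = {
    "tensorboard": "1.13.1",
    "rdkit": "2019.09.3",
    "sklearn": "0.22.2.post1",
    "hyperopt": "0.2.5",
    "numpy": "1.18.2",
}


def installed_version(module_name):
    """Return module.__version__, or None if the module is missing or unversioned."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def check_environment(pinned=PINNED):
    """Return {import_name: (pinned, installed, matches)} for every pinned package."""
    report = {}
    for name, wanted in pinned.items():
        found = installed_version(name)
        report[name] = (wanted, found, found == wanted)
    return report


if __name__ == "__main__":
    print("python", sys.version.split()[0], "(pinned: 3.6.7)")
    for name, (wanted, found, ok) in check_environment().items():
        print("{:12s} pinned={:14s} installed={} {}".format(
            name, wanted, found, "OK" if ok else "MISMATCH"))
```

Comparing version strings exactly keeps the check strict; a looser comparison would need a version-parsing library.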
Use train.py
Args:
- data_path : The path to the input CSV file. E.g. input.csv
- dataset_type : The type of dataset, either classification or regression.
- save_path : The directory in which to save the trained model. E.g. model_save
- log_path : The path where training logs and results are recorded and saved. E.g. log
E.g.
python train.py --data_path data/test.csv --dataset_type classification --save_path model_save --log_path log
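If you want to launch the scripts from Python (for example, to train on several datasets in a loop), the command line can be assembled as an argument list. This is a minimal sketch; build_command and run are hypothetical helpers, and only the flag names come from the Args lists in this README.

```python
import shlex
import subprocess
import sys


def build_command(script, **flags):
    """Turn keyword arguments into an argument list, e.g.
    build_command("train.py", data_path="data/test.csv") ->
    [<python>, "train.py", "--data_path", "data/test.csv"]."""
    cmd = [sys.executable, script]
    for name, value in flags.items():
        cmd += ["--" + name, str(value)]
    return cmd


def run(script, **flags):
    """Run a repository script with the given flags; raise if it exits non-zero."""
    cmd = build_command(script, **flags)
    print("Running:", " ".join(shlex.quote(part) for part in cmd))
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print (rather than run) the training command from the example above.
    cmd = build_command("train.py", data_path="data/test.csv",
                        dataset_type="classification",
                        save_path="model_save", log_path="log")
    print(" ".join(shlex.quote(part) for part in cmd))
```

Because hyper_opti.py takes the same four flags, run("hyper_opti.py", data_path=..., dataset_type=..., save_path=..., log_path=...) would follow the same pattern. Passing a list to subprocess.run avoids shell quoting problems with paths that contain spaces.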
Use predict.py
Args:
- predict_path : The path to the input CSV file containing the molecules to predict. E.g. input.csv
- result_path : The path to the output CSV file. E.g. output.csv
- model_path : The path to the trained model. E.g. model_save/model.pt
E.g.
python predict.py --predict_path data/test.csv --model_path model_save/test.pt --result_path result.csv
Use hyper_opti.py
Args:
- data_path : The path to the input CSV file. E.g. input.csv
- dataset_type : The type of dataset, either classification or regression.
- save_path : The directory in which to save the output model. E.g. model_save
- log_path : The path where the hyperparameter optimization results are recorded and saved. E.g. log
E.g.
python hyper_opti.py --data_path data/test.csv --dataset_type classification --save_path model_save --log_path log
Use interpretation_fp.py
Args:
- predict_path : The path to the input CSV file. E.g. input.csv
- model_path : The path to the trained model. E.g. model_save/model.pt
- result_path : The path to the output file for the interpretation results. E.g. result.txt
E.g.
python interpretation_fp.py --predict_path test.csv --model_path model_save/test.pt --result_path result.txt
Use interpretation_graph.py
Args:
- predict_path : The path to the input CSV file. E.g. input.csv
- model_path : The path to the trained model. E.g. model_save/model.pt
- figure_path : The directory in which to save the graph interpretation figures. E.g. figure
E.g.
python interpretation_graph.py --predict_path test.csv --model_path model_save/test.pt --figure_path figure
We provide the three public benchmark datasets used in our study: <Data.rar>
Alternatively, you can use your own dataset:
For training, the dataset file should be a CSV file with a header line and label columns. E.g.
SMILES,BT-20
O(C(=O)C(=O)NCC(OC)=O)C,0
FC1=CNC(=O)NC1=O,0
...
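Before training on your own data, a quick structural check can catch malformed rows early. The sketch below is hypothetical (check_training_csv is not part of the repository) and assumes that the first column holds SMILES and every other column is a numeric label; treating empty label cells as missing values is also an assumption. It does not check whether the SMILES strings are chemically valid.

```python
import csv
import io


def check_training_csv(handle):
    """Validate a training CSV laid out as above.

    Assumes SMILES in the first column and numeric labels in the others;
    empty label cells are treated as missing (an assumption).
    Returns (task_names, number_of_rows); raises ValueError on the first problem.
    """
    reader = csv.reader(handle)
    header = next(reader, None)
    if header is None or len(header) < 2:
        raise ValueError("need a header with a SMILES column and at least one label column")
    n_rows = 0
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # skip blank lines
        if len(row) != len(header):
            raise ValueError("line %d: expected %d fields, got %d"
                             % (line_no, len(header), len(row)))
        for task, value in zip(header[1:], row[1:]):
            if value == "":
                continue  # missing label
            try:
                float(value)
            except ValueError:
                raise ValueError("line %d: label %r for task %s is not numeric"
                                 % (line_no, value, task)) from None
        n_rows += 1
    return header[1:], n_rows


sample = "SMILES,BT-20\nO(C(=O)C(=O)NCC(OC)=O)C,0\nFC1=CNC(=O)NC1=O,0\n"
print(check_training_csv(io.StringIO(sample)))  # (['BT-20'], 2)
```

For a real file, open it with newline="" and pass the handle, e.g. with open("data/test.csv", newline="") as f: check_training_csv(f).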
For prediction, the input file should be a CSV file with a header line and without label columns. E.g.
SMILES
O(C(=O)C(=O)NCC(OC)=O)C
FC1=CNC(=O)NC1=O
...
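If you only have a labelled file, an unlabelled prediction input can be derived from it by keeping the first column. This is a minimal sketch; drop_label_columns is a hypothetical helper, and it assumes SMILES is the first column, as in the examples above.

```python
import csv
import io


def drop_label_columns(src, dst):
    """Copy only the first (SMILES) column of a labelled CSV, header included.

    Assumes SMILES is the first column. Returns the number of molecules written.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader, None)
    if header is None:
        raise ValueError("input CSV is empty")
    writer.writerow(header[:1])
    count = 0
    for row in reader:
        if row:  # skip blank lines
            writer.writerow(row[:1])
            count += 1
    return count


labelled = io.StringIO("SMILES,BT-20\nO(C(=O)C(=O)NCC(OC)=O)C,0\nFC1=CNC(=O)NC1=O,0\n")
unlabelled = io.StringIO()
drop_label_columns(labelled, unlabelled)
print(unlabelled.getvalue())
```

With real files, pass handles opened with newline="", for example a labelled CSV as src and a new test.csv opened for writing as dst.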
Decompress Data.rar; the BACE dataset file is at Data/MoleculeNet/bace.csv.
To train a model on it, use the command:
python train.py --data_path Data/MoleculeNet/bace.csv --dataset_type classification --save_path model_save/bace --log_path log/bace
The trained model is saved to model_save/bace/Seed_0/model.pt.
To predict new molecules with the trained model, use the command:
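The Seed_0 directory suggests that models are stored per random seed under save_path. If your run produces several Seed_<n> directories (an assumption; only Seed_0 appears here), a sketch like the following lists the checkpoints in seed order. find_models is a hypothetical helper.

```python
from pathlib import Path


def find_models(save_path):
    """Return the model.pt files found in <save_path>/Seed_<n>/, sorted by n.

    Assumes the Seed_<n>/model.pt layout shown above; directories whose
    suffix is not an integer are sorted last.
    """
    def seed_number(path):
        suffix = path.parent.name[len("Seed_"):]
        return int(suffix) if suffix.isdigit() else float("inf")

    return sorted(Path(save_path).glob("Seed_*/model.pt"), key=seed_number)


for model in find_models("model_save/bace"):
    print(model)
```

Sorting numerically keeps Seed_10 after Seed_2, which a plain string sort would not.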
python predict.py --predict_path test.csv --model_path model_save/bace/Seed_0/model.pt --result_path result.csv
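The layout of result.csv is not described here. The sketch below assumes it mirrors the training file: a header row, SMILES in the first column and one numeric score per task. For classification it further assumes the scores are probabilities and derives a 0/1 label at a threshold of 0.5. load_predictions is a hypothetical helper, and the scores in the demo are made up for illustration.

```python
import csv
import io


def load_predictions(handle, threshold=0.5):
    """Return a list of (smiles, {task: (score, label)}) from a predictions CSV.

    Assumes a header row, SMILES in the first column and one numeric score
    per task; label = 1 when score >= threshold. Empty cells map to None.
    """
    reader = csv.reader(handle)
    tasks = next(reader)[1:]
    results = []
    for row in reader:
        if not row:
            continue
        scores = {}
        for task, value in zip(tasks, row[1:]):
            if value == "":
                scores[task] = None
            else:
                score = float(value)
                scores[task] = (score, int(score >= threshold))
        results.append((row[0], scores))
    return results


# Made-up scores, for illustration only.
demo = io.StringIO("SMILES,BT-20\nO(C(=O)C(=O)NCC(OC)=O)C,0.81\nFC1=CNC(=O)NC1=O,0.12\n")
for smiles, scores in load_predictions(demo):
    print(smiles, scores)
```

A list is returned rather than a dict so that duplicate SMILES in the input are not silently merged.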
To interpret fingerprints, use the training data and the trained model with the command:
python interpretation_fp.py --predict_path Data/MoleculeNet/bace.csv --model_path model_save/bace/Seed_0/model.pt --result_path result.txt
To interpret molecular graphs, use the molecules of interest (e.g. those in test.csv) and the trained model with the command:
python interpretation_graph.py --predict_path test.csv --model_path model_save/bace/Seed_0/model.pt --figure_path figure/bace