Predicting Bio-Activity

To make a structure-based prediction of the bio-activity of molecules, a list of features is generated with a KNIME workflow. This list serves as input for either a Neural Network or a Random Forest predictor. Both scripts split the input data into training and test data, with 70% of the data used to train the predictor. The parameters of the predictors are tuned with GridSearchCV: the predictor is trained multiple times with different combinations of the available parameters, and the best predictor is then used to predict the bio-activity.
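The split and parameter search described above can be sketched as follows. This is a minimal illustration, not the code of the scripts in this repository: the synthetic data from make_classification, the parameter grid, the number of cross-validation folds, and the random seeds are all assumptions chosen so that the example runs quickly.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the feature table (assumption: binary activity label).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# 70% of the data is used for training, the remaining 30% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=0
)

# Hypothetical parameter grid; the grid used by the actual scripts may differ.
param_grid = {"n_estimators": [10, 50], "max_depth": [3, None]}

# GridSearchCV cross-validates every parameter combination on the training
# data and refits the best combination on the whole training set.
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
search.fit(X_train, y_train)

# The refitted best estimator is used for the final prediction.
predictions = search.predict(X_test)
print(search.best_params_)
print(search.score(X_test, y_test))
```

Only the training portion is passed to the search, so the test data stays unseen until the final prediction.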

Feature Calculation

The KNIME workflow featureGeneration.knar receives a comma-separated csv file containing the SMILES and the predicted bio-activity of each molecule. It generates a list of features for the molecules and outputs a comma-separated file containing the activity, the SMILES structure, and the corresponding features of each molecule.
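A file with this layout can be read and separated into labels and features roughly as shown below. The column names activity and smiles, the feature column names, and the sample rows are hypothetical; the header actually written by the workflow may differ.

```python
import io

import pandas as pd

# Hypothetical excerpt of the workflow output: activity, SMILES, then features.
sample_csv = io.StringIO(
    "activity,smiles,feature_1,feature_2\n"
    "1,CCO,46.07,20.23\n"
    "0,c1ccccc1,78.11,0.0\n"
)

data = pd.read_csv(sample_csv)

# The activity column is the prediction target.
y = data["activity"]

# Every column other than the activity and the SMILES string is a feature.
X = data.drop(columns=["activity", "smiles"])

print(X.shape)
```

The SMILES string is dropped from the feature matrix because it is an identifier of the molecule rather than a numeric descriptor.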

Classification

To run either script, specify:

  • -t Path of the input csv file generated by the KNIME workflow
  • -o Destination path of the resulting prediction csv
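A command-line interface with these two options could be declared with argparse as sketched below. Only the -t and -o flags come from this README; the dest names, the description text, and whether the scripts use argparse at all are assumptions.

```python
import argparse


def build_parser():
    """Build a parser for the -t and -o options (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Train a predictor on KNIME-generated features."
    )
    parser.add_argument(
        "-t", dest="training_csv", required=True,
        help="Path of the input csv file generated by the KNIME workflow",
    )
    parser.add_argument(
        "-o", dest="output_csv", required=True,
        help="Destination path of the resulting prediction csv",
    )
    return parser


# An explicit argument list is parsed here for illustration; a script would
# call parse_args() without arguments to read sys.argv instead.
args = build_parser().parse_args(
    ["-t", "trainingData_Features.csv", "-o", "rfc_GridSearch_res.csv"]
)
print(args.training_csv, args.output_csv)
```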

Random Forest Classifier

randomForest_GridSearch.py -t trainingData_Features.csv -o rfc_GridSearch_res.csv

Neural Network Classifier

neuronalNetwork_GridSearch.py -t trainingData_Features.csv -o rfc_GridSearch_res.csv

Built With

  • KNIME - Analytics Platform (3.7)
  • RDKit - Software Package to read and analyse SMILES data (3.4.0v)
  • Python - Python programming language (3.6)
  • scikit-learn - Software Package for Machine Learning (v0.20.1)
  • keras - Open Source Deep Learning Library (2.2.4)
  • matplotlib - 2D Plotting Library (2.2.2)
  • pandas - Datastructures and Dataframes (v0.23.4)
  • numpy - Scientific computing with Python (v1.15.2)

Authors

  • Jennifer Bödker
  • Tobias Nietsch