Prepare the following files:
- A T2 file that contains all compounds with images (make sure the input_compound_id values match their counterparts in your MELLODDY data where they exist). It is later referred to as T2_images.csv.
- A file with columns corresponding to the (standardized) image features, indexed by input_compound_id. The file preparation manual is available on Box. In the scripts it is referred to as T_image_features_std.csv.
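
The two input files can be sanity-checked before running anything else. The sketch below is not the procedure from the preparation manual. It assumes that "standardized" means a per-column z-score, and it uses made-up feature column names (`feature_1`, `feature_2`), an illustrative `smiles` column and made-up compound IDs. Only `input_compound_id` comes from this README.

```python
import csv
import io
import statistics


def standardize_columns(rows, id_column="input_compound_id"):
    """Z-score every feature column; rows is a list of dicts from csv.DictReader."""
    feature_names = [name for name in rows[0] if name != id_column]
    standardized = [{id_column: row[id_column]} for row in rows]
    for name in feature_names:
        values = [float(row[name]) for row in rows]
        mean = statistics.fmean(values)
        std = statistics.pstdev(values)
        for out_row, value in zip(standardized, values):
            # Constant columns are mapped to 0.0 to avoid dividing by zero.
            out_row[name] = 0.0 if std == 0 else (value - mean) / std
    return standardized


def missing_ids(feature_rows, t2_rows, id_column="input_compound_id"):
    """Return the IDs listed in the T2 rows that have no feature row."""
    feature_ids = {row[id_column] for row in feature_rows}
    return sorted({row[id_column] for row in t2_rows} - feature_ids)


# Hypothetical stand-ins for the raw feature table and T2_images.csv.
FEATURES_CSV = """input_compound_id,feature_1,feature_2
cmpd_a,1.0,10.0
cmpd_b,2.0,10.0
cmpd_c,3.0,10.0
"""
T2_IMAGES_CSV = """input_compound_id,smiles
cmpd_a,CCO
cmpd_b,CCN
cmpd_d,CCC
"""

feature_rows = list(csv.DictReader(io.StringIO(FEATURES_CSV)))
t2_rows = list(csv.DictReader(io.StringIO(T2_IMAGES_CSV)))

standardized_rows = standardize_columns(feature_rows)
ids_without_features = missing_ids(feature_rows, t2_rows)
print("compounds in T2 without image features:", ids_without_features)
```

With real data, replace the in-memory strings with `open(...)` calls on your own files. A non-empty list of missing IDs usually points to an ID mismatch between the two files.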
- Run scripts in 01_datapreparation/ in the order of their numbering:
- will generate T0, T1, T2 files for image compounds that exist also in Melloddy data with their corresponding tasks.
- 02_preprocessing_synchronized_thresholds.ipynb will modify T0-2 files to align thresholds with your original melloddy tuner run.
- executes melloddy tuner on image only data, using updated T0-2.
- 04_generating_x_image_features.ipynb creates an analog of X matrix with image features instead of ECFP.
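
For intuition about the last step: an image-feature analog of the X matrix needs one feature row per compound, in the same compound order as the rest of the pipeline. The sketch below shows only that alignment idea. It does not reproduce the notebook: the function name, the dense list-of-lists layout and the compound IDs are assumptions, and the real notebook may use a different storage format and row key.

```python
def build_feature_matrix(ordered_ids, features_by_id):
    """Return one row of floats per compound, in the order given by ordered_ids."""
    missing = [cid for cid in ordered_ids if cid not in features_by_id]
    if missing:
        # Fail loudly instead of silently dropping or misaligning rows.
        raise KeyError(f"no image features for: {missing}")
    return [list(features_by_id[cid]) for cid in ordered_ids]


# Hypothetical standardized features keyed by compound ID.
features_by_id = {
    "cmpd_b": [0.5, -1.0],
    "cmpd_a": [1.5, 0.25],
}
# Hypothetical row order taken from the rest of the pipeline.
ordered_ids = ["cmpd_a", "cmpd_b"]

x_image = build_feature_matrix(ordered_ids, features_by_id)
print(x_image)
```

Raising on missing IDs is a deliberate choice here: a silently skipped compound would shift every following row relative to the label matrix.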
- Run scripts in 02_image_model_training/:
- Update file in 02_image_model_training/
- Execute 01_setup_run_folders.bash and 02_submit_all.bash. This will initiate HP scan.
- Identify the best model using scripts_and_notebooks/step1_3_HP_selection.ipynb: take the last one output by the notebook
- Run scripts_and_notebooks/step2_1_ysparse_generation_main_quality_tasks_fold2.ipynb
- Use the best model identified in 1.3 to execute 03_image_predictions/01_main_tasks_fold2.
- you need to edit paths in 03_image_predictions/01_main_tasks_fold2/01_link_files.bash and image_predictions/main_tasks_fold2/
- Run scripts_and_notebooks/step2_2_CPfitting.ipynb, step2_3_taskstats.ipynb and step2_4_ysparse_inference.ipynb
- Execute scripts in 03_image_predictions/02_all_cmpds
- you might need to edit paths in 03_image_predictions/02_all_cmpds/01_link_files.bash and 03_image_predictions/02_all_cmpds/
- Run scripts_and_notebooks/step2_5_CPapplication_auxdata.ipynb
- it refers to T2_images.csv, which is described at the beginning of this readme. It contains all image compounds, not only those in MELLODDY
- Run scripts_and_notebooks/step2_6_Tfilegeneration.ipynb
- Run scripts_and_notebooks/step2_7_labels_to_auxtasks.ipynb
- tuner_output_baseline there refers to results of melloddy tuner without images
- Run scripts_and_notebooks/step2_8_concat_label_imputation.ipynb
- Execute scripts in 04_aux_data_preparation/
- will take as much time as a melloddy tuner run on your original data (or more :))
- Run scripts_and_notebooks/step3_1_confidence_selection.ipynb
- Execute scripts in 05_aux_data_training/
- you need to edit paths in
- in 00_best_hyperparameters.dat change model HPs to the optimal ones for your MELLODDY dataset
- Run scripts_and_notebooks/step3_2_y_sparse_generation_inference.ipynb
- Execute scripts in 06_aux_data_predictions/
- edit paths/envs in and
- Run scripts_and_notebooks/step3_3_evaluation.ipynb
- Execute aux_data_preparation/ - this will generate different weight files
- Modify the best HPs in 05_aux_data_training/00_best_hyperparameters.dat to be the same as in 05_aux_data_training/
- Modify 05_aux_data_training/ analogously to
- Run 05_aux_data_training/01_setup_run_folders_aux.bash
- Run 05_aux_data_training/02_submit_all_aux.bash
- if you want to skip runs that were already done, modify the corresponding folder names
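
Before renaming folders, it can help to list which run folders look finished. The sketch below is only an illustration of that bookkeeping. The marker file name `DONE` and the folder names are hypothetical, and the scripts in this repository may not write any such marker, so adapt the check to whatever output file your finished runs actually contain.

```python
import tempfile
from pathlib import Path


def split_runs(base_dir, marker_name="DONE"):
    """Split the immediate subfolders of base_dir into (finished, pending) name lists."""
    finished, pending = [], []
    for folder in sorted(p for p in Path(base_dir).iterdir() if p.is_dir()):
        # A run counts as finished only if the (hypothetical) marker file exists.
        target = finished if (folder / marker_name).exists() else pending
        target.append(folder.name)
    return finished, pending


# Demonstration on a throwaway directory with two made-up run folders.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "run_a").mkdir()
    (Path(tmp) / "run_a" / "DONE").touch()
    (Path(tmp) / "run_b").mkdir()
    finished, pending = split_runs(tmp)

print("finished:", finished)
print("pending:", pending)
```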