Skip to content

exploring latent space of MoLeR AI for drug discovery

Notifications You must be signed in to change notification settings

peetceenatoo/drug-discover

Repository files navigation

Drug discovery through AI

Context

Research and development of new drugs to combat diseases often require many years of work to achieve the desired success. In this context, artificial intelligence can provide valuable support. Pharmaceutical companies are engaged in the search for molecules that can be suitable for treating specific conditions, which, however, are often too costly to allow for a profit margin.

In this project, a generative approach was employed to explore the chemical space by generating molecules similar to an input molecule. However, since these compounds can be difficult to practically implement or expensive, a similarity search is then conducted within a given dataset containing easily and economically producible molecules.

How

At the state of the art, two categories of models are distinguished in the generative approach: Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs).

Since GANs exhibit a high degree of specialization, making it more challenging to find a suitable network for specific objectives, we chose to adopt a VAE. In particular, we used @molecule_generation, a Python implementation of the MoLeR model developed by Microsoft Research Department.

One method to address the problem of exploring a latent space involves calculating the minimum hypervolume containing the n nearest molecules to a given molecule. However, since the complexity of this problem is intractable, for the implementation of the project, we devised an approximate solution that reduces the set of molecules to be visited to a subset with computationally feasible dimensions and that contains all the molecules of greatest interest.

Subsequently, for each generated molecule, a representative is chosen from the most similar ones in the dataset to be displayed as output (specifically, the one most similar to the input molecule).

How to use

To use the generator, it is first necessary to use the file "fingerprints_generator.py" to convert a file containing SMILES strings into a .h5 file, which will be used for the similarity search. Once this is done, you can use the "generator.py" file to computationally generate, given a SMILES string of a molecule, the strings in the dataset that most resemble the input molecule.

Dependencies

Molecule Generation: A model for Molecule Generation
@molecule_generation

FPSim2: Simple package for fast molecular similarity searches
@FPSim2

NumPy: The fundamental package for scientific computing with Python
@numpy

Credits

Mattia Piccinato @peetceenatoo
Francesco Rita @FraRita

About

exploring latent space of MoLeR AI for drug discovery

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published