The GRPM system is an advanced tool designed for the integration and analysis of genetic polymorphism data corresponding to specific biomedical domains. It consists of five modular components that facilitate data retrieval, merging, analysis, and the incorporation of GWAS data.
The GRPM system is a Python-based framework for building a comprehensive dataset of human genetic polymorphisms associated with nutrition. By integrating data from multiple sources and using the MeSH ontology as a semantic retrieval tool, the workflow lets researchers investigate genetic variants that are significantly associated with specified biomedical subjects. The resource was developed primarily to support nutritionists in exploring gene-diet interactions and implementing personalized nutrition strategies.
You can visualize and query the developed datasets by installing our package via:
pip install git+https://github.com/johndef64/GRPM_system.git
Example queries are available in the `tests` directory and in `test.ipynb`.
The workflow is composed of five distinct modules, each executing a crucial function to assist in the integration and analysis of genetic polymorphism data associated with nutrition. The modules are outlined below:
No. | Module | Description | Notebook |
---|---|---|---|
1. | Dataset Builder | Retrieves and integrates data from the LitVar and PubMed databases in a structured format. | |
2. | MeSH Term Selection | Extracts a coherent list of MeSH terms for querying the GRPM Dataset, starting from simple collections of biomedical terms (NLP-based). | |
3. | Dataset Querying | Executes a MeSH query against the GRPM Dataset, extracts the subset of matching entities, and generates a data report. | |
4. | Gene Prioritization | Analyzes the retrieved data and computes a gene interest index to filter significant results. | |
5. | GWAS Data Integration | Merges GWAS data, associating phenotypes and potential risk/effect alleles with the GRPM data (BioBERT-based). | |
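As a rough illustration of what modules 3 and 4 do conceptually, the sketch below filters a toy table by MeSH term and then ranks genes by the share of their publications that matched. Everything in it is an assumption made for illustration: the field names, the placeholder records, and the functions `query_by_mesh` and `matching_fraction` are hypothetical and are not part of the GRPM package, and the score shown is a simple ratio, not the gene interest index computed in the notebooks.

```python
from collections import defaultdict

# Placeholder records standing in for gene / rsID / PMID / MeSH rows.
# Field names and values are illustrative only, not the real GRPM schema.
RECORDS = [
    {"gene": "GENE_A", "rsid": "rs0001", "pmid": 1, "mesh": "Obesity"},
    {"gene": "GENE_A", "rsid": "rs0001", "pmid": 2, "mesh": "Diet"},
    {"gene": "GENE_A", "rsid": "rs0002", "pmid": 3, "mesh": "Neoplasms"},
    {"gene": "GENE_B", "rsid": "rs0003", "pmid": 4, "mesh": "Folic Acid"},
    {"gene": "GENE_B", "rsid": "rs0003", "pmid": 5, "mesh": "Diet"},
    {"gene": "GENE_C", "rsid": "rs0004", "pmid": 6, "mesh": "Neoplasms"},
]


def query_by_mesh(records, mesh_terms):
    """Return the records whose MeSH term is in mesh_terms (case-insensitive)."""
    wanted = {term.lower() for term in mesh_terms}
    return [r for r in records if r["mesh"].lower() in wanted]


def matching_fraction(records, matches):
    """For each gene with a match, return matched PMIDs / all PMIDs of that gene.

    This is a stand-in score for illustration, not the GRPM gene interest index.
    """
    total = defaultdict(set)
    hit = defaultdict(set)
    for r in records:
        total[r["gene"]].add(r["pmid"])
    for r in matches:
        hit[r["gene"]].add(r["pmid"])
    return {gene: len(pmids) / len(total[gene]) for gene, pmids in hit.items()}


matches = query_by_mesh(RECORDS, ["Diet", "Obesity"])
scores = matching_fraction(RECORDS, matches)

# Rank genes from the highest to the lowest matching fraction.
for gene, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{gene}: {score:.2f}")
```

Genes with no matching publication (here `GENE_C`) simply drop out of the ranking, which mirrors the idea of filtering the dataset down to the entities relevant to a chosen biomedical subject.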
To reproduce our pipeline, execute each module individually by selecting the "Open in Colab" option. Ensure that all necessary dependencies and files are imported. Google Drive synchronization is available.
Each Jupyter notebook includes commands to download and install the necessary dependencies for execution.
Comprehensive instructions for the usage of each module are found within the respective Jupyter Notebooks provided. Follow the guidelines closely and install the necessary Python packages specified for each module.
The GRPM Dataset available on Zenodo is based on LitVar1, which has since been deprecated and replaced by LitVar2. Module 1 (Dataset Builder) has been updated for compatibility with LitVar2. The other modules in the pipeline remain operational with the original GRPM Dataset available on Zenodo.
All requirements are listed in `requirements.txt` and `setup.py`.