
Causal inference for multiple risk factors and diseases from genomics data


CI-GWAS: Causal inference for multiple risk factors and diseases from genomics data

This is the official implementation of CI-GWAS, as described in our paper "Causal inference for multiple risk factors and diseases from genomics data".

Prerequisites

  • cmake >= 3.18
  • a CUDA toolkit and an NVIDIA GPU (needed to build and run cusk)
  • python >= 3.9.6, with numpy >= 1.22.1 and scipy >= 1.11.3
  • R >= 4.1.0, with dependencies:
install.packages("BiocManager")
BiocManager::install(pkgs=c("graph","Rgraphviz", "RBGL"))
install.packages(c("abind", "igraph", "ggm", "corpcor", "robustbase", "vcd", "Rcpp", "bdsmatrix", "sfsmisc", "fastICA", "clue", "MASS", "Matrix", "mvtnorm", "huge", "ggplot2", "dagitty", "pcalg"))
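
The Python-side requirements above can be checked with a short script. This is a minimal sketch using only the standard library; it does not check cmake, CUDA or the R packages, and the helper names (parse_version, check_prerequisites) are illustrative, not part of CI-GWAS.

```python
"""Check the Python-side prerequisites (a sketch, not part of CI-GWAS)."""
import sys
from importlib.metadata import PackageNotFoundError, version

# Minimum versions as listed under Prerequisites.
MIN_PYTHON = (3, 9, 6)
MIN_PACKAGES = {"numpy": (1, 22, 1), "scipy": (1, 11, 3)}


def parse_version(text):
    """Turn '1.26.4' or '2.0.0rc1' into a comparable (major, minor, patch) tuple."""
    numbers = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        numbers.append(int(digits) if digits else 0)
    return tuple(numbers + [0] * (3 - len(numbers)))


def check_prerequisites():
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if sys.version_info[:3] < MIN_PYTHON:
        problems.append(f"python {sys.version.split()[0]} < {'.'.join(map(str, MIN_PYTHON))}")
    for name, minimum in MIN_PACKAGES.items():
        try:
            installed = parse_version(version(name))
        except PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if installed < minimum:
            problems.append(f"{name} {'.'.join(map(str, installed))} < {'.'.join(map(str, minimum))}")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    print("\n".join(issues) if issues else "Python prerequisites look fine.")
```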

Installation

CI-GWAS is a collection of scripts and compiled programs bundled in a Python command-line interface.

First, clone the repository together with its submodules:

git clone --recurse-submodules https://github.com/medical-genomics-group/ci-gwas.git

The CLI should then already be accessible via

./ci-gwas.py

The cusk part of the project has to be compiled:

cd cusk
cmake -S . -B build
cmake --build build

You can then run the tests to check that everything works:

cd build && ctest

Running

Get help:

./ci-gwas.py -h

First of all, make sure that any marker data you use is LD-pruned, or at least does not contain pairs of markers with a correlation of 1.
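
Perfectly correlated markers are presumably a problem because the conditional-independence tests involve inverting correlation submatrices, which become singular when two markers have a correlation of ±1. The sketch below flags such pairs in a small, in-memory genotype matrix (individuals × markers). It is an illustration under that assumption: perfectly_correlated_pairs is a hypothetical helper, and for genome-wide data you would run it per LD block or rely on a dedicated LD-pruning tool instead.

```python
import numpy as np


def perfectly_correlated_pairs(genotypes, tol=1e-9):
    """Return (i, j) marker index pairs whose |Pearson correlation| is within tol of 1.

    genotypes: (n_individuals, n_markers) array, e.g. 0/1/2 allele counts.
    Constant markers are skipped (their correlation is undefined); they need
    to be removed separately.
    """
    x = np.asarray(genotypes, dtype=float)
    x = x - x.mean(axis=0)
    norms = np.sqrt((x ** 2).sum(axis=0))
    keep = np.flatnonzero(norms > 0)          # indices of non-constant markers
    z = x[:, keep] / norms[keep]
    corr = z.T @ z                            # correlation matrix of kept markers
    i, j = np.where(np.triu(np.abs(corr) >= 1.0 - tol, k=1))
    return [(int(keep[a]), int(keep[b])) for a, b in zip(i, j)]


if __name__ == "__main__":
    g = np.array([[0, 2, 1, 0], [1, 1, 1, 0], [2, 0, 1, 1], [0, 2, 1, 2], [1, 1, 1, 2]])
    # Marker 1 equals 2 - marker 0, so the pair (0, 1) has correlation -1.
    print(perfectly_correlated_pairs(g))
```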

A standard analysis with individual-level data consists of the following calls, in order:

  1. ci-gwas.py prep-bed to compute the means and standard deviations of all markers (make sure to exclude any markers without recorded variation; the .stds file must not contain any 0.0 entries)
  2. ci-gwas.py block to partition the LD matrix into blocks
  3. ci-gwas.py cusk (once for each block) to compute skeletons (make sure that the trait values are standardized)
  4. ci-gwas.py merge-block-outputs to merge all skeletons
  5. ci-gwas.py cuskss-merged to run another round of cuda-skeleton using only the selected markers. This step is optional, but can help to reduce the marker-trait FDR.
  6. ci-gwas.py sepselect to find separation sets
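  7. ci-gwas.py srfci to infer a partial ancestral graph (PAG)
  8. ci-gwas.py mvivw to run multivariable inverse-variance weighted (MVIVW) Mendelian randomization with the instrumental variables (IVs) inferred during skeleton construction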
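
Steps 1 and 3 place two requirements on the inputs: no marker may have zero variance, and trait values must be standardized. The sketch below illustrates both checks on in-memory NumPy arrays. It does not read PLINK .bed files and does not reproduce what prep-bed writes; the use of the population standard deviation (ddof=0) is an assumption, and marker_moments and standardize_traits are hypothetical names.

```python
import numpy as np


def marker_moments(genotypes):
    """Per-marker means and standard deviations of an (individuals x markers) array.

    Also returns the indices of markers with zero variance; these must be
    excluded, since they would appear as 0.0 entries in a .stds file.
    """
    g = np.asarray(genotypes, dtype=float)
    means = g.mean(axis=0)
    stds = g.std(axis=0)  # population SD (ddof=0): an assumption of this sketch
    constant = np.flatnonzero(stds == 0.0)
    return means, stds, constant


def standardize_traits(traits):
    """Scale each trait column (individuals x traits) to mean 0 and SD 1."""
    t = np.asarray(traits, dtype=float)
    sd = t.std(axis=0)
    if np.any(sd == 0.0):
        raise ValueError("a trait has zero variance and cannot be standardized")
    return (t - t.mean(axis=0)) / sd


if __name__ == "__main__":
    genotypes = np.array([[0, 1, 2], [2, 1, 0], [1, 1, 1]])
    _, _, constant = marker_moments(genotypes)
    print(constant)  # marker 1 never varies and should be excluded
```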
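
For orientation on the last step, the textbook fixed-effect multivariable IVW estimator regresses the marker-outcome effects on the marker-exposure effects by weighted least squares without an intercept, using weights 1/se² of the marker-outcome effects. The sketch below is that generic estimator, not necessarily what ci-gwas.py mvivw computes (its IV selection, weighting and standard errors may differ); the function and argument names are hypothetical.

```python
import numpy as np


def mvivw(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect multivariable inverse-variance weighted (MVIVW) estimate.

    beta_exposure: (n_ivs, n_exposures) marker-exposure effect sizes
    beta_outcome:  (n_ivs,) marker-outcome effect sizes
    se_outcome:    (n_ivs,) standard errors of the marker-outcome effects
    Returns (causal effect estimates, fixed-effect standard errors).
    """
    x = np.asarray(beta_exposure, dtype=float)
    y = np.asarray(beta_outcome, dtype=float)
    w = 1.0 / np.asarray(se_outcome, dtype=float) ** 2  # inverse-variance weights
    xtwx = x.T @ (w[:, None] * x)                       # X^T W X
    xtwy = x.T @ (w * y)                                # X^T W y
    estimate = np.linalg.solve(xtwx, xtwy)              # weighted least squares, no intercept
    se = np.sqrt(np.diag(np.linalg.inv(xtwx)))
    return estimate, se
```

With noise-free outcome effects generated as beta_exposure @ b, the estimate recovers b exactly, which makes a convenient smoke test.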

Alternatively, if you have correlations computed from summary-level data, you can start at step 3 with cuskss instead of cusk. In that case, make sure that

  • the traits are in the same order in the mxp (marker × trait) and pxp (trait × trait) files
  • the markers are in the same order in the mxm (marker × marker) and mxp files
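
A cheap check of matrix dimensions and marker/trait ordering before running cuskss can save a failed run. This sketch assumes the matrices are already loaded as dense NumPy arrays and that you know the marker and trait IDs in the order each file was written; the on-disk formats cuskss expects are not covered, and check_summary_inputs is a hypothetical helper.

```python
import numpy as np


def check_summary_inputs(mxm, mxp, pxp, markers_mxm, markers_mxp, traits_mxp, traits_pxp):
    """Raise ValueError if summary-level correlation inputs look inconsistent.

    mxm: (m, m) marker-marker correlations
    mxp: (m, p) marker-trait correlations
    pxp: (p, p) trait-trait correlations
    markers_* / traits_*: IDs in the order used by each file (hypothetical inputs)
    """
    m, p = np.shape(mxp)
    if np.shape(mxm) != (m, m):
        raise ValueError(f"mxm has shape {np.shape(mxm)}, expected {(m, m)}")
    if np.shape(pxp) != (p, p):
        raise ValueError(f"pxp has shape {np.shape(pxp)}, expected {(p, p)}")
    if len(markers_mxp) != m or len(traits_mxp) != p:
        raise ValueError("ID lists do not match the dimensions of mxp")
    if list(markers_mxm) != list(markers_mxp):
        raise ValueError("markers are not in the same order in mxm and mxp")
    if list(traits_mxp) != list(traits_pxp):
        raise ValueError("traits are not in the same order in mxp and pxp")
```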

In addition, when including binary or ordinal traits, use cuskss-het with heterogeneous correlations (e.g. polyserial and polychoric correlations) instead of cusk or cuskss.
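
A heterogeneous correlation matrix mixes Pearson, polyserial (continuous vs. ordinal) and polychoric (ordinal vs. ordinal) correlations; these are usually estimated by maximum likelihood, e.g. with hetcor from the R package polycor. To illustrate the idea, the sketch below uses the simpler two-step ("ad hoc") polyserial estimator, ρ = r_xy·σ_y / Σ_j φ(τ_j), which rescales the Pearson correlation under the assumption that the ordinal trait is a thresholded normal latent variable. It is an illustration only; which estimator CI-GWAS expects for its inputs is not specified here, and adhoc_polyserial is a hypothetical name.

```python
import numpy as np
from scipy.stats import norm


def adhoc_polyserial(x, y):
    """Two-step polyserial correlation between continuous x and ordinal y.

    y must be coded as consecutive integers 0, 1, ..., k-1 (a binary trait is 0/1).
    Assumes (x, latent y*) are bivariate normal and y counts the thresholds below y*.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r_xy = np.corrcoef(x, y)[0, 1]                 # Pearson correlation with the coded y
    levels = np.unique(y)
    # Threshold tau_j = Phi^{-1}(P(y < level_j)) for every level except the lowest.
    cum = np.array([np.mean(y < lvl) for lvl in levels[1:]])
    tau = norm.ppf(cum)
    return float(r_xy * y.std() / norm.pdf(tau).sum())
```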

Common Errors

"invalid device function" or "no kernel image is available for execution on the device"

Either error can be caused by the selected device having a lower GPU Compute Capability than the one cusk was compiled for. The Compute Capability targeted by the build is specified in the top-level CMakeLists.txt. If the machine has multiple devices and only some of them have a sufficient Compute Capability, select one by setting CUDA_VISIBLE_DEVICES=X, where X is the index of the device. The device list can be inspected with nvidia-smi -L. Otherwise, change the target in CMakeLists.txt to match your device and rebuild cusk.
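
On machines with several GPUs, the device selection can be scripted. This sketch parses the output of nvidia-smi --query-gpu=index,compute_cap --format=csv,noheader (the compute_cap field is only available in recent NVIDIA drivers) and prints shell settings that expose only sufficiently capable devices. The required capability (8, 0) is a placeholder; use the value from CMakeLists.txt. It also sets CUDA_DEVICE_ORDER=PCI_BUS_ID, because by default CUDA may number devices differently from nvidia-smi.

```python
import subprocess


def capable_devices(csv_text, required):
    """Return indices of GPUs whose compute capability is at least `required`.

    csv_text: output of `nvidia-smi --query-gpu=index,compute_cap --format=csv,noheader`
    required: (major, minor) tuple, compared as a tuple rather than a float
    """
    indices = []
    for line in csv_text.strip().splitlines():
        index, cap = (field.strip() for field in line.split(","))
        if tuple(int(part) for part in cap.split(".")) >= required:
            indices.append(int(index))
    return indices


if __name__ == "__main__":
    REQUIRED = (8, 0)  # placeholder: use the Compute Capability from CMakeLists.txt
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,compute_cap", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    devices = capable_devices(out, REQUIRED)
    if not devices:
        raise SystemExit("no GPU meets the required Compute Capability")
    # PCI_BUS_ID ordering makes CUDA's device indices match nvidia-smi's.
    print(f"export CUDA_DEVICE_ORDER=PCI_BUS_ID CUDA_VISIBLE_DEVICES={','.join(map(str, devices))}")
```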
