Home
Welcome to the gmrm wiki!
gmrm is hybrid-parallel software for a Bayesian grouped mixture of regressions model for genome-wide association studies (GWAS). It is written in C++ with extensive optimisation and code vectorisation. It relies on PLINK's .bed format for genotype data and can handle multiple traits simultaneously.
gmrm is designed exclusively for HPC and is best supported by Intel compilers on architectures with AVX2 or later. It can be run on a single compute node, or across multiple compute nodes in a distributed environment through MPI.
Because MPI set-ups differ across HPC systems, we cannot provide an executable for Linux-based systems. We strongly recommend compiling it yourself or asking a system administrator to do so on your behalf. Guidance for doing this is provided here: Compiling
gmrm has two main steps:
- Obtaining a posterior distribution of SNP marker effects, group-specific variance components, and group-specific mixture components with a Bayesian grouped mixture of regressions model. The groupings of the SNP markers are specified by the user, as is the number of Gaussian mixtures used to estimate the SNP marker effects within each group. Groups are independent. The posterior mean SNP estimates can be used for genomic prediction, and their full posterior distribution can be used for Bayesian fine-mapping or probabilistic association testing across the genome.
- Using the Bayesian estimates of the SNP marker effects to obtain approximations of frequentist mixed-linear association model test statistics. This yields marginal SNP estimates, because markers are tested one at a time while adjusting for the effects of markers in other regions of the genome. By specifying the number of MPI processes used, the data can be divided into user-specified genomic segments.
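To illustrate the idea of dividing the markers into one segment per MPI process, here is a minimal Python sketch. It is not gmrm's actual partitioning scheme: the function name `split_markers` and the even, contiguous split are assumptions made purely for illustration, and in practice the segments are specified by the user.

```python
def split_markers(n_markers: int, n_procs: int) -> list[tuple[int, int]]:
    """Return half-open [start, end) marker ranges, one per process.

    Hypothetical helper: an even contiguous split, used only to show
    how a marker range can be assigned to each process.
    """
    if n_procs < 1:
        raise ValueError("n_procs must be at least 1")
    base, extra = divmod(n_markers, n_procs)
    ranges = []
    start = 0
    for rank in range(n_procs):
        # The first `extra` ranks each take one additional marker.
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


# Toy example: 10 markers divided across 3 processes.
print(split_markers(10, 3))  # [(0, 4), (4, 7), (7, 10)]
```

Each process would then test only the markers in its own range, which is why the number of processes determines how the data are divided.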
An example of a typical workflow is given on the following page: Workflow
The output of the first step of gmrm consists of:
- a .csv file with the model's hyperparameters (variance explained by group, residual variance, SNP-heritability and markers in the model per sample). Each column of the file represents a hyperparameter and each row a posterior sample.
- a binary file with extension .bet, which stores the effect sizes per SNP per posterior sample. This file needs post-processing before it can be used in downstream analyses.
- a binary file with extension .cpn, which stores the allocation to mixture per SNP per posterior sample. This file also needs post-processing before it can be used in downstream analyses.
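Because each row of the .csv file is a posterior sample and each column a hyperparameter, a posterior summary is a column-wise calculation. The sketch below shows one way this could be done in Python. It assumes the file contains only numeric values with no header row, and the function name `posterior_summary` and the toy values are hypothetical; check the layout of your own output before relying on it.

```python
import csv
import io
import statistics


def posterior_summary(csv_text: str) -> list[tuple[float, float]]:
    """Return (posterior mean, posterior standard deviation) per column.

    Assumes purely numeric rows with no header (an assumption, not a
    documented property of gmrm's output).
    """
    rows = [
        [float(value) for value in row]
        for row in csv.reader(io.StringIO(csv_text))
        if row  # skip blank lines
    ]
    columns = zip(*rows)  # transpose: one tuple per hyperparameter
    return [(statistics.fmean(col), statistics.stdev(col)) for col in columns]


# Toy data: three posterior samples (rows) of two hypothetical hyperparameters.
example = "0.40,0.60\n0.50,0.50\n0.60,0.40\n"
for mean, sd in posterior_summary(example):
    print(f"mean={mean:.3f} sd={sd:.3f}")
```

For a real file, the text would be read from disk (for example with `open(path).read()`) rather than taken from a string.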
The output of the second step of gmrm consists of a single .mlma file with the marginal mixed-linear association model regression coefficient estimates, their standard errors, the associated t-statistics, and p-values. The markers are in the same order as in the .bim file.
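As a reminder of how these quantities relate, the test statistic is the coefficient estimate divided by its standard error, and the p-value is the corresponding two-sided tail probability. The Python sketch below uses a standard normal approximation, which is reasonable for the large sample sizes typical of GWAS. It is not necessarily the exact calculation gmrm performs, and the function name `wald_test` and the input values are hypothetical.

```python
import math


def wald_test(beta: float, se: float) -> tuple[float, float]:
    """Return (test statistic, two-sided p-value) for one marker.

    Uses a normal approximation to the t-distribution (an assumption
    that holds well when the sample size is large).
    """
    stat = beta / se
    # P(|Z| > |stat|) for a standard normal Z equals erfc(|stat| / sqrt(2)).
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p_value


# Toy example: an estimate of 0.02 with a standard error of 0.005.
stat, p_value = wald_test(0.02, 0.005)
print(f"statistic={stat:.2f} p={p_value:.2e}")
```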
We show how to process the binary output (the .bet and .cpn files) on the following page: Workflow
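Once the binary output has been post-processed into per-sample values, typical per-SNP summaries are the posterior mean effect and the proportion of posterior samples in which the SNP is in the model. The Python sketch below assumes the post-processed data are available as two nested lists indexed as `[sample][snp]`, and that mixture allocation `0` means the SNP is excluded from the model in that sample. Both the data layout and that coding are assumptions for illustration only, as is the function name `summarise_snps`.

```python
def summarise_snps(
    effects: list[list[float]], components: list[list[int]]
) -> tuple[list[float], list[float]]:
    """Return (posterior mean effect, inclusion proportion) for each SNP.

    effects[s][j]    : effect size of SNP j in posterior sample s
    components[s][j] : mixture allocation of SNP j in sample s, where 0 is
                       assumed (hypothetically) to mean "not in the model"
    """
    n_samples = len(effects)
    n_snps = len(effects[0])
    post_mean = [
        sum(effects[s][j] for s in range(n_samples)) / n_samples
        for j in range(n_snps)
    ]
    inclusion = [
        sum(1 for s in range(n_samples) if components[s][j] != 0) / n_samples
        for j in range(n_snps)
    ]
    return post_mean, inclusion


# Toy data: three posterior samples of two SNPs.
effects = [[0.0, 0.10], [0.0, 0.12], [0.03, 0.08]]
components = [[0, 1], [0, 2], [1, 1]]
post_mean, inclusion = summarise_snps(effects, components)
print(post_mean, inclusion)
```

The posterior means correspond to the estimates used for genomic prediction, while the inclusion proportions are one simple summary that could feed into probabilistic association testing.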
If you find gmrm useful for your research, please cite us:
We are a small research group and welcome anyone who wishes to assist us in the development of this software.