Complex Trait Genetics Group edited this page Jul 26, 2021 · 12 revisions

Welcome to the gmrm wiki!

gmrm is hybrid-parallel software for a Bayesian grouped mixture of regressions model for genome-wide association studies (GWAS). It is written in C++ with extensive optimisation and code vectorisation. It relies on PLINK's .bed format and can handle multiple traits simultaneously.
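For readers unfamiliar with the .bed format, the following is a minimal Python sketch of how a SNP-major PLINK 1 .bed byte stream maps to genotypes. It is not taken from gmrm, which is written in C++; the function and variable names are illustrative assumptions, and the sketch only covers the publicly documented PLINK 1 layout (a three-byte header, then two bits per genotype with four samples packed per byte).

```python
# Sketch only: decode PLINK 1 .bed bytes (SNP-major) into A1 allele counts.
# Names here are illustrative assumptions and are not part of gmrm.

BED_MAGIC = bytes([0x6C, 0x1B, 0x01])  # two magic bytes + SNP-major flag

# 2-bit genotype code -> number of A1 alleles (None marks a missing call)
CODE_TO_A1_COUNT = {0b00: 2, 0b01: None, 0b10: 1, 0b11: 0}


def decode_bed(data: bytes, n_samples: int, n_snps: int):
    """Return n_snps lists, each holding n_samples A1 allele counts."""
    if data[:3] != BED_MAGIC:
        raise ValueError("not a SNP-major PLINK 1 .bed byte stream")
    bytes_per_snp = (n_samples + 3) // 4  # 4 genotypes per byte, padded
    if len(data) != 3 + bytes_per_snp * n_snps:
        raise ValueError("size does not match n_samples and n_snps")

    genotypes = []
    for j in range(n_snps):
        start = 3 + j * bytes_per_snp
        block = data[start:start + bytes_per_snp]
        snp = []
        for i in range(n_samples):
            # The lowest-order bit pair of each byte holds the first sample.
            code = (block[i // 4] >> (2 * (i % 4))) & 0b11
            snp.append(CODE_TO_A1_COUNT[code])
        genotypes.append(snp)
    return genotypes
```

The number of samples and SNPs is not stored in the .bed file itself; it comes from the line counts of the accompanying .fam and .bim files.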

Getting started

gmrm is designed exclusively for HPC systems and is best supported by Intel compilers on architectures with AVX2 or later. It can be run either on a single compute node or, through MPI, across multiple compute nodes in a distributed environment.

Because MPI set-ups differ across HPC systems, we cannot provide an executable for Linux-based systems. We strongly recommend compiling gmrm yourself or asking a system administrator to do so on your behalf. We provide guidance for doing this here: Compiling

Documentation

gmrm has two main steps:

  • Obtaining a posterior distribution of SNP marker effects, group-specific variance components, and group-specific mixture components with a Bayesian grouped mixture of regressions model. The groupings of the SNP markers are specified by the user, as is the number of Gaussian mixture components used to estimate the SNP marker effects within each group. Groups are independent. The posterior mean SNP effect estimates can be used for genomic prediction, and their full posterior distribution can be used for Bayesian fine-mapping or probabilistic association testing across the genome.

  • Using the Bayesian estimates of the SNP marker effects to obtain approximations of frequentist mixed-linear association model test statistics. This yields marginal SNP estimates, as markers are tested one at a time while adjusting for the effects of markers in other regions of the genome. By specifying the number of MPI processes used, the data can be divided into user-specified genomic segments.
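To illustrate how the number of processes can determine a division of the markers into segments, here is a minimal Python sketch that splits marker indices into contiguous, near-equal blocks, one per process. This is an assumption made for illustration only: the source does not state how gmrm assigns markers to MPI processes, and split_markers is a hypothetical helper, not a gmrm function.

```python
# Sketch only: one simple way to assign contiguous marker blocks to processes.
# split_markers is hypothetical; gmrm's actual partitioning may differ.

def split_markers(n_markers: int, n_ranks: int):
    """Split indices 0..n_markers-1 into n_ranks half-open (start, end) ranges."""
    if n_ranks < 1:
        raise ValueError("n_ranks must be at least 1")
    base, extra = divmod(n_markers, n_ranks)
    ranges = []
    start = 0
    for rank in range(n_ranks):
        # The first `extra` ranks take one additional marker each.
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges
```

With 10 markers and 3 processes this gives the ranges (0, 4), (4, 7) and (7, 10), so every marker belongs to exactly one segment.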

An example of a typical workflow is given on the following page: Workflow

Output

The output of the first step of gmrm consists of:

  • a .csv file with the model's hyperparameters (variance explained by each group, residual variance, SNP-heritability and the number of markers in the model for each posterior sample). Each column of the file represents a hyperparameter and each row a posterior sample.

  • a binary file with extension .bet, which stores the effect size of each SNP for each posterior sample. This file needs post-processing before it can be used in downstream analyses.

  • a binary file with extension .cpn, which stores the mixture component each SNP is allocated to in each posterior sample. This file also needs post-processing before it can be used in downstream analyses.
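Once the per-sample effect sizes and mixture allocations have been decoded into arrays (the binary layout of .bet and .cpn is not described on this page), typical summaries are the posterior mean effect and the posterior inclusion probability of each SNP. The Python sketch below shows one way to compute them. It assumes that mixture index 0 means the SNP is excluded from the model in that sample, which is an assumption rather than something stated here, and summarise_snp_posterior is a hypothetical helper.

```python
# Sketch only: summarise already-decoded posterior samples per SNP.
# Assumes component index 0 means "not in the model" (an assumption).
from statistics import fmean


def summarise_snp_posterior(effects, components):
    """effects[s][j] and components[s][j] refer to sample s and SNP j.

    Returns (posterior_means, inclusion_probabilities), one value per SNP.
    """
    n_samples = len(effects)
    if n_samples == 0 or len(components) != n_samples:
        raise ValueError("need matching, non-empty sample lists")
    n_snps = len(effects[0])

    # Posterior mean effect: average over posterior samples.
    post_mean = [
        fmean(effects[s][j] for s in range(n_samples)) for j in range(n_snps)
    ]
    # Inclusion probability: share of samples with a non-zero component.
    pip = [
        sum(1 for s in range(n_samples) if components[s][j] != 0) / n_samples
        for j in range(n_snps)
    ]
    return post_mean, pip
```

In practice, early posterior samples are usually discarded as burn-in before computing such summaries; how many to drop depends on the run.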

The output of the second step of gmrm consists of a single .mlma file with the marginal mixed-linear association model regression coefficient estimates, their standard errors, the associated t-statistics, and p-values. Markers are listed in the same order as in the .bim file.
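As a reminder of how these columns relate, the test statistic is the coefficient estimate divided by its standard error, and the p-value is the two-sided tail probability of that statistic. The Python sketch below uses a standard normal reference distribution, which closely approximates the t-distribution at GWAS sample sizes. The exact reference distribution used by gmrm is not stated here, so treat this as an approximation; wald_test is a hypothetical helper.

```python
# Sketch only: statistic and two-sided p-value from an estimate and its SE.
# Uses a normal approximation; gmrm's exact computation is not stated here.
import math


def wald_test(beta: float, se: float):
    """Return (statistic, two_sided_p_value) for estimate beta with error se."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    stat = beta / se
    # For a standard normal Z, P(|Z| >= |stat|) equals erfc(|stat| / sqrt(2)).
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p_value
```

For example, an estimate of 1.96 with a standard error of 1.0 gives a p-value of roughly 0.05.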

We show how to process the binary .bet and .cpn output on the following page: Workflow

Citing

If you find gmrm useful for your research, please cite us:

We are a small research group and welcome anyone who wishes to assist us in the development of this software.
