Skip to content

Workflow

orliac edited this page Jul 22, 2021 · 10 revisions

Workflow

An analysis using gmrm has the following steps:

  1. Run the model
  2. Estimate the MLMA effects
  3. Convert binary output to summarise the posterior

Dataset

You will find a simulated test.bed/.bim/.fam fileset with 20,000 markers and 10,000 individuals in gmrm/example.

We simulated two different phenotypes in test1.phen and test2.phen.

The generation of the data is described in gmrm/example/data_sim.R

Running Step 1

We are going to run the model for two traits simultaneously. A full list of options can be found here: List of Options

Assuming Slurm as job scheduler, create a file test_gmrm.sh along these lines:

#!/bin/bash

#SBATCH --ntasks 5
#SBATCH --cpus-per-task 4
#SBATCH --mem 1G
#SBATCH --time 01:00:00

module load gcc mvapich2 boost

srun gmrm \
--bed-file /path/to/test.bed \
--dim-file /path/to/test.dim \
--phen-files /path/to/test1.phen,/path/to/test2.phen \
--group-index-file /path/to/test.gri \
--group-mixture-file /path/to/test.grm \
--shuffle-markers 1 \
--seed 171014 \
--iterations 500 \

Then submit it to a slurm scheduler system:

sbatch test_gmrm.sh

Running Step 2

We are then going to generate mixed-linear model marginal SNP effect size estimates

Assuming Slurm as job scheduler, create a file test_mlma.sh along these lines:

#!/bin/bash

#SBATCH --ntasks 5
#SBATCH --cpus-per-task 4
#SBATCH --mem 1G
#SBATCH --time 01:00:00

module load gcc mvapich2 boost

srun gmrm \
--bed-file /path/to/test.bed \
--dim-file /path/to/test.dim \
--phen-files /path/to/test1.phen,/path/to/test2.phen \
--bim-file /path/to/test.bim \
--ref-bim-file /path/to/test.bim \
--predict \

Then submit it to a slurm scheduler system:

sbatch test_mlma.sh

Step 3 - post processing

We provide an executable to extract the posterior distribution of the marker effects from binary format


./extract_non_zero_betaAll test.bet start stop > test_tmp.betlong

where start is the starting iteration, and stop is the number of iterations.

Then a little file cleaning:


awk '{$1+=0}1' test_tmp.betLong > test.betLong
rm test_tmp.betLong

This produces a file with three columns:

  • iteration number
  • marker number, corresponding to the row of the .bim file
  • effect size estimate of the marker at the iteration

Markers are only included in the file if they were sampled from the non-zero mixture distribution of their group at each iteration. One can then simply sum the effects for each marker across iterations and divide by the total number of iterations to obtain the posterior mean.

Clone this wiki locally