Workflow

An analysis using gmrm has the following steps:

Run the model
Estimate the MLMA effects
Convert binary output to summarise the posterior

Dataset

You will find a simulated test.bed/.bim/.fam fileset with 20,000 markers and 10,000 individuals in gmrm/example.

We simulated two different phenotypes in test1.phen and test2.phen.

The generation of the data is described in gmrm/example/data_sim.R

Running Step 1

We are going to run the model for two traits simultaneously. A full list of options can be found here: List of Options

Assuming Slurm as job scheduler, create a file test_gmrm.sh along these lines:

#!/bin/bash

#SBATCH --ntasks 5
#SBATCH --cpus-per-task 4
#SBATCH --mem 1G
#SBATCH --time 01:00:00

module load gcc mvapich2 boost

srun gmrm \
--bed-file /path/to/test.bed \
--dim-file /path/to/test.dim \
--phen-files /path/to/test1.phen,/path/to/test2.phen \
--group-index-file /path/to/test.gri \
--group-mixture-file /path/to/test.grm \
--shuffle-markers 1 \
--seed 171014 \
--iterations 500 \

Then submit it to a slurm scheduler system:

sbatch test_gmrm.sh

Running Step 2

We are then going to generate mixed-linear model marginal SNP effect size estimates

Assuming Slurm as job scheduler, create a file test_mlma.sh along these lines:

#!/bin/bash

#SBATCH --ntasks 5
#SBATCH --cpus-per-task 4
#SBATCH --mem 1G
#SBATCH --time 01:00:00

module load gcc mvapich2 boost

srun gmrm \
--bed-file /path/to/test.bed \
--dim-file /path/to/test.dim \
--phen-files /path/to/test1.phen,/path/to/test2.phen \
--bim-file /path/to/test.bim \
--ref-bim-file /path/to/test.bim \
--predict \

Then submit it to a slurm scheduler system:

sbatch test_mlma.sh

Step 3 - post processing

We provide an executable to extract the posterior distribution of the marker effects from binary format


./extract_non_zero_betaAll test.bet start stop > test_tmp.betlong

where start is the starting iteration, and stop is the number of iterations.

Then a little file cleaning:


awk '{$1+=0}1' test_tmp.betLong > test.betLong
rm test_tmp.betLong

This produces a file with three columns:

iteration number
marker number, corresponding to the row of the .bim file
effect size estimate of the marker at the iteration

Markers are only included in the file if they were sampled from the non-zero mixture distribution of their group at each iteration. One can then simply sum the effects for each marker across iterations and divide by the total number of iterations to obtain the posterior mean.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Workflow

Workflow

Running Step 1

Running Step 2

Step 3 - post processing

Clone this wiki locally