-
Notifications
You must be signed in to change notification settings - Fork 1
Workflow
An analysis using gmrm
has the following steps:
- Run the model
- Estimate the MLMA effects
- Convert binary output to summarise the posterior
Dataset
You will find a simulated test.bed/.bim/.fam
fileset with 20,000 markers and
10,000 individuals in gmrm/example
.
We simulated two different phenotypes in test1.phen
and test2.phen
.
The generation of the data is described in gmrm/example/data_sim.R
We are going to run the model for two traits simultaneously. A full list of options can be found here: List of Options
Assuming Slurm as job scheduler, create a file test_gmrm.sh
along these lines:
#!/bin/bash
#SBATCH --ntasks 5
#SBATCH --cpus-per-task 4
#SBATCH --mem 1G
#SBATCH --time 01:00:00
module load gcc mvapich2 boost
srun gmrm \
--bed-file /path/to/test.bed \
--dim-file /path/to/test.dim \
--phen-files /path/to/test1.phen,/path/to/test2.phen \
--group-index-file /path/to/test.gri \
--group-mixture-file /path/to/test.grm \
--shuffle-markers 1 \
--seed 171014 \
--iterations 500 \
Then submit it to a slurm scheduler system:
sbatch test_gmrm.sh
We are then going to generate mixed-linear model marginal SNP effect size estimates
Assuming Slurm as job scheduler, create a file test_mlma.sh
along these lines:
#!/bin/bash
#SBATCH --ntasks 5
#SBATCH --cpus-per-task 4
#SBATCH --mem 1G
#SBATCH --time 01:00:00
module load gcc mvapich2 boost
srun gmrm \
--bed-file /path/to/test.bed \
--dim-file /path/to/test.dim \
--phen-files /path/to/test1.phen,/path/to/test2.phen \
--bim-file /path/to/test.bim \
--ref-bim-file /path/to/test.bim \
--predict \
Then submit it to a slurm scheduler system:
sbatch test_mlma.sh
We provide an executable to extract the posterior distribution of the marker effects from binary format
./extract_non_zero_betaAll test.bet start stop > test_tmp.betlong
where start is the starting iteration, and stop is the number of iterations.
Then a little file cleaning:
awk '{$1+=0}1' test_tmp.betLong > test.betLong
rm test_tmp.betLong
This produces a file with three columns:
- iteration number
- marker number, corresponding to the row of the .bim file
- effect size estimate of the marker at the iteration
Markers are only included in the file if they were sampled from the non-zero mixture distribution of their group at each iteration. One can then simply sum the effects for each marker across iterations and divide by the total number of iterations to obtain the posterior mean.