A robust pipeline for constructing mosaic reference standards. The reference standards consist of 386,613 mosaic SNVs and INDELs spanning a wide range of variant allele frequencies (VAFs), from 0.5% to 56%. These abundant positive controls are accompanied by negative controls: 35,113,417 non-variant positions and 19,936 germline SNVs and INDELs. The reference standards were constructed by mixing the genetic materials of six pre-genotyped normal cell lines, mimicking the cumulative acquisition of mosaic variants during early development.
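Mixing pre-genotyped cell lines gives each variant a predictable VAF. The sketch below shows the arithmetic under simplifying assumptions: every cell line contributes DNA in a known proportion, and all lines have the same copy number at the site. The expected VAF is then the sum, over carrier lines, of the mixing proportion times the fraction of that line's alleles that carry the variant (0.5 for heterozygous, 1.0 for homozygous in a diploid region). The cell-line names `A` to `F` and the proportions are hypothetical and are not the mixing ratios used by this pipeline.

```python
from typing import Dict


def expected_vaf(proportions: Dict[str, float], alt_fraction: Dict[str, float]) -> float:
    """Expected VAF of one variant in a mixture of cell lines (copy-number-neutral site).

    proportions:  share of the mixture's DNA contributed by each cell line (must sum to 1).
    alt_fraction: fraction of a line's alleles carrying the variant at this site
                  (0.5 = heterozygous, 1.0 = homozygous alt); lines not listed are wildtype.
    """
    total = sum(proportions.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"mixing proportions must sum to 1, got {total}")
    unknown = set(alt_fraction) - set(proportions)
    if unknown:
        raise ValueError(f"cell lines missing from the mixture: {sorted(unknown)}")
    # Weighted sum of each line's allele fraction by its share of the mixture.
    return sum(p * alt_fraction.get(line, 0.0) for line, p in proportions.items())


# Hypothetical six-line mixture (illustrative only).
mix = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.0625, "E": 0.03125, "F": 0.03125}
print(expected_vaf(mix, {"F": 0.5}))            # heterozygous in the rarest line only
print(expected_vaf(mix, {"A": 0.5, "B": 0.5}))  # heterozygous in the two major lines
```

Because a variant shared by several lines accumulates their proportions, this additive model is one way to see how mixing mimics the cumulative acquisition of mosaic variants across a developmental lineage.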
- Alignment and preprocessing
- Strelka2
- DeepVariant
- CNVkit
- Positive controls
  + Mutually exclusive germline variants
- Negative controls
  + Common wildtype positions (Set A)
  + MRC5 germline variants (Set B)
- Down-sampling of MRC5 (39 times with random seeds)
- Extraction of reads carrying positive controls from Set A
- Replacement of extracted reads into the down-sampled MRC5 data
- Sequencing coverage
- Variant coverage
- High-quality alternative alleles
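The control categories listed above can be sketched as a rule over per-site genotypes. This is a minimal illustration under stated assumptions, not the pipeline's actual selection logic: genotypes are encoded as hypothetical alt-allele counts (0, 1 or 2), "mutually exclusive" is read as "carried by exactly one cell line", Set A as "wildtype in every line including MRC5", and Set B as "carried by MRC5". The function name `classify_site` and the returned labels are invented for this example.

```python
from typing import Dict, Optional


def classify_site(genotypes: Dict[str, int], background: str = "MRC5") -> Optional[str]:
    """Assign one site to a control category (hypothetical rules, see lead-in).

    genotypes: alt-allele count per cell line at this site (0 = hom ref, 1 = het, 2 = hom alt).
    background: the line whose germline variants form Set B (MRC5 in this README).
    """
    # Germline variants of the background line are negative controls (Set B).
    if genotypes.get(background, 0) > 0:
        return "set_B_background_germline"
    carriers = [line for line, gt in genotypes.items() if gt > 0]
    # Wildtype in every genotyped line: common wildtype position (Set A).
    if not carriers:
        return "set_A_common_wildtype"
    # Carried by exactly one line: assumed meaning of "mutually exclusive".
    if len(carriers) == 1:
        return "positive_mutually_exclusive"
    # Shared by several non-background lines: not used by this sketch.
    return None


print(classify_site({"A": 1, "B": 0, "MRC5": 0}))
```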
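The MRC5 down-sampling step (39 replicates with random seeds) can be sketched with `samtools view -s`, whose `INT.FRAC` argument uses the integer part as the subsampling seed and the fractional part as the fraction of read pairs to keep. The helper below only builds the command lines; the input file name, the fraction `0.25`, the output naming scheme and the master seed are hypothetical, and the fraction actually used by this pipeline is not stated here.

```python
import random
from typing import List


def downsample_commands(bam: str, fraction: float, n_replicates: int = 39,
                        master_seed: int = 0) -> List[List[str]]:
    """Build one `samtools view -s SEED.FRAC` command per replicate (hypothetical helper)."""
    # The fraction is written with 6 decimals, so it must survive that rounding.
    if not 1e-6 <= fraction <= 0.999999:
        raise ValueError("fraction must lie in [1e-6, 0.999999]")
    rng = random.Random(master_seed)                 # reproducible seed generation
    seeds = rng.sample(range(1, 2**16), n_replicates)  # distinct seed per replicate
    frac_digits = f"{fraction:.6f}".split(".")[1]    # e.g. 0.25 -> "250000"
    cmds = []
    for i, seed in enumerate(seeds, start=1):
        out = f"MRC5.ds{i:02d}.seed{seed}.bam"       # hypothetical output name
        cmds.append(["samtools", "view", "-b", "-s", f"{seed}.{frac_digits}",
                     "-o", out, bam])
    return cmds


for cmd in downsample_commands("MRC5.bam", 0.25)[:2]:
    print(" ".join(cmd))
```

Each command could then be executed with `subprocess.run(cmd, check=True)`. Drawing the seeds from a single master seed keeps all 39 replicates reproducible while guaranteeing that no two replicates share a seed.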
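The coverage and high-quality alternative allele steps amount to counting pileup bases at each site. The sketch below assumes a simple in-memory pileup record and hypothetical quality cutoffs (base quality and mapping quality of at least 20); it handles SNVs only, and the thresholds are illustrative rather than the pipeline's settings. Depth counts every base, while only bases passing both cutoffs contribute to the reference and alternative counts used for the VAF.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class PileupBase:
    """One read's base at a site (hypothetical record, not a pysam type)."""
    base: str
    base_quality: int
    mapping_quality: int


def allele_counts(pileup: Iterable[PileupBase], ref: str, alt: str,
                  min_bq: int = 20, min_mq: int = 20) -> Tuple[int, int, int]:
    """Return (raw depth, high-quality ref count, high-quality alt count) for an SNV."""
    depth = ref_n = alt_n = 0
    for b in pileup:
        depth += 1  # sequencing coverage counts every base
        if b.base_quality < min_bq or b.mapping_quality < min_mq:
            continue  # low-quality bases are excluded from allele counts
        if b.base.upper() == ref.upper():
            ref_n += 1
        elif b.base.upper() == alt.upper():
            alt_n += 1
    return depth, ref_n, alt_n


def hq_vaf(ref_n: int, alt_n: int) -> float:
    """VAF from high-quality counts; 0.0 when no high-quality base is present."""
    total = ref_n + alt_n
    return alt_n / total if total else 0.0


site = [PileupBase("A", 30, 60), PileupBase("A", 30, 60), PileupBase("G", 35, 60),
        PileupBase("G", 10, 60), PileupBase("G", 30, 5)]
print(allele_counts(site, ref="A", alt="G"))  # two of the three G bases fail a cutoff
```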