Output Formats
The Picky pipeline produces two sets of output.
In Picky's selectRep step, the representative alignments selected for each read are recorded in an .align file. This text file records the parameters needed for .sam file generation and, most importantly, the alignments selected from all lastal-generated alignments and chained into the representative alignments for each read.
The lastal version and sequence database are recorded at the beginning of the file to help with .sam file generation in the callSV step.
# @PG_ID lastal
# @PG_PN lastal
# @PG_VN 755
# @PG_DB hg19.lastdb
# @PG_END
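As a rough illustration, the header lines above could be collected into a dictionary as sketched below. The function name `parse_align_header` is hypothetical and not part of Picky; the sketch assumes every header line has the form `# @PG_<KEY> <value>` and that `# @PG_END` closes the header, as in the example.

```python
def parse_align_header(lines):
    """Collect '# @PG_<KEY> <value>' lines into a dict, stopping at '# @PG_END'."""
    header = {}
    prefix = "# @PG_"
    for line in lines:
        line = line.strip()
        if not line.startswith(prefix):
            continue  # not a header line
        fields = line[len(prefix):].split(None, 1)
        key = fields[0]
        if key == "END":
            break  # end-of-header marker
        header[key] = fields[1] if len(fields) > 1 else ""
    return header


# The header shown above, used as sample input.
sample_header = """# @PG_ID lastal
# @PG_PN lastal
# @PG_VN 755
# @PG_DB hg19.lastdb
# @PG_END
""".splitlines()

header = parse_align_header(sample_header)
print(header["VN"], header["DB"])
```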
For each read for which lastal returns a list of possible alignments, Picky selectRep records some general information along with the chosen alignment candidate(s).
# 23535 fmh_15l4526_20161102_FNFAB42810_MN16457_sequencing_run_161102_Human_genomic_run4_LSK108R9_4_36382_ch269_read7280_strand.fast5 (X) align(110,60) seed(3) nonseed(2)
# score EG2 E = %= X %X D %D I %I qStart qEnd qStrand qALen q% refId refStart refEnd refStrand refALen cigar
### candidate#1/1
S6802 0 0 9959 88.12 277 2.45 657 5.81 409 3.62 2 10647 + 10645 45.23 chr20 40787508 40798401 + 10893 2S18=2D11=...<snipped>...2I2X37=12888S
E3460 0 0 5134 88.12 158 2.71 278 4.77 256 4.39 10629 16177 + 5548 23.57 chr20 40798415 40803985 + 5570 10629S17=1X1=...<snipped>...8=1X9=7358S
e512 5.5e-280 7.1e-284 964 83.10 47 4.05 91 7.84 58 5.00 16225 17294 + 1069 4.54 chr20 40804046 40805148 + 1102 16225S12=1X1D...<snipped>...31=1D11=6241S
E3919 0 0 5716 88.55 189 2.93 317 4.91 233 3.61 17360 23498 + 6138 26.08 chr20 40805236 40811458 + 6222 17360S41=1X23=...<snipped>...11=1D12=37S
The first line is the general information. It starts with the length of the read, followed by the read id. The third cell indicates the overall selection result, as follows.
3rd cell | Description |
---|---|
{!!!} | there is no alignment left after filtering |
(1) | single fragment single-locus alignment; no SV possible |
(X) | Multi-fragments with single locus; possible SV |
[ ] | Multi-fragments with multi-loci; possible SV but non-unique alignment location |
The fourth cell align(x,y) reports the total number of alignments (x) reported by lastal and the total number of alignments left (y) after filtering by EG2/E-value and/or %Identity.
The fifth and sixth cells report the number of collated seed alignments in seed(..) and the number of collated non-seed alignments in nonseed(..).
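The general information line can be picked apart with a regular expression, as in the minimal sketch below. The names `INFO_RE`, `RESULT_MEANING`, and `parse_info_line` are hypothetical. The pattern assumes all six cells are present and separated by whitespace; whether lines marked `{!!!}` still carry the align/seed/nonseed cells is not stated here, so such lines may need a looser pattern.

```python
import re

# Meaning of the third cell, taken from the table above.
RESULT_MEANING = {
    "{!!!}": "no alignment left after filtering",
    "(1)": "single fragment, single locus; no SV possible",
    "(X)": "multiple fragments, single locus; possible SV",
    "[ ]": "multiple fragments, multiple loci; possible SV, non-unique location",
}

# '[ ]' contains a space, so the line cannot simply be split on whitespace.
INFO_RE = re.compile(
    r"^#\s+(?P<length>\d+)\s+(?P<read_id>\S+)\s+"
    r"(?P<result>\{!!!\}|\(1\)|\(X\)|\[ \])\s+"
    r"align\((?P<total>\d+),(?P<kept>\d+)\)\s+"
    r"seed\((?P<seed>\d+)\)\s+nonseed\((?P<nonseed>\d+)\)"
)


def parse_info_line(line):
    """Return the cells of a general information line, or None if it does not match."""
    match = INFO_RE.match(line)
    if match is None:
        return None
    info = match.groupdict()
    for key in ("length", "total", "kept", "seed", "nonseed"):
        info[key] = int(info[key])
    return info


line = ("# 23535 fmh_15l4526_20161102_FNFAB42810_MN16457_sequencing_run_161102_"
        "Human_genomic_run4_LSK108R9_4_36382_ch269_read7280_strand.fast5 "
        "(X) align(110,60) seed(3) nonseed(2)")
info = parse_info_line(line)
print(info["length"], RESULT_MEANING[info["result"]])
```

The column header line and the `### candidate#...` lines do not match the pattern, so `parse_info_line` returns `None` for them.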
The second line is the header for the tab-delimited columns of the read alignment records. The columns are self-explanatory except for the first character of the score column, which encodes the alignment type: "S" for a seed alignment, "E" (capital) for a seed alignment used as an extension, and "e" (lowercase) for an alignment used as an extension.
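The sketch below reads one alignment record into a dictionary keyed by the header names and decodes the type letter. `parse_record` and `TYPE_MEANING` are hypothetical names. Splitting on any whitespace is an assumption that works for both the tab-delimited file and the example as displayed, provided no column contains a space. The arithmetic at the end checks relationships inferred from the sample rows (they are not stated explicitly): `qALen = qEnd - qStart`, `%=` as matches over the sum of `=`, `X`, `D`, and `I`, and `q%` as `qALen` over the read length.

```python
COLUMNS = ("score EG2 E = %= X %X D %D I %I qStart qEnd qStrand qALen q% "
           "refId refStart refEnd refStrand refALen cigar").split()

TYPE_MEANING = {
    "S": "seed alignment",
    "E": "seed alignment used as extension",
    "e": "alignment used as extension",
}

INT_COLUMNS = ("=", "X", "D", "I", "qStart", "qEnd", "qALen",
               "refStart", "refEnd", "refALen")


def parse_record(line):
    """Split one alignment row into a dict; the type letter is moved to 'type'."""
    values = line.split()
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(values)}")
    rec = dict(zip(COLUMNS, values))
    rec["type"] = rec["score"][0]        # first character encodes the type
    rec["score"] = int(rec["score"][1:])  # the rest is the lastal score
    for name in INT_COLUMNS:
        rec[name] = int(rec[name])
    return rec


# First record of the example above (CIGAR left snipped as shown).
row = ("S6802 0 0 9959 88.12 277 2.45 657 5.81 409 3.62 2 10647 + 10645 45.23 "
       "chr20 40787508 40798401 + 10893 2S18=2D11=...<snipped>...2I2X37=12888S")
rec = parse_record(row)
read_length = 23535  # from the general information line of the same read

aligned_columns = rec["="] + rec["X"] + rec["D"] + rec["I"]
pct_identity = round(100 * rec["="] / aligned_columns, 2)
pct_of_read = round(100 * rec["qALen"] / read_length, 2)

print(TYPE_MEANING[rec["type"]], rec["score"])
print(rec["qEnd"] - rec["qStart"], rec["qALen"])
print(pct_identity, rec["%="])
print(pct_of_read, rec["q%"])
```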
Each candidate extension alignment 'block' is prefixed with '### candidate#<x>/<y>'. A blank line or the next prefix marks the end of the block. The alignment rows in each block are ordered by read coordinates.
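Grouping rows into candidate blocks could look like the sketch below. `split_candidates` is a hypothetical helper, and the rows in the sample are shortened placeholders rather than real Picky output. Beyond the two documented block terminators (a blank line or the next prefix), the sketch also closes a block on any other line starting with `#`, which is a conservative assumption.

```python
import re

CANDIDATE_RE = re.compile(r"^### candidate#(\d+)/(\d+)")


def split_candidates(lines):
    """Return a list of (index, total, rows) tuples, one per candidate block."""
    blocks = []
    current = None
    for line in lines:
        line = line.rstrip("\n")
        match = CANDIDATE_RE.match(line)
        if match:
            # A new prefix both ends the previous block and starts the next one.
            current = (int(match.group(1)), int(match.group(2)), [])
            blocks.append(current)
        elif not line.strip() or line.startswith("#"):
            current = None  # blank line (or another '#' line) ends the block
        elif current is not None:
            current[2].append(line)
    return blocks


# Placeholder rows for illustration only; real rows have 22 columns.
lines = [
    "### candidate#1/2",
    "S100 rowA",
    "e40 rowB",
    "### candidate#2/2",
    "S100 rowA",
    "E90 rowC",
    "",
    "# 500 next_read (1) align(1,1) seed(1) nonseed(0)",
]
blocks = split_candidates(lines)
for index, total, rows in blocks:
    print(f"candidate {index}/{total}: {len(rows)} rows")
```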
In Picky's callSV step, SV-specific files are generated along with other auxiliary files.
File | RecordType | Description |
---|---|---|
<oprefix>.DEL.xls | Span | tab-delimited file for deletions |
<oprefix>.INS.xls | Span-like | file for insertions |
<oprefix>.INDEL.xls | Span / Span-like | tab-delimited file for possible co-insertion-and-deletion |
<oprefix>.INV.xls | Span & Breakpoint | tab-delimited file for inversions |
<oprefix>.TTLC.xls | Breakpoint | tab-delimited file for translocations |
<oprefix>.TDSR.xls | Span | tab-delimited file for tandem duplications where the read spans the junction |
<oprefix>.TDC.xls | Span | tab-delimited file for tandem duplications where the read completely covers the duplication |
<oprefix>.xls | N.A. | tab-delimited file for all read alignment segments |
The Span record type has the columns "SVChrom", "SVStart", "SVEnd", and "SVSpan".
The Breakpoint record type has the columns "SVChrom1", "SVPos1", "SVStrand1", "SVChrom2", "SVPos2", and "SVStrand2".
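A minimal sketch of loading one of these tab-delimited files and checking which record type it carries is shown below. It assumes the file starts with a header row containing the column names listed above; `read_sv_table` and `record_types` are hypothetical helpers, and the sample row is made up for illustration.

```python
import csv
import io

SPAN_COLUMNS = ("SVChrom", "SVStart", "SVEnd", "SVSpan")
BREAKPOINT_COLUMNS = ("SVChrom1", "SVPos1", "SVStrand1",
                      "SVChrom2", "SVPos2", "SVStrand2")


def record_types(fieldnames):
    """Report which record types a header row supports."""
    names = set(fieldnames)
    types = []
    if names.issuperset(SPAN_COLUMNS):
        types.append("Span")
    if names.issuperset(BREAKPOINT_COLUMNS):
        types.append("Breakpoint")
    return types


def read_sv_table(handle):
    """Read a tab-delimited SV file; returns (record types, list of row dicts)."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows = list(reader)
    return record_types(reader.fieldnames or []), rows


# Made-up content standing in for a Span-type file such as <oprefix>.DEL.xls.
sample = io.StringIO("SVChrom\tSVStart\tSVEnd\tSVSpan\nchr20\t100\t250\t150\n")
types, rows = read_sv_table(sample)
print(types, rows[0]["SVChrom"], rows[0]["SVSpan"])
```

With a real file, the `io.StringIO` object would be replaced by `open(path, newline="")`.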
File | Description |
---|---|
<oprefix>.profile.sam | read alignment in .sam format. (See Picky-specific tags below.) |
<oprefix>.profile.bed | 6-column bed file of aligned read fragments, with an optional 7th column recording the SV(s) harbored in the read; used to aid visualization in IGV with additional filtering |
<oprefix>.profile.exclude | Records the ids of all reads that have no alignment candidate. An alignment summary is appended at the end. |
Tag | Type | Description |
---|---|---|
zi | f | %Identity of this aligned fragment |
zq | f | Percentage of the read length that this aligned fragment represents |
zl | i | This aligned fragment length |
zs | i | Lastal's score for this aligned fragment |
ze | f | Lastal reported EG2 value for this aligned fragment |
zt | c | This aligned read fragment type. "S" for seed alignment, "E" (note cap) for seed alignment used as extension, and "e" (note small letter) for alignment used as extension. |
zc | i | Number of alignment candidates for this read |
zk | Z | Read alignment category. MC: multiple candidates. SCSF: single candidate, single fragment (NOT a split read). SCMFSL: single candidate, multiple fragments, single locus, i.e. split read aligned to a single genomic location. SCMFML: single candidate, multiple fragments, multiple locations, i.e. split read with some fragments aligned to multiple genomic locations. |
zn | Z | This read alignment fragment is f<i>/<total_fragments> |
CO | Z | Comment tag used to record the SVs detected for the read |
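
The tags above appear as optional fields after the 11 mandatory SAM columns, each written as `TAG:TYPE:VALUE`. The sketch below pulls them into a dictionary; `parse_optional_fields` is a hypothetical helper, and the sample record and its tag values are made up for illustration, not copied from a real `<oprefix>.profile.sam`.

```python
def parse_optional_fields(sam_line):
    """Return the optional TAG:TYPE:VALUE fields of one SAM record as a dict."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:  # the first 11 columns are mandatory
        tag, type_code, value = field.split(":", 2)
        if type_code == "i":
            value = int(value)
        elif type_code == "f":
            value = float(value)
        # any other type code is kept as a string
        tags[tag] = value
    return tags


# Made-up record: 11 mandatory columns followed by a few Picky tags.
sam_line = "\t".join([
    "read1", "0", "chr20", "40787509", "60", "4=", "*", "0", "0", "ACGT", "*",
    "zs:i:6802", "zi:f:88.12", "zk:Z:SCMFSL", "zn:Z:f1/4",
])
tags = parse_optional_fields(sam_line)
print(tags["zk"], tags["zn"], tags["zs"], tags["zi"])
```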
Please refer to the Variant calling data files section for the VCF v4.1, v4.2, and v4.3 specifications.
The specific INFO field entries reported by Picky are:
Entry | Description |
---|---|
IMPRECISE/PRECISE | Indicates the confidence of the exact breakpoint positions (bp). |
SVMETHOD= | "picky"; SV detection method |
END= | The position (bp) of the second breakpoint of the reported SV. |
SVTYPE= | The type of the SV: DEL, INS, DUP, or BND. |
RE= | Number of reads supporting the reported SV. |
RNAMES= | A comma separated list of read names that support the reported SV. |
SVLEN= | Indicates the length of SVs. |
CIPOS= | Confidence interval around POS. |
CIEND= | Confidence interval around END. |
NOTE= | Additional notes on called SV. |
ISVTYPE= | Internal types of the structural variants supporting the reported SV, separated by commas and each suffixed with "(<number of supporting reads>)". Internal type = [DEL,INS,INDEL,TDC,TDSR,TTLC,INV] |
BERS= | Breakend replacement string; a replicate of ALT for the floating tooltip in IGV. |
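
As a sketch of how these entries might be consumed, the code below splits an INFO string into a dictionary and expands ISVTYPE into per-type read counts. `parse_info` and `internal_types` are hypothetical helpers, and the INFO string is a made-up example that follows the entry descriptions above.

```python
import re


def parse_info(info):
    """Split a VCF INFO string; flags such as IMPRECISE map to True."""
    entries = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        entries[key] = value if sep else True
    return entries


def internal_types(isvtype):
    """Turn 'DEL(2),INDEL(1)' into {'DEL': 2, 'INDEL': 1}."""
    return {m.group(1): int(m.group(2))
            for m in re.finditer(r"([A-Z]+)\((\d+)\)", isvtype)}


# Made-up INFO value for illustration only.
info = ("IMPRECISE;SVMETHOD=picky;END=40798415;SVTYPE=DEL;RE=3;"
        "RNAMES=readA,readB,readC;ISVTYPE=DEL(2),INDEL(1)")
entries = parse_info(info)
support = internal_types(entries["ISVTYPE"])
read_names = entries["RNAMES"].split(",")
print(entries["SVTYPE"], int(entries["RE"]), support, read_names)
```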