Main output

vcf_stats/{sample}.GT.csv The genotype of each (cell, variant) pair. Rows are cell barcodes and columns are variants. This file can be used as input to downstream analysis tools such as Seurat or Scanpy. Variants are named in the format {gene}-{HGVS_C}-{HGVS_P}.

,IDH2-c.419G>A-p.Arg140Gln,DNMT3A-c.2645G>A-p.Arg882His,NPM1-c.860_863dupTCTG-p.Trp288fs,NPM1-c.*29dupA-
AAGCTTGCG_CACGCAATA_TGCCTTGGA,NA,NA,NA,NA
AACACACAG_TTCGAGGAT_GCGAGCTTA,NA,NA,NA,NA
AACACACAG_CGATAAGGC_TGGTTGTAC,0/0,NA,1/1,0/0
CTCAGAACT_CAATGCAAC_CTAGGTTGC,NA,NA,0/0,0/0

Genotypes: From the VCF version 4.1 specification:

GT : genotype, encoded as allele values separated by either of / or |. The allele values are 0 for the reference allele (what is in the REF field), 1 for the first allele listed in ALT, 2 for the second allele list in ALT and so on. For diploid calls examples could be 0/1, 1 | 0, or 1/2, etc.

'NA' means not available (no reads were found at this position).

HGVS_C: Variant in HGVS DNA notation

HGVS_P: Variant in HGVS protein notation
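The layout described above can be read with a few lines of Python. The following is a minimal sketch, not part of the pipeline: the helper names (`alt_allele_count`, `split_variant_name`, `read_gt_csv`) are hypothetical, and the name splitting assumes that neither the gene symbol nor the HGVS_P part contains a `-`. It converts each GT string into a count of non-reference alleles and maps 'NA' to `None`.

```python
import csv
import io
import re


def alt_allele_count(gt):
    """Return the number of non-reference alleles in a GT string, or None if missing."""
    gt = gt.strip()
    if gt in ("", "NA"):
        return None
    # Alleles are separated by "/" (unphased) or "|" (phased); spaces are tolerated.
    alleles = re.split(r"[/|]", gt.replace(" ", ""))
    if "." in alleles:
        return None
    return sum(1 for allele in alleles if allele != "0")


def split_variant_name(name):
    """Split '{gene}-{HGVS_C}-{HGVS_P}' into (gene, hgvs_c, hgvs_p).

    Assumption: the gene symbol and HGVS_P contain no '-'. HGVS_P may be empty.
    """
    gene, _, rest = name.partition("-")
    hgvs_c, _, hgvs_p = rest.rpartition("-")
    return gene, hgvs_c, hgvs_p


def read_gt_csv(handle):
    """Read a GT.csv-style table into {barcode: {variant: alt allele count or None}}."""
    reader = csv.reader(handle)
    variants = next(reader)[1:]  # the first header cell is empty
    table = {}
    for row in reader:
        if not row:
            continue
        barcode, calls = row[0], row[1:]
        table[barcode] = {v: alt_allele_count(c) for v, c in zip(variants, calls)}
    return table


# Rows taken from the example table above.
example = (
    ",IDH2-c.419G>A-p.Arg140Gln,DNMT3A-c.2645G>A-p.Arg882His,"
    "NPM1-c.860_863dupTCTG-p.Trp288fs,NPM1-c.*29dupA-\n"
    "AACACACAG_CGATAAGGC_TGGTTGTAC,0/0,NA,1/1,0/0\n"
    "CTCAGAACT_CAATGCAAC_CTAGGTTGC,NA,NA,0/0,0/0\n"
)
table = read_gt_csv(io.StringIO(example))
print(table["AACACACAG_CGATAAGGC_TGGTTGTAC"])
```

For a real run, replace the in-memory example with `open("vcf_stats/{sample}.GT.csv", newline="")`, substituting the sample name.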

Modules

fastqc

FastQC gives general quality metrics about your sequenced reads. It provides information about the quality score distribution across your reads, per base sequence content (%A/T/G/C), adapter contamination and overrepresented sequences. For further reading and documentation see the FastQC help pages.

Output files

  • *_fastqc.html: FastQC report containing quality metrics.
  • *_fastqc.zip: Zip archive containing the FastQC report, tab-delimited data file and plot images.

filter_gtf

This module has the same functionality as cellranger mkgtf.

GTF files can contain entries for non-polyA transcripts that overlap with protein-coding gene models. These entries can cause reads to be flagged as mapped to multiple genes (multi-mapped) because of the overlapping annotations. In the case where reads are flagged as multi-mapped, they are not counted.

We recommend filtering the GTF file so that it contains only gene categories of interest by using the cellranger mkgtf tool. Which genes to filter depends on your research question.

The filtering criteria are controlled by the --keep_attributes argument. Its default value matches the filtering used for the cellranger reference.

Note

GTF files from GENCODE use gene_type instead of gene_biotype.

--keep_attributes "gene_type=protein_coding,lncRNA..."

Output files

  • *.filtered.gtf GTF file after filtering.
  • gtf_filter.log log file containing number of lines filtered in the original gtf file.
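To illustrate the kind of filtering described in this section, here is a minimal Python sketch. It is not the module's actual implementation: the function names are hypothetical, and it assumes the --keep_attributes value has the form key=value1,value2 and that a GTF record is kept when its attribute column carries that key with one of the listed values. Comment lines are passed through unchanged.

```python
import re


def parse_keep_attributes(spec):
    """Parse 'key=value1,value2' into (key, set of allowed values)."""
    key, _, values = spec.partition("=")
    return key.strip(), {v.strip() for v in values.split(",") if v.strip()}


def keep_line(line, key, allowed):
    """Return True if a GTF line should be kept."""
    if line.startswith("#"):
        return True  # header/comment lines are passed through
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9:
        return False  # not a well-formed GTF record
    # Attributes look like: gene_id "X"; gene_type "protein_coding"; ...
    pattern = r'(?:^|;\s*)' + re.escape(key) + r' "([^"]*)"'
    match = re.search(pattern, fields[8])
    return bool(match) and match.group(1) in allowed


def filter_gtf_lines(lines, spec):
    """Yield only the lines that pass the attribute filter."""
    key, allowed = parse_keep_attributes(spec)
    for line in lines:
        if keep_line(line, key, allowed):
            yield line


# Hypothetical records for illustration only.
records = [
    '#!genome-build example\n',
    'chr1\tHAVANA\tgene\t1\t100\t.\t+\t.\tgene_id "G1"; gene_type "protein_coding"; gene_name "A";\n',
    'chr1\tHAVANA\tgene\t200\t300\t.\t+\t.\tgene_id "G2"; gene_type "processed_pseudogene"; gene_name "B";\n',
    'chr1\tHAVANA\tgene\t400\t500\t.\t-\t.\tgene_id "G3"; gene_type "lncRNA"; gene_name "C";\n',
]
kept = list(filter_gtf_lines(records, "gene_type=protein_coding,lncRNA"))
print(len(kept), "of", len(records), "lines kept")
```

For an Ensembl-style GTF, the same sketch would be called with a spec that starts with gene_biotype= instead of gene_type=.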

star_genome

Generates the STAR genome index. Detailed documentation can be found in the STAR Manual.

Tip

Once you have the indices from a workflow run, save them somewhere central and reuse them in subsequent runs through custom config files or command-line parameters.

Output files

  • {genome_name}/ STAR genome index folder.

protocol_cmd

Automatically detects the GEXSCOPE protocol from the R1 reads and generates the corresponding STARsolo command-line arguments.

Output files

  • {sample}.protocol.txt Detected protocol.
  • {sample}.starsolo_cmd.txt STARsolo command-line arguments.

starsolo

Descriptions of parameters and files can be found in the STARsolo documentation and the STAR Manual. If you have questions, STAR's GitHub issues are also a good place to find answers and help.

Note

The command-line arguments in the STARsolo documentation may not be up to date. For the latest STARsolo arguments, please refer to the STAR Manual.

Output files

  • {sample}.Aligned.sortedByCoord.out.bam BAM file containing coordinate-sorted reads aligned to the genome.

Output files

  • {sample}.vcf.gz VCF file before filtering.

multiqc-sgr

MultiQC is a visualization tool that generates a single HTML report summarising all samples in your project. Most of the pipeline QC results are visualised in the report and further statistics are available in the report data directory.

multiqc-sgr extends MultiQC with additional modules for visualizing single-cell data.

Output files

  • multiqc_report.html: a standalone HTML file that can be viewed in your web browser.
  • multiqc_data/: directory containing parsed statistics from the different tools used in the pipeline.
  • multiqc_plots/: directory containing static images from the report in various formats.

pipeline_info

Nextflow provides excellent functionality for generating various reports relevant to the running and execution of the pipeline. This will allow you to troubleshoot errors with the running of the pipeline, and also provide you with other information such as launch commands, run times and resource usage.

Output files

  • Reports generated by Nextflow: execution_report.html, execution_timeline.html, execution_trace.txt and pipeline_dag.dot/pipeline_dag.svg.
  • Reports generated by the pipeline: pipeline_report.html, pipeline_report.txt and software_versions.yml. The pipeline_report* files will only be present if the --email / --email_on_fail parameters are used when running the pipeline.
  • Reformatted samplesheet files used as input to the pipeline: samplesheet.valid.csv.
  • Parameters used by the pipeline run: params.json.
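When troubleshooting a run, execution_trace.txt can be summarised with a short script. The sketch below is only an illustration: it assumes the trace is tab-separated with a header row that includes the name and status columns (as in Nextflow's default trace fields), and the function name `summarize_trace` and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def summarize_trace(handle):
    """Count tasks per status and collect the names of failed tasks."""
    reader = csv.DictReader(handle, delimiter="\t")
    counts = Counter()
    failed = []
    for row in reader:
        status = row.get("status") or "UNKNOWN"
        counts[status] += 1
        if status == "FAILED":
            failed.append(row.get("name", ""))
    return counts, failed


# Hypothetical trace content for illustration; for a real run, open
# execution_trace.txt from the pipeline_info directory instead.
sample_trace = (
    "task_id\tname\tstatus\texit\n"
    "1\tFASTQC (s1)\tCOMPLETED\t0\n"
    "2\tSTARSOLO (s1)\tFAILED\t1\n"
    "3\tFASTQC (s2)\tCOMPLETED\t0\n"
)
counts, failed = summarize_trace(io.StringIO(sample_trace))
print(dict(counts), failed)
```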