File Definitions

Timothy Tickle edited this page Aug 1, 2016 · 62 revisions

Several types of files are associated with different aspects of a study. Each section here describes a file, indicates its use, and provides an example. Depending on the wishes of the study owner, these files can be supplied and downloaded as part of a study.

Primary sequencing files

Fastq.gz

Purpose: Fastq files contain sequence and sequence quality information. Making these files available allows analyses to be repeated and explored at the sequence level. Fastq files must be gzip compressed before uploading, which makes uploads and downloads faster for users of the portal. We encourage you to load primary fastq.gz files of non-human samples. Sequence files derived from human samples should instead be placed in a biological sequence archive and linked to your study. Both options are supported: loading a non-human fastq.gz file, or linking to a human fastq file in an external archive.

Format: To learn more about fastq files and their format, see the Wikipedia entry on the FASTQ format.

Note: Not required for study visualization.
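As an illustration of the gzip requirement, here is a minimal Python sketch that compresses a fastq file before upload. The function name `gzip_fastq` and the `.fastq.gz` output naming are assumptions made for this example; the command-line `gzip` tool would work equally well.

```python
import gzip
import shutil
from pathlib import Path


def gzip_fastq(fastq_path):
    """Compress a fastq file to <name>.gz next to it and return the new path."""
    src = Path(fastq_path)
    dst = src.with_name(src.name + ".gz")
    # Stream the copy so large files are never loaded fully into memory.
    with open(src, "rb") as f_in, gzip.open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst
```

The original file is left in place, so it can be removed once the compressed copy has been checked.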

Analysis description files

Cluster coordinates file

Purpose: This file creates the main ordination of cells in the study.

Format: This is a tab delimited file of three columns: cell name, x coordinate, and y coordinate. Each row of this file is a point in the main study ordination.

Example Cluster Coordinates File

Note: Required for study visualization.
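The three-column layout of the cluster coordinates file can be checked before upload with a short Python sketch like the one below. It assumes the file has no header row (this page does not say whether one is expected), and the function name `read_cluster_coordinates` and the sample cell names are made up for illustration.

```python
import csv
import io


def read_cluster_coordinates(handle):
    """Return {cell_name: (x, y)} from a tab delimited, three-column file."""
    coords = {}
    for line_no, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 3:
            raise ValueError(f"line {line_no}: expected 3 columns, got {len(row)}")
        cell, x, y = row
        # float() raises ValueError if a coordinate is not numeric
        coords[cell] = (float(x), float(y))
    return coords


# Hypothetical two-cell example
example = io.StringIO("cell_1\t1.5\t-0.25\ncell_2\t3.0\t4.0\n")
print(read_cluster_coordinates(example))
```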

Cluster assignments

Purpose: This file indicates which cells are in which cluster and/or sub-cluster. The current system supports visualizing cells as clusters. In ordinations, all members of a cluster are painted the same color; when viewing a gene's expression, the expression is grouped into box plots per cluster. Sub-clusters are optional; they are used in tiered analysis when, after large clusters are determined, those clusters are decomposed into groups of finer resolution.

Format: This is a tab delimited file of three columns: cell name, cluster, and sub-cluster. The columns contain, in order, a cell id, the major cluster grouping for the cell, and a sub-cluster grouping for the cell (if given). Please use names rather than numbers for the cluster groupings so that they describe what each grouping represents.

Example Cluster Assignments File

Note: Required for study visualization.
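To show how the cluster and sub-cluster columns relate, the following Python sketch groups cells by cluster and then by sub-cluster. It assumes no header row and treats a missing third column as an empty sub-cluster name; both are assumptions of this example, as are the function name and the sample labels.

```python
import csv
import io
from collections import defaultdict


def read_cluster_assignments(handle):
    """Return {cluster: {sub_cluster: [cell, ...]}} from a tab delimited file."""
    groups = defaultdict(lambda: defaultdict(list))
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        cell, cluster = row[0], row[1]
        # The sub-cluster is optional; use an empty name when it is absent.
        sub_cluster = row[2] if len(row) > 2 else ""
        groups[cluster][sub_cluster].append(cell)
    return {cluster: dict(subs) for cluster, subs in groups.items()}


# Hypothetical example using descriptive names rather than numbers
example = io.StringIO(
    "cell_1\tNeuron\tExcitatory\n"
    "cell_2\tNeuron\tInhibitory\n"
    "cell_3\tGlia\n"
)
print(read_cluster_assignments(example))
```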

Expression matrices

Purpose: This file contains the RNA-Seq expression values for a study. The values are used throughout the study in many visualizations. Although the form of the expression data is ultimately up to the author of the study, we recommend some variant of log(TPM + 1).

Format: This is a tab delimited file with cells as columns and genes as rows. The upper left corner should be "Gene". The first column should contain gene ids and the first row should contain cell ids.

Example Expression Matrices File

Note: Required for study visualization.
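The matrix layout described above can be produced with a sketch like the following, which writes "Gene" in the upper left corner, cell ids across the first row, and one row per gene. The natural-log transform via `math.log1p` is just one possible variant of log(TPM + 1), and the function name, the four-decimal rounding, and the sample ids are assumptions of this example.

```python
import csv
import io
import math


def write_expression_matrix(handle, cell_ids, tpm_by_gene):
    """Write a genes-by-cells matrix of log(TPM + 1) values, tab delimited.

    tpm_by_gene maps each gene id to a list of TPM values, one per cell id.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["Gene"] + list(cell_ids))  # upper left corner is "Gene"
    for gene, tpms in tpm_by_gene.items():
        if len(tpms) != len(cell_ids):
            raise ValueError(f"{gene}: expected {len(cell_ids)} values, got {len(tpms)}")
        # log1p(v) computes the natural log of (v + 1)
        writer.writerow([gene] + [f"{math.log1p(v):.4f}" for v in tpms])


# Hypothetical two-gene, two-cell example
out = io.StringIO()
write_expression_matrix(out, ["cell_1", "cell_2"], {"GeneA": [0.0, 10.0], "GeneB": [5.5, 0.0]})
print(out.getvalue())
```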

Gene Lists

Purpose: Panels of genes are often important results of an analysis. They can be derived by many methods, including differential expression, enrichment analysis, or expert knowledge. Multiple gene lists can be uploaded, allowing others to explore the expression pattern of the genes in each list. Parts of the portal visualize these panels of genes as a group's expression within clusters or sub-clusters of cells (e.g. using box plots). The per-cluster expression value of each gene is under the control of the author and is supplied in this file; different authors may wish to use different methods to define a measurement that works like an average expression of the gene in each cluster.

Format: This is a tab delimited file of at least 2 columns. The first column contains the gene names. Each remaining column holds a measurement (such as an average expression) of the genes in one cluster or sub-cluster; the details of this summary measurement are up to the author of the study. There can be many columns representing different clusters or sub-clusters.

Example Marker Gene Lists File

Note: Required for study visualization.
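One way to build such a file is sketched below, using the arithmetic mean as the per-cluster measurement. The mean is only one choice among many, and the header row naming each cluster column, the "Gene" label, the function name, and the sample data are all assumptions of this example rather than requirements stated on this page.

```python
import csv
import io
from statistics import mean


def write_gene_list(handle, genes, expression, cell_to_cluster):
    """Write one row per gene with its mean expression in each cluster.

    expression maps gene -> {cell: value}; cell_to_cluster maps cell -> cluster.
    """
    clusters = sorted(set(cell_to_cluster.values()))
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["Gene"] + clusters)  # header row is an assumption
    for gene in genes:
        row = [gene]
        for cluster in clusters:
            values = [
                expression[gene][cell]
                for cell, assigned in cell_to_cluster.items()
                if assigned == cluster
            ]
            row.append(f"{mean(values):.3f}")
        writer.writerow(row)


# Hypothetical example: one gene, three cells, two clusters
out = io.StringIO()
write_gene_list(
    out,
    ["GeneA"],
    {"GeneA": {"cell_1": 1.0, "cell_2": 3.0, "cell_3": 5.0}},
    {"cell_1": "T cell", "cell_2": "T cell", "cell_3": "B cell"},
)
print(out.getvalue())
```

A median or a trimmed mean could be substituted for `mean` without changing the file layout.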

Other

Purpose: There is an option to upload an "Other" type file. Please use this for files that you find important for communicating or documenting the findings of your study but that are not covered by one of the specific file types above. Please make sure the description of the file is clear so that other scientists can interpret the file correctly.

Note: Not required for study visualization.
