RMSQv3

RMSQ for MSstats v3

Installing MSStats

Step 1: install dependencies

Packages from CRAN

install.packages(c("gplots", "lme4", "ggplot2", "ggrepel", "reshape2", "data.table", "Rcpp", "survival", "getopt", "yaml", "pheatmap"))

Packages from bioconductor

source("https://bioconductor.org/biocLite.R")
biocLite(c("limma","marray","preprocessCore","MSnbase","biomaRt"))

Step 2: Install MSstats

Recommended option: from bioconductor like this:

source("https://bioconductor.org/biocLite.R")
biocLite("MSstats")

Alternatively, use the version available in this repository MSstats_3.12.3.tar.gz. Install it like this:

install.packages(pkgs="MSstats_3.12.3.tar.gz",repos=NULL,type="source")

It is also available another version used a lot in the past (MSstats_3.3.10.tar.gz)

Step 3: load the library of MSstats

library(MSstats)

Step 4: getting started. Check that it works

?MSstats

Input files

3 tab delimited files are required to run RMSQv3:

`evidence.txt`

The output of the quantitative proteomics software package MaxQuant. It combines all the information about the identified peptides and normally is the only file required for processing the results.

`keys.txt`

It contains the experimental design. This file will be merged with the evidence.txt file (i.e., the output of MaxQuant with the peptides identified) through the "Raw.file" column. Each raw file corresponds to a unique individual technical replicate / biological replicate / Condition / Run.

Example (see test/example-keys.txt)

RawFile	IsotopeLabelType	Condition	BioReplicate	Run
FU20170922-17	L	H1N1_03H	H1N1_03H-1	9
FU20170922-19	L	H1N1_03H	H1N1_03H-2	10
FU20170922-21	L	H1N1_06H	H1N1_06H-1	11
FU20170922-23	L	H1N1_06H	H1N1_06H-2	12
FU20170922-35	L	H1N1_12H	H1N1_12H-1	13
FU20170922-37	L	H1N1_12H	H1N1_12H-2	14
FU20170922-39	L	H1N1_18H	H1N1_18H-1	15
FU20170922-41	L	H1N1_18H	H1N1_18H-2	16
FU20170922-01	L	MOCK_03H	MOCK_03H-1	1
FU20170922-03	L	MOCK_03H	MOCK_03H-2	2
FU20170922-05	L	MOCK_06H	MOCK_06H-1	3
FU20170922-07	L	MOCK_06H	MOCK_06H-2	4
FU20170922-09	L	MOCK_12H	MOCK_12H-1	5
FU20170922-11	L	MOCK_12H	MOCK_12H-2	6
FU20170922-13	L	MOCK_18H	MOCK_18H-1	7
FU20170922-15	L	MOCK_18H	MOCK_18H-2	8

`contrast.txt`

The comparisons between conditions that we want to quantified. The written comparisons must follow the following consensus:

Condition_A-Condition_B_mutant

The two conditions to be compared are separated by a dash symbol (-)
The condition on the left will take the positive log2FC sign (if it is more abundant) and the one on the right the negative log2FC (if it is more abundant)
The only special character allowed for the condition names is the underscore (_)

Example (see test/example-contrast.txt)

H1N1_03H-MOCK_03H
H1N1_06H-MOCK_06H
H1N1_12H-MOCK_12H
H1N1_18H-MOCK_18H

The configuration file (`.yaml`)

The configuration file in yaml format contains the details of the analyses performed by RMSQv3. Check the folder test for a sample configuration file depending on the experiment. It currently covers the quantification of protein abundance, phosphorylation (ph), ubiquitination (ub), and acetylation (ac).

The configuration file contains the following sections:

files :
  keys : /path/to/project/data/keys.txt
  data : /path/to/project/data/data/evidence.txt
  contrasts : /path/to/project/data/contrast.txt
  output : /path/to/project/results/20160621-results.txt
  sample_plots : 1

The file path/name of the required files.

The option sample_plots (1/0) creates quality control plots, including heatmaps of the peptide features based on intensity values, and peptide counts.

data:
  enabled : 1

1 to pre-process the data provided in the files section.
0 won't process the data (and a pre-generated MSstats file will be expected)

fractions: 
  enabled : 0 # 1 for protein fractions, 0 otherwise
  aggregate_fun : sum

Multiple fractionation or separation methods are often combined in proteomics to improve signal-to-noise and proteome coverage and to reduce interference between peptides in quantitative proteomics. Use 1 to enable processing protein fractionation datasets. See the Special case: Protein fractionation section below for details.

silac: 
  enabled : 0 # 1 for SILAC experiments, 0 otherwise

Mark 1 if the files belong to a SILAC experiment. See Special case: SILAC below for details

filters: 
  enabled : 1 # Enables filtering
  contaminants : 1 # Removes contaminants (CON__ and REV__)
  protein_groups : remove # remove or keep protein groups
  modifications :  # empty for all, PH, UB, or AC

Filtering the datasets:

contaminants : 1 to remove contaminants (CON__ and REV__)
protein_groups : remove or keep protein groups
modifications : empty for all, ph to select phospho-peptides, ub ubiquitinated peptides, or ac to select acetylated peptides.

msstats :
  enabled : 1
  msstats_input : 
  version :  # blank = R library version, MSstats.daily = a location where to find it
  profilePlots : before-after # before, after, before-after, none
  normalization_method : equalizeMedians # globalStandards (include a reference protein(s) ), equalizeMedians, quantile, 0
  normalization_reference :  #should be a value in the Protein column
  summaryMethod : TMP # "TMP"(default) means Tukey's median polish, which is robust estimation method. "linear" uses linear mixed model. "logOfSum" conducts log2 (sum of intensities) per run.
  censoredInt : NA  # Missing values are censored or at random. 'NA' (default) assumes that all 'NA's in 'Intensity' column are censored. '0' uses zero intensities as censored intensity. In this case, NA intensities are missing at random. The output from Skyline should use '0'. Null assumes that all NA intensites are randomly missing.
  cutoffCensored : minFeature  # Cutoff value for censoring. only with censoredInt='NA' or '0'. Default is 'minFeature', which uses minimum value for each feature.'minFeatureNRun' uses the smallest between minimum value of corresponding feature and minimum value of corresponding run. 'minRun' uses minumum value for each run.
  MBimpute : 1 # only for summaryMethod="TMP" and censoredInt='NA' or '0'. TRUE (default) imputes 'NA' or '0' (depending on censoredInt option) by Accelated failure model. FALSE uses the values assigned by cutoffCensored.
  feature_subset: all # all|highQuality  : highQuality seems to be buggy right now

If enabled (1), it will run MSstats with all the specified options. To find out more about the meaning of all the options, please, check the MSstats documentation

output_extras :
  enabled : 1
  msstats_output : 
  annotate : 0 # 1|0 whether to annotate the proteins in the results or not
  species : HUMAN # if annotate = 1, provide specie name. It can use multiple species, but separate with a "-" eg. HUMAN-MOUSE-HIV-...
  annotation_dir : /path/to/the/files # Required if "annotate = 1"
  comparisons : all # or any grep expression that returns a subset of the contrasts file
  LFC : -1 1
  FDR : 0.05
  heatmap : 1 
  heatmap_cluster_cols : 0
  heatmap_display : log2FC #or pvalue
  volcano : 1

Extra actions to perform based on the MSstats results.

Special case: Protein fractionation

To handle protein fractionation experiments, two options need to be activated

The keys' file must contain and additional column named "FractionKey" with the information of fractions. For example:

Raw.file	IsotopeLabelType	Condition	BioReplicate	Run	FractionKey
S9524_Fx1	L	AB	AB-1	1	1
S9524_Fx2	L	AB	AB-1	1	2
S9524_Fx3	L	AB	AB-1	1	3
S9524_Fx4	L	AB	AB-1	1	4
S9524_Fx5	L	AB	AB-1	1	5
S9524_Fx6	L	AB	AB-1	1	6
S9524_Fx7	L	AB	AB-1	1	7
S9524_Fx8	L	AB	AB-1	1	8
S9524_Fx9	L	AB	AB-1	1	9
S9524_Fx10	L	AB	AB-1	1	10
S9525_Fx1	L	AB	AB-2	2	1
S9525_Fx2	L	AB	AB-2	2	2
S9525_Fx3	L	AB	AB-2	2	3
S9525_Fx4	L	AB	AB-2	2	4
S9525_Fx5	L	AB	AB-2	2	5
S9525_Fx6	L	AB	AB-2	2	6
S9525_Fx7	L	AB	AB-2	2	7
S9525_Fx8	L	AB	AB-2	2	8
S9525_Fx9	L	AB	AB-2	2	9
S9525_Fx10	L	AB	AB-2	2	10
S9526_Fx1	L	AB	AB-3	3	1
S9526_Fx2	L	AB	AB-3	3	2
S9526_Fx3	L	AB	AB-3	3	3
S9526_Fx4	L	AB	AB-3	3	4
S9526_Fx5	L	AB	AB-3	3	5
S9526_Fx6	L	AB	AB-3	3	6
S9526_Fx7	L	AB	AB-3	3	7
S9526_Fx8	L	AB	AB-3	3	8
S9526_Fx9	L	AB	AB-3	3	9
S9526_Fx10	L	AB	AB-3	3	10

Internally, the function getMSstatsFormat handles the key step (just a simple sum aggregation)

Enable fractions in the configuration file as follow:

fractions: 
  enabled : 1 # 1 for protein fractions, 0 otherwise
  aggregate_fun : sum

Special case: SILAC

One of the most widely used techniques that enable relative protein quantitation is stable isotope labeling by amino acids in cell culture (SILAC). The keys file will capture the typical SILAC experiment. For example, let's show a SILAC experiment with two conditions, two biological replicates and two technical replicates:

RawFile	IsotopeLabelType	Condition	BioReplicate	Run
QE20140321-01	H	iso	iso-1	1
QE20140321-02	H	iso	iso-1	2
QE20140321-04	L	iso	iso-2	3
QE20140321-05	L	iso	iso-2	4
QE20140321-01	L	iso_M	iso_M-1	1
QE20140321-02	L	iso_M	iso_M-1	2
QE20140321-04	H	iso_M	iso_M-2	3
QE20140321-05	H	iso_M	iso_M-2	4

It is also required to activate the silac option in the yaml file to be activated as follow:

silac: 
  enabled : 1 # 1 for SILAC experiments

Running MSstats v3

`MSstats_main.R`

MSstats_main.R -c configuration_file.yaml

`MaxQ_utilities.R`

These functions are designed to work with MaxQuant evidence files. The functions include a wide variety of options listed below.

The Arguments section lists the arguments needed for each function. The arcuments (proceeded by the short flag alias -x) should be entered in the same line of the terminal, separated by spaces.

Typical workflows

The RMSQ pipeline was designed to run in a certain order. The following is the propper order to perform the analysis with RMSQ for SILAC, PTM, and AMPS datasets.

PTM analysis

MaxQ_utilities -> convert-silac (for SILAC data only)
MaxQ_utilities -> convert-sites
MSstats
MaxQ_utilities -> results-wide
MaxQ_utilities -> mapback-sites
MaxQ_utilities -> annotate

APMS Global Analysis

MSstats
MaxQ_utilities -> results-wide
MaxQ_utilities -> annotate

Functions

concat

convert-silac

Converts SILAC data to a format compatible with MSstats.

Arguments:
-c convert-silac
-f evidence_file_path
-o output_file_path

getRawFiles

Get the list of unique Raw.files from an evidence file.

Arguments:
-c getRawFiles
-f evidence_file_path
-o output_file_path

convert-sites

This function is a preprocessing function that is used with PTM data. If you want to run a site specific analysis, before running MSStats on a PH/UB/AC set, you need to convert the evidence file into a format that MSStats will be able to diffentiate the sites with. This function outputs a new file that should be used as the input file for your MSStats group comparison analysis.

Arguments:
-c convert-sites
-f evidence_file_path
-o output_file_path
-p reference_proteom_file_path
-t mod_type (ph|ub|ac)

annotate

Annotates the proteins in the results or results-wide files after they've been through the MSStats pipeline. Multiple species can be searched at once, simply separate them by a "-". (eg. human-mouse)

Arguments:
-c annotate
-f results_file_path
-o output_file_path
-s species (human|mouse)
-d uniprot_dir (directory containing uniprot file with protein masses)

results-wide

Converts the normal MSStats output file into "wide" format where each row represents a protein's results, and each column represents the comparison made by MSStats. The fold change and p-value of each comparison will be it's own column.

Arguments:
-c results-wide
-f results_file_path
-o output_file_path

mapback-sites

Used with PTM datasets. Map back the sites to correct proteins after MSStats analysis. This file is created previously when running the conver-sites function.

Arguments:
-c mapback-sites
-f results_file_path
-o output_file_path
-m mapping_file_path

heatmap

Outputs a heatmap of the MSStats results created using the log2 fold changes.

Arguments:
-c heatmap
-f results_file_path
-o output_file_path
-l log2FC lower bound
-u log2FC upper bound
-q FDR significance cutoff

replicateplots

Outputs a replicate plots based on a user provied file containing the replicates to be compared. Values are based on the log2 value of the maximum intensities per modified sequence. The "replicate file" should describe which replicates of which conditions should be compared against each other. Each row represents a replicate plot to be created. The file should be structured using the following format and column names:

condition1	rep1_1	rep1_2	condition2	rep2_1	rep2_2
Infected	Infected_Rep1_name	Infected_Rep2_name	Negative	Negative_Rep1_name	Negative_Rep2_name
etc...
etc...

Example:

condition1	rep1_1	rep1_2	condition2	rep2_1	rep2_2
H1N1_03H	H1N1_03H-1	H1N1_03H-2	MOCK_03H	MOCK_03H-1	MOCK_03H-2
H1N1_06H	H1N1_06H-1	H1N1_06H-2	MOCK_06H	MOCK_06H-1	MOCK_06H-2
H1N1_12H	H1N1_12H-1	H1N1_12H-2	MOCK_12H	MOCK_12H-1	MOCK_12H-2
MOCK_03H	MOCK_03H-1	MOCK_03H-2	MOCK_06H	MOCK_06H-1	MOCK_06H-2
MOCK_03H	MOCK_03H-1	MOCK_03H-2	MOCK_12H	MOCK_12H-1	MOCK_12H-2

The arguments for this optionare as follows:

Arguments:
-c replicateplots
-f evidence_file_path
-k keys_file_path
-r replicate_plot_info_file_path
-o output_file_path

saint-format

Converts the MaxQuant evidence file to the 3 required files for SAINTexpress. One can choose to either use the spectral counts or the intensities for the analysis.

Arguments:
-c saint-format
-f evidence_file_path
-k keys_file_path
-o output_file_path
-p reference_proteome
-i identifier_column (sepctral_count|ms1)

data-plots

Protein abundance dot plots for each condition. It takes as input the -normalized.txt output from MSstats.

Arguments:
-c data-plots
-f normalized.txt file path
-o output file name (add the `.pdf` extension)

spectral-counts

Outputs the spectral counts from the MaxQuant evidence file.

Arguments:
-c spectral-counts
-f evidence_file_path
-k keys_file_path
-o output_file_path

mist

Converts MaxQuant evidence file into a file format compatible with the MiST pipeline using MS/MS.Count. Note that this is the MiST data file, and that an additional keys file will have to be constructed before running MiST. Multiple species can be searched at once, simply separate them by a "-". (eg. HUMAN-MOUSE)

Arguments:
-c mist
-f evidence_file_path
-k keys_file_path
-o output_file_path
-s species (e.g.: `-s HUMAN-DENGUE`)
-d uniprot_dir (directory containing uniprot file with protein length)

mistint

Very similar to the mist function, but instead of using MS/MS.Count it uses Intensity values. Converts MaxQuant evidence file into a file format compatible with the MiST pipeline. Note that this is the MiST data file, and that an additional keys file will have to be constructed before running MiST. Multiple species can be searched at once, simply separate them by a "-". (eg. HUMAN-MOUSE)

Arguments:
-c mistint
-f evidence_file_path
-k keys_file_path
-o output_file_path
-s species (e.g.: `-s HUMAN-DENGUE`)
-d uniprot_dir (directory containing uniprot file with protein length. E.g: ~/Box Sync/db/mist/)

samplequant

Aggregates the normalized abundance and replicate data from the samples. Uses the MSstat output file ...mss-sampleQuant.txt for the aggregations, and is applied directly to the MSstats results in wide format. The resulting file will have "abundance" appended to the end of the file name.

Arguments:
-f sampleQuant_file_path
-n contrast_file_path
-r results_wide_file_path

Name		Name	Last commit message	Last commit date
Latest commit History 117 Commits
R		R
test		test
.gitignore		.gitignore
MSstats_3.12.3.tar.gz		MSstats_3.12.3.tar.gz
MSstats_3.3.10.tar.gz		MSstats_3.3.10.tar.gz
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

RMSQv3

Installing MSStats

Input files

`evidence.txt`

`keys.txt`

`contrast.txt`

The configuration file (`.yaml`)

Special case: Protein fractionation

Special case: SILAC

Running MSstats v3

`MSstats_main.R`

`MaxQ_utilities.R`

Typical workflows

PTM analysis

APMS Global Analysis

Functions

concat

convert-silac

getRawFiles

convert-sites

annotate

results-wide

mapback-sites

heatmap

replicateplots

saint-format

data-plots

spectral-counts

mist

mistint

samplequant

About

Releases

Packages

Languages

pattilab/RMSQv3

Folders and files

Latest commit

History

Repository files navigation

RMSQv3

Installing MSStats

Input files

evidence.txt

keys.txt

contrast.txt

The configuration file (.yaml)

Special case: Protein fractionation

Special case: SILAC

Running MSstats v3

MSstats_main.R

MaxQ_utilities.R

Typical workflows

PTM analysis

APMS Global Analysis

Functions

concat

convert-silac

getRawFiles

convert-sites

annotate

results-wide

mapback-sites

heatmap

replicateplots

saint-format

data-plots

spectral-counts

mist

mistint

samplequant

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

`evidence.txt`

`keys.txt`

`contrast.txt`

The configuration file (`.yaml`)

`MSstats_main.R`

`MaxQ_utilities.R`

Packages