Code and notebooks to reproduce figures and results for the manuscript PIPP: Improving peptide identity propagation using neural networks
PIPP is a deep learning framework for match-between-run in DDA PASEF data quantified by MaxQuant. We have trained a deep neural network model which learns an embedding of MS1 features of peptide identifications quantified in two large-scale DDA-PASEF datasets up to the date, namely PXD019086 and PXD010012 datasets. The model is learnt by a novel modification of Prototypical Networks, which is a few-shot learning classification algorithm. The pre-trained model is used for peptide identity propagation to match identifications between runs, increase protein coverage and improve data completeness.
The pre-trained model, train/test splits and pre-computed embeddings can be downloaded from Zenodo.
Change working directory to peptideprotonet_reproducibility/
, then (with your virtual environment activated) execute:
pip install .
This will install the PIPP library, including all the dependencies needed to use the library and run the notebooks in examples/
.
import pipp
model = pipp.Peptideprotonet.load('path/to/model.pt')
# MS: pandas dataframe with columns ['Charge','Mass', 'm/z', 'Retention time', 'Retention length', 'Ion mobility index', 'Ion mobility length', 'Number of isotopic peaks']
z = model.get_latent_representations(MS)
# MSMS: pandas dataframe with columns ['PrecursorID', 'Charge', 'Mass', 'm/z', 'Retention time', 'Retention length', 'Ion mobility index', 'Ion mobility length', 'Number of isotopic peaks']
identities, confidence = model.propagate(MS, MSMS)
To train a new model, replace path_data
and path_valid_data
in pipp/main.py
. Make sure you specify a model name when writing (saving) the model. Then from the command line, run:
# define the number of shots for train and test e.g. 0-shot, 1-shot, 5-shot etc
n_shot_train
n_shot_test
# define the number of classes for train and test e.g. 2-way, 3-way, 5-way classification etc
n_test
n_train
# define the number of support instances (query) for train and test.
# These are number of instances that are selected as instances in the "support set"
# support set is used to compute the prototype at each batch/round of train and test
nq_test
nq_train
python main.py --max-epoch 300
--shot n_shot_train
--test-way n_test
--test-shot n_shot_test
--test-query nq_test
--train-query nq_train
--train-way n_train
If you wish to train a new model, a few more package dependencies are required. See below or import
statements in pipp/main.py
. The code supports training on the GPU.
- _future_
- argparse
- pickle
- learn2learn
Any problems? Let us know by openning a new issue!