Peptide identity propagation and match-between-runs by few-shot learning

Code and notebooks to reproduce the figures and results of the manuscript "PIPP: Improving peptide identity propagation using neural networks".

PIPP is a deep learning framework for match-between-runs in DDA-PASEF data quantified by MaxQuant. We trained a deep neural network that learns an embedding of the MS1 features of peptide identifications quantified in two large-scale DDA-PASEF datasets, PXD019086 and PXD010012. The model is trained with a novel modification of Prototypical Networks, a few-shot classification algorithm. The pre-trained model is used for peptide identity propagation: it matches identifications between runs, which increases protein coverage and improves data completeness.
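For readers new to Prototypical Networks, the core classification step can be sketched in a few lines. This is a minimal illustration of the standard algorithm (class prototypes as mean embeddings, nearest-prototype assignment by Euclidean distance), not PIPP's modified version; the function names, the toy 2-D embeddings and the labels `peptide_A` / `peptide_B` are hypothetical.

```python
import math
from collections import defaultdict


def compute_prototypes(embeddings, labels):
    """Average the support embeddings of each class into one prototype."""
    grouped = defaultdict(list)
    for vector, label in zip(embeddings, labels):
        grouped[label].append(vector)
    return {
        label: [sum(dim) / len(vectors) for dim in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def classify(query, prototypes):
    """Return the label of the prototype closest to the query (Euclidean)."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))


# Toy 2-D embeddings for two hypothetical peptide identities.
support = [[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [1.0, 0.8]]
support_labels = ["peptide_A", "peptide_A", "peptide_B", "peptide_B"]

prototypes = compute_prototypes(support, support_labels)
predicted = classify([0.9, 1.0], prototypes)
```

In a real model the vectors would be the learned embeddings rather than hand-written coordinates.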

The pre-trained model, train/test splits and pre-computed embeddings can be downloaded from Zenodo.

Installation

Change working directory to peptideprotonet_reproducibility/, then (with your virtual environment activated) execute:

pip install .

This will install the PIPP library, including all the dependencies needed to use the library and run the notebooks in examples/.

Usage

Using the pre-trained model

import pipp
model = pipp.Peptideprotonet.load('path/to/model.pt')

# MS: pandas dataframe with columns ['Charge','Mass', 'm/z', 'Retention time', 'Retention length', 'Ion mobility index', 'Ion mobility length', 'Number of isotopic peaks']
z = model.get_latent_representations(MS)

# MSMS: pandas dataframe with columns ['PrecursorID', 'Charge', 'Mass', 'm/z', 'Retention time', 'Retention length', 'Ion mobility index', 'Ion mobility length', 'Number of isotopic peaks']
identities, confidence = model.propagate(MS, MSMS)
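Because both calls expect specific column names, it can help to check a table before passing it to the model. The helper below is a hypothetical convenience, not part of the pipp library; it only compares names, and whether the model tolerates extra columns or a different column order is not covered here.

```python
# Column names copied from the comments in the usage example above.
MS_COLUMNS = [
    "Charge", "Mass", "m/z", "Retention time", "Retention length",
    "Ion mobility index", "Ion mobility length", "Number of isotopic peaks",
]
MSMS_COLUMNS = ["PrecursorID"] + MS_COLUMNS


def missing_columns(columns, required):
    """Return the required column names absent from `columns`, in order."""
    present = set(columns)
    return [name for name in required if name not in present]


# Works with any iterable of names, e.g. `MS.columns` of a pandas dataframe.
example_columns = ["Charge", "Mass", "m/z", "Retention time"]
missing = missing_columns(example_columns, MS_COLUMNS)
```

An empty list means every expected column is present.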

Train a new model

To train a new model, replace path_data and path_valid_data in pipp/main.py. Make sure you specify a model name when writing (saving) the model. Then from the command line, run:


# The names below are placeholders: substitute an integer for each one in the command.

# define the number of shots for train and test, e.g. 0-shot, 1-shot, 5-shot etc.
n_shot_train
n_shot_test

# define the number of classes for train and test, e.g. 2-way, 3-way, 5-way classification etc.
n_test
n_train

# define the number of support (query) instances for train and test.
# These are the instances selected for the "support set";
# the support set is used to compute the prototypes in each batch/round of train and test.
nq_test
nq_train

python main.py --max-epoch 300 \
               --shot n_shot_train \
               --test-way n_test \
               --test-shot n_shot_test \
               --test-query nq_test \
               --train-query nq_train \
               --train-way n_train
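To make the way/shot/query vocabulary concrete, here is a sketch of how one training episode is usually drawn in the standard Prototypical Networks setup: `way` classes are picked, then `shot` support examples and `query` further examples per class. The function `sample_episode` and the toy labels are hypothetical, and whether pipp/main.py builds its episodes exactly this way is an assumption.

```python
import random


def sample_episode(labels, way, shot, query, rng=random):
    """Pick `way` classes, then `shot` support and `query` query indices per class."""
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    # Only classes with enough examples can fill both sets.
    eligible = [c for c, idx in by_class.items() if len(idx) >= shot + query]
    if len(eligible) < way:
        raise ValueError("not enough classes with shot + query examples")

    support, query_set = [], []
    for cls in rng.sample(eligible, way):
        chosen = rng.sample(by_class[cls], shot + query)
        support.extend(chosen[:shot])    # used to compute the class prototype
        query_set.extend(chosen[shot:])  # scored against the prototypes
    return support, query_set


# Toy data: 5 classes with 10 examples each; a 3-way, 5-shot episode with 2 queries.
labels = [i // 10 for i in range(50)]
support_idx, query_idx = sample_episode(
    labels, way=3, shot=5, query=2, rng=random.Random(0)
)
```

With these settings each episode contains 15 support indices and 6 query indices, and the two sets never overlap.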

Training a new model requires a few additional package dependencies. See the list below or the import statements in pipp/main.py. The code supports training on the GPU.

Additional dependencies required to train a new model

__future__
argparse
pickle
learn2learn

Any problems? Let us know by opening a new issue!
