ESMIFDesign

This repository focuses on designing T-cell receptors (TCRs) using the ESM-IF1 deep learning method.

ESM-IF1 is an inverse folding model that predicts protein sequences from backbone atom coordinates. Here, we use the ESM-IF1 model (esm_if1_gvp4_t16_142M_UR50, fair-esm v2.0.1) to design selected positions of TCR sequences in the context of their bound peptide-major histocompatibility complex (pMHC). Positions outside the specified criteria were held constant. The designed positions were replaced with <mask> tokens, and all amino acid substitutions, including cysteine, were allowed. Because the input structure contains multiple chains, a padding of 10 tokens was used to separate the chains.
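The masking and chain-padding scheme described above can be sketched in plain Python. This is an illustration only, not the code in run.py: the helper name `build_partial_tokens`, the input layout, and the literal token strings `<mask>` and `<pad>` are assumptions made for the example.

```python
MASK = "<mask>"  # assumed token for positions to be designed
PAD = "<pad>"    # assumed token used to separate chains
PAD_LEN = 10     # padding length between chains, as stated above


def build_partial_tokens(chains, design_positions, chain_order):
    """Return one token list covering all chains.

    chains: dict mapping chain ID -> list of (residue_number, amino_acid)
    design_positions: set of (residue_number, chain_id) to be designed
    chain_order: chain IDs in the order they are concatenated
    """
    tokens = []
    for index, chain_id in enumerate(chain_order):
        if index > 0:
            # Insert the padding block between consecutive chains.
            tokens.extend([PAD] * PAD_LEN)
        for resnum, aa in chains[chain_id]:
            # Designed positions are masked; all others keep their residue.
            if (resnum, chain_id) in design_positions:
                tokens.append(MASK)
            else:
                tokens.append(aa)
    return tokens


# Toy example with two short, hypothetical chains.
toy_chains = {"D": [(1, "A"), (2, "C")], "E": [(1, "G")]}
toy_tokens = build_partial_tokens(toy_chains, {(2, "D")}, ["D", "E"])
```

In this toy case the second residue of chain D is masked, chain E is left unchanged, and the two chains are separated by ten padding tokens.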

Dependencies

To install the dependencies, run:

pip install -r requirements.txt

Usage

The configuration file, config.json, specifies the residues and chains to be designed for each structure. Each key is a structure ID, and each value lists the design positions as a residue number followed by a chain ID. The file is organized as follows:

{
    "6zkw": ["110D","111D","112D","134D","135D","113E","114E","133E"],
    ...,
    "8shi": ["109D","110D","111D","112D","113D","114D","135D","110E","111E","112E","113E","134E","135E"]
}
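As a rough sketch of how such entries could be read, the snippet below splits each entry into a residue number and a chain ID and groups the positions by chain. It assumes every entry is an integer followed by a single-letter chain ID (for example "110D"); the function names are hypothetical and are not taken from run.py.

```python
import json
import re
from collections import defaultdict

# Assumed entry format: integer residue number + one-letter chain ID.
_ENTRY = re.compile(r"^(-?\d+)([A-Za-z])$")


def parse_design_entry(entry):
    """Split an entry such as '110D' into (110, 'D')."""
    match = _ENTRY.match(entry)
    if match is None:
        raise ValueError(f"Unrecognized design position: {entry!r}")
    return int(match.group(1)), match.group(2)


def group_by_chain(entries):
    """Group residue numbers by chain ID, preserving input order."""
    grouped = defaultdict(list)
    for entry in entries:
        resnum, chain_id = parse_design_entry(entry)
        grouped[chain_id].append(resnum)
    return dict(grouped)


# Inline copy of the 6zkw entry shown above; a real run would read config.json.
config = json.loads(
    '{"6zkw": ["110D","111D","112D","134D","135D","113E","114E","133E"]}'
)
positions = group_by_chain(config["6zkw"])
```

Entries that carry an insertion code or a multi-character chain ID would need a broader pattern than the one assumed here.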

To design TCR sequences, run:

python run.py

Testing

We tested several conditions to evaluate the performance of the model:

  1. Testing on the 6ZKW structure:
    • 6ZKW.pdb represents a TCR-pMHC complex.
    • 6ZKW_DE.pdb includes only the TCR (D and E chains).
    • 6ZKW_contact_to_gly.pdb is a TCR-pMHC complex in which the TCR residues in contact with the pMHC are mutated to glycine.
    • 6ZKW_all_gly.pdb is a TCR-pMHC complex in which all residues are mutated to glycine.
  2. Testing the influence of the pMHC on protein sequence design (tests/pMHC1.config).
  3. Testing the following sampling temperatures (tests/temperature.config): [1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 2, 5]
  4. Testing the following numbers of samples (tests/sampling.config): [5, 10, 25, 50, 100, 250, 500]
  5. Testing design approaches:
    • CDR3 interface (tests/CDR3_interface.config): restricting the design to CDR3 positions (α and β TCR chains) within 5 Å of either the peptide or the MHC.
    • CDR3 (tests/CDR3.config): designing the entire CDR3 (α and β TCR chains).
    • CDRs interface (tests/CDRs_interface.config): designing CDR1, CDR2, or CDR3 positions (α and β TCR chains) within 5 Å of either the peptide or the MHC.
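The 5 Å interface criterion used by the interface design approaches can be illustrated with a small distance filter. This is a minimal sketch under stated assumptions: a residue counts as interface if any of its atoms lies within the cutoff of any peptide or MHC atom, and the boundary is inclusive. The repository may use a different atom selection or comparison, and the function name and input layout are hypothetical.

```python
import math


def interface_residues(candidate_atoms, partner_atoms, cutoff=5.0):
    """Return the keys of residues with any atom within `cutoff` Å of a partner atom.

    candidate_atoms: dict mapping a residue key (e.g. '110D') to a list of
        (x, y, z) coordinates for that residue's atoms
    partner_atoms: list of (x, y, z) coordinates for peptide and MHC atoms
    """
    selected = []
    for key, atoms in candidate_atoms.items():
        close = any(
            math.dist(atom, partner) <= cutoff
            for atom in atoms
            for partner in partner_atoms
        )
        if close:
            selected.append(key)
    return selected


# Toy coordinates (not taken from any real structure).
toy_cdr3 = {
    "110D": [(0.0, 0.0, 0.0)],   # 3 Å from the partner atom
    "111D": [(10.0, 0.0, 0.0)],  # 7 Å from the partner atom
}
toy_partner = [(3.0, 0.0, 0.0)]
toy_interface = interface_residues(toy_cdr3, toy_partner)
```

With these toy coordinates only "110D" passes the filter. For full structures, a spatial index such as a k-d tree would avoid the all-pairs distance loop.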

To run the tests, navigate to the tests directory and execute:

cd tests
python testing.py