eddy-elisee edited this page Nov 22, 2024 · 3 revisions

ASMC aims to highlight the amino acid diversity within the active site of a given homologous protein family. To achieve this, ASMC requires at least one reference protein structure; as we shall see later, increasing the number of reference structures yields more relevant groups. Reference structures should be carefully selected according to the biological context (open/closed state, monomer, ligands, enzymatic results...), and priority should be given to high-resolution holo structures, namely protein-ligand complexes, since their active sites are often better characterized.

The ASMC pipeline is designed as a user-friendly automated framework for modeling and clustering homologous protein active sites. ASMC can be executed in several ways, depending on the user's objective:

  • ASMC default
  • ASMC with user-refined pocket(s) - RECOMMENDED
  • Pocket Search
  • Homology Modeling
  • Structural Alignment
  • Clustering
    • MSA Clustering
    • Re-Clustering

Output Summary

ASMC default | ASMC with user-refined pocket(s) | Pocket Search | Homology Modeling | Structural Alignment | MSA Clustering | Re-Clustering
pocket.csv * *
prank_outputs * *
models/ *
models.txt *
identity_targets_ref.tsv *
pairwise/
superposition/
active_site_alignment.fasta
groups_x_min_y.tsv
groups_logo.png
fasta file for each group

(*): this output is created only when the user provides specific inputs. See the sections below for more details.

ASMC default

Run ASMC blind (active site unknown), covering Pocket Search, Homology Modeling, Structural Alignment and Clustering.

The user must provide a reference file for the reference protein(s) and a set of homologous protein sequences in FASTA format.

asmc run --log run_asmc.log --threads 6 -r reference_file -s sequences.fasta

The algorithm uses the best-ranked P2Rank pocket as the reference pocket for both the structural alignment and clustering steps.

However, we recommend that users carefully define their active site by following the Pocket Search step before running ASMC with the protocol presented hereafter.

ASMC with user-refined pocket(s) - RECOMMENDED

Skip Pocket Search and run Homology Modeling, Structural Alignment and Clustering. It is advisable to manually define the active site positions, based on the literature and/or your own expertise.

The user must provide a reference file for the reference protein(s), a pocket csv file and a set of homologous protein sequences in FASTA format.

asmc run --log run_asmc.log --threads 6 -r reference_file -p pocket.csv -s sequences.fasta
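Since this mode depends on three input files, a quick pre-flight check can catch a missing or empty file before a long run starts. The sketch below is only an illustration: `check_inputs` is a hypothetical helper, not part of ASMC, and the file names are the placeholders used on this page.

```shell
# Hypothetical helper (not part of ASMC): succeed only if every
# listed input file exists and is non-empty.
check_inputs() {
    local f missing=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty input: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Launch ASMC only when all three inputs are present.
if check_inputs reference_file pocket.csv sequences.fasta; then
    asmc run --log run_asmc.log --threads 6 -r reference_file -p pocket.csv -s sequences.fasta
fi
```

The check only tests for presence and size; it does not validate the content or format of any file.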

ASMC with specific objectives

The ASMC workflow allows each step to be performed independently, depending on the user's objective:

  • identify protein pockets (Pocket Search)
  • generate 3D models (Homology Modeling)
  • align 3D models on reference structure(s) (Structural Alignment)
  • cluster structure- or MSA-based active sites (Clustering).

Pocket Search

Stop ASMC after Pocket Search with the option --end pocket.

The user must provide a reference file for the reference protein(s) and a set of homologous protein sequences in FASTA format.

asmc run --log run_asmc.log --threads 6 -r reference_file -s sequences.fasta --end pocket

Homology Modeling

Stop ASMC after Homology Modeling with the option --end modeling.

The user must provide a reference file for the reference protein(s), a pocket csv file and a set of homologous protein sequences in FASTA format.

asmc run --log run_asmc.log --threads 6 -r reference_file -p pocket.csv -s sequences.fasta --end modeling 

Structural Alignment

Stop ASMC after Structural Alignment with the option --end alignment.

The user must provide a reference file for the reference protein(s), a pocket csv file and a model file listing 3D models (PDB format) obtained with MODELLER, AlphaFold or another method.

asmc run --log run_asmc.log --threads 6 -r reference_file -p pocket.csv -m models.txt --end alignment
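The three stopping points differ only in their inputs and in the value given to --end, which a small wrapper can make explicit in a script. The following is a sketch, not an ASMC feature: `stage_command` is a hypothetical helper that prints the command for a stage, reusing only the flags and placeholder file names shown above.

```shell
# Hypothetical helper (not part of ASMC): print the asmc command
# that stops after the requested stage.
stage_command() {
    local base="asmc run --log run_asmc.log --threads 6 -r reference_file"
    case "$1" in
        pocket)    echo "$base -s sequences.fasta --end pocket" ;;
        modeling)  echo "$base -p pocket.csv -s sequences.fasta --end modeling" ;;
        alignment) echo "$base -p pocket.csv -m models.txt --end alignment" ;;
        *)         echo "unknown stage: $1 (expected pocket, modeling or alignment)" >&2
                   return 1 ;;
    esac
}

# Print the command for the Homology Modeling stopping point.
stage_command modeling
```

Printing the command instead of running it lets you review it first.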

Clustering

Run the Clustering step starting from either a list of models, a Multiple Sequence Alignment (MSA) or an ASMC output group.

List of Models

Use this mode to run ASMC clustering from existing 3D models and structural references with a known pocket.

The user must provide the pocket and model files with the options -p and -m, respectively.

asmc run --log run_asmc.log --threads 6 -r reference_file -p pocket.csv -m models.txt

MSA Clustering

First, calculate the percent identity between the targets and the reference(s). This step is optional if there is only one reference and mandatory if there are several.

The user must provide an MSA in FASTA format (ASMC does not compute it) and a specific file passed with the --msa option.

asmc run --log run_asmc.log --threads 6 --msa msa.txt

Re-Clustering

Warning: the user must provide a directory name with the -o option when using these command lines, to avoid overwriting previous files. If the directory doesn't exist, the run subcommand creates it on the fly.

- with different DBSCAN parameters

The user must provide the active site alignment file generated by the ASMC pipeline, using the -a option, and values for the --eps and --min-samples options (e.g., 0.1 and 15, respectively).

asmc run -o output_directory --log run_asmc.log --threads 6 -a active_sites_alignment.fasta --eps 0.1 --min-samples 15
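When comparing several DBSCAN settings, giving each parameter pair its own output directory keeps the runs from overwriting one another. The sketch below only prints the commands. The parameter grid, the `recluster_cmd` helper, and the directory and log naming scheme are illustrative assumptions, not ASMC recommendations.

```shell
# Sketch: print one asmc re-clustering command per (eps, min-samples) pair.
# The grid values and the output/log naming scheme are illustrative assumptions.
alignment="active_sites_alignment.fasta"

recluster_cmd() {
    local eps="$1" min_samples="$2"
    local outdir="recluster_eps${eps}_min${min_samples}"
    printf 'asmc run -o %s --log %s.log --threads 6 -a %s --eps %s --min-samples %s\n' \
        "$outdir" "$outdir" "$alignment" "$eps" "$min_samples"
}

for eps in 0.1 0.2 0.3; do
    for min_samples in 5 15; do
        recluster_cmd "$eps" "$min_samples"
    done
done
```

After reviewing the printed commands, you can pipe the output to `sh` to execute them one after another.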

- with an existing ASMC group

The user must provide the active site alignment file generated by the ASMC pipeline for the queried ASMC group, using the -a option (e.g., for the G2 group).

asmc run -o group_split --log run_asmc.log --threads 6 -a G2.fasta
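The overwrite warning in this section can also be enforced in a script by refusing to write into a directory that already holds files. This is a minimal sketch: `ensure_fresh_dir` is a hypothetical helper, and the example reuses the G2 command from above.

```shell
# Hypothetical helper (not part of ASMC): succeed only if the directory
# is absent or empty, so a run cannot overwrite earlier results.
ensure_fresh_dir() {
    local dir="$1"
    if [ -d "$dir" ] && [ -n "$(ls -A "$dir")" ]; then
        echo "refusing to reuse non-empty directory: $dir" >&2
        return 1
    fi
    return 0
}

# Run only if the output directory is safe and asmc is on the PATH.
if ensure_fresh_dir group_split && command -v asmc >/dev/null 2>&1; then
    asmc run -o group_split --log run_asmc.log --threads 6 -a G2.fasta
fi
```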

How to deal with ASMC outputs

When the ASMC process is complete, you should get two important output files: groups_logo.png, displaying the sequence logos for each ASMC group in a column, and groups_x_min_y.tsv from which you can obtain some interesting information with the following subcommands.

subcommand extract

Extract lines (i.e. active site sequences) that contain a specific amino acid or residue type at a queried position (cf. subcommand extract).

subcommand unique

Remove duplicated active site sequences (cf. subcommand unique).

subcommand compare

Compare with another groups_x_min_y.tsv file, e.g., between structure- and MSA-based clustering (cf. subcommand compare).

subcommand to_xlsx

Format as XLSX to facilitate manual investigation in a spreadsheet program (cf. subcommand to_xlsx).

subcommand pymol

Visualize with PyMOL a specific target superimposed on its reference structure, by following the steps described here.
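Alongside the subcommands above, standard command-line tools can give a quick overview of a groups file, for example the number of sequences per group. The sketch below assumes the file is plain tab-separated text and that the group label sits in column 3; the column layout is not documented on this page, so both are assumptions to check against your own file. `count_by_column` is a hypothetical helper, and the file name must be replaced with the actual one.

```shell
# Hypothetical helper (not part of ASMC): count rows per distinct value
# of one tab-separated column, sorted by value.
count_by_column() {
    local file="$1" col="$2"
    awk -F '\t' -v c="$col" '{ n[$c]++ } END { for (k in n) print k "\t" n[k] }' "$file" | sort
}

# Assumptions: replace the file name with your actual groups file, and
# adjust group_col after inspecting its columns.
groups_file="groups_x_min_y.tsv"
group_col=3
if [ -f "$groups_file" ]; then
    count_by_column "$groups_file" "$group_col"
fi
```

If the file starts with a header line, that line is counted as if it were a group, so discard it from the result.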