Usages
ASMC aims to highlight the amino acid diversity within the active site of a given homologous protein family. To achieve this, ASMC requires at least one reference protein structure; as shown later, increasing the number of reference structures yields more relevant groups. Reference structures should be selected carefully according to the biological context (open/closed state, monomer, ligands, enzymatic results...), and priority should be given to high-resolution holo structures, namely protein-ligand complexes, since their active sites are often better characterized.
The ASMC pipeline is designed as a user-friendly automated framework for modeling and clustering the active sites of homologous proteins. ASMC can also be executed in several ways, depending on the user's objective:
- ASMC default
- ASMC with user-refined pocket(s) - RECOMMENDED
- Pocket Search
- Homology Modeling
- Structural Alignment
- Clustering
- MSA Clustering
- Re-Clustering
ASMC default | ASMC with user-refined pocket(s) | Pocket Search | Homology Modeling | Structural Alignment | MSA Clustering | Re-Clustering | |
---|---|---|---|---|---|---|---|
pocket.csv | ✅ | ✅ | * | * | |||
prank_outputs | ✅ | ✅ | * | * | |||
models/ | ✅ | ✅ | ✅ | * | |||
models.txt | ✅ | ✅ | ✅ | * | |||
identity_targets_ref.tsv | ✅ | ✅ | ✅ | * | |||
pairwise/ | ✅ | ✅ | ✅ | ||||
superposition/ | ✅ | ✅ | ✅ | ||||
active_site_alignment.fasta | ✅ | ✅ | ✅ | ||||
groups_x_min_y.tsv | ✅ | ✅ | ✅ | ✅ | |||
groups_logo.png | ✅ | ✅ | ✅ | ✅ | |||
fasta file for each group | ✅ | ✅ | ✅ | ✅ |
(*): this output will be created depending on whether the user provides specific inputs. See sections below for more details.
ASMC default: run ASMC blind (unknown active site), chaining Pocket Search, Homology Modeling, Structural Alignment and Clustering.
The user must provide a reference file for the reference protein(s) and a set of homologous protein sequences in FASTA format.
asmc run --log run_asmc.log --threads 6 -r reference_file -s sequences.fasta
ASMC uses the best-ranked P2Rank pocket as the reference pocket for both the structural alignment and clustering steps.
However, we recommend that users carefully define their active site with the Pocket Search step before running ASMC with the protocol presented hereafter.
ASMC with user-refined pocket(s): skip Pocket Search and run Homology Modeling, Structural Alignment and Clustering. It is advisable to define the active site positions manually, based on the literature and/or your own expertise.
The user must provide a reference file for the reference protein(s), a pocket CSV file and a set of homologous protein sequences in FASTA format.
asmc run --log run_asmc.log --threads 6 -r reference_file -p pocket.csv -s sequences.fasta
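Before launching a run, it can help to confirm that the input files are in place. The sketch below is a hypothetical shell helper (`check_inputs` is not part of ASMC) that only tests whether each file exists and is non-empty; it does not validate the file contents.

```shell
# check_inputs FILE...: return 1 if any listed file is missing or empty.
# Illustrative helper only; ASMC does not ship this function.
check_inputs() {
    local f rc=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty input: $f" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Usage (file names are the placeholders used in the command above):
# check_inputs reference_file pocket.csv sequences.fasta &&
#     asmc run --log run_asmc.log --threads 6 -r reference_file -p pocket.csv -s sequences.fasta
```

The check reports every missing file before returning, so all problems show up in a single pass.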
The ASMC workflow allows each step to be performed independently, depending on the user's objective:
- identify protein pockets (Pocket Search)
- generate 3D models (Homology Modeling)
- align 3D models on reference structure(s) (Structural Alignment)
- cluster structure- or MSA-based active sites (Clustering).
Stop ASMC after the Pocket Search step with the option `--end pocket`.
The user must provide a reference file for the reference protein(s) and a set of homologous protein sequences in FASTA format.
asmc run --log run_asmc.log --threads 6 -r reference_file -s sequences.fasta --end pocket
Stop ASMC after the Homology Modeling step with the option `--end modeling`.
The user must provide a reference file for the reference protein(s), a pocket CSV file and a set of homologous protein sequences in FASTA format.
asmc run --log run_asmc.log --threads 6 -r reference_file -p pocket.csv -s sequences.fasta --end modeling
Stop ASMC after the Structural Alignment step with the option `--end alignment`.
The user must provide a reference file for the reference protein(s), a pocket CSV file and a model file listing the 3D models obtained with MODELLER, AlphaFold or another method (PDB format).
asmc run --log run_asmc.log --threads 6 -r reference_file -p pocket.csv -m models.txt --end alignment
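The three stop points above differ only in their inputs and in the value passed to `--end`. As a compact reminder, the hypothetical helper below (`asmc_stage_cmd`, not part of ASMC) prints the command for a given stop point, reusing the placeholder file names from the examples above. It only prints the command line; nothing is executed.

```shell
# asmc_stage_cmd STAGE: print the asmc command that stops after STAGE.
# STAGE is one of the three --end values documented above.
asmc_stage_cmd() {
    local base="asmc run --log run_asmc.log --threads 6 -r reference_file"
    case "$1" in
        pocket)    echo "$base -s sequences.fasta --end pocket" ;;
        modeling)  echo "$base -p pocket.csv -s sequences.fasta --end modeling" ;;
        alignment) echo "$base -p pocket.csv -m models.txt --end alignment" ;;
        *)
            echo "unknown stage: $1 (expected pocket, modeling or alignment)" >&2
            return 1
            ;;
    esac
}

# Example: show the command for the Homology Modeling stop point.
asmc_stage_cmd modeling
```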
Run the Clustering step starting from either a list of models, a Multiple Sequence Alignment (MSA) or an ASMC output group.
Use this mode to run ASMC clustering from existing 3D models and structural references with a known pocket.
The user must provide the pocket and model files with the options `-p` and `-m`, respectively.
asmc run --log run_asmc.log --threads 6 -r reference_file -p pocket.csv -m models.txt
For MSA Clustering, the user must first calculate the percentage identity between the targets and the reference(s). This step is optional with a single reference and mandatory with several.
The user must provide an MSA in FASTA format (ASMC does not compute the alignment) and a specific file passed with the `--msa` option.
asmc run --log run_asmc.log --threads 6 --msa msa.txt
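To make the identity percentage concrete, here is a minimal sketch that computes it for two sequences taken from the same alignment. It assumes that identity is the share of identical residues among the columns where neither sequence has a gap; ASMC may use a different definition, so treat the result as illustrative only. `pct_identity` is a hypothetical helper, not an ASMC command.

```shell
# pct_identity SEQ_A SEQ_B: percentage identity of two aligned sequences.
# Assumption: columns with a gap ("-") in either sequence are ignored.
pct_identity() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        if (length(a) != length(b)) {
            print "aligned sequences differ in length" > "/dev/stderr"
            exit 1
        }
        for (i = 1; i <= length(a); i++) {
            x = substr(a, i, 1)
            y = substr(b, i, 1)
            if (x == "-" || y == "-") continue    # skip gapped columns
            compared++
            if (toupper(x) == toupper(y)) same++
        }
        if (compared == 0) {
            print "no ungapped columns to compare" > "/dev/stderr"
            exit 1
        }
        printf "%.2f\n", 100 * same / compared
    }'
}

# Example: 3 identical residues over 5 ungapped columns.
pct_identity MKV-LA MKI-LG    # prints 60.00
```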
Warning: when re-clustering with the command lines below, the user must provide a directory name with the `-o` option to avoid overwriting the previous files. If the directory does not exist, `asmc run` creates it on the fly.
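One way to follow this advice is to derive a directory name that is not in use yet. `fresh_outdir` below is a hypothetical helper, not an ASMC feature: it returns the given name if nothing exists at that path, and otherwise appends `_1`, `_2`, and so on until it finds a free name.

```shell
# fresh_outdir PREFIX: print PREFIX if it is unused, else PREFIX_1, PREFIX_2, ...
# The directory itself is not created here; asmc run creates it if needed.
fresh_outdir() {
    local prefix="$1" candidate="$1" n=1
    while [ -e "$candidate" ]; do
        candidate="${prefix}_${n}"
        n=$((n + 1))
    done
    echo "$candidate"
}

# Usage ("recluster" is an arbitrary example name):
# out=$(fresh_outdir recluster)
# asmc run -o "$out" --log run_asmc.log --threads 6 -a active_sites_alignment.fasta --eps 0.1 --min-samples 15
```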
- with different DBSCAN parameters
The user must provide the active site alignment file generated by the ASMC pipeline with the `-a` option, and values for the `--eps` and `--min-samples` options (e.g., 0.1 and 15, respectively).
asmc run -o output_directory --log run_asmc.log --threads 6 -a active_sites_alignment.fasta --eps 0.1 --min-samples 15
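When exploring several DBSCAN settings, writing each run to its own directory keeps the results separate. The sketch below only prints one command per parameter pair; the `eps` and `min-samples` grids and the `recluster_...` directory names are arbitrary examples, not recommended values.

```shell
# print_dbscan_sweep ALIGNMENT: print one asmc command per (eps, min-samples) pair.
# The grids below are hypothetical; adapt them to your data.
print_dbscan_sweep() {
    local alignment="$1" eps min_samples
    for eps in 0.1 0.2 0.3; do
        for min_samples in 5 15; do
            printf '%s %s %s\n' \
                "asmc run -o recluster_eps${eps}_min${min_samples}" \
                "--log run_asmc.log --threads 6 -a ${alignment}" \
                "--eps ${eps} --min-samples ${min_samples}"
        done
    done
}

# Example: list the six commands for one alignment file.
print_dbscan_sweep active_sites_alignment.fasta
```

After reviewing the printed commands, they can be run one by one or piped to `sh`.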
- with an existing ASMC group
The user must provide the active site alignment file generated by the ASMC pipeline for the queried ASMC group with the `-a` option (e.g., for the G2 group).
asmc run -o group_split --log run_asmc.log --threads 6 -a G2.fasta
When the ASMC process is complete, you should get two important output files: `groups_logo.png`, which displays the sequence logos for each ASMC group in a column, and `groups_x_min_y.tsv`, from which you can extract further information with the following subcommands.
Extract lines (i.e. active site sequences) that contain a specific amino acid or residue type at a queried position (cf. subcommand `extract`).
Remove duplicated active site sequences (cf. subcommand `unique`).
Compare with another `groups_x_min_y.tsv` file, e.g., between structure- and MSA-based clustering (cf. subcommand `compare`).
Format as XLSX to facilitate manual investigation in a spreadsheet program (cf. subcommand `to_xlsx`).
Visualize a specific target superimposed on its reference structure with PyMOL, by following the steps described here.