A collection of files and JupyterLab notebooks for extracting ABC protein domains from proteomes. The analysis was performed for a FEBS review.
- notebooks - Folder of Jupyter lab notebooks that contain all the sequence extraction, alignment, tree construction and plotting for the FEBS review.
- RefSeq - Folder of reference proteomes (FASTA files) from the UniProt database, chosen to represent species across the tree of life.
- Pfam - Folder of seed alignments and HMM profiles of Pfam families that are contained in known ABC proteins.
- dictionary - Folder of CSV files containing the Pfam and UniProt metadata for the Pfam families and species proteomes saved in the corresponding folders.
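As a rough illustration of how the Pfam and RefSeq folders fit together, the sketch below builds an `hmmsearch` command that scans one proteome with one Pfam profile and writes a per-domain hit table. It is not taken from the notebooks: the helper name `build_hmmsearch_command`, the example file names and the use of the `--cut_ga` gathering thresholds are assumptions made for illustration.

```python
def build_hmmsearch_command(hmm_path, proteome_path, domtbl_path, use_gathering=True):
    """Return the argument list for one hmmsearch run (hypothetical helper).

    hmm_path      : Pfam HMM profile, e.g. a file in the Pfam folder
    proteome_path : proteome FASTA file, e.g. a file in the RefSeq folder
    domtbl_path   : where hmmsearch should write its per-domain table
    """
    command = ["hmmsearch", "--domtblout", str(domtbl_path)]
    if use_gathering:
        # Assumption: apply the gathering (GA) cutoffs stored in the Pfam profile.
        command.append("--cut_ga")
    # hmmsearch expects the profile first, then the sequence database.
    command.extend([str(hmm_path), str(proteome_path)])
    return command


# Hypothetical file names; the real names in Pfam/ and RefSeq/ may differ.
example = build_hmmsearch_command(
    "Pfam/PF00005.hmm", "RefSeq/UP000005640.fasta", "PF00005_hits.domtbl"
)
print(" ".join(example))
```

The returned list can be passed to `subprocess.run(example, check=True)` once HMMER is installed; keeping command construction separate from execution makes it easy to inspect the command first.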
See the first code cell of the Python notebook for all Python packages that are imported and used for the analysis.
The Python notebooks depend on the following third-party binaries, whose licences must be respected.
- HMMER v3.3 (http://hmmer.org/)
- Easel 0.46 - installed with the HMMER command-line tools (https://github.com/EddyRivasLab/easel)
- FastTree 2.1.10 (No SSE3) (http://www.microbesonline.org/fasttree/)
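Before running the notebooks, it may help to confirm that these binaries are on the `PATH`. The sketch below is a minimal check using only the standard library. The executable names are assumptions: `hmmsearch` and `esl-sfetch` are standard HMMER and Easel programs, but the notebooks may call others, and FastTree is sometimes installed as `fasttree`.

```python
import shutil

# Assumed executable names; adjust to match the calls made in the notebooks.
REQUIRED_BINARIES = ["hmmsearch", "esl-sfetch", "FastTree"]


def find_missing_binaries(names=REQUIRED_BINARIES):
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


missing = find_missing_binaries()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All required binaries were found.")
```

To check the installed versions against those listed above, `hmmsearch -h` prints the HMMER version in its banner.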