An x86-64 pipeline that trims and combines raw Sanger data to produce a consensus read and other results
- Tsiouri, O. (2024). Sanger analysis pipeline: An x86-64 Pipeline used to trim and combine sanger raw data to retrieve a consesus read and other results v2.0. https://github.com/olgatsiouri1996/sanger_analysis_pipeline
- Kuan-Hao Chao, Kirston Barton, Sarah Palmer, and Robert Lanfear (2021). "sangeranalyseR: simple and interactive processing of Sanger sequencing data in R" in Genome Biology and Evolution. DOI: doi.org/10.1093/gbe/evab028
- Install Docker
- Pull the following images onto your machine:
docker pull olgatsiouri/sanger_analysis:latest
docker pull olgatsiouri/python-pandas:latest
The data used were derived from sangeranalyseR and can be found in input_data.zip
The sangeranalyseR parameters used for trimming and aligning reads are set in a tab-separated file, parameters.txt.
- You can open parameters.txt in Excel to make modifications
- Do not change the filename parameters.txt
- Do not change the parameters: printLevel, inputSource, processMethod, ABIF_Directory, FASTA_File, CSV_NamesConversion and geneticCode
- parameters.txt should be in the same folder as the raw .ab1 files
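As an alternative to editing in Excel, a value can be changed programmatically. The following is a minimal sketch, assuming parameters.txt holds one parameter per row with the name in the first tab-separated column and the value in the second. That layout and the helper name set_parameter are assumptions, not taken from the pipeline. The helper refuses to modify the parameters listed above as fixed.

```python
import csv
import io

# Parameters that the list above says must not be changed.
LOCKED = {
    "printLevel", "inputSource", "processMethod", "ABIF_Directory",
    "FASTA_File", "CSV_NamesConversion", "geneticCode",
}


def set_parameter(text, name, value):
    """Return tab-separated `text` with parameter `name` set to `value`.

    Assumption: each row is "name<TAB>value" (hypothetical layout).
    """
    if name in LOCKED:
        raise ValueError(f"{name} must not be changed")
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    found = False
    for row in rows:
        if row and row[0] == name:
            if len(row) < 2:
                row.append(str(value))
            else:
                row[1] = str(value)
            found = True
    if not found:
        raise KeyError(f"{name} not found")
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return out.getvalue()
```

To use it, read parameters.txt into a string, call set_parameter with the parameter to change, and write the result back under the same filename in the folder that holds the .ab1 files.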
To run the container, type the following on Ubuntu or Intel Mac machines:
docker run --rm -v /path/to/ab1/folder:/ab1 -v /path/to/save/report:/report -v /path/to/save/fasta:/fasta olgatsiouri/sanger_analysis
on Apple silicon Macs:
docker run --rm --platform=linux/amd64 -v /path/to/ab1/folder:/ab1 -v /path/to/save/report:/report -v /path/to/save/fasta:/fasta olgatsiouri/sanger_analysis
or on Windows machines:
docker run --rm -v C:\path\to\ab1\folder:/ab1 -v C:\path\to\save\report:/report -v C:\path\to\save\fasta:/fasta olgatsiouri/sanger_analysis
This command:
- Imports the .ab1 files and parameters.txt into the container
- Selects the folder in which to generate a report containing the consensus sequence and other data
- Selects the folder in which to save the trimmed reads and their alignment in FASTA format
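The three variants of the command differ only in the host paths and the --platform flag, so they can be assembled from one helper. This is a minimal sketch: build_run_command is a hypothetical name, and treating any "arm64" or "aarch64" machine as needing --platform=linux/amd64 is an assumption that generalizes the Apple silicon case above.

```python
import platform


def build_run_command(ab1_dir, report_dir, fasta_dir, machine=None):
    """Build the `docker run` argument list for the sanger_analysis image."""
    machine = machine or platform.machine()
    cmd = ["docker", "run", "--rm"]
    # Assumption: ARM hosts (e.g. Apple silicon, reported as "arm64")
    # need to request the x86-64 image explicitly.
    if machine.lower() in ("arm64", "aarch64"):
        cmd.append("--platform=linux/amd64")
    # Mount the three host folders at the paths the container expects.
    mounts = ((ab1_dir, "/ab1"), (report_dir, "/report"), (fasta_dir, "/fasta"))
    for host, container in mounts:
        cmd += ["-v", f"{host}:{container}"]
    cmd.append("olgatsiouri/sanger_analysis")
    return cmd
```

The returned list can be passed to subprocess.run(cmd, check=True). Passing the arguments as a list avoids shell quoting problems with host paths that contain spaces.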
If you want to retrieve the consensus sequence in FASTA format, do the following:
- Navigate to the folder where you saved the report and open the SangerAlignment folder
- Open SangerAlignment_Report.html
- Go to Contigs Consensus and click MORE DETAILS
- Click on the top left diagonal box to select the whole consensus sequence
- Right click on 1 to the right of the diagonal box
- Click save as
- This will download a jexcel.csv file that you can convert to FASTA
- Download consesus_csv_to_fasta.py from the src/ folder
- Put the script and jexcel.csv in the same directory
- Run Docker
on Linux/macOS x86-64:
docker run --rm -it -v /path/to/folder:/data olgatsiouri/python-pandas /data/consesus_csv_to_fasta.py <input_csv> <fasta_width> <output_fasta>
on Apple silicon Macs:
docker run --platform=linux/amd64 --rm -it -v /path/to/folder:/data olgatsiouri/python-pandas /data/consesus_csv_to_fasta.py <input_csv> <fasta_width> <output_fasta>
or on Windows:
docker run --rm -it -v C:\path\to\folder:/data olgatsiouri/python-pandas /data/consesus_csv_to_fasta.py <input_csv> <fasta_width> <output_fasta>
Example:
docker run --rm -it -v /home/linuxubuntu2004/Desktop:/data olgatsiouri/python-pandas /data/consesus_csv_to_fasta.py jexcel.csv 80 drosho_consesus.fasta
The output FASTA file will look like:
>consesus_seq
TTATATTTTATTTTTGGAGCTTGAGCTGGAATAGTTGGAACATCTTTAAGAATTTTAATT
CGAGCTGAATTAGGACATCCTGGAGCATTAATTGGAGATGATCAAATTTATAATGTAATT
GTAACTGCACATGCTTTTATTATAATTTTTTTTATAGTTATACCTATTATAATTGGTGGA
TTTGGAAATTGATTAGTGCCTTTAATATTAGGTGCTCCTGATATAGCATTCCCACGAATA
AATAATATAAGATTTTGACTTCTACCTCCTGCTCTTTCTTTACTATTAGTAAGTAGAATA
GTTGAAAATGGAGCTGGGACAGGATGAACATGTTTATCCACCTCTATCCGAGCTGGAATT
GCTCATGGTGGAGCTTCAGTTGATTTAGCTATTTTTTCTCTACATTTAGCAGGAATTTCT
TCAATTTTAGGAGCTGTAAATTTTATTACAACTGTAATTAATATACGATCAACAGGAATT
TCATTAGATCGTATACCTTTATTTGTTTGATCAGTAGTTATTACTGCTTTATTATTATTA
TTATCACTTCCAGTACTAGCAGGAGCTATTACTATATTATTAACAGATCGAAATTTAAAT
ACATCATTTTTTGACCCAGCGGGAGGAGGAGATCCTATTTTATACCAACATTTATT
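
The layout above is a header line followed by the sequence wrapped to a fixed line width. A minimal sketch of that wrapping step follows. to_fasta is a hypothetical helper, not the code in consesus_csv_to_fasta.py, and it assumes the consensus sequence is already available as a plain string (how the script reads jexcel.csv is not covered here).

```python
def to_fasta(header, sequence, width):
    """Format one sequence as a FASTA record with at most `width` bases per line."""
    if width < 1:
        raise ValueError("width must be a positive integer")
    # Drop any whitespace so wrapping is based on bases only.
    seq = "".join(sequence.split())
    # Slice the sequence into consecutive chunks of `width` characters.
    chunks = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([f">{header}"] + chunks) + "\n"
```

For example, to_fasta("consesus_seq", "ACGTACGTAC", 4) gives a header line followed by the lines ACGT, ACGT and AC.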