Cluster Support
If it is taking too long to process your .fastq file and you have access to a cluster running PBS, you can use the following procedure to tap your cluster compute power.
Picky preparepbs will assume that the template file is named "template.pbs".
You may create "template.pbs" with
./picky.pl preparepbs --init template.pbs
The default template.pbs content is:
#!/bin/bash
#PBS -l nodes=1:ppn=16
#PBS -l walltime=01:00:00
#PBS -l mem=18GB
cd "$PBS_O_WORKDIR"
export LASTAL=last-755/src/lastal
export LASTALDB=hg19.lastdb
export LASTALDBFASTA=hg19.fa
export PICKY=./picky.pl
export RUN=
time (${LASTAL} -v -C2 -K2 -r1 -q3 -a2 -b1 -v -P16 -Q1 ${LASTALDB} ${RUN}.fastq 2>${RUN}.lastal.log | ${PICKY} selectRep --thread 16 --preload 6 1>${RUN}.align 2>${RUN}.selectRep.log)
time (cat ${RUN}.align | ${PICKY} callSV --oprefix ${RUN}.sv --fastq ${RUN}.fastq --genome ${LASTALDBFASTA} --exclude=chrM --sam)
IMPORTANT: You MUST leave the line "export RUN=" untouched, as Picky will initialize the corresponding value for each chunk.
Depending on your cluster environment, you will have to set the various resource settings appropriately. The above default settings work for a 1000-read chunk in our environment, with a 2-fold buffer for execution time and a 1.8GB buffer for memory using 16 cores.
In addition, you should configure the first four export lines according to your installation and project.
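The generated chunk scripts differ from the template only in the RUN assignment. For illustration (this exact excerpt is an assumption about the generated files' contents, not copied from Picky's output), the first chunk's script would carry something like:

```shell
# Hypothetical excerpt of a generated chunk script (e.g. SCP20-c000001.pbs):
# Picky fills the empty "export RUN=" line with the chunk's basename, which
# the rest of the template uses to derive the .fastq, .align, and log names.
export RUN=SCP20-c000001
```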
Once your template.pbs has been configured for your cluster environment, you are ready to chunk the fastq (say, "SCP20.fastq") and write the corresponding PBS scripts with:
./picky.pl preparepbs --fastq SCP20.fastq
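The chunking step itself is conceptually equivalent to splitting the FASTQ into fixed-size pieces of 1000 reads (4 lines per read, so 4000 lines per chunk). A minimal sketch using GNU `split`, with a tiny stand-in FASTQ so it runs end to end (`reads.fastq` and the 1-read chunk size are illustrative only; this is an assumed equivalent, not Picky's actual code):

```shell
# A tiny stand-in FASTQ (3 reads) so the sketch is runnable:
printf '@read%d\nACGT\n+\nHHHH\n' 1 2 3 > reads.fastq
# Split into chunks; -l 4 gives 1 read per chunk for this demo.
# Use -l 4000 for 1000-read chunks. Requires GNU split for
# --numeric-suffixes and --additional-suffix.
split -l 4 --numeric-suffixes=1 -a 6 --additional-suffix=.fastq \
    reads.fastq reads-c
```

This produces reads-c000001.fastq through reads-c000003.fastq, mirroring Picky's SCP20-c000001.fastq naming shown below.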
In the above example, SCP20.fastq was converted from the public ONT dataset Scrappie chr20 FASTA using kent-util's faToFastq as follows:
# download faToFastq
curl -O http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/faToFastq
# make user executable
chmod u+x faToFastq
# download Scrappie based-called chr20 reads
curl -O http://s3.amazonaws.com/nanopore-human-wgs/na12878.chr20ScrappieFiltered.fasta
# convert fasta to fastq with default base quality 'H'
./faToFastq -qual=H na12878.chr20ScrappieFiltered.fasta SCP20.fastq
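If faToFastq is unavailable on your platform, the same conversion can be sketched with a short awk script that emits each FASTA record as a FASTQ record with a constant base quality of 'H' (the demo input here is hypothetical; point the command at your own .fasta):

```shell
# Tiny stand-in FASTA with a multi-line sequence so the sketch is runnable:
printf '>r1\nACGT\nTTAA\n>r2\nGGCC\n' > demo.fasta
# Join each record's sequence lines, then print the four FASTQ lines:
# header (@...), sequence, "+", and an all-'H' quality string.
awk 'function emit(q) { print hdr; print seq; print "+"; q = seq; gsub(/./, "H", q); print q }
     /^>/ { if (seq != "") emit(); hdr = "@" substr($0, 2); seq = ""; next }
     { seq = seq $0 }
     END { if (seq != "") emit() }' demo.fasta > demo.fastq
```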
The file contains 277,054 reads. Picky preparepbs will generate 278 chunk .fastq files (SCP20-c000001.fastq to SCP20-c000278.fastq) and the corresponding 278 PBS scripts (SCP20-c000001.pbs to SCP20-c000278.pbs).
You can submit the generated .pbs scripts according to your cluster configuration. If there is no restriction on the number of submitted jobs per user, you can submit all the scripts at once with:
for i in SCP20-c??????.pbs ; do echo ${i}; qsub ${i}; done
Each chunk will have produced its own set of results as outlined in the output documentation. There are numerous ways to combine the individual chunks of results into a single result set.
If one is interested in deletions, the consolidated result can be generated as follows:
cat SCP20-c??????.sv.profile.DEL.xls > SCP20.all.sv.profile.DEL.xls
./picky.pl xls2vcf \
--xls SCP20.all.sv.profile.DEL.xls \
> SCP20.all.sv.profile.DEL.vcf
This can be repeated for all other SV types.
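The repetition can be sketched as a loop over the SV types. The type list below is an assumption; adjust it to match the .sv.profile.*.xls files Picky actually produced in your run (the guard skips any type with no matching chunk files):

```shell
# Loop the DEL workflow above over each SV type; the list of type names
# is an assumption -- check your own .sv.profile.*.xls filenames.
for type in DEL INS INDEL INV TTLC TDSR TDC ; do
    # Skip types with no chunk output present.
    ls SCP20-c??????.sv.profile.${type}.xls >/dev/null 2>&1 || continue
    cat SCP20-c??????.sv.profile.${type}.xls > SCP20.all.sv.profile.${type}.xls
    ./picky.pl xls2vcf \
        --xls SCP20.all.sv.profile.${type}.xls \
        > SCP20.all.sv.profile.${type}.vcf
done
```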
If one would like proper run-level auxiliary files, it is better to concatenate the chunks' .align files and re-run Picky callSV.
cat SCP20-c??????.align > SCP20.all.align
cat SCP20.all.align \
| ./picky.pl callSV \
--oprefix SCP20.all \
--fastq SCP20.fastq \
--genome hg19.fa \
--exclude=chrM \
--sam 2>SCP20.all.callSV.log
Most often, the .sam file is the only auxiliary file one needs. As re-running callSV on the concatenated .align file takes up additional storage and time, one may instead merge the chunks' .sam files directly, handling the extraneous SAM headers appropriately.
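One way to handle the extraneous headers is to keep the @-prefixed header lines from the first chunk's .sam only and append the alignment records from every chunk. A runnable sketch with two tiny stand-in SAM files (the real inputs would be the per-chunk files, e.g. SCP20-c??????.sv.sam, a naming assumption based on the template's --oprefix):

```shell
# Two tiny stand-in chunk SAM files with identical headers:
printf '@HD\tVN:1.5\n@SQ\tSN:chr20\tLN:100\nr1\t0\tchr20\t1\t60\t4M\t*\t0\t0\tACGT\tHHHH\n' > c1.sam
printf '@HD\tVN:1.5\n@SQ\tSN:chr20\tLN:100\nr2\t0\tchr20\t5\t60\t4M\t*\t0\t0\tTTAA\tHHHH\n' > c2.sam
# Keep header lines (starting with @) from the first file only;
# pass alignment records from all files through.
awk 'FNR == 1 { filenum++ }
     /^@/ { if (filenum == 1) print; next }
     { print }' c1.sam c2.sam > merged.sam
```

For the real run, replace `c1.sam c2.sam` with the per-chunk files in order, e.g. `SCP20-c??????.sv.sam`.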