Question about minimap2 parameters in long_read_typing.py #28
Comments
Hi, Thank you for the patient effort. For long-read data only,
Thanks for your insights and the great tool! I will modify the script for my HiFi data. The coverage is low (~10X), and I appreciate that this is not optimal for HLA typing. I'm now running both HLA*LA and SpecHLA to have more confidence in the genotypes that match.
Hi, Thanks again!
Hi,
Hope it can help. Best,
Hi, thanks for the insights! I have been inspecting the alignment files output by SpecHLA (DPB1.bam), and the read quality and mapping seem fine to me. I have long-read data (PacBio HiFi), and therefore only ran the Best, Thanks for the help!
Hi, SpecHLA indeed bins the long reads as well. Please look at the file Best,
Right, first of all, it's great to have all the outputs for troubleshooting! It seems that there's a lot going on under the hood. So for most of the samples, the file For your information, I did change the mapping parameters to suit the HiFi reads because the original got stuck in the mapping step for a long time: under class Pacbio_Binning: Thank you for your helpful comments and your patience!
Wow, great, thank you for your trouble! Interesting that there is an identical sequence in BAC and DPB1. I will test BLAST, and am currently downloading the databases, as my data is sensitive and I can't use the web tool. Would this BAC presence be due to contamination? And where does it come from? I understand it is used as a cloning vector, but I am dealing with whole genomes sequenced from blood. Sorry, it seems that I have questions again, but they are not urgent. I'll let you know when I have blasted a few DPB1 reads.
I also guess the BAC sequences are contamination introduced in sequencing; they might be used for sequencing accuracy assessment or human DNA amplification. However, further surveys are needed to confirm this. Waiting for your alignment results.
Hi, after blasting a number of the dubious reads, they do hit some kind of BACs, though with slightly different IDs than in your screenshot. I tried running HiFiAdapterFilt, but it seems that the sequences aren't included in its list of PacBio adapters, so that was not helpful. I'm not sure which BAC library to align the reads to, or how to go about it. How big an effect do you think these reads have on the HLA haplotyping? And if it's big, would it affect all the genes or only DPB1?
Thanks for the reply. We have not assessed the influence of BAC yet. I think these BAC-derived reads only harm high-resolution typing results (e.g., 8-digit). I will construct a pipeline to remove such reads, and see whether it improves the performance of SpecHLA. If it does, I will add it to long_read_typing.py.
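A minimal sketch of the kind of read-removal step described above; it is not SpecHLA's implementation. It assumes BLAST was run with tabular output (`-outfmt 6`, where the first four columns are query ID, subject ID, percent identity and alignment length) against a database that contains only BAC/vector sequences, and that the FASTQ has exactly four lines per record. The function names and both thresholds are hypothetical and would need tuning on real data.

```python
def flagged_read_ids(blast_lines, min_identity=95.0, min_length=500):
    """Collect query IDs with a strong hit in BLAST tabular (-outfmt 6) output.

    Assumption: every subject in the BLAST database is a BAC/vector sequence,
    so any sufficiently long, high-identity hit marks the read as suspect.
    """
    ids = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        identity, aln_length = float(fields[2]), int(fields[3])
        if identity >= min_identity and aln_length >= min_length:
            ids.add(fields[0])
    return ids


def drop_reads(fastq_lines, ids):
    """Return FASTQ lines with flagged reads removed (4 lines per record)."""
    lines = [line.rstrip("\n") for line in fastq_lines]
    kept = []
    for i in range(0, len(lines) - 3, 4):
        header_fields = lines[i][1:].split()
        read_id = header_fields[0] if header_fields else ""
        if read_id not in ids:
            kept.extend(lines[i:i + 4])
    return kept


if __name__ == "__main__":
    # Toy data; the read and subject names are made up.
    blast = ["read1\tBAC_clone\t99.2\t800", "read2\tBAC_clone\t88.0\t900"]
    fastq = ["@read1", "ACGT", "+", "IIII", "@read2 extra", "TTTT", "+", "IIII"]
    print(drop_reads(fastq, flagged_read_ids(blast)))
```

Comparing typing results before and after such a filter would show whether the BAC-like reads change anything beyond the high-resolution fields.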
By the way, what's the location of this HLA DPB1 region, i.e., the ~500 bp region with spuriously high depth (1000-2000) in all the samples? In my data, this region is DPB1:2248-2832 bp based on the SpecHLA reference.
OK, it is a relief that the results should hold :) And like I said, the results are pretty much identical to those from HLA*LA. The region in my data is 2265-2832, a slightly different start. Many thanks for your help and insights!
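For anyone who wants to locate such a high-depth stretch in their own data, here is a rough sketch, assuming per-position depth was exported with `samtools depth` for a single BAM (tab-separated: reference name, 1-based position, depth). The function name, the reference name `DPB1` in the toy data, and the default thresholds are illustrative assumptions only.

```python
def high_depth_regions(depth_lines, min_depth=1000, min_len=100):
    """Return (ref, start, end) runs of consecutive positions with depth >= min_depth."""
    regions = []
    ref = start = prev = None

    def flush():
        # Record the current run if it is long enough.
        if start is not None and prev - start + 1 >= min_len:
            regions.append((ref, start, prev))

    for line in depth_lines:
        if not line.strip():
            continue
        name, pos, depth = line.rstrip("\n").split("\t")[:3]
        pos, depth = int(pos), int(depth)
        if depth >= min_depth:
            contiguous = start is not None and name == ref and pos == prev + 1
            if not contiguous:
                flush()
                ref, start = name, pos
            prev = pos
        else:
            flush()
            start = prev = None
    flush()
    return regions


if __name__ == "__main__":
    # Toy input: positions 4-8 are deep, the rest are shallow.
    toy = [f"DPB1\t{p}\t{1500 if 4 <= p <= 8 else 10}" for p in range(1, 11)]
    print(high_depth_regions(toy, min_len=3))
```

Note that `samtools depth` skips zero-coverage positions unless `-a` is given, which is why the sketch checks that positions are consecutive rather than assuming every position is present.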
It is great to hear the consistency between the results of HLA*LA and SpecHLA, indicating they are reliable. I am collecting the BAC sequences from NCBI.
Hi! I'm running long_read_typing.py on PacBio reads from which I have extracted the HLA regions (I originally aligned the data to CHM13, then extracted the HLA reads with HLA*LA and ran bam2fastq). I was wondering about the alignment, as the same script is also used for typing Nanopore reads and there is only one set of minimap2 parameters:
minimap2 -t {parameter.threads} -p 0.1 -N 100000 -a $ref $fq
I wondered why the parameters recommended for PacBio aren't used, e.g.:
minimap2 -ax map-pb ref.fa pacbio-reads.fq
or
-k19 -w19 -U50,500 -g10k -A1 -B4 -O6,26 -E2,1 -s200
Should I instead use the script SpecHLA.sh, or am I correct in running only long_read_typing.py at this point?
Many thanks!
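To make the question concrete, here is a hedged sketch of how a script could pick a minimap2 preset per platform instead of using one fixed parameter set. This is not how long_read_typing.py is written; `build_minimap2_cmd`, the platform keys and the `extra` argument are hypothetical names. The presets themselves (`map-pb`, `map-hifi`, `map-ont`) are standard minimap2 `-x` values, though `map-hifi` is only available in newer minimap2 releases.

```python
import shlex

# Keys are illustrative labels; values are minimap2 -x presets.
PRESETS = {
    "pacbio-clr": "map-pb",
    "pacbio-hifi": "map-hifi",
    "nanopore": "map-ont",
}


def build_minimap2_cmd(platform, ref, fq, threads=4, extra=()):
    """Build a minimap2 argument list that emits SAM (-a) with a platform preset."""
    if platform not in PRESETS:
        raise ValueError(f"unknown platform: {platform!r}")
    return [
        "minimap2", "-t", str(threads), "-a",
        "-x", PRESETS[platform],
        *extra,  # e.g. the script's existing "-p 0.1 -N 100000" options
        ref, fq,
    ]


if __name__ == "__main__":
    cmd = build_minimap2_cmd(
        "pacbio-hifi", "ref.fa", "reads.fq",
        threads=8, extra=("-p", "0.1", "-N", "100000"),
    )
    print(shlex.join(cmd))
```

Whether a preset plays well with the `-p`/`-N` settings used for binning is something only testing on real data would show.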