I've been using your wonderful dada2 pipeline for years, but yesterday I encountered the error below for the first time; the relevant subsection of the log is pasted underneath. The 16S primer pair 341F/805R was used in the PCRs, and the 92 samples sequenced here on the NovaSeq platform represent marine sediments. I'm running a high-end Linux workstation with 104 CPUs, 512 GB RAM, and several terabytes of storage to spare. The dada inference steps for the forward and reverse reads took days to finish. I suspect this is due to the high number of unique sequences, although, scouring the issues tab, I see that other users have successfully run dada2 with even higher numbers of unique sequences. Checking the libraries, I see that >99% of the reads do not appear to contain the primer sequences; my cutadapt command line was:

cutadapt --no-indels --pair-filter=any --error-rate=0.2 --match-read-wildcards --discard-untrimmed -b CCTACGGGNGGCWGCAG -B GACTACHVGGGTATCTAATCC -o $k -p $l $i $j

I also tried the big data workflow with the same dataset, but I hit the same error about the inability to make a table with >= 2^31 elements much earlier, at the forward dada inference step itself, so I'm very confused about what is happening here.
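For reference, the crude check I used to estimate primer presence looked roughly like the sketch below: it expands the IUPAC ambiguity codes in 341F (N, W) into a character-class regex and counts matching sequence lines. The toy two-read FASTQ is only there to make the example self-contained; in practice `$reads` pointed at the actual libraries.

```shell
# Toy two-read FASTQ to demonstrate; in practice point $reads at a real library.
reads=toy_R1.fastq
printf '@r1\nCCTACGGGAGGCTGCAGTTTT\n+\nIIIIIIIIIIIIIIIIIIIII\n' >  "$reads"
printf '@r2\nTTTTTTTTTTTTTTTTTTTTT\n+\nIIIIIIIIIIIIIIIIIIIII\n' >> "$reads"

# 341F with IUPAC codes expanded to a regex: N -> ., W -> [AT]
primer_re='CCTACGGG.GGC[AT]GCAG'

total=$(( $(wc -l < "$reads") / 4 ))            # FASTQ stores 4 lines per read
with_primer=$(awk 'NR % 4 == 2' "$reads" | grep -Ec "$primer_re")
echo "$with_primer of $total reads contain the forward primer"
```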
dada_reverse <- dada(derep_reverse, err=err_reverse_reads, multithread=TRUE)
merged_amplicons <- mergePairs(dada_forward, derep_forward, dada_reverse,
                               derep_reverse, maxMismatch=1, minOverlap=10)
Sample 1 - 934443 reads in 243372 unique sequences.
Sample 2 - 2053836 reads in 477881 unique sequences.
Sample 3 - 1552150 reads in 394546 unique sequences.
Sample 4 - 1032061 reads in 269258 unique sequences.
Sample 5 - 749550 reads in 182212 unique sequences.
Sample 6 - 532773 reads in 144814 unique sequences.
Sample 7 - 562890 reads in 130689 unique sequences.
Sample 8 - 529784 reads in 124819 unique sequences.
Sample 9 - 633911 reads in 186200 unique sequences.
Sample 10 - 800203 reads in 178587 unique sequences.
.....
Error in table(pairdf$forward, pairdf$reverse) :
attempt to make a table with >= 2^31 elements
Calls: mergePairs -> lapply -> FUN -> table
Execution halted
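If I'm reading the traceback right, `table(pairdf$forward, pairdf$reverse)` allocates one cell per (forward level, reverse level) pair, and R vectors are capped at 2^31 - 1 elements, so the cross-tabulation overflows once the product of unique forward and reverse counts crosses that limit. A back-of-the-envelope check, borrowing Sample 2's forward count from the log above and assuming a similar reverse count (both numbers purely illustrative):

```shell
# R's vector-length limit for table(): 2^31 - 1 cells.
fwd_uniques=477881              # unique forward sequences (Sample 2 above)
rev_uniques=477881              # assumed similar count on the reverse side
limit=$(( (1 << 31) - 1 ))      # 2147483647
cells=$(( fwd_uniques * rev_uniques ))
echo "cells=$cells limit=$limit"
if [ "$cells" -ge "$limit" ]; then
  echo "table() would overflow"
fi
```

With counts of that order the table would need ~2.3e11 cells, more than a hundred times the limit, which would explain why halving the libraries got under it.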
In an older issue with the same kind of error, you asked for the information below, so I'm pasting the relevant details here. BLAST searches of these sequences against the NCBI nt database clearly show that they represent the prokaryotic 16S rRNA gene. I also mapped a couple of the dada2 input read libraries to the dada2 GTDB 16S database fasta file; nearly 98% of the reads mapped, so I don't think the majority of the library reads represent non-16S rRNA genes or non-specific amplification products. Any pointers here would be very much appreciated.
@benjjneb Sorry to bother you, but I'm not sure you saw this post, so I'm pinging you again. For now, I've worked around the problem by randomly downsampling the original read libraries by 50%, after which dada2 finished successfully in a reasonable time. Would you have any other workarounds that don't waste half of the reads in the sequenced libraries? Thank you very much in advance for your time and effort.
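For what it's worth, the 50% downsampling can be sketched as below. This toy version is deterministic (it keeps every second 4-line FASTQ record), which keeps R1/R2 pairs in sync as long as the same rule is applied to both files; a seeded random sampler such as `seqtk sample -s100 in.fastq 0.5` is the more usual choice. The four-read file is just a placeholder to show the record arithmetic.

```shell
# Halve a FASTQ by keeping every second 4-line record. Running the same
# command on R1 and R2 keeps the pairs in sync; for truly random sampling,
# `seqtk sample` with a fixed seed on both files does the same job.
halve_fastq () { awk 'int((NR - 1) / 4) % 2 == 0' "$1"; }

# Toy four-read file to demonstrate.
printf '@r%d\nACGT\n+\nIIII\n' 1 2 3 4 > toy.fastq
halve_fastq toy.fastq > toy.half.fastq
kept=$(( $(wc -l < toy.half.fastq) / 4 ))
echo "kept $kept of 4 reads"
```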
Thank you for getting back to me @benjjneb. Using the standard workflow, I got the error below, and the forward and reverse dada steps had actually taken a week to finish:
Error in table(pairdf$forward, pairdf$reverse) :
attempt to make a table with >= 2^31 elements
Calls: mergePairs -> lapply -> FUN -> table
Execution halted
and the above error occurred specifically after I had run
While attempting the big data workflow with the same dataset, I got a similar "attempt to make a table with >= 2^31 elements" error, again ending in "Execution halted", after running