Error at mergePairs step: table(pairdf$forward, pairdf$reverse) : attempt to make a table with >= 2^31 elements #2055

Open
Anto007 opened this issue Nov 16, 2024 · 3 comments

Comments

@Anto007

Anto007 commented Nov 16, 2024

Hi @benjjneb

I've been using your wonderful dada2 pipeline for years, but yesterday I hit the error below for the first time. The relevant part of the log is pasted underneath. The PCRs used the 16S primer pair 341F/805R, and the 92 samples, sequenced on the NovaSeq platform, are marine sediments. I'm running on a high-end Linux workstation with 104 CPUs, 512 GB RAM and several terabytes of free storage. The dada inference steps for the forward and reverse reads took days to finish. I assume that is due to the large number of unique sequences, but from the issues tab I see that other users have run dada2 successfully with even more unique sequences. Checking the libraries, I see that >99% of the reads do not appear to contain the primer sequences (my cutadapt command line was cutadapt --no-indels --pair-filter=any --error-rate=0.2 --match-read-wildcards --discard-untrimmed -b CCTACGGGNGGCWGCAG -B GACTACHVGGGTATCTAATCC -o $k -p $l $i $j). I also tried the big data workflow on the same dataset, but got the same error about being unable to make a table with >= 2^31 elements much earlier, at the forward dada inference step itself, so I'm very confused about what is happening here.

dada_reverse <- dada(derep_reverse, err=err_reverse_reads, multithread=TRUE)
merged_amplicons <- mergePairs(dada_forward, derep_forward, dada_reverse,
                    derep_reverse, maxMismatch=1, minOverlap=10)
Sample 1 - 934443 reads in 243372 unique sequences.
Sample 2 - 2053836 reads in 477881 unique sequences.
Sample 3 - 1552150 reads in 394546 unique sequences.
Sample 4 - 1032061 reads in 269258 unique sequences.
Sample 5 - 749550 reads in 182212 unique sequences.
Sample 6 - 532773 reads in 144814 unique sequences.
Sample 7 - 562890 reads in 130689 unique sequences.
Sample 8 - 529784 reads in 124819 unique sequences.
Sample 9 - 633911 reads in 186200 unique sequences.
Sample 10 - 800203 reads in 178587 unique sequences.
.....
Error in table(pairdf$forward, pairdf$reverse) : 
  attempt to make a table with >= 2^31 elements
Calls: mergePairs -> lapply -> FUN -> table
Execution halted
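
For context on the message itself: base R's `table()` allocates one cell per combination of factor levels and stops when the product of the level counts exceeds `.Machine$integer.max` (2^31 - 1). The sketch below only does that arithmetic. The counts passed in are made-up illustration values, not numbers from this dataset, and it is an assumption that `pairdf$forward` and `pairdf$reverse` hold per-read indices of the denoised forward and reverse sequences.

```r
# Would a dense two-way table over these level counts fit within R's limit?
# n_forward / n_reverse: number of distinct values in each column
# (assumed here to be distinct forward / reverse denoised sequences).
dense_table_fits <- function(n_forward, n_reverse) {
  # Multiply as doubles so the product itself cannot overflow integers.
  as.numeric(n_forward) * as.numeric(n_reverse) <= .Machine$integer.max
}

# Hypothetical counts for a single sample.
dense_table_fits(40000, 50000)  # 2.0e9 cells -> TRUE
dense_table_fits(50000, 50000)  # 2.5e9 cells -> FALSE
```

Under that assumption, the limit is reached once roughly 46,000 distinct values appear on each side within one sample, regardless of how much RAM the machine has.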

In an older issue with the same kind of error, you asked for the output below, so I'm pasting it here. BLAST searches of these sequences against the NCBI nt database clearly show that they are prokaryotic 16S rRNA gene sequences. I also mapped a couple of the dada2 input read libraries to the dada2 GTDB 16S database FASTA file, and nearly 98% of the reads mapped to it, so I don't think the libraries are dominated by non-16S or non-specific amplification products. Any pointers would be very much appreciated.

fnF <- "path/to/first_FWD_sample.fastq.gz" # CHANGE ME
fnR <- "path/to/first_REV_sample.fastq.gz" # CHANGE ME
dada2:::pfasta(sample(getSequences(fnF), 20))
dada2:::pfasta(sample(getSequences(fnR), 20))
>1
CAGGGAATCTTGCGCAATGGGCGAAAGCCTGACGCAGCGACGCCGCGTGGGGGATGAAGGCCTTCGGGTTGTAAACCCCTTTCAGGAGGGAAGAAAATGACGGTACCTCCAGAAGAAGCCCCGGCCAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCGAGCGTTGTCCGGATTTATTGGGCGTAAAGGGCTCGTAGGCGGCTTGACAAGTCGATCGTGAAAACTCGGG
>2
TGGGGAATATTGGACAATGGGGGCAACCCTGATCCAGCCATGCCGCGTGAGTGAAGAAGGCCTTAGGGTTGTAAAGCTCTTTCGCCCGTGAAGATGATGACGGTAGCGGGAGAAGAAGCCCCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGGGCGAGCGTTGTTCGGAATTACTGGGCGTAAAGGGCGCGTAGGCGGAGATCCAAGTCAGGGGTGAAAGTCCTGG
>3
TCGAGAATCTTCGGCAATGGGCGCAAGCCTGACCGAGCGACGCCGCGTGCGGGACGAAGGCCCCTGGGTTGTAAACCGCTGTCAGAGGGGATGAAATGCGAGGGGGTTATCCCTCTCGTTTGACAAAGCCTCAGAGGAAGCACGGGCTAAGTACGTGCCAGCAGCCGCGGTAACACGTACTGTGCGAACGTTATTCGGAATCACTGGGCTTAAAGGGTGCGTAGGCGGCCGAAT
>4
TGGGGAATATTGGACAATGGGGGCAACCCTGATCCAGCCATGCCGCGTGAGTGACGAAGGCCCTAGGGTTGTAAAGCTCTTTCAGCGGGGAAGATGATGACGGTACCCGCAGAAGAAGCCCCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGGGCTAGCGTTGTTCGGAATTACTGGGCGTAAAGGGCGTGTAGGCGGATCGAGTAGTCAGGCGTGAAAGCCCCGG
>5
TGGGGAATATTGGACAATGGGGGAAACCCTGATCCAGCCATGCCGCGTGAGTGAAGAAGGCCTTCGGGTCGTAAACTCCTGTCAGGTGGGACGAAACGGCCGAGTTAAATAGGCTCGGTAACTGACGGTACCACCAGAGGAAGCCCCGGCTAACTCCGTGCCAGCAGCCGCGGTAAAACGGAGGGGGCAAGCGTTATTCGGATTTACTGGGCGTAAAGGGCGCGTAGGCGGCAT
>6
TCGAGAATCTTCGGCAATGGGCGCAAGCCTGACCGAGCGACGCCGCGTGCGGGATGAAGGCCCTCGGGTTGTAAACCGCTGTCAGAGGGGAAGAAATGCATGAGGGTTCTCTCTCATGTTTGACTGATCCTCAGAGGAAGTACGGGCTAAGTTCGTGCCAGCAGCCGCGGTAACACGAACCGTACGAACGTTATTCGGAATCACTGGGCTTAAAGAGTGCGTAGGCGGCTTTAC
>7
TAACGAATCTTCCGCAATGCGCGAAAGCGTGACGGAGCAATGCCGCGTGCAGGATGAAGCTTCTCGGAGTGTAAACTGCTGTCAGGGTTTAGCAACACAATGAGCAGACCCAAAGGAAGGGCAGGCTAACTTCGTGCCAGCAGCCGCGGTAATACGAAGGGCCCCAGCGGTGCGCGGAATCACTGGGCTTAAAGCGTACGTAGGCGGGCGCGCAGGCGTTGTGTGAAAGCCAAC
>8
TGAGGAATATTGGTCAATGGTCGAGAGACTGAACCAGCCATGCCGCGTGTAGGAAGAAGGTTCTACGAATTGTAAACTACTTTTATACAGGAAGAAACCTATCTACGTGTAGATAGCTGACGGTACTGTAGGAATAAGGACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTCCGAGCGTTATCCGGAATCATTGGGTTTAAAGGGTGCGTAGGCGGTCTTTTAA
>9
TAGGGAATTTTGGGCAATGGGCGAAAGCCTGACCCAGCAACGCCGCGTGTAGGATGAAGGCCCTCGGGTCGTAAACTACTGTCAGGAGGGAAGAACAGCCGTGCGGTCAATACCCGCGCGGTCCGACGGTACCTCCAAAGGAAGCGCCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGCGCAAGCGTTGTTCGGAATTACTGGGCGTAAAGCGCGTGCAGGTGGTC
>10
TGGGGAATATTGGACAATGGGCGAAAGCCTGATCCAGCTACACCGCGTGTGTGAAGAAGGCCCTCGGGTCGTAAAGCACTGTCGGGAGGGACGAAGCCTTCGGGTTGACGGTACCATATGAATAAGCACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTGTCCGGATTCACTGGGTGTAAAGGGTGTGCAGGCGGGGCGATAAGTCAGAGGTGAA
>11
TCACGAATCATTGGCAATGCGCGAAAGCGTGACCATGCAATGCCGCGTGGGCGATGAAGGCCTTCGGGTTGTAAAGCCCTGTCAGGGGTGAGGAAACGTACTTCGGTACTTGACGTTAACCCCAGAGGAAGTCACGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGTGGCAAACGTTGCTCGGAATCACTGGGCTTAAAGGGCGTGTAGGCGGCCCACTAAGTCGGAT
>12
TGGGGAATATTGCACAATGGGGGAAACCCTGATGCAGCCATGCCGCGTGTGTGAAGAAGGCCTTCGGGTTGTAAAGCACTTTCAGTTGTGAGGAAAAGTTAGTAGTTAATACCTGCTAGCCGTGACGTTAACAACAGAAGAAGCACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCGAGCGTTAATCGGAATTACTGGGCGTAAAGCGCACGCAGGCGGTGT
>13
TCGCGAATCATTCGCAATGCGCGCAAGCGTGACGATGCGACGCCGCGTGGGCGATGAAGGCCTTCGGGTTGTAAAGCCCTGTCAGGGGTGAGTAAAGCTGCGGGTCCACTCGTAGTTGAATTAAGCCCCAGAGGAAGTCACGGCTAACTCCGTGCCAGCAGCCGCGGTAAGACGGAGGTGGCAAGCGTTGCTCGGAATCACTGGGCTTAAAGGGCGCGTAGGCGGCCGTTCTAG
>14
TGGGGAATATTGGACAATGGGGGAAACCCTGATCCAGCAACGCCGCGTGGAGGATGAAGGCCTTCGGGTTGTAAACTCCTGTCAGGTGGGACGAAATGGCGCCGGTCAATAGCCGGTGTCTTTGACGGTACCACCGGAGGAAGCCCCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGGGCAAGCGTTATTCGGAATTATTGGGCGTAAAGGGCGCGTAGGCGGCCT
>15
TGAGGAATTTTGCGCAATGGGGGAAACCCTGACGCAGCGACGCCGCGTGGAGGAAGAAGGCCTTCGGGTCGTAAACTCCTGTCAAGTGGGACGAATGCTACGAGGATGAATAAGCCTCGTGGTTGACGGTACCACTGGAGGAAGCCCCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGGGCCAGCGTTGTTCGGAATTATTGGGCGTAAAGGGCGCGTAGGCGGCC
>16
TGGGGAATCTTGGGCAATGGGCGAAAGCCTGACCCAGCCACGCCGCGTGGAGGAAGACACCCCTATGGGGCGTAAACTCCTTTTATGTGGGAAGAACACCTTCCTCGGGAAGGCTTGACGGTACCACATGAATAAGCACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTGTCCGGATTCACTGGGTGTAAAGGGTGTGTAGGCGGAGCTGTCAGT
>17
TGGGGAATATTGGACAATGGGCGAAAGCCTGATCCAGCCATGCCGCGTGAGTGACGAAGGCCTTAGGGTTGTAAAGCTCTTTTGGCGGGGACGATAATGACGGTACCCGCAGAATAAGCCCCGGCTAACTTCGTGCCAGCAGCCGCGGTAATACGAAGGGGGCTAGCGTTGTTCGGAATCACTGGGCGTAAAGCGCACGTAGGCTGACTGGTCAGTTGGGGGTGAAATCCCGGG
>18
GCTTCGTGTAACGCAGCAAGATGTTGACGGCCTGCAGGAAAAGAACCAGTGAAAAAAGGCCCACTGCCACGCCCTCCGCCACCGCCCGCGTCGTTCGAAAGAGGACCTTCATCGGACATCCGATCCGAGGCGTTGGGGACGTCCCGTCTCGCGCTCCACGTGACCCAGCCACTCGAGATAGAACCGGTACGGACCGTCCCCCAGGCTCTTCATGTAATCATCCCGAAAGCGACG
>19
TCGAGAATCTTCCGCAATGGGCGCAAGCCTGACGGAGCGACGCCGCGTGCGGGATGAAGGCCTTCGGGTTGTAAACCGCTGTCAGTTGGGAGGAAATGCCATAGGGTACTCTCTATGGTTTGACCGATCTTCAGAGGAAGTCCGGGCTAAGTTCGTGCCAGCAGCCGCGGTAAGACGAACCGGACGAACGTTATTCGGAATTACTGGGCTTAAAGGGTGCGTAGGCGGCCTTGT
>20
TAGGGAATCTTGGTCAATGGGGGAAACCCTGAACCAGCAACTCCGCGTGAGGGATGAAGGCCCTCGGGTCGTAAACCTCTGTCGGGAGGGAAGAACGGGCCGTCGGTTAATACCCGACGGTCTTGACGGTACCTCCAAAGGAAGGACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTCCAAGCGTTGTTCGGAATCATTGGGCGTAAAGCGGGTGTAGGCGGCT

@Anto007
Author

Anto007 commented Nov 21, 2024

@benjjneb Sorry to bother you, but I'm not sure whether you saw this post, so I'm pinging you again. For now, I've worked around the problem by randomly downsampling the original read libraries by 50%, after which dada2 finished successfully in reasonable time. Do you have any other workarounds that would avoid wasting half of the reads in the sequenced libraries? Thank you very much in advance for your time and effort.
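
As an illustration of the downsampling workaround, the sketch below draws one set of read indices and applies it to both mates, so forward and reverse reads stay paired. It works on toy in-memory character vectors only. `downsample_pairs`, its arguments and the read names are hypothetical, and real FASTQ files would need a FASTQ-aware tool instead.

```r
# Keep a random fraction of read pairs, using the same indices for both mates.
downsample_pairs <- function(fwd_reads, rev_reads, fraction = 0.5, seed = 1) {
  stopifnot(length(fwd_reads) == length(rev_reads),
            fraction > 0, fraction <= 1)
  set.seed(seed)  # note: this resets the global RNG state, for reproducibility
  n_keep <- floor(length(fwd_reads) * fraction)
  keep <- sort(sample.int(length(fwd_reads), n_keep))  # preserve file order
  list(forward = fwd_reads[keep], reverse = rev_reads[keep])
}

# Toy reads: ten pairs named read1/1 ... read10/1 and read1/2 ... read10/2.
fwd <- paste0("read", 1:10, "/1")
rvs <- paste0("read", 1:10, "/2")
sub_pairs <- downsample_pairs(fwd, rvs, fraction = 0.5)
length(sub_pairs$forward)  # 5 pairs kept
```

Sampling indices once, rather than sampling each file independently, is what keeps the two outputs in the same order for a later merge step.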

@benjjneb
Owner

Could you clarify which command exactly is throwing this error?

I'm not clear from your original post whether it is dada on the reverse reads, or mergePairs subsequently.

Whichever command it is, please show the exact command that produced the error, and the exact error message produced.

@Anto007
Author

Anto007 commented Nov 23, 2024

Thank you for getting back to me @benjjneb. With the standard workflow, I got the error below, after the forward and reverse dada steps had taken a week to finish:

Error in table(pairdf$forward, pairdf$reverse) : 
  attempt to make a table with >= 2^31 elements
Calls: mergePairs -> lapply -> FUN -> table
Execution halted

This error occurred specifically after I ran:

merged_amplicons <- mergePairs(dada_forward, derep_forward, dada_reverse,
                    derep_reverse, maxMismatch=1, minOverlap=10)
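
One general way to count (forward, reverse) combinations without a dense two-way table is to tabulate only the combinations that actually occur. The following is a base-R sketch of that idea on toy vectors. It is not dada2's implementation and not a drop-in patch for `mergePairs`; `count_pairs` is a hypothetical helper.

```r
# Count observed (forward, reverse) index pairs without building a
# forward-by-reverse matrix. Memory grows with the number of reads,
# not with the product of the two level counts.
count_pairs <- function(forward, reverse) {
  stopifnot(length(forward) == length(reverse))
  key <- paste(forward, reverse, sep = ":")  # one string key per read pair
  first <- !duplicated(key)                  # first occurrence of each pair
  data.frame(
    forward = forward[first],
    reverse = reverse[first],
    abundance = tabulate(match(key, key[first]), nbins = sum(first))
  )
}

# Toy indices for five read pairs.
fwd_idx <- c(1L, 1L, 2L, 3L, 1L)
rev_idx <- c(7L, 7L, 7L, 9L, 8L)
count_pairs(fwd_idx, rev_idx)
# Four distinct pairs; the pair (1, 7) has abundance 2, the others 1.
```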

With the big data workflow on the same dataset, I got a similar "Execution halted:attempt to make a table with >= 2^31 elements" error after running:

for(sam in sample.names) {
  cat("Processing:", sam, "\n")
  derep <- derepFastq(filts[[sam]])
  dds[[sam]] <- dada(derep, err=err, multithread=TRUE)
}
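
To pin down which sample and which call raise the error in a loop like the one above, each iteration can be wrapped in `tryCatch`. In the sketch below, `process_sample` is a hypothetical stand-in for the `derepFastq()` and `dada()` steps, and `toy_step` is a made-up function, so the example runs without dada2 or any input files.

```r
# Run a per-sample step, recording the error message and failing call
# instead of halting on the first error.
process_all <- function(sample_names, process_sample) {
  results <- list()
  failures <- list()
  for (sam in sample_names) {
    outcome <- tryCatch(
      list(ok = TRUE, value = process_sample(sam)),
      error = function(e) {
        list(ok = FALSE,
             message = conditionMessage(e),
             call = paste(deparse(conditionCall(e)), collapse = " "))
      }
    )
    if (outcome$ok) {
      results[[sam]] <- outcome$value
    } else {
      failures[[sam]] <- outcome[c("message", "call")]
      message("Failed on ", sam, ": ", outcome$message,
              " [in ", outcome$call, "]")
    }
  }
  list(results = results, failures = failures)
}

# Toy stand-in that fails for one sample with the message from this issue.
toy_step <- function(sam) {
  if (sam == "S2") stop("attempt to make a table with >= 2^31 elements")
  nchar(sam)
}
out <- process_all(c("S1", "S2", "S3"), toy_step)
names(out$failures)  # "S2"
```

The recorded `call` shows where the error was signalled, which is the piece of information requested earlier in the thread.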


2 participants