General question about pseudo-pooling #2047

Open
cjfields opened this issue Nov 1, 2024 · 1 comment

Comments

cjfields commented Nov 1, 2024

I'm using an approach like that described in the 'Big Data' workflow, but with the dada step in the loop farmed out to independent worker jobs on a cluster so that samples can be run in parallel. The results are then merged, combined into a sequence table, and chimeras are removed.

So far this works quite well, but we'd like to increase sensitivity. What I am wondering is whether we could essentially emulate what pseudo-pooling does: run a first pass like the above, generate a set of priors from the output, then run a second pass (again in parallel on the cluster) that includes the priors. The priors would be generated similarly to line 400 of dada2/R/dada.R (at commit 278f5f3):

pseudo_priors <- colnames(st)[colSums(st>0) >= opts$PSEUDO_PREVALENCE | colSums(st) >= opts$PSEUDO_ABUNDANCE]

I'm not seeing anything in the function that immediately gives me pause, but would you know if there is anything we need to consider when implementing this (set.seed or any parameters that should be included in the following round)?

Thanks!
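
For concreteness, here is a minimal base-R sketch of the prior-selection step, modeled on the line quoted above. It assumes `st` is a samples-by-sequences count table built from first-pass denoising of the same read direction that the second pass will denoise. The function name `select_priors`, the toy data, and the threshold defaults (prevalence of 2, no abundance cutoff) are illustrative assumptions, not values read from the package at run time.

```r
# Sketch: select priors from a first-pass sequence table.
# `st` is a samples x sequences count matrix (rows = samples, columns = sequences).
# The thresholds mirror the roles of PSEUDO_PREVALENCE and PSEUDO_ABUNDANCE;
# the default values used here are assumptions for illustration.
select_priors <- function(st, prevalence = 2, abundance = Inf) {
  # Keep a sequence if it appears in enough samples OR is abundant enough overall.
  keep <- colSums(st > 0) >= prevalence | colSums(st) >= abundance
  colnames(st)[keep]
}

# Toy table (hypothetical data): 3 samples, 3 sequence variants.
st <- matrix(
  c(10, 0, 5,
     3, 0, 0,
     0, 7, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("s1", "s2", "s3"), c("ACGT", "TTGA", "GGCC"))
)

priors <- select_priors(st)
print(priors)
```

In the toy table only `ACGT` is present in two samples, so it is the only sequence selected under these thresholds.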

@benjjneb
Owner

What you are suggesting looks exactly right to me. This is what pool="pseudo" does, but the built-in implementation can't farm out samples to different nodes as you are doing.
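
A second-pass worker for one sample might then look roughly like the sketch below. It is only an outline under stated assumptions: the function name `second_pass_sample` and all file paths are hypothetical, it handles a single read direction, and it assumes the error model and the priors vector were saved as RDS files beforehand. The `dada2::derepFastq` call and the `priors` argument of `dada2::dada` are the only package features relied on.

```r
# Sketch of a per-sample second-pass worker (one read direction only).
# Function name and file paths are hypothetical; dada2 must be installed
# on the worker node for the function to be called.
second_pass_sample <- function(filt_path, err_rds, priors_rds, out_rds) {
  err    <- readRDS(err_rds)     # error model shared by all workers
  priors <- readRDS(priors_rds)  # character vector of first-pass prior sequences
  derep  <- dada2::derepFastq(filt_path)
  # Same call as the first pass, plus the priors.
  dd <- dada2::dada(derep, err = err, priors = priors, multithread = FALSE)
  saveRDS(dd, out_rds)
  invisible(out_rds)
}
```

Each cluster job would call this once per sample, and the saved objects can then be merged and assembled into a sequence table exactly as after the first pass.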

> if there is anything we need to consider when implementing this (set.seed or any parameters that should be included in the following round)?

I don't immediately see any issue.
I think what you are doing is consistent with our approach.
