I'm using an approach like that described in the 'Big Data' workflow, but with the `dada` step in the loop farmed out to independent worker jobs on a cluster so these can be run in parallel. These are then merged afterwards, combined into a sequence table, and then chimeras are removed.
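For context, each worker job looks roughly like this (a sketch, not our exact scripts; the shared error models `errF`/`errR` learned up front, the filtered fastq paths `filtF`/`filtR`, and the RDS file names are placeholders):

```r
library(dada2)

## One worker job per sample, run in parallel across the cluster.
## errF/errR: error models learned once beforehand and shared by all workers.
## filtF/filtR: this sample's filtered fastq files (placeholders).
derepF <- derepFastq(filtF)
derepR <- derepFastq(filtR)
ddF <- dada(derepF, err = errF, multithread = TRUE)
ddR <- dada(derepR, err = errR, multithread = TRUE)
merged <- mergePairs(ddF, derepF, ddR, derepR)
saveRDS(merged, out.rds)  # collected once all jobs finish

## Afterwards, on a single node: combine and remove chimeras.
mergers <- lapply(merged.rds.files, readRDS)
names(mergers) <- sample.names
seqtab <- makeSequenceTable(mergers)
seqtab.nochim <- removeBimeraDenovo(seqtab, method = "consensus", multithread = TRUE)
```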
So far this works quite well, but we'd like to increase sensitivity. What I am wondering is whether we could essentially emulate what pseudo-pooling does: run a first pass like the above, generate a set of priors from the output (similar to dada2/R/dada.R line 400 in 278f5f3), and then run a second pass (again in parallel on the cluster) that includes those priors. I'm not seeing anything in the function that immediately gives me pause, but would you know if there is anything we need to consider when implementing this (`set.seed`, or any parameters that should be included in the following round)?
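Concretely, the plan is something like the following (a sketch; the prevalence >= 2 prior-selection rule is my reading of the line referenced above, so treat the exact threshold as an assumption):

```r
## After the first pass: build a per-direction sequence table from the
## denoised dada objects collected from all workers (shown for forward reads).
ddFs <- lapply(ddF.rds.files, readRDS)
names(ddFs) <- sample.names
stF <- makeSequenceTable(ddFs)

## Select priors in the spirit of pseudo-pooling: sequence variants detected
## in at least two samples (threshold assumed from the referenced dada.R code).
priorsF <- getSequences(stF)[colSums(stF > 0) >= 2]

## Second pass, again one worker job per sample, now seeded with the priors.
ddF2 <- dada(derepF, err = errF, priors = priorsF, multithread = TRUE)
## ...and likewise for the reverse reads, then mergePairs() and the steps above.
```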
Thanks!
What you are suggesting looks exactly right to me. This is what pool="pseudo" does, but the built-in implementation can't farm out samples to different nodes as you are doing.
> if there is anything we need to consider when implementing this (`set.seed` or any parameters that should be included in the following round)?
I don't immediately see any issue.
I think what you are doing is consistent with our approach.
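For comparison, the built-in single-node equivalent is just the standard multi-sample call (here `derepFs` is the list of all samples' dereplicated reads):

```r
## Built-in pseudo-pooling: one multi-sample dada() call, so all per-sample
## inference happens within a single node/process.
ddFs <- dada(derepFs, err = errF, pool = "pseudo", multithread = TRUE)
```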