This repository has been archived by the owner on Jul 20, 2021. It is now read-only.

Sequence Clustering and Mapping #113

Open
1 of 10 tasks
gregcaporaso opened this issue Apr 2, 2015 · 0 comments

  • expand reference-based clustering discussion
  • update plots at the end to be based on several different random subsets of Greengenes to plot average and variance
  • add greedier cluster function
  • port clustering code to scikit-bio
  • get useful layout of 16S graph visualizations
  • add all alignments as edges with metadata indicating whether they resulted in a cluster
  • add some discussion of real-world run time for OTU picking (several people asked about doing this iteratively, like iterative MSA, which is interesting, but run time would be a limiting factor)
  • add discussion of why approximations are required (i.e., why can't you compute distances between all pairs of sequences, build a tree, and define OTUs based on clades in the tree?). This should go at the top of the notebook so it's clear why we don't compare every sequence against every other sequence.
  • add max_accepts and max_rejects options
  • add optional kmer-based cluster pre-selection
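
For reference, here is a rough sketch of how the greedy clustering, `max_accepts`/`max_rejects`, and kmer-based pre-selection items could fit together. This is not the notebook's or scikit-bio's code. The function names, the defaults, and the position-by-position `identity` function (a stand-in for a real pairwise alignment) are all assumptions made for illustration.

```python
from collections import Counter


def kmer_counts(seq, k=3):
    """Count the overlapping kmers of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shared_kmers(a, b):
    """Number of kmers two count tables have in common (multiset intersection)."""
    return sum((a & b).values())


def identity(s1, s2):
    """Fraction of matching positions. Hypothetical stand-in for an alignment."""
    n = max(len(s1), len(s2))
    if n == 0:
        return 1.0
    return sum(c1 == c2 for c1, c2 in zip(s1, s2)) / n


def greedy_cluster(seqs, threshold=0.9, k=3, max_accepts=1, max_rejects=8):
    """De novo greedy centroid clustering of (id, sequence) pairs.

    Centroids are tried in order of decreasing shared kmer count, and the
    search stops after max_accepts hits or max_rejects misses.
    """
    centroids = []  # (centroid sequence, its kmer counts)
    clusters = []   # member ids, parallel to centroids
    for seq_id, seq in seqs:
        query_kmers = kmer_counts(seq, k)
        # kmer-based pre-selection: most promising centroids first
        order = sorted(
            range(len(centroids)),
            key=lambda i: shared_kmers(query_kmers, centroids[i][1]),
            reverse=True,
        )
        accepts, rejects = [], 0
        for i in order:
            sim = identity(seq, centroids[i][0])
            if sim >= threshold:
                accepts.append((sim, i))
                if len(accepts) >= max_accepts:
                    break
            else:
                rejects += 1
                if rejects >= max_rejects:
                    break
        if accepts:
            # join the most similar centroid among the accepted hits
            best = max(accepts, key=lambda hit: hit[0])[1]
            clusters[best].append(seq_id)
        else:
            # no hit within the search budget: the sequence seeds a new cluster
            centroids.append((seq, query_kmers))
            clusters.append([seq_id])
    return clusters


example = [
    ("s1", "ACGTACGTAC"),
    ("s2", "ACGTACGTAA"),
    ("s3", "TTTTGGGGCC"),
    ("s4", "TTTTGGGGCA"),
]
print(greedy_cluster(example))  # [['s1', 's2'], ['s3', 's4']]
```

Each query is compared against at most `max_accepts + max_rejects - 1` centroids, so the number of comparisons grows roughly linearly with the number of sequences rather than quadratically, as it would for all-pairs distances. The trade-off is that a sequence can miss a matching centroid that the kmer ordering ranked too low.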
gregcaporaso added a commit that referenced this issue Apr 3, 2015
gregcaporaso changed the title from "Clustering" to "Sequence Clustering and Mapping" Apr 6, 2015
gregcaporaso added this to the 0.2.0 milestone Apr 6, 2015