support filtering of context sequences without metadata (#47)
this is now documented in the tutorial and included in the Snakefile

fixes #45
gregcaporaso authored Jul 1, 2020
1 parent 5218631 commit 9efae5e
Showing 2 changed files with 29 additions and 5 deletions.
20 changes: 17 additions & 3 deletions docs/tutorial.md
@@ -107,6 +107,19 @@ qiime tools import \
--type FeatureData[Sequence]
```

If you're obtaining context sequences from a public repository, you may
encounter context sequences that don't have associated metadata records. Steps
in this workflow that require context sequence metadata would fail as a
result. At this stage we therefore filter any context sequences that don't
have metadata records.

```
qiime feature-table filter-seqs \
--i-data context-seqs.qza \
--m-metadata-file context-metadata.tsv \
--o-filtered-data context-seqs-w-metadata.qza
```
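
As a rough illustration of what this filtering step does conceptually, the sketch below keeps only those sequences whose IDs appear in the first column of a metadata table. This is not how `qiime feature-table filter-seqs` is implemented; the helper names (`read_metadata_ids`, `filter_seqs_by_metadata`), the column names, and the toy data are all assumptions made for the example.

```python
import csv
import io


def read_metadata_ids(tsv_text):
    """Return the set of IDs found in the first column of a TSV table.

    Hypothetical helper: assumes the first row is a header and that
    rows starting with '#' are comments or directives to be skipped.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader, None)  # skip the header row
    return {row[0] for row in reader if row and not row[0].startswith("#")}


def filter_seqs_by_metadata(seqs, metadata_ids):
    """Keep only the sequences whose ID has a metadata record."""
    return {seq_id: seq for seq_id, seq in seqs.items() if seq_id in metadata_ids}


# Toy data (assumed for illustration): seq2 has no metadata record.
metadata_tsv = "id\tdate\nseq1\t2020-03-01\nseq3\t2020-03-15\n"
seqs = {"seq1": "ACGT", "seq2": "ACGN", "seq3": "TTGA"}

kept = filter_seqs_by_metadata(seqs, read_metadata_ids(metadata_tsv))
print(sorted(kept))
```

Here `seq2` is dropped because no row in the metadata table carries its ID, which is the situation that would otherwise cause later metadata-dependent steps to fail.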

Next, we'll apply a quality filter to the sequence data. Technically this is
an optional step for both the context and focal sequences, but in practice if
you obtain your context sequences from a public repository you should
@@ -119,7 +132,7 @@ ambiguous characters.

```
qiime genome-sampler filter-seqs \
-  --i-sequences context-seqs.qza \
+  --i-sequences context-seqs-w-metadata.qza \
--p-max-proportion-ambiguous 0.01 \
--o-filtered-sequences filtered-context-seqs.qza
@@ -249,7 +262,8 @@ phylogenetic reconstruction in the
[q2-alignment](https://docs.qiime2.org/2020.2/plugins/available/alignment/)
and
[q2-phylogeny](https://docs.qiime2.org/2020.2/plugins/available/phylogeny/)
-plugins. If you'd like, you can use these for the next steps of your
+plugins (which are not installed by default with genome-sampler). If you'd
+like, you can use these for the next steps of your
analyses. These would take the `sequences.qza` file as input, so you could
just postpone the export step that you ran above. For example, you could
align and build a tree as follows. Note however that usually you would
@@ -284,4 +298,4 @@ which is adapted from the [Contributor
Covenant](https://www.contributor-covenant.org), version 1.4.

## Citing `genome-sampler`
-If you use `genome-sampler` in published work, please cite our pre-print (link to follow when available).
+If you use `genome-sampler` in published work, please cite [our paper](https://f1000research.com/articles/9-657).
14 changes: 12 additions & 2 deletions snakemake/Snakefile
@@ -44,6 +44,7 @@ rule all:
context_seqs_qza = OUTPUT_DIR + 'context-seqs.qza',
focal_seqs_qza = OUTPUT_DIR + 'focal-seqs.qza',
context_visualization = OUTPUT_DIR + 'context-seqs.qzv',
context_seqs_w_metadata_qza = OUTPUT_DIR + 'context-seqs-w-metadata.qza',
filtered_context = OUTPUT_DIR + 'filtered-context-seqs.qza',
focal_seqs = OUTPUT_DIR + ('filtered-' if FILTER_FOCAL_SEQS else '') + 'focal-seqs.qza',
date_selection = OUTPUT_DIR + 'date-selection.qza',
@@ -79,13 +80,22 @@ rule view_context:
    shell:
        "qiime feature-table tabulate-seqs --i-data {input.context_seqs_qza} --o-visualization {output.context_visualization}"

rule filter_context_missing_metadata:
    input:
        context_seqs_qza = OUTPUT_DIR + 'context-seqs.qza',
        context_metadata = CONTEXT_METADATA_FP
    output:
        context_seqs_w_metadata_qza = OUTPUT_DIR + 'context-seqs-w-metadata.qza'
    shell:
        "qiime feature-table filter-seqs --i-data {input.context_seqs_qza} --m-metadata-file {input.context_metadata} --o-filtered-data {output.context_seqs_w_metadata_qza}"

 rule filter_seqs:
     input:
-        context_seqs_qza = OUTPUT_DIR + 'context-seqs.qza'
+        context_seqs_w_metadata_qza = OUTPUT_DIR + 'context-seqs-w-metadata.qza'
     output:
         filtered_context = OUTPUT_DIR + 'filtered-context-seqs.qza'
     shell:
-        "qiime genome-sampler filter-seqs --i-sequences {input.context_seqs_qza} --p-max-proportion-ambiguous {MAX_AMBIGUOUS} --o-filtered-sequences {output.filtered_context}"
+        "qiime genome-sampler filter-seqs --i-sequences {input.context_seqs_w_metadata_qza} --p-max-proportion-ambiguous {MAX_AMBIGUOUS} --o-filtered-sequences {output.filtered_context}"

if FILTER_FOCAL_SEQS:
rule filter_focal_seqs: