-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes issue with fastq.gz files having _I1_ etc. #109
base: master
Are you sure you want to change the base?
Conversation
Fixes issue with fastq.gz files having _I1_, _R1_, etc. in parts of the file-name other than the expected location.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A question. Also, will this be propagated to the other libraries for SPP or do we need PRs updating those too?
i2_only = [x for x in files if '_I2_' in x] | ||
# use capturing to handle both raw files as well as trimmed and | ||
# filtered files. We don't need to process the captured string. | ||
i1_files = compile(r"^.*_L\d{3}_I1_\d{3}\.(trimmed\.|filtered" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the capture group, it doesn't seem like it's used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A side of effect of using the capture group is that I can use one regex to handle '.trimmed.fastq.gz', '.filtered.fastq.gz', and '.fastq.gz' files. Brackets ('[',']') and ORs ('|') didn't work for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not good practice to rely on side effects... Suggest adding ?:
, see
>>> import re
>>> re.compile(r'(?:foo|bar)?baz').match('baz')
<re.Match object; span=(0, 3), match='baz'>
>>> re.compile(r'(?:foo|bar)?baz').match('foobaz')
<re.Match object; span=(0, 6), match='foobaz'>
>>> re.compile(r'(?:foo|bar)?baz').match('barbaz')
<re.Match object; span=(0, 6), match='barbaz'>
>>> re.compile(r'(?:foo|bar)?baz').match('fbaz')
>>>
r1_only = [] | ||
r2_only = [] | ||
|
||
for some_path in files: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a regression test to verify that the bug exists and these changes fix it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I'll add a test with the old filenames to show how they fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. It's important that bugs are coupled with regression tests to ensure they are fixed and don't arise again
Co-authored-by: Daniel McDonald <d3mcdonald@eng.ucsd.edu>
Co-authored-by: Daniel McDonald <d3mcdonald@eng.ucsd.edu>
Pull Request Test Coverage Report for Build 6698670902
💛 - Coveralls |
Fixes issue with fastq.gz files having I1, R1, etc. in parts of the file-name other than the expected location.
This fix is currently working in qiita-rc.