Skip to content
This repository has been archived by the owner on Jul 29, 2020. It is now read-only.

Pulldown: select surveys to be exported + merged table #163

Merged
merged 28 commits into from
Mar 31, 2017

Conversation

sjanssen2
Copy link
Collaborator

@sjanssen2 sjanssen2 commented Nov 30, 2016

  1. Update to the metadata pulldown handler, such that the user now sees which different internal surveys are available plus he/she can select which of those should be exported to the download.

  2. There is now an option to join all columns from all surveys into one huge file.

  3. Filenames in resulting zip archive are more speaking now.

  4. There seems to be a change in the flake8 on Travis. Thus, I had to add some blank lines to certain files.

(-4, 'Surfers', True),
(-5, 'Personal_Microbiome', False)],
db.list_ag_surveys([-2, -4]))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify - this is testing the real database if these surveys in the db?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's testing that this method can obtain survey names AND that the third component is set to True for all surveys if NONE is provided and only those set to True that are in a list given to the function as third argument.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think that this is a really critical check. Now it looks like every time that a survey gets added, this test would need to be updated. What do you think about this? Do you think that it is worth while to raise an issue to make these tests - so that we don't need to update these tests every time that the surveys are updated.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that adding new surveys is a frequently occurring task. Thus, updating the test each time should be fine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok that's fine - but what do you think about having some sort of documentation that contains all of the tests that need to be updated? If there is an additional survey being added - it could be a pain in the butt to hunt and find which tests need to be updated.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a second thought, I totally agree with our argument that this is not future save. Thus, I changed the code in a way that additional surveys should not influence the outcome.

<div>Select which American Gut surveys shall be included in the pulldown archive. <br/>Add 'all in one file' if you also want to have one addition file<br/>that contains all available columns from all American Gut surveys.</div>
<select name="agsurveys" id="agsurveys" multiple />
{% for s in agsurveys %}
{% if s[2] == True %}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if s[2]

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

<select name="external" id="external" multiple />
{% for s in surveys %}
<option value='{{s}}'>{{s}}</option>
{% end %}
</select>
</p>
{% if merged == 'True' %}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a string comparison?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes it is, since 'merged' holds the content of 'value' of the checkbox which I defined as the string 'True'.

@wasade
Copy link
Member

wasade commented Dec 20, 2016

@sjanssen2 looks like there are merge conflicts

@sjanssen2 sjanssen2 closed this Mar 24, 2017
@sjanssen2 sjanssen2 deleted the addedSurveyTestData branch March 24, 2017 23:47
@sjanssen2 sjanssen2 restored the addedSurveyTestData branch March 24, 2017 23:47
@sjanssen2 sjanssen2 reopened this Mar 28, 2017
@sjanssen2 sjanssen2 force-pushed the addedSurveyTestData branch from 80a9f6d to 1bf5fc0 Compare March 29, 2017 21:22
@sjanssen2
Copy link
Collaborator Author

We have issues with Geocoder searching for locations where addresses are now random utf-8 characters. This results in unstable results for metadata pulldown. Thus, I only compare the first 1000 characters of each file in the pulldown archive - basically to continue working on #184
Let us fix Geocoder later.

@coveralls
Copy link

Coverage Status

Coverage increased (+1.4%) to 93.409% when pulling a4c6729 on sjanssen2:addedSurveyTestData into 9b0c0ca on biocore:master.

@sjanssen2
Copy link
Collaborator Author

@wasade @josenavas @antgonza can I please get a rapid review to be able to continue working on #184

Copy link
Member

@josenavas josenavas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few comments

# be the name of the external survey. In order to not block their
# pulldown I check that a skipped survey ID must be in the set of
# all available surveys.
if ((-1 * survey) in selected_ag_surveys) or \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

abs_survey = abs(survey) and use abs_survey instead of -1 * survey in order to avoid code duplication

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

pd_meta.set_index('sample_name', inplace=True)
results_as_pd.append(pd_meta)

pd_all = pd.DataFrame()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these 3 lines of code be moved inside the if below? Not sure how expensive is the merge, but if it is only needed when merged = True then just do it there.

Copy link
Collaborator Author

@sjanssen2 sjanssen2 Mar 30, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch. Fixed

Parameters
----------
selected : list of int
To keep track of which surveys have been selected by the user via
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the resulting list of tupels third element holds this information as a boolean value -> I don't understand what this means, can you rephrase?

Copy link
Collaborator Author

@sjanssen2 sjanssen2 Mar 30, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better now?

sql = """SELECT group_order, american
FROM ag.survey_group
WHERE group_order < 0"""
return [(id, name, (selected is None) or (id in selected))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

id -> id_
id is a reserved word

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


Returns
-------
list of (int, str, bool)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the last boolean needed? It is also confusing that the selected attribute is only used to mark this boolean - shouldn't it limit which ones are returned, rather than the user then going back over the list to filter the ones that he is not interested in? And by user I mean the developer that consumes this function.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is used to render all survey names on the metadata pulldown website. The user (who is browsing labadmin) has the option to tick a survey to include or exclude it for the actual pulldown.
Thus, this function is a hybrid of DB query and interface rendering, but since it is so simple I thought it would be OK.

@@ -61,6 +61,42 @@ def write_to_buffer(self):
return self.in_memory_data.getvalue()


def extract_zip(input_zip):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this function blow up the memory usage?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for sure, if you extract a huge archive. But it need to do the extraction for unit testing and I thought it is more elegant than doing system calls to 'unzip'

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok!


Parameters
----------
len : int
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

archive is missing

@@ -329,6 +328,23 @@ def test_get_ag_barcode_details(self):
self.assertEqual({k: obs[key][k] for k in exp[key]}, exp[key])
self.assertIn(obs[key]['participant_name'], participant_names)

def test_list_ag_surveys(self):
truth = [(-1, 'Personal Information', True),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this test can be simplified as

self.assertItemsEqual(db.list_ag_surveys(), truth)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope. Consider that we add surveys in the future. This would brake your test.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test would not be correct, since it will not be testing that all surveys are returned.

@@ -71,14 +80,34 @@ <h3 style="color:red">Pulldown Processing, please wait for file download. It may
<form enctype="multipart/form-data" action="/ag_pulldown/" name="agForm" id="agForm" method="post">
<p>Upload qiime mapping file, or other file with first line header and one barcode per line</p>
<p>Barcodes File <input type="file" name="barcodes" id="barcodes"/></p>
<p>American Gut Surveys:
<div>Select which American Gut surveys shall be included in the pulldown archive. <br/>Add 'all in one file' if you also want to have one addition file<br/>that contains all available columns from all American Gut surveys.</div>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addition -> additional

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -53,6 +55,13 @@ def test_get(self):
self.assertIn("<option value='%s'>%s</option>" % (survey, survey),
response.body)
self.assertNotIn('<input type="submit" disabled>', response.body)
for (id, name, selected) in db.list_ag_surveys():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in this case none are selected, correct? according to the get call that you are doing all should be false.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remember the definition of list_ag_surveys. There it is said that all get "selected" if selected is not set, i.e it is None. So in fact, all are selected in this case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, in that case what it means is that you don't need the if statement because all are selected. Given the get issued, all should be selected so the test should actually be:

for (_id, name, selected) in db.list_ag_surveys():
    self.assertIn("<option value='%i' selected>%s</option>" % (_id, name), response.body)

Note also the change from id -> _id.

@coveralls
Copy link

Coverage Status

Coverage increased (+1.4%) to 93.412% when pulling 7b1d57d on sjanssen2:addedSurveyTestData into 66eff37 on biocore:master.

@sjanssen2
Copy link
Collaborator Author

@josenavas could you review again please?

@sjanssen2
Copy link
Collaborator Author

@wasade @antgonza @josenavas I need some timely reviews please!

Copy link
Contributor

@antgonza antgonza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some questions/comments.

if self.get_argument('external'):

# query which surveys have been selected by the user
if self.get_argument('selected_ag_surveys', []):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this if really necessary? you could do:
selected_ag_surveys = map(int, self.get_argument('selected_ag_surveys', []).split(','))
right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figured that it can be even easier than that, if we use get_aguments (not the s). Then, you can provide a list of strings as parameters.

else:
selected_ag_surveys = []

if self.get_argument('external', []):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same with this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above


# check database about what surveys are available
available_agsurveys = {}
for (id, name, selected) in db.list_ag_surveys():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you replace id -> _id and perhaps selected by _ as it's not being used?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

-------
dict{str, str} where the first component is the filename and the second
the first <len> characters of the file."""
return map(lambda (k, v): {k: v[:len]}, archive.items())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just FYI, if the string is shorter it will return the full string ... this can cause problems in certain cases.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be fine in this case, but thanks for the info

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I now know :-)

Copy link
Member

@josenavas josenavas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few more comments


# check database about what surveys are available
available_agsurveys = {}
for (id, name, selected) in db.list_ag_surveys():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

id -> _id

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also catched by Antonio. Fixed.

if (abs_survey in selected_ag_surveys) or \
(abs_survey not in available_agsurveys):
meta_zip.append('survey_%s_md.txt' %
available_agsurveys[-1 * survey], meta)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-1 * survey -> abs_survey

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that would be wrong, because those keys are actually negative numbers; it's what the user provided (i.e. the database)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it then be just survey? Note that survey holds the negative number, while abs_survey holds the positive one. [-1 * survey] == abs_survey as the code stands.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I now fixed it. We cannot change the keys in available_agsurveys which are negative, but we can flip the sign for variable survey :-)

@@ -61,6 +61,42 @@ def write_to_buffer(self):
return self.in_memory_data.getvalue()


def extract_zip(input_zip):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok!

@@ -329,6 +328,23 @@ def test_get_ag_barcode_details(self):
self.assertEqual({k: obs[key][k] for k in exp[key]}, exp[key])
self.assertIn(obs[key]['participant_name'], participant_names)

def test_list_ag_surveys(self):
truth = [(-1, 'Personal Information', True),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test would not be correct, since it will not be testing that all surveys are returned.

@@ -53,6 +55,13 @@ def test_get(self):
self.assertIn("<option value='%s'>%s</option>" % (survey, survey),
response.body)
self.assertNotIn('<input type="submit" disabled>', response.body)
for (id, name, selected) in db.list_ag_surveys():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, in that case what it means is that you don't need the if statement because all are selected. Given the get issued, all should be selected so the test should actually be:

for (_id, name, selected) in db.list_ag_surveys():
    self.assertIn("<option value='%i' selected>%s</option>" % (_id, name), response.body)

Note also the change from id -> _id.

@coveralls
Copy link

Coverage Status

Coverage increased (+1.4%) to 93.394% when pulling ff7aee5 on sjanssen2:addedSurveyTestData into 66eff37 on biocore:master.

@sjanssen2
Copy link
Collaborator Author

@josenavas regarding test_list_ag_surveys:
One of @mortonjt first comments addressed that and I changed my originally static tests to account for future addition of surveys. You know ask me to revert back. I don't how to satisfy both of you. Personally, I prefer Jamies way.

@josenavas
Copy link
Member

@sjanssen2 Unit tests should test that the functions are doing what they are supposed to do. The current get call on test_list_ag_surveys is used to test that all surveys are returned. If we add another survey and there is a bug in which we don't return the new survey, the test will pass w/o detecting the issue.

However, if we change to use assertItemsEqual, if a new survey is added the test would fail, which is the expected behavior since you will also need to update the test to make sure that the function still does what it is supposed to do, correct?

@sjanssen2
Copy link
Collaborator Author

@josenavas regarding test_list_ag_surveys:
OK, reverted as you said.

@antgonza
Copy link
Contributor

Just remembered that you can retrieve a list with: self.request.arguments.get('yourvar', your_default_value)

@sjanssen2
Copy link
Collaborator Author

@antgonza unfortunately, its not working here :-(

@antgonza
Copy link
Contributor

k, thanks for testing.

@coveralls
Copy link

Coverage Status

Coverage increased (+1.4%) to 93.412% when pulling 7d49672 on sjanssen2:addedSurveyTestData into 66eff37 on biocore:master.

@sjanssen2
Copy link
Collaborator Author

ready to merge @josenavas ?

@sjanssen2
Copy link
Collaborator Author

would be great to get this done before I leave into the weekend, which is in 20 minutes

Copy link
Member

@josenavas josenavas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 comment

@@ -92,3 +135,22 @@ def get(self):
except Exception as e:
msg = 'ERROR: %s' % str(e)
self.write(msg)


def listify(list):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

list is a python function, use a different variable name (e.g. l it's ok given that the function is simple).
This function is also missing unit test.

I recommend to use some kind of text editor that has python syntax highlighting to easily identify the reserved words.

@coveralls
Copy link

Coverage Status

Coverage increased (+1.4%) to 93.412% when pulling dc55575 on sjanssen2:addedSurveyTestData into 66eff37 on biocore:master.

@josenavas josenavas merged commit 4a0e2f8 into biocore:master Mar 31, 2017
@sjanssen2
Copy link
Collaborator Author

finally. Cool! Thanks @josenavas @antgonza @mortonjt

@sjanssen2 sjanssen2 deleted the addedSurveyTestData branch April 3, 2017 19:02
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants