Fix failures to find most recently added version index object #12
```javascript
[{
  id: project,
  name: project,
  index_date_timestamp: Date.now(),
```
Could we run into problems if people from different regions ran the script?
I don't think so? It's Heroku Scheduler that runs the script at an interval, so the timezone would be whichever Heroku uses.
I think @ijlee2 is correct: the Heroku Scheduler should be the only thing executing this script. But yes, if we ran the script locally, I suppose we could run into issues.
This approach is meant as a temporary fix until we can better assess how much of the process could be refactored or improved.
FWIW I did confirm via some manual testing that this logic finds the correct index object.
Note: After checking that the updated script works, Jared and I hope to add tests and simplify some code next.
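On the timezone question: `Date.now()` returns the number of milliseconds elapsed since the Unix epoch (1970-01-01T00:00:00Z), so the stored `index_date_timestamp` value does not depend on the timezone of the machine running the script. Only formatting the value for display involves a timezone. A minimal sketch, assuming Node.js:

```javascript
// Date.now() is milliseconds since the Unix epoch (UTC); it carries no timezone.
const stamp = Date.now();

// toISOString() always renders in UTC (note the trailing "Z"),
// and parsing it back yields the same millisecond value.
const iso = new Date(stamp).toISOString();
const roundTrip = Date.parse(iso);

// Only locale-aware formatting such as toString() uses the local timezone.
const local = new Date(stamp).toString();

console.log({ stamp, iso, roundTrip, local });
```

Because the stamps are plain numbers, comparing two of them is a numeric comparison that means the same thing wherever the script runs. The remaining risk of a local run would be an inaccurate system clock, not the timezone.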
I think we should definitely merge this. It is an improvement and should fix the issue in general.
Thank you for doing this!
We started to experience significant churn in the processing of `json-api-docs` around the 25th/26th of June 2020. Every time the scheduler on Heroku ran (once every hour), these scripts re-processed certain versions of Ember/Ember Data that had already been processed. This extra processing pushed us over our operations quota for Algolia. After contacting Algolia support and getting the limits raised, we reduced the scheduler frequency from once per hour to once per day while we worked on this solution.
The above-referenced bug was caused by our reliance on the numerical ordering of Algolia's self-assigned `objectID`s. It seems that Algolia moved to a different range of `objectID` numbers, which broke our assumption that the highest `objectID` belonged to the most recently added object. As a result, these scripts kept finding an older index object (which contains a list of Ember versions as an array) that did not include certain Ember/Ember Data versions, and needlessly re-processed the related `json-api-docs` for those missing versions.

This PR seeks to break that reliance by introducing a datetime stamp field on our version index objects, which can be used to find the most recently added object in a more deterministic way.
Separately (manually through the Algolia dashboard), we have introduced this datetime stamp field and populated it on the current most recent object in the version index.
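A minimal sketch of the selection step described above, assuming the candidate version index records have already been fetched from Algolia into an array (how they are fetched is outside this sketch). `findMostRecentIndexObject`, the `versions` field, and the sample `objectID`s are hypothetical; only `index_date_timestamp` comes from this PR's diff. Records without the field, such as older objects that were not backfilled, are treated as older than any stamped record:

```javascript
/**
 * Hypothetical helper: returns the record with the largest
 * `index_date_timestamp`, or undefined for an empty array.
 * Records missing the field rank below every stamped record.
 */
function findMostRecentIndexObject(records) {
  const stampOf = (record) =>
    typeof record.index_date_timestamp === "number"
      ? record.index_date_timestamp
      : -Infinity;

  let latest;
  for (const record of records) {
    // Strict ">" keeps the first record when two stamps are equal.
    if (latest === undefined || stampOf(record) > stampOf(latest)) {
      latest = record;
    }
  }
  return latest;
}

// Illustrative data: the older record has the numerically larger objectID,
// so "pick the highest objectID" would return the stale list of versions.
const records = [
  { objectID: "900", versions: ["3.17.0"] },
  {
    objectID: "12",
    versions: ["3.17.0", "3.18.0"],
    index_date_timestamp: 1593561600000,
  },
];

console.log(findMostRecentIndexObject(records).objectID); // prints 12
```

Comparing timestamps instead of `objectID`s means the result no longer depends on how Algolia assigns IDs. Ties between identical timestamps are not meaningfully resolved here; the first such record wins.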