Eliminate duplicate staff report uploads #52

Open
2 tasks
krammy19 opened this issue Nov 8, 2021 · 0 comments
Comments

@krammy19
Collaborator

krammy19 commented Nov 8, 2021

Right now our scraper is downloading all the relevant staff reports for its search parameters and uploading those documents to our shared Google Drive.

The problem is that nothing checks whether a file already exists on the Google Drive. If the scraper is run twice for the same date range, it uploads a second copy of every staff report, because Google Drive allows multiple files with the same name in the same folder.

GoogleDrive_upload needs to be updated to check, before uploading, whether a file with a given filename already exists in the specified city folder on the Google Drive.
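A minimal sketch of such a check, assuming GoogleDrive_upload talks to the official Drive v3 API through a `googleapiclient` service object and that the current_city folder is referenced by its folder ID. The issue doesn't say which client library the project uses, so both are assumptions. The sketch asks Drive for non-trashed files with exactly that name whose parent is the folder, and returns True if any match. The `_escape_query_value` helper is hypothetical; it escapes backslashes and single quotes, which Drive's query syntax requires inside string literals.

```python
def _escape_query_value(value: str) -> str:
    """Escape backslashes and single quotes for use inside a Drive query string literal."""
    return value.replace("\\", "\\\\").replace("'", "\\'")


def filename_check(service, filename: str, folder_id: str) -> bool:
    """Return True if a non-trashed file named `filename` exists directly in the folder.

    Assumes `service` is a Drive v3 client, e.g. the object returned by
    googleapiclient.discovery.build("drive", "v3", credentials=creds), and that
    `folder_id` is the ID of the current_city folder (hypothetical parameters).
    """
    query = (
        f"name = '{_escape_query_value(filename)}' "
        f"and '{_escape_query_value(folder_id)}' in parents "
        "and trashed = false"
    )
    # Only IDs and names are requested; we just need to know whether any file matches.
    response = service.files().list(q=query, fields="files(id, name)").execute()
    return len(response.get("files", [])) > 0
```

Used this way, the check costs one Drive API request per file, which is simple but adds up when a date range covers many staff reports.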

Tasks:

  • Write a new function, filename_check, in GoogleDrive_upload that checks whether a given filename exists in the current_city folder and returns True or False.
  • Add filename_check as a condition in the Legistar_Selenium scraper so that files whose names already exist in the folder are not uploaded again.
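For the scraper side, here is a sketch assuming the Drive v3 API is used through a `googleapiclient` service object and the city folder is referenced by its ID. Instead of querying Drive once per report, it lists the folder's file names once, following `nextPageToken` across result pages, and then filters the downloaded reports against that set. The names `existing_filenames` and `select_new_reports` are hypothetical, as is the assumption that each report is saved locally under the same file name it gets on Drive.

```python
import os
from typing import Iterable, List, Set


def existing_filenames(service, folder_id: str) -> Set[str]:
    """Collect the names of all non-trashed files directly inside a Drive folder."""
    escaped = folder_id.replace("\\", "\\\\").replace("'", "\\'")
    query = f"'{escaped}' in parents and trashed = false"
    names: Set[str] = set()
    page_token = None
    while True:
        response = service.files().list(
            q=query,
            fields="nextPageToken, files(name)",
            pageToken=page_token,  # None on the first request
        ).execute()
        names.update(f["name"] for f in response.get("files", []))
        page_token = response.get("nextPageToken")
        if page_token is None:  # no more pages
            break
    return names


def select_new_reports(local_paths: Iterable[str], existing: Set[str]) -> List[str]:
    """Return the downloaded files whose base name is not already in the folder.

    `existing` is updated in place, so the same name is never selected twice
    within one scraper run.
    """
    selected = []
    for path in local_paths:
        name = os.path.basename(path)
        if name in existing:
            continue  # already on Drive, or already queued in this run: skip upload
        existing.add(name)
        selected.append(path)
    return selected
```

In Legistar_Selenium this would mean calling `existing_filenames` once per city before the upload loop and uploading only the paths `select_new_reports` returns. Adding each selected name to the set also guards against uploading the same report twice if it appears more than once in a single run's results.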
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants