added option to optionally localize retrieved timeseries to UTC #53
base: develop
This change will just optionally give you timezone-aware datetimes with the timezone set to UTC, and only on that endpoint. What MongoDB returns will always be UTC; is there any advantage in ever returning a naive datetime? Would it not be better to 1) always return timezone-aware datetimes (instantiate MongoClient with
The real trouble then comes from storing naive datetimes which are not in UTC, but that seems to be less easy to control on the database level (pymongo just stores them as UTC). https://pymongo.readthedocs.io/en/stable/examples/datetimes.html
If I understand the pymongo documentation correctly, on the database level:
Therefore, I can see three options:
1. store timezone in database timestamps
Examples:
Pro / Con:
2. set timezone on loading
Example:
Pro / Con:
3. store timezone in metadata
Example:
Pro / Con:
After writing it down like that, Option 3 would be my preferred solution.
These 3 options don't feel quite right to be honest. Conceptually:
Therefore, I would argue that:
Option 3 breaks with the pymongo convention on how timestamps are saved, and makes a simple timestamp ambiguous if you do not know where the metadata is found. You would need to scrub timezone information from incoming timestamps without converting them (otherwise pymongo would convert to UTC on insert). It also adds code to pandahub that does things already implemented in pymongo.
I do not quite understand the use case for retrieving timeseries in the timezone they originated in: if you need to work with them you probably want UTC, if you just need to display them you probably want the local time zone of the user. But here is a POC implementing that requirement:
from datetime import datetime
from zoneinfo import ZoneInfo
ph = PandaHub()
ts = datetime.now(ZoneInfo("Europe/Berlin"))
ts_id = ph.test_save_timestamp(ts)
## default timezone (UTC)
ph.test_get_timestamp(ts_id)
#> datetime.datetime(2024, 7, 31, 12, 32, 15, 127000, tzinfo=zoneinfo.ZoneInfo(key='UTC'))
## timezone from metadata (when saved)
ph.test_get_timestamp(ts_id, "ORIGIN")
#> datetime.datetime(2024, 7, 31, 14, 32, 15, 127000, tzinfo=zoneinfo.ZoneInfo(key='Europe/Berlin'))
## specific timezone
ph.test_get_timestamp(ts_id, "America/New_York")
#> datetime.datetime(2024, 7, 31, 8, 32, 15, 127000, tzinfo=zoneinfo.ZoneInfo(key='America/New_York'))
## change default timezone on PandaHub initialization
ph = PandaHub(tzinfo="Europe/Berlin")
## default timezone (Europe/Berlin)
ph.test_get_timestamp(ts_id)
#> datetime.datetime(2024, 7, 31, 14, 32, 15, 127000, tzinfo=zoneinfo.ZoneInfo(key='Europe/Berlin'))
The changes also return timezone-aware datetimes everywhere (defaulting to UTC), which should be the correct thing to do but may require some migrations for existing collections if they contain timestamps saved as naive objects meant to be in local time.
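For readers who want to try the retrieval behaviour described above without a database, here is a minimal sketch. It is not the POC code from this PR: the dictionary `_store` and the functions `save_timestamp` and `get_timestamp` are hypothetical stand-ins, and it only assumes the behaviour shown in the output above (UTC by default, "ORIGIN" for the zone recorded at save time, or any explicit zone name).

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Hypothetical in-memory stand-in for a MongoDB collection.
_store = {}

def save_timestamp(ts_id, ts):
    """Store the instant in UTC and keep the originating zone name as metadata."""
    if ts.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    # Only ZoneInfo objects carry a zone name; anything else falls back to UTC.
    origin = ts.tzinfo.key if isinstance(ts.tzinfo, ZoneInfo) else "UTC"
    _store[ts_id] = {"timestamp": ts.astimezone(timezone.utc), "origin_tz": origin}

def get_timestamp(ts_id, tz="UTC"):
    """Return the stored instant in UTC, the origin zone ("ORIGIN"), or a named zone."""
    doc = _store[ts_id]
    zone_name = doc["origin_tz"] if tz == "ORIGIN" else tz
    return doc["timestamp"].astimezone(ZoneInfo(zone_name))

ts = datetime(2024, 7, 31, 14, 32, 15, tzinfo=ZoneInfo("Europe/Berlin"))
save_timestamp("demo", ts)
as_utc = get_timestamp("demo")
as_origin = get_timestamp("demo", "ORIGIN")
as_new_york = get_timestamp("demo", "America/New_York")
```

All three results describe the same instant; only the wall-clock representation differs, which is why storing UTC plus a zone name loses no information.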
Yes, I think there were still some misunderstandings in my logic about the way pymongo handles the timestamps. I think I understand now and I agree with your points. There probably are good reasons why it is recommended to always store in UTC and we should keep it that way as well. The only thing that is still unclear to me: what is the value of tz_aware? If all timestamps are stored in the MongoDB as UTC, then the status quo is that they are returned as naive timestamps. pandas seems to implicitly assume naive timestamps are UTC timestamps by default. So getting a timeseries with naive timestamps or UTC timestamps seems to be kind of the same? That would be the only difference with tz_aware, right?
Yes, if pandas defaults to UTC for naive datetimes, it should make no difference in this case. But you need
testing PR approval
Timestamps are stored in UTC in MongoDB. Before this PR, timeseries timestamps retrieved from the database carried no timezone information, which resulted in shifted timestamps in applications because they could not tell which timezone the values were in.
This PR adds an option to localize timestamps as UTC to address this issue.
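The shift described above can be reproduced with the standard library alone. The following is a sketch under stated assumptions, not the code added by this PR: `localize_utc` is a hypothetical helper, and a fixed UTC+2 offset stands in for a user's local summer-time zone.

```python
from datetime import datetime, timedelta, timezone

def localize_utc(ts):
    """Attach UTC to a naive datetime without changing its wall-clock value."""
    if ts.tzinfo is not None:
        return ts.astimezone(timezone.utc)
    return ts.replace(tzinfo=timezone.utc)

# Naive value as it would come back from the database (meant as UTC).
naive = datetime(2024, 7, 31, 12, 32, 15)
aware = localize_utc(naive)

# Fixed UTC+2 offset, used here as a stand-in for a user's local zone.
local_zone = timezone(timedelta(hours=2), "CEST")

# Correct display: convert the UTC instant to the local zone.
shown = aware.astimezone(local_zone)

# Failure mode: an application guesses the naive value is already local time.
misread = naive.replace(tzinfo=local_zone)
shift = aware - misread
```

Here `shown` is the same instant as `aware`, while `misread` is off by `shift`, which equals the UTC offset of the assumed local zone.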