DateTimeFormats ‐ a friendly Date and Time parser
There are plenty of parsers out there. Heck, the JDK has DateTimeFormatter. Who needs yet another parser, right?
Right! It's insane to build a standalone Java DateTime parser when DateTimeFormatter works fine.
But that doesn't mean we can't improve it.
What's to improve?
Yesterday I was using BigQuery to query some dates from a table. BigQuery by default displays all times in UTC, which is annoying, so I wanted to use format_timestamp(period_timestamp, 'America/Los_Angeles') so that I wouldn't have to translate the timestamps in my head (I suck).
But nope. BigQuery didn't like me being so amateur. After some googling, I learned that it wanted me to use the %c format specifier. And that reminded me: I had tried to re-learn that maybe 13 times in the past? Just can't keep it in my head.
"But that's BigQuery. Not Java", you say.
You are absolutely right. But Java isn't much better. A few times I've found myself looking at datetime strings such as Tue, 10 Jun 2008 11:05:30 America/New_York. They were either from a csv file I was trying to load and parse, or an example string given in an API document. And each time I'd scratch my head and wonder: "What format string should I use?".
I did figure it out, with enough javadoc reading and trial and error. And honestly, when I had to do the same thing in C++, it was even worse.
But you see the problem? If human eyes can look at a datetime string and immediately understand what it means, with no ambiguity, without anyone giving me the hint "friendly spoiler: that string format is 'EEE, d MMM yyyy HH:mm:ss zzz'", why can't the Java code do the same? It's not 1990 anymore!
When I mentioned this frustration to my colleagues, I was pointed to Golang's time library. It's an interesting library and I was quite jealous of golang having an easy way to specify date formats.
But then I learned that golang does it by requiring 2006 Jan 2 15:04:05 as the reference time. And while they picked this time to help with mnemonics, I still find it hard to remember, and arbitrary even.
There is a reason golang requires a reference date: otherwise it's ambiguous when you see 11/12/2020. In different regions of the world, it can mean November 12 or December 11.
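To make the ambiguity concrete, here is a minimal Java sketch. The class and method names are made up for illustration. It parses the same string with two equally plausible patterns and gets two different dates, and neither parse fails, so nothing in the string itself tells you which one is right.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical demo class: the same text parses cleanly under two patterns.
class AmbiguousDateExample {
  // Parses `input` with `pattern` and returns the ISO form of the resulting date.
  static String parse(String pattern, String input) {
    DateTimeFormatter formatter = DateTimeFormatter.ofPattern(pattern, Locale.US);
    return LocalDate.parse(input, formatter).toString();
  }

  public static void main(String[] args) {
    String input = "11/12/2020";
    System.out.println(parse("MM/dd/yyyy", input)); // 2020-11-12 (November 12)
    System.out.println(parse("dd/MM/yyyy", input)); // 2020-12-11 (December 11)
  }
}
```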
And golang only supports a limited number of formatting options. Java's DateTimeFormatter supports a wider set of formats, with things I didn't know existed, like ISO_ORDINAL_DATE as in "2012-337" and ISO_WEEK_DATE as in "2012-W48-6".
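In case those two look as unfamiliar to you as they did to me, here is a small sketch of what they parse to, using the predefined JDK formatters. The class and method names are just for illustration.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical demo class for the two less common ISO formats mentioned above.
class IsoFormatsExample {
  // "2012-337" means day 337 of year 2012.
  static String parseOrdinal(String input) {
    return LocalDate.parse(input, DateTimeFormatter.ISO_ORDINAL_DATE).toString();
  }

  // "2012-W48-6" means day 6 (Saturday) of ISO week 48 of week-based-year 2012.
  static String parseWeekDate(String input) {
    return LocalDate.parse(input, DateTimeFormatter.ISO_WEEK_DATE).toString();
  }

  public static void main(String[] args) {
    System.out.println(parseOrdinal("2012-337"));    // 2012-12-02
    System.out.println(parseWeekDate("2012-W48-6")); // 2012-12-01
  }
}
```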
Can we do better?
That's a tough challenge. But do we need to? My goal is to solve an everyday problem where the datetime strings are clear to human eyes. If in rare cases I run into these exotic formats, I deserve to spend the time finding the right format.
So ignore the exotic formats and the formats with localization. Let's just focus on common occurrences like 2024-01-20 12:30pm -08:00, or Tue, 2021/10/30T08:12:30.000 America/Los_Angeles.
With the scope being restricted to human understandable, unambiguous date and time, shall we go ahead and build it?
Before we start, how confident are we that we won't mis-parse the string and produce an incorrect result? Dates and times have a very complex set of rules. What if we make a mistake?
That's a very legit concern and we shouldn't try to do the parsing ourselves. So this is what I plan to do:
- Infer the DateTimeFormatter that can be used to parse the datetime string.
- Be honest and strict. If there is something we can't infer, just fail.
- We don't blindly trust our inferred DateTimeFormatter. Instead, we shall use the inferred formatter to do a roundtrip parse-then-format. Only if we can successfully roundtrip to the same input string will we say that the inference succeeded.
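The roundtrip check in the last bullet can be sketched as follows. This is not the actual implementation, only a rough illustration under a few assumptions: the inference step is left out and the candidate pattern is supplied by hand, the names RoundTripCheck and roundTrips are hypothetical, and I picked VV (the pattern letter documented for zone IDs such as America/New_York) and Locale.US myself.

```java
import java.time.DateTimeException;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper: accept a candidate pattern only if parse-then-format
// reproduces the input exactly.
class RoundTripCheck {
  static boolean roundTrips(String candidatePattern, String input) {
    try {
      DateTimeFormatter formatter = DateTimeFormatter.ofPattern(candidatePattern, Locale.US);
      ZonedDateTime parsed = ZonedDateTime.parse(input, formatter);
      // Strict: any difference from the original text counts as a failure.
      return formatter.format(parsed).equals(input);
    } catch (DateTimeException | IllegalArgumentException e) {
      // Unparseable input or an invalid pattern: be honest and fail.
      return false;
    }
  }

  public static void main(String[] args) {
    // Reproduces the input exactly, so the candidate is accepted.
    System.out.println(roundTrips(
        "EEE, d MMM yyyy HH:mm:ss VV",
        "Tue, 10 Jun 2008 11:05:30 America/New_York")); // true
    // H would print the hour as 8, not 08, so the candidate is rejected.
    System.out.println(roundTrips(
        "EEE, d MMM yyyy H:mm:ss VV",
        "Tue, 10 Jun 2008 08:05:30 America/New_York")); // false
  }
}
```

The point of the second call is that a pattern can be good enough to read a string and still be the wrong description of it. Comparing the formatted output against the input catches that kind of near miss.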