I’m using the flatfile (gzipped) that contains quotes and trades data for backtesting. When I compare the quotes data for some stocks between the flatfile and the API, I notice discrepancies.
For context, I’m only focusing on data during regular trading hours. As an example, you can pick a couple of stocks, such as JOBY or RKLB, and examine the first-minute quotes for 2025-01-08.
Querying the API (list_quotes) for that window retrieves the quotes from the first minute of trading. However, sequence number 406852 does not appear anywhere in the API response.
On the other hand, if I use the flatfile, unzip it, and extract the trades for JOBY from the first minute, the sequence number 406852 does appear. This is one example.
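For anyone trying to reproduce the flatfile side of this comparison, here is a minimal sketch using only the standard library. The column names `ticker`, `sip_timestamp`, and `sequence_number` are assumptions about the flatfile's CSV header, and `flatfile_sequence_numbers` is a hypothetical helper, not part of any client library; adjust both to match the actual file.

```python
import csv
import gzip


def flatfile_sequence_numbers(path, ticker, start_ns, end_ns):
    """Collect sequence numbers for one ticker in the window [start_ns, end_ns).

    Assumes a gzipped CSV with header columns named `ticker`,
    `sip_timestamp` (nanoseconds since the Unix epoch) and
    `sequence_number`. These names are assumptions, not confirmed here.
    """
    found = set()
    # Stream the gzipped file row by row instead of unzipping it to disk first.
    with gzip.open(path, mode="rt", newline="") as handle:
        for row in csv.DictReader(handle):
            if row["ticker"] != ticker:
                continue
            if start_ns <= int(row["sip_timestamp"]) < end_ns:
                found.add(int(row["sequence_number"]))
    return found
```

Checking `406852 in flatfile_sequence_numbers(...)` for the same window as the API query would then confirm whether the record really falls inside the first minute, or only appears because the two filters use different bounds.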
I’m concerned about the data integrity of the flatfile. Could someone please investigate, or tell me if I’m missing something? I need to identify a reliable source of truth, since the data would need to arrive in real time. The flatfile is a much better way to download data for backtesting (I'd argue you should use bzip2 or some other format, though that's unrelated).
Here’s another example with FUBO (2025-01-08). The trade data appears consistent, but the quotes data shows a discrepancy of approximately 12% in the number of quotes between the API and the flatfile (during regular trading hours).
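To make a count discrepancy like this easier to pin down, a small sketch of how the two sources could be diffed by sequence number. `compare_sequence_numbers` is a hypothetical helper; it assumes sequence numbers are unique per ticker within the window being compared, and the percentage is defined relative to the larger of the two counts, which may not match how the 12% figure was computed.

```python
def compare_sequence_numbers(api_seqs, flatfile_seqs):
    """Diff two collections of quote sequence numbers.

    Assumes sequence numbers are unique per ticker within the window,
    so duplicates within one source are collapsed by the set conversion.
    """
    api_set = set(api_seqs)
    flat_set = set(flatfile_seqs)
    larger = max(len(api_set), len(flat_set))
    # Percentage gap in record counts, relative to the larger source.
    count_diff_pct = 100.0 * abs(len(api_set) - len(flat_set)) / larger if larger else 0.0
    return {
        "only_in_api": sorted(api_set - flat_set),
        "only_in_flatfile": sorted(flat_set - api_set),
        "count_diff_pct": count_diff_pct,
    }
```

Listing the records that exist on only one side, rather than just the counts, would show whether the missing quotes cluster around particular timestamps, exchanges, or conditions.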
Here’s a summary of the API query for the first-minute example above:
list_quotes API: {'timestamp_gte': 1736346599999999999, 'timestamp_lt': 1736346660000000001, 'order': 'asc'}
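As a sanity check on those bounds, here is a rough sketch that derives the nanosecond window for the first minute of regular trading. `first_minute_window_ns` is a hypothetical helper, and the fixed UTC-5 offset is an assumption that only holds while US Eastern time is on standard time (as it is on 2025-01-08); a timezone database would be needed for dates under daylight saving time.

```python
from datetime import date, datetime, time, timedelta, timezone

NS_PER_SECOND = 1_000_000_000


def first_minute_window_ns(day, utc_offset_hours=-5):
    """Return (start_ns, end_ns) for the half-open window [09:30:00, 09:31:00).

    The default offset of -5 hours assumes Eastern Standard Time;
    it is an assumption and is wrong during daylight saving time.
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    open_dt = datetime.combine(day, time(9, 30), tzinfo=tz)
    # Whole seconds are converted to int before scaling, so no float rounding
    # error leaks into the nanosecond values.
    start_ns = int(open_dt.timestamp()) * NS_PER_SECOND
    end_ns = start_ns + 60 * NS_PER_SECOND
    return start_ns, end_ns
```

For 2025-01-08 this gives a window starting at 1736346600000000000 and ending at 1736346660000000000. The query in the summary is one nanosecond wider on each side, so it also admits a quote stamped exactly at 09:31:00.000000000. If the flatfile filter uses different bounds, a record sitting on a boundary could show up in one source and not the other.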