Mismatch between flatfile quotes data and API quotes data. #828

Open
mangled-data opened this issue Jan 9, 2025 · 1 comment
Labels
bug Something isn't working

Comments

@mangled-data

I’m using the flatfile (gzipped) that contains quotes and trades data for backtesting. When I compare the quotes data for some stocks between the flatfile and the API, I notice discrepancies.

For context, I’m only focusing on data during regular trading hours. As an example, you can pick a couple of stocks, such as JOBY or RKLB, and examine the first-minute quotes for 2025-01-08.

For instance, using the API:

curl "https://api.polygon.io/v3/quotes/JOBY?timestamp.gt=1736346600060827136&order=asc&limit=2000&sort=timestamp&apiKey=YOUR-KEY" | jq '.results[].sequence_number' | grep 406852

This retrieves quotes from the first minute of trading. However, there’s no record of sequence number 406852 in the API response.

On the other hand, if I use the flatfile, unzip it, and extract the trades for JOBY from the first minute, the sequence number 406852 does appear. This is one example.
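
To go beyond a single sequence number, the two sources can be compared as sets. The following is a minimal sketch, not part of the original report: `diff_sequence_numbers` is a hypothetical helper, and it assumes both sources expose a `sequence_number` field that is unique per ticker within a day.

```python
from typing import Iterable, Mapping, Set, Tuple


def diff_sequence_numbers(
    flatfile_rows: Iterable[Mapping[str, object]],
    api_rows: Iterable[Mapping[str, object]],
    key: str = "sequence_number",  # assumed field name in both sources
) -> Tuple[Set[int], Set[int]]:
    """Return (only_in_flatfile, only_in_api) as sets of sequence numbers."""
    # Flatfile values are read from CSV as strings, API values as ints,
    # so normalise both to int before comparing.
    flat = {int(row[key]) for row in flatfile_rows}
    api = {int(row[key]) for row in api_rows}
    return flat - api, api - flat


if __name__ == "__main__":
    # Made-up records for illustration only.
    flat_rows = [{"sequence_number": "406852"}, {"sequence_number": "406853"}]
    api_rows = [{"sequence_number": 406853}]
    only_flat, only_api = diff_sequence_numbers(flat_rows, api_rows)
    print(sorted(only_flat), sorted(only_api))
```

Listing the sequence numbers that appear on only one side should make it easier to see whether the missing records share a condition, exchange, or timestamp range.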

Here’s a summary:

  • Flatfile 1-minute trading data:
    • Total trades: 1295
    • Total quotes: 1580
  • Using list_quotes API:
    • Query parameters: {'timestamp_gte': 1736346599999999999, 'timestamp_lt': 1736346660000000001, 'order': 'asc'}
    • Total trades: 1295
    • Total quotes: 1371
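
For reference, the flatfile side of such a count could be reproduced roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, the flatfile is assumed to be a gzipped CSV with `ticker` and `sip_timestamp` (nanoseconds since the Unix epoch) columns, and the window is half-open, so it differs by one nanosecond at each edge from the query parameters quoted above.

```python
import csv
import gzip
from datetime import datetime, timezone
from typing import Iterable, Tuple

NS_PER_SECOND = 1_000_000_000


def first_minute_window_ns(year: int, month: int, day: int) -> Tuple[int, int]:
    """Return [start, end) of the first regular-hours minute in nanoseconds."""
    # 09:30 Eastern is 14:30 UTC while Eastern Standard Time (UTC-5) is in
    # effect, as on 2025-01-08. This helper is not daylight-saving aware.
    open_utc = datetime(year, month, day, 14, 30, tzinfo=timezone.utc)
    start = int(open_utc.timestamp()) * NS_PER_SECOND
    return start, start + 60 * NS_PER_SECOND


def count_rows_in_window(
    lines: Iterable[str],
    ticker: str,
    start_ns: int,
    end_ns: int,
    ticker_col: str = "ticker",      # assumed column name
    ts_col: str = "sip_timestamp",   # assumed column name
) -> int:
    """Count CSV rows for one ticker with start_ns <= timestamp < end_ns."""
    reader = csv.DictReader(lines)
    return sum(
        1
        for row in reader
        if row[ticker_col] == ticker and start_ns <= int(row[ts_col]) < end_ns
    )


def count_rows_in_gzip(path: str, ticker: str, start_ns: int, end_ns: int) -> int:
    """Same count, streaming directly from a gzipped CSV file on disk."""
    with gzip.open(path, "rt", newline="") as fh:
        return count_rows_in_window(fh, ticker, start_ns, end_ns)
```

With the counts reported above, 1580 - 1371 = 209 quotes, or about 13.2% of the flatfile quotes for that minute, are absent from the API result.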

I’m concerned about the data integrity of the flatfile. Could someone please investigate, or tell me whether I’m missing something? I need to identify a reliable source of truth, since the backtest data has to match what would arrive in real time. The flatfile is a much better way to download data for backtesting. (Unrelated, but I'd argue it should use bzip2 or some other format.)

@mangled-data
Author

Here’s another example with FUBO (2025-01-08). The trade data appears consistent, but the number of quotes differs by approximately 12% between the API and the flatfile (during regular trading hours).
