badarith errors when collecting metrics #12815
Comments
I'm not sure whether that's because I'm testing a slightly different scenario or whether these errors are caused by some recent changes, but I haven't seen them in the past at all and now I see them very often.
We have seen these before, usually on a freshly booted node when some values are not yet available. The Prometheus plugin should filter such values out because if a data point isn't available yet, what else can it do?
I've seen this enough times now to say that it's certainly not about a freshly booted node, but about a queue getting deleted.
@mkuratczyk yup, that can be another case where some metrics no longer exist. During a boot, they do not yet exist, and after queue deletion, they no longer exist. I am all for making the code more defensive but if some samples are missing… what other than an error can the Prometheus scraping API endpoint return?
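To illustrate the "filter such values out" idea discussed above, here is a minimal sketch in Python rather than Erlang. It is not the plugin's actual code: the function name `sum_queue_metric` and the shape of the `samples` dictionary are hypothetical. The assumption is that a sample for a deleted (or not yet initialised) queue shows up as a non-numeric placeholder such as '' and that skipping it is acceptable. In Erlang, arithmetic on a non-number raises badarith; the closest Python equivalent is a TypeError.

```python
def is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def sum_queue_metric(samples):
    """Sum per-queue samples, skipping non-numeric placeholders such as ''.

    `samples` is a hypothetical mapping of queue name to sample value.
    Returns (total, skipped) so the caller can see how many samples
    were dropped instead of failing the whole scrape.
    """
    total = 0
    skipped = 0
    for _queue, value in samples.items():
        if is_number(value):
            total += value
        else:
            # Queue vanished (or is not ready yet): ignore this sample.
            skipped += 1
    return total, skipped


if __name__ == "__main__":
    # "q2" stands in for a queue deleted mid-collection.
    print(sum_queue_metric({"q1": 10, "q2": "", "q3": 5}))  # (15, 1)
```

Returning the skipped count alongside the total is one possible design: the scrape still succeeds, and the number of dropped samples remains visible for debugging.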
Describe the bug
I don't have the exact repro steps. I think this is a race condition: a queue is deleted while the metrics are being collected, so the collector still finds the queue but then gets empty values ('') instead of the expected numbers. I've seen this occur in 3 different places:
Observed on the main branch on November 26
Reproduction steps
Not clear
Expected behavior
No crashes
Additional context
No response