Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proxy readyz flaps when auth unreachable #50589

Open
programmerq opened this issue Dec 26, 2024 · 0 comments
Open

Proxy readyz flaps when auth unreachable #50589

programmerq opened this issue Dec 26, 2024 · 0 comments
Labels
bug c-vdc Internal Customer Reference

Comments

@programmerq
Copy link
Contributor

programmerq commented Dec 26, 2024

According to https://goteleport.com/docs/admin-guides/management/diagnostics/monitoring/#readyz-and-heartbeats the healthcheck should behave like this:

If a Teleport component fails to execute its heartbeat procedure, it will enter a degraded state. Teleport will begin recovering from this state when a heartbeat completes successfully.

The first successful heartbeat will transition Teleport into a recovering state.

A second consecutive successful heartbeat will cause Teleport to transition to the OK state.

Teleport heartbeats run approximately every 60 seconds when healthy, and failed heartbeats are retried approximately every 5 seconds. This means that depending on the timing of heartbeats, it can take 60-70 seconds after connectivity is restored for /readyz to start reporting healthy again.

Instead, if a proxy instance (tested with no other roles than proxy) loses connectivity with the Auth service, it will flap between states in a loop.

I added additional logs in a test build (based on 16.4.12) to capture the behavior in more detail.

The fetchAndAnnounce() function in srv/heartbeat.go:624 is called once every 5 seconds. Depending on the current heartbeat state, it will take a different action. If it encounters an error, then a TeleportDegraded event is emitted by the caller. If there is no error, then TeleportOK is emitted.

If the state is HeartbeatStateAnnounce, then it will attempt to do the heartbeat announce. If there is a problem, it returns an error, and a TeleportDegraded event is emitted. The issue seems to stem from the fact that it will always emit a TeleportOK event if the heartbeat status is in HeartbeatStateAnnounceWait, even if it hasn't successfully emitted an event. This doesn't match the behavior as described in the documentation.

A Table showing the expected and observed states and events fired over time
Time interval Heartbeat State Actual Event Expected Event Actual Process State Expected Process State actual readyz
12/26/2024 15:35:01 HeartbeatStateInit - - stateStarting stateStarting teleport is starting and hasn't joined the cluster yet
12/26/2024 15:35:13 0:00:12 HeartbeatStateAnnounceWait TeleportOKEvent TeleportOKEvent stateOK stateOK ok
12/26/2024 15:35:16 0:00:03 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:35:21 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:35:22 ------ ERROR CONDITION INTRODUCED ---------
12/26/2024 15:35:26 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:35:31 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:35:36 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:35:41 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:35:46 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:35:51 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:35:56 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:36:01 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:36:06 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:36:11 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:36:16 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:36:21 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:36:26 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:36:31 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:36:36 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:36:41 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:36:46 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:36:51 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:36:56 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:37:01 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:37:06 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:37:11 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:37:16 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:37:21 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:37:26 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:37:31 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:37:36 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:37:41 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:37:46 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:37:51 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:37:56 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:38:01 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:38:06 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:38:11 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:38:16 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:38:21 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:38:26 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:38:31 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:38:36 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:38:41 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:38:46 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:38:51 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:38:56 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:39:01 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:39:06 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:39:11 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:39:16 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:39:21 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:39:26 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:39:31 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:39:36 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:39:41 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:39:46 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:39:51 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:39:56 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:40:01 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:40:06 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:40:11 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:40:16 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateOK ok
12/26/2024 15:40:21 0:00:05 HeartbeatStateAnnounce TeleportDegraded TeleportDegraded stateDegraded stateDegraded teleport is in a degraded state, check logs for details
12/26/2024 15:40:26 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateRecovering stateDegraded teleport is recovering from a degraded state, check logs for details
12/26/2024 15:40:31 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateDegraded ok
12/26/2024 15:40:36 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateDegraded ok
12/26/2024 15:40:41 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateDegraded ok
12/26/2024 15:40:46 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateDegraded ok
12/26/2024 15:40:51 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateDegraded ok
12/26/2024 15:40:56 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateDegraded ok
12/26/2024 15:41:01 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateDegraded ok
12/26/2024 15:41:06 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateDegraded ok
12/26/2024 15:41:11 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateDegraded ok
12/26/2024 15:41:16 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateDegraded ok
12/26/2024 15:41:21 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateDegraded ok
12/26/2024 15:41:26 0:00:05 HeartbeatStateAnnounce TeleportDegraded TeleportDegraded stateDegraded stateDegraded teleport is in a degraded state, check logs for details
12/26/2024 15:41:31 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateRecovering stateDegraded teleport is recovering from a degraded state, check logs for details
12/26/2024 15:41:36 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateDegraded ok
12/26/2024 15:41:41 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateDegraded ok
12/26/2024 15:41:46 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateDegraded ok
12/26/2024 15:41:51 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateDegraded ok
12/26/2024 15:41:56 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateDegraded ok
12/26/2024 15:42:01 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateDegraded ok
12/26/2024 15:42:06 0:00:05 HeartbeatStateAnnounceWait TeleportOKEvent None / no change stateOK stateDegraded ok

A load balancer that is trying to use /readyz to determine whether to send traffic to this proxy will get inconsistent data. It will show as degraded or recovering for a brief time, and then go back to ok. This doesn't seem like the intended behavior, but if it is, the documentation should reflect it.

Steps to Reproduce:

  • Start a proxy-only Teleport instance.
  • Block access to the Teleport auth service. I used a kubectl port-forward, so I just stopped the process so the proxy would see "connection refused"
  • Watch the /readyz response in a loop over time. Note that it will only show "degraded" and then "recovering" in 5 second windows each, and then return to ok.
@programmerq programmerq added bug c-vdc Internal Customer Reference labels Dec 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug c-vdc Internal Customer Reference
Projects
None yet
Development

No branches or pull requests

1 participant