-
We have a set of Kubernetes clusters set up with agents connecting them to a Teleport cluster. This works and has worked for a long time on all clusters, but recently the agent in one of our new clusters keeps having connection issues and crashing. We get the following logs in the agent:
The readiness probe then constantly fails with 400 and 503 errors. This agent has exactly the same manifest/setup as all our other agents, which do work (in fact they're all deployed from an ArgoCD ApplicationSet), but this one agent is the only one that keeps breaking. The cluster runs on Azure, and we have other Azure clusters that do work. Does anyone have an idea of where to get started with fixing this issue? We've been looking into it for a while but we really cannot figure it out.
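As a starting point for diagnosing this, the pod events and the logs of the previous (crashed) container usually show why the readiness probe is failing. The namespace and pod name below are placeholders; adjust them to your deployment:

```shell
# Show recent events, including readiness probe failures (placeholder names)
kubectl -n teleport describe pod teleport-agent-0

# Logs from the last crashed container instance
kubectl -n teleport logs teleport-agent-0 --previous
```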
Replies: 4 comments 2 replies
-
Any updates on this? I'm having the same trouble.
-
Same sort of failures suddenly started appearing here.
-
This is likely an issue to do with the pod not having valid instance credentials (which are used for sending version information to the control plane) and not being able to use its existing join method to generate new ones, or some kind of internal UUID mismatch. The simplest way to fix this is to delete the pod's state secret, which holds its credentials, then restart the pod so it joins the cluster as a fresh agent. This requires that you have a valid join method configured in the chart values, so update your values and run a `helm upgrade` first if needed. The secret is in the same namespace as the agent pod and is called `<helm-release-name>-0-state`.
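Concretely, the reset described above looks like this. The release name `teleport-agent` and namespace `teleport` are placeholders; substitute your own:

```shell
# Delete the agent's state secret (named <helm-release-name>-0-state)
kubectl -n teleport delete secret teleport-agent-0-state

# Delete the pod so the StatefulSet recreates it; on startup it
# re-joins the Teleport cluster via the configured join method
kubectl -n teleport delete pod teleport-agent-0
```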
-
Can confirm, deleting the secret and the pod for our failing agent worked! The container restarted, the secret was recreated, and it joined just fine. For reference, we are configured to use IAM roles for agents joining the cluster, so this was very easy. Had we still been using the older method (creating a short-lived token on the cluster for the initial join, after which the agents refresh their credentials regularly), this would have been more of a hassle.
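For anyone wanting to switch to a delegated join method before doing the reset, a sketch of the upgrade, assuming the `teleport-kube-agent` chart's `joinParams` values and a pre-created IAM join token named `iam-token` (release name and namespace are placeholders):

```shell
# Point the agent at an IAM join token so it can re-join on its own
helm upgrade teleport-agent teleport/teleport-kube-agent \
  --namespace teleport \
  --reuse-values \
  --set joinParams.method=iam \
  --set joinParams.tokenName=iam-token
```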