Bottlerocket-update-operator not updating the labeled nodes #706

Open
NishanthNaniReddy opened this issue Jan 10, 2025 · 4 comments

@NishanthNaniReddy

Image I'm using:
bottlerocket-aws-k8s-1.30-x86_64-v1.28.0-0ab4fab4

Issue or Feature Request:

Hi Team,

I tried installing bottlerocket-update-operator (brupop) on my cluster, and I can see all the pods (agent, api-server, and controller) up and running.
I used the v1.28 image (bottlerocket-aws-k8s-1.30-x86_64-v1.28.0-0ab4fab4) when I created the cluster, hoping that brupop would update the node I labeled to v1.29, since that is the latest as of now.

But for some reason the agent treats the target version as the same as the current version, i.e. 1.28, reports "No action detected", and waits for the next scheduled time. As I understand it, the target version should be the latest, i.e. 1.29.

(I also tried a Bottlerocket image with version v1.27, and the issue is the same in that case. The agent log says:
"{ current_version: "1.27.0", target_version: "1.27.0", current_state: Idle, crash_count: 0, state_transition_failure_timestamp: None }) }")

Logs:
{ api_version: "v1", block_owner_deletion: None, controller: None, kind: "Node", name: "ip-10-168-39-166.eu-west-2.compute.internal", uid: "6c865f3c-310a-4fcc-bd63-491d5b89e957" }]), resource_version: Some("18763727"), self_link: None, uid: Some("66f464ca-8f05-4bbe-9fce-5d71873838e0") }, spec: BottlerocketShadowSpec { state: Idle, state_transition_timestamp: None, version: None }, status: Some(BottlerocketShadowStatus { current_version: "1.28.0", target_version: "1.28.0", current_state: Idle, crash_count: 0, state_transition_failure_timestamp: None }) }, state: Idle, shadow_error_info: ShadowErrorInfo { crash_count: 0, state_transition_failure_timestamp: None }
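
For anyone comparing these fields across many agent log lines, here is a minimal sketch that pulls current_version, target_version, and current_state out of a Debug-formatted BottlerocketShadowStatus fragment like the one above. It is not part of brupop: the names STATUS_RE, parse_brs_status, and update_pending are hypothetical, and the regex only assumes the field order seen in the log lines in this thread.

```python
import re

# Hypothetical helper, not part of brupop. The pattern assumes the field order
# shown in the agent log lines quoted in this issue.
STATUS_RE = re.compile(
    r'current_version: "(?P<current>[^"]+)", '
    r'target_version: "(?P<target>[^"]+)", '
    r'current_state: (?P<state>\w+)'
)


def parse_brs_status(line):
    """Return (current_version, target_version, current_state), or None if absent."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    return match.group("current"), match.group("target"), match.group("state")


def update_pending(line):
    """True when the line reports a target version that differs from the current one."""
    parsed = parse_brs_status(line)
    return parsed is not None and parsed[0] != parsed[1]
```

For the log line above, update_pending returns False, because current_version and target_version are both "1.28.0". That matches the agent taking no action.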

Could you please let me know whether my understanding is correct, or whether I am missing something?

Thanks.

@ytsssun
Contributor

ytsssun commented Jan 10, 2025

Hi @NishanthNaniReddy , thanks for opening this issue. May I know what your brupop setup is like?

  1. How many bottlerocket nodes are there in the cluster that has brupop installed?
  2. What is the output of apiclient update check when you log into the bottlerocket node?

I was not able to reproduce this with the bottlerocket-aws-k8s-1.29-x86_64-v1.28.0-0ab4fab4 AMI. My setup is on us-west-2 and I am using 1.29 k8s version. I will try the exact setup you have and report back.

My brupop agent was able to detect version 1.29.0 and update to it:

  2025-01-10T23:35:26.434844Z  INFO agent::agentclient: Brs status has been updated., brs_name: "ip-192-168-74-140.us-west-2.compute.internal", brs_status: BottlerocketShadowStatus { current_version: "1.28.0", target_version: "1.29.0", current_state: StagedAndPerformedUpdate, crash_count: 0, state_transition_failure_timestamp: None }
kubectl get brs --namespace brupop-bottlerocket-aws       
NAME                                               STATE                      VERSION   TARGET STATE         TARGET VERSION   CRASH COUNT
brs-ip-192-168-71-160.us-west-2.compute.internal   Idle                       1.28.0    Idle                 <no value>       0
brs-ip-192-168-74-140.us-west-2.compute.internal   StagedAndPerformedUpdate   1.28.0    RebootedIntoUpdate   1.29.0           0
brs-ip-192-168-88-44.us-west-2.compute.internal    Idle                       1.29.0    Idle                 1.29.0           0

@ytsssun
Contributor

ytsssun commented Jan 11, 2025

Update to the above comment: I later used the same bottlerocket-aws-k8s-1.30-x86_64-v1.28.0-0ab4fab4 AMI in eu-west-2 and tested the update. It also worked for me.

  2025-01-11T00:04:36.322804Z  INFO agent::agentclient: Brs status has been updated., brs_name: "ip-192-168-173-228.eu-west-2.compute.internal", brs_status: BottlerocketShadowStatus { current_version: "1.28.0", target_version: "1.29.0", current_state: Idle, crash_count: 0, state_transition_failure_timestamp: None }

It would also be helpful to share your scheduler_cron_expression.

@NishanthNaniReddy
Author

Hi @ytsssun ,
Thanks for your response !

My cluster is in eu-west-2 and runs Kubernetes 1.30. It has 2 nodes, and I have added the label to only one of them.

Below are screenshots of the logs from the agent, api-server, and controller pods.

[Screenshots: 2025-01-09 12:32:39, 2025-01-09 12:33:03, 2025-01-09 12:34:46, 2025-01-10 15:18:43]

@NishanthNaniReddy
Author

NishanthNaniReddy commented Jan 13, 2025

Ah, it looks like it's failing to fetch the updates.
There is an allow-all NetworkPolicy in the brupop-bottlerocket-aws namespace, and "https://updates.bottlerocket.aws" is in the allowed egress list. I'm not sure what is blocking the fetch.

apiclient update check

12:02:34 [INFO] Refreshing updates...
Failed to check for updates: refresh attempt failed with status 'Failed' (-1): Metadata error: Failed to fetch https://updates.bottlerocket.aws/2020-07-07/aws-k8s-1.30/x86_64/7.root.json: Transport 'other' error fetching 'https://updates.bottlerocket.aws/2020-07-07/aws-k8s-1.30/x86_64/7.root.json?seed=1322&version=1.28.0': error sending request for url (https://updates.bottlerocket.aws/2020-07-07/aws-k8s-1.30/x86_64/7.root.json?seed=1322&version=1.28.0)

It is able to establish the TCP connection, but the TLS handshake is then reset:

curl -kv 'https://updates.bottlerocket.aws/2020-07-07/aws-k8s-1.30/x86_64/7.root.json?seed=1322&version=1.28.0'

*   Trying 18.165.201.101:443...
* Connected to updates.bottlerocket.aws (18.165.201.101) port 443
* ALPN: curl offers h2,http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@strength
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* Recv failure: Connection reset by peer
* OpenSSL SSL_connect: Connection reset by peer in connection to updates.bottlerocket.aws:443
* Closing connection
curl: (35) Recv failure: Connection reset by peer
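
To tell "TCP is blocked" apart from "TCP connects but the TLS handshake is dropped" (the pattern in the curl output above), here is a minimal sketch using only the Python standard library. The function name probe_tls and its return labels are hypothetical, chosen for illustration. It assumes Python is available on a host or pod that shares the node's network path, and it does not identify what is resetting the connection.

```python
import socket
import ssl


def probe_tls(host, port=443, timeout=5.0):
    """Classify a connection attempt as 'tcp_failed', 'tls_failed', or 'ok'."""
    try:
        # Step 1: plain TCP connect, equivalent to curl's "Connected to ..." line.
        raw = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "tcp_failed"
    try:
        ctx = ssl.create_default_context()
        # Step 2: TLS handshake. server_hostname sends SNI in the ClientHello.
        with ctx.wrap_socket(raw, server_hostname=host):
            return "ok"
    except OSError:
        # ssl.SSLError, ConnectionResetError, and timeouts are all OSError subclasses.
        return "tls_failed"
    finally:
        raw.close()
```

A result of "tls_failed" for updates.bottlerocket.aws would be consistent with the curl trace above, where the reset arrives right after the Client hello.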

NishanthNaniReddy changed the title from 'Bottlerocket-update-operator agent pod considering the "target version" incorrectly' to 'Bottlerocket-update-operator not updating the labeled nodes' on Jan 13, 2025