Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update standard K8s version policy #334

Merged
merged 1 commit into from
Nov 20, 2023
Merged

Update standard K8s version policy #334

merged 1 commit into from
Nov 20, 2023

Conversation

cah-hbaum
Copy link
Contributor

The standard for the K8s version policy is updated in order to add the support period for Kubernetes versions into it. Since this is a bit contradictive with the K8s version recency, the whole standard needed to be updated.

@cah-hbaum cah-hbaum added Container Issues or pull requests relevant for Team 2: Container Infra and Tooling SCS is standardized SCS is standardized SCS-VP10 Related to tender lot SCS-VP10 Sprint Izmir Sprint Izmir (2023, cwk 32+33) labels Aug 21, 2023
@cah-hbaum cah-hbaum self-assigned this Aug 21, 2023
@cah-hbaum cah-hbaum force-pushed the issue/386 branch 2 times, most recently from d4af8fc to 31324f5 Compare August 21, 2023 07:53
@cah-hbaum cah-hbaum linked an issue Aug 21, 2023 that may be closed by this pull request
4 tasks
Copy link
Member

@garloff garloff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two points on language:

  • Standards should not make extensive use of "probably" or "we believe". We need to clarify these things until we are convinced. And then just make statements what is expected.
  • relative clause "The reason that XYZ can be done is ..." do not have commas in English

Two on the content:

  • This has been toned down into mere recommendations while the original intention was to have stronger language MUST. Did we agree as a team to really make it so soft? How much value remains then if customers can not rely on anything even if the container SCS-compatible cert has been granted?
  • There seems to be confusion about support period for minor versions that puzzles me. Let's clarify this!

Standards/scs-0210-v2-k8s-version-policy.md Outdated Show resolved Hide resolved
Standards/scs-0210-v2-k8s-version-policy.md Outdated Show resolved Hide resolved
Standards/scs-0210-v2-k8s-version-policy.md Outdated Show resolved Hide resolved
Standards/scs-0210-v2-k8s-version-policy.md Outdated Show resolved Hide resolved
Standards/scs-0210-v2-k8s-version-policy.md Outdated Show resolved Hide resolved
Standards/scs-0210-v2-k8s-version-policy.md Outdated Show resolved Hide resolved
Standards/scs-0210-v2-k8s-version-policy.md Outdated Show resolved Hide resolved
Standards/scs-0210-v2-k8s-version-policy.md Outdated Show resolved Hide resolved
Standards/scs-0210-v2-k8s-version-policy.md Outdated Show resolved Hide resolved
@cah-hbaum
Copy link
Contributor Author

cah-hbaum commented Aug 24, 2023

Answering to #334 (review) and #334 (comment) since they ask the same things:

So I had a misunderstanding on how this K8s providing should work, since I thought that would be the provided version for things like managed K8s clusters.
If the K8s version means the version made available to the users of a cloud by the provider, this would imply two problems to me:

  1. The previously made test for k8s version recency is wrong, since we just test a version of a cluster. This wouldn't matter in the case described above, since a cluster wouldn't need to be updated right away (only the version the provider needs to offer) and the failure on the test if a version is delayed (according to the standard) would be false in this case as well.
  2. If a new version is published by Kubernetes, I see no reason to not offer it right away. I think the versions are sufficiently tested and having a delay between the official release date and a "provided" release date just feels not right.

I think what I would want to know here to clear that up:
What is the use-case?
How are the Kubernetes versions provided?
And can this even be tested (in the next step), since we can't check if a version is "provided", only the version of a cluster (?)

I would continue here if these things are cleared up, since I think they're essential to continuing.

@garloff
Copy link
Member

garloff commented Sep 8, 2023

So I had a misunderstanding on how this K8s providing should work, since I thought that would be the provided version for things like managed K8s clusters.

I assume that even in managed k8s offerings, the k8s version updates are normally done only with the user's consent.
At least for minor updates (like going from v1.24.xx to v1.25.xx), the upgrade tends to require the user (who manages the workloads) to review the changelogs with the deprecations ... to ensure the workload is still working after such an upgrade.

So, looking at a cluster and determine its level of outdatedness may be useful, but does not provide important insight into standards compliance. (Well, if it is recent, we may be able to conclude that the operator is SCS-compatible, but the outdatedness does not mean that he's not.)

Instead, we need to ensure that the latest patchlevel (with a delay of less than a week) is available.
Same for latest minor versions (with a max delay of a generous 4 months).

I agree that testing this is difficult.

Now, this is all specific to our reference implementation, unfortunately.
Tests can be developed to check openstack-kube-versions.inc and the image availability.
In a generic approach, we have not defined a way to discover available k8s versions, so we can not yet cover other providers. Maybe we'll have a mechanism for all clusterstack users when we move to next-gen Ref.Impl.

@cah-hbaum
Copy link
Contributor Author

Now, this is all specific to our reference implementation, unfortunately. Tests can be developed to check openstack-kube-versions.inc and the image availability. In a generic approach, we have not defined a way to discover available k8s versions, so we can not yet cover other providers. Maybe we'll have a mechanism for all clusterstack users when we move to next-gen Ref.Impl.

Ok I think I understand the intention behind all of this, but doesn't this still make the first test for k8s cluster recency incorrect, since the test is only applied to a cluster and we shouldn't look for a version recency in a cluster, since this is seemingly up to the user?

@cah-hbaum
Copy link
Contributor Author

Made an update to the text.
I still think it will be very hard to test this (if its even possible), but we will see. This is only the standard for now.

@cah-hbaum cah-hbaum requested a review from matofeder September 28, 2023 07:21
@anjastrunk anjastrunk requested review from anjastrunk and removed request for anjastrunk September 28, 2023 11:29
More information can be found under [Kubernetes Support Period].

The Kubernetes releases are planned for about 4 months, which usually results in about
**3 minor** releases per year [Kubernetes Release Cycle].
Copy link
Contributor

@anjastrunk anjastrunk Sep 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
**3 minor** releases per year [Kubernetes Release Cycle].
**3 minor** releases per year [Kubernetes Release Cycle](https://kubernetes.io/releases/release/).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just recognized, that links are listed at the end of the document. Is this a common practice in SCS standards?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. Linking docs that explain a statement or term is best be done in place. A link I would click if I need an explanation or definition or want to see the source of truth for a statement.

A references section at the end is more for further reading or background reading (in case I'm new to the topic).
Also link to related content, e.g. related standards.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we really put links in different locations depending on their relation to the standard? I would rather try to put everything in the final section and check if the link nevertheless works in the document itself.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In favor of readability, I would add links in place. And put, as @garloff mentioned, additional documents, not related to a special term, at the bottom of the document.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put the links in the text.

@garloff
Copy link
Member

garloff commented Sep 29, 2023

If the K8s version means the version made available to the users of a cloud by the provider, this would imply two problems to me:

1. The previously made test for `k8s version recency` is wrong, since we just test a version of a cluster. This wouldn't matter in the case described above, since a cluster wouldn't need to be updated right away (only the version the provider needs to offer) and the failure on the test if a version is delayed (according to the standard) would be false in this case as well.

Right. The mere existence of a cluster with outdated k8s version in it does not indicate a failure to comply with the SCS-compatibility. (For a managed k8s scenario, I would expect a provider to push customers to agree to upgrading at some point though, but we have no recommended let alone mandatory policies for this at this point.)

A test would look like this:

  • We try to create a cluster with a version that should be available according to the policy.
    • If it succeeds (i.e. a working cluster is created), we are good.
    • It it fails, we are likely not.

@garloff
Copy link
Member

garloff commented Sep 29, 2023

2. If a new version is published by Kubernetes, I see no reason to not offer it right away. I think the versions are sufficiently tested and having a delay between the official release date and a "provided" release date just feels not right.

Well ... we currently don't offer k8s-v1.28.x officially yet, as capo does not officially support k8s-v1.28 yet.
(Yes, we provide it, it's tested and works, but we have protected it with an --allow-preview-versions flag to makr it as TechPreview.)
The up to four months is a compromise that came out of the discussion with providers. While it seems long to me, this is what everyone was willing to commit to.
For patch levels, one week seems reasonable to me -- you need to build node images and run the E2E tests at least.

@cah-hbaum
Copy link
Contributor Author

@garloff thanks for your comments, they make this issue and standard more clear to me.
I will make a small update today in order to fix the issues coming from the comments and then make a slightly bigger update next week in order to make things more clear with regard to the overall story of this standard.

@cah-hbaum
Copy link
Contributor Author

Update the text a bit more and rebased the whole branch in order to include the latest changes from main.

@cah-hbaum cah-hbaum requested a review from anjastrunk October 5, 2023 08:05
@cah-hbaum cah-hbaum force-pushed the issue/386 branch 2 times, most recently from 8d3ccba to f3c45de Compare October 18, 2023 07:25
Copy link
Contributor

@anjastrunk anjastrunk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@cah-hbaum cah-hbaum force-pushed the issue/386 branch 2 times, most recently from 247fa57 to 5ffca2c Compare November 6, 2023 07:38
@cah-hbaum cah-hbaum added the standards Issues / ADR / pull requests relevant for standardization & certification label Nov 7, 2023
@cah-hbaum cah-hbaum force-pushed the issue/386 branch 2 times, most recently from 2302730 to bd7c6df Compare November 9, 2023 08:49
Copy link
Member

@garloff garloff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry if I have missed a prior answer or discussion.
But I believe that we've had discussed mandatory requirements for version recency before. These are useful, as it allows users to know which versions she can rely on as available on all KaaS platforms that are certified as SCS-compatible on the container layer. As user, it would make my live easier, as I can then decide for validating my workload on a k8s versions that I know must be available on all SCS-compatible KaaS solutions. A should does not provide this guarantee.
So were there reasons for watering this down where I missed a discussion?
Was there a discussion that resulted in g=convincing reasons that we are really

In order to keep up-to date with the latest Kubernetes features, bug fixes and security improvements,
the provided Kubernetes versions should be kept up to date with the upstream.

- The latest minor version SHOULD be provided no later than 4 months after release.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this not a MUST?
4 months is plenty of time to get the latest minor release built, validated, fixed (if needed, which is normally not the case) and published.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To answer all of the MUST questions, I had a grave misunderstanding with the way how MUST and SHOULD are used. In my mind, they were equal, but after reading the RFC document regarding MUST / SHOULD / ... I can see my mistake.
And you're right, all of the ones you pointed out should be MUST.

the provided Kubernetes versions should be kept up to date with the upstream.

- The latest minor version SHOULD be provided no later than 4 months after release.
- The latest patch version SHOULD be provided no later than 1 week after release.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we can make this a MUST as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

- The latest patch version SHOULD be provided no later than 1 week after release.
- This time period MUST be even shorter for patches that target critical CVEs (CVSS >= 8).
It is RECOMMENDED to provide a new patch version in a 2 day time period after their release.
- New versions SHOULD be tested before being rolled out on productive infrastructure;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MUST?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

- This time period MUST be even shorter for patches that target critical CVEs (CVSS >= 8).
It is RECOMMENDED to provide a new patch version in a 2 day time period after their release.
- New versions SHOULD be tested before being rolled out on productive infrastructure;
at least the CNCF E2E tests should be passed beforehand.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could be a MUST as well (and if we want to be generous, we could allow failed e2e tests to be listed in a release note rather than requiring 100% pass rate).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #334 (comment)

Also I think you're right about the generosity of passed tests, but I'm not really sure how to properly display this in Release notes.

- New versions SHOULD be tested before being rolled out on productive infrastructure;
at least the CNCF E2E tests should be passed beforehand.

At the same time, provider should support Kubernetes versions at least as long as the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I would make this a MUST.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the same time, provider should support Kubernetes versions at least as long as the
official sources as mentioned in the [Kubernetes Support Period](https://kubernetes.io/releases/patch-releases/#support-period).

- Kubernetes versions SHOULD be supported as long as the official sources support them.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, looks duplicate to the sentence before.
And should be MUST from my PoV.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first sentence was just a short introduction for the decision underneath it, but I can change that if it should be written another way.

The standard for the K8s version policy is updated in order to add the support period for Kubernetes versions into it. Since this is a bit contradictive with the K8s version recency, this part needed to be updated as well.

Signed-off-by: Hannes Baum <hannes.baum@cloudandheat.com>
@cah-hbaum
Copy link
Contributor Author

Since this issue is now open for 3 months and it got one Approval by @anjastrunk and I fixed all issues opened by @garloff, I'm gonna merge this now.

@cah-hbaum cah-hbaum merged commit e98f54e into main Nov 20, 2023
4 checks passed
@cah-hbaum cah-hbaum deleted the issue/386 branch November 20, 2023 12:36
@garloff
Copy link
Member

garloff commented Nov 20, 2023

Thanks, @cah-hbaum!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Container Issues or pull requests relevant for Team 2: Container Infra and Tooling SCS is standardized SCS is standardized SCS-VP10 Related to tender lot SCS-VP10 Sprint Izmir Sprint Izmir (2023, cwk 32+33) standards Issues / ADR / pull requests relevant for standardization & certification
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

K8s version support period
3 participants