
[Feature Request] Implement tests for scs-0210-v2, scs-0214-v1 #481

Closed
mbuechse opened this issue Feb 23, 2024 · 12 comments
Labels: action required (Issues or pull requests that urgently require action), Container (Issues or pull requests relevant for Team 2: Container Infra and Tooling), enhancement (New feature or request), SCS is standardized, SCS-VP10 (Related to tender lot SCS-VP10)

Comments

@mbuechse
Contributor

This is a high-priority crash operation: find a way to implement reasonable tests by 2024-02-28.

  • scs-0210-v1 does have tests; what needs to be done for v2?
  • scs-0214-v1 might be hard to test; what's the "MVP" that can be achieved within this timeframe?

Request assistance from whatever parties necessary!

@cah-hbaum
Contributor

We integrated Sonobuoy yesterday, but for scs-0210-v2 we should probably still use and update the old Python test from v1. The problem is that updating the test isn't really useful, since v2 adds a section about the support period for a K8s version. Test-wise, this contradicts what is tested right now: v1 checks a given cluster and whether its version is up to date with the release timeline, whereas v2 demands that old versions are still supported and available from the provider, which isn't really something you can test (or at least I don't know how).

scs-0214-v1 is, as @mbuechse said, a bit more tricky. If it is doable, a test could be written in Go with Sonobuoy in mind. If there isn't time for that, a Python test would be fine and we can convert it later.
The main thing to test here is the requirement that the control plane be distributed over multiple nodes. The standard even calls for physical machines, but that could be quite hard to verify.
The best approach would be to check the IPs of the control plane (and maybe worker) nodes and look for differences there. This would at least guarantee that multiple VMs are used. Checking whether they are separated over different machines or even failure zones is harder and probably depends on the provider's infrastructure. If there were access to OpenStack as well (or another IaaS solution), we could check that with something like failure zones, but this is maybe too much for now (?). It would depend on the time available overall.
Whether these failure zones, or zones in general, are strictly separated over e.g. multiple racks or buildings is something we can't really check, since only the provider would know and the metadata necessary for that is probably not publicly available or can't be accessed for one reason or another.
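
A minimal sketch of the IP-based check described above, assuming the official Python kubernetes client and that control plane nodes carry one of the well-known role labels (both assumptions, not something the standard guarantees), could look like this:

```python
# Hypothetical sketch: verify that the control plane nodes report distinct
# internal IPs, i.e. that they at least run on separate VMs.
from kubernetes import client, config

config.load_kube_config()  # assumes a kubeconfig in the default location

control_plane_ips = set()
for node in client.CoreV1Api().list_node().items:
    labels = node.metadata.labels or {}
    # assumption: control plane nodes carry one of these well-known role labels
    if not ("node-role.kubernetes.io/control-plane" in labels
            or "node-role.kubernetes.io/master" in labels):
        continue
    for addr in node.status.addresses or []:
        if addr.type == "InternalIP":
            control_plane_ips.add(addr.address)

print(f"distinct control plane internal IPs: {len(control_plane_ips)}")
```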

@martinmo
Member

martinmo commented Feb 23, 2024

For conformance testing of scs-0210-v2 ("K8S Version Policy") I have the following proposal.

The check consists of two stages:

  1. Getting the list of currently supported K8s versions for a given KaaS provider (the new and maybe tricky part). It can optionally be saved into a YAML file.
  2. Deciding if this list, which can optionally be read from a YAML file, conforms to scs-0210-v2. For this, a lot of the existing v1 test implementation can be reused, because the basic predicate "Is k8s version X.Y.Z recent enough?" of course still needs to be decided.

Stage 1 has multiple strategies:

  • Strategy A: only works for providers that use Gardener to manage the customer's K8s clusters, which are called "shoots" in that context. According to the shoot admin docs, CloudProfiles should give us exactly the information that we need. (However, I currently do not have access to a Gardener setup to verify that idea. Any hints/help would be appreciated!)
  • Strategy B: works for any provider and requires them to give us access to three K8s clusters of different "profiles" plus the corresponding kubeconfigs with the suggested names stable, oldstable, oldoldstable (greetings to Debian). We then gather the K8s versions by connecting to these clusters (see the sketch below).
  • Strategy C (fallback): means manual collection and creation of said YAML file.

A downside of strategy B is the cost of providing these otherwise unused clusters and the manual labour required whenever the provider switches the stable profile, e.g., from 1.27 to 1.28.

The stages can be implemented independently of each other. Furthermore, the design allows us to add other strategies later on, if needed.
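
A minimal sketch of stage 1 with strategy B, assuming the official Python kubernetes client and one kubeconfig file per profile (the file names are just an illustration), could look like this:

```python
# Hypothetical sketch of stage 1, strategy B: collect the server version of
# each profile cluster and dump the result to a YAML file for stage 2.
import yaml
from kubernetes import client, config

PROFILES = ["stable", "oldstable", "oldoldstable"]  # suggested profile names

versions = {}
for profile in PROFILES:
    # assumption: one kubeconfig file per profile, named after the profile
    api_client = config.new_client_from_config(config_file=f"{profile}.kubeconfig")
    info = client.VersionApi(api_client).get_code()
    # info.minor may carry a "+" suffix on some distributions, hence the strip
    versions[profile] = f"{info.major}.{info.minor.rstrip('+')}"

with open("supported-k8s-versions.yaml", "w") as f:
    yaml.safe_dump(versions, f)
```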

@martinmo
Member

martinmo commented Feb 23, 2024

I've asked the members of the Team Container Matrix chat for feedback. Independently of that, I'll continue to implement the second stage of the described design.

@artificial-intelligence
Contributor

So for the first part I think every testing strategy must be as close to distribution-agnostic as it can be, so that different k8s deployment/distribution variants can be tested, as long as they are CNCF-certified compliant k8s variants. There is still a lot of movement in k8s distributions/installers development so it seems wise to stay agile and not be dependent on a single blessed cluster installer.

So imho "Strategy A" is already out.

another comment I have: notice that Kubernetes doesn't necessarily support "only" the last three releases, even if its release schedule says so; e.g., currently there are in fact four supported releases listed at https://kubernetes.io/releases/, even though version 1.26 will be EOL on 28.02.2024 (so next week).

I know this is pretty weird, but well, it is what it is.

In April, according to the schedule, k8s 1.30 will be released, bumping the number of supported releases to four again.

So keep that in mind while writing the tests 😉

So I'm strongly in favor of option "B", because we really can only do meaningful testing when we exercise the actual code, that is, connect to a running cluster and see which version it is. There's imho not much value gained from just checking a manually provided yaml file or something.

However let me also add some more notes.

Please bear with me, but I think the standard mandates as a MUST:

Kubernetes versions MUST be supported as long as the official sources support them. The current support period can therefore be found in [Kubernetes Support Period].

afaik we currently only test whether all the versions provided by a CSP are still supported; we do not test, on the day before a release goes EOL, whether it is still listed/possible to create at the CSP.

This also makes me realize that it is not really defined what "supported" means here, because arguably you don't want customers to create a new cluster on a version that will be EOL tomorrow.

A more reasonable definition of "supported" might include providing security updates for cluster versions until they are EOL and being able to reinstall the "same cluster" until it is EOL, but maybe restricting new installations of a soon-to-be-EOL version to e.g. one month prior to EOL?

I realize we will probably not be able to extend the tests even further, but I think it is still worth mentioning so that we can maybe fix this later by either changing the standard or extending the tests.

PS: I notice the standard also has a missing link for [Kubernetes Support Period].
I'll provide a PR to fix that.

@martinmo
Member

Thanks @artificial-intelligence for your feedback and perspective.

There is still a lot of movement in k8s distributions/installers development so it seems wise to stay agile and not be dependent on a single blessed cluster installer.

That's a fair point, I agree.

So I'm strongly in favor of option "B", because we really can only do meaningful testing when we exercise the actual code, that is, connect to a running cluster and see which version it is.

I'll focus on this (to be honest it is also my favorite approach).

There's imho not much value gained from just checking a manually provided yaml file or something.

Still, it's at least useful for me as a test implementer, for testing the tests ;-)

[…] notice that Kubernetes doesn't necessarily support "only" the last three releases, even if its release schedule says so […]

Ack. Because testing only three "profiles" is not enough, I will extend the set of profiles with another one, oldoldoldstable (a prettier name will come soon). It will be tested only if it makes sense, i.e., if there are four non-EOL minor k8s versions at the time the check is performed.

afaik we currently only test whether all the versions provided by a CSP are still supported; we do not test, on the day before a release goes EOL, whether it is still listed/possible to create at the CSP.
This also makes me realize that it is not really defined what "supported" means here, because arguably you don't want customers to create a new cluster on a version that will be EOL tomorrow.

I agree with you. At the moment, I think "supported" in the standard roughly means "a minor version of k8s which is not EOL". Using this definition might not be the best choice for a customer, for the reasons you pointed out. However, I also think this is a change to the standard itself that needs to be discussed in a separate issue.

@garloff
Member

garloff commented Feb 23, 2024

The main requirement here is to get providers to deliver patches (and minor releases) fast enough. So that would be my primary concern for testing. A provider disabling the provisioning of an old version a bit too early is not nice, but not nearly as likely and also not nearly as problematic for the user.

So testing strategy would be from my perspective:

  1. At the test date, determine which versions should be available. This will cover 3 to 4 minor k8s versions with the latest patch level (and sometimes the patch level one before may still be allowable).
  2. Try to create clusters with these k8s versions. (Sometimes, if this fails and the previous patch level is still allowable, retry with it).
  3. All tests must result in working clusters (sometimes after retrying, see above).

There is one tiny challenge: Step 2 is implementation dependent. We can implement a test for cluster stacks (and we could also cover KaaS v1 if we wanted to).
So I guess we would do it for our own solution and require 3rd-party KaaS providers to contribute a test case for their specific cluster management before we certify them...
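
For step 1 of the strategy above, a minimal sketch with purely illustrative EOL dates (a real test would derive them from the official Kubernetes release data) might look like this:

```python
# Hypothetical sketch of step 1: determine which minor releases a provider
# should still offer at the test date. The EOL dates below are illustrative only.
import datetime

EOL_DATES = {
    "1.26": datetime.date(2024, 2, 28),
    "1.27": datetime.date(2024, 6, 28),
    "1.28": datetime.date(2024, 10, 28),
    "1.29": datetime.date(2025, 2, 28),
}

def expected_minors(test_date: datetime.date) -> list[str]:
    """Return the minor versions that are not yet EOL at the given date."""
    return sorted(minor for minor, eol in EOL_DATES.items() if eol >= test_date)

print(expected_minors(datetime.date(2024, 2, 27)))
# -> ['1.26', '1.27', '1.28', '1.29']
```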

@jschoone jschoone added the Container Issues or pull requests relevant for Team 2: Container Infra and Tooling label Feb 24, 2024
@martinmo
Member

Thanks @garloff for the feedback! I agree with the primary goals you described.

However, steps 2 and 3 of that test method would deviate a lot from the current implementation for scs-0210-v1, which merely checks the version of an existing cluster. My proposal for the v2 tests is an extension of this to multiple clusters.

To summarize and categorize the current proposed approaches for scs-0210-v2 testing:

  1. Check supported versions by gathering metadata.
    a. Get metadata from a KaaS management interface such as Gardener (impl specific).
    b. Get metadata by connecting to multiple existing clusters.
  2. Check supported versions by trying to create clusters with a KaaS management interface such as cluster stacks and smoke testing them (impl specific).

Because we're in a hurry, I think we need to settle for one of the simpler approaches, but leave the implementation open. @mbuechse what do you think?

@martinmo
Member

After meeting with @jschoone we concluded that it makes most sense if I stick to option 1b, i.e., "Get metadata by connecting to multiple existing clusters", because it is the simplest one to implement. It also mimics the way we connect to OpenStack for the IaaS conformance tests (we use a central clouds.yaml with multiple clouds plus a secrets.yaml created from GH secrets in the GH Actions workflow).

As I've stated before, other options/strategies can be added later on. However, in this context I also want to note that creating K8s clusters during the conformance tests will add delays and an additional source of (transient) errors.

For the other standard that needs to be tested (scs-0214-v1), my proposal involves simply querying the following well-known labels for the master and worker nodes:

  • kubernetes.io/hostname
  • topology.kubernetes.io/zone
  • topology.kubernetes.io/region

Unfortunately, we would need to revamp the standard to mandate that these labels need to be set. To my knowledge, this is not the default case.
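
A minimal sketch of such a label query, assuming the official Python kubernetes client and that control plane nodes carry the well-known node-role.kubernetes.io/control-plane label (itself an assumption), could look like this:

```python
# Hypothetical sketch: collect the well-known topology labels of all control
# plane nodes and report how many distinct values each label spans.
from collections import defaultdict
from kubernetes import client, config

TOPOLOGY_LABELS = [
    "kubernetes.io/hostname",
    "topology.kubernetes.io/zone",
    "topology.kubernetes.io/region",
]

config.load_kube_config()  # assumes a kubeconfig in the default location
distribution = defaultdict(set)
for node in client.CoreV1Api().list_node().items:
    labels = node.metadata.labels or {}
    # assumption: control plane nodes carry this well-known role label
    if "node-role.kubernetes.io/control-plane" not in labels:
        continue
    for key in TOPOLOGY_LABELS:
        if key in labels:
            distribution[key].add(labels[key])

for key in TOPOLOGY_LABELS:
    print(f"{key}: {len(distribution[key])} distinct value(s) across the control plane")
```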

@cah-hbaum
Contributor

For the other standard that needs to be tested (scs-0214-v1), my proposal involves simply querying the following well-known labels for the master and worker nodes:

kubernetes.io/hostname
topology.kubernetes.io/zone
topology.kubernetes.io/region

Unfortunately, we would need to revamp the standard to mandate that these labels need to be set. To my knowledge, this is not the default case.

That sounds like a good idea. I will try to implement this tomorrow (no time today). Sadly, like you said, this also more or less mandates the creation of a v2 of the standard (since it is already stabilized and therefore not really changeable). So we would probably create this test initially, but after the certification scope has gone through, a revamp of this test and the standard will probably be necessary.

@cah-hbaum
Contributor

Had a short exchange with @mbuechse yesterday; we probably won't do a v2 of the standard yet. Instead, I will write an update for the standard v1 and put that into a separate commit. We will then see how that is accepted, and if it's fine, we will just update the standard.

cah-hbaum added a commit that referenced this issue Feb 27, 2024
This provides a basic node distribution test only operating based on the labels of the kubernetes cluster provided. A test will come back negative, if no distribution according to the labels can be detected.

Signed-off-by: Hannes Baum <hannes.baum@cloudandheat.com>
cah-hbaum added a commit that referenced this issue Feb 27, 2024
The test section of the standard also needs to be updated, even if the standard is already stabilized.

Signed-off-by: Hannes Baum <hannes.baum@cloudandheat.com>
@cah-hbaum
Contributor

PR created: #486
It can already be reviewed; if errors are found, I can try to fix them around 3-4 PM today or later this evening.

cah-hbaum added commits that referenced this issue Feb 27, 2024
Signed-off-by: Hannes Baum <hannes.baum@cloudandheat.com>
@mbuechse mbuechse self-assigned this Feb 28, 2024
cah-hbaum added a commit that referenced this issue Feb 28, 2024
This provides a basic node distribution test only operating based on the labels of the kubernetes cluster provided. A test will come back negative, if no distribution according to the labels can be detected.

Signed-off-by: Hannes Baum <hannes.baum@cloudandheat.com>
cah-hbaum added a commit that referenced this issue Feb 28, 2024
The test section of the standard also needs to be updated, even if the standard is already stabilized.

Signed-off-by: Hannes Baum <hannes.baum@cloudandheat.com>
cah-hbaum added a commit that referenced this issue Feb 28, 2024
* Provide node distribution test (#481)

This provides a basic node distribution test only operating based on the labels of the kubernetes cluster provided. A test will come back negative, if no distribution according to the labels can be detected.

Signed-off-by: Hannes Baum <hannes.baum@cloudandheat.com>

* Update the test section in the standard (#481)

The test section of the standard also needs to be updated, even if the standard is already stabilized.

Signed-off-by: Hannes Baum <hannes.baum@cloudandheat.com>

---------

Signed-off-by: Hannes Baum <hannes.baum@cloudandheat.com>
mbuechse added a commit that referenced this issue Feb 28, 2024
resolves #481

Signed-off-by: Matthias Büchse <matthias.buechse@cloudandheat.com>
mbuechse added a commit that referenced this issue Feb 28, 2024
resolves #481

Signed-off-by: Matthias Büchse <matthias.buechse@cloudandheat.com>
@mbuechse mbuechse moved this from Backlog to Doing in Sovereign Cloud Stack Feb 29, 2024
martinmo added a commit that referenced this issue Mar 5, 2024
This adds the missing conformance tests for the version policy in scs-0210-v2, reusing the CVE collection and parsing code from the scs-0210-v1 Python script written by @cah-hbaum.

Refs: #481
Closes: SovereignCloudStack/issues#505

Co-authored-by: Hannes Baum <hannes.baum@cloudandheat.com>
Co-authored-by: Matthias Büchse <matthias.buechse@cloudandheat.com>
Signed-off-by: Martin Morgenstern <martin.morgenstern@cloudandheat.com>
@martinmo
Member

martinmo commented Mar 5, 2024

Closing this because working tests have been merged with #486 and #488. Limitations of the scs-0214-v1 aka node distribution tests will be worked on in #494.

@martinmo martinmo closed this as completed Mar 5, 2024
@github-project-automation github-project-automation bot moved this from Doing to Done in Sovereign Cloud Stack Mar 5, 2024
@martinmo martinmo mentioned this issue Jun 10, 2024