Commit

Some more updates
Did some more textual updates requested by @mbuechse.

Signed-off-by: Hannes Baum <hannes.baum@cloudandheat.com>
cah-hbaum committed Jun 4, 2024
1 parent 921ee1a commit c265f73
Showing 2 changed files with 44 additions and 44 deletions.
1 change: 1 addition & 0 deletions Standards/scs-0214-v1-k8s-node-distribution.md
@@ -120,3 +120,4 @@ If also produces warnings and informational outputs, if e.g. labels don't seem t
[k8s-ha]: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/high-availability/
[k8s-large-clusters]: https://kubernetes.io/docs/setup/best-practices/cluster-large/
[scs-0213-v1]: https://github.com/SovereignCloudStack/standards/blob/main/Standards/scs-0213-v1-k8s-nodes-anti-affinity.md
[k8s-labels-docs]: https://kubernetes.io/docs/reference/labels-annotations-taints/#topologykubernetesiozone
87 changes: 43 additions & 44 deletions Standards/scs-0214-v2-k8s-node-distribution.md
@@ -65,50 +65,48 @@ shouldn't be defined further in this document.

This standard formulates the requirement for the distribution of Kubernetes nodes in order
to provide a fault-tolerant and available Kubernetes cluster infrastructure.
Since some providers only have small environments to work with and therefore couldn't
comply with this standard, it will be treated as a RECOMMENDED standard, where providers
can OPT OUT.

If the standard is used by a provider, the following decisions are binding and valid:

- The control plane nodes MUST be distributed over multiple physical machines. Kubernetes
provides best practices on this topic, which are also RECOMMENDED by SCS.
- At least one control plane instance MUST be run in each "failure zone", more are
RECOMMENDED in each "failure zone" to provide fault-tolerance for each zone.
- Worker nodes are RECOMMENDED to be distributed over multiple zones. This policy makes
it OPTIONAL to provide a worker node in each "failure zone", meaning that worker nodes
can also be scaled vertically first before scaling horizontally.
- Worker node distribution MUST be indicated to the user through some kind of labeling
in order to enable (anti)-affinity for workloads over "failure zones".
- To provide metadata about the node distribution, which also enables testing of this standard,
providers MUST annotate their K8s nodes with the labels listed below, and the labels
MUST be kept up to date.
- `topology.kubernetes.io/zone`

Corresponds with the label described in [K8s labels documentation][k8s-labels-docs].
It provides a logical zone of failure on the side of the provider, e.g. a server rack
in the same electrical circuit or multiple machines bound to the internet through a
singular network structure. How this is defined exactly is up to the plans of the provider.
The field gets autopopulated most of the time by either the kubelet or external mechanisms
like the cloud controller.

- `topology.kubernetes.io/region`

Corresponds with the label described in [K8s labels documentation][k8s-labels-docs].
It describes the combination of one or more failure zones into a region or domain, therefore
showing a larger entity of logical failure zone. An example for this could be a building
containing racks that are put into such a zone, since they're all prone to failure, if e.g.
the power for the building is cut. How this is defined exactly is also up to the provider.
The field gets autopopulated most of the time by either the kubelet or external mechanisms
like the cloud controller.

- `topology.scs.community/host-id`

This is an SCS-specific label; it MUST contain the hostID of the physical machine running
the hypervisor (NOT: the hostID of a virtual machine). Here, the hostID is an arbitrary identifier,
which need not contain the actual hostname, but it should nonetheless be unique to the host.
This helps identify the distribution over underlying physical machines,
which would be masked if VM hostIDs were used.

The control plane nodes MUST be distributed over multiple physical machines.
Kubernetes provides [best practices][k8s-zones] on this topic, which are also RECOMMENDED by SCS.

At least one control plane instance MUST be run in each "failure zone" used for the cluster;
more instances per "failure zone" are possible to provide fault tolerance within a zone.
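The two control plane requirements above can be checked mechanically from node labels. The following is a minimal Python sketch, not the SCS conformance test. It makes three assumptions. First, nodes are plain dicts `{"name": ..., "labels": {...}}`, which could, for example, be extracted from `kubectl get nodes -o json`. Second, control plane nodes carry the kubeadm label `node-role.kubernetes.io/control-plane`. Third, a zone counts as "used for the cluster" if any cluster node runs in it. The helper name `check_control_plane` is hypothetical.

```python
# Minimal sketch (hypothetical helper, not the official SCS test).
# Assumptions: each node is a dict {"name": str, "labels": dict}, and
# control plane nodes carry the kubeadm label CONTROL_PLANE below.
ZONE = "topology.kubernetes.io/zone"
HOST_ID = "topology.scs.community/host-id"
CONTROL_PLANE = "node-role.kubernetes.io/control-plane"


def check_control_plane(nodes):
    """Return a list of violation messages; an empty list means compliant."""
    cp_nodes = [n for n in nodes if CONTROL_PLANE in n["labels"]]
    problems = []

    # Control plane nodes MUST run on more than one physical host.
    hosts = {n["labels"].get(HOST_ID) for n in cp_nodes} - {None, ""}
    if len(hosts) < 2:
        problems.append("control plane runs on fewer than two physical hosts")

    # Each zone used by the cluster MUST contain a control plane node.
    # Assumption: a zone is "used" if any node of the cluster is in it.
    used_zones = {n["labels"].get(ZONE) for n in nodes} - {None, ""}
    cp_zones = {n["labels"].get(ZONE) for n in cp_nodes}
    for zone in sorted(used_zones - cp_zones):
        problems.append(f"no control plane node in zone {zone!r}")
    return problems


if __name__ == "__main__":
    example = [
        {"name": "cp-0", "labels": {CONTROL_PLANE: "", ZONE: "zone-a", HOST_ID: "h1"}},
        {"name": "cp-1", "labels": {CONTROL_PLANE: "", ZONE: "zone-b", HOST_ID: "h2"}},
        {"name": "worker-0", "labels": {ZONE: "zone-c", HOST_ID: "h3"}},
    ]
    print(check_control_plane(example))
    # prints ["no control plane node in zone 'zone-c'"]
```

The sketch collects every violation instead of stopping at the first one. This lets a single test run report all problems at once.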

Worker nodes are RECOMMENDED to be distributed over multiple zones. This policy makes
it OPTIONAL to provide a worker node in each "failure zone", meaning that worker nodes
can be scaled vertically before they are scaled horizontally.
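Worker distribution over zones is only RECOMMENDED, so a check for it would reasonably report information rather than a failure. The following minimal Python sketch assumes nodes are plain dicts `{"name": ..., "labels": {...}}` and that control plane nodes carry the kubeadm label `node-role.kubernetes.io/control-plane`. Every other node is treated as a worker. The helper names `worker_zone_counts` and `worker_spread_note` are hypothetical.

```python
from collections import Counter

# Hypothetical helpers. Assumptions: nodes are dicts {"name": str, "labels": dict},
# and control plane nodes carry the kubeadm label below; all others are workers.
ZONE = "topology.kubernetes.io/zone"
CONTROL_PLANE = "node-role.kubernetes.io/control-plane"


def worker_zone_counts(nodes):
    """Count worker (non-control-plane) nodes per labeled failure zone."""
    return Counter(
        n["labels"][ZONE]
        for n in nodes
        if CONTROL_PLANE not in n["labels"] and n["labels"].get(ZONE)
    )


def worker_spread_note(nodes):
    """Return an informational note if workers span fewer than two zones, else None."""
    zones = worker_zone_counts(nodes)
    if len(zones) < 2:
        # Not a violation: a worker node in each zone is OPTIONAL.
        return f"info: worker nodes span {len(zones)} zone(s); multiple zones are recommended"
    return None
```

Returning a note instead of raising an error reflects the RECOMMENDED (not MUST) wording. A cluster whose workers sit in one zone still conforms.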

To provide metadata about the node distribution, which can help schedule workloads
efficiently and also enables testing of this standard, providers MUST label their
K8s nodes with the labels listed below.
These labels MUST be kept up to date with the current state of the deployment.

- `topology.kubernetes.io/zone`

Corresponds to the label described in the [K8s labels documentation][k8s-labels-docs].
It identifies a logical failure zone on the provider side, e.g. a server rack on a single
electrical circuit, or multiple machines connected to the internet through a single
network structure. How exactly a zone is defined is left to the provider.
In most cases, this label is populated automatically, either by the kubelet or by external
mechanisms such as the cloud controller manager.

- `topology.kubernetes.io/region`

Corresponds to the label described in the [K8s labels documentation][k8s-labels-docs].
It describes the grouping of one or more failure zones into a region or domain, i.e. a
larger logical failure domain. An example is a building whose racks are grouped into one
region, since they all fail together if, e.g., the power for the building is cut.
How exactly a region is defined is also left to the provider.
In most cases, this label is populated automatically, either by the kubelet or by external
mechanisms such as the cloud controller manager.

- `topology.scs.community/host-id`

This is an SCS-specific label; it MUST contain the hostID of the physical machine running
the hypervisor (NOT the hostID of a virtual machine). Here, the hostID is an arbitrary identifier
that need not contain the actual hostname, but it should nonetheless be unique to the host.
This label makes the distribution over the underlying physical machines visible,
which would be masked if VM hostIDs were used.
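Checking that the labels above are present lends itself to a small script. The following Python sketch is hypothetical, not the SCS conformance test. It parses the JSON printed by `kubectl get nodes -o json` and reads each node's `metadata.labels`. It then reports every node where a required label is missing or has an empty value; treating an empty value as missing is an assumption. It only checks presence. Whether the values reflect the current state of the deployment cannot be verified from the labels alone.

```python
import json

# Hypothetical helper, not the official SCS conformance test.
REQUIRED_LABELS = (
    "topology.kubernetes.io/zone",
    "topology.kubernetes.io/region",
    "topology.scs.community/host-id",
)


def nodes_from_kubectl_json(text):
    """Extract name and labels from the output of `kubectl get nodes -o json`."""
    items = json.loads(text)["items"]
    return [
        {"name": item["metadata"]["name"], "labels": item["metadata"].get("labels", {})}
        for item in items
    ]


def missing_labels(nodes):
    """Map each non-compliant node name to its missing or empty required labels."""
    result = {}
    for node in nodes:
        # Assumption: an empty label value counts as missing.
        missing = [key for key in REQUIRED_LABELS if not node["labels"].get(key)]
        if missing:
            result[node["name"]] = missing
    return result
```

For example, save the output of `kubectl get nodes -o json` to a file and pass its contents to `nodes_from_kubectl_json`. An empty result from `missing_labels` means every node carries all three labels.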

## Conformance Tests

@@ -128,3 +126,4 @@ requirements regarding node labeling.
[k8s-large-clusters]: https://kubernetes.io/docs/setup/best-practices/cluster-large/
[scs-0213-v1]: https://github.com/SovereignCloudStack/standards/blob/main/Standards/scs-0213-v1-k8s-nodes-anti-affinity.md
[k8s-labels-docs]: https://kubernetes.io/docs/reference/labels-annotations-taints/#topologykubernetesiozone
[k8s-zones]: https://kubernetes.io/docs/setup/best-practices/multiple-zones/
