ADR for KaaS Observability Architecture and MVP-0 #394

o-otte · 2023-11-30T16:09:40Z

This PR adds an ADR document that describes how the Kubernetes as a Service (KaaS) observability stack will be designed.

closes SovereignCloudStack/issues#300

Standards/scs-0403-v1-csp-kaas-observability-stack.md

matofeder · 2023-12-01T14:35:14Z

Standards/scs-0403-v1-csp-kaas-observability-stack.md

+
+## Requirements
+
+A survey was conducted to gather the needs and requirements of a CSP when providing Kubernetes as a Service. The results of the Survey (Questions with answers) were the following:


I understood the survey more as a source of hints for defining the requirements.
IMO the ADR requirements section should not contain the Q&A blocks, e.g. "Do you have an observability infrastructure? If yes, how is it built?"

I would suggest reformatting this section, e.g. as follows:

KaaS observability solution SHOULD gather the following metrics ...

KaaS observability solution SHOULD define alerts based on collected metrics ...

good catch @matofeder, I concur.

I also agree! Still I think the information provided is interesting to preserve, maybe in a decision record like this?

I rewrote the requirements section and moved the survey results to a references section at the end.

matofeder · 2023-12-01T14:53:53Z

Standards/scs-0403-v1-csp-kaas-observability-stack.md

+
+For use of a CSP that provides Kubernetes as a Service the provisioning of the observability tools and the onboarding of a customer cluster need to be fully automated. For a customer, all the tools on their Kubernetes cluster needs to be installed at creation time and the observability data of that cluster needs to present in the Observer Cluster immediately.
+
+### Options considered


From my perspective, these are better names for the two options we mainly considered for the KaaS observability layer:

Pull-based architecture, i.e. the Observer cluster scrapes metrics from the KaaS cluster, where a Prometheus server runs

Push-based architecture, i.e. the KaaS cluster (using Prometheus Agent) remote-writes metrics to the Observer cluster

The pros and cons we considered are written down here: https://input.scs.community/sig-monitoring-29-09-2023#Prometheus-server-vs-prometheus-agent
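To make the difference between the two options concrete, here is a minimal sketch in Python that builds the relevant Prometheus configuration fragments as plain dictionaries. It is only an illustration under assumptions: the thread does not say how the pull variant would be wired up, so the sketch assumes Prometheus federation (the `/federate` endpoint) on the Observer side. The function names, the job name, and the example hostnames are hypothetical and not taken from the ADR.

```python
import json


def pull_config(kaas_prometheus_targets):
    """Observer-side fragment for the pull-based option.

    Assumption: the Observer cluster federates from a Prometheus server
    running in each KaaS cluster via the /federate endpoint.
    """
    return {
        "scrape_configs": [
            {
                "job_name": "kaas-federate",  # hypothetical job name
                "honor_labels": True,  # keep the labels set by the KaaS cluster
                "metrics_path": "/federate",
                "params": {"match[]": ['{job=~".+"}']},  # select all jobs
                "static_configs": [{"targets": list(kaas_prometheus_targets)}],
            }
        ]
    }


def push_config(observer_remote_write_url):
    """KaaS-side fragment for the push-based option.

    A Prometheus Agent in the KaaS cluster forwards its samples to the
    Observer cluster through remote_write.
    """
    return {"remote_write": [{"url": observer_remote_write_url}]}


if __name__ == "__main__":
    # Hypothetical endpoints, used only for illustration.
    print(json.dumps(pull_config(["kaas-1.example.test:9090"]), indent=2))
    print(json.dumps(push_config("https://observer.example.test/api/v1/write"), indent=2))
```

In the pull variant the configuration lives on the Observer cluster and has to list every KaaS cluster as a target. In the push variant each KaaS cluster only needs to know the Observer endpoint.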

artificial-intelligence

As a general comment I think it's not so good to mix two things in the same document.

This mixes the documentation of the general architectural decisions of the observability cluster architecture with the implementation details of the MVP-0.

But as these are at least clearly marked as distinct things I guess it's okay to leave this as-is.

artificial-intelligence · 2023-12-07T15:11:20Z

Standards/scs-0403-v1-csp-kaas-observability-stack.md

+
+## Requirements
+
+A survey was conducted to gather the needs and requirements of a CSP when providing Kubernetes as a Service. The results of the Survey (Questions with answers) were the following:


I also agree! Still I think the information provided is interesting to preserve, maybe in a decision record like this?

Standards/scs-0403-v1-csp-kaas-observability-stack.md

artificial-intelligence · 2023-12-07T15:16:31Z

Standards/scs-0403-v1-csp-kaas-observability-stack.md

+
+#### Scope of the Observability Architecture
+
+The Observability Cluster and Archtiecture should be defined such that it can be used to not only observe the Kubernetes Layer of an SCS Stack, but also the IaaS and other Layers.


Which other layers? Maybe omit "other layers" here if we don't know which ones?

I rewrote the section to make it clearer that the Observer Cluster should also be usable as an observability system for the complete SCS stack.

Standards/scs-0403-v1-csp-kaas-observability-stack.md

fkr · 2023-12-09T08:08:10Z

I suggest that I wait with my review until the feedback from @matofeder and @artificial-intelligence has been incorporated by @o-otte. Once that has happened I'll circle back to the PR.

fkr · 2023-12-18T22:48:07Z

I suggest that I wait with my review until the feedback from @matofeder and @artificial-intelligence has been incorporated by @o-otte. Once that has happened I'll circle back to the PR.

@o-otte ping.

bitkeks

LGTM for now as a draft. We need to advance the observability topic to gather more feedback from testing and from CSPs. Publishing this draft as an SCS standard will help us move forward.

This mixes the documentation of the general architectural decisions of the observability cluster architecture with the implementation details of the MVP-0.

But as these are at least clearly marked as distinct things I guess it's okay to leave this as-is.

Picking up @artificial-intelligence's thought, which I share, let's keep this doc as a mixed bundle for now. The most important thing is to find common ground on which the next steps can be built.

We can later refactor the wording, removing MVP-0 and converting all insights collected in MVP-0 into requirements. Example: "The MVP-0 will consist of the following features" will later become "The SCS observability stack MUST consist of the following features".

artificial-intelligence

LGTM

o-otte added the Ops label (Issues or pull requests relevant for Team 3: Ops Tooling) Nov 30, 2023

o-otte requested a review from matofeder November 30, 2023 16:09

matofeder reviewed Dec 1, 2023

fkr self-requested a review December 3, 2023 21:33

artificial-intelligence self-requested a review December 7, 2023 08:37

artificial-intelligence reviewed Dec 7, 2023

fkr requested a review from bitkeks January 10, 2024 12:51

artificial-intelligence self-requested a review February 1, 2024 08:11

bitkeks approved these changes Feb 2, 2024

artificial-intelligence approved these changes Feb 8, 2024

o-otte and others added 8 commits February 8, 2024 10:31

Add decisions on the contents of MVP-0

1aefd1d

Signed-off-by: Oliver Kautz <oliver.kautz@gonicus.de>

Add decision about use for IaaS layer

669ccd9

Signed-off-by: Oliver Kautz <oliver.kautz@gonicus.de>

fix typos

2f51f48

Signed-off-by: Oliver Kautz <oliver.kautz@gonicus.de>

Formatting Headings

1d17d89

Signed-off-by: Oliver Kautz <oliver.kautz@gonicus.de>

Fix headings

6f10e9f

Signed-off-by: Oliver Kautz <oliver.kautz@gonicus.de>

Apply suggestions from code review

b6d223e

Fix Typos
Co-authored-by: Matej Feder <feder.mato@gmail.com>
Co-authored-by: Sven <svenkieske@posteo.de>
Signed-off-by: Oliver Kautz <69149308+o-otte@users.noreply.github.com>

Improve sections.

222bb13

- Move Survey results to new references section
- Refactored short-term and hybrid approaches to pull and push based approaches
- More precise Requirements section
- fix of some typos
Signed-off-by: Oliver Kautz <oliver.kautz@gonicus.de>

Fix formatting

d4ca578

Signed-off-by: Oliver Kautz <oliver.kautz@gonicus.de>

o-otte force-pushed the adr-kaas-observability branch from 1fe8508 to d4ca578 Compare February 8, 2024 09:31

o-otte merged commit 044ec11 into main Feb 8, 2024
5 checks passed

o-otte deleted the adr-kaas-observability branch February 8, 2024 09:34

o-otte mentioned this pull request Feb 8, 2024

ADR: Is a Standard or Recommendation of Monitoring Requirements for a CSP needed? SovereignCloudStack/issues#301

Closed

10 tasks
