ADR for KaaS Observability Architecture and MVP-0 #394

Merged
o-otte merged 8 commits into main from adr-kaas-observability
Feb 8, 2024

Member

@o-otte o-otte commented Nov 30, 2023

This PR adds an ADR document that describes how the Kubernetes as a Service observability stack will be designed.

closes SovereignCloudStack/issues#300

@o-otte o-otte added the Ops Issues or pull requests relevant for Team 3: Ops Tooling label Nov 30, 2023
@o-otte o-otte requested a review from matofeder November 30, 2023 16:09
Standards/scs-0403-v1-csp-kaas-observability-stack.md (four outdated review threads, resolved)

## Requirements

A survey was conducted to gather the needs and requirements of a CSP when providing Kubernetes as a Service. The results of the Survey (Questions with answers) were the following:
Member

I understood the survey more as a set of hints for the requirements definition.
IMO the ADR requirements section should not contain the Q&A blocks, e.g. "Do you have an observability infrastructure? If yes, how is it built?"

I would suggest reformatting this section, e.g. as follows:

KaaS observability solution SHOULD gather the following metrics ...

KaaS observability solution SHOULD define alerts based on collected metrics ...

Member

good catch @matofeder, I concur.

Contributor

I also agree! Still, I think the information provided is worth preserving, maybe in a decision record like this?

Member Author

I rewrote the requirements section and moved the survey results to the end as a reference.


For use by a CSP that provides Kubernetes as a Service, the provisioning of the observability tools and the onboarding of a customer cluster need to be fully automated. For a customer, all the tools on their Kubernetes cluster need to be installed at creation time, and the observability data of that cluster needs to be present in the Observer Cluster immediately.

### Options considered
Member

From my perspective, better names for the options we mainly considered for the KaaS observability layer are:

  • Pull-based architecture, i.e. Observer cluster scrapes metrics from the KaaS cluster, where the Prometheus server lives
  • Push-based architecture, i.e. KaaS cluster (using Prometheus Agent) remote-writes metrics to the Observer cluster

The pros and cons we considered are written down here: https://input.scs.community/sig-monitoring-29-09-2023#Prometheus-server-vs-prometheus-agent
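
To make the direction of data flow in the two options concrete, here is a toy sketch in plain Python. It does not use Prometheus at all; the class and method names (`ObserverCluster`, `KaasCluster`, `scrape`, `remote_write`, and so on) are hypothetical and only illustrate who initiates the connection in each model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# (cluster name, metric name, value); a deliberately simplified sample shape
Sample = Tuple[str, str, float]


@dataclass
class ObserverCluster:
    """Hypothetical stand-in for the central Observer cluster's storage."""
    samples: List[Sample] = field(default_factory=list)

    def scrape(self, target: "KaasCluster") -> None:
        # Pull model: the Observer initiates the connection and reads
        # whatever the KaaS cluster currently exposes.
        for name, value in target.expose().items():
            self.samples.append((target.name, name, value))

    def receive(self, cluster: str, metrics: Dict[str, float]) -> None:
        # Push model: the Observer only accepts what clusters send to it.
        for name, value in metrics.items():
            self.samples.append((cluster, name, value))


@dataclass
class KaasCluster:
    """Hypothetical stand-in for a customer KaaS cluster."""
    name: str
    metrics: Dict[str, float]

    def expose(self) -> Dict[str, float]:
        # Pull model: serve the current metrics to whoever scrapes them.
        return dict(self.metrics)

    def remote_write(self, observer: ObserverCluster) -> None:
        # Push model: the KaaS cluster initiates the connection itself.
        observer.receive(self.name, dict(self.metrics))
```

In both cases the same samples end up in the Observer cluster; what differs is which side needs network reachability to the other and which side holds the configuration of targets.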

@fkr fkr self-requested a review December 3, 2023 21:33
Contributor

@artificial-intelligence artificial-intelligence left a comment

As a general comment I think it's not so good to mix two things in the same document.

This mixes the documentation of the general architectural decisions of the observability cluster architecture with the implementation details of the MVP-0.

But as these are at least clearly marked as distinct things I guess it's okay to leave this as-is.



#### Scope of the Observability Architecture

The Observability Cluster and Architecture should be defined such that it can be used to observe not only the Kubernetes Layer of an SCS Stack, but also the IaaS and other Layers.
Contributor

Which other layers? Maybe omit "other layers" here if we don't know which ones?

Member Author

I rewrote the section to make it clearer that the Observer Cluster should also be usable as an observability system for the complete SCS Stack.

Member

fkr commented Dec 9, 2023

I suggest that I wait with my review until the feedback of @matofeder and @artificial-intelligence has been incorporated by @o-otte. Once that has happened I'll circle over the PR.

Member

fkr commented Dec 18, 2023

I suggest that I wait with my review until the feedback of @matofeder and @artificial-intelligence has been incorporated by @o-otte. Once that has happened I'll circle over the PR.

@o-otte ping.

@fkr fkr requested a review from bitkeks January 10, 2024 12:51
Member

@bitkeks bitkeks left a comment

LGTM for now as a draft. We need to advance the observability topic to gather more feedback from testing and from CSPs. Publishing this draft as an SCS standard will help us move forward.

This mixes the documentation of the general architectural decisions of the observability cluster architecture with the implementation details of the MVP-0.

But as these are at least clearly marked as distinct things I guess it's okay to leave this as-is.

Picking up @artificial-intelligence's thought, which I share, let's keep this doc as a mixed bundle for now. The most important thing is to find common ground on which the next steps can build.

We can later refactor the wording, removing MVP-0 and converting all insights collected in MVP-0 into requirements. Example: "The MVP-0 will consist of the following features" will later become "The SCS observability stack MUST consist of the following features".

Contributor

@artificial-intelligence artificial-intelligence left a comment

LGTM

o-otte and others added 8 commits February 8, 2024 10:31
Signed-off-by: Oliver Kautz <oliver.kautz@gonicus.de>
Signed-off-by: Oliver Kautz <oliver.kautz@gonicus.de>
Signed-off-by: Oliver Kautz <oliver.kautz@gonicus.de>
Signed-off-by: Oliver Kautz <oliver.kautz@gonicus.de>
Signed-off-by: Oliver Kautz <oliver.kautz@gonicus.de>
Fix Typos

Co-authored-by: Matej Feder <feder.mato@gmail.com>
Co-authored-by: Sven <svenkieske@posteo.de>
Signed-off-by: Oliver Kautz <69149308+o-otte@users.noreply.github.com>
- Move Survey results to new references section
- Refactored short-term and hybrid approaches to pull and push based
  approaches
- More precise Requirements section
- fix of some typos

Signed-off-by: Oliver Kautz <oliver.kautz@gonicus.de>
Signed-off-by: Oliver Kautz <oliver.kautz@gonicus.de>
@o-otte o-otte force-pushed the adr-kaas-observability branch from 1fe8508 to d4ca578 February 8, 2024 09:31
@o-otte o-otte merged commit 044ec11 into main Feb 8, 2024
5 checks passed
@o-otte o-otte deleted the adr-kaas-observability branch February 8, 2024 09:34
Linked issue: ADR on KaaS Observability Plattform Architecture
5 participants