ADR for KaaS Observability Architecture and MVP-0 #394
## Requirements

A survey was conducted to gather the needs and requirements of a CSP when providing Kubernetes as a Service. The results of the Survey (Questions with answers) were the following:
I understood the survey more as hints for the requirements definition.
IMO the ADR requirements section should not contain the Q&A blocks, e.g. "Do you have an observability infrastructure? If yes, how is it built?"
I would like to suggest reformatting this section, e.g. as follows:
KaaS observability solution SHOULD gather the following metrics ...
KaaS observability solution SHOULD define alerts based on collected metrics ...
good catch @matofeder, I concur.
I also agree! Still, I think the information provided is worth preserving, maybe in a decision record like this?
I rewrote the requirements section and moved the survey results to the end as a reference.
For a CSP that provides Kubernetes as a Service, the provisioning of the observability tools and the onboarding of a customer cluster need to be fully automated. For a customer, all the tools on their Kubernetes cluster need to be installed at creation time, and the observability data of that cluster needs to be present in the Observer Cluster immediately.

### Options considered
From my perspective, better names for the options we mainly considered for the KaaS observability layer are:
- Pull-based architecture, i.e. Observer cluster scrapes metrics from the KaaS cluster, where the Prometheus server lives
- Push-based architecture, i.e. KaaS cluster (using Prometheus Agent) remote-writes metrics to the Observer cluster
The pros and cons we considered are written down here: https://input.scs.community/sig-monitoring-29-09-2023#Prometheus-server-vs-prometheus-agent
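
To make the difference between the two options concrete, here is a minimal toy sketch of who initiates the data transfer in each case. It is not Prometheus code: the class and method names (`KaasCluster`, `ObserverCluster`, `scrape`, `remote_write`, `pull_all`, `receive`) are hypothetical and only model the direction of the connection, assuming plain in-memory lists in place of real metrics.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ObserverCluster:
    """Hypothetical central cluster that stores metrics per KaaS cluster."""
    store: Dict[str, List[float]] = field(default_factory=dict)
    targets: List["KaasCluster"] = field(default_factory=list)

    def pull_all(self) -> None:
        # Pull-based: the observer must know every target and be able to reach it.
        for cluster in self.targets:
            self.store[cluster.name] = cluster.scrape()

    def receive(self, name: str, samples: List[float]) -> None:
        # Push-based: the observer only exposes an ingestion endpoint.
        self.store[name] = samples


@dataclass
class KaasCluster:
    """Hypothetical customer cluster that produces metric samples."""
    name: str
    samples: List[float] = field(default_factory=list)

    def scrape(self) -> List[float]:
        # Called by the observer in the pull-based option.
        return list(self.samples)

    def remote_write(self, observer: ObserverCluster) -> None:
        # The cluster initiates the transfer in the push-based option.
        observer.receive(self.name, list(self.samples))


cluster = KaasCluster("customer-a", [0.5, 0.7])

pull_observer = ObserverCluster(targets=[cluster])
pull_observer.pull_all()

push_observer = ObserverCluster()  # needs no list of targets
cluster.remote_write(push_observer)
```

Both observers end up with the same data. The sketch only highlights that the pull-based option requires the Observer cluster to keep a list of reachable targets, while in the push-based option each KaaS cluster needs to know the Observer's ingestion endpoint.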
As a general comment I think it's not so good to mix two things in the same document.
This mixes the documentation of the general architectural decisions of the observability cluster architecture with the implementation details of the MVP-0.
But as these are at least clearly marked as distinct things I guess it's okay to leave this as-is.
#### Scope of the Observability Architecture

The Observability Cluster and Architecture should be defined so that they can be used to observe not only the Kubernetes Layer of an SCS Stack, but also the IaaS and other Layers.
Which other layers? Maybe omit "other layers" here if we don't know which ones?
I rewrote the section to make it clearer that the Observer Cluster should also be usable as an Observability System for the complete SCS Stack.
I suggest that I wait with my review until the feedback of @matofeder and @artificial-intelligence has been incorporated by @o-otte. Once that has happened, I'll circle back to the PR.
@o-otte ping.
LGTM for now as a draft. We need to advance the observability topic to gather more feedback from testing and from CSPs. Publishing this draft as an SCS standard will help us move forward.
> This mixes the documentation of the general architectural decisions of the observability cluster architecture with the implementation details of the MVP-0.
> But as these are at least clearly marked as distinct things I guess it's okay to leave this as-is.
Picking up @artificial-intelligence's thought, which I share, let's keep this doc as a mixed bundle for now. The most important thing is to find common ground on which the next steps can be built.
We can later refactor the wording, removing MVP-0 and converting all insights collected in MVP-0 into requirements. Example: "The MVP-0 will consist of the following features" will later become "The SCS observability stack MUST consist of the following features".
LGTM
Signed-off-by: Oliver Kautz <oliver.kautz@gonicus.de>
Fix Typos

Co-authored-by: Matej Feder <feder.mato@gmail.com>
Co-authored-by: Sven <svenkieske@posteo.de>
Signed-off-by: Oliver Kautz <69149308+o-otte@users.noreply.github.com>

- Move Survey results to new references section
- Refactored short-term and hybrid approaches to pull and push based approaches
- More precise Requirements section
- fix of some typos

Signed-off-by: Oliver Kautz <oliver.kautz@gonicus.de>
1fe8508 to d4ca578
This PR adds an ADR document that describes how the Kubernetes as a Service Observability Stack will be designed.
closes SovereignCloudStack/issues#300