
Commit

Merge branch 'main' into preinstallation-checklist
scoopex authored Dec 16, 2024
2 parents 27eb469 + 3d2865a commit f5138c1
Showing 14 changed files with 4,858 additions and 2,455 deletions.
6 changes: 0 additions & 6 deletions docs.package.json
@@ -11,12 +11,6 @@
"target": "docs/02-iaas/components",
"label": ""
},
{
"repo": "SovereignCloudStack/k8s-cluster-api-provider",
"source": "doc",
"target": "docs/03-container/components",
"label": "k8s-cluster-api-provider"
},
{
"repo": "SovereignCloudStack/cluster-stack-provider-openstack",
"source": "docs",
19 changes: 5 additions & 14 deletions docs/03-container/index.md
@@ -18,26 +18,17 @@ The container layer within the Sovereign Cloud Stack (SCS) offers a robust solut
### Prerequisites and Requirements

- Knowledge: Familiarity with Kubernetes, container orchestration, and basic cloud infrastructure principles is pivotal.
- Software: The core software component is the K8s Cluster API Provider, crafted to function optimally on OpenStack environments. Although designed to run on the SCS IaaS layer, with minor configuration adjustments, it can operate on any OpenStack environment.
- Software: The core software components are the Cluster Stacks, which are based on Cluster API and crafted to function best on OpenStack environments. Although designed to run on the SCS IaaS layer, they can operate on any OpenStack environment with minor configuration adjustments.
- Hardware: Virtualization-enabled hardware capable of running OpenStack is essential if hosting the IaaS layer independently. For further details, refer to the IaaS layer documentation.

### Features

- Automated Cluster Management: The K8s Cluster API Provider automates the process of creating, scaling, managing and updating Kubernetes clusters, thus significantly reducing the operational overhead.
- Automated Cluster Management: The Cluster API automates the process of creating, scaling, managing and updating Kubernetes clusters, thus significantly reducing the operational overhead.
- Standardized Operations: Upholding SCS standards across various clusters ensures operational consistency and reliability.
- Integration with OpenStack: The K8s Cluster API Provider is tailored to work seamlessly with SCS IaaS (OpenStack), thus offering a unified platform for managing both containers and the underlying infrastructure.
- Container Registry Integration: The container layer has an integrated container registry, facilitating easy management and deployment of container images.
- Certificate Managment: The kubernetes clusters can optionaly include a certbot allowing for ease of deployment of public facing services out of the box.
- Certificate Management: Optional inclusion of Certbot in Kubernetes clusters facilitates straightforward deployment of publicly accessible services.
- Preconfigured Ingress: Kubernetes clusters come with a preconfigured Nginx ingress, designed with OpenStack in mind, providing a ready-to-use ingress solution with enhancements like out-of-the-box client source IP visibility.
- Integration with OpenStack: The Cluster Stacks are tailored to work seamlessly with SCS IaaS (OpenStack), thus offering a unified platform for managing both containers and the underlying infrastructure.
- Container Registry Integration: The container layer has an optional container registry, facilitating easy management and deployment of container images.
- Cluster Addons: Cluster Stacks come with a small default set of workload applications needed to make the cluster usable, such as a CNI plugin, a CSI plugin and a cloud controller manager.

### Limitations

- OpenStack Dependency: The current design primarily supports OpenStack environments, which could be a limitation for other infrastructure setups.
- Serverless/Functions as a Service Support: Lack of direct support for serverless containers and Functions as a Service (FaaS) might require additional tools or platforms.

### Current State and Future Outlook

The container layer has matured with multiple cloud providers now offering Kubernetes as a Service using this layer to manage a multitude of clusters. It follows a half-yearly release schedule to ensure security and up-to-date Kubernetes clusters, alongside providing backports for significant features into older versions.

Looking ahead, a new version based on ClusterStacks is in the pipeline, currently in its Alpha state. This upcoming release aims to be backward compatible, facilitating smooth migration from existing setups, and further extending the capabilities of the SCS container layer.
10 changes: 2 additions & 8 deletions docs/index.mdx
@@ -37,16 +37,10 @@ manual.

### Container Layer

#### K8s Cluster API Provider

You can easily deploy the container layer on top of the testbed (or a production
SCS cloud) checking out the code from
[k8s-cluster-api-provider](https://github.com/SovereignCloudStack/k8s-cluster-api-provider/).

#### Cluster Stacks

With the Cluster Stacks, in the V2 KaaS reference implementation, we provide an opinionated optimized configuration of Kubernetes clusters. Through better packaging, integrated testing, and bundled configuration, SCS-based Kubernetes clusters provide easier individualization.
Throughout the R6 development cycle Cluster Stacks are taken from a technical preview to be [functional and available on top of the IaaS reference implementation](https://github.com/SovereignCloudStack/issues/milestone/8) as well as to replace the V1 KaaS reference implementation [k8s-cluster-api-provider](https://github.com/SovereignCloudStack/k8s-cluster-api-provider/).
The Cluster Stacks can already be tried with the [demo](https://github.com/SovereignCloudStack/cluster-stacks-demo) repository. Although this is based on the not-production-ready Docker provider, the usage is the same for every provider.

### Public SCS Clouds in production
118 changes: 118 additions & 0 deletions docs/turnkey-solution/hardware-landscape.md
@@ -0,0 +1,118 @@
---
sidebar_label: Hardware-Landscape
sidebar_position: 99
---

# The SCS Hardware-Landscape

![An image of the SCS hardware landscape rack](images/combined_rack_visual.jpg)

## General information

The general aim of this environment is to install and operate the SCS reference implementation on hardware.
In addition to the classic tasks in the area of quality assurance, the environment is also used to evaluate
new concepts in the underlay/overlay network area, as a test environment for hardware-related developments,
as a demonstration environment for interested parties and as a publicly accessible blueprint for users.
The environment is designed for long-term use with a varying circle of users.

The environment consists of 21 servers and 12 switches. The hardware and the functions and properties
used were selected so that the focus is on generally available or characteristic functions and
dependency on manufacturer-specific features is avoided. Instead of the x86 servers and SONiC
switches used here, the environment could equally well be built with hardware from other manufacturers.

From 1 January 2025, the environment will be operated by [forum SCS-Standards](https://scs.community/2024/10/23/osba-forum-scs-standards/)
and the participating companies.

## Tasks and Objectives

The tasks and objectives of the environment can be summarised as follows:

* The division into several environments makes it possible to run a lab as well as to model a production environment (near-live operation).
* Operation of the compliance monitor (automated test for conformity with the SCS standards)
* Implementation and validation of the developed standards in a reference environment
* Analysis of problems in the interaction with the standards
* Provision of proof-of-concept installations for interested parties who want to use, promote or further develop the project
* The environment can be used by members of the SCS Standards forum and by contributors to the SCS community
as a development and test environment for open-source development in connection with the further development
of the SCS standards, SCS reference implementation and other relevant software components ('open-lab'/'near-live laboratory').
* Continuous Integration Environment ('Zuul as a Service') - Operation of non-critical zuul worker instances

## Installation details

The available hardware was divided into two distinct application areas:

* The **lab environment** consists exclusively of switch hardware used to evaluate, test and develop
concepts in the area of 'Software Defined Networking'. Various switch models can be used to test and
implement development tasks around the open [SONiC](https://sonicfoundation.dev/) NOS
(a network operating system based on Debian Linux) and provisioning automation tasks in the SONiC environment with the
open-source system Netbox, a solution used primarily for IPAM and DCIM (IP Address Management, Data Center Infrastructure Management);
a sketch of such Netbox-driven automation follows after this list.
* The **production environment** is an exemplary installation of the most relevant reference implementations of an
SCS system. It follows a configuration and approach oriented towards the needs and circumstances of a real and much larger environment.
To this end, characteristic infrastructure components were automatically installed on the manager nodes used for the installation.
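
As a hedged illustration of the Netbox-driven provisioning automation mentioned for the lab environment above,
the following sketch pulls switch records from the Netbox REST API and renders a minimal inventory from them.
The Netbox URL, token, role slug and output format are illustrative assumptions, not values taken from the repository.

```python
import requests

NETBOX_URL = "https://netbox.example.org"   # assumption: placeholder Netbox instance
NETBOX_TOKEN = "change-me"                  # assumption: placeholder API token


def fetch_switches(role: str = "leaf") -> list[dict]:
    """Fetch all devices with the given role slug from the Netbox DCIM API."""
    response = requests.get(
        f"{NETBOX_URL}/api/dcim/devices/",
        headers={"Authorization": f"Token {NETBOX_TOKEN}"},
        params={"role": role, "limit": 0},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["results"]


def render_inventory(devices: list[dict]) -> str:
    """Render a minimal Ansible-style inventory from the Netbox device records."""
    lines = ["[sonic_switches]"]
    for device in devices:
        primary_ip = (device.get("primary_ip") or {}).get("address", "")
        lines.append(f"{device['name']} ansible_host={primary_ip.split('/')[0]}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_inventory(fetch_switches()))
```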

The setup of the entire environment is designed in such a way that it can be reproducibly restored or reset.
Therefore, the Ansible automation available via OSISM was used in many areas.
Areas that could not usefully be automated with Ansible were implemented with Python command-line tooling stored in the Git repository.
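
As an example of the kind of Python command-line tooling meant here (the concrete functions are listed under
"Specific installation and configuration details" below), the following hedged sketch controls server power and attaches
an installer image via the standard Redfish API. The BMC address, credentials, resource paths and image URL are
placeholders; vendor implementations may use different resource IDs.

```python
import requests

BMC_URL = "https://bmc01.example.org"            # assumption: placeholder BMC address
BMC_AUTH = ("admin", "change-me")                # assumption: placeholder credentials
SYSTEM_URL = f"{BMC_URL}/redfish/v1/Systems/1"   # resource IDs differ between vendors
MANAGER_URL = f"{BMC_URL}/redfish/v1/Managers/1"

session = requests.Session()
session.auth = BMC_AUTH
session.verify = False  # many BMCs ship with self-signed certificates


def set_power(reset_type: str) -> None:
    """Send a standard Redfish ComputerSystem.Reset action, e.g. 'On', 'ForceOff' or 'GracefulShutdown'."""
    response = session.post(
        f"{SYSTEM_URL}/Actions/ComputerSystem.Reset",
        json={"ResetType": reset_type},
        timeout=30,
    )
    response.raise_for_status()


def insert_virtual_media(image_url: str, device: str = "CD") -> None:
    """Attach an operating-system image via Redfish virtual media."""
    response = session.post(
        f"{MANAGER_URL}/VirtualMedia/{device}/Actions/VirtualMedia.InsertMedia",
        json={"Image": image_url},
        timeout=30,
    )
    response.raise_for_status()


if __name__ == "__main__":
    insert_virtual_media("http://images.example.org/node-image.iso")  # assumption: hypothetical image URL
    set_power("On")
```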

## Available documentation

The primary point of information and orientation is the [*readme file*](https://github.com/SovereignCloudStack/hardware-landscape?tab=readme-ov-file#references)
which is stored at the top level of the [configuration repository](https://github.com/SovereignCloudStack/hardware-landscape).

The **References** section there points to the individual documentation areas.

## Specific installation and configuration details

* Processes for access management to the environment (2 VPN gateways, SSH logins, SSH profiles,..) have been implemented
* The production and lab environments have been set up, automated and documented as described above
* The complete environment is managed in a [Git repository](https://github.com/SovereignCloudStack/hardware-landscape);
adjustments and further developments are managed via Git merge requests
* Almost all installation steps are [documented and automated](https://github.com/SovereignCloudStack/hardware-landscape/blob/main/documentation/System_Deployment.md),
starting from a bare rack installation (the setup is extensively documented, in particular the few manual steps)
* The entire customized setup of the nodes is [implemented by OSISM/Ansible](https://github.com/SovereignCloudStack/hardware-landscape/tree/main/environments/custom)
* All secrets (e.g. passwords) of the environment are stored and versioned as encrypted Ansible Vault data in
the repository (when access is transferred, rekeying can be used to change who can decrypt them).
* Far-reaching automation has been created that allows the environment, or parts of it, to be set up again
with a reasonable amount of personnel effort.
* The setup of the basic environment was implemented with Ansible, using OSISM (the SCS reference implementation)
* Python tooling was created that covers areas specific to the environment's use case and provides functions that simplify operating the infrastructure:
* Server systems
* Backup and restore of the hardware configuration
* Templating of the BMC configuration
* Automatic installation of the operating system base image via Redfish Virtual Media
* Control of the server status via command line (to stop and start the system for test, maintenance and energy-saving purposes)
* Generation of base profiles for the Ansible Inventory based on the hardware key data stored in the documentation
* Switches
* Backup and restore of the switch configuration
* Generation of base profiles for the Ansible Inventory based on the hardware key data stored in the documentation
* Network setup
* The two management hosts act as redundant VPN gateways, SSH jump hosts, routers and uplink routers
* The system is deployed with a layer 3 underlay concept
* An "eBGP router on the host" is implemented for the node-interconnectivity
(all nodes and all switches are running FRR instances)
* The Ceph and OpenStack nodes of the system have no direct upstream routing
(access is configured and provided via HTTP, NTP and DNS proxies)
* For security reasons, the system itself can only be accessed via VPN.
The provider network of the production environment is realised with a VXLAN which is terminated on the managers for routing
('a virtual provider network').
* The basic node installation was realised in such a way that specific [node images](https://github.com/osism/node-image)
are created for the respective rack, which make the operation or reconfiguration of network equipment for PXE bootstrap
unnecessary. (Preliminary stage for rollout via OpenStack Ironic)
* The management of the hardware (BMC and switch management) is implemented with a VLAN
* Routing, firewalling and NAT are managed by an nftables script which adds rules in an idempotent way to the existing rules
of the manager nodes.
* The [openstack workload generator](https://github.com/SovereignCloudStack/openstack-workload-generator) is used to put test workloads
on the system (see the sketch after this list)
* Automated creation of OpenStack domains, projects, servers, networks, users, etc.
* Launching test workloads
* Dismantling test workloads
* An observability stack was built
* Prometheus for metrics
* OpenSearch for log aggregation
* Central syslog server for the switches on the managers (recorded via the manager nodes in OpenSearch)
* Specific documentation created for the project
* Details of the hardware installed in the environment
* The physical structure of the environment was documented in detail (rack installation and cabling)
* The technical and logical structure of the environment was documented in detail
* A FAQ for handling the open-source network operating system SONiC was created with relevant topics for the test environment
* As part of the development, the documentation and implementation of the OSISM reference implementation was significantly improved (essentially resulting from
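
A minimal, hedged sketch of the workload creation and dismantling mentioned for the openstack workload generator above
could look as follows with openstacksdk; the cloud name, image and flavor are placeholders, not values taken from the
openstack-workload-generator repository.

```python
import openstack

# Connect using an entry from clouds.yaml; the cloud name is a placeholder.
conn = openstack.connect(cloud="hardware-landscape")

# Create an isolated test network and a small test server attached to it.
network = conn.create_network("wl-test-net")
subnet = conn.create_subnet(network.id, cidr="10.10.0.0/24", subnet_name="wl-test-subnet")
server = conn.create_server(
    "wl-test-server",
    image="Ubuntu 24.04",   # assumption: an image name available in the cloud
    flavor="SCS-2V-4",      # assumption: an SCS-named flavor
    network=network.id,
    auto_ip=False,          # keep the test workload internal, no floating IP
    wait=True,
)
print(f"{server.name} is {server.status}")

# Dismantle the test workload again.
conn.delete_server(server.id, wait=True)
conn.delete_subnet(subnet.id)
conn.delete_network(network.id)
```
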
2 changes: 1 addition & 1 deletion docusaurus.config.js
@@ -11,7 +11,7 @@ const config = {
baseUrl: '/',
onBrokenLinks: 'warn',
onBrokenMarkdownLinks: 'warn',
favicon: 'img/favicon.ico',
favicon: 'img/favicon.png',
markdown: {
mermaid: true
},