diff --git a/docs/turnkey-solution/hardware-landscape.md b/docs/turnkey-solution/hardware-landscape.md
index 95dbf00537..7e19133ecc 100644
--- a/docs/turnkey-solution/hardware-landscape.md
+++ b/docs/turnkey-solution/hardware-landscape.md
@@ -60,3 +60,59 @@
 The primary point of information and orientation is the [*readme file*](https://
 which is stored at the top level of the [configuration repository](https://github.com/SovereignCloudStack/hardware-landscape).
 The relevant **References** section refers here to the individual documentation areas.
+
+## Specific installation and configuration details
+
+* Processes for access management to the environment (two VPN gateways, SSH logins, SSH profiles, ...) have been implemented
+* The production and lab environments have been set up, automated and documented as described above
+* The complete environment is managed in a [Git repository](https://github.com/SovereignCloudStack/hardware-landscape);
+  adjustments and further developments are handled via Git merge requests
+* Almost all installation steps are [documented and automated](https://github.com/SovereignCloudStack/hardware-landscape/blob/main/documentation/System_Deployment.md),
+  starting from a bare rack installation (the setup is extensively documented, in particular the few remaining manual steps)
+  * The entire customized setup of the nodes is implemented with OSISM/Ansible
+  * All secrets (e.g. passwords) of the environment are stored and versioned in encrypted Ansible Vault files in
+    the repository (when access is transferred, rekeying can be used to change the vault password; see the
+    `ansible-vault` sketch below)
+  * Extensive, in-depth automation has been created that allows the environment, or parts of it, to be
+    rebuilt with a reasonable amount of personnel
+  * The setup of the base environment was implemented with Ansible on top of the OSISM environment (the reference implementation)
+  * Python tooling was created that covers the areas specific to this environment and provides functions that
+    simplify the operation of the infrastructure (see the Redfish sketch below)
+    * Server systems
+      * Backup and restore of the hardware configuration
+      * Templating of the BMC configuration
+      * Automatic installation of the operating system base image via Redfish Virtual Media
+      * Control of the server power state via the command line (to stop and start systems for test, maintenance and energy-saving purposes)
+      * Generation of base profiles for the Ansible inventory based on the hardware key data stored in the documentation
+    * Switches
+      * Backup and restore of the switch configuration
+      * Generation of base profiles for the Ansible inventory based on the hardware key data stored in the documentation
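+
+How such a rekeying looks in practice can be sketched with the standard `ansible-vault` CLI; the vault file
+paths below are illustrative placeholders, not the actual layout of the repository:
+
+```bash
+# Re-encrypt all vault files with a new password after access has been handed over.
+# The old password is read from ~/.vault_pass_old, the new one from ~/.vault_pass_new.
+ansible-vault rekey \
+  --vault-password-file ~/.vault_pass_old \
+  --new-vault-password-file ~/.vault_pass_new \
+  environments/*/secrets.yml
+```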
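+
+The server-related tooling builds on standard Redfish calls; a minimal sketch with `curl` (host name,
+credentials and the resource IDs `1`/`Cd` are placeholders that vary by BMC vendor):
+
+```bash
+BMC=https://bmc-node01.example.com
+AUTH='admin:secret'
+
+# Mount an operating system base image as virtual media for installation
+curl -k -u "$AUTH" -X POST "$BMC/redfish/v1/Managers/1/VirtualMedia/Cd/Actions/VirtualMedia.InsertMedia" \
+  -H 'Content-Type: application/json' \
+  -d '{"Image": "http://imageserver.example.com/node-image.iso"}'
+
+# Stop and start a node, e.g. for maintenance or energy saving
+curl -k -u "$AUTH" -X POST "$BMC/redfish/v1/Systems/1/Actions/ComputerSystem.Reset" \
+  -H 'Content-Type: application/json' -d '{"ResetType": "ForceOff"}'
+curl -k -u "$AUTH" -X POST "$BMC/redfish/v1/Systems/1/Actions/ComputerSystem.Reset" \
+  -H 'Content-Type: application/json' -d '{"ResetType": "On"}'
+```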
+* Network setup
+  * The two management hosts act as redundant VPN gateways, SSH jump hosts, routers and uplink routers
+  * The system is deployed with a layer 3 underlay concept
+  * An "eBGP router on the host" is implemented for node interconnectivity
+    (all nodes and all switches run FRR instances; see the FRR sketch at the end of this section)
+  * The Ceph and OpenStack nodes of the system have no direct upstream routing
+    (outside access is configured and provided via HTTP, NTP and DNS proxies)
+  * For security reasons, the system itself can only be accessed via VPN.
+    The provider network of the production environment is realized with a VXLAN which is terminated on the managers
+    for routing ("a virtual provider network")
+  * The basic node installation builds rack-specific [node images](https://github.com/osism/node-image),
+    which makes operating or reconfiguring network equipment for PXE bootstrapping unnecessary
+    (a preliminary stage for the rollout via OpenStack Ironic)
+  * The management of the hardware (BMC and switch management) is implemented in a dedicated VLAN
+  * Routing, firewalling and NAT are managed by an nftables script which adds rules idempotently to the existing
+    rule set of the manager nodes (see the nftables sketch at the end of this section)
+* The [openstack workload generator](https://github.com/SovereignCloudStack/openstack-workload-generator) is used to put
+  test workloads on the system (see the OpenStack sketch at the end of this section)
+  * Automated creation of OpenStack domains, projects, servers, networks, users, etc.
+  * Launching test workloads
+  * Tearing down test workloads
+* An observability stack was built
+  * Prometheus for metrics
+  * OpenSearch for log aggregation
+  * A central syslog server for the switches on the managers (recorded via the manager nodes in OpenSearch)
+* Specific documentation was created for the project
+  * Details of the hardware installed in the environment
+  * The physical structure of the environment was documented in detail (rack installation and cabling)
+  * The technical and logical structure of the environment was documented in detail
+  * A FAQ for handling the open-source network operating system SONiC was created, covering the topics relevant to the test environment
+  * As part of the development, the documentation and implementation of the OSISM reference implementation was significantly improved (essentially resulting from
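+
+The "eBGP router on the host" pattern can be illustrated with a minimal FRR configuration using BGP unnumbered;
+the AS number, router ID and interface names are placeholders, not the values used in the landscape:
+
+```bash
+# Feed a minimal BGP-unnumbered configuration into FRR via vtysh
+vtysh <<'EOF'
+configure terminal
+router bgp 4210000001
+ bgp router-id 10.10.10.1
+ ! eBGP sessions to the two upstream switches, discovered via IPv6 link-local addresses
+ neighbor eth0 interface remote-as external
+ neighbor eth1 interface remote-as external
+ address-family ipv4 unicast
+  ! announce the host's loopback address into the underlay
+  network 10.10.10.1/32
+ exit-address-family
+EOF
+```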
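+
+The idempotent rule handling on the managers can be sketched as follows; table, chain and rule are illustrative
+examples, not the actual rule set of the script:
+
+```bash
+# Create the table and chain only if they do not exist yet
+nft list table inet managers >/dev/null 2>&1 || nft add table inet managers
+nft list chain inet managers forward >/dev/null 2>&1 || \
+  nft add chain inet managers forward '{ type filter hook forward priority 0; }'
+
+# Add the rule only if it is not already present, so repeated runs do not duplicate it
+nft list chain inet managers forward | grep -q 'iifname "wg0" accept' || \
+  nft add rule inet managers forward iifname wg0 accept
+```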
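+
+What the workload generator automates corresponds roughly to the following manual OpenStack CLI steps
+(all names are examples; `<image>` and `<flavor>` stand for whatever is available in the cloud):
+
+```bash
+# Create an isolated test tenant ...
+openstack domain create test-domain
+openstack project create --domain test-domain test-project
+openstack user create --domain test-domain --project test-project --password secret test-user
+
+# ... launch a test workload in it ...
+openstack network create --project test-project test-net
+openstack server create --image <image> --flavor <flavor> --network test-net test-server
+
+# ... and tear it down again after the test
+openstack server delete test-server
+openstack project delete test-project
+openstack domain set --disable test-domain
+openstack domain delete test-domain
+```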