standardizing IPv4 networking in SCS #522
Quota: The standard quota of floating IPs and routers **SHOULD** be rather small, e.g. 3-5 floating IPs. This ensures a more fair distribution of these resources for all cloud users. If a user wants to use more of these resources, the user **SHOULD** be able to pay for more.

IP Usage Monitoring: SCS CSPs **SHOULD** implement monitoring solutions to track the utilization of IPv4 addresses. This facilitates efficient management of resources and supports capacity planning efforts.
Is "utilization of IPv4 addresses" referring to all IPv4 addresses (incl. private ones in tenant networks) or just floating IPs? If the latter, please clarify this in the sentence.
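To make the question concrete: if the sentence is meant to cover floating IPs only, "utilization" could be read as the share of an external subnet's allocation pools that is currently handed out. The following is only a minimal sketch under that assumption. The pool values and the allocated count are hypothetical, the `start`/`end` keys mirror the `allocation_pools` field of a Neutron subnet, and whether router gateway addresses count towards the total is left to the CSP.

```python
import ipaddress


def pool_size(start: str, end: str) -> int:
    """Number of addresses in an inclusive allocation pool."""
    return int(ipaddress.IPv4Address(end)) - int(ipaddress.IPv4Address(start)) + 1


def floating_ip_utilization(allocation_pools, allocated_count: int) -> float:
    """Fraction of the pooled addresses that are currently allocated."""
    total = sum(pool_size(p["start"], p["end"]) for p in allocation_pools)
    if total == 0:
        raise ValueError("external subnet has no allocation pools")
    return allocated_count / total


# Hypothetical pool of 100 addresses with 25 of them allocated.
pools = [{"start": "203.0.113.10", "end": "203.0.113.109"}]
print(f"{floating_ip_utilization(pools, 25):.0%}")
```

A metric like this deliberately ignores private addresses in tenant networks, which is exactly the distinction the sentence in the draft would need to spell out.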
Usage of Neutron Routers: To manage traffic between internal and external networks Neutron Routers **MUST** be used as the default gateway for VMs requiring access to external networks and the internet, thereby facilitating the routing of traffic and enhancing network security.

CSPs **SHOULD** use OVN or L3agent as High Availability (HA) service deployments.
Standard external networks **MUST NOT** be made accessible as _shared networks_. It is advised that external networks are only reachable by the usage of routing and floating IPs.
I think a glossary at the beginning of the document would be useful. There is a lot of terminology related to networks in Neutron. For example:
Neutron seems to call a specific kind of external network "provider networks" (I believe this is what the paragraph is referring to?). In some other examples, Neutron calls networks "external" although they have `router:external=Internal` set, and calls other networks "public" instead. Then there's the `shared` attribute of networks as well, which also affects their classification depending on its setting.
If we are enforcing things here (MUST / MUST NOT), we need to be very clear about what exactly we are referring to, in my opinion. Depending on the context, "external networks" might be ambiguous. The same might go for "shared" in case it is not referring to the verbatim attribute but to a topology classification.
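One way to remove the ambiguity would be to define the terms purely in terms of the two verbatim attributes. The sketch below is an illustration only: it assumes network records shaped like the Networking API representation, where `router:external` and `shared` are booleans, and the returned labels are hypothetical names, not Neutron terminology.

```python
def classify_network(net: dict) -> str:
    """Label a network using only its `router:external` and `shared` attributes."""
    external = bool(net.get("router:external", False))
    shared = bool(net.get("shared", False))
    if external and shared:
        return "external and shared"
    if external:
        return "external (reachable via routers and floating IPs)"
    if shared:
        return "shared (ports can be attached directly)"
    return "project-internal"


# Hypothetical records for illustration.
print(classify_network({"name": "public", "router:external": True, "shared": False}))
print(classify_network({"name": "tenant-net"}))
```

With a definition like this, a MUST NOT on "shared external networks" would refer unambiguously to the first case.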
The reason why we want to forbid this should be stated here, as this led to confusion in the IaaS call.
#### _Neutron Plugins_

Neutron Plugins: A SCS conform CSP **MAY** use RBAC and VPNaaS plugins.
I'm not sure if we should really include a general section about Neutron plugins (or extensions) here. There are a lot of them and most of them are specific to a topic (for example DNS plugins, see https://github.com/SovereignCloudStack/issues/issues/229#issuecomment-2018465278). It's hard to cover exhaustively, in my opinion. The current sentence might imply to the reader that other plugins/extensions are not allowed for IPv4 networking, which I don't think is the goal. Maybe we should limit this document to giving instructions regarding those directly related to IPv4 networking and leave others open, but I don't know where to draw the line, to be honest.
Does a current list of plugins exist? When I look into the code, I only see the ML2 plugin. Also, the documentation does not mention a plugin list (anymore). It seems to me that most of it was removed or moved to extensions. When looking into the openstack repositories on GitHub, there are some deprecated plugins, a lot of charm repos, some repos that could be plugins or agents, and some other stuff.
From working in secustack I know it is possible to add custom-made plugins.
I looked into my devstack config and it does not specify any plugins except ML2:
[DEFAULT]
service_plugins = ovn-router
rpc_state_report_workers = 0
api_workers = 2
notify_nova_on_port_data_changes = True
notify_nova_on_port_status_changes = True
auth_strategy = keystone
debug = True
core_plugin = ml2
dhcp_agent_notification = False
transport_url = rabbit://stackrabbit:MuchSecretSuchW0W@192.168.23.238:5672/
logging_exception_prefix = ERROR %(name)s ^[[01;35m%(instance)s^[[00m
logging_default_format_string = %(color)s%(levelname)s %(name)s [^[[00;36m-%(color)s] ^[[01;35m%(instance)s%(color)s%(message)s^[[00m
logging_context_format_string = %(color)s%(levelname)s %(name)s [^[[01;36m%(global_request_id)s %(request_id)s ^[[00;36m%(project_name)s %(user_name)s%(color)s] ^[[01;35m%(instance)s%(color>
logging_debug_format_suffix = ^[[00;33m{{(pid=%(process)d) %(funcName)s %(pathname)s:%(lineno)d}}^[[00m
bind_host = 0.0.0.0
use_syslog = False
state_path = /opt/stack/data/neutron
So all extensions seem to be usable all the time, because I was able to test the network RBAC.
My conclusion: We should not state anything about plugins here, as they are poorly documented, not well maintained, or even completely customized. We could discuss letting CSPs add customized plugins. But none of these plugins, even where they touch networking issues, should interfere with the scope of this standard. So exclude this and add a new issue; maybe we will find out enough for a new standard, or maybe we will just have a short note about plugins overall on the docs page.
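For anyone who wants to repeat this check on their own deployment, the two relevant options can be read from the configuration file. This is a rough sketch that uses Python's standard `configparser` as a stand-in for oslo.config, on an abbreviated copy of the config shown above. The function name `configured_plugins` is hypothetical, and interpolation is disabled because the logging format strings in a real file contain `%(...)s` placeholders.

```python
import configparser

# Abbreviated copy of the devstack neutron.conf quoted above.
NEUTRON_CONF = """
[DEFAULT]
service_plugins = ovn-router
core_plugin = ml2
"""


def configured_plugins(conf_text: str):
    """Return (core_plugin, [service_plugins]) from the [DEFAULT] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    defaults = parser.defaults()
    core = defaults.get("core_plugin", "")
    raw_services = defaults.get("service_plugins", "")
    services = [s.strip() for s in raw_services.split(",") if s.strip()]
    return core, services


print(configured_plugins(NEUTRON_CONF))
```

This only lists what is configured. It says nothing about which API extensions those plugins expose.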
Some background for this as well: Neutron used to include a number of monolithic core plugins, but they have all been converted to drivers for the ML2 core plugin, as well as service plugins that build on top of that.
The available service plugins are defined in Neutron's `setup.cfg`, and CSPs can also configure additional external plugins there. Each plugin implementation has to declare the API extensions that it provides, most of which are defined in neutron-lib. The most prominent ones are `router` and `ovn-router`, which implement virtual routers and floating IPs for L3agent and OVN, respectively.
A number of API extensions are also implemented by the ML2 core plugin itself, such as subnetpools, security groups, and the different RBAC extensions. As such, there is no separate plugin for RBAC.
There used to be a VPNaaS plugin in Neutron, but it has been removed at some point, though the definition for the API extension still exists in neutron-lib.
Security Group Policies: Standardized security group policies **SHOULD** be applied to all instances utilizing public IPv4 addresses. These policies must define and enforce access
controls to ensure the security of the cloud environment.
Security Groups **SHOULD** be enabled by default but **MUST** be capable of being switched off.
Suggested change:
- Security Groups **SHOULD** be enabled by default but **MUST** be capable of being switched off.
+ Security Groups **SHOULD** be enabled by default but **MUST** be capable of being switched off by allowing port security to be disabled.
Security Groups don't have an "off switch" per se; they are implicitly disabled once port security is disabled for a port or for the whole network the port is created in. Is this what you are referring to?
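The relationship described here can be modelled roughly as follows. This is a simplified illustration, not Neutron's implementation: it assumes port and network records carrying the `port_security_enabled` and `security_groups` attributes, and the helper name `effective_security_groups` is hypothetical.

```python
def effective_security_groups(port: dict, network: dict) -> list:
    """Security groups that actually filter traffic on a port (simplified model)."""
    # A port-level setting wins; otherwise the network default applies.
    enabled = port.get(
        "port_security_enabled", network.get("port_security_enabled", True)
    )
    if not enabled:
        # With port security off, no security group is applied at all.
        return []
    return list(port.get("security_groups", []))


net = {"port_security_enabled": True}
print(effective_security_groups({"security_groups": ["default"]}, net))
print(effective_security_groups({"port_security_enabled": False}, net))
```

In this model there is no separate switch for security groups, which is why the suggested wording ties the requirement to port security instead.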
@markus-hentsch is correct: Security Groups are part of the Neutron extensions and thus cannot be switched off: https://github.com/openstack/neutron/blob/master/neutron/extensions/securitygroup.py
Security Groups are always there and are used for VMs. Even if a user does not specify anything, the default security group is used.
Nevertheless, this topic does not interfere with the scope of this standard and should be included. But as we may want to include an architecture definition here, it would be good to reference all the work we do for security groups.
And I wonder, what do you mean by "security group policies"? Do you mean the default rules for security groups? In that case you can just link the DR from me, and maybe later on the guide and the standard:
DR: https://github.com/SovereignCloudStack/standards/blob/main/Standards/scs-0113-v1-security-groups-decision-record.md
The guide and the standard for the rules are still in progress.
I'll add more commentary later, thanks for providing the first draft of this!
## Design Considerations

### Options considered
I think it would also be worthwhile to add some other options, if any were considered, and why they were not chosen. It's also possible to link a decision record document once the breakout session around this document has taken place.
It also wouldn't hurt to mention explicitly if no other options were considered, and why.
As discussed in the IaaS call, we should start with the architecture here.
To me, every other option considered depends on the decision for or against an architecture.
This document outlines the standardized approach for the management and allocation of
public IPv4 addresses within Sovereign Cloud Stack (SCS) environments. Its aim is to
Suggested change:
- This document outlines the standardized approach for the management and allocation of
- public IPv4 addresses within Sovereign Cloud Stack (SCS) environments. Its aim is to
+ This document outlines the standardized approach for the management and allocation of
+ public IPv4 addresses and security groups within Sovereign Cloud Stack (SCS) environments. Its aim is to
I'm not sure how to fix this; there are multiple problems and options here:
- The introduction only talks about public IPv4 addresses, but later on there are specifics about floating IPs and other IPs, suggesting that the "other" IPs are non-public? This could simply be clarified, if only public IPs are in scope of the standard.
- The introduction makes no mention of security groups, Neutron routers, and Neutron plugins. These could either be mentioned explicitly here as well or be declared out of scope for this standard.
- There are already drafts on how to formulate security groups, default security groups, etc. I feel there is a large overlap here, and I think it would be good to focus the effort around security groups in a single document, and not litter many documents with possibly the same content, which will quickly get out of sync. If we must (which makes sense from a security point of view), we can link to a central security groups document and add context where needed. I guess @josephineSei has some opinions on this topic as well.
I agree with @artificial-intelligence on these points.
In my opinion we need a strict focus on what we want to standardize in this document. I still miss this focus here. And I would rather go for smaller standards (e.g. the standard for default security group rules will link the DR and the guide for Security Groups) so that we don't mix up too many topics.
I would see the focus on the architecture first here, i.e. describing the default networking structure and listing all required Neutron plugins.
- Mentioning other resources is not problematic, but as @artificial-intelligence said, there should be a clear line and (maybe later on) links to documents/standards/guides describing these (e.g. security groups). This would also help to keep the focus.
The motivation behind establishing this standard is to enhance interoperability, improve
security measures, and streamline the operational processes across different SCS clouds.
This is good!
What I miss, though, is some analysis (it probably belongs in the `## Design Considerations` section) of what the current problems in these areas are in real-world deployments, so that we have a logical train of thought from "exact problem we are facing -> solution".
E.g. in which ways are current security measures not good enough, where are the gaps?
Thinking about it, this could probably also be moved to a decision record, not sure though.
Usage of Neutron Routers: To manage traffic between internal and external networks Neutron Routers **MUST** be used as the default gateway for VMs requiring access to external networks and the internet, thereby facilitating the routing of traffic and enhancing network security.

CSPs **SHOULD** use OVN or L3agent as High Availability (HA) service deployments.
I miss some reasoning why exactly I need to use OVN or L3 agent. Also note that, due to some intricacies, L3 agent is currently not really HA in some edge cases, like upgrades/reboots, depending on your exact setup. This technical discussion is probably out of scope for this document, though.
To be clear, I'm totally for using OVN, but we should write down why we encourage its use, what the advantages are, etc.
Maybe it would help to list other options and why we would prefer OVN.
Just to add some background on this point: this is about high availability of virtual routers, i.e. replicating virtual routers across multiple network nodes and then allowing failover via VRRP in case of a node failure. This feature is supported by both virtual router implementations included in Neutron (OVN and L3agent), but may be mutually exclusive with distributed virtual routing (DVR, the implementation of virtual routers on the compute nodes).
So this is not really an endorsement of OVN or L3agent; it is just that those are the two available implementations. There might be proprietary service plugins to replace them, but every driver that is not OVN seems to just use L3agent.
This feature is also invisible to tenants, and I'm not sure if it should be part of this specific standard. We should probably have a different discussion about where and how to mandate HA features, maybe in the context of #527.
There are many things discussed in this standard. As it will hopefully lead to an architecture we standardize, mentioning all networking things is appropriate. But we should keep a clear focus and delegate anything that has nothing to do with the architecture and the workflow of getting a floating IPv4 address (or that can be a simple separate topic) to other issues.
#### _Neutron Routers_

Usage of Neutron Routers: To manage traffic between internal and external networks Neutron Routers **MUST** be used as the default gateway for VMs requiring access to external networks and the internet, thereby facilitating the routing of traffic and enhancing network security.
We may add a line stating that routing between internal networks of the same project SHOULD be done by Neutron routers, so that we officially recommend a way but do not forbid other options.
public IPv4 addresses within Sovereign Cloud Stack (SCS) environments. Its aim is to
ensure a consistent, secure, and efficient methodology for IP address provisioning
across all SCS cloud services.
As mentioned by @markus-hentsch a glossary should be added here. Something like:
Term | Meaning |
---|---|
external network | Neutron Network with the external flag that is bound to an outgoing provider network |
internal network | Neutron Network that is created by a customer's project |
OVN | ..... |
router | ..... |
floating IP | ..... |
.... | ..... |
#### _Quota & Monitoring_

Quota: The standard quota of floating IPs and routers **SHOULD** be rather small, e.g. 3-5 floating IPs. This ensures a more fair distribution of these resources for all cloud users. If a user wants to use more of these resources, the user **SHOULD** be able to pay for more.
Yes, the quota is important and belongs in this standard. We can still argue about the number :)
#### _External Network Naming_

All SCS clouds **SHOULD** adopt the naming convention
scs-external-net for external networks. This standardization facilitates easier identification and management of external network resources.
We have to clarify why we would need a naming convention at all. Renaming networks without good reasons is not really helpful.
Pre-statement: external networks can be listed with `openstack network list --external`.
- How many CSPs have more than one external network (for IPv4)?
- How many CSPs have multiple different subnets for one external network, i.e. more than one floating IP pool?
- Are there external networks that are only for one specific customer (I have seen something like this)?
Pro:
- distinguish between IPv4 and IPv6 external networks
Con:
- how to deal with multiple IPv4 external networks?
- use other options like tags:
stack@devstack:~/devstack$ openstack network show public
+---------------------------+----------------------------------------------------------------------------+
| Field | Value |
+---------------------------+----------------------------------------------------------------------------+
| admin_state_up | UP |
| availability_zone_hints | |
| availability_zones | |
| created_at | 2024-01-24T16:12:31Z |
| description | |
| dns_domain | None |
| id | 73edb86b-d7ab-4db3-82b7-25fa8b012e40 |
| ipv4_address_scope | None |
| ipv6_address_scope | None |
| is_default | True |
| is_vlan_transparent | None |
| mtu | 1500 |
| name | public |
| port_security_enabled | True |
| project_id | 15f2ab0eaa5b4372b759bde609e86224 |
| provider:network_type | flat |
| provider:physical_network | public |
| provider:segmentation_id | None |
| qos_policy_id | None |
| revision_number | 4 |
| router:external | External |
| segments | None |
| shared | False |
| status | ACTIVE |
| subnets | 3e0206bc-53c8-44ca-a0f1-2c2548bba766, 84dffd43-6d7f-4c2f-9180-8f0f0b83c9d4 |
| tags | IPv4 |
| tenant_id | 15f2ab0eaa5b4372b759bde609e86224 |
| updated_at | 2024-03-28T09:39:03Z |
+---------------------------+----------------------------------------------------------------------------+
stack@devstack:~/devstack$ openstack network list --external --long --tag IPv4
+----------------------------+--------+--------+----------------------------+-------+--------+----------------------------+--------------+-------------+--------------------+------+
| ID | Name | Status | Project | State | Shared | Subnets | Network Type | Router Type | Availability Zones | Tags |
+----------------------------+--------+--------+----------------------------+-------+--------+----------------------------+--------------+-------------+--------------------+------+
| 73edb86b-d7ab-4db3-82b7- | public | ACTIVE | 15f2ab0eaa5b4372b759bde609 | UP | False | 3e0206bc-53c8-44ca-a0f1- | flat | External | | IPv4 |
| 25fa8b012e40 | | | e86224 | | | 2c2548bba766, 84dffd43- | | | | |
| | | | | | | 6d7f-4c2f-9180- | | | | |
| | | | | | | 8f0f0b83c9d4 | | | | |
+----------------------------+--------+--------+----------------------------+-------+--------+----------------------------+--------------+-------------+--------------------+------+
I am pretty much for investigating those tags! Help from CSPs is wanted (we don't want to accidentally render a network not working anymore :D )
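To show what tag-based discovery could look like on the client side, here is a small sketch that filters network records the same way the `--external --tag IPv4` listing above does. It is only an illustration: the records are plain dictionaries, the `public` entry mirrors the devstack network shown above, and the other two entries as well as the function name are hypothetical.

```python
def external_networks_with_tag(networks, tag: str):
    """Names of external networks that carry the given tag."""
    return [
        n["name"]
        for n in networks
        if n.get("router:external") and tag in n.get("tags", [])
    ]


networks = [
    {"name": "public", "router:external": True, "tags": ["IPv4"]},
    {"name": "public-v6", "router:external": True, "tags": ["IPv6"]},  # hypothetical
    {"name": "tenant-net", "router:external": False, "tags": []},  # hypothetical
]
print(external_networks_with_tag(networks, "IPv4"))
```

Unlike a fixed name, a tag can be carried by several external networks at once, which would address the "multiple IPv4 external networks" concern listed under Con.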
#### _Floating IPs_

Floating IPs for Dynamic Allocation: Utilization of Floating IPs to allow dynamic reassignment of public IPv4 addresses to different instances (VMs or Loadbalancers), facilitating high availability and fault tolerance.
Floating IPs **MUST** be enabled.
Again the question: Can they be disabled? Or just not set in pools?
- Naming Convention Flexibility: How rigid should the naming convention for external
networks be across various SCS clouds?
- Load Balancing: Do we want to dictate a Load Balancer or a set of Load Balancers or nothing at all? E.g. Octavia, Yawol
We should keep a focus and discuss this important question in another issue.
(EDIT: comment moved here: SovereignCloudStack/issues#167 (comment))
Current status of this topic & some open questions:
Key suggestions:
Open Questions:
Todos:
I'm trying to restructure the standard a bit and add some background and justification to the proposals. EDIT: I moved the draft to its own PR: #572
For the record, that topic was discussed recently in an SCS IaaS meeting, initiated by kgube.
This is the initial draft document to standardize IPv4 networking in the context of SCS.