docs: add some missing info to aggregated cluster docs (#37845)
Signed-off-by: Rohit Agrawal <rohit.agrawal@databricks.com>
agrawroh authored Jan 3, 2025
1 parent f7a8635 commit b0d58be
Showing 1 changed file with 75 additions and 25 deletions.
100 changes: 75 additions & 25 deletions docs/root/intro/arch_overview/upstream/aggregate_cluster.rst
@@ -3,25 +3,40 @@
Aggregate Cluster
=================

Aggregate cluster is used for failover between clusters with different configuration, e.g., from EDS
upstream cluster to STRICT_DNS upstream cluster, from cluster using ROUND_ROBIN load balancing
policy to cluster using MAGLEV, from cluster with 0.1s connection timeout to cluster with 1s
connection timeout, etc. Aggregate cluster loosely couples multiple clusters by referencing their
name in the :ref:`configuration <envoy_v3_api_msg_extensions.clusters.aggregate.v3.ClusterConfig>`. The
fallback priority is defined implicitly by the ordering in the :ref:`clusters list <envoy_v3_api_field_extensions.clusters.aggregate.v3.ClusterConfig.clusters>`.
Aggregate cluster uses tiered load balancing. The load balancer chooses cluster and priority first
and then delegates the load balancing to the load balancer of the selected cluster. The top level
load balancer reuses the existing load balancing algorithm by linearizing the priority set of
multiple clusters into one.
An aggregate cluster allows you to set up failover between multiple upstream clusters that have different
configurations. For example, you might switch from an :ref:`EDS <arch_overview_service_discovery_types_eds>` cluster to
a :ref:`STRICT_DNS <arch_overview_service_discovery_types_strict_dns>` cluster, or from a cluster using
:ref:`ROUND_ROBIN <arch_overview_load_balancing_types_round_robin>` load balancing to one using
:ref:`MAGLEV <arch_overview_load_balancing_types_maglev>`. You can also use it to change timeouts, such as moving from
a ``0.1s`` connection timeout to a ``1s`` timeout.

To enable this failover, the aggregate cluster references other clusters by their names in the
:ref:`configuration <envoy_v3_api_msg_extensions.clusters.aggregate.v3.ClusterConfig>`. The ordering of these clusters
in the :ref:`clusters list <envoy_v3_api_field_extensions.clusters.aggregate.v3.ClusterConfig.clusters>` implicitly
defines the fallback priority.

The aggregate cluster uses a tiered approach to load balancing:

* At the top level, it decides which cluster and priority to use.
* It then hands off the actual load balancing to the selected cluster’s own load balancer.

Internally, this top-level load balancer treats all the priorities across all referenced clusters as a single linear
list. By doing so, it reuses the existing load balancing algorithm and makes it possible to seamlessly shift traffic
between clusters as needed.

Linearize Priority Set
----------------------

Upstream hosts are divided into multiple :ref:`priority levels <arch_overview_load_balancing_priority_levels>`
and each priority level contains a list of healthy, degraded and unhealthy hosts. Linearization is
used to simplify the host selection during load balancing by merging priority levels from multiple
clusters. For example, primary cluster has 3 priority levels, secondary has 2 and tertiary has 2 and
the failover ordering is primary, secondary, tertiary.
Upstream hosts are grouped into different :ref:`priority levels <arch_overview_load_balancing_priority_levels>`, and
each level includes hosts that can be healthy, degraded, or unhealthy. To simplify host selection during load balancing,
linearization merges these priority levels across multiple clusters into a single sequence.

For example, if the primary cluster has three priority levels, and the secondary and tertiary clusters each have two,
the failover order is:

* Primary
* Secondary
* Tertiary

+-----------+----------------+-------------------------------------+
| Cluster | Priority Level | Priority Level after Linearization |
@@ -41,6 +56,9 @@ the failover ordering is primary, secondary, tertiary.
| Tertiary | 1 | 6 |
+-----------+----------------+-------------------------------------+

This approach ensures a straightforward way to decide which hosts receive traffic based on priority, even when working
with multiple clusters.
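
As a rough illustration, the linearization described above can be sketched in Python. The function name ``linearize`` and the ``(name, levels)`` input shape are assumptions made for this sketch and are not part of Envoy's API.

```python
def linearize(clusters):
    """Flatten per-cluster priority levels into one ordered list.

    `clusters` is an ordered list of (cluster_name, number_of_priority_levels)
    tuples, given in failover order. The index of each entry in the returned
    list is the priority level after linearization.
    """
    linearized = []
    for name, levels in clusters:
        for level in range(levels):
            linearized.append((name, level))
    return linearized


# Hypothetical input matching the example: 3, 2 and 2 priority levels.
order = linearize([("primary", 3), ("secondary", 2), ("tertiary", 2)])
for linear_priority, (name, level) in enumerate(order):
    print(f"{name} priority {level} -> linearized priority {linear_priority}")
```

With this input, priority 1 of the tertiary cluster ends up at linearized priority 6, which agrees with the last row of the table.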

Example
-------

@@ -61,18 +79,48 @@ A sample aggregate cluster configuration could be:
- secondary
- tertiary
Note: :ref:`PriorityLoad retry plugins <envoy_v3_api_field_config.route.v3.RetryPolicy.retry_priority>` won't
work for aggregate cluster because the aggregate load balancer will override the *PriorityLoad*
during load balancing.
Important Considerations for Aggregate Clusters
-----------------------------------------------

Some features might not work as expected with aggregate clusters. Two known cases are described below.

PriorityLoad Retry Plugins
^^^^^^^^^^^^^^^^^^^^^^^^^^

:ref:`PriorityLoad retry plugins <envoy_v3_api_field_config.route.v3.RetryPolicy.retry_priority>` will not work with an
aggregate cluster. Because the aggregate cluster’s load balancer controls traffic distribution at a higher level, it
effectively overrides the PriorityLoad behavior during load balancing.

Stateful Sessions
^^^^^^^^^^^^^^^^^

:ref:`Stateful Sessions <envoy_v3_api_msg_extensions.filters.http.stateful_session.v3.StatefulSession>` rely on the
cluster to directly know the endpoint receiving traffic. With an aggregate cluster, the top-level load balancer selects
a cluster first, but does not track specific endpoints inside that cluster.

If we configure Stateful Sessions to override the upstream address, the load balancer bypasses its usual algorithm to
send traffic directly to that host. This works only when the cluster itself knows the exact endpoint.

In an aggregate cluster, the final routing decision happens one layer beneath the aggregate load balancer, so the filter
cannot locate that specific endpoint at the aggregate level. As a result, Stateful Sessions are incompatible with
aggregate clusters: the top-level load balancer chooses a cluster without knowledge of the specific endpoint, which
exists only inside the referenced cluster.

Load Balancing Example
----------------------

Aggregate cluster uses tiered load balancing algorithm and the top tier is distributing traffic to
different clusters according to the health score across all :ref:`priorities <arch_overview_load_balancing_priority_levels>`
in each cluster. The aggregate cluster in this section includes two clusters which is different from
what the above configuration describes.
The aggregate cluster uses a tiered load balancing algorithm, and the top tier distributes traffic to different clusters
according to the health score across all :ref:`priorities <arch_overview_load_balancing_priority_levels>` in each
cluster. The aggregate cluster in this section includes two clusters, which differs from the configuration described
above.

The aggregate cluster uses a tiered load balancing algorithm with two main steps:

* **Top Tier:** Distribute traffic across different clusters based on each cluster’s overall health (across all
:ref:`priorities <arch_overview_load_balancing_priority_levels>`).
* **Second Tier:** Once a cluster is chosen, delegate traffic distribution within that cluster to its own load balancer
(e.g., :ref:`ROUND_ROBIN <arch_overview_load_balancing_types_round_robin>`,
:ref:`MAGLEV <arch_overview_load_balancing_types_maglev>`, etc.).
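
The top-tier step can be pictured with a small Python sketch. It assumes the per-priority loads (percentages over the linearized priorities) have already been computed, and it only returns the chosen cluster name; the second tier would then be handled by that cluster's own load balancer. The names ``choose_cluster``, ``priority_loads`` and ``priority_to_cluster`` are hypothetical and do not mirror Envoy's internal code.

```python
import random


def choose_cluster(priority_loads, priority_to_cluster, roll=None):
    """Pick a cluster in proportion to the load of its linearized priorities.

    priority_loads: load percentage per linearized priority, summing to 100.
    priority_to_cluster: cluster name for each linearized priority.
    roll: integer in [0, 100); drawn at random when not supplied.
    """
    if roll is None:
        roll = random.randrange(100)
    cumulative = 0
    for priority, load in enumerate(priority_loads):
        cumulative += load
        if roll < cumulative:
            return priority_to_cluster[priority]
    # Only reached if the loads sum to less than 100 (not expected here).
    return priority_to_cluster[-1]


# Illustrative input: three primary priorities followed by two secondary ones.
loads = [28, 28, 14, 30, 0]
owners = ["primary", "primary", "primary", "secondary", "secondary"]
print(choose_cluster(loads, owners))
```

Over many requests, roughly 70% of the picks in this illustrative input land on ``primary`` and 30% on ``secondary``.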

+-----------------------------------------------------------------------------------------------------------------------+--------------------+----------------------+
| Cluster | Traffic to Primary | Traffic to Secondary |
@@ -100,9 +148,11 @@ what the above configuration describes.
| 0% | 0% | 0% | 72% | 0% | 0% | 100% |
+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--------------------+----------------------+

Note: The above load balancing uses default :ref:`overprovisioning factor <arch_overview_load_balancing_overprovisioning_factor>`
which is 1.4 which means if 80% of the endpoints in a priority level are healthy, that level is
still considered fully healthy because 80 * 1.4 > 100.
.. note::
  By default, the :ref:`overprovisioning factor <arch_overview_load_balancing_overprovisioning_factor>` is ``1.4``.
  The health of a priority level is its percentage of healthy endpoints multiplied by this factor, capped at ``100%``.
  For instance, if ``80%`` of the endpoints in a priority level are healthy, multiplying by ``1.4`` gives ``112%``,
  so that level is still considered fully healthy.
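
The arithmetic behind the table can be approximated with the following Python sketch. It is a simplified model under stated assumptions: it considers healthy hosts only (degraded hosts are ignored), uses integer percentages, and the helper names ``effective_health`` and ``priority_loads`` are invented for illustration rather than taken from Envoy.

```python
OVERPROVISIONING_FACTOR_PERCENT = 140  # default factor of 1.4, as a percentage


def effective_health(healthy_percent):
    """Scale a level's healthy percentage by the factor and cap it at 100."""
    return min(100, healthy_percent * OVERPROVISIONING_FACTOR_PERCENT // 100)


def priority_loads(healthy_percents):
    """Return the load percentage for each linearized priority level."""
    healths = [effective_health(p) for p in healthy_percents]
    total = min(100, sum(healths))
    if total == 0:
        # Assumption for this sketch: with no healthy hosts anywhere,
        # all traffic is sent to the first priority level.
        return [100] + [0] * (len(healths) - 1)
    remaining = 100
    loads = []
    for health in healths:
        # Earlier priorities take load first; later ones get what is left.
        load = min(remaining, health * 100 // total)
        loads.append(load)
        remaining -= load
    if remaining > 0:
        # Give any rounding remainder to the first level with healthy hosts.
        first = next(i for i, h in enumerate(healths) if h > 0)
        loads[first] += remaining
    return loads


# Primary has priorities at 20%, 20%, 10% healthy; secondary at 25%, 25%.
loads = priority_loads([20, 20, 10, 25, 25])
print(loads)                           # [28, 28, 14, 30, 0]
print(sum(loads[:3]), sum(loads[3:]))  # 70 30
```

In this sketch the primary cluster receives 70% of the traffic and the secondary cluster 30%, because the earlier priorities absorb load first and later priorities only receive what remains.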

The example shows how the aggregate cluster level load balancer selects the cluster. E.g., healths
of {{20, 20, 10}, {25, 25}} would result in a priority load of {{28%, 28%, 14%}, {30%, 0%}} of