docs: add some missing info to aggregated cluster docs (#37845)

Signed-off-by: Rohit Agrawal <rohit.agrawal@databricks.com>
envoyproxy · Jan 3, 2025 · b0d58be
1 parent f7a8635
Showing 1 changed file with 75 additions and 25 deletions.
diff --git a/docs/root/intro/arch_overview/upstream/aggregate_cluster.rst b/docs/root/intro/arch_overview/upstream/aggregate_cluster.rst
@@ -3,25 +3,40 @@
 Aggregate Cluster
 =================
 
-Aggregate cluster is used for failover between clusters with different configuration, e.g., from EDS
-upstream cluster to STRICT_DNS upstream cluster, from cluster using ROUND_ROBIN load balancing
-policy to cluster using MAGLEV, from cluster with 0.1s connection timeout to cluster with 1s
-connection timeout, etc. Aggregate cluster loosely couples multiple clusters by referencing their
-name in the :ref:`configuration <envoy_v3_api_msg_extensions.clusters.aggregate.v3.ClusterConfig>`. The
-fallback priority is defined implicitly by the ordering in the :ref:`clusters list <envoy_v3_api_field_extensions.clusters.aggregate.v3.ClusterConfig.clusters>`.
-Aggregate cluster uses tiered load balancing. The load balancer chooses cluster and priority first
-and then delegates the load balancing to the load balancer of the selected cluster. The top level
-load balancer reuses the existing load balancing algorithm by linearizing the priority set of
-multiple clusters into one.
+An aggregate cluster allows you to set up failover between multiple upstream clusters that have different
+configurations. For example, you might switch from an :ref:`EDS <arch_overview_service_discovery_types_eds>` cluster to
+a :ref:`STRICT_DNS <arch_overview_service_discovery_types_strict_dns>` cluster, or from a cluster using
+:ref:`ROUND_ROBIN <arch_overview_load_balancing_types_round_robin>` load balancing to one using
+:ref:`MAGLEV <arch_overview_load_balancing_types_maglev>`. Failover can also move traffic from a cluster with a ``0.1s``
+connection timeout to one with a ``1s`` connection timeout.
+
+To enable this failover, the aggregate cluster references other clusters by their names in the
+:ref:`configuration <envoy_v3_api_msg_extensions.clusters.aggregate.v3.ClusterConfig>`. The ordering of these clusters
+in the :ref:`clusters list <envoy_v3_api_field_extensions.clusters.aggregate.v3.ClusterConfig.clusters>` implicitly
+defines the fallback priority.
+
+The aggregate cluster uses a tiered approach to load balancing:
+
+* At the top level, it decides which cluster and priority to use.
+* It then hands off the actual load balancing to the selected cluster’s own load balancer.
+
+Internally, this top-level load balancer treats all the priorities across all referenced clusters as a single linear
+list. By doing so, it reuses the existing load balancing algorithm and makes it possible to seamlessly shift traffic
+between clusters as needed.
 
 Linearize Priority Set
 ----------------------
 
-Upstream hosts are divided into multiple :ref:`priority levels <arch_overview_load_balancing_priority_levels>`
-and each priority level contains a list of healthy, degraded and unhealthy hosts. Linearization is
-used to simplify the host selection during load balancing by merging priority levels from multiple
-clusters. For example, primary cluster has 3 priority levels, secondary has 2 and tertiary has 2 and
-the failover ordering is primary, secondary, tertiary.
+Upstream hosts are grouped into different :ref:`priority levels <arch_overview_load_balancing_priority_levels>`, and
+each level includes hosts that can be healthy, degraded, or unhealthy. To simplify host selection during load balancing,
+linearization merges these priority levels across multiple clusters into a single sequence.
+
+For example, suppose the primary cluster has three priority levels, the secondary and tertiary clusters each have two,
+and the failover order is as follows (the table below shows the resulting linearized priorities):
+
+* Primary
+* Secondary
+* Tertiary
 
 +-----------+----------------+-------------------------------------+
 | Cluster   | Priority Level |  Priority Level after Linearization |
@@ -41,6 +56,9 @@ the failover ordering is primary, secondary, tertiary.
 | Tertiary  | 1              |  6                                  |
 +-----------+----------------+-------------------------------------+
 
+With a single linearized sequence, the top-level load balancer can apply the existing priority-based selection across
+all referenced clusters as if they were one cluster.
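
The linearization in the table above can be sketched in a few lines of Python. This is only an illustration of the mapping, not Envoy code; the ``linearize`` helper and its input format are hypothetical.

```python
# Hypothetical helper (not part of Envoy): flatten per-cluster priority levels
# into one ordered list that follows the failover order of the clusters list.
def linearize(clusters):
    """clusters: list of (name, number_of_priority_levels) in failover order."""
    linear = []
    for name, levels in clusters:
        for priority in range(levels):
            linear.append((name, priority))
    return linear


# Primary has three priority levels; secondary and tertiary have two each.
table = linearize([("primary", 3), ("secondary", 2), ("tertiary", 2)])

# The index in the flattened list is the priority level after linearization.
for linear_priority, (name, priority) in enumerate(table):
    print(f"{name:<10} {priority} -> {linear_priority}")
```

The position of each ``(cluster, priority)`` pair in the flattened list corresponds to the "Priority Level after Linearization" column.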
+
 Example
 -------
 
@@ -61,18 +79,48 @@ A sample aggregate cluster configuration could be:
       - secondary
       - tertiary
 
-Note: :ref:`PriorityLoad retry plugins <envoy_v3_api_field_config.route.v3.RetryPolicy.retry_priority>` won't
-work for aggregate cluster because the aggregate load balancer will override the *PriorityLoad*
-during load balancing.
+Important Considerations for Aggregate Clusters
+-----------------------------------------------
+
+Some features might not work as expected with aggregate clusters. Two known cases are described below.
 
+PriorityLoad Retry Plugins
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+:ref:`PriorityLoad retry plugins <envoy_v3_api_field_config.route.v3.RetryPolicy.retry_priority>` will not work with an
+aggregate cluster. Because the aggregate cluster’s load balancer controls traffic distribution at a higher level, it
+effectively overrides the PriorityLoad behavior during load balancing.
+
+Stateful Sessions
+^^^^^^^^^^^^^^^^^
+
+:ref:`Stateful Sessions <envoy_v3_api_msg_extensions.filters.http.stateful_session.v3.StatefulSession>` rely on the
+cluster to directly know the endpoint receiving traffic. With an aggregate cluster, the top-level load balancer selects
+a cluster first, but does not track specific endpoints inside that cluster.
+
+If Stateful Sessions is configured to override the upstream address, the load balancer bypasses its usual algorithm and
+sends traffic directly to that host. This works only when the cluster itself knows the exact endpoint.
+
+In an aggregate cluster, the final routing decision happens one layer beneath the aggregate load balancer, so the filter
+cannot locate that specific endpoint at the aggregate level. As a result, Stateful Sessions are incompatible with
+aggregate clusters: the top-level load balancer chooses a cluster without any knowledge of individual endpoints, which
+exist only within the referenced clusters.
 
 Load Balancing Example
 ----------------------
 
-Aggregate cluster uses tiered load balancing algorithm and the top tier is distributing traffic to
-different clusters according to the health score across all :ref:`priorities <arch_overview_load_balancing_priority_levels>`
-in each cluster. The aggregate cluster in this section includes two clusters which is different from
-what the above configuration describes.
+The aggregate cluster in this section includes only two clusters, primary and secondary, which differs from the
+three-cluster configuration shown above.
+
+The aggregate cluster uses a tiered load balancing algorithm with two main steps:
+
+* **Top Tier:** Distribute traffic across different clusters based on each cluster’s overall health (across all
+  :ref:`priorities <arch_overview_load_balancing_priority_levels>`).
+* **Second Tier:** Once a cluster is chosen, delegate traffic distribution within that cluster to its own load balancer
+  (e.g., :ref:`ROUND_ROBIN <arch_overview_load_balancing_types_round_robin>` or
+  :ref:`MAGLEV <arch_overview_load_balancing_types_maglev>`).
 
 +-----------------------------------------------------------------------------------------------------------------------+--------------------+----------------------+
 | Cluster                                                                                                               | Traffic to Primary | Traffic to Secondary |
@@ -100,9 +148,11 @@ what the above configuration describes.
 | 0%                    | 0%                    | 0%                    | 72%                   | 0%                    | 0%                 | 100%                 |
 +-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--------------------+----------------------+
 
-Note: The above load balancing uses default :ref:`overprovisioning factor <arch_overview_load_balancing_overprovisioning_factor>`
-which is 1.4 which means if 80% of the endpoints in a priority level are healthy, that level is
-still considered fully healthy because 80 * 1.4 > 100.
+.. note::
+   By default, the :ref:`overprovisioning factor <arch_overview_load_balancing_overprovisioning_factor>` is ``1.4``.
+   A priority level's health percentage is multiplied by this factor and capped at ``100%``. For instance, if ``80%``
+   of the endpoints in a priority level are healthy, the product is ``112%``, which is capped at ``100%``, so that level
+   is still considered fully healthy.
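
As a rough illustration of the note above, the following Python sketch applies the default factor to per-priority health percentages and then assigns load in linearized order. It is a simplified model rather than Envoy's implementation: the function names are hypothetical, and degraded hosts, panic mode, and rounding remainders are ignored.

```python
# Default overprovisioning factor of 1.4, written as an integer percentage.
OVERPROVISIONING_FACTOR_PCT = 140


def effective_health(healthy_pct):
    """Scale a health percentage by the factor and cap the result at 100."""
    return min(100, healthy_pct * OVERPROVISIONING_FACTOR_PCT // 100)


def priority_load(healthy_pcts):
    """Assign load to linearized priorities, highest priority first.

    Simplified sketch (assumption): each level takes load up to its effective
    health, normalized by the capped total, until 100% has been assigned.
    """
    health = [effective_health(h) for h in healthy_pcts]
    total = min(100, sum(health))
    if total == 0:
        return [0] * len(health)  # everything unhealthy; panic handling is out of scope
    loads = []
    remaining = 100
    for h in health:
        load = min(remaining, h * 100 // total)
        loads.append(load)
        remaining -= load
    return loads


# Primary has priorities 0-2 and secondary has priorities 0-1, in linearized order.
loads = priority_load([20, 20, 10, 25, 25])
primary, secondary = sum(loads[:3]), sum(loads[3:])
print(loads, primary, secondary)
```

Under these assumptions, health values of 20, 20, 10 for primary and 25, 25 for secondary yield loads of 28, 28, 14, 30, and 0, so 70% of traffic goes to primary and 30% to secondary.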
 
 The example shows how the aggregate cluster level load balancer selects the cluster. E.g., healths
 of {{20, 20, 10}, {25, 25}} would result in a priority load of {{28%, 28%, 14%}, {30%, 0%}} of