
Traffic

Charlie edited this page Nov 9, 2023 · 2 revisions

Notes

Amazon Builder's Library - Using Load Shedding to Avoid Overload

  • Seeking to determine the "ideal" number of concurrent connections a server should accept.

  • Found that "maximum connections" was too imprecise.

  • As a server reaches high utilization, latency increases. [Universal Scalability Law, Amdahl's Law]

  • Throughput - requests per second (rps) sent to a server

  • Goodput - the rps of requests that succeed with latency low enough for the client to accept

  • Problems with overload

    • Positive feedback loop - a client error means wasted work, and the client's retry adds even more load
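The throughput/goodput distinction above can be sketched with a small classifier. This is a minimal illustration with assumed names (`classify`, `CLIENT_TIMEOUT`), not anything from the article itself:

```python
CLIENT_TIMEOUT = 1.0  # seconds a client will wait before giving up (assumed value)

def classify(results):
    """Split completed requests into throughput vs goodput.

    `results` is a list of (succeeded, latency_seconds) tuples.
    Throughput counts everything sent; goodput counts only responses
    that succeeded AND arrived within the client's timeout.
    """
    throughput = len(results)
    goodput = sum(1 for ok, latency in results
                  if ok and latency <= CLIENT_TIMEOUT)
    return throughput, goodput

# Example: 4 requests sent; one failed, one replied too late to count
results = [(True, 0.2), (True, 0.4), (False, 0.1), (True, 1.5)]
print(classify(results))  # (4, 2)
```

The gap between the two numbers is the wasted work that overload (and retries) amplifies.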

Solutions: Load Shedding

  • Only accept requests where latency can be low enough to reply before client timeout.
  • Maintain goodput even when throughput exceeds limits.
  • Even with load shedding, goodput will eventually degrade as throughput keeps climbing [good chart].
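One common way to implement the shedding described above is a concurrency cap: reject cheaply once too many requests are in flight, so the accepted ones can still finish before the client times out. A minimal sketch (the class name and the limit value are assumptions; the real limit would come from load testing):

```python
import threading

class LoadShedder:
    """Reject work beyond a fixed in-flight limit so accepted
    requests can still complete within the client's timeout."""

    def __init__(self, max_in_flight):
        self.max_in_flight = max_in_flight
        self.in_flight = 0
        self.lock = threading.Lock()

    def try_acquire(self):
        with self.lock:
            if self.in_flight >= self.max_in_flight:
                return False  # shed: a fast, cheap rejection
            self.in_flight += 1
            return True

    def release(self):
        with self.lock:
            self.in_flight -= 1

shedder = LoadShedder(max_in_flight=2)
accepted = [shedder.try_acquire() for _ in range(3)]
print(accepted)  # [True, True, False]
```

The rejection path must stay cheap; if saying "no" costs nearly as much as doing the work, shedding does not protect goodput.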

Testing

  • If you don't test to the breaking point and far beyond, assume the least desirable failure mode
  • Ideal result is for goodput to plateau when close to full utilization and remain flat with increasing throughput
  • Critical to measure client-perceived availability and latency
  • Critical to test overload before exploring mechanisms to avoid it
  • Each mechanism introduces complexity

Testing Techniques

  • Fixed fleet, gradually increasing load
  • Sustained, fixed load while removing fleet capacity
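The first technique (fixed fleet, increasing load) can be sketched as a ramp harness that records throughput vs goodput at each load level. Everything here is hypothetical scaffolding: `send_batch(rps)` is an assumed hook returning (sent, good) counts for one step:

```python
def ramp_test(send_batch, start_rps, step_rps, steps):
    """Fixed-fleet overload test: raise offered load step by step
    and record (offered rps, throughput, goodput) at each level."""
    curve = []
    for i in range(steps):
        rps = start_rps + i * step_rps
        sent, good = send_batch(rps)
        curve.append((rps, sent, good))
    return curve

# Stub backend whose goodput plateaus at 100 rps even as offered load rises -
# the "ideal result" shape: flat goodput past full utilization
def fake_batch(rps):
    return rps, min(rps, 100)

print(ramp_test(fake_batch, 50, 50, 4))
# [(50, 50, 50), (100, 100, 100), (150, 150, 100), (200, 200, 100)]
```

If the last column falls instead of plateauing as load rises, the server is collapsing under overload rather than shedding.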

Watch Outs

  • Do not pollute metrics with failed request latencies
  • Take care with how load shedding interacts with autoscaling

Load Shedding Mechanisms

  • Prioritizing Requests - health checks are critical
  • Watch the clock - provide timeout hints to upstreams
  • Perform bounded work to avoid wasting work
  • Watch out for queues - their importance cannot be overstated; time spent queued counts against the client's timeout
  • Protect in lower layers - set max_conns high and implement more accurate load shedding at lower layers
  • Protect in layers - want some layer to take on the request and log it before dropping
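The "watch the clock" and "bounded work" points above amount to deadline propagation: compute the remaining time budget, shed if it is already gone, and pass the budget downstream as a timeout hint. A sketch with assumed names (`handle`, `backend_call`):

```python
import time

def handle(request, deadline, backend_call):
    """Shed requests whose clients have already given up, and
    forward the remaining budget so the downstream bounds its work."""
    remaining = deadline - time.monotonic()
    if remaining <= 0:
        return None  # client has timed out; doing the work would be wasted
    # Timeout hint lets the downstream give up early instead of
    # burning capacity on a reply nobody will read
    return backend_call(request, timeout=remaining)

def backend(request, timeout):
    # Stand-in downstream that just reports the budget it was given
    return f"ok (budget {timeout:.1f}s)"

deadline = time.monotonic() + 0.5
print(handle("req", deadline, backend))
```

Checking the clock again after each stage of work (not just at admission) is what makes the work "bounded".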

Amazon Builder's - Implementing Health Checks

  • Health checks are useful for making sure everything is working / mitigating single-server failures.
  • Trouble when health check fails for non-critical reason and failure is correlated across servers.
  • There is tension between these two goals.

Types of health checks

  • Liveness - is the service alive; basic connectivity
  • Local health - is the application likely to function; check local, non-shared resources
  • Dependency health - checks interaction between systems
  • Anomaly detection - is any server behaving differently than peers?
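The first three check types can be sketched as separate predicates composed into one report. These function names and thresholds are illustrative assumptions, not an API from the article:

```python
def liveness():
    """Liveness: the process is up and can answer at all."""
    return True

def local_health(disk_free_bytes, min_free=1_000_000):
    """Local health: non-shared resources (e.g. disk space) look sane,
    so the application is likely able to function."""
    return disk_free_bytes >= min_free

def dependency_health(ping_database):
    """Dependency health: can we interact with systems we need?
    Risky, because a shared-dependency outage fails the whole fleet."""
    try:
        return bool(ping_database())
    except Exception:
        return False

# Hypothetical checks composed into one report
report = {
    "liveness": liveness(),
    "local": local_health(disk_free_bytes=5_000_000),
    "dependency": dependency_health(lambda: True),
}
print(report)  # {'liveness': True, 'local': True, 'dependency': True}
```

Keeping the check types separate matters: a failed dependency check can be correlated across servers, so it warrants a more cautious reaction than a failed local check.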

Reacting safely

  • Fail open - when one server fails its check, remove it. When all servers fail, allow traffic to all of them

  • Tend to:

    • Use local health checks for fast-acting, active health checking
    • Use centralized systems for careful reaction to deeper dependency checks
  • EXTREME CARE: When logic can quickly act on large numbers of servers

  • Health checks must be prioritized; services need to reserve resources to answer them

  • Techniques:

    • Max conns at the downstreams, extra capacity on server for health
    • Servers implement maximum concurrency enforcement
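The fail-open rule from "Reacting safely" can be sketched in a few lines: drop individually unhealthy servers, but if every server fails its check, assume the check (or a shared dependency) is wrong and route to everyone. The function name is an assumption:

```python
def routable_servers(health):
    """Fail open: `health` maps server -> bool from the latest checks.
    Remove unhealthy servers, unless ALL are unhealthy - a fully
    failed check is more likely correlated (bad check, shared
    dependency outage) than a whole fleet dying at once."""
    healthy = [server for server, ok in health.items() if ok]
    if healthy:
        return healthy
    return list(health)  # everyone failed -> fail open, keep routing to all

print(routable_servers({"a": True, "b": False}))   # ['a']
print(routable_servers({"a": False, "b": False}))  # ['a', 'b']
```

This is exactly the "extreme care" case above: logic that can act on large numbers of servers at once needs a guardrail like this.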