Traffic

Charlie edited this page Nov 9, 2023 · 2 revisions
- Seeking to determine the "ideal" number of concurrent connections a server should accept.
- Found that "maximum connections" was too imprecise a control.
- As a server reaches high utilization, latency increases. [Universal Scalability Law, Amdahl's Law]
- Throughput - requests per second sent to a server
- Goodput - requests per second that succeed, with latency low enough for the client to accept
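A minimal sketch of the throughput/goodput distinction: goodput only counts responses that both succeed and arrive within the client's latency bar. The request records and the 1-second timeout below are illustrative.

```python
# Goodput counts only requests that succeeded AND met the client's
# latency budget; throughput counts everything sent.

def goodput(requests, client_timeout_s=1.0):
    """requests: list of (succeeded: bool, latency_s: float)."""
    return sum(1 for ok, latency in requests if ok and latency <= client_timeout_s)

requests = [
    (True, 0.2),   # success, fast -> counts toward goodput
    (True, 3.0),   # success, but slower than the client will wait -> not goodput
    (False, 0.1),  # error -> not goodput
    (True, 0.8),   # success, within budget -> counts
]

throughput = len(requests)            # 4 requests sent
print(throughput, goodput(requests))  # 4 2
```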
Problems with overload
- Positive feedback loop - client errors mean the work already done is wasted, and client retries add even more load
Solutions: Load Shedding
- Only accept requests whose latency can be kept low enough to reply before the client times out.
- Maintain goodput even when throughput exceeds limits.
- Even with load shedding, goodput will eventually degrade under extreme overload [good chart].
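One simple way to realize this, sketched below with an illustrative in-flight cap (the class and threshold are not from the source): bound concurrent work and reject excess requests immediately, so accepted requests can still finish within the client's timeout instead of everything slowing down together.

```python
# Minimal load-shedding sketch: cap in-flight requests; reject excess
# work fast instead of queueing it, preserving goodput for the
# requests that are accepted.

class LoadShedder:
    def __init__(self, max_in_flight):
        self.max_in_flight = max_in_flight
        self.in_flight = 0

    def try_accept(self):
        if self.in_flight >= self.max_in_flight:
            return False          # shed: fail fast, waste no work
        self.in_flight += 1
        return True

    def done(self):
        self.in_flight -= 1

shedder = LoadShedder(max_in_flight=2)
decisions = [shedder.try_accept() for _ in range(3)]
print(decisions)  # [True, True, False] -- third request is shed
```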
Testing
- If you don't test to the breaking point and far beyond, assume the least desirable failure mode.
- Ideal result is for goodput to plateau near full utilization and remain flat as throughput keeps increasing
- Critical to measure client-perceived availability and latency
- Critical to test overload before exploring mechanisms to avoid it
- Each mechanism introduces complexity
Testing Techniques
- Fixed fleet, gradually increasing load
- Sustained, fixed load while removing fleet capacity
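A toy model of the first technique (fixed fleet, increasing load), with made-up numbers: a server with a fixed capacity. Without shedding, overload pushes every request past the client's budget and goodput collapses; with shedding, goodput plateaus at capacity, the "ideal result" above.

```python
# Toy model, illustrative only: capacity in requests/sec that the
# server can answer within the client's timeout.
CAPACITY = 100

def goodput_no_shedding(offered_rps):
    # Past saturation, queues grow and latency blows every request's budget.
    return offered_rps if offered_rps <= CAPACITY else 0

def goodput_with_shedding(offered_rps):
    # Excess requests are rejected fast; accepted ones stay within budget.
    return min(offered_rps, CAPACITY)

for rps in (50, 100, 150, 200):
    print(rps, goodput_no_shedding(rps), goodput_with_shedding(rps))
```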
Watch Outs
- Do not pollute latency metrics with failed (shed) request latencies - rejections complete fast and skew percentiles low
- Take care with how load shedding interacts with autoscaling
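To illustrate the first watch-out with hypothetical samples: shed requests fail in about a millisecond, so mixing them into latency percentiles makes an overloaded service look faster than it is. Track them separately.

```python
# Shed requests complete almost instantly; including them in latency
# stats hides the slowness real clients experience.

def p50(latencies):
    s = sorted(latencies)
    return s[len(s) // 2]

samples = [
    ("ok", 0.200), ("ok", 0.250),
    ("shed", 0.001), ("shed", 0.001), ("shed", 0.001),
]

polluted = p50([lat for _, lat in samples])                     # includes fast failures
clean = p50([lat for status, lat in samples if status == "ok"])  # successes only
print(polluted, clean)  # 0.001 0.25 -- pollution makes p50 look 250x better
```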
Load Shedding Mechanisms
- Prioritizing requests - health checks are critical and must be prioritized
- Watch the clock - provide timeout hints to upstreams so they can stop early
- Perform bounded work to avoid wasting effort on requests that will fail anyway
- Watch out for queues - time spent queued counts against the client's timeout; their importance cannot be overstated
- Protect in lower layers - set max_conns generously and implement more accurate load shedding at lower layers
- Protect in layers - want some layer to accept the request and log it before dropping it
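The "watch the clock" and "bounded work" points can be sketched together. In this hypothetical handler (names and the deadline-passing convention are assumptions, not from the source), the caller supplies a deadline and the server checks remaining time before each unit of work, abandoning the request rather than computing a reply nobody will read.

```python
import time

def handle(work_items, deadline):
    """Process work_items, but stop if the caller's deadline has passed."""
    done = []
    for item in work_items:
        if time.monotonic() >= deadline:
            return None      # client has given up; stop wasting work
        done.append(item * 2)  # stand-in for real per-item work
    return done

# Deadline would typically come from a timeout hint the client sent.
deadline = time.monotonic() + 1.0
print(handle([1, 2, 3], deadline))  # [2, 4, 6] when within budget
```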
Health Checks
- Health checks are useful for verifying everything is working and for mitigating single-server failures.
- Trouble arises when a health check fails for a non-critical reason and the failure is correlated across servers.
- Tension between these two goals.
Types of health checks
- Liveness - is the service alive; basic connectivity
- Local health - is the application likely to function; check local, non-shared resources
- Dependency health - checks interaction between systems
- Anomaly detection - is any server behaving differently than peers?
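The four check types can be sketched as functions; all resource names, thresholds, and the injected dependency below are hypothetical, chosen only to show the increasing scope of each check.

```python
import os
import tempfile

def liveness():
    # basic connectivity: the process is up and can answer at all
    return True

def local_health():
    # local, non-shared resources: e.g. can we still write to disk?
    return os.access(tempfile.gettempdir(), os.W_OK)

def dependency_health(ping_database):
    # exercises interaction with another system (injected for this sketch)
    return ping_database()

def anomaly_check(my_error_rate, peer_error_rates, tolerance=0.10):
    # is this server behaving noticeably differently from its peers?
    mean = sum(peer_error_rates) / len(peer_error_rates)
    return abs(my_error_rate - mean) <= tolerance

print(liveness(), local_health(),
      dependency_health(lambda: True),
      anomaly_check(0.02, [0.01, 0.02, 0.03]))
```

Note how the later checks cover more failure modes but also have more ways to fail for non-critical, correlated reasons, which is the tension described above.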
Reacting safely
- Fail open - when one server fails its health check, remove it; when they all fail, allow traffic to all of them
- Tend to:
  - Use local health checks for fast-acting, active health checking
  - Use centralized systems for careful reaction to deeper dependency checks
- EXTREME CARE: when logic can quickly act on large numbers of servers
- Health checks must be prioritized; servers need to reserve resources to answer them even under load
- Techniques:
  - Max connections at the downstreams, with extra capacity reserved on the server for health checks
  - Servers implement maximum concurrency enforcement
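The fail-open reaction can be sketched as routing logic; the 50% threshold is an illustrative choice, not from the source. Individual unhealthy servers are removed, but if an unexpectedly large fraction fails at once, the failure is assumed to be correlated (likely the check itself, not the servers) and traffic is sent to everyone.

```python
def routable(servers, healthy, fail_open_fraction=0.5):
    """servers: the full fleet; healthy: subset passing health checks."""
    if len(healthy) < len(servers) * fail_open_fraction:
        return list(servers)  # correlated failure: fail open, allow all
    return [s for s in servers if s in healthy]

fleet = ["a", "b", "c", "d"]
print(routable(fleet, healthy={"a", "b", "c"}))  # one bad server removed
print(routable(fleet, healthy={"a"}))            # too many failed: allow all four
```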