Traffic

Charlie edited this page Nov 9, 2023 · 2 revisions
- Seeking to determine the "ideal" number of concurrent connections a server should accept.
- Found that "maximum connections" was too imprecise a control.
- As a server reaches high utilization, latency increases. [Universal Scalability Law, Amdahl's Law]
- Throughput - requests per second sent to a server
- Goodput - requests per second that succeed, with latency low enough for the client to accept
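A minimal sketch of the throughput/goodput distinction: goodput only counts responses that both succeed and arrive within the client's latency bar. The request records and the 1-second timeout below are illustrative.

```python
# Goodput counts only requests that succeeded AND met the client's
# latency budget; throughput counts everything sent.

def goodput(requests, client_timeout_s=1.0):
    """requests: list of (succeeded: bool, latency_s: float)."""
    return sum(1 for ok, latency in requests if ok and latency <= client_timeout_s)

requests = [
    (True, 0.2),   # success, fast -> counts toward goodput
    (True, 3.0),   # success, but slower than the client will wait -> not goodput
    (False, 0.1),  # error -> not goodput
    (True, 0.8),   # success, within budget -> counts
]

throughput = len(requests)            # 4 requests sent
print(throughput, goodput(requests))  # 4 2
```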
Problems with overload
- Positive feedback loop - client errors mean the work already done is wasted, and client retries add even more load
Solutions: Load Shedding
- Only accept requests whose latency can be kept low enough to reply before the client times out.
- Maintain goodput even when throughput exceeds limits.
- Even with load shedding, goodput will eventually degrade under extreme overload [good chart].
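One simple way to realize this, sketched below with an illustrative in-flight cap (the class and threshold are not from the source): bound concurrent work and reject excess requests immediately, so accepted requests can still finish within the client's timeout instead of everything slowing down together.

```python
# Minimal load-shedding sketch: cap in-flight requests; reject excess
# work fast instead of queueing it, preserving goodput for the
# requests that are accepted.

class LoadShedder:
    def __init__(self, max_in_flight):
        self.max_in_flight = max_in_flight
        self.in_flight = 0

    def try_accept(self):
        if self.in_flight >= self.max_in_flight:
            return False          # shed: fail fast, waste no work
        self.in_flight += 1
        return True

    def done(self):
        self.in_flight -= 1

shedder = LoadShedder(max_in_flight=2)
decisions = [shedder.try_accept() for _ in range(3)]
print(decisions)  # [True, True, False] -- third request is shed
```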
Testing
- If you don't test to the breaking point and far beyond, assume the least desirable failure mode.
- Ideal result is for goodput to plateau near full utilization and remain flat as throughput keeps increasing
- Critical to measure client-perceived availability and latency
- Critical to test overload before exploring mechanisms to avoid it
- Each mechanism introduces complexity
Testing Techniques
- Fixed fleet, gradually increasing load
- Sustained, fixed load while removing fleet capacity
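A toy model of the first technique (fixed fleet, increasing load), with made-up numbers: a server with a fixed capacity. Without shedding, overload pushes every request past the client's budget and goodput collapses; with shedding, goodput plateaus at capacity, the "ideal result" above.

```python
# Toy model, illustrative only: capacity in requests/sec that the
# server can answer within the client's timeout.
CAPACITY = 100

def goodput_no_shedding(offered_rps):
    # Past saturation, queues grow and latency blows every request's budget.
    return offered_rps if offered_rps <= CAPACITY else 0

def goodput_with_shedding(offered_rps):
    # Excess requests are rejected fast; accepted ones stay within budget.
    return min(offered_rps, CAPACITY)

for rps in (50, 100, 150, 200):
    print(rps, goodput_no_shedding(rps), goodput_with_shedding(rps))
```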
Watch Outs
- Do not pollute latency metrics with failed (shed) request latencies - rejections complete fast and skew percentiles low
- Take care with how load shedding interacts with autoscaling
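To illustrate the first watch-out with hypothetical samples: shed requests fail in about a millisecond, so mixing them into latency percentiles makes an overloaded service look faster than it is. Track them separately.

```python
# Shed requests complete almost instantly; including them in latency
# stats hides the slowness real clients experience.

def p50(latencies):
    s = sorted(latencies)
    return s[len(s) // 2]

samples = [
    ("ok", 0.200), ("ok", 0.250),
    ("shed", 0.001), ("shed", 0.001), ("shed", 0.001),
]

polluted = p50([lat for _, lat in samples])                     # includes fast failures
clean = p50([lat for status, lat in samples if status == "ok"])  # successes only
print(polluted, clean)  # 0.001 0.25 -- pollution makes p50 look 250x better
```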
Load Shedding Mechanisms
- Prioritizing requests - health checks are critical and must be prioritized
- Watch the clock - provide timeout hints to upstreams so they can stop early
- Perform bounded work to avoid wasting effort on requests that will fail anyway
- Watch out for queues - time spent queued counts against the client's timeout; their importance cannot be overstated
- Protect in lower layers - set max_conns generously and implement more accurate load shedding at lower layers
- Protect in layers - want some layer to accept the request and log it before dropping it
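The "watch the clock" and "bounded work" points can be sketched together. In this hypothetical handler (names and the deadline-passing convention are assumptions, not from the source), the caller supplies a deadline and the server checks remaining time before each unit of work, abandoning the request rather than computing a reply nobody will read.

```python
import time

def handle(work_items, deadline):
    """Process work_items, but stop if the caller's deadline has passed."""
    done = []
    for item in work_items:
        if time.monotonic() >= deadline:
            return None      # client has given up; stop wasting work
        done.append(item * 2)  # stand-in for real per-item work
    return done

# Deadline would typically come from a timeout hint the client sent.
deadline = time.monotonic() + 1.0
print(handle([1, 2, 3], deadline))  # [2, 4, 6] when within budget
```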
Health Checks
- Health checks are useful for verifying everything is working and for mitigating single-server failures.
- Trouble arises when a health check fails for a non-critical reason and the failure is correlated across servers.
- Tension between these two goals.
Types of health checks
- Liveness - is the service alive; basic connectivity
- Local health - is the application likely to function; check local, non-shared resources
- Dependency health - checks interaction between systems
- Anomaly detection - is any server behaving differently than peers?
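The four check types can be sketched as functions; all resource names, thresholds, and the injected dependency below are hypothetical, chosen only to show the increasing scope of each check.

```python
import os
import tempfile

def liveness():
    # basic connectivity: the process is up and can answer at all
    return True

def local_health():
    # local, non-shared resources: e.g. can we still write to disk?
    return os.access(tempfile.gettempdir(), os.W_OK)

def dependency_health(ping_database):
    # exercises interaction with another system (injected for this sketch)
    return ping_database()

def anomaly_check(my_error_rate, peer_error_rates, tolerance=0.10):
    # is this server behaving noticeably differently from its peers?
    mean = sum(peer_error_rates) / len(peer_error_rates)
    return abs(my_error_rate - mean) <= tolerance

print(liveness(), local_health(),
      dependency_health(lambda: True),
      anomaly_check(0.02, [0.01, 0.02, 0.03]))
```

Note how the later checks cover more failure modes but also have more ways to fail for non-critical, correlated reasons, which is the tension described above.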
Reacting safely
- Fail open - when one server fails its health check, remove it; when they all fail, allow traffic to all of them
- Tend to:
  - Use local health checks for fast-acting, active health checking
  - Use centralized systems for careful reaction to deeper dependency checks
- EXTREME CARE: when logic can quickly act on large numbers of servers
- Health checks must be prioritized; servers need to reserve resources to answer them even under load
- Techniques:
  - Max connections at the downstreams, with extra capacity reserved on the server for health checks
  - Servers implement maximum concurrency enforcement
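The fail-open reaction can be sketched as routing logic; the 50% threshold is an illustrative choice, not from the source. Individual unhealthy servers are removed, but if an unexpectedly large fraction fails at once, the failure is assumed to be correlated (likely the check itself, not the servers) and traffic is sent to everyone.

```python
def routable(servers, healthy, fail_open_fraction=0.5):
    """servers: the full fleet; healthy: subset passing health checks."""
    if len(healthy) < len(servers) * fail_open_fraction:
        return list(servers)  # correlated failure: fail open, allow all
    return [s for s in servers if s in healthy]

fleet = ["a", "b", "c", "d"]
print(routable(fleet, healthy={"a", "b", "c"}))  # one bad server removed
print(routable(fleet, healthy={"a"}))            # too many failed: allow all four
```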