
The event pipeline needs a way to signal to push based inputs that they should backpressure incoming requests #41892

Open
cmacknz opened this issue Dec 4, 2024 · 2 comments
Labels
Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team

Comments

@cmacknz
Member

cmacknz commented Dec 4, 2024

Calls to the Beat event pipeline's Publish() method will block if the internal queue is full. This provides a way to propagate backpressure from the output back to pull-based inputs like filestream: a pull-based input will block on the call to Publish and stop reading additional data until there is space available in the internal queue.

In the case of push-based inputs, like the http_endpoint input that accepts HTTP requests from clients, this approach does not work. To make a call to Publish() the input must be holding an event in memory. For the http_endpoint input, there could be an unbounded number of concurrent HTTP requests that have deserialized the request body and are holding it in memory while blocked on a call to Publish() because the internal queue is full.

The internal queue being full is only one variant of this problem. The lack of a concurrency limit in a push-based input means the memory usage is no longer bounded by the size of the internal queue, because each request handler holds an event in memory. The queue could have space for 100 more events and a client could open 500 concurrent requests.

Rather than require inputs to implement arbitrary concurrency limits, the Beat event pipeline should provide a way for push-based inputs to detect that they should backpressure the source before doing any work with the data they need to accept. For the http_endpoint input, this means a way to know that it should immediately return a 429 or 503 error to clients, before processing the request body at all.

@cmacknz cmacknz added the Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team label Dec 4, 2024
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@leehinman
Contributor

One concern I have is when the input receives input that expands to several orders of magnitude more events than the queue size. A good example would be something like an AWS VPC flow log, where a single gzipped log file could contain 100k events. In a situation like that, the single flow log would keep the queue full and signaling for back pressure for hundreds of round trips to Elasticsearch. In that scenario, the input could have signaled multiple times to the "pusher" to back off. And with geometric back-off you could have large holes, where the pusher is waiting for its back-off to expire before trying again, and the queue is empty. This would result in an EPS (events per second) that is lower than what it could really be doing.

We need to make sure we have some way to "tune" ourselves out of a problem like that.
