-
Notifications
You must be signed in to change notification settings - Fork 654
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Possible memory leak when using pooled connections #3367
Comments
@vitjouda with so little information we can't start any investigation.
|
I understand, I will provide everything I can.
Usage (adjusted so that properties are hardcoded with real world values) @Bean
@Scope("prototype")
public WebClient.Builder webClientBuilder(ObjectProvider<WebClientCustomizer> customizerProvider) {
WebClient.Builder builder = WebClient.builder();
customizerProvider.orderedStream().forEach((customizer) -> customizer.customize(builder));
return builder;
}
@Bean("serviceClientBuilder")
public WebClient.Builder serviceWebClientBuilder(@Qualifier("webClientBuilder") WebClient.Builder builder) {
return builder
.baseUrl("serviceUrl")
.clientConnector(new ReactorClientHttpConnector(customHttpClient()));
}
protected HttpClient customHttpClient() {
return HttpClient.create(ConnectionProvider.builder("serviceClientPool")
.maxConnections(500)
.maxIdleTime(Duration.ofMinutes(5))
.maxLifeTime(Duration.ofHours(1))
.metrics(true)
.build())
.responseTimeout(Duration.ofSeconds(120));
} public Response get(String url, String... urlParams) {
// I have some abstraction around, but this is the resulting usage.
// Instance of webClient is created from builder above in the class constructor
return webClient
.get()
.uri(url, (Object[]) urlParams)
.accept(MediaType.APPLICATION_JSON)
.retrieve()
.bodyToMono(Response.class);
} |
@vitjouda I see that you have metrics for the |
If you mean the |
@vitjouda these might have different |
for example check this issue where too many connection pools were created because of wrong configuration #3315 |
Aha, then I believe there is only 1 pool per |
@vitjouda then we will really need something to play with :( |
@vitjouda Do you often change the remote_address? |
Yeah I understand. I have been trying to simulate it by creating some reproducer which mimicked my setup to running perf test to locally deployed app but never managed to reproduce it outside of our production setup, and I cannot provide its memory dump. I really believe that the problem is that there are a lot of |
Not really, its another Spring service in our cluster which is generally pretty stable. |
@vitjouda Thank you for your further investigation. Let's continue with it. So when a connection is upgraded to websocket we mark it as non-persistent, it happens here in the code below, can you check it? It is important because non-persistent channels are never returned to the connection pool: Line 87 in 22b4332
so it is correct that when we don't need the connection and it is non-persistent then we will wait for the close event here: Line 414 in 22b4332
I don't understand this |
The WS connection is made from another Another thing I managed to simulate just now is, that when the connection is closed BEFORE response, it does not get cleared and still has the Ill try to explain better. The whole reason that the leak happens is, that there are super long to infinite reference chains of EpollSocketChannel -> EpollSocketChannel -> another.... The moment I make the memory dump, they are all inactive, yet linked together by I tried to make a quick diagram that might help. In this diagram, first Now At this point, there exists a reference chain from |
@vitjouda Please show code how you are doing all this linking. You should never keep reference to a pooled channel. The idea of the pooled channel is to serve a given request/response and then to be released. How exactly you open a WS connection from HTTP connection that is scheduled for recycling? |
I am testing it in Kotlin. This is enough to keep the pooled connection referenced from the EpollSocket used for WS. // created at the class level and reused for all WS calls
val httpClient = HttpClient.create(ConnectionProvider.newConnection())
.metrics(true, Function.identity())
Mono.just("uri")
.flatMap {
// 'it' is ignored but in real app I call the client in flatMap operator, so I kept it here to simulate it
// client is classic webClient setup with pooled connections as per code posted previously
client
.get()
.uri("/data")
.retrieve()
.bodyToMono(Data::class.java)
}
.flatMap { result ->
//result is ignored in this example but in real app its needed to open WS (upgrade request needs session header)
ReactorNettyWebSocketClient(httpClient).execute(URI("localhost:8083/ev")) {
it.receive()
.`as`(it::send)
.then()
}
}
.subscribe({}, {}, { println("WS Closed") }) |
@vitjouda I cannot comment for Kotlin, in Java, this
will consume the response and will release the connection to the pool, you should be able to see:
If I need to edit your picture above, it will be something like this (two independent connections, where the ws consumes the result from the request made with pooled channel A) pooled channel A |
Yes, it will release the connection to the pool, but there is still a reference chain which is keeping it from being garbage collected. I will post my partial reproducer together with memory dump snippet where you can see the path from WS to unrelated Channel. I probably cannot make it today, but you can see the reference chain on this image. The selected Channel is the WS, the last is the one used to make HTTP call. So even if the response is consumed, there is still a reference preventing it from being garbage collected. Actually the re-using of the channel without clearing the connectionOwner attribute afterwards is the problem. At least I think so. I believe that if this attribute would be cleared in every case when the channel is being closed (currently it does not happen in some cases, for example when remote closes the connection), the memory leak would stop. |
If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed. |
Closing due to lack of requested feedback. If you would like us to look at this issue, please provide the requested information and we will re-open. |
…3459) - When terminating detach the connection from request/response objects - Move registration to terminate event outside of doFinally when we have the real connection - Obtain the event loop outside of doFinally when we have the real connection - Dispose the connection only if it is not disposed Related to #3416, #3367
I am facing periodic OOM issues with my service, which uses
reactor-netty
basedWebClient
for HTTP communication with other REST services. I am not sure this is 100% reactor-netty related, but the only thing that managed to mitigate this issue was switching to anotherClientHttpConnector
. Therefore I assume this is the correct place to start asking for help :).When I inspect the memory dump of my service, I can see that most of the growing memory is retained by instances of
io.netty.channel.epoll.EpollSocketChannel
. When I inspect the path to GC root of some randomEpollSocketChannel
instance, there is a long chain of Reactor operators retained. I can see that there are "blocks" of the processing pipeline that correspond to different and unrelated server requests chained together, presumably by theHttpClientOperations
instance holding reference toPooledConnection
/EpollSocketChannel
that is now assigned to another request. I tried to visualize it in the attached MAT screenshot, but I know its hard to explain.In the image, each color block represents a reference chain of
Channel
toHttpClientOperations
. Each block belongs to distinct server request processing pipeline, as shown in this diagram. There are hundreds of concurrent requests at any given time.During the lifetime of the service, these chains get longer and retain more and more memory until OOM.
Steps to Reproduce
Unfortunately I can´t provide reproducer because I never managed to simulate this behavior locally (and I tried for days) and I cannot share the actual memory dump. If I can provide any information that would help, please tell me.
Your Environment
The text was updated successfully, but these errors were encountered: