From 053a988497342a6fd0a717cc097d09c23f83e103 Mon Sep 17 00:00:00 2001 From: Gleb Smirnoff Date: Mon, 23 Dec 2024 10:35:49 -0800 Subject: [PATCH] tcp: don't ever return ECONNRESET on close(2) The SUS doesn't mention this error code as a possible one [1]. The FreeBSD manual page specifies a possible ECONNRESET for close(2): [ECONNRESET] The underlying object was a stream socket that was shut down by the peer before all pending data was delivered. In the past it had been EINVAL (see 21367f630d72), and this EINVAL was added as a safety measure in 623dce13c64ef. After conversion to ECONNRESET it had been documented in the manual page in 78e3a7fdd51e6, but I bet wasn't ever tested to actually be ever returned, cause the tcp-testsuite[2] didn't exist back then. So documentation is incorrect since 2006, if my bet wins. Anyway, in the modern FreeBSD the condition described above doesn't end up with ECONNRESET error code from close(2). The error condition is reported via SO_ERROR socket option, though. This can be checked using the tcp-testsuite, temporarily disabling the getsockopt(SO_ERROR) lines using sed command [3]. Most of these getsockopt(2)s are followed by '+0.00 close(3) = 0', which will confirm that close(2) doesn't return ECONNRESET even on a socket that has the error stored, neither it is returned in the case described in the manual page. The latter case is covered by multiple tests residing in tcp- testsuite/state-event-engine/rcv-rst-*. However, the deleted block of code could be entered in a race condition between close(2) and processing of incoming packet, when connection had already been half-closed with shutdown(SHUT_WR) and sits in TCPS_LAST_ACK. This was reported in the bug 146845. With the block deleted, we will continue into tcp_disconnect() which has proper handling of INP_DROPPED. The race explanation follows. The connection is in TCPS_LAST_ACK. The network input thread acquires the tcpcb lock first, sets INP_DROPPED, acquires the socket lock in soisdisconnected() and clears SS_ISCONNECTED. Meanwhile, the syscall thread goes through sodisconnect() which checks for SS_ISCONNECTED locklessly(!). The check passes and the thread blocks on the tcpcb lock in tcp_usr_disconnect(). Once input thread releases the lock, the syscall thread observes INP_DROPPED and returns ECONNRESET. - Thread 1: tcp_do_segment()->tcp_close()->in_pcbdrop(),soisdisconnected() - Thread 2: sys_close()...->soclose()->sodisconnect()->tcp_usr_disconnect() Note that the lockless operation in sodisconnect() isn't correct, but enforcing the socket lock there will not fix the problem. [1] https://pubs.opengroup.org/onlinepubs/9799919799/ [2] https://github.com/freebsd-net/tcp-testsuite [3] sed -i "" -Ee '/\+0\.00 getsockopt\(3, SOL_SOCKET, SO_ERROR, \[ECONNRESET\]/d' $(grep -lr ECONNRESET tcp-testsuite) PR: 146845 Reviewed by: tuexen, rrs, imp Differential Revision: https://reviews.freebsd.org/D48148 --- lib/libsys/close.2 | 5 +---- sys/netinet/tcp_usrreq.c | 5 ----- 2 files changed, 1 insertion(+), 9 deletions(-) diff --git a/lib/libsys/close.2 b/lib/libsys/close.2 index 83f8604a4a22a2..91a7a902d70d92 100644 --- a/lib/libsys/close.2 +++ b/lib/libsys/close.2 @@ -25,7 +25,7 @@ .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" -.Dd December 1, 2017 +.Dd December 18, 2024 .Dt CLOSE 2 .Os .Sh NAME @@ -111,9 +111,6 @@ is not an active descriptor. An interrupt was received. .It Bq Er ENOSPC The underlying object did not fit, cached data was lost. -.It Bq Er ECONNRESET -The underlying object was a stream socket that was shut down by the peer -before all pending data was delivered. .El .Pp In case of any error except diff --git a/sys/netinet/tcp_usrreq.c b/sys/netinet/tcp_usrreq.c index 35578b348c9f71..acc3e2ea294239 100644 --- a/sys/netinet/tcp_usrreq.c +++ b/sys/netinet/tcp_usrreq.c @@ -690,11 +690,6 @@ tcp_usr_disconnect(struct socket *so) inp = sotoinpcb(so); KASSERT(inp != NULL, ("tcp_usr_disconnect: inp == NULL")); INP_WLOCK(inp); - if (inp->inp_flags & INP_DROPPED) { - INP_WUNLOCK(inp); - NET_EPOCH_EXIT(et); - return (ECONNRESET); - } tp = intotcpcb(inp); if (tp->t_state == TCPS_TIME_WAIT)