[QUESTION] Expected behavior of parallel Strategy for MultiSubnetFailover option #1552
Comments
I was using a different driver, but the symptom sounds similar. I have seen this issue when a smart device, such as an F5 or similar, answers the SYN packet sent to the secondary subnet. In that case, the driver is fooled about which subnet has the active node and tries to connect to the inactive node. However, once the PreLogin packet is sent, the device tries to contact the back-end database and fails. In my case the problem was intermittent: the F5 device was configured to detect SYN attacks, and once the number of SYN packets to the inactive subnet reached a certain threshold, it started answering them, and later it stopped answering them for a while. I was able to reproduce it with telnet against the inactive IP address. For a few minutes the connection attempt would die a normal death, and for the next few minutes it would open up as if it were connected to the back end. You can see the response packets in a network trace.
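A local sketch of the failure mode described above, assuming the device behaves as described: the TCP handshake completes, but the connection is dropped as soon as the client sends data. startFakeMiddlebox and connectAndSend are hypothetical helpers, and the bytes sent are a placeholder, not a real TDS PreLogin packet.

```typescript
import * as net from "node:net";

// Hypothetical stand-in for the middlebox: it completes the TCP handshake,
// then drops the connection as soon as the client sends its first bytes
// (the point where a real SQL Server client would send PreLogin).
function startFakeMiddlebox(): Promise<net.Server> {
  const server = net.createServer((socket) => {
    socket.on("error", () => {});
    socket.once("data", () => socket.destroy());
  });
  return new Promise((resolve) => server.listen(0, "127.0.0.1", () => resolve(server)));
}

interface AttemptResult {
  connected: boolean; // the TCP handshake completed
  droppedAfterFirstWrite: boolean; // the peer went away after we sent data
}

// Connect, send placeholder bytes, and watch whether the peer keeps the
// connection open for waitMs. A bare telnet-style check only sees `connected`.
function connectAndSend(host: string, port: number, waitMs = 500): Promise<AttemptResult> {
  return new Promise((resolve) => {
    let connected = false;
    let settled = false;
    const socket = net.createConnection({ host, port });

    const finish = (dropped: boolean) => {
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      socket.destroy();
      resolve({ connected, droppedAfterFirstWrite: dropped });
    };

    // Still open after waitMs: treat the peer as a live server.
    const timer = setTimeout(() => finish(false), waitMs);

    socket.on("connect", () => {
      connected = true;
      socket.write("placeholder-prelogin"); // NOT a real TDS packet
    });
    socket.on("error", () => {}); // e.g. ECONNRESET; "close" always follows
    socket.on("close", () => finish(connected));
  });
}
```

The point of the sketch is that "the handshake succeeded" and "a database is behind this address" are different facts; only sending data reveals the difference.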
Hi @ml-rex, thanks for raising this and for the detailed explanation. As for your questions, I will try my best to answer them: When multiSubnetFailover is set to true, does tedious check for and reject an offline subnet IP? If not, do all applications using tedious need to implement this check themselves? Hi @arthurschreiber, am I correct that dns.lookup returns all the addresses regardless of their online/offline status? Are you aware of any way to look up the addresses but filter out the offline IPs?
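On the dns.lookup question, a minimal sketch using Node's dns.promises.lookup with { all: true }, which returns every address the resolver knows for a name (the helper name resolveAllAddresses is hypothetical). Each entry carries only an address and an address family, with no health information, so an offline node cannot be filtered out from the lookup result alone; telling them apart requires an actual connection attempt.

```typescript
import { promises as dns } from "node:dns";

// Return every address the OS resolver (getaddrinfo) has for `hostname`,
// e.g. one A record per subnet for a multi-subnet AG listener.
// Nothing here says whether the node behind an address is online.
async function resolveAllAddresses(hostname: string): Promise<string[]> {
  const results = await dns.lookup(hostname, { all: true });
  return results.map((entry) => entry.address);
}
```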
Thanks for sharing. Can you also share which driver you were using when you experienced the issue?
Thanks, Michael, for the answer. To supplement,
Hi @ml-rex, the driver does not matter; in my case it was the .NET SqlClient driver. Multi-subnet works like this: the DNS request returns multiple IP addresses in random order. The primary server maps the IP address for its subnet to the MAC address of its NIC. The secondary releases its IP address, so that address is not connected to anything. If the driver connected to the secondary IP address first, e.g. when not using MultiSubnetFailover, it would normally take 21 seconds to get an error from the network. MSF overcomes this by connecting to both/all IP addresses in parallel, on the assumption that the primary will respond within a few ms while the secondary won't respond and will error out later. Once one attempt gets a response, the driver cancels the other connection attempts and uses that first connection. This works really well. But in the case I experienced, a network device broke those connection assumptions. It's generally better to identify and remove the device doing this than to try to predict which IP address to connect to; your own check could be fooled by the same "spoofing" from the device.
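A minimal sketch of the parallel race described above, written with Node's net module. It is not tedious's actual ParallelConnectionStrategy code; connectFirst and Endpoint are hypothetical names. Endpoints carry their own port only to make the sketch easy to exercise locally; a real listener uses the same port on every subnet.

```typescript
import * as net from "node:net";

interface Endpoint {
  host: string;
  port: number;
}

// Open a TCP connection to every endpoint at once. The first socket to finish
// the handshake wins and the other attempts are destroyed. If every attempt
// errors, or nothing connects before timeoutMs, the promise rejects.
function connectFirst(endpoints: Endpoint[], timeoutMs = 10_000): Promise<net.Socket> {
  return new Promise((resolve, reject) => {
    if (endpoints.length === 0) {
      reject(new Error("no endpoints to try"));
      return;
    }
    const sockets: net.Socket[] = [];
    let failures = 0;
    let settled = false;

    const timer = setTimeout(() => {
      if (settled) return;
      settled = true;
      sockets.forEach((s) => s.destroy());
      reject(new Error("no endpoint connected before the timeout"));
    }, timeoutMs);

    for (const { host, port } of endpoints) {
      const socket = net.createConnection({ host, port });
      sockets.push(socket);

      const onError = () => {
        failures += 1;
        if (!settled && failures === endpoints.length) {
          settled = true;
          clearTimeout(timer);
          reject(new Error("every endpoint failed"));
        }
      };
      socket.on("error", onError);

      socket.once("connect", () => {
        if (settled) {
          socket.destroy();
          return;
        }
        settled = true;
        clearTimeout(timer);
        // Cancel the losing attempts; only the winner is handed back.
        for (const other of sockets) if (other !== socket) other.destroy();
        // From here on the caller owns error handling for the winner.
        socket.removeListener("error", onError);
        resolve(socket);
      });
    }
  });
}
```

"Winning" here means only that a TCP handshake completed. That is exactly the assumption a device answering SYNs on behalf of the inactive subnet breaks: its address can win the race even though no database sits behind it.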
Question
I have an application that connects to a SQL Server multi-subnet cluster with two subnets (primary and DR).
In this setup, the DR subnet is offline while the primary is active. We have also set up the Availability Group listener with a DNS record that round-robins between the two subnet IP addresses. The application uses TypeORM with the mssql driver, which uses tedious.
https://learn.microsoft.com/en-us/sql/sql-server/failover-clusters/windows/sql-server-multi-subnet-clustering-sql-server?view=sql-server-ver16
As suggested in the link, we added the multiSubnetFailover: true option to the connection config, expecting tedious to connect only to database nodes in the active primary subnet. However, we sometimes receive the error message "ConnectionError: Connection lost - read ECONNRESET". After careful investigation, we found a pattern: whenever this error happens, tedious was connecting to the offline DR IP. This was unexpected, since a connection to the offline IP should fail the pool's validation check and never be added to the connection pool.
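For reference, a minimal sketch of where the flag sits when configuring tedious directly. Only multiSubnetFailover comes from the issue; the server name, database, and credentials are hypothetical placeholders, not the reporter's configuration. With mssql, the same flag generally goes under its options object, which mssql passes through to tedious.

```typescript
// Hypothetical values throughout; only `multiSubnetFailover` comes from the issue.
const tediousConfig = {
  server: "ag-listener.example.com", // AG listener DNS name (hypothetical)
  authentication: {
    type: "default", // SQL Server login with user name and password
    options: {
      userName: "app_user", // hypothetical
      password: process.env.DB_PASSWORD ?? "", // read from the environment, not hard-coded
    },
  },
  options: {
    database: "AppDb", // hypothetical
    port: 1433,
    // Resolve every address behind the listener and try them in parallel
    // rather than one after another.
    multiSubnetFailover: true,
    encrypt: true,
  },
};
```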
Stepping through the tedious source code with a debugger, I confirmed that ParallelConnectionStrategy is used when the multiSubnetFailover option is set. Apparently the TCP connection to the offline IP was established successfully, but the connection later emitted an error. I added some console logging to visualize what happened:
My questions are:
Versions:
TypeORM: 0.3.12
mssql: 7.3.0
tedious: ^11.4.0
Config
Relevant Issues and Pull Requests