
Data channels quickly slow down to a crawl under pressure #1088

Closed
lostmsu opened this issue Mar 20, 2024 · 5 comments

Comments

@lostmsu
Contributor

lostmsu commented Mar 20, 2024

Please see #1087 for the test project.

P.S. I think a unit test for transmitting a few GiB of data should be put in place.

@lostmsu
Contributor Author

lostmsu commented Mar 20, 2024

OK, I did some midnight debugging, and I believe the main cause is an incorrect reset of _congestionWindow in both places where it is set to _defaultMTU.

In the code path that references the T3-rtx timer, I believe the line above it is the cause:

.Where(x => now.Subtract(x.LastSentAt).TotalMilliseconds > (_hasRoundTripTime ? _rto : _rtoInitialMilliseconds))

Here the T3-rtx timer is considered expired when RTO milliseconds have passed since the chunk was last sent. I believe this is wrong: from my reading of the RFC, the T3-rtx timer is a property of the destination, not of the chunk. I might be mistaken here; maybe LastSentAt is a clever trick that achieves exactly that, but it is not immediately clear from the code.
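To illustrate the distinction, here is a minimal Python sketch (not the library's C# code) of the per-destination timer rules from RFC 9260 section 6.3: the timer is started when data is sent, stopped when all outstanding data is acked, and restarted when the earliest outstanding TSN is acked. The `Destination` class and method names are hypothetical.

```python
class Destination:
    """Per-destination T3-rtx state, as RFC 9260 section 6.3 describes:
    one retransmission timer per destination address, not per chunk."""

    def __init__(self, rto: float):
        self.rto = rto             # current retransmission timeout (seconds)
        self.t3_started_at = None  # None means the timer is stopped

    def on_data_sent(self, now: float) -> None:
        # Rule R1: start T3-rtx if it is not already running.
        if self.t3_started_at is None:
            self.t3_started_at = now

    def on_sack_all_acked(self) -> None:
        # Rule R2: stop T3-rtx when all outstanding data is acknowledged.
        self.t3_started_at = None

    def on_sack_earliest_acked(self, now: float) -> None:
        # Rule R3: restart T3-rtx when the earliest outstanding TSN is acked.
        self.t3_started_at = now

    def expired(self, now: float) -> bool:
        # The expiry check looks at the timer, not at any chunk's LastSentAt.
        return self.t3_started_at is not None and now - self.t3_started_at > self.rto
```

The point of the sketch is that a chunk-level `LastSentAt` check only coincides with this behaviour if the timer restarts happen to line up with the earliest outstanding chunk's send time.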

The second place that enters retransmit mode I believe simply uses the wrong formula from the RFC: it should set cwnd = ssthresh, i.e. the line should be _congestionWindow = _slowStartThreshold.
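For reference, the RFC distinguishes the two adjustments: on T3-rtx expiry cwnd collapses to one MTU (RFC 9260 section 7.2.3), while on fast retransmit it only drops to ssthresh (section 7.2.4). A hedged Python sketch of the two formulas, with hypothetical function names:

```python
def on_t3_rtx_expiry(cwnd: int, ssthresh: int, mtu: int) -> tuple[int, int]:
    # RFC 9260 7.2.3: timeout is a strong loss signal, so collapse to one MTU.
    ssthresh = max(cwnd // 2, 4 * mtu)
    cwnd = mtu
    return cwnd, ssthresh

def on_fast_retransmit(cwnd: int, ssthresh: int, mtu: int) -> tuple[int, int]:
    # RFC 9260 7.2.4: fast retransmit is milder, so drop only to ssthresh.
    ssthresh = max(cwnd // 2, 4 * mtu)
    cwnd = ssthresh
    return cwnd, ssthresh
```

Using the timeout formula in the fast-retransmit path would collapse the window to one MTU on every loss event, which matches the "slow to a crawl" symptom.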

Additionally, I believe all comparisons to _congestionWindow in CalculateCongestionWindow must be changed to "or equal" comparisons, again from reading RFC 9260. For example (7.2.1):

MUST use the slow-start algorithm to increase cwnd only if the current congestion window is being fully utilized

I interpret this as outstandingBytes >= _congestionWindow, while the code has outstandingBytes > _congestionWindow.
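A small Python sketch (again, not the library's C# code; names are hypothetical) of the window-growth logic with the >= check, following RFC 9260 sections 7.2.1 and 7.2.2:

```python
def grow_cwnd(cwnd: int, ssthresh: int, outstanding: int,
              bytes_acked: int, mtu: int, partial_bytes_acked: int) -> tuple[int, int]:
    # Grow cwnd only when the window is fully utilized.
    # Note the strict < here, i.e. outstanding >= cwnd allows growth;
    # the reported bug is equivalent to using <= (outstanding > cwnd).
    if outstanding < cwnd:
        return cwnd, partial_bytes_acked
    if cwnd <= ssthresh:
        # Slow start (RFC 9260 7.2.1): grow by at most one MTU per SACK.
        cwnd += min(bytes_acked, mtu)
    else:
        # Congestion avoidance (RFC 9260 7.2.2): grow by one MTU per cwnd acked.
        partial_bytes_acked += bytes_acked
        if partial_bytes_acked >= cwnd:
            partial_bytes_acked -= cwnd
            cwnd += mtu
    return cwnd, partial_bytes_acked
```

With the strict > comparison, a sender whose in-flight data exactly fills the window (the steady state under back-pressure) never grows cwnd at all.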

lostmsu added a commit to BorgGames/sipsorcery that referenced this issue Mar 20, 2024
some of the fixes are probably legit, but I believe the T3-rtx timer is not implemented correctly

kinda works around sipsorcery-org#1088
lostmsu added a commit to BorgGames/sipsorcery that referenced this issue Mar 21, 2024
@ris-work

@lostmsu, thank you for finding this out. I would like to apply the patches to the version I am currently using, but GitHub says the commits don't belong to a branch. Are the patches under the same license as SipSorcery?

I also noticed that on a not-really-bad server, with dd if=/dev/zero bs=1k piped to it, the throughput is about 768 kB/s (kilobytes) and the CPU usage is acceptable. I have no idea how to monkeypatch C# projects locally, so it might take some time. Rust gives me around 2-3 MB/s, but due to some issues I had to move from webrtc-rs (with TURN relay candidates via coturn) to SipSorcery. I could not pinpoint the details, but node-wrtc works well under the same circumstances, though it is unmaintained, at about 1.3 Mb/s (which is pretty okay for a language like JS).

@lostmsu
Contributor Author

lostmsu commented Mar 28, 2024

@ris-work same license

If you want to experiment, ignore the first commit; I edited it and force-pushed it as the second commit. You would need to fetch the branch and cherry-pick the commits from it (the branch has many breaking changes of varying stability).

@sipsorcery
Member

Just for context, the SCTP and data channels implementation was whipped together in a month to replace a previous native dependency. That was the last major effort I undertook on this library before starting a different venture that consumed all my time.

Heaps of features are missing, and no proper performance testing, let alone optimisation, has been carried out.

@lostmsu I see there are lots of improvements in your #1192 PR. If feasible, it'd be great if the data channel improvements could be split into a dedicated PR so they can be tested in isolation.

@sipsorcery
Member

sipsorcery commented Nov 12, 2024

Added to enhancements tracking issue.
