
sctp's Introduction


Pion SCTP

A Go implementation of SCTP



Roadmap

The library is used as a part of our WebRTC implementation. Please refer to that roadmap to track our major milestones.

Community

Pion has an active community on the Slack.

Follow the Pion Twitter for project updates and important WebRTC news.

We are always looking to support your projects. Please reach out if you have something to build! If you need commercial support or don't want to use public methods you can contact us at [email protected]

Contributing

Check out the contributing wiki to join the group of amazing people making this project possible.

License

MIT License - see LICENSE for full text

sctp's People

Contributors

adriancable, aeronotix, at-wat, backkem, cohosh, daonb, edaniels, enobufs, hugoarregui, jech, jeremija, jerry-tao, kc5nra, lherman-cs, marcopolo, mengelbart, mjmac, paulwe, pionbot, renovate-bot, renovate[bot], rgeorgeoff, rob-deutsch, scorpionknifes, sean-der, stv0g, sukunrt, trivigy, tuexen, wawesomenogui


sctp's Issues

Add simple PMTU calculation with heartbeats

Calculate the maximum MTU size that can travel the entire path machine1<->machine2.

The easiest way is probably to craft heartbeat messages, starting at the maximum value, until they are successfully returned. Potentially also listen for ICMP PacketTooBig messages.
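A binary search over probe sizes converges in a handful of round trips. The sketch below illustrates that idea; the `probe` callback is a hypothetical stand-in for "send a HEARTBEAT padded to `size` bytes and report whether a HEARTBEAT-ACK came back" and is not part of pion/sctp's API.

```go
package main

import "fmt"

// probePMTU binary-searches for the largest probe size that succeeds.
// probe is a hypothetical callback that would send a padded HEARTBEAT
// and report whether the HEARTBEAT-ACK was returned.
func probePMTU(lo, hi int, probe func(size int) bool) int {
	best := lo
	for lo <= hi {
		mid := (lo + hi) / 2
		if probe(mid) {
			best = mid
			lo = mid + 1 // probe fit; try larger
		} else {
			hi = mid - 1 // too big; try smaller
		}
	}
	return best
}

func main() {
	// Simulated path that forwards packets up to 1472 bytes.
	pathLimit := 1472
	mtu := probePMTU(1200, 9000, func(size int) bool { return size <= pathLimit })
	fmt.Println(mtu) // prints 1472
}
```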

Performance measurement tool

Summary

It would be great to have a tool to measure general performance. Developing something similar to iperf would enable evaluating the performance in countless scenarios (loopback over memory only, loopback over IP, specific delay, packet loss, packet duplication, packet reordering, ...).

Motivation

Compare performance against other stacks and determine spots for optimisation.

Notes

It may be a good idea to split the tool up for usage over raw sockets and for loopback usage over memory only. The former allows for interop and network performance testing while the latter can be used to test CPU usage.

The test protocol should be as simple as possible and specifics are up for discussion.

Docs: Add basics

Copy the style from pions/webrtc including readme, license, ... .

'rwnd' never becomes 0

Your environment.

  • Version: v2.0.3 or earlier
  • Browser: n/a

What did you do?

Sent 100 MB of data over a data channel with MaxRetransmits set to 0 or 1; the transmission then stopped due to the issue reported as pion/ice#12.

What did you expect?

All 100 MB of data transfer should complete successfully.

What happened?

The issue pion/ice#12 is a separate problem, but SCTP is clearly pushing more than 1 MB of data even though the receiver window is set to 128 KB.

("1 MB" is the maximum number of bytes ICE will buffer.)

I confirmed that when this happens, the receiver is still reporting an a_rwnd (advertised receiver window size) of 128 KB (= fully available).

I noticed that the current receiver window size is calculated from the payloadQueue size only. Received DATA chunks are almost immediately handed off to the stream layer, and DATA chunks in the payloadQueue are removed as soon as a.peerLastTSN advances, while those DATA chunks are still sitting in the reassemblyQueue waiting to be read by the application.

The rwnd advertised to the sender should include the number of bytes stored in the reassemblyQueue in its calculation, so that the sender stops sending once it exceeds the amount of data the application has not yet read.
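A sketch of the proposed calculation (the function name and queue-size parameters are illustrative, not pion/sctp's internals): the window must shrink by bytes held in both queues, not just the payloadQueue.

```go
package main

import "fmt"

// advertisedRwnd sketches the proposed calculation: the window shrinks
// by bytes queued in the payload queue AND bytes sitting unread in the
// reassembly queue.
func advertisedRwnd(maxRecvBuf, payloadQueueBytes, reassemblyQueueBytes uint32) uint32 {
	used := payloadQueueBytes + reassemblyQueueBytes
	if used >= maxRecvBuf {
		return 0 // buffer exhausted: tell the sender to stop
	}
	return maxRecvBuf - used
}

func main() {
	// 128 KB receive buffer; the app is slow, so 120 KB sits unread.
	fmt.Println(advertisedRwnd(128*1024, 4*1024, 120*1024)) // prints 4096
}
```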

Forward TSN for unordered DATA chunks from usrsctp wouldn't discard chunks in the reassembly queue

Your environment.

  • Version: v2.1.18
  • Chrome: Version 80.0.3987.106 (Official Build) (64-bit)
  • Firefox: 73.0

What did you do?

Sent many 32GB messages over a data channel with unordered delivery and max retransmits set to 0, while forcibly dropping packets at a 4% ratio.

What did you expect?

Pion to continue receiving messages despite some message loss.

What happened?

Pion stops receiving and keeps sending SACKs with a_rwnd=0.

During the work on #104, I learned that when a message is sent with the U (unordered) bit set to 1 and later abandoned, a Forward TSN will be sent. Current Pion expects the Forward TSN chunk to include the stream ID, but @tuexen pointed out that the stream ID wouldn't be included. I have confirmed that neither Chrome nor Firefox includes the stream ID and SSN in the Forward TSN chunk.

Current pion/sctp relies on the stream ID to purge incomplete and abandoned fragments. Consequently, the fragments are left in the reassemblyQueue forever, eventually exhausting the receive buffer.

I have repro'd the situation with both Chrome and Firefox. I will fix this ASAP.
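A minimal sketch of the fix: since an unordered Forward TSN carries no stream ID, the reassembly queue must drop unordered fragments by TSN alone. (Real TSN comparison needs serial-number arithmetic to handle 32-bit wraparound; that is omitted here for clarity.)

```go
package main

import "fmt"

// purgeUnordered drops every unordered fragment whose TSN is at or below
// the new cumulative TSN from a Forward TSN chunk. Plain comparison is a
// simplification; production code needs serial-number arithmetic.
func purgeUnordered(fragTSNs []uint32, newCumulativeTSN uint32) []uint32 {
	kept := fragTSNs[:0]
	for _, tsn := range fragTSNs {
		if tsn > newCumulativeTSN {
			kept = append(kept, tsn) // still deliverable; keep it
		}
	}
	return kept
}

func main() {
	// Fragments 10..12 were abandoned by the sender; 20 is still live.
	fmt.Println(purgeUnordered([]uint32{10, 11, 12, 20}, 12)) // prints [20]
}
```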

chunkPayloadData/payloadQueue race

I'm hitting the race below trying to upgrade backkem/go-libp2p-webrtc-direct to the latest pions/webrtc. The problem seems to be that the chunkPayloadData buffers are passed around to multiple goroutines and accessed concurrently. I haven't quite figured out the details or a solution yet, though.

==================
WARNING: DATA RACE
Write at 0x00c0005822c0 by goroutine 47:
  encoding/binary.PutUvarint()
      encoding/binary/varint.go:48 +0xb5
  github.com/libp2p/go-mplex.(*Multiplex).sendMsg()
      github.com/libp2p/go-mplex/multiplex.go:155 +0x2bf
  github.com/libp2p/go-mplex.(*Stream).Close()
      github.com/libp2p/go-mplex/stream.go:180 +0x11c
  github.com/libp2p/go-libp2p-transport/test.SubtestPingPong.func1.1()
      github.com/libp2p/go-libp2p-transport/test/transport.go:193 +0x310

Previous read at 0x00c0005822c0 by goroutine 146:
  runtime.slicecopy()
      runtime/slice.go:221 +0x0
  github.com/pions/sctp.(*chunkPayloadData).marshal()
      github.com/pions/sctp/chunk_payload_data.go:134 +0x4dc
  github.com/pions/sctp.(*packet).marshal()
      github.com/pions/sctp/packet.go:129 +0x3f4
  github.com/pions/sctp.(*Association).send()
      github.com/pions/sctp/association.go:930 +0x59
  github.com/pions/sctp.(*Association).handleInbound()
      github.com/pions/sctp/association.go:261 +0x160
  github.com/pions/sctp.(*Association).readLoop()
      github.com/pions/sctp/association.go:226 +0x135

Goroutine 47 (running) created at:
  github.com/libp2p/go-libp2p-transport/test.SubtestPingPong.func1()
      github.com/libp2p/go-libp2p-transport/test/transport.go:168 +0x20f

Goroutine 146 (running) created at:
  github.com/pions/sctp.Client()
      github.com/pions/sctp/association.go:141 +0x86
  github.com/pions/webrtc.(*SCTPTransport).Start()
      github.com/pions/webrtc/sctptransport.go:88 +0x150
  github.com/pions/webrtc.(*PeerConnection).SetRemoteDescription.func2()
      github.com/pions/webrtc/peerconnection.go:843 +0x557
==================

Revisit stream.ReadSCTP's error handling

Related to #51, errors other than io.ErrShortBuffer could be returned in the future. Also, the handling of errors returned by ReadSCTP is not tested. For better detection of regressions, we should add some tests for ReadSCTP.

Add a method SetReliabilityParams()

Summary

For the DCEP layer, we will need to add a method SetReliabilityParams() to the SCTP layer.

Motivation

Until the DCEP layer's handshake has completed (DATA_CHANNEL_ACK received), all other messages containing user data and belonging to this data channel MUST be sent ordered, no matter whether the data channel is ordered or not, according to the DCEP spec.

Describe alternatives you've considered

We could also add a new argument to the sendPayloadData() method, but I think SetReliabilityParams() would be cleaner.

Additional context

Related discussion: pion/datachannel#9

Remove unnecessary copy during calculation of CRC32

Summary

Luke pointed out on slack:

unrelated, but if anybody wants an ez SCTP performance optimization: https://github.com/pion/sctp/blob/master/packet.go#L147

should be creating the crc32 Hash and calling `Write` over ranges, instead of allocating/copying every single packet we send/receive

Motivation

Seems to be an easy fix/improvement.

TODO

  • Double-check whether there was any intention behind the copy.

WriteLoop issues

Your environment.

  • Version: 1.6.0

Summary

Running SCTP v1.6.0 as part of the libp2p / go-libp2p-webrtc-direct CI (logs) still seems to cause some problems that are not limited to ICE:

  • There seems to be a problem with shutdown where SCTP keeps trying to write even though the DTLS connection is already closed: log (this has ICE rolled back as well). I tried to end the writeLoop when we can't write to DTLS in pion/sctp/writeloop, but this doesn't fix the problem entirely: log
  • SubtestPingPong is failing as well (disabled in the CI logs linked above). These tests expect the Read & Write methods to operate with io.ReadWriter-like semantics, e.g. returning io.EOF, etc. My feeling is that these semantics may have changed in v1.6.0. This needs more digging, though.

cc @enobufs @Sean-Der

Race found by pions/webrtc CI

No one else needs to worry about fixing this! I will open a PR for it.

WARNING: DATA RACE
Read at 0x00c00045a301 by goroutine 64:
  github.com/pions/sctp.(*Association).getPayloadDataToSend()
      /home/sean/Documents/Programming/Go/Code/src/github.com/pions/sctp/association.go:960 +0x553
  github.com/pions/sctp.(*Association).handleSack()
      /home/sean/Documents/Programming/Go/Code/src/github.com/pions/sctp/association.go:683 +0x886
  github.com/pions/sctp.(*Association).handleChunk()
      /home/sean/Documents/Programming/Go/Code/src/github.com/pions/sctp/association.go:1150 +0xd59
  github.com/pions/sctp.(*Association).handleInbound()
      /home/sean/Documents/Programming/Go/Code/src/github.com/pions/sctp/association.go:347 +0x245
  github.com/pions/sctp.(*Association).readLoop()
      /home/sean/Documents/Programming/Go/Code/src/github.com/pions/sctp/association.go:317 +0x117

Previous write at 0x00c00045a301 by goroutine 52:
  github.com/pions/sctp.(*Stream).SetReliabilityParams()
      /home/sean/Documents/Programming/Go/Code/src/github.com/pions/sctp/stream.go:73 +0x9a
  github.com/pions/datachannel.newDataChannel()
      /home/sean/Documents/Programming/Go/Code/src/github.com/pions/datachannel/datachannel.go:30 +0x248
  github.com/pions/datachannel.Client()
      /home/sean/Documents/Programming/Go/Code/src/github.com/pions/datachannel/datachannel.go:84 +0x454
  github.com/pions/datachannel.Dial()
      /home/sean/Documents/Programming/Go/Code/src/github.com/pions/datachannel/datachannel.go:55 +0x79
  github.com/pions/webrtc.(*DataChannel).open()
      /home/sean/Documents/Programming/Go/Code/src/github.com/pions/webrtc/datachannel.go:138 +0x2b6
  github.com/pions/webrtc.(*PeerConnection).openDataChannels()
      /home/sean/Documents/Programming/Go/Code/src/github.com/pions/webrtc/peerconnection.go:926 +0x161
  github.com/pions/webrtc.(*PeerConnection).SetRemoteDescription.func2()
      /home/sean/Documents/Programming/Go/Code/src/github.com/pions/webrtc/peerconnection.go:917 +0x88e

Goroutine 64 (running) created at:
  github.com/pions/sctp.Client()
      /home/sean/Documents/Programming/Go/Code/src/github.com/pions/sctp/association.go:160 +0x7f
  github.com/pions/webrtc.(*SCTPTransport).Start()
      /home/sean/Documents/Programming/Go/Code/src/github.com/pions/webrtc/sctptransport.go:90 +0x123
  github.com/pions/webrtc.(*PeerConnection).SetRemoteDescription.func2()
      /home/sean/Documents/Programming/Go/Code/src/github.com/pions/webrtc/peerconnection.go:907 +0x735

Goroutine 52 (running) created at:
  github.com/pions/webrtc.(*PeerConnection).SetRemoteDescription()
      /home/sean/Documents/Programming/Go/Code/src/github.com/pions/webrtc/peerconnection.go:850 +0x16a3
  github.com/pions/webrtc.signalPair()
      /home/sean/Documents/Programming/Go/Code/src/github.com/pions/webrtc/peerconnection_test.go:72 +0x43f
  github.com/pions/webrtc.closeReliabilityParamTest()
      /home/sean/Documents/Programming/Go/Code/src/github.com/pions/webrtc/datachannel_test.go:50 +0x46
  github.com/pions/webrtc.TestDataChannelParamters_Go.func1()
      /home/sean/Documents/Programming/Go/Code/src/github.com/pions/webrtc/datachannel_go_test.go:175 +0x36c
  testing.tRunner()
      /usr/lib/go-1.12/src/testing/testing.go:865 +0x163

Reassembled unordered message is received broken

Your environment.

  • Version: v1.6.11

What did you do?

I was writing test with sctp stream configured for unordered delivery.

What did you expect?

Sending messages, although the order could be different, each messages are identical to the original messages.

What happened?

Each message is broken when the message is larger than the max segment size, 1200.

I know the cause.

The design assumption was wrong. I introduced pendingQueue with separate ordered and unordered queues, assuming that chunks in the unordered queue could be completely unordered. But 'unordered' should only apply to user messages, not at the chunk level.

When a large message is fragmented into multiple chunks, the current code can corrupt the message. (This has been repro'd in my local test; a PR is incoming.)

Turn Stream.WriteSCTP() and Write() a blocking call by default

Your environment.

  • Version: v1.7.0 (or earlier)

Background

WebRTC's datachannel.Write() does not block, as we follow the JavaScript WebRTC API. When sending a large amount of data, it is the application's responsibility to check the buffered amount (in the SCTP layer, for sending).

This is pretty standard in JavaScript land, but it really does not align with typical Go semantics (i.e., net.Conn). Also, implementing flow control at the user level is tedious, error-prone, and not trivial to get right.

What would you expect?

We should, I believe, keep the current behavior in the pion/webrtc API (maintaining JavaScript API semantics as a policy), but we could make some exceptions in the following cases:

  • When the data channel is detached
  • When pion/datachannel or pion/sctp is used directly

In these cases, a blocking Write() method can be the default behavior, with blocking turned off when used via pion/webrtc (non-detached), etc.

What others thought

@backkem

What if we add a default implementation to SCTP itself? E.g. default threshold and if you pass it stream.Write starts blocking.
If you overwrite, you're on your own.
...
Yea so, if you use SCTP or DataChannel directly -> Default blocking implementation (blocking by default seems rather idiomatic).
If you use WebRTC -> We overwrite the OnBufferedAmountLow and it disables the default implementation and otherwise conforms to the WebRTC spec. 

@AeroNotix

that default implementation will suit 99% of people's needs
"block if exceed buffer size"

I totally agree with the above comments, and I think SCTP layer should take care of this.
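A rough sketch of what a default blocking Write could look like: a condition variable over the buffered amount, where the writer blocks while the buffer is over the threshold and the drain path wakes it. This is a simplified toy under assumed names, not pion/sctp's actual implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// blockingBuffer sketches the proposed default: Write blocks while the
// buffered amount would exceed max, and draining unblocks it.
type blockingBuffer struct {
	mu       sync.Mutex
	cond     *sync.Cond
	buffered int
	max      int
}

func newBlockingBuffer(max int) *blockingBuffer {
	b := &blockingBuffer{max: max}
	b.cond = sync.NewCond(&b.mu)
	return b
}

func (b *blockingBuffer) Write(n int) {
	b.mu.Lock()
	for b.buffered+n > b.max {
		b.cond.Wait() // block until the buffer drains
	}
	b.buffered += n
	b.mu.Unlock()
}

func (b *blockingBuffer) drain(n int) {
	b.mu.Lock()
	b.buffered -= n
	b.mu.Unlock()
	b.cond.Broadcast()
}

func main() {
	b := newBlockingBuffer(1024)
	b.Write(1000)
	go b.drain(500)         // the "reader" catches up, unblocking the writer
	b.Write(500)            // would block forever without the drain above
	fmt.Println(b.buffered) // prints 1000
}
```

With this as the default, pion/webrtc could install its own OnBufferedAmountLow handling to restore the non-blocking JavaScript semantics, as suggested in the quoted comments.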

Delayed Ack

Summary

I'd consider this a post-v2 work item (no API change required).
The support for congestion control addressed in #11 has drastically improved performance, but it is not optimal yet. To improve it further, we will need to implement delayed acks to reduce the number of packets per second.

Motivation

Say you are handling 20 MB/s of traffic. As the current MTU size (for SCTP) is set to 1200 bytes, the SCTP sender is handling about 17500 outgoing DATA-chunk packets and the same number of incoming SACK packets per second. That's a lot of CPU usage. We can drastically reduce the number of SACK chunks by implementing delayed acks (the handleSelectiveAck routine is the most complex/expensive routine in sctp, and with a 200 ms delayed ack we could reduce 17500 SACK chunks per second to just 5!). We should also piggy-back the ack on outgoing DATA chunks, as recommended by RFC 4960.
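The arithmetic above can be checked quickly:

```go
package main

import "fmt"

func main() {
	const throughput = 20 * 1024 * 1024 // 20 MB/s
	const mtu = 1200                    // SCTP payload MTU used here
	fmt.Println(throughput / mtu)       // prints 17476 (~17500 DATA chunks/sec)

	// With a 200 ms delayed ack, at most one SACK per interval:
	const delayedAckMs = 200
	fmt.Println(1000 / delayedAckMs) // prints 5 (SACKs/sec)
}
```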

Describe alternatives you've considered

No alternative I can think of right now.

TODO

  • Profile CPU usage first
  • Implement delayed Ack (up to 200 msec - check the spec)
  • Implement piggy-back

Setup CI

Copy the configuration from pions/webrtc.

Pion <-> usrsctp under 10% packet loss gets an abort

  • Using main.go
  • demo.js
  • simulating packet loss using comcast --device=lo --latency=150 --packet-loss=10% --target-proto=udp

I get an ABORT after ~5 seconds.

Reass 30000007,CI:ffffffff,TSN=1e5e52d3,SID=0002,FSN=1e5e52d3,SSN:0000

This abort is generated here in usrsctp. I am not sure why yet, though; I need to read more about SCTP to fully understand.

Bug in T3-rtx timer getting in busy loop at 63rd timeout

Your environment.

  • Version: v1.7.4
sctp DEBUG: 18:33:56.319559 association.go:2070: [0xc0000b6000] T3-rtx timed out: nRtos=58 cwnd=1228 ssthresh=4912
sctp DEBUG: 18:34:56.320670 association.go:2070: [0xc0000b6000] T3-rtx timed out: nRtos=59 cwnd=1228 ssthresh=4912
sctp DEBUG: 18:35:56.318587 association.go:2070: [0xc0000b6000] T3-rtx timed out: nRtos=60 cwnd=1228 ssthresh=4912
sctp DEBUG: 18:36:56.321205 association.go:2070: [0xc0000b6000] T3-rtx timed out: nRtos=61 cwnd=1228 ssthresh=4912
sctp DEBUG: 18:37:56.320873 association.go:2070: [0xc0000b6000] T3-rtx timed out: nRtos=62 cwnd=1228 ssthresh=4912
sctp DEBUG: 18:38:56.318899 association.go:2070: [0xc0000b6000] T3-rtx timed out: nRtos=63 cwnd=1228 ssthresh=4912
sctp DEBUG: 18:38:56.318946 association.go:2070: [0xc0000b6000] T3-rtx timed out: nRtos=64 cwnd=1228 ssthresh=4912
sctp DEBUG: 18:38:56.319016 association.go:2070: [0xc0000b6000] T3-rtx timed out: nRtos=65 cwnd=1228 ssthresh=4912
sctp DEBUG: 18:38:56.319042 association.go:2070: [0xc0000b6000] T3-rtx timed out: nRtos=66 cwnd=1228 ssthresh=4912
sctp DEBUG: 18:38:56.319121 association.go:2070: [0xc0000b6000] T3-rtx timed out: nRtos=67 cwnd=1228 ssthresh=4912
sctp DEBUG: 18:38:56.319203 association.go:2070: [0xc0000b6000] T3-rtx timed out: nRtos=68 cwnd=1228 ssthresh=4912
sctp DEBUG: 18:38:56.319282 association.go:2070: [0xc0000b6000] T3-rtx timed out: nRtos=69 cwnd=1228 ssthresh=4912
sctp DEBUG: 18:38:56.319356 association.go:2070: [0xc0000b6000] T3-rtx timed out: nRtos=70 cwnd=1228 ssthresh=4912

What did you do?

Use a data channel, then close a receiver abruptly (ctrl-c, etc) and leave the sender running for about an hour.

What did you expect?

The sender should keep retransmitting on T3 timeout at maxRTO = 60 sec. Or, ICE should detect the disconnection and the sender should also disconnect. (The former is the bug to address; the latter is a bug in my app.)

What happened?

After the 63rd timeout, the retransmission interval becomes 0, causing 100% CPU usage.

I found the cause. The T3 timer interval doubles every time a timeout occurs, until it hits the max RTO of 60 sec. The current code correctly caps the interval at 60 sec, but it internally keeps doubling the uncapped value using the shift-left operator, so after the 63rd timeout the 64-bit value overflows to 0, leaving the interval at 0 from the 64th timeout onward.

Introduce association.Listener

Summary

I found it difficult to use pion/sctp alone because sctp.Association.Client and sctp.Association.Server take a "connected" UDP connection. These are equivalent to a TCP child (server) socket and client socket. What's missing in pion/sctp is a "listening socket".
In the context of WebRTC, we use a Client-Client simultaneous open, which is not practical in non-WebRTC cases (no ICE, etc.). Client-Server is possible, but it would be difficult to establish a connection if there's a NAT in between.

Motivation

Make pion/sctp usable standalone, in TCP style.

Describe alternatives you've considered

I have written a tool that enables a TCP-like listening socket capability. However, it cannot take advantage of the "cookie-echo" mechanism (resembling TCP's SYN cookie) that SCTP offers as a measure against DDoS attacks (like spoofed SYN floods in TCP), because the internal chunk parsers are not exposed via the API.

We can use go-rudp as a reference.

Panic in handleChunk

I just got a panic.

Failed to accept data channel: The association is closed
panic: close of closed channel

goroutine 148 [running]:
github.com/pions/sctp.(*Association).handleChunk(0xc0000a68c0, 0xc000173f40, 0x93ee40, 0xc00012afc0, 0x0, 0x0, 0x0, 0x0, 0x0)
        /home/jch/go/src/github.com/pions/sctp/association.go:769 +0x427
github.com/pions/sctp.(*Association).handleInbound(0xc0000a68c0, 0xc000396000, 0x30, 0x2000, 0x30, 0x0)
        /home/jch/go/src/github.com/pions/sctp/association.go:239 +0x1c4
github.com/pions/sctp.(*Association).readLoop(0xc0000a68c0)
        /home/jch/go/src/github.com/pions/sctp/association.go:210 +0xf6
created by github.com/pions/sctp.Client
        /home/jch/go/src/github.com/pions/sctp/association.go:135 +0x5b

OnBufferedAmountLow wouldn't fire when bufferAmountLow is set to 0

Your environment.

  • Version: v1.6.9 or earlier
  • Browser: n/a

What did you do?

@hugoArregui found this.
When bufferAmountLow is set to 0, the OnBufferedAmountLow callback is never fired.

What did you expect?

In this case, if bufferedAmount is > 0 and then reaches 0, the callback should fire, but with the current code it doesn't. See https://github.com/pion/sctp/blob/v1.6.9/stream.go#L286

The < should be <=. Also, we need to make sure nBytesReleased is a positive value.

  • 0 -> 0 : won't fire
  • n (>0) -> 0: should fire

W3C WebRTC API says:

When the bufferedAmount decreases from above this threshold to equal or below it, the bufferedamountlow event fires. The bufferedAmountLowThreshold is initially zero on each new RTCDataChannel, but the application may change its value at any time.
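The corrected trigger can be sketched as a pure function over the before/after buffered amounts (illustrative, not the actual stream.go code); note the <= that makes a threshold of 0 still fire on the n -> 0 transition:

```go
package main

import "fmt"

// shouldFire reports whether bufferedamountlow should fire: the buffered
// amount crossed from above the threshold to at-or-below it (<=, not <).
func shouldFire(before, after, threshold uint64) bool {
	return before > threshold && after <= threshold
}

func main() {
	fmt.Println(shouldFire(4096, 0, 0)) // prints true  (n > 0 -> 0: should fire)
	fmt.Println(shouldFire(0, 0, 0))    // prints false (0 -> 0: won't fire)
}
```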

Don't abort associations on unexpected Ack

Am I reading RFC 4960 wrong?

diff --git a/association.go b/association.go
index a65188d..b77daff 100644
--- a/association.go
+++ b/association.go
@@ -729,7 +729,8 @@ func (a *Association) handleChunk(p *packet, c chunk) ([]*packet, error) {
                        a.setState(cookieEchoed)
                        return pack(r), nil
                default:
-                       return nil, errors.Errorf("TODO Handle Init acks when in state %s", a.state.String())
+                       // RFC 4960 Section 5.2.3
+                       return nil, nil
                }
 
        case *chunkAbort:
@@ -778,7 +779,8 @@ func (a *Association) handleChunk(p *packet, c chunk) ([]*packet, error) {
                        close(a.handshakeCompletedCh)
                        return nil, nil
                default:
-                       return nil, errors.Errorf("TODO Handle Init acks when in state %s", a.state.String())
+                       // RFC 4960 Section 5.2.5
+                       return nil, nil
                }
 
                // TODO Abort

32bit support

Build failed.

# github.com/pion/sctp [github.com/pion/sctp.test]
./chunk_test.go:38:107: constant 3899461680 overflows int
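The error comes from Go's rule that an untyped integer constant used in an `int` context must be representable on the target platform; 3899461680 exceeds the 32-bit int maximum of 2147483647. Giving the constant an explicit unsigned type is one way to fix such a test (a sketch, not the exact chunk_test.go change):

```go
package main

import "fmt"

func main() {
	// On 32-bit platforms, an untyped constant assigned to an int must
	// fit in 32 signed bits; 3899461680 does not, hence the build error.
	// An explicit uint32 type makes the constant portable:
	const checksum uint32 = 3899461680
	fmt.Println(checksum) // prints 3899461680
}
```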

Action Required: Fix Renovate Configuration

There is an error with this repository's Renovate configuration that needs to be fixed. As a precaution, Renovate will stop PRs until it is resolved.

Error type: undefined. Note: this is a nested preset so please contact the preset author if you are unable to fix it yourself.

Improve data channel (SCTP) performance

Motivation

SCTP (data channel) performance is perceived as very low, particularly over a real network with latency and limited bandwidth. No one appears to have properly measured it. We should identify the underlying problems causing the slowness with correct measurements, then tackle them to improve it.

  1. Benchmark
  2. Improve performance (we may create separate tickets, and this issue will be used as an umbrella ticket for those)

(Possible) Deadlock found by pions/webrtc CI

Haven't really dug into this yet, but it looks like we hold a lock while waiting for accept (so we can deadlock the association this way).

Full build https://travis-ci.org/pions/webrtc/builds/513830206?utm_source=github_status&utm_medium=notification

        0x46a6ee        sync.runtime_notifyListWait+0xce                                        /home/travis/go/src/runtime/sema.go:510
        0x48db5d        sync.(*Cond).Wait+0x8d                                                  /home/travis/go/src/sync/cond.go:56
        0x74bac1        github.com/pions/sctp.(*Stream).ReadSCTP+0x181                          /home/travis/gopath/pkg/mod/github.com/pions/[email protected]/stream.go:109
        0x751a1a        github.com/pions/datachannel.(*DataChannel).ReadDataChannel+0x9a        /home/travis/gopath/pkg/mod/github.com/pions/[email protected]/datachannel.go:132
        0x9eb069        github.com/pions/webrtc.(*DataChannel).readLoop+0xe9                    /home/travis/build/pions/webrtc/datachannel.go:262

        0x729e32        github.com/pions/sctp.(*Association).createStream+0x1272        /home/travis/gopath/pkg/mod/github.com/pions/[email protected]/association.go:598
        0x729a28        github.com/pions/sctp.(*Association).getOrCreateStream+0xe68    /home/travis/gopath/pkg/mod/github.com/pions/[email protected]/association.go:610
        0x72995d        github.com/pions/sctp.(*Association).handleData+0xd9d           /home/travis/gopath/pkg/mod/github.com/pions/[email protected]/association.go:522
        0x730aa4        github.com/pions/sctp.(*Association).handleChunk+0x1df4         /home/travis/gopath/pkg/mod/github.com/pions/[email protected]/association.go:1147
        0x7271a5        github.com/pions/sctp.(*Association).handleInbound+0x245        /home/travis/gopath/pkg/mod/github.com/pions/[email protected]/association.go:347
        0x726ce7        github.com/pions/sctp.(*Association).readLoop+0x117             /home/travis/gopath/pkg/mod/github.com/pions/[email protected]/association.go:317

        0x6de226        github.com/pions/dtls.(*Conn).Read+0x86                 /home/travis/gopath/pkg/mod/github.com/pions/[email protected]/conn.go:178
        0x726c9b        github.com/pions/sctp.(*Association).readLoop+0xcb      /home/travis/gopath/pkg/mod/github.com/pions/[email protected]/association.go:312

        0x75122e        github.com/pions/sctp.(*Association).AcceptStream+0x6e                  /home/travis/gopath/pkg/mod/github.com/pions/[email protected]/association.go:579
        0x7511ef        github.com/pions/datachannel.Accept+0x2f                                /home/travis/gopath/pkg/mod/github.com/pions/[email protected]/datachannel.go:73
        0xa12a1c        github.com/pions/webrtc.(*SCTPTransport).acceptDataChannels+0x10c       /home/travis/build/pions/webrtc/sctptransport.go:133

        0x4694bc        sync.runtime_SemacquireMutex+0x3c                                       /home/travis/go/src/runtime/sema.go:71
        0x48f468        sync.(*Mutex).Lock+0x148                                                /home/travis/go/src/sync/mutex.go:134
        0x490699        sync.(*RWMutex).Lock+0x49                                               /home/travis/go/src/sync/rwmutex.go:93
        0x72df82        github.com/pions/sctp.(*Association).sendPayloadData+0x52               /home/travis/gopath/pkg/mod/github.com/pions/[email protected]/association.go:910
        0x74c045        github.com/pions/sctp.(*Stream).WriteSCTP+0x195                         /home/travis/gopath/pkg/mod/github.com/pions/[email protected]/stream.go:168
        0x7523eb        github.com/pions/datachannel.(*DataChannel).writeDataChannelAck+0xab    /home/travis/gopath/pkg/mod/github.com/pions/[email protected]/datachannel.go:227
        0x751619        github.com/pions/datachannel.Server+0x2c9                               /home/travis/gopath/pkg/mod/github.com/pions/[email protected]/datachannel.go:115
        0x75126f        github.com/pions/datachannel.Accept+0xaf                                /home/travis/gopath/pkg/mod/github.com/pions/[email protected]/datachannel.go:80
        0xa12a1c        github.com/pions/webrtc.(*SCTPTransport).acceptDataChannels+0x10c       /home/travis/build/pions/webrtc/sctptransport.go:133

        0xa23c7b        github.com/pions/webrtc.TestPeerConnection_Close+0x2db  /home/travis/build/pions/webrtc/peerconnection_close_test.go:46
        0x545143        testing.tRunner+0x163                                   /home/travis/go/src/testing/testing.go:865

Retransmission of RECONFIG chunk

Summary

Retransmission of RECONFIG chunk has not been implemented yet.

Motivation

This is crucial when the application expects to receive an EOF at the end over a lossy connection.

Describe alternatives you've considered

A workaround to this is to implement end-of-file signaling at the application level.

Additional context

Relates to pion/webrtc#652

Caching issue

Your environment.

What did you do?

I'm using pions/webrtc data channels in a project to do direct file sharing.
The file transfer is stable, but the speed drops heavily over time.
I'm using a data channel initialized like this:

ordered := true
maxPacketLifeTime := uint16(10000)
dataChannel, err := s.peerConnection.CreateDataChannel("data", &webrtc.DataChannelInit{
	Ordered:           &ordered,
	MaxPacketLifeTime: &maxPacketLifeTime,
})

What did you expect?

Transmission speed shouldn't decrease that much over time.

What happened?

The file transfer is stable, but the speed decreases over time, making it almost unusable for long file transfers.
After CPU/memory profiling, I'm pretty confident the problem is that abandoned packets are never removed from orderedPackets (payloadQueue, payload_queue.go). My program spends ~80% of its time in the Association.sendPayloadData method (association.go), and more specifically ~60% of its time sorting the orderedPackets array.

I'm not sure if this behavior is due to a bug or to a part of the RFC spec not being implemented yet.

Here is the profiling data of a 50 MB file transfer:
cpu-profiling.pdf
mem-profiling.pdf

Stream.Close() wouldn't unblock ReadSCTP()

Your environment.

  • Version: v1.6.10

What did you do? & What happened?

In my test tool using SCTP, an attempt to close a stream did not unblock Stream.ReadSCTP() when the association with the remote peer was already gone.

What did you expect?

ReadSCTP() should be unblocked.

Stream doesn't check for io.ErrShortBuffer returns from reassemblyQueue.read

Your environment.

What did you do?

Used the pion/webrtc library for the proxy side of our Snowflake anti-censorship system, which reads from a WebRTC connection to the client. I used keroserene/go-webrtc as the library on the client side and found data being dropped despite the use of a reliable channel.

What did you expect?

I expected to receive all of the data that I sent.

What happened?

Large chunks of data were missing.

More details:

It turns out Stream does not return io.ErrShortBuffer errors from reassemblyQueue.read. Instead, the error is overwritten and the data is lost. Stream.ReadSCTP should return the error instead.

Slow-reader problem

The issue

It appears that SCTP reads incoming data too slowly under high bandwidth usage, causing a buffer (packetio) at the ICE layer to fill up with incoming data (adding a lot of latency before it reaches the SCTP layer) - pion/ice#12.

(Excerpt from the discussion in PR #39)
syscall.pdf
Looking at the trace output (go tool trace), it appears that what blocks the association's readLoop is not the mutex of the stream layer, but the "sendto" syscall (the underlying UDP socket is a blocking socket).
To solve the "slow-reader" problem, we'd probably need a drastic change in SCTP, such as not sending data (a reply) in handleInbound() and instead using another goroutine for the immediate replies (control packets). That raises another question: how much can the SCTP layer buffer those immediate replies?

Related issue: pion/ice#12 and #32

Approach (proposal)

  • Introduce another goroutine, writeLoop in association layer
  • WriteSCTP() stores data in the pending data, wakes up the writeLoop, and returns immediately.
  • handleInbound() (the readLoop routine) DOES NOT CALL association.send(). Instead, it schedules the transmission of immediate control packets for writeLoop to send, and exits as soon as possible (similar to delayed ACK, but immediate).
  • Also consider avoiding read and write routines waiting on the same mutex at the stream layer (e.g., introduce a mutex inside the reassembly queue).

TODO: evaluate if the above proposal is viable first.

T1-init / T1-cookie timers

This issue is a split from #11.
The data channel connectivity issue has become very critical, so I'd like to roll out the T1-init/T1-cookie timers earlier than the T3-rtx/congestion control features.

sctp.Server starts in the closed state, which prevents the cookie exchange with the client

  • the server association is created with sctp.Server, which sets the state to closed (well, it doesn't really set the state)
  • the client association is created with sctp.Client, which sets the state to cookie-wait

The client sends the cookie, but it is ignored by the server because:

	if state != cookieWait {
		// RFC 4960
		// 5.2.3.  Unexpected INIT ACK
		//   If an INIT ACK is received by an endpoint in any state other than the
		//   COOKIE-WAIT state, the endpoint should discard the INIT ACK chunk.
		//   An unexpected INIT ACK usually indicates the processing of an old or
		//   duplicated INIT chunk.
		return nil
	}

Race condition for chunkmap access in payloadQueue

Your environment.

  • Version: v1.7.5
  • Browser: N/A (Snowflake)
  • Other Information - see Tor ticket #33211

What did you do?

There are more details in the ticket, but we found our application going into a CPU-intensive infinite loop. Profiling pointed us towards the markAllToRetransmit function in payload_queue.go: https://github.com/pion/sctp/blob/master/payload_queue.go#L163

I'm not sure whether the infinite-loop behaviour we saw was due to this race condition, but it seems plausible.

RTX Failure: T1-init

What did you do?

Added two PeerConnections and established Opus streams between them

What did you expect?

No logs.

What happened?

Pion printf logged "RTX Failure: T1-init". The audio stream works fine.

Should this happen normally or is it a bug?

Invalid inbound chunks don't cause an ABORT to be generated

handleChunk iterates over each chunk and calls c.check; this function returns whether we should abort.

Right now we ignore the return value and just print it. In the future we need to handle it properly and send an ABORT instead of just logging.

Problems when transmitting ordered data (and unordered data too)

This may not be a problem in sctp (maybe there is a bug in the example, or I'm missing something), but since I have a reproducible case, I think it's worth documenting.

The code is in here: https://github.com/hugoArregui/testsctp/tree/topic-ordered-problem

I see this on the receiver end:

2019/09/26 12:43:31 Received Mbps: 19.459, totalBytesReceived: 2551268
sctp DEBUG: 12:43:31.348204 association.go:967: [0xc0000a5380] receive buffer full. dropping DATA with tsn=3502769449
2019/09/26 12:43:32 Received Mbps: 9.736, totalBytesReceived: 2552496
2019/09/26 12:43:33 Received Mbps: 6.491, totalBytesReceived: 2552496
sctp DEBUG: 12:43:33.348266 association.go:967: [0xc0000a5380] receive buffer full. dropping DATA with tsn=3502769449
2019/09/26 12:43:34 Received Mbps: 4.870, totalBytesReceived: 2553724
2019/09/26 12:43:35 Received Mbps: 3.896, totalBytesReceived: 2553724
2019/09/26 12:43:36 Received Mbps: 3.247, totalBytesReceived: 2553724
2019/09/26 12:43:37 Received Mbps: 2.783, totalBytesReceived: 2553724
sctp DEBUG: 12:43:37.348694 association.go:967: [0xc0000a5380] receive buffer full. dropping DATA with tsn=3502769449
2019/09/26 12:43:38 Received Mbps: 2.437, totalBytesReceived: 2554952
2019/09/26 12:43:39 Received Mbps: 2.166, totalBytesReceived: 2554952

And this on the sender end:

2019/09/26 12:43:31 Sent Mbps: 19.461, totalBytesSent: 2551268, bufferedAmout: 557056
sctp DEBUG: 12:43:31.347734 association.go:2011: [0xc00009f380] T3-rtx timed out: nRtos=1 cwnd=1228 ssthresh=41486
sctp DEBUG: 12:43:31.347962 association.go:518: [0xc00009f380] retransmitting 1228 bytes
2019/09/26 12:43:32 Sent Mbps: 9.736, totalBytesSent: 2552496, bufferedAmout: 557056
2019/09/26 12:43:33 Sent Mbps: 6.491, totalBytesSent: 2552496, bufferedAmout: 557056
sctp DEBUG: 12:43:33.348043 association.go:2011: [0xc00009f380] T3-rtx timed out: nRtos=2 cwnd=1228 ssthresh=4912
sctp DEBUG: 12:43:33.348124 association.go:518: [0xc00009f380] retransmitting 1228 bytes
2019/09/26 12:43:34 Sent Mbps: 4.871, totalBytesSent: 2553724, bufferedAmout: 557056
2019/09/26 12:43:35 Sent Mbps: 3.897, totalBytesSent: 2553724, bufferedAmout: 557056
2019/09/26 12:43:36 Sent Mbps: 3.247, totalBytesSent: 2553724, bufferedAmout: 557056
2019/09/26 12:43:37 Sent Mbps: 2.783, totalBytesSent: 2553724, bufferedAmout: 557056
sctp DEBUG: 12:43:37.348328 association.go:2011: [0xc00009f380] T3-rtx timed out: nRtos=3 cwnd=1228 ssthresh=4912
sctp DEBUG: 12:43:37.348477 association.go:518: [0xc00009f380] retransmitting 1228 bytes

I think for some reason the reassembly queue cannot be read (a missing packet, or a bug somewhere), which causes the receive buffer to stay full, which in turn causes the sender to halt.

Data race during the test

On CI environment,

=== RUN   TestStats
--- PASS: TestStats (0.00s)
==================
WARNING: DATA RACE
Write at 0x00c0000b2098 by goroutine 83:
  github.com/pion/sctp.(*fakeEchoConn).Read()
      /home/travis/gopath/src/github.com/pion/sctp/association_test.go:2216 +0x180
  github.com/pion/sctp.(*Association).readLoop()
      /home/travis/gopath/src/github.com/pion/sctp/association.go:421 +0x271

Previous read at 0x00c0000b2098 by goroutine 27:
  github.com/pion/sctp.TestStats()
      /home/travis/gopath/src/github.com/pion/sctp/association_test.go:2314 +0x38c
  testing.tRunner()
      /home/travis/.gimme/versions/go1.13.6.linux.amd64/src/testing/testing.go:909 +0x199

Goroutine 83 (running) created at:
  github.com/pion/sctp.(*Association).init()
      /home/travis/gopath/src/github.com/pion/sctp/association.go:298 +0xdb
  github.com/pion/sctp.Client()
      /home/travis/gopath/src/github.com/pion/sctp/association.go:218 +0x9a
  github.com/pion/sctp.TestStats()
      /home/travis/gopath/src/github.com/pion/sctp/association_test.go:2307 +0x255
  testing.tRunner()
      /home/travis/.gimme/versions/go1.13.6.linux.amd64/src/testing/testing.go:909 +0x199

Goroutine 27 (running) created at:
  testing.(*T).Run()
      /home/travis/.gimme/versions/go1.13.6.linux.amd64/src/testing/testing.go:960 +0x651
  testing.runTests.func1()
      /home/travis/.gimme/versions/go1.13.6.linux.amd64/src/testing/testing.go:1202 +0xa6
  testing.tRunner()
      /home/travis/.gimme/versions/go1.13.6.linux.amd64/src/testing/testing.go:909 +0x199
  testing.runTests()
      /home/travis/.gimme/versions/go1.13.6.linux.amd64/src/testing/testing.go:1200 +0x521
  testing.(*M).Run()
      /home/travis/.gimme/versions/go1.13.6.linux.amd64/src/testing/testing.go:1117 +0x2ff
  main.main()
      _testmain.go:294 +0x347
==================

Minimal client / server example fails handshaking with "chunkInit received in state 'Closed'"

Your environment.

  • Version: v1.6.10
  • Go 1.12.4

What did you do?

Created minimal client / server example as shown here: https://gist.github.com/richp10/b5afc98353e548533385af55f587da63

What did you expect?

Expected handshake to complete and be able to communicate across the stream.

What happened?

CLIENT:

sctp DEBUG: 08:36:56.509457 association.go:712: [0xc00008c1a0] state change: 'Closed' => 'CookieWait'
sctp DEBUG: 08:36:56.512453 association.go:305: [0xc00008c1a0] sending INIT
sctp DEBUG: 08:36:56.509457 association.go:395: [0xc00008c1a0] readLoop entered
sctp DEBUG: 08:36:56.509457 association.go:418: [0xc00008c1a0] writeLoop entered
sctp DEBUG: 08:36:59.512983 association.go:305: [0xc00008c1a0] sending INIT
sctp DEBUG: 08:37:05.513388 association.go:305: [0xc00008c1a0] sending INIT
sctp DEBUG: 08:37:17.513699 association.go:305: [0xc00008c1a0] sending INIT

SERVER:

sctp DEBUG: 08:36:47.848844 association.go:395: [0xc0000761a0] readLoop entered
sctp DEBUG: 08:36:56.514451 association.go:747: [0xc0000761a0] chunkInit received in state 'Closed'
sctp WARNING: 2019/09/24 08:36:56 [0xc0000761a0] failed to write packets on netConn: write udp [::]:10001: wsasend: A request
 to send or receive data was disallowed because the socket is not connected and (when sending on a datagram socket using a se
ndto call) no address was supplied.
sctp DEBUG: 08:36:56.515451 association.go:430: [0xc0000761a0] writeLoop ended
sctp DEBUG: 08:36:56.515451 association.go:446: [0xc0000761a0] writeLoop exited
sctp DEBUG: 08:36:59.512983 association.go:747: [0xc0000761a0] chunkInit received in state 'Closed'
sctp DEBUG: 08:37:05.513388 association.go:747: [0xc0000761a0] chunkInit received in state 'Closed'
sctp DEBUG: 08:37:17.513699 association.go:747: [0xc0000761a0] chunkInit received in state 'Closed'

This is more likely a misunderstanding on my part than a bug in the library, but I hope you can help either way!

Reliability

In order to pass the libp2p test cases used in backkem/go-libp2p-webrtc-direct, we should be able to pass the TestAssocStressDuplex test with about 15,000,000 messages. Right now, I'm able to make it work with around 1,500 messages. More will break the current implementation and cause the test to hang forever.
As discussed before, this is likely caused by a combination of a SACK storm (reader far behind), packet loss, and no retransmission timer to fall back on.

This can probably be fixed by a combination of T3 timer (#11) and a simple congestion window (#14).
