Review of Signaling in different WebRTC applications
This post provides a quick review of the signaling channel implementation in various popular WebRTC platforms. It examines the protocol used for the channel, how messages are serialized, and whether the applications use Session Description Protocol (SDP) as an opaque string over the wire, or if they instead send the required parameters in a custom format.
To provide a variety of platforms, I have included a mix of popular end-user applications, cloud providers, and open-source implementations in the table. If you would like, I am happy to add others to the list.
How was it tested?
To test it, join a room and check in Chrome Developer Tools whether there are WebSocket connections established or periodic HTTP requests being made. Then inspect the messages on those connections and requests and check whether the format is binary, JSON, or XML. Binary messages are harder to read, and the payload may be compressed or encrypted with JSON or SDP inside.
Next, look inside those messages for the typical information found in an SDP description, such as the ufrag, header extensions, and payload types, and see how it is sent.
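The format check can also be done on captured payloads in code. The following is only a rough sketch of that idea, not the procedure used for this review: `classifyPayload` and `PayloadFormat` are hypothetical names, and the heuristic assumes that uncompressed, unencrypted JSON or XML shows up as readable text.

```typescript
type PayloadFormat = "json" | "xml" | "binary" | "unknown";

// Classify one captured WebSocket frame or HTTP body.
// Text frames arrive as strings, binary frames as bytes.
function classifyPayload(data: string | Uint8Array): PayloadFormat {
  // Decode bytes one character per byte. This is not real UTF-8 decoding,
  // but it is enough to recognise JSON or XML punctuation.
  const text =
    typeof data === "string"
      ? data
      : Array.from(data, (b) => String.fromCharCode(b)).join("");

  // Control characters (other than common whitespace) suggest a binary encoding.
  if (/[\x00-\x08\x0e-\x1f]/.test(text)) return "binary";

  const trimmed = text.trim();
  if (trimmed.startsWith("{") || trimmed.startsWith("[")) {
    try {
      JSON.parse(trimmed);
      return "json";
    } catch {
      // Looked like JSON but did not parse; keep checking.
    }
  }
  if (trimmed.startsWith("<") && trimmed.endsWith(">")) return "xml";
  return "unknown";
}
```

A frame that comes back as "binary" may still carry JSON or SDP once decompressed or decrypted, so this only describes what is visible on the wire.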
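One way to tell whether an application sends raw SDP is to search the message text for the standard SDP attribute prefixes. A minimal sketch, assuming the message is readable text; `SDP_MARKERS` and `findSdpMarkers` are names made up for this illustration:

```typescript
// Standard SDP attribute prefixes for the fields mentioned above.
const SDP_MARKERS = {
  ufrag: "a=ice-ufrag:",
  headerExtension: "a=extmap:",
  payloadType: "a=rtpmap:",
} as const;

type SdpMarker = keyof typeof SDP_MARKERS;

// Return which SDP attribute prefixes appear verbatim in a message.
// An SDP embedded in JSON has its line breaks escaped, but the
// attribute prefixes themselves stay intact, so a substring search works.
function findSdpMarkers(message: string): SdpMarker[] {
  return (Object.keys(SDP_MARKERS) as SdpMarker[]).filter((name) =>
    message.includes(SDP_MARKERS[name])
  );
}
```

If the list is empty but the message still contains a ufrag or payload type numbers under other field names, the application is most likely sending the parameters in a custom format.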
Most applications seem to use WebSockets, but some of the biggest ones use HTTP polling. This may be due to historical reasons, to better handle specific users with restricted networks or devices, or to react faster to disconnections in unstable connections.
In the case of Google Meet, it looks like part of the signaling goes over data channels in addition to the existing HTTP long polling mechanism.
The serialization formats are all over the place: binary (the most efficient), JSON (the simplest), and XML when XMPP is used. JSON can be less efficient over the wire, but it requires fewer libraries and less runtime support to process in a browser. The ProtoJSON approach used by Google Meet looks like a good compromise between the two.
Most applications do not seem to send SDP over the wire. Instead, they send only the parameters needed and keep the SDP as an implementation detail on the client side. That is slightly more efficient and gives you more flexibility, but adding those conversions is less straightforward if all your endpoints are libwebrtc based.
The conclusion could be that there is no single right solution for WebRTC signaling, but rather several alternatives that can all work well. However, two of the most common approaches seem to be using WebSockets and not sending SDP on the wire, which is something I am personally very happy about.
Did you find something else interesting, or maybe even something that is not accurate in this post? Feedback is more than welcome.
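To illustrate what such a conversion involves, here is a small sketch that pulls a few parameters out of an SDP string into a custom object. The `SignalingParams` shape and the `sdpToParams` name are invented for this example and do not correspond to any of the reviewed applications. It also flattens all media sections together, which a real implementation would not do.

```typescript
// Hypothetical custom signaling format: only the fields the peer needs.
interface SignalingParams {
  iceUfrag?: string;
  icePwd?: string;
  headerExtensions: { id: number; uri: string }[];
  payloadTypes: { pt: number; codec: string; clockRate: number }[];
}

// Extract ICE credentials, header extensions and payload types from an SDP.
// Simplification: attributes from every m= section end up in one flat object.
function sdpToParams(sdp: string): SignalingParams {
  const params: SignalingParams = { headerExtensions: [], payloadTypes: [] };
  for (const line of sdp.split(/\r?\n/)) {
    let m: RegExpMatchArray | null;
    if ((m = line.match(/^a=ice-ufrag:(\S+)/))) {
      params.iceUfrag = m[1];
    } else if ((m = line.match(/^a=ice-pwd:(\S+)/))) {
      params.icePwd = m[1];
    } else if ((m = line.match(/^a=extmap:(\d+)(?:\/\S+)?\s+(\S+)/))) {
      // a=extmap:<id>[/<direction>] <uri>
      params.headerExtensions.push({ id: Number(m[1]), uri: m[2] });
    } else if ((m = line.match(/^a=rtpmap:(\d+)\s+([^/\s]+)\/(\d+)/))) {
      // a=rtpmap:<payload type> <codec>/<clock rate>[/<channels>]
      params.payloadTypes.push({
        pt: Number(m[1]),
        codec: m[2],
        clockRate: Number(m[3]),
      });
    }
  }
  return params;
}
```

The receiving side needs the reverse step, rebuilding an SDP from these parameters before calling setRemoteDescription, which is where most of the extra work lies for libwebrtc-based endpoints.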