Showing posts from May, 2023

Review of Signaling in different WebRTC applications

  This post provides a quick review of the signaling channel implementation in various popular WebRTC platforms. It examines the protocol used for the channel, how messages are serialized, and whether the applications use Session Description Protocol (SDP) as an opaque string over the wire, or if they instead send the required parameters in a custom format. To provide a variety of platforms, I have included a mix of popular end-user applications, cloud providers, and open-source implementations in the table. If you would like, I am happy to add others to the list.
  How was it tested?
  To test it, join a room and check in Chrome Developer Tools whether there are WebSocket connections established or periodic HTTP requests being made. Then, inspect the messages of those connections and requests and check if the format is Binary/JSON/XML. In case of Binary messages, it's harder to see the content, and there's a chance that the information is compressed/encrypted, and there's s
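  The manual check described above (is a captured message Binary, JSON, or XML, and does it carry SDP as an opaque string?) can be roughly automated. The following is only a minimal sketch, not the method used for this review: the function names `classify_payload` and `contains_opaque_sdp` are hypothetical, the input is assumed to be a message copied out of the browser's network inspector, and treating any string that starts with `v=0` as SDP is a heuristic.

```python
import json
import xml.etree.ElementTree as ET


def classify_payload(payload):
    """Guess the serialization of one captured signaling message.

    Returns 'json', 'xml', 'binary' (not valid UTF-8), or 'unknown'.
    Hypothetical helper: a heuristic, not an exhaustive detector.
    """
    if isinstance(payload, (bytes, bytearray)):
        try:
            text = bytes(payload).decode("utf-8")
        except UnicodeDecodeError:
            # Could be protobuf, compressed, or encrypted data.
            return "binary"
    else:
        text = payload

    text = text.strip()
    if text.startswith(("{", "[")):
        try:
            json.loads(text)
            return "json"
        except ValueError:
            pass
    if text.startswith("<"):
        try:
            ET.fromstring(text)
            return "xml"
        except (ET.ParseError, ValueError):
            pass
    return "unknown"


def contains_opaque_sdp(value):
    """Return True if a decoded JSON value holds a string that looks like SDP.

    Assumption: an SDP blob sent as an opaque string starts with 'v=0'.
    """
    if isinstance(value, str):
        return value.lstrip().startswith("v=0")
    if isinstance(value, dict):
        return any(contains_opaque_sdp(v) for v in value.values())
    if isinstance(value, list):
        return any(contains_opaque_sdp(v) for v in value)
    return False
```

  For example, a message such as `{"type": "offer", "sdp": "v=0\r\n..."}` would be classified as `json` and flagged as carrying opaque SDP, while a message that sends codecs and candidates as separate fields would not be flagged.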

Perfect Interactive Broadcasting Architecture

While we might sometimes talk about low-latency or interactive broadcasting in a generic way, it's important to note that there are actually two distinct types of streaming use cases that require different levels of interactivity:
Conversational use cases, where multiple participants are talking together and that conversation is being streamed to many other viewers. These viewers can potentially become speakers at some point too.
Single stream use cases, where just one person is streaming their video feed (it can be their camera, their screen, or a combination of both) to many other viewers who can interact in different ways. The most obvious way is through chat messages, but it can also include emoji reactions or even bids on an auction being streamed.
The conversational use case has specific requirements. For instance, it demands effective synchronization of multiple streams, ultra-low latency (less than 250ms) only between the users who are speaking, and an element that per

WebRTC header extensions review

WebRTC supports the concept of RTP header extensions to extend media packets with additional metadata. One of the most common use cases is to attach the audio level to audio packets so that the server can calculate active speakers without having to decode the audio packets. Some of these header extensions are standard and have been used for a while, but there are others that are added by Google when needed and are only lightly documented on the website and in the libwebrtc code. These header extensions and their usage are not very well known, and this post is an attempt to give them visibility in the WebRTC community. To discover some of these headers you can usually take a look at the Offer/Answer of a Google Meet session or take a look at the libwebrtc source code here:
Audio Levels (urn:ietf:params:rtp-hdrext:ssrc-audio-level) Doc [Very common] The header contains the audio level (volume) of the audi
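
As an illustration of the audio level use case mentioned above, here is a minimal sketch of how a server might read the `ssrc-audio-level` extension payload, assuming the layout defined in RFC 6464: a single byte whose top bit is a voice activity flag and whose lower 7 bits hold the level as -dBov (0 is loudest, 127 is silence). The function names `parse_audio_level` and `loudest_speaker` are hypothetical, and the sketch assumes the one-byte payload has already been extracted from the RTP header extension block.

```python
def parse_audio_level(ext_data: bytes):
    """Decode the one-byte ssrc-audio-level payload (RFC 6464 layout).

    Returns (voice_activity, level_dbov), where level_dbov is in the
    range -127..0 and values closer to 0 are louder.
    """
    if len(ext_data) < 1:
        raise ValueError("audio level extension payload is empty")
    byte = ext_data[0]
    voice_activity = bool(byte & 0x80)  # V flag: top bit
    level_dbov = -(byte & 0x7F)         # lower 7 bits are -dBov
    return voice_activity, level_dbov


def loudest_speaker(levels):
    """Pick the participant with the loudest most recent audio level.

    `levels` maps a participant id to its latest extension payload.
    Hypothetical helper: a real active speaker detector would smooth
    the levels over time instead of looking at a single packet.
    """
    if not levels:
        return None
    return max(levels, key=lambda pid: parse_audio_level(levels[pid])[1])
```

Because the level travels in the RTP header rather than in the encoded audio, a server can run this kind of comparison on every packet without decoding the payload.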