Audio Mixing or Forwarding

How many audio streams should your WebRTC server forward to the participants in a room? There are various options, ranging from the simplest approach of forwarding everything, to the most extreme option of mixing all audio and sending just a single stream.

A few weeks ago, we engaged in a Twitter conversation about this very topic. Following that discussion, bloggeek also wrote a post on the subject. I always find it interesting to see what different types of applications are doing, because at least some of them can run A/B tests and compare the results across millions of users before making a decision.

The simplest way to find out what each application does is to join a room and inspect the SDP (Session Description Protocol) in chrome://webrtc-internals. Within this tool, you can examine how many audio channels are being forwarded while you are in a room and look for potential clues within the SDP (some people use the "mixed" identifier for audio tracks, among other things). For example, in the case of Google Meet, the SDP shows that joining a room creates three hardcoded audio streams:

m=audio 9 UDP/TLS/RTP/SAVPF 111 63 9 0 8 13 110 126, mid=3
m=audio 9 UDP/TLS/RTP/SAVPF 111 63 9 0 8 13 110 126, mid=4
m=audio 9 UDP/TLS/RTP/SAVPF 111 63 9 0 8 13 110 126, mid=5
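
If you would rather not count those lines by eye, a small helper can do it for you. The following is only a minimal sketch: the function name `countMediaSections` and the abbreviated `exampleSdp` are made up for illustration, and the sample SDP is not a real Google Meet capture.

```typescript
// Count the media sections of a given kind ("audio", "video", ...) in an SDP blob.
// Each media section starts with a line of the form "m=<kind> <port> <proto> <formats>".
function countMediaSections(sdp: string, kind: string): number {
  const prefix = "m=" + kind + " ";
  return sdp
    .split(/\r?\n/)
    .filter((line) => line.indexOf(prefix) === 0)
    .length;
}

// Abbreviated, hypothetical SDP used only to exercise the helper.
const exampleSdp = [
  "v=0",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111 63 9 0 8 13 110 126",
  "a=mid:3",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111 63 9 0 8 13 110 126",
  "a=mid:4",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111 63 9 0 8 13 110 126",
  "a=mid:5",
  "m=video 9 UDP/TLS/RTP/SAVPF 96 97",
  "a=mid:6",
].join("\r\n");

const audioCount = countMediaSections(exampleSdp, "audio"); // 3
```

In a browser you could pass it the SDP text copied from chrome://webrtc-internals. Keep in mind that the raw count of audio sections may also include the section used to send your own microphone, so treat the number as a clue rather than a definitive answer.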

All the tests below were done with the web clients. Zoom was not included because it doesn't make use of standard WebRTC and everything is binary and compiled, so it would take more effort to reverse engineer it (for example, by analysing the audio bitrates when multiple participants speak).

What's the optimal approach for this task? There isn't a one-size-fits-all solution; it hinges on your particular use case and specific requirements. The most straightforward implementation is plain forwarding, which is what the majority of open-source projects and platforms employ. Interestingly, enterprise solutions tend to lean toward mixing. This inclination might be attributed to the fact that these services have evolved from earlier audio mixing solutions, the necessity for PSTN interconnection, or the need to accommodate unusual scenarios, such as devices with extremely limited available bandwidth.
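
To make the mixing end of the spectrum concrete, here is a minimal sketch of what "mixing all audio into a single stream" means at the sample level. It assumes the server has already decoded each participant's audio into 16-bit PCM frames; the function name `mixFrames` is hypothetical, and a real mixer would also handle re-encoding, leave each listener's own voice out of their mix, and usually apply gain control instead of plain clamping.

```typescript
// Mix several decoded 16-bit PCM frames into one frame by summing
// the samples and clamping the result to the int16 range.
function mixFrames(frames: Int16Array[]): Int16Array {
  if (frames.length === 0) {
    return new Int16Array(0);
  }
  // Use the shortest frame so every index is valid for all inputs.
  const length = Math.min(...frames.map((frame) => frame.length));
  const mixed = new Int16Array(length);
  for (let i = 0; i < length; i++) {
    let sum = 0;
    for (const frame of frames) {
      sum += frame[i];
    }
    // Clamp to avoid integer wrap-around when loud speakers overlap.
    mixed[i] = Math.max(-32768, Math.min(32767, sum));
  }
  return mixed;
}

// Three participants, two samples each (made-up values).
const mixedFrame = mixFrames([
  new Int16Array([1000, -2000]),
  new Int16Array([500, 30000]),
  new Int16Array([0, 10000]),
]);
// mixedFrame is [1500, 32767]: the second sample is clamped.
```

With plain forwarding the server skips all of this and relays each incoming stream untouched, which is part of why it is the simpler option to implement.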
