Existing WebRTC is not great for broadcasting use cases

WebRTC was originally designed for real-time communication among a small number of participants, where latency requirements are extremely strict (typically <250 ms). However, it has also been adopted for broadcasting use cases, for example by YouTube Studio or Cloudflare CDN, a space traditionally served by different protocols, typically Adobe's RTMP and HTTP-based ones.

WebRTC enables a new range of broadcasting use cases, particularly those requiring hyper-low latency, such as streams with audience interactivity, for instance user reactions or auctions. However, choosing WebRTC comes with tradeoffs: increased complexity, scalability challenges, and lower quality. While it is possible to address the first two with enough time and effort, the primary concern should be how to obtain the best possible quality.

Why do we have lower quality when using WebRTC?

First of all, a clarification: in a perfect network with infinite bandwidth there is not much difference in quality. In restricted or flaky networks, however, these are some of the reasons why WebRTC provides lower quality than RTMP- and HTTP-based solutions:
  • Smaller buffers: WebRTC implementations prioritize latency over quality by using small buffers, which can sometimes be too small to receive retransmissions in time. TCP-based protocols with large buffers prioritize quality over latency and do not face this issue.
  • Aggressive congestion control: WebRTC employs aggressive congestion control that quickly reduces video transmission when congestion is detected, to prevent increased latency and potential packet loss. This helps keep latency low and predictable, but in many cases being less aggressive would maintain higher quality at the expense of higher latency.
  • Real-time-only reliability: If you lose connectivity for a few seconds, WebRTC will not queue the media and deliver it once you recover. Likewise, it will not recover a previous frame after a newer one has been decoded. The most obvious impact of this is on server-side recordings.
  • Fast encoding: WebRTC codec configurations are optimized for speed: they do not use two-pass encoding or bidirectional predicted (B) frames, relying instead on conservative settings that ensure fast encoding. This means the encoding efficiency is not as good as that of broadcasting applications, resulting in slightly worse quality for a given bitrate.
  • Unpredictable bitrate: Because WebRTC cannot use two-pass encoding, the quality/bitrate decisions made by the codecs are less optimal and less stable.
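To make the congestion-control tradeoff concrete, here is a minimal sketch. This is not the actual libwebrtc estimator, and the backoff factors are invented for illustration; it only shows the shape of the tradeoff between an aggressive policy that cuts the send rate hard on congestion and a gentler one that preserves quality at the cost of latency:

```javascript
// Hypothetical illustration, not the real WebRTC congestion controller:
// the factors below are made up to show the shape of the tradeoff.

// Aggressive policy: cut the rate hard whenever overuse is detected,
// keeping network queues (and therefore latency) small.
function aggressiveBackoff(kbps, overuseDetected) {
  return Math.round(overuseDetected ? kbps * 0.85 : kbps * 1.05);
}

// Gentle policy: back off slowly, accepting longer queues (more latency)
// in exchange for sustaining a higher video bitrate.
function gentleBackoff(kbps, overuseDetected) {
  return Math.round(overuseDetected ? kbps * 0.97 : kbps * 1.02);
}

// Five consecutive overuse signals, starting from 2000 kbps.
let aggressiveRate = 2000;
let gentleRate = 2000;
for (let i = 0; i < 5; i++) {
  aggressiveRate = aggressiveBackoff(aggressiveRate, true);
  gentleRate = gentleBackoff(gentleRate, true);
}
console.log(aggressiveRate, gentleRate); // 887 1718
```

After the same burst of congestion signals, the aggressive policy ends up at roughly half the bitrate of the gentle one, which is why a latency-first protocol tends to show visibly lower quality on flaky networks.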
To be honest, some of these limitations can be solved if you control the WebRTC implementation (e.g. in a native mobile or desktop app), but that is not easy, or even possible, nowadays for web applications, where the WebRTC stack is embedded in the browser.

What should we do for interactive broadcasting?

Ideally we need a protocol where we can configure the acceptable latency in a range from real time (<250 ms) to traditional broadcasting (5-10 s). Once that constraint is set, the protocol needs to provide the best possible quality under those conditions. That is doable, and we will probably see it happen in one or both of these ways:
  • More tunable WebRTC. With more flexibility in the WebRTC APIs it should be possible to overcome the limitations described in the previous section, for example a configurable congestion controller with settings like RTCPeerConnection.congestionControl = "low latency", or lower-level APIs such as the new WebCodecs API.
  • New protocols working over UDP that reuse some WebRTC techniques but provide more flexibility (e.g. new QUIC-based protocols like RUSH). Implementations built from scratch on top of those protocols can overcome most of these limitations because they leave part of the problem (buffering or encoding) to the application, although they would also need the ability to control the congestion control behavior.
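As a sketch of the lower-level control WebCodecs already offers, its encoder configuration exposes a `latencyMode` field that lets the application choose between real-time encoding and better compression efficiency. The codec string, resolution, bitrate, and the packetization step are placeholder assumptions here, and `VideoEncoder` is only available in browsers that implement WebCodecs:

```javascript
// Sketch only: the codec string and bitrate are example values, and the
// output callback is a stub where a real app would packetize and send
// the encoded chunks (e.g. over WebTransport/QUIC).
const encoderConfig = {
  codec: "avc1.42001f",   // example H.264 Baseline codec string
  width: 1280,
  height: 720,
  bitrate: 2_500_000,     // 2.5 Mbps target
  framerate: 30,
  latencyMode: "quality", // favor compression efficiency over encode speed
};

// VideoEncoder exists only in WebCodecs-capable browsers.
if (typeof VideoEncoder !== "undefined") {
  const encoder = new VideoEncoder({
    output: (chunk, metadata) => {
      // Packetize `chunk` and hand it to the transport of your choice.
    },
    error: (e) => console.error("encoder error:", e),
  });
  encoder.configure(encoderConfig);
}
```

Switching `latencyMode` to "realtime" would ask the encoder to behave more like today's WebRTC pipeline; "quality" is the kind of knob a broadcasting application with a relaxed latency budget would want.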

So should I use WebRTC or not?

I would use it only when it is really required and provides enough business value to be worth the additional complexity and potential quality degradation. Streaming a traditional soccer game probably does not need it, while a stream with audience interactivity (e.g. reactions) can get real value from it, and the quality loss can be worth the benefits in terms of engagement.

Also, keep in mind that we don't need end-to-end WebRTC. For example, in cases where low latency is needed only between speakers, it could make sense to use WebRTC for ingestion but keep using HLS for playback. Or, in some cases, use RTMP for ingestion from broadcasters with good connectivity and RUSH for playback.

