Friday, March 17, 2017

How are retransmissions implemented in WebRTC

Executive Summary: This ended up being more complex than expected; if you are building an SFU, just take the & co files and use them.  Or even better, just use OpenTok or any other existing platform.   Anyway, it is fun stuff to play with.


When your network is congested and some of the packets are lost you need a mechanism to recover them.   If we ignore signal processing tricks (like stretching the previous and next audio packets in the jitter buffer) there are two alternatives:
  • Forward error correction: in each packet you add some extra information about the previous ones in case they are lost and you need to reconstruct them (flexfec is the new format to do this in WebRTC [1]).
  • Retransmissions: the sender will resend the packets usually after the receiver requests that with a negative acknowledgement message (NACK) indicating which packets were lost.
These mechanisms can be combined depending on the network conditions and can also be tuned for specific cases like scalable video, as described in [2]. In Chrome, retransmissions are implemented for both audio and video, but by default they are only enabled for video.

This post is about the retransmissions approach, specifically about when to ask for a retransmission and when not to do it.

Disclaimer: Please report any mistake/inaccuracy in this post and I will fix it asap for future reference. BTW this post is not about rtx "encapsulation" vs the legacy way of sending retransmissions, but I'm happy to talk about that on Twitter :)

Implementation in the RTP Receiver Side

The RTP receiver side is the one requesting retransmissions, sending a NACK RTCP packet when it detects a missing packet.

But... should I request a retransmission immediately after detecting a gap in the sequence numbers?  Google's implementation does that and also has a periodic timer to keep requesting them.

And... for how long should I keep requesting the retransmission? Chrome keeps requesting the retransmission of a specific packet unless one of these happens: the sequence number is "more than 10000 old", the list of missing packets grows beyond 1000 entries, the same packet has already been requested 10 times, or you have a new decodable full frame (no packet is missing for any other frame it depends on).

This is the Google implementation of WebRTC (and the same code is copied into Firefox and Safari) transcribed as pseudocode:
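The conditions above can be sketched in JavaScript as follows. This is a hedged reconstruction, not the actual webrtc.org code: the thresholds come from the text, but the class and method names are mine.

```javascript
// Sketch of the receiver-side NACK bookkeeping described above.
const MAX_PACKET_AGE = 10000;   // drop requests when the seq number is this old
const MAX_NACK_PACKETS = 1000;  // clear the whole list if it grows beyond this
const MAX_NACK_RETRIES = 10;    // give up after asking for a packet this many times

class NackList {
  constructor() {
    this.missing = new Map(); // sequence number -> times already requested
  }
  // Called for every received packet; a gap marks intermediate seqs as missing
  onPacket(seq, lastSeq) {
    for (let s = lastSeq + 1; s < seq; s++) this.missing.set(s, 0);
    if (this.missing.size > MAX_NACK_PACKETS) this.missing.clear();
  }
  // Called on gap detection and from the periodic timer
  buildNack(currentSeq) {
    const nack = [];
    for (const [seq, retries] of this.missing) {
      if (currentSeq - seq > MAX_PACKET_AGE || retries >= MAX_NACK_RETRIES) {
        this.missing.delete(seq); // too old or requested too many times
      } else {
        this.missing.set(seq, retries + 1);
        nack.push(seq);
      }
    }
    return nack;
  }
}
```

The "new decodable full frame" condition is not modeled here because it needs frame-level bookkeeping from the jitter buffer.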

Implementation in the RTP Sender Side

The RTP sender side is the one receiving the retransmission requests (NACKs) and resending the lost packets if possible.

The main question here is whether I should obey all the retransmission requests or not.  The way this is implemented in Google's WebRTC implementation right now is:
  1. Keep a copy of the packets sent in the last 1000 msecs (the "history").
  2. When a NACK is received, try to resend the requested packets if we still have them in the history.
  3. But...
    • Ignore the request if the packet has been resent in the last RTT msecs.
    • Ignore the request if we are sending too many retransmissions.  There is a rate limiter (retransmission_rate_limiter) that is apparently configured to the whole channel bandwidth estimation.
    • If pacing is enabled, insert this packet in the queue with normal priority.
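Those steps can be sketched like this. Again this is an illustrative sketch, not the real code; the class, the rate limiter interface and the packet shape are assumptions.

```javascript
// Sketch of the sender-side retransmission logic described above.
const HISTORY_MS = 1000; // keep sent packets available for this long

class RtxSender {
  constructor(rateLimiter, rttMs) {
    this.history = new Map(); // seq -> { packet, sentAt, resentAt }
    this.rateLimiter = rateLimiter; // stands in for retransmission_rate_limiter
    this.rttMs = rttMs;
  }
  // Record every sent packet and prune the history window
  onSend(seq, packet, now) {
    this.history.set(seq, { packet, sentAt: now, resentAt: -Infinity });
    for (const [s, e] of this.history)
      if (now - e.sentAt > HISTORY_MS) this.history.delete(s);
  }
  // Handle a NACK: return the packets that should actually be resent
  onNack(seqs, now) {
    const resend = [];
    for (const seq of seqs) {
      const entry = this.history.get(seq);
      if (!entry) continue;                            // no longer in history
      if (now - entry.resentAt < this.rttMs) continue; // resent within last RTT
      if (!this.rateLimiter.tryConsume(entry.packet.length)) continue; // over budget
      entry.resentAt = now;
      resend.push(entry.packet); // in Chrome this would go to the pacer queue
    }
    return resend;
  }
}
```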

Monday, January 30, 2017

Bandwidth Estimation in WebRTC (and the new Sender Side BWE)

Bandwidth estimation is probably the most critical component in the video engine of WebRTC. The bandwidth estimation (BWE) module is responsible for deciding how much video* traffic you can send without congesting the network to prevent degradation of the video quality.

In the past, bandwidth estimation algorithms used to be very rudimentary and based mostly on packet loss. Basically, we used to slowly increase the video bitrate until we detected packets being lost.  To detect the packet loss you use the standard RTCP feedback mechanisms, where the receiver side reports packet loss periodically using RTCP Receiver Report (RR) messages.

Modern bandwidth estimation algorithms are more advanced and try to detect congestion before it is bad enough to make routers discard packets. These algorithms predict congestion by analyzing the delay between packets. The idea is that when you start having some congestion, the buffers in the routers will start filling and the delay will be more variable. Some popular examples of these algorithms are Google Congestion Control (the one used in WebRTC), SCReAM and SPROUT.  If you want to read more about the history and status of congestion control standards you can read this very interesting post from Randell Jesup.

From the very beginning of WebRTC, the media engine (which is built by Google but included in both Chrome and Firefox) was based on the concept of remote bandwidth estimation. As explained before, the receiver of the traffic analyzes the inter-packet delay and generates an estimation of the available bandwidth, which is reported back to the sender using RTCP messages with a new message type that was defined for this purpose: REMB. Another detail of the WebRTC implementation is that the sender uses not only the bandwidth estimation received in the REMB packet but also the packet loss feedback to decide the final value of the target video bitrate to be sent.

Sender pseudocode:
    if (lossRate < 2%) video_bitrate *= 1.08
    if (lossRate > 10%) video_bitrate *= (1 - 0.5*lossRate)
    if (video_bitrate > bwe) video_bitrate = bwe

The nice consequence of that implementation is that it reduces bitrate quickly when overuse is detected while slowly increasing bitrate when no congestion is detected.
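That pseudocode can be written as a small runnable function. A sketch: lossRate is expressed as a fraction, bitrates are in kbps, and the function name is mine.

```javascript
// Loss-based rate control from the pseudocode above, capped by the
// delay-based estimate (bwe). Bitrates in kbps, lossRate in [0, 1].
function updateBitrate(videoBitrate, lossRate, bwe) {
  if (lossRate < 0.02) videoBitrate *= 1.08;               // low loss: probe up slowly
  if (lossRate > 0.10) videoBitrate *= 1 - 0.5 * lossRate; // heavy loss: back off fast
  return Math.min(videoBitrate, bwe);                      // never exceed the estimate
}
```

With 20% loss the bitrate drops by 10% per update, while on a clean network it only grows by 8%, which is the fast-decrease / slow-increase behavior mentioned above.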

But in recent versions of Chrome this has changed, and now the whole bandwidth estimation logic happens on the sender side.   The basic detection of congestion is not very different from how it was before, but the sender needs delay information from the receiver side in order to estimate the available bandwidth.  This is accomplished with two new mechanisms/protocols:

1. Transport-wide sequence numbers header extension: all the video RTP packets carry an extra 4 bytes in the header to include a sequence number.    This is negotiated in the SDP with the following line:


Note: the idea of this new sequence number is to be able to use a single counter for both audio and video but Chrome still doesn't use it for audio, so I think it is not very useful yet.

2. Transport Feedback: The receiver side sends periodic feedback to the media sender with information about the packets received and the delay between them.  To do this the receiver uses a new RTCP packet (Transport Feedback).  This feature is negotiated in the SDP with this line, including the new RTCP feedback message:

a=rtcp-fb:100 transport-cc

When looking at what this transport feedback packet looks like, it is funny to realize that there is a specification with Google's proposal and an official standardization proposal, but the only source of truth is the actual implementation in the source code.

This RTCP feedback is sent every 100 msecs by default but it is dynamically adapted to use 5% of the available bandwidth (min value is 50 msecs and max is 250 msecs).
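For illustration, the adaptive interval could be derived like this. The ~68-byte average feedback size is a made-up number for the sketch, not a value from the Chrome source.

```javascript
// Derive the transport feedback interval from 5% of the bandwidth estimate,
// clamped to the [50, 250] ms range mentioned above.
function feedbackIntervalMs(bweBps, avgFeedbackBytes = 68) {
  const budgetBps = 0.05 * bweBps;                     // 5% of the estimate
  const intervalMs = (avgFeedbackBytes * 8 * 1000) / budgetBps;
  return Math.min(250, Math.max(50, intervalMs));      // clamp to [50, 250] ms
}
```

So on a fast link the feedback is sent every 50 msecs, and on a very slow one it backs off to every 250 msecs.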

The format of this new RTCP packet is very concise to minimize its size: packets are grouped in chunks, numbers are stored as base + diff, and granularity is reduced to 0.25 msec intervals. I did a simple test and even with these optimizations it was still using 16Kbps when sending feedback packets every 50 msecs (Figure 2).

Figure 2: Bandwidth used in RTCP Transport Feedback Messages
You can take a look at the implementation in remote_estimator_proxy.h (generating the packets) and (serialization).

What is good about sender side bandwidth estimation? The theory as explained by Google is that this way all the decision logic is in a single place (the sender) and that it makes it possible to test new algorithms easily because you don't depend on both endpoints. Honestly, given that browsers auto-update, I don't see the big advantage of this point, but it is certainly cleaner even if it is more expensive in bandwidth usage. The other advantage is that the sender knows the type of traffic it is sending and can use a different algorithm when sending normal video than when doing a screencast, for example.

Are we affected? If you are building a media server that requires bandwidth estimation for anything (for example to decide the quality to forward when using simulcast) you will need to upgrade your implementation at some point. The good news is that Chrome will have to support the old mechanism (REMB) for a while, at least until Firefox includes support for the new one.   But REMB probably won't get more improvements and it is more likely to have bugs now, so it's probably not a good idea to postpone the change for long.

Is sender side bwe really better? I did a quick test (this is the test page where you can try one or the other by changing a boolean) with both algorithms in Chrome (old REMB vs new Transport Feedback) and the new one performed way better, at least regarding the ramp-up time at the beginning of the connection (see figures below).   I don't think there is a technical reason for that beyond the fact that Google is now focused on the new one, and all the new improvements probably only land in the new algorithm.   Apparently there is something in the new code to handle the bwe in a special way during the first 2 seconds, but I didn't investigate it much.

Who is working on this and what is the status?  Sender side bandwidth estimation is the default in Chrome 55 but this is still work in progress and we should expect many changes.   The official standardization is happening in IETF in the RMCAT group but most of the implementation available in Chrome is Google's own version of the in-progress specifications for algorithms and feedback protocols.

* Chrome is planning to use bandwidth estimation also to decide the audio bitrate to send (Planned for version 58).

You can follow me on Twitter if you are interested in Real Time Communications.

Tuesday, January 10, 2017

Using DSCP for WebRTC packet marking and prioritization

It is a common request from WebRTC developers and customers to know how they can differentiate WebRTC traffic from other types of traffic in their networks.  Usually the goal is to be able to prioritize RTC traffic over less important traffic.

By prioritizing the WebRTC traffic in the edge routers it is possible to improve the quality in some network scenarios.  The most common cases where this marking may help are:

  • Congested broadband uplink, where the router can discard other types of traffic instead of WebRTC traffic when queues get full.
  • Congested local wireless network
One obvious way to do this is to force all the traffic to be relayed through a TURN or SFU server and set the priority based on IP addresses.   That's easy and always works, but the problem is that it is very difficult to maintain when your infrastructure changes often (scaling elastically, new datacenters...).

Another way to accomplish this is to use Differentiated Services Code Point (DSCP) marking. With DSCP you use a specific field (6 bits) in the IP header to mark different classes of traffic.  This field can carry any arbitrary value that you can associate in your routers with different forwarding treatments (priorities).

Even if any value can be used in that DSCP field, there are some commonly used DSCP values, and there are some recommended values to be used for WebRTC endpoints in this IETF draft.

Nowadays this is supported by Chrome and can be enabled with a proprietary constraint passed to the PeerConnection:

new RTCPeerConnection({"iceServers": []}, { optional: [{ googDscp: true }] });

When enabled, the DSCP field of the IP packets sent by Chrome has the value 34 (aka Assured Forwarding 41 or AF41) for both audio and video, as you can see in the next figure.  The default value is 0 (Default Forwarding).
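As a side note on where that value 34 shows up on the wire: DSCP occupies the top 6 bits of the old IP TOS byte (the low 2 bits are ECN), so AF41 appears as 0x88 in a packet capture. A tiny illustration:

```javascript
// DSCP is the top 6 bits of the TOS / Traffic Class byte; ECN uses the low 2.
function dscpToTosByte(dscp) {
  return dscp << 2;
}
// dscpToTosByte(34) -> 136 (0x88), the AF41 marking Chrome uses
```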

This is the full demo page if you want to test it yourself:

Disclaimer: This is not working in WebRTC on Android and iOS and at least some versions of Windows, but those issues should be fixed at some point based on the comments on the open tickets.

One question is why this "feature" is not enabled by default so that all the packets are marked with the recommended DSCP values.   The main reason is that some routers could block packets with specific DSCP values, as explained here, so it should only be enabled when you know it is not going to be blocked in the network where your customers are, at least until browsers implement mechanisms to discover those blocked packets and disable it automatically.

  • Differentiated services
  • DSCP Packet Markings for WebRTC QoS
  • DSCP Transport considerations
  • Test Page


Friday, November 4, 2016

Controlling bandwidth usage in WebRTC (and how googSuspendBelowMinBitrate works)

There are cases where we would like to limit the maximum bitrate transmitted by WebRTC, either to avoid wasting resources in the user endpoints or to save money by reducing the bandwidth usage in our servers.   This is because the maximum bitrate by default in Chrome is around 2Mbps, and for many use cases a much lower bitrate still provides pretty good quality.   BTW, using a lower bitrate can also help with stability of quality in multiparty scenarios by reducing the amount of competition among different streams.

There is no simple API to configure the maximum bitrate in WebRTC (although there is one in ORTC) but there are 3 ways to do this by mangling the SDP.

1. Use the standard b=AS:BITRATE (Chrome) or b=TIAS:BITRATE (Firefox) attributes in the SDP for the audio or video channel [1]
2. Use codec-specific attributes (this works at least for the Opus audio codec with the maxaveragebitrate property) [2]
3. Use the proprietary x-google-max-bitrate attribute in the video channel of the SDP answer.  For example with something like this:

answer.sdp += "a=fmtp:100 x-google-max-bitrate=500\r\n";
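Method 1 needs the b= line inserted in the right place rather than appended at the end. A sketch, assuming the usual line ordering in a Chrome-generated SDP (b= lines go right after the m= line, or after c= if present):

```javascript
// Insert a b=AS line after the video m-line to cap the video bitrate.
// Chrome reads b=AS in kbps; Firefox uses b=TIAS in bps instead.
function setVideoBandwidth(sdp, kbps) {
  const lines = sdp.split("\r\n");
  const mVideo = lines.findIndex(l => l.startsWith("m=video"));
  if (mVideo === -1) return sdp; // no video section, nothing to do
  let insertAt = mVideo + 1;
  if (lines[insertAt] && lines[insertAt].startsWith("c=")) insertAt++;
  lines.splice(insertAt, 0, "b=AS:" + kbps);
  return lines.join("\r\n");
}
```

You would call this on answer.sdp before passing the answer to setLocalDescription.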

My network sucks right now (I blame my neighbors' wifis) and the bitrate is not very stable, but you can see how on average it stays under 500kbps.

There are also people interested in changing the minimum bitrate used (the default is 30kbps in Chrome, and increasing it is very dangerous unless you are in a very controlled environment). This can also be done by mangling the SDP:

answer.sdp += "a=fmtp:100 x-google-min-bitrate=1000\r\n";

Another requirement is changing the initial bitrate to speed up the initialization and start with a higher bitrate than the default one (300 kbps).   This can also be done by mangling the SDP:

answer.sdp += "a=fmtp:100 x-google-start-bitrate=1000\r\n";

In the graph you can see a session where the start bitrate and the min bitrate were set to 1000 kbps.

For the last three years we have seen comments in the mailing lists and issue tracker about something called "suspend below min bitrate" [3].  I was very curious about this feature, as you can see in the ticket comments, and finally decided to play with it and also take a look at the code.

The feature is very simple to understand. If you enable it, WebRTC will stop sending video as soon as the bandwidth estimation goes below the minimum bitrate.  Otherwise, by default, WebRTC insists on sending at the minimum bitrate even if that creates congestion in the network.

To enable it you just need to pass an optional proprietary constraint while creating the PeerConnection:

var pc = new RTCPeerConnection({ iceServers: [] }, { optional: [{ "googSuspendBelowMinBitrate": true }] });

With that in place, the video stops as soon as you don't have enough bandwidth for it.  So if the bandwidth estimation is 60kbps and the audio needs 40kbps, only 20kbps is left for video; if that is below the video minimum bitrate, the video will be suspended until the estimation recovers.

One of the tricky/interesting parts is the algorithm to allocate the bandwidth to the different streams (for example audio and video, multiple streams in a single peerconnection, or different qualities when using simulcast).  For that there is a BitrateAllocator class in Google's webrtc code.

def allocate_bitrates(bitrate):
   sum_max_bitrate = sum(stream.max_bitrate for stream in streams)
   sum_min_bitrate = sum(stream.min_bitrate for stream in streams)
   if bitrate >= sum_max_bitrate:
      # All streams get their max_bitrate
      for stream in streams:
         stream.allocated_bitrate = stream.max_bitrate
   elif bitrate >= sum_min_bitrate:
      # Each stream gets its min_bitrate plus an equal share of the remainder
      for stream in streams:
         stream.allocated_bitrate = stream.min_bitrate + (bitrate - sum_min_bitrate) / len(streams)
   else:
      # Not enough bandwidth for all the minimums
      for stream in streams:
         if suspend_below_min_bitrate:
            stream.allocated_bitrate = min(stream.min_bitrate, bitrate)
         else:
            stream.allocated_bitrate = stream.min_bitrate
         bitrate -= stream.allocated_bitrate

One of the problems with this functionality is detecting when the feature is activated, to change the UI or notify the receiver that the video has been suspended.  The only solution at this point is using getStats to monitor the encoded video bytes, but some kind of callback/notification is under consideration [4].

In case you have the same question I had: the video recovers automatically when the network conditions improve again.   You can see a graph with the whole suspension + reactivation that I generated by artificially degrading my network conditions to force a bandwidth estimation < 40kbps.


Tuesday, September 9, 2014

Using Native WebRTC simulcast support in Chrome (or how to be as good as Hangouts) [WiP]

Some weeks ago Philipp Hancke presented an awesome analysis of how Google Hangouts uses WebRTC in Chrome.  In that blog post he explained that, based on the SDP contents, Google is using simulcast, but he didn't enter into the details of how to activate it.   I was very curious and thought it could be great if people (beyond Google) could use this feature, so I tried to replicate their behavior.

Step one: Add simulcast ssrcs and "SIM" group to the offer SDP

The first thing I tried was some SDP mangling to make my SDP look like the Hangouts SDP.   That means adding 3 substreams grouped with the simulcast semantics. This is the code of my quick and dirty implementation:
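Roughly, the mangling consists of appending three SSRCs and a SIM group to the video section. A simplified sketch of that idea (the SSRC values, cname and msid below are made up, and a real implementation must insert the lines inside the m=video section):

```javascript
// Append three simulcast SSRCs grouped with SIM semantics.
// NOTE: for brevity this appends at the end of the SDP, which only works
// when the video m-line is the last section.
function addSimulcastSsrcs(sdp) {
  const ssrcs = [1001, 1002, 1003]; // made-up SSRC values, one per substream
  let extra = "a=ssrc-group:SIM " + ssrcs.join(" ") + "\r\n";
  for (const ssrc of ssrcs) {
    extra += "a=ssrc:" + ssrc + " cname:mycname\r\n";
    extra += "a=ssrc:" + ssrc + " msid:mystream mytrack\r\n";
  }
  return sdp + extra;
}
```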

Result: no simulcast :(

Step two?: Add a renegotiation

I saw a renegotiation in the initial connection from Google Hangouts (when there are no other participants) and thought it was needed to enable simulcast, so I implemented it, but it is not needed to have simulcast.

Step three: Add the x-conference flag to the video track in the answer SDP

The third thing I tried was to set that mysterious conference flag in the SDP answer:

var lines = answer.sdp.split("\r\n");
lines.splice(lines.length - 1, 0, "a=x-google-flag:conference");
answer.sdp = lines.join("\r\n")

Result: simulcast!!!!

You can see these lines in the Chrome Canary logs:
VP8 number of temporal layers: 3
Simulcast substream 0: 960x540@450-900kbps with  temporal layers
Simulcast substream 1: 1920x1080@600-2500kbps with  temporal layers
Simulcast substream 2: 3840x2160@600-2500kbps with  temporal layers

Bonus track: apparently from those logs we not only enabled simulcast but also VP8 temporal scalability.   I have a snippet to parse the RTP packets and check it, but I didn't have time to confirm it yet.

Step four: Decide what substreams you are interested in

Based on the RTCP Receiver Reports and PLIs received, Chrome will decide which streams to send from the list of streams you configured in the SDP (in step one).

PENDING: The logic behind these decisions is not clear yet

Apparently you can also optimize the CPU usage by setting the googSkipUnusedFrames constraint that Google Hangouts uses; that way the video substreams that are not going to be sent won't be encoded either.   (To be tested)

Step five: Increase the bandwidth

Chrome changes the number and quality of the different substreams based on the bandwidth estimations.    The configuration (resolution, temporal layers) for a given original resolution and available bandwidth cannot be changed as far as I can tell.

For example if I don't send any RTCP packet, by default it just uses 2 substreams:
Stream 320 x 240
Temporal layers 3, min br 50, target br 150, max br 200, qp max 56
Stream 640 x 480
Temporal layers 3, min br 150, target br 500, max br 700, qp max 56

To increase the number of substreams and the quality you have to use REMBs, or maybe periodic RRs.

Desperation note: at some point I was so frustrated not being able to get it working that I even tried to run my test page pretending I was behind the domain to see if Chrome behaved differently.

Tuesday, April 22, 2014

What does the MSID attribute in the SDP mean?

There is a work in progress in the IETF [1] to define a new MSID attribute to be used in the SDP. This attribute has been defined in the context of WebRTC and it is already being sent by WebRTC endpoints (Firefox and Chrome). This is a typical SDP generated by Chrome:

o=- 658899108507703479 2 IN IP4
t=0 0
a=group:BUNDLE audio video
a=msid-semantic: WMS FmJZgkjbR2gBS25U0hP3qoUSjvfDddQl4UjJ
m=audio 1 RTP/SAVPF 111 103 104 0 8 106 105 13 126
a=ssrc:1400694016 cname:Yk4LvPXyWNZKkW6S
a=ssrc:1400694016 msid:FmJZgkjbR2gBS25U0hP3qoUSjvfDddQl4UjJ a228df45
a=ssrc:1400694016 mslabel:FmJZgkjbR2gBS25U0hP3qoUSjvfDddQl4UjJ
a=ssrc:1400694016 label:a228df45-515e-49b6-be8f-0d044b65de64
m=video 1 RTP/SAVPF 100 116 117
a=ssrc:1484622160 cname:Yk4LvPXyWNZKkW6S
a=ssrc:1484622160 msid:FmJZgkjbR2gBS25U0hP3qoUSjvfDddQl4UjJ 00f2f3d2
a=ssrc:1484622160 mslabel:FmJZgkjbR2gBS25U0hP3qoUSjvfDddQl4UjJ
a=ssrc:1484622160 label:00f2f3d2-974e-48a6-b8dc-0f70194949d5

But what is it for and why is it needed?

The purpose of the attribute is to group m lines that are somehow related. In the context of WebRTC the idea is to be able to group channels that belong to the same MediaStream (a MediaStream is one of the WebRTC JavaScript objects), so that if the sender starts sending two streams (one with the audio of the speaker and one with the video of the shared screen) the recipient will know that those were 2 separate streams and not a single stream with audio and video. That way the number of streams and tracks in the sender (and the IDs of those streams and tracks) is preserved in the receiver.

So in the case of WebRTC this line:
a=msid-semantic: WMS FmJZgkjbR2gBS25U0hP3qoUSjvfDddQl4UjJ
means that there is a WebRTC Media Stream (WMS) group identified by that stream id.

And the line:
a=ssrc:1400694016 msid:FmJZgkjbR2gBS25U0hP3qoUSjvfDddQl4UjJ a228df45
means that there is a media stream with id FmJZ...jJ with an audio track with id a228df45.
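To make that mapping concrete, here is a small helper that extracts the stream and track ids from those a=ssrc msid lines. The function name and result shape are mine, for illustration only.

```javascript
// Extract { ssrc, streamId, trackId } triples from the a=ssrc msid lines
// of an SDP, as in the Chrome example above.
function parseMsids(sdp) {
  const result = [];
  for (const line of sdp.split("\r\n")) {
    const m = line.match(/^a=ssrc:(\d+) msid:(\S+) (\S+)/);
    if (m) result.push({ ssrc: m[1], streamId: m[2], trackId: m[3] });
  }
  return result;
}
```

Running it on the SDP above would yield one entry per media section, both sharing the same streamId but with different trackIds.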


Sunday, January 26, 2014

My professional life in RTC stacks ...

I was wondering today what RTC stacks I've used over all these years, and I thought it would be interesting to remember them because somehow you can see the evolution of communications through them.   This is the approximate timeline (I'm sure I'm forgetting some):

2013 python-sip
2012 pjsip
2011 asmack
2008 sofia-sip
2006 NIST SIP Stack
2004 In house developed SIP Stack
2002 OpenH323
2000 NetMeeting ActiveX

With a couple of exceptions I have always favored the usage of open source stacks, because they have always been very compelling and, mostly, because hacking them is a lot of fun.

There are many more stacks that I've used for prototypes or Hello Worlds that were also great, but this list only includes the ones used in commercial projects.