Redundant RTP coding

There are several mechanisms to make media transmission more robust to packet loss. They include negative acknowledgments with retransmissions, sending redundant information for forward error correction, and signal processing algorithms that reduce the impact of packet losses.

In the case of audio, most packet loss issues disappeared with the advent of Opus, thanks to the robustness of the codec. You can test the quality yourself with 30% packet loss! [1]

These techniques make the most sense for video codecs, which are typically more fragile under packet loss. This is especially critical when sending keyframes, because losing one can affect the communication for a long period of time (until the next keyframe).

This post is an overview of how redundant encoding works at the RTP level and what its implications are.

The core idea is to modify the RTP packet format so that, instead of carrying only the primary payload (the encoded video), a packet can also carry extra blocks with other encodings of the same content that can be used to recover lost packets. This redundant encoding could be, for example, a lower-quality encoding of the previous frame or a forward error correction bitstream covering the last N packets.

The basic format from RFC 2198 is this one:
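    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X| CC=0  |M|      PT     |   sequence number of primary  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |              timestamp  of primary encoding                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |1| block PT=117|  timestamp offset         |   block length    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0| block PT=100|                                               |
   +-+-+-+-+-+-+-+-+                                               +
   +               fec encoded redundant data (PT=117)             +
   +                                               +---------------+
   |                                               |               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               +
   |               video encoded primary data (PT=100)             |
   +                                               +---------------+
   |                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+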

It is worth noting that:

  1. The RTP header format (first 12 bytes) doesn't change.
  2. The payload type in the RTP header (second byte) is not the payload type of the primary codec (e.g. VP8) but the payload type that identifies packets using this multi-block structure.
  3. The primary data is always at the end, if present.
  4. It is possible to have packets with only redundant data or only primary data.
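
To make the block structure concrete, here is a minimal sketch of how a receiver could split the payload of such a packet (everything after the 12-byte RTP header) into its blocks, following the RFC 2198 header layout: 4 bytes per redundant block and a single byte for the last one. The names `RedBlock` and `parse_red_payload` are made up for this illustration and do not come from any particular library.

```python
from typing import List, NamedTuple, Optional, Tuple


class RedBlock(NamedTuple):
    payload_type: int       # 7-bit block payload type
    timestamp_offset: int   # 14-bit offset; 0 for the last block
    data: bytes


def parse_red_payload(payload: bytes) -> List[RedBlock]:
    """Split a RED payload (RFC 2198) into its blocks, in wire order."""
    headers: List[Tuple[int, int, Optional[int]]] = []
    pos = 0
    while True:
        if pos >= len(payload):
            raise ValueError("truncated RED header")
        first = payload[pos]
        block_pt = first & 0x7F
        if first & 0x80 == 0:
            # F=0: one-byte header of the last block (no offset, no length).
            headers.append((block_pt, 0, None))
            pos += 1
            break
        if pos + 4 > len(payload):
            raise ValueError("truncated RED header")
        # F=1: the next 24 bits are a 14-bit timestamp offset
        # followed by a 10-bit block length.
        rest = int.from_bytes(payload[pos + 1:pos + 4], "big")
        headers.append((block_pt, rest >> 10, rest & 0x3FF))
        pos += 4

    blocks = []
    for block_pt, ts_offset, length in headers:
        if length is None:
            data = payload[pos:]  # the last block takes whatever is left
        else:
            if pos + length > len(payload):
                raise ValueError("truncated RED block")
            data = payload[pos:pos + length]
        pos += len(data)
        blocks.append(RedBlock(block_pt, ts_offset, data))
    return blocks
```

With the payload types from the diagram, a packet carrying both blocks yields a PT=117 block followed by a PT=100 block, and a packet with a single block yields a list of one element.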

There are different ways to encode the redundant data. One of them is ulpfec [3], where the redundant data is an XOR of several packets. ulpfec includes the concept of layers, so that we can send more redundant data for the initial part of the packets and less redundant data for the later parts.
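
The XOR idea can be illustrated with a few lines of code. This is a simplified sketch and not the ulpfec wire format: the real format (RFC 5109) also carries an FEC header, a mask of protected packets and the protection levels, none of which are modelled here. The function names are hypothetical.

```python
from typing import List, Tuple


def xor_bytes(packets: List[bytes]) -> bytes:
    """XOR packets together, treating shorter ones as zero-padded."""
    out = bytearray(max(len(p) for p in packets))
    for packet in packets:
        for i, byte in enumerate(packet):
            out[i] ^= byte
    return bytes(out)


def make_fec(packets: List[bytes]) -> Tuple[int, bytes]:
    """Build parity over a group of packets: XOR of lengths and of contents."""
    length_xor = 0
    for packet in packets:
        length_xor ^= len(packet)
    return length_xor, xor_bytes(packets)


def recover(received: List[bytes], fec: Tuple[int, bytes]) -> bytes:
    """Rebuild the single missing packet of a group from the others."""
    length, parity = fec
    for packet in received:
        length ^= len(packet)  # what remains is the missing packet's length
    # XORing the parity with every received packet cancels them out.
    return xor_bytes(received + [parity])[:length]


group = [b"key", b"frame-part", b"xy"]
fec = make_fec(group)
# Pretend the second packet was lost in the network.
print(recover([group[0], group[2]], fec))
```

A single parity packet like this can only repair one loss per protected group, so protecting against more losses means sending more redundant data.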

Support for red (for packetization) and ulpfec (for actual protection) is usually negotiated in SDP:
m=video 1 RTP/SAVPF 100 116 117
a=rtpmap:100 VP8/90000
a=rtpmap:116 red/90000
a=rtpmap:117 ulpfec/90000
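
As an illustration, the payload types assigned to red and ulpfec can be read back from the rtpmap lines of the SDP. This is a minimal sketch that only looks at `a=rtpmap` attributes of an SDP like the one above; `parse_rtpmap` is a hypothetical helper, not part of any WebRTC API.

```python
import re
from typing import Dict

SDP = """m=video 1 RTP/SAVPF 100 116 117
a=rtpmap:100 VP8/90000
a=rtpmap:116 red/90000
a=rtpmap:117 ulpfec/90000
"""


def parse_rtpmap(sdp: str) -> Dict[str, int]:
    """Map each codec name (lowercased) to its payload type."""
    codecs = {}
    for line in sdp.splitlines():
        match = re.match(r"a=rtpmap:(\d+) ([^/\s]+)/\d+", line.strip())
        if match:
            codecs[match.group(2).lower()] = int(match.group(1))
    return codecs


codecs = parse_rtpmap(SDP)
# Redundancy can only be used if both red and ulpfec were negotiated.
fec_supported = "red" in codecs and "ulpfec" in codecs
```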

Because adding redundant data introduces overhead, it is usually a good idea to enable it only when network conditions require it. That is what WebRTC implementations do: all video packets are sent in the redundant format, but they only start including redundant data after packet loss is detected on the connection.
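
A minimal sketch of that behaviour on the sending side, following the packet layout shown earlier: every packet uses the multi-block format, but the redundant block is only added once loss has been detected. The function names and the `loss_detected` flag are assumptions made for this example, and real implementations may packetize differently (for instance, sending the FEC data in its own packet).

```python
from typing import Optional

VP8_PT = 100     # payload types taken from the SDP example
ULPFEC_PT = 117


def build_red_payload(primary_pt: int, primary: bytes,
                      redundant_pt: Optional[int] = None,
                      redundant: bytes = b"",
                      ts_offset: int = 0) -> bytes:
    """Build a RED payload with an optional redundant block before the primary."""
    out = bytearray()
    if redundant_pt is not None:
        if len(redundant) > 0x3FF or not 0 <= ts_offset <= 0x3FFF:
            raise ValueError("redundant block does not fit the RED header")
        out.append(0x80 | (redundant_pt & 0x7F))  # F=1: more headers follow
        out += ((ts_offset << 10) | len(redundant)).to_bytes(3, "big")
    out.append(primary_pt & 0x7F)                 # F=0: last (primary) header
    if redundant_pt is not None:
        out += redundant
    out += primary
    return bytes(out)


def packetize(primary: bytes, fec: Optional[bytes], loss_detected: bool) -> bytes:
    """Always use the RED format; include FEC only when loss was detected."""
    if loss_detected and fec is not None:
        return build_red_payload(VP8_PT, primary, ULPFEC_PT, fec)
    return build_red_payload(VP8_PT, primary)
```

Without losses the cost of the redundant format is a single extra header byte per packet; with redundancy enabled it is 4 more header bytes plus the redundant data itself.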

The good news is that all these solutions are being integrated into the next generation of browsers supporting WebRTC and will make our lives as customers or service providers much easier.


