WebRTC header extensions review
WebRTC supports the concept of RTP header extensions to extend media packets with additional metadata. One of the most common use cases is to attach the audio level to audio packets so that the server can calculate active speakers without having to decode the audio packets.
Some of these header extensions are standard and have been in use for a while, but others were added by Google as needed and are only lightly documented on the website and in the libwebrtc code. Those header extensions and their usage are not very well known, and this post is an attempt to give them visibility in the WebRTC community.
To discover some of these headers you can usually inspect the Offer/Answer of a Google Meet session or look at the libwebrtc source code here: https://chromium.googlesource.com/external/webrtc/+/master/api/rtp_parameters.h
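As an illustration of that first approach, here is a small sketch that pulls the `a=extmap` lines out of an SDP blob. The `extmapsFromSdp` helper and the sample SDP lines are my own, not part of any WebRTC API:

```javascript
// Sketch: extract the negotiated header extensions (id -> URI) from an
// SDP string by matching "a=extmap:<id>[/direction] <uri>" lines.
function extmapsFromSdp(sdp) {
  const extmaps = {};
  for (const line of sdp.split(/\r?\n/)) {
    const match = line.match(/^a=extmap:(\d+)(?:\/\S+)? (\S+)/);
    if (match) {
      extmaps[Number(match[1])] = match[2];
    }
  }
  return extmaps;
}

// Illustrative SDP fragment, similar to what a Google Meet offer contains.
const sampleSdp = [
  'a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level',
  'a=extmap:3 http://www.ietf.org/id/draft-holmer-rmcat-transport-wide-cc-extensions-01',
].join('\r\n');

console.log(extmapsFromSdp(sampleSdp));
```

In a browser you would feed it `(await pc.createOffer()).sdp` instead of a hand-written string.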
Audio Levels (urn:ietf:params:rtp-hdrext:ssrc-audio-level) Doc
[Very common]
The header contains the audio level (volume) of the audio in this packet, as well as an optional bit (not usually used) that indicates whether the packet contains voice or another sort of audio.
Use case: A server not decoding audio can still detect the active and/or dominant speaker in a session.
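As a sketch of how a server can read this without decoding audio: the payload is a single byte defined in RFC 6464, with the voice-activity flag in the top bit and the level in the low 7 bits. The function name is mine:

```javascript
// Sketch of parsing the one-byte ssrc-audio-level payload (RFC 6464).
// Bit 7 is the voice-activity flag; bits 0-6 are the level in -dBov
// (0 = loudest, 127 = silence).
function parseAudioLevel(byte) {
  return {
    voice: (byte & 0x80) !== 0,
    level: byte & 0x7f, // -dBov
  };
}

// 0xB0 = voice flag set, level 48 (i.e. -48 dBov)
console.log(parseAudioLevel(0xb0)); // { voice: true, level: 48 }
```

An SFU can run this per packet and pick the lowest `level` (loudest source) as the dominant speaker.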
Mixer to client audio levels (urn:ietf:params:rtp-hdrext:csrc-audio-level) Doc
[Not very common but Google Meet uses it]
This header extension includes information about the different sources contributing to the audio of this packet and their audio levels.
Use case: If a server combines or merges audio, an audio stream can portray various speakers simultaneously, and this header enables the receiver to identify the particular user who was speaking during that period.
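As a sketch, the payload defined in RFC 6465 is one byte per contributing source, in the same order as the CSRC list of the RTP header; the helper and the sample CSRC values below are illustrative:

```javascript
// Sketch of parsing the csrc-audio-level payload (RFC 6465): one
// level byte per contributing source, paired with the CSRC list taken
// from the RTP header.
function parseCsrcAudioLevels(csrcs, payload) {
  return csrcs.map((csrc, i) => ({
    csrc,
    level: payload[i] & 0x7f, // -dBov, 0 = loudest
  }));
}

// Two contributing sources mixed into one packet.
const levels = parseCsrcAudioLevels([0x1111, 0x2222], Uint8Array.from([20, 90]));
console.log(levels); // source 0x1111 at -20 dBov, source 0x2222 at -90 dBov
```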
Inband Comfort Noise (http://www.webrtc.org/experiments/rtp-hdrext/inband-c) Doc
[Anybody using it?]
This header extension indicates whether the audio packet contains comfort noise, and its level.
Use case: A server could de-prioritize these packets or avoid including them in the mixing. A receiver could use it as a hint for discontinuous transmission being enabled.
Absolute Send Time (http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time) Doc
[Legacy]
This header extension includes an NTP timestamp for when the packet was sent instead of when the media was captured.
Use case: The main use case is server-side bandwidth estimation: the receiver can detect changes in inter-packet delay, use them to estimate the available bandwidth, and send it back to the sender if needed (e.g. using a REMB RTCP packet).
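For reference, the payload is the middle 24 bits of a 64-bit NTP timestamp, i.e. a 6.18 fixed-point value in seconds that wraps every 64 seconds. A minimal sketch, assuming that format (the function name is mine):

```javascript
// Sketch: decode the 3-byte abs-send-time payload, a 6.18 fixed-point
// timestamp in seconds (wraps every 64 seconds).
function absSendTimeToSeconds(bytes) {
  const fixed = (bytes[0] << 16) | (bytes[1] << 8) | bytes[2];
  return fixed / (1 << 18); // 18 fractional bits
}

// 0x040000 = 1.0 s into the current 64-second window
console.log(absSendTimeToSeconds([0x04, 0x00, 0x00])); // 1
```

A bandwidth estimator compares deltas of these values against deltas of arrival times; the 64-second wrap has to be handled when subtracting.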
Capture Time (http://www.webrtc.org/experiments/rtp-hdrext/abs-capture-time) Doc
[Not very common but Google Meet uses it]
This header extension includes an NTP timestamp for when the frame corresponding to this packet was captured from the input device.
Use case: Implement audio/video synchronization in servers, for example for mixing purposes.
Transport Wide Identifier (http://www.ietf.org/id/draft-holmer-rmcat-transport-wide-cc-extensions-01) Doc
[Almost always used]
This header extension includes a unique identifier for each packet so that the receiver can send feedback back to the sender with the relative timestamps of the packets carrying those identifiers. There is a v2 version that gives the sender more control over when to receive feedback.
Use case: It is only used to generate transport-wide congestion control feedback packets including the inter-packet delays for the identifiers carried in the header extension.
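The payload itself is just a 16-bit big-endian sequence number shared across all streams of the transport. A sketch of reading it, plus a wrap-aware delta helper of the kind a receiver needs when ordering feedback (both names are mine):

```javascript
// Sketch: the transport-wide-cc extension carries one 16-bit
// big-endian sequence number per packet, shared by the whole transport.
function parseTransportSequenceNumber(bytes) {
  return (bytes[0] << 8) | bytes[1];
}

// Signed distance b - a on a 16-bit circle, so the wrap at 0xFFFF -> 0
// does not look like a huge jump backwards.
function seqDelta(a, b) {
  return ((b - a + 0x8000) & 0xffff) - 0x8000;
}

console.log(parseTransportSequenceNumber([0x01, 0x02])); // 258
console.log(seqDelta(0xffff, 0x0001)); // 2 (wrapped forward by two)
```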
Media Identifier (mid) (urn:ietf:params:rtp-hdrext:sdes:mid) Doc
[Almost always used]
This header extension includes an ID for the media stream that is also included in the SDP.
Use case: This way the server or the receiver can tell which media stream those packets belong to. To save bandwidth this header extension is only included in the first packets of a stream or when the SSRC changes.
RTP Stream ID (urn:ietf:params:rtp-hdrext:sdes:rtp-stream-id) Doc
[Very common]
This header extension includes the identifier of each encoding of a stream (e.g. a simulcast layer).
Use case: It is similar to the mid header extension described above but in this case to differentiate multiple encodings in case of simulcast.
Repaired RTP Stream ID (urn:ietf:params:rtp-hdrext:sdes:repaired-rtp-stream-id) Doc
[Very common]
This header extension includes the identifier of retransmission packets that provide redundancy for packets in another RTP stream.
Use case: Differentiate and associate retransmissions to the corresponding RTP stream.
Time Offset (urn:ietf:params:rtp-hdrext:toffset) Doc
[Anybody using it?]
Includes the offset from when the packet was captured to when it was sent.
Use case: Allow more accurate jitter calculation in the receiver side.
Video orientation (urn:3gpp:video-orientation) Doc
[Commonly used, the only problem is Firefox]
This header extension includes the rotation that needs to be applied to the video frames before rendering them (0, 90, 180, 270 degrees). This way a phone sending a 640x480 video in landscape mode can keep sending 640x480 with a value of 90 in the header extension when rotating the phone to portrait mode.
Use case: Used by the sender to avoid changing the aspect ratio of the stream when a device (usually a phone) switches from portrait to landscape. This is not supported by Firefox, which makes it tricky for some use cases.
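The payload is a single CVO byte as defined in 3GPP TS 26.114: two rotation bits, a horizontal-flip bit, and a camera (front/back) bit. A minimal sketch of decoding it (the function name is mine):

```javascript
// Sketch of decoding the one-byte CVO payload (urn:3gpp:video-orientation):
// bit 3 = camera (0 front, 1 back), bit 2 = horizontal flip,
// bits 0-1 = rotation in 90-degree steps to apply before rendering.
function parseVideoOrientation(byte) {
  return {
    backCamera: (byte & 0x08) !== 0,
    flip: (byte & 0x04) !== 0,
    rotationDegrees: (byte & 0x03) * 90,
  };
}

console.log(parseVideoOrientation(0x01)); // rotate 90 degrees, no flip
```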
Playout Delay (http://www.webrtc.org/experiments/rtp-hdrext/playout-delay) Doc
[Not very commonly used, mostly for gamestreaming use cases]
This header extension includes a delay that should be applied to this audio or video stream before rendering it.
Use case: Used by the sender or a server to indicate to the receiver that it needs minimum latency (playout delay = 0) or that it can tolerate some latency in exchange for better quality (e.g. a playout delay of 500 ms). This behaviour can also be controlled with the JavaScript property playoutDelayHint.
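Per the linked doc, the payload packs a 12-bit minimum and a 12-bit maximum delay into 3 bytes, both in units of 10 ms. A sketch assuming that layout (the function name is mine):

```javascript
// Sketch of parsing the 3-byte playout-delay payload: 12-bit min delay,
// then 12-bit max delay, each in units of 10 ms.
function parsePlayoutDelay(bytes) {
  const min = (bytes[0] << 4) | (bytes[1] >> 4);
  const max = ((bytes[1] & 0x0f) << 8) | bytes[2];
  return { minMs: min * 10, maxMs: max * 10 };
}

// min = 0, max = 0x032 (50 units) -> 0 .. 500 ms
console.log(parsePlayoutDelay([0x00, 0x00, 0x32])); // { minMs: 0, maxMs: 500 }
```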
Video Content Type (http://www.webrtc.org/experiments/rtp-hdrext/video-content-type) Doc
[Not very commonly used]
This header extension includes information about the type of content (camera or screen sharing). Chrome also includes some even more proprietary information here to define the spatial layer of this simulcast stream.
Use case: Used by a server to apply different handling depending on the content type. It could potentially be used by a receiver too to tune latency or retransmission behaviour, but apparently it is not used for that in Chrome.
Video Timing (http://www.webrtc.org/experiments/rtp-hdrext/video-timing) Doc
[Not very commonly used. Google Meet only maybe?]
This header includes different timestamps for the media corresponding to a specific frame (encode start, encode end, packetization time, pacer time).
Use case: I think the main use case is for monitoring, testing and statistics purposes.
Color Space (http://www.webrtc.org/experiments/rtp-hdrext/color-space) Doc
[Not very commonly used. Stadia?]
This header extension includes the color space used to represent the video frames information.
Use case: Encode video represented in different color spaces.
Generic Frame Descriptor (http://www.webrtc.org/experiments/rtp-hdrext/generic-frame-descriptor-00) Doc
[Legacy. Mostly replaced with the Dependency Descriptor hdrext below]
This header extension includes the information about layers and dependencies of a frame for scalable video.
Use case: Used by a server to be able to filter different frames and resolutions properly in a codec-independent way.
Video Layers allocation (http://www.webrtc.org/experiments/rtp-hdrext/video-layers-allocation00) Doc
[Not very commonly used. Google Meet uses it.]
This header extension includes information for streams using scalable video coding or simulcast about the spatial layers, the temporal layers, their resolutions, framerates and bitrates.
Use case: Otherwise the server needs to guess some of this information (e.g. bitrates) to make decisions on which layers to forward to each participant, leading to less accurate values and more complexity.
Dependency descriptor (https://aomediacodec.github.io/av1-rtp-spec/#dependency-descriptor-rtp-header-extension) Doc
[Not very commonly used. Google Meet uses it.]
This header extension includes the information about layers and dependencies of a frame for scalable video. It was designed for AV1 but can be used for other codecs (e.g. VP9).
Use case: Used by a selective forwarding unit to be able to filter different frames and resolutions properly in a codec-independent way. It serves the same purpose as the legacy Generic Frame Descriptor described above.
Video Frame Tracking Id (http://www.webrtc.org/experiments/rtp-hdrext/video-frame-tracking-id) Doc
[Only for testing]
Header extension including a unique frame identifier.
Use case: For testing purposes to be able to correlate the frames in the sender and receiver side.
What do you think? Any other info or feedback about these or other header extensions? Feel free to share it here or on Twitter.