Some weeks ago Philipp Hancke presented an awesome analysis of how Google Hangouts uses WebRTC in Chrome. In that blog post he explained that, based on the SDP contents, Google is using simulcast, but he didn't go into the details of how to activate it. I was very curious, and I thought it would be great if people beyond Google could use this feature, so I tried to replicate their behavior.
Step one: Add simulcast ssrcs and "SIM" group to the offer SDP
The first thing I tried was to implement some SDP mangling to make my SDP look like the Hangouts SDP. That means adding 3 substreams grouped by simulcast semantics. This is the code of my quick and dirty implementation:
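The original snippet is missing from this copy of the post. As a stand-in, here is a minimal sketch of what such mangling might look like. It is not the author's code. The helper name addSimulcastToOffer and the extra SSRC values are hypothetical. It assumes the video section carries a single SSRC (no RTX), and that Chrome accepts the a=ssrc-group:SIM line regardless of where it sits relative to the a=ssrc lines.

```javascript
// Hypothetical helper: copy the attributes of the first video SSRC onto extra
// SSRCs and tie them together with an "a=ssrc-group:SIM" line.
function addSimulcastToOffer(sdp, extraSsrcs) {
  var lines = sdp.split("\r\n");

  // Locate the video media section: from "m=video" up to the next "m=" line.
  var start = -1;
  for (var i = 0; i < lines.length; i++) {
    if (lines[i].indexOf("m=video") === 0) { start = i; break; }
  }
  if (start === -1) return sdp; // no video section, nothing to do

  var end = lines.length;
  for (var j = start + 1; j < lines.length; j++) {
    if (lines[j].indexOf("m=") === 0) { end = j; break; }
  }
  // Skip the empty element produced by the trailing "\r\n".
  while (end > start + 1 && lines[end - 1] === "") end--;

  // Collect the attributes (cname, msid, ...) of the first SSRC found.
  var primary = null;
  var attrs = [];
  for (var k = start; k < end; k++) {
    var m = /^a=ssrc:(\d+) (.+)$/.exec(lines[k]);
    if (!m) continue;
    if (primary === null) primary = m[1];
    if (m[1] === primary) attrs.push(m[2]);
  }
  if (primary === null) return sdp; // no SSRC to clone

  // Build the SIM group line plus one copy of each attribute per extra SSRC.
  var added = ["a=ssrc-group:SIM " + [primary].concat(extraSsrcs).join(" ")];
  extraSsrcs.forEach(function (ssrc) {
    attrs.forEach(function (attr) {
      added.push("a=ssrc:" + ssrc + " " + attr);
    });
  });

  // Insert the new lines at the end of the video section.
  Array.prototype.splice.apply(lines, [end, 0].concat(added));
  return lines.join("\r\n");
}
```

Something like offer.sdp = addSimulcastToOffer(offer.sdp, [1001, 1002]) would then presumably be applied to the offer before it is set as the local description.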
Result: no simulcast :(
Step two?: Add a renegotiation
I saw a renegotiation in the initial connection from Google Hangouts (when there are no other participants) and thought it was needed to enable simulcast, so I implemented it. It turns out it is not needed to get simulcast.
Step three: Add the x-conference flag to the video track in the answer SDP
The third thing I tried was to set that mysterious conference flag in the SDP answer:
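// Insert the flag just before the last element (normally the empty string left
// by the trailing "\r\n"), i.e. at the end of the last media section.
var lines = answer.sdp.split("\r\n");
lines.splice(lines.length - 1, 0, "a=x-google-flag:conference");
answer.sdp = lines.join("\r\n");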
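The snippet above puts the flag at the very end of the SDP, which only lands in the video section when video is the last m= section. A slightly more defensive variant, as a sketch only: the helper name addConferenceFlag is hypothetical, and I am assuming the flag only needs to appear once inside the video section.

```javascript
// Hypothetical helper: add "a=x-google-flag:conference" at the end of the
// video media section, wherever that section sits in the SDP.
function addConferenceFlag(sdp) {
  var FLAG = "a=x-google-flag:conference";
  var lines = sdp.split("\r\n");

  // Find where the video section starts.
  var start = -1;
  for (var i = 0; i < lines.length; i++) {
    if (lines[i].indexOf("m=video") === 0) { start = i; break; }
  }
  if (start === -1) return sdp; // no video section

  // The section ends at the next "m=" line or at the end of the SDP.
  var end = lines.length;
  for (var j = start + 1; j < lines.length; j++) {
    if (lines[j].indexOf("m=") === 0) { end = j; break; }
  }
  // Skip the empty element produced by the trailing "\r\n".
  while (end > start + 1 && lines[end - 1] === "") end--;

  // Do not add the flag twice.
  for (var k = start; k < end; k++) {
    if (lines[k] === FLAG) return sdp;
  }

  lines.splice(end, 0, FLAG);
  return lines.join("\r\n");
}
```

Usage would be answer.sdp = addConferenceFlag(answer.sdp) before the answer is applied.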
With the flag set, you can see these lines in the Chrome Canary logs:
VP8 number of temporal layers: 3
Simulcast substream 0: 960x540@450-900kbps with temporal layers
Simulcast substream 1: 1920x1080@600-2500kbps with temporal layers
Simulcast substream 2: 3840x2160@600-2500kbps with temporal layers
Bonus track: Judging from those logs, we apparently enabled not only simulcast but also VP8 temporal scalability. I have a snippet to parse the RTP packets and check it, but I haven't had time to confirm it yet.
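That snippet is not included in the post. As an illustration of what such a check could look like, here is a sketch that reads the temporal layer index (TID) from the VP8 payload descriptor as laid out in RFC 7741. The function name vp8TemporalLayerId is hypothetical, and the input is assumed to be the RTP payload with the RTP header (and any header extensions) already stripped.

```javascript
// Hypothetical helper: return the temporal layer index (0..3) carried in a
// VP8 RTP payload descriptor, or null if the packet does not carry one.
function vp8TemporalLayerId(payload) {
  if (payload.length < 1) return null;
  var offset = 0;

  // First byte: X|R|N|S|R|PID. X signals that the extension byte follows.
  var first = payload[offset++];
  if (!(first & 0x80)) return null;
  if (offset >= payload.length) return null;

  // Extension byte: I|L|T|K|RSV.
  var ext = payload[offset++];
  var hasPictureId = (ext & 0x80) !== 0;
  var hasTl0PicIdx = (ext & 0x40) !== 0;
  var hasTid = (ext & 0x20) !== 0;

  // PictureID is 7 bits, or 15 bits when its M bit is set.
  if (hasPictureId) {
    if (offset >= payload.length) return null;
    offset += (payload[offset] & 0x80) ? 2 : 1;
  }
  // TL0PICIDX takes one byte.
  if (hasTl0PicIdx) offset += 1;

  // Without the T bit the TID field is not meaningful.
  if (!hasTid) return null;
  if (offset >= payload.length) return null;

  // TID is the top two bits of the TID|Y|KEYIDX byte.
  return payload[offset] >> 6;
}
```

If temporal scalability is really active, the values returned for consecutive packets of one substream should cycle through more than one layer index rather than staying at 0.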
Step four: Decide what substreams you are interested in
Based on the RTCP Receiver Reports and PLIs received, Chrome will decide which streams to send from the list of streams you configured in the SDP (in Step one).
PENDING: The logic behind these decisions is not clear yet
Apparently you can also reduce CPU usage by setting the googSkipUnusedFrames constraint that Google Hangouts uses, so that the video substreams that are not going to be sent are not encoded either. (To be tested)
Step five: Increase the bandwidth
Chrome changes the number and quality of the substreams based on its bandwidth estimates. As far as I can tell, the configuration (resolution, temporal layers) chosen for a given original resolution and available bandwidth cannot be changed.
For example, if I don't send any RTCP packets, by default it uses just 2 substreams:
Stream 320 x 240
Temporal layers 3, min br 50, target br 150, max br 200, qp max 56
Stream 640 x 480
Temporal layers 3, min br 150, target br 500, max br 700, qp max 56
To increase the number of substreams and their quality you have to send REMB packets, or maybe periodic Receiver Reports (RR).
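For reference, a REMB is an RTCP payload-specific feedback packet (PT=206, FMT=15) that carries the receiver's estimated maximum bitrate as a 6-bit exponent and an 18-bit mantissa. The sketch below builds one following the layout in draft-alvestrand-rmcat-remb. The function name buildRemb is hypothetical, and this only illustrates the packet layout: the packet would have to be sent by the receiving endpoint (for example a media server) over its SRTCP channel, not from page JavaScript.

```javascript
// Hypothetical helper: build a REMB packet announcing bitrateBps for the
// given list of media SSRCs. Returns the raw bytes as a Uint8Array.
function buildRemb(senderSsrc, bitrateBps, mediaSsrcs) {
  // Encode the bitrate as mantissa * 2^exp with an 18-bit mantissa.
  var exp = 0;
  var mantissa = Math.floor(bitrateBps);
  while (mantissa > 0x3ffff) {
    mantissa = Math.floor(mantissa / 2);
    exp++;
  }

  var buf = new ArrayBuffer(20 + 4 * mediaSsrcs.length);
  var view = new DataView(buf); // DataView writes big-endian by default

  view.setUint8(0, 0x8f);                     // V=2, P=0, FMT=15
  view.setUint8(1, 206);                      // PT=206 (payload-specific feedback)
  view.setUint16(2, 4 + mediaSsrcs.length);   // length in 32-bit words minus one
  view.setUint32(4, senderSsrc);              // SSRC of packet sender
  view.setUint32(8, 0);                       // SSRC of media source, always 0
  view.setUint8(12, 0x52);                    // 'R'
  view.setUint8(13, 0x45);                    // 'E'
  view.setUint8(14, 0x4d);                    // 'M'
  view.setUint8(15, 0x42);                    // 'B'
  view.setUint8(16, mediaSsrcs.length);       // Num SSRC
  view.setUint8(17, (exp << 2) | (mantissa >> 16)); // BR Exp + top 2 mantissa bits
  view.setUint16(18, mantissa & 0xffff);      // low 16 mantissa bits

  // One "SSRC feedback" entry per media SSRC the estimate applies to.
  mediaSsrcs.forEach(function (ssrc, i) {
    view.setUint32(20 + 4 * i, ssrc);
  });
  return new Uint8Array(buf);
}
```

For example, buildRemb(1, 2500000, [100]) produces a 24-byte packet announcing 2500 kbps for SSRC 100. The bitrate value here is only an example picked to match the highest figure in the logs above.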
Desperation note: At some point I was so frustrated at not being able to get it working that I even tried to run my test page as if it were served from the google.com domain, to see if Chrome behaved differently.