Posts

Screensharing content detection

One interesting feature in WebRTC is the ability to configure a content hint for media tracks so that WebRTC can optimize the transmission for that specific type of content. That way, if the content hint is "text" it will try to optimize the transmission for readability, and if the content hint is "motion" it will try to optimize the transmission for fluidity, even if that means reducing the resolution or the definition of the video. This is especially useful when sharing documents or slides, where the "crispness" of the text is very important for the user experience. You can see the impact of those hints on the video encoding in a screenshot in the W3C spec.

This is very useful, but there is a small problem. What happens when we don't know the type of content being shared? How do we know if the browser tab being shared has some static text slides or a YouTube video being played? One possible option could be to do some type of image pro
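As a minimal sketch of how that hint is applied (the function name is mine, but contentHint is the standard MediaStreamTrack attribute):

```typescript
// Minimal sketch: tag a screenshare track with a content hint.
// Valid values are "", "text", "detail", and "motion".
async function shareWithHint(hint: string = "text"): Promise<MediaStream> {
  const stream = await navigator.mediaDevices.getDisplayMedia({ video: true });
  const [track] = stream.getVideoTracks();
  // "text"/"detail" favor sharpness; "motion" favors a smooth frame rate.
  track.contentHint = hint;
  return stream;
}
```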

New look at WebRTC usage in Google Meet

I hadn't looked at Google Meet's webrtc-internals for a while, so while I was having a meeting last week I decided it was a good time to check the latest changes that had been added.

P2P Connections

One of the first things I checked was whether Google Meet was using P2P connections when there are only two participants in the room, and I was surprised to find that it was not the case. P2P support was included in the past (P2P-SFU transitions discussion) but apparently has been removed. This increases the infrastructure cost (not an issue for Google) and slightly increases the end-to-end latency for 1:1 calls, but given that Google Meet is probably deployed in many points of presence, that's probably not a big increase, and the simplicity of not having to handle another type of connection and the transition between them (P2P <-> SFU) is a big benefit, so it looks reasonable.

ICE candidates and (NO) TURN servers

Google Meet is not configuring any ICE servers anymore and t
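If you want to run a similar check on your own service, a sketch like this (my own, using the standard getStats API, not anything from Meet's code) reports whether the selected ICE candidate pair is direct or goes through a TURN relay:

```typescript
// Sketch: log the candidate types of the nominated candidate pair.
// "host"/"srflx" means a direct path; "relay" means TURN is in use.
async function logSelectedPair(pc: RTCPeerConnection): Promise<void> {
  const stats = await pc.getStats();
  stats.forEach((report) => {
    if (report.type === "candidate-pair" && report.nominated && report.state === "succeeded") {
      const local = stats.get(report.localCandidateId);
      const remote = stats.get(report.remoteCandidateId);
      console.log(`local=${local?.candidateType} remote=${remote?.candidateType}`);
    }
  });
}
```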

Handling back-pressure in RTC services

In the context of software, back-pressure refers to actions taken by systems to "push back" against downstream forces. As such, it is a defensive action taken unilaterally while under duress, or when the aggregate call pattern exhibits too many spikes or is too bursty. This approach is commonly used in microservices infrastructures, but we don't talk much about it in the context of Real-Time Communication platforms, even though it can be equally important there to handle spikes of load while keeping the quality of service within reasonable limits. During spikes or bursts the quality could be slightly degraded, but the service should still be usable and the servers should never go down.

The typical techniques to apply back-pressure and get some protection against those unsustainable traffic patterns are buffering, dropping, and controlling the sender (see the sketch below):

Buffering: Messages can be queued and processed at a lower rate when needed to limit the impact of spikes or high load.
Dropping: Mes
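As a toy illustration of the first two techniques (mine, not from the post), a bounded queue can absorb short spikes by buffering and fall back to dropping the oldest messages once full:

```typescript
// Sketch: a bounded buffer that queues messages and, under sustained
// overload, drops the oldest ones instead of growing without limit.
class BoundedQueue<T> {
  private items: T[] = [];
  constructor(private readonly maxSize: number) {}

  push(item: T): void {
    if (this.items.length >= this.maxSize) {
      this.items.shift(); // drop the oldest message under load
    }
    this.items.push(item);
  }

  pop(): T | undefined {
    return this.items.shift(); // consumer drains at its own pace
  }
}

// Usage: buffer up to 1000 signaling messages, dropping beyond that.
const queue = new BoundedQueue<string>(1000);
```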

The role of Real Time Communications in the Multiverse

Recently many companies have been talking about something that they usually call the metaverse. There is no single clear definition of what the metaverse is, but something like this sounds close enough: "A highly connected environment with lots of interactive players and complex simulation creating rich experiences, something more than a game but less than the real world" (What is the metaverse, and why is it worth so much money?). For me, the easiest way to understand it is to think about something similar to the world shown in the Ready Player One movie, where users can interact with each other in a digital environment making use of virtual reality devices and sensors. Perhaps the biggest company speaking publicly about the metaverse is Facebook, which is betting hard on it, saying that it wants to move from being a "social company to a metaverse company", but there are also many gaming companies working in that direction or at least talking about it. In this blo

WebRTC Video Codecs performance evaluation

The standard and most popular codecs in WebRTC are VP8 and H.264, but those are not the only options we have. VP9 has been available for a while and some big services are using it, and AV1 support has recently been added to Chrome. When comparing codecs there are interesting considerations like interoperability and licensing, but probably the most important factors are how good the codec is in terms of compression and how cheap it is in terms of CPU and memory usage. The compression ratio is usually the first thing we look at, and there are many comparisons available for that, but resource consumption is equally important if we want to be able to use the codecs for real-time use cases. Given that AV1 is available in the Chrome Canary versions, I decided to run some tests to get an estimation of where we stand in terms of CPU usage for the 4 codecs available in the WebRTC ecosystem. The idea of the test is to compare the whole video pipeline with those 4 codecs and not just the co
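For reference, this is roughly how one could force a specific codec per test run in the browser; a sketch based on the standard setCodecPreferences API, not necessarily the setup used for these measurements:

```typescript
// Sketch: reorder codec preferences so a given codec is negotiated
// first, e.g. "video/VP8", "video/VP9", "video/H264", or "video/AV1".
function preferCodec(pc: RTCPeerConnection, mimeType: string): void {
  const codecs = RTCRtpReceiver.getCapabilities("video")?.codecs ?? [];
  const preferred = codecs.filter(c => c.mimeType.toLowerCase() === mimeType.toLowerCase());
  const rest = codecs.filter(c => c.mimeType.toLowerCase() !== mimeType.toLowerCase());
  for (const transceiver of pc.getTransceivers()) {
    if (transceiver.receiver.track.kind === "video") {
      transceiver.setCodecPreferences([...preferred, ...rest]);
    }
  }
}

// Call once per codec under test, before createOffer().
```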