WebRTC Facts and Lies

There is currently some hype around the WebRTC initiative, and it is leading to some misunderstandings. I have collected some frequently asked questions to help explain what WebRTC is and what it is not, so that it does not become the next big buzzword in the telco industry.

What is WebRTC?

It is an initiative to add the capability to send and receive audio and video to browsers without requiring additional plugins. The same thing happened with video playback, and today most modern browsers have native support for that capability.

Who is behind WebRTC?

Google is the initial and main supporter, but Opera and Firefox have also been supporting it from the very beginning. Lots of individual contributors from different companies (e.g. Cisco or Ericsson) are working on this in standardization forums. The efforts to achieve the WebRTC goals are currently threefold: Google, with individual contributors, is building a reference implementation; the W3C is defining the APIs that browsers expose to developers; and the IETF is defining the protocols that browsers use to offer those capabilities.

Is WebRTC a new technology?

No, 99% of the technology behind WebRTC was already in place and used by commercial applications like Skype, Tango or TU Me. WebRTC is basically a label: it does not bring any new technology, as it is based on codecs, protocols, and NAT traversal and encryption solutions that have been used for years in applications outside web browsers.

Is this something new?

No, the community has been working on this for two years. The first Ericsson implementation was shown in May 2011 and the W3C group was officially created in May 2011.

Can I use it outside browsers?

WebRTC is an initiative focused only on browsers, because there is no equivalent problem to solve in other environments such as native mobile applications. However, the libraries that the community is creating as a reference implementation for browsers are very high quality and can be integrated into any other RTC application, given enough knowledge and resources.

How many implementations are there?

There is basically a single implementation of the core (the media engine for audio and video processing), which each browser vendor is integrating. Each vendor is free to reimplement some parts, and they are customizing the network layer and the PeerConnection implementation.

Is this SIP or XMPP?

Neither, since WebRTC is agnostic to the control protocol. Each application has to decide how to invite users and accept or reject calls, for example by using a “cloud” messaging solution like PubNub.com to send those control messages between users. WebRTC only handles the audio and video transmission once that negotiation has been carried out.
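
As a rough illustration of what such an application-defined control protocol could look like, here is a minimal sketch. Everything in it is an assumption made for the example: the message shapes, the `InMemorySignaling` class and the `summarize` helper are hypothetical, and the in-memory relay merely stands in for whatever messaging service the application actually uses.

```typescript
// Hypothetical message shapes: WebRTC itself does not define any of these.
type SignalingMessage =
  | { kind: "invite"; from: string; to: string; sdpOffer: string }
  | { kind: "accept"; from: string; to: string; sdpAnswer: string }
  | { kind: "reject"; from: string; to: string; reason: string };

type Handler = (msg: SignalingMessage) => void;

// In-memory stand-in for the transport the application chooses
// (a cloud messaging service, WebSockets, SIP, XMPP, ...).
class InMemorySignaling {
  private handlers = new Map<string, Handler>();

  // Registers the callback that receives messages addressed to `user`.
  register(user: string, handler: Handler): void {
    this.handlers.set(user, handler);
  }

  // Delivers the message; returns false if the recipient is not registered.
  send(msg: SignalingMessage): boolean {
    const handler = this.handlers.get(msg.to);
    if (!handler) return false;
    handler(msg);
    return true;
  }
}

// Human-readable summary of a control message, handy for logging.
function summarize(msg: SignalingMessage): string {
  switch (msg.kind) {
    case "invite":
      return `${msg.from} invites ${msg.to}`;
    case "accept":
      return `${msg.from} accepts the call from ${msg.to}`;
    case "reject":
      return `${msg.from} rejects the call from ${msg.to}: ${msg.reason}`;
  }
}
```

The invite and accept messages carry the session descriptions that the browsers need to exchange; how they travel between users is entirely up to the application.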

If I decide to use SIP do I need to use SIP over WebSockets?

No, you can use any standard or proprietary transport for SIP messages (even email if you want :-)). However, the only standard SIP transport compatible with browsers is SIP over WebSockets.

Is it interoperable with other VoIP solutions?

Probably not, mostly because of the encryption of the media flows and the codecs supported. That interoperability can always be provided by introducing a gateway in the middle to perform any translation needed at the protocol or codec level.

What encryption is provided by WebRTC implementations?

The IETF decided to mandate DTLS-SRTP, so all the traffic is encrypted. Key negotiation is performed in the media flow, with some fingerprints exchanged at the control protocol level.
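
When the session description is SDP, those fingerprints appear as `a=fingerprint:` attribute lines. The sketch below shows one way an application could pull them out of a description it received over its control protocol. The `extractFingerprints` function and the `DtlsFingerprint` type are hypothetical names for this example, and the parsing is deliberately simplified (it does not validate the fingerprint length against the hash function).

```typescript
// Hypothetical helper type for this example.
interface DtlsFingerprint {
  hashFunction: string; // e.g. "sha-256"
  value: string;        // colon-separated hex bytes, upper case
}

// Collects every "a=fingerprint:<hash> <hex bytes>" line found in an SDP blob.
function extractFingerprints(sdp: string): DtlsFingerprint[] {
  const pattern = /^a=fingerprint:(\S+) ([0-9A-Fa-f]{2}(?::[0-9A-Fa-f]{2})*)\s*$/;
  const result: DtlsFingerprint[] = [];
  for (const line of sdp.split(/\r?\n/)) {
    const match = pattern.exec(line);
    if (match) {
      result.push({
        hashFunction: match[1].toLowerCase(),
        value: match[2].toUpperCase(),
      });
    }
  }
  return result;
}
```

Each endpoint compares the fingerprint received this way with the certificate the other side presents during the DTLS handshake, which is what ties the keys negotiated in the media flow to the peer reached through the control protocol.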

What codecs are provided by WebRTC implementations?

For audio there is strong consensus on mandating support for G.711 and Opus (a modern, very high quality codec that has just been standardized after some years of effort). For video the discussion is ongoing and there are two options on the table: no agreement on a mandatory codec, or mandating VP8. I’m not optimistic about reaching a conclusion on the video codec, and I think the market will dictate the de facto standard in the mid term.

What NAT traversal solutions are provided by WebRTC implementations?

WebRTC includes all the standard NAT traversal mechanisms defined by the IETF, which have also been used for years in commercial services like GTalk: ICE to figure out the best path between two endpoints, STUN to find the public address of endpoints, and TURN to relay traffic as a last resort. The combination is meant to give the optimal solution for all NAT cases.
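
To give an idea of why TURN ends up being the last resort, here is a small sketch of the candidate priority formula recommended in the ICE specification, using its recommended type preferences (host 126, peer reflexive 110, server reflexive 100, relayed 0). The function names are hypothetical, and this is a simplification: real ICE ranks candidate pairs and runs connectivity checks, while this only ranks individual candidates.

```typescript
type CandidateType = "host" | "prflx" | "srflx" | "relay";

// Type preferences recommended by the ICE specification.
const TYPE_PREFERENCE: Record<CandidateType, number> = {
  host: 126,  // address of a local interface
  prflx: 110, // peer reflexive, learned during connectivity checks
  srflx: 100, // server reflexive, public address learned via STUN
  relay: 0,   // address allocated on a TURN server
};

// priority = 2^24 * type preference + 2^8 * local preference + (256 - component ID)
// The defaults assume a single network interface and the RTP component.
function candidatePriority(
  type: CandidateType,
  localPreference: number = 65535,
  componentId: number = 1,
): number {
  return 2 ** 24 * TYPE_PREFERENCE[type] + 2 ** 8 * localPreference + (256 - componentId);
}

// Returns a copy of the candidate types, highest priority first.
function sortByPriority(types: CandidateType[]): CandidateType[] {
  return [...types].sort((a, b) => candidatePriority(b) - candidatePriority(a));
}
```

With the defaults, `candidatePriority("host")` is 2130706431, `candidatePriority("srflx")` is 1694498815 and `candidatePriority("relay")` is 16777215, so direct paths are preferred and relayed paths are only used when nothing else works.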

Voice/video quality will be bad as this is free, right?

Voice/video quality will be close to the best possible, as most browser vendors are integrating the reference implementation, which is based on the components acquired by Google (~$200M) from GIPS, the most prominent VoIP company, and from the video codec provider On2. This solution includes best-in-class technology for all the algorithms that have a strong impact on the user experience: AEC (acoustic echo cancellation), NS (noise suppression), AGC (automatic gain control) and the jitter buffer.

Ok, but without QoS mechanisms in the network this is not going to work

Have you heard of Skype?

Does it require some server infrastructure?

You need a server infrastructure for the control protocol, but that part is not standardized and each service provider can implement its own solution for the sake of flexibility.

Do I need the WebRTC box that IMS vendors are trying to sell me?

If you are unfortunate enough to have to use an IMS infrastructure provided by a traditional vendor, you will probably have an access box (SBC) that is missing some of the features needed (mostly WebSockets, ICE and SRTP support). Vendors should end up supporting those features in the SBC, but in the meantime you can add an additional box in front of it.
If you are using a plain open source SIP solution (e.g. Kamailio), you probably won’t need any additional support.

Does it support PSTN interconnection?

PSTN interconnection is provided by gateways. To have PSTN interconnection you will need a gateway (or another box in front of it) that understands ICE, SRTP and Opus. You can fall back to G.711 for the encoding if bandwidth is not a problem.
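
The fallback can be pictured as a simple codec negotiation. The sketch below is a simplified illustration, not how any particular browser or gateway implements offer/answer: `negotiateAudioCodec` is a hypothetical helper that picks the first codec in the local preference order that the other side also supports. `PCMU` and `PCMA` are the names under which the two G.711 variants (mu-law and A-law) appear in session descriptions.

```typescript
// Picks the first locally preferred codec that the remote side also supports.
// Codec names are compared case-insensitively; returns undefined if there is
// no codec in common (in which case a transcoding gateway would be needed).
function negotiateAudioCodec(
  localPreference: string[],
  remoteSupported: string[],
): string | undefined {
  const remote = new Set(remoteSupported.map((codec) => codec.toLowerCase()));
  return localPreference.find((codec) => remote.has(codec.toLowerCase()));
}
```

For example, a browser preferring `["opus", "PCMU", "PCMA"]` that talks to a gateway supporting only `["PCMU", "PCMA"]` ends up on `PCMU`, while two browsers would both settle on `opus`.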

What is a PeerConnection?

PeerConnection is the basic API through which browsers expose WebRTC capabilities to web developers. It basically lets them start and stop different media channels between two peers.

Do developers need something else?

Most developers would appreciate higher-level APIs that abstract a specific call control solution and hide the complexities of PeerConnection handling. There are already JavaScript libraries from players like OpenTok or Phono that offer this level of abstraction.

Can I send arbitrary data using PeerConnection?

There is ongoing work to be able to establish direct data channels between browsers in the same way that we establish the audio and video channels.

When will it be available for production?

You can use it today in Chrome by enabling it in the preferences. Half of the browser market (Chrome, Firefox, Opera) is expected to support it without any manual enabling at the beginning of 2013.

What can I do if there are incompatibilities/missing features between browsers?

This is a likely scenario, especially because of the video codec. The solutions, as usual, are either a server-side gateway to translate protocols/codecs or plugins for the “misbehaving” browsers (for example, the existing Google Chrome Frame plugin).

What are the Microsoft and Apple plans?

I don’t know.

What is Google’s purpose with WebRTC?

Dominate the world.

Thank you to Oscar and Christian for reviewing and contributing to this list.
