
Measuring WebRTC video quality for different bitrates - Playing with VMAF

I've been wanting to play with Netflix's Video Multimethod Assessment Fusion (VMAF) for a while, and yesterday I found the time and the motivation to give it a try.

Netflix VMAF is an algorithm that generates a video quality score by comparing a reference image/video with a distorted one. To do that, VMAF calculates scores using traditional image quality metrics like VIF or DLM and then aggregates them using a Machine Learning model (an SVM) trained with videos and scores coming from real users. Smart, isn't it? (You can see a high-level description of the aggregated metrics in this Netflix post or on the Wikipedia page.)

It is important to notice that VMAF works on a per-frame basis, so it is NOT a good tool to measure the quality impact of many artefacts that happen in Real Time Communications (delays, reduced or frozen framerate, audio/video desync). However, we can use it to measure the impact of different encoding settings, such as the average bitrate of the encoding.
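To make that last point concrete, here is a minimal sketch of the experiment (my own illustration, not from the post): it re-encodes a reference clip at several average bitrates with libvpx (VP8, a WebRTC-style codec) and scores each encode against the original using ffmpeg's libvmaf filter. It assumes an ffmpeg build with libvmaf enabled; the file names, codec choice and bitrates are illustrative.

import re
import subprocess

REFERENCE = "reference.y4m"  # hypothetical lossless source clip

def encode(bitrate_kbps: int) -> str:
    """Re-encode the reference at a given average bitrate (VP8, as WebRTC would use)."""
    out = f"distorted_{bitrate_kbps}k.webm"
    subprocess.run(
        ["ffmpeg", "-y", "-i", REFERENCE,
         "-c:v", "libvpx", "-b:v", f"{bitrate_kbps}k", out],
        check=True, capture_output=True)
    return out

def vmaf_score(distorted: str) -> float:
    """Score distorted vs. reference; libvmaf expects the distorted video as the first input."""
    result = subprocess.run(
        ["ffmpeg", "-i", distorted, "-i", REFERENCE,
         "-lavfi", "libvmaf", "-f", "null", "-"],
        check=True, capture_output=True, text=True)
    # libvmaf logs a line like "VMAF score: 87.123456" when it finishes
    match = re.search(r"VMAF score: ([\d.]+)", result.stderr)
    return float(match.group(1))

for kbps in (250, 500, 1000, 2000):
    print(kbps, "kbps ->", vmaf_score(encode(kbps)))

Plotting the resulting scores against bitrate gives the kind of rate-quality curve this post is after.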

How are Images/Videos sent in WhatsApp?

I've been involved in the development of different IM&P (Instant Messaging and Presence) services in the past, and one of the core features was always the ability to share media files (audio, video, pdf...) with other users or groups.

Probably the most popular of these services was TU Me, which was based on SIP but used HTTP to upload and download the files from a central storage. SIP was used to send the message with the URL to the other end so that it could download the media file.

In cases where I had to use XMPP, there were typically two approaches, at least if you need to provide offline messaging:

- Some inefficient inband alternatives where the media files are sent over the XMPP connection itself.
- Extensions based on HTTP uploads, either custom ones or not widely supported ones like XEP-0363 (sketched below).

But today I was curious about how this is done in WhatsApp. I found this open source Python implementation (yowsup) that looks great and got an idea of how it could be done, but I wanted to confirm what protocol is used by the most recent WA clie…
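For reference, here is a rough sketch of that XEP-0363 flow (my own illustration, not WhatsApp's actual protocol): the client asks the XMPP server for an upload slot, PUTs the file to the returned URL over plain HTTP, and then shares the GET URL in a normal message. The stanzas are shown as illustrative XML strings rather than built with a real XMPP library, and the upload.example.com host and URLs are hypothetical.

import os
import requests  # third-party HTTP client (pip install requests)

FILENAME = "photo.jpg"
SIZE = os.path.getsize(FILENAME)

# 1. Ask the server's upload component for a slot (IQ get).
slot_request = f"""
<iq type='get' to='upload.example.com' id='slot1'>
  <request xmlns='urn:xmpp:http:upload:0'
           filename='{FILENAME}' size='{SIZE}' content-type='image/jpeg'/>
</iq>"""

# 2. The server answers with a slot: a PUT URL to upload to and a GET URL
#    to share (hypothetical values standing in for the server's response).
put_url = "https://upload.example.com/abc123/photo.jpg"
get_url = "https://upload.example.com/abc123/photo.jpg"

# 3. Upload the media out-of-band over HTTP to the central storage.
with open(FILENAME, "rb") as f:
    requests.put(put_url, data=f,
                 headers={"Content-Type": "image/jpeg"}).raise_for_status()

# 4. Send the GET URL in a regular message; the recipient can download the
#    file whenever it comes online, which is what makes offline messaging work.
message = f"<message to='friend@example.com'><body>{get_url}</body></message>"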

Improving Real Time Communications with Machine Learning

When we talk about the applications of Artificial Intelligence / Machine Learning (AI/ML) to Real Time Communications (RTC), we can group them into two different planes:
- Service Level: There are many features that can be added to a videoconference service, for example identification of the participants, augmented reality, emotion detection, speech transcription or audio translation. These features are usually based on image and speech recognition and language processing.
- Infrastructure Level: There are many ways to apply ML that do not provide new features but improve the quality and/or reliability of the audio/video transmission.