Showing posts from October, 2019

Speech recognition in Hangouts Meet

There are many possible applications for speech recognition in Real-Time Communication (RTC) services: live captions, simultaneous translation, voice commands, or storing and summarising audio conversations.

Speech recognition in the form of live captioning has been available in Hangouts Meet for some months, but it was recently promoted to a button in the main UI and I have started to use it almost every day.

I'm mostly interested in the recognition technology, and specifically in how to integrate DeepSpeech in RTC media servers to provide a cost-effective solution. In this post, though, I wanted to spend some time analysing how Hangouts Meet implemented captioning from a signalling point of view.

At a very high level there are at least three possible architectures for speech recognition in RTC services: