Improving Real Time Communications with Machine Learning


When we talk about the applications of Artificial Intelligence / Machine Learning (AI/ML) for Real Time Communications (RTC), we can group them into two different planes:
  • Service Level: There are many features that can be added to a videoconference service, for example identification of the participants, augmented reality, emotion detection, speech transcription or audio translation.  These features are usually based on image and speech recognition and language processing.
  • Infrastructure Level: There are many ways to apply ML that do not provide new features but improve the quality and/or reliability of the audio/video transmission.


Service-level applications are fun, but they are more for product managers and I like technology more, so in the next sections I will describe possible applications of AI/ML for Real Time Communications at the infrastructure level, organized into five categories.

Optimizing video quality

Some of the ML algorithms used for image recognition can be used to optimize the video transmission in RTC services.  I can imagine at least three ways in which these algorithms can help to improve the video quality.

The first way would be to select the best possible encoding parameters for a specific video or a specific part of the frame.  For example, we can detect the most important parts of the scene (maybe the talking head) and use better encoding quality (a lower quantization level) for those areas.  As another example, we can detect the type of content and favor framerate or quality depending on whether it is a high-motion video or a typical conversation.
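To make the first idea a bit more concrete, here is a minimal sketch of a per-block quantizer map. It assumes some detector (not shown) has already produced a region of interest in block units; the function name, the base QP of 34, the delta of -6 and the 0-51 clamp (the H.264 QP range) are all illustrative assumptions, not the API of any real encoder.

```python
def build_qp_map(width_blocks, height_blocks, roi, base_qp=34, roi_delta=-6):
    """Return a per-block quantizer map where blocks inside the ROI get a lower QP.

    roi is (x0, y0, x1, y1) in block units, with x1 and y1 exclusive.
    All names and default values here are hypothetical.
    """
    x0, y0, x1, y1 = roi
    qp_map = []
    for y in range(height_blocks):
        row = []
        for x in range(width_blocks):
            inside = x0 <= x < x1 and y0 <= y < y1
            qp = base_qp + roi_delta if inside else base_qp
            # Clamp to 0..51, assuming an H.264-style QP scale.
            row.append(max(0, min(51, qp)))
        qp_map.append(row)
    return qp_map


# A 4x3 block frame with the "talking head" covering two blocks in the middle row.
qp_map = build_qp_map(4, 3, roi=(1, 1, 3, 2))
for row in qp_map:
    print(row)
```

In a real system the ROI would come from the ML model and the map would be handed to whatever per-region quality control the encoder exposes.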

The second way could be to reduce the amount of information being sent by removing the information that can be regenerated by the receiver.  As an extreme example, we all know what human hair looks like, so could you send it at lower quality and reconstruct the detail in the receiver?  One example of this application can be seen in the RAISR demos by Google.


The same process could also be used to improve the readability or increase the level of detail of objects that are too far away or out of focus.

It is also possible to apply ML in the video codec implementations to optimize the processing required to encode the frames as you can see in this code included in the VP9 codebase.

Optimizing audio quality

We should be able to eliminate redundant audio data that can be regenerated on the receiver side, in the same way we described for video in the previous section.  In an extreme case we would only need to send the text and the accent / speed... of the speaker, and the receiver should be able to reconstruct almost the same voice based on a previous learning process.

One of the problems with audio quality is intelligibility in noisy environments.  ML algorithms can also help with this, as shown by the RNNoise project by Mozilla/Xiph, which learns how to better differentiate voice from noise and suppress the noise.

Tuning transmission settings

The number of parameters involved in an RTC session is really big: codecs with tons of settings, bitrates, packetization sizes, buffers, timeouts...  Deciding which values to use at a given time is not trivial and can even require per-user or per-network adjustments that change dynamically.  An ML-based system could learn the best combination of those parameters for a specific user under specific conditions.

In the case of multiparty calls it is critical to include some algorithm to decide how the available bitrate is distributed among the different streams in the room.  For example, is it better to send 2 videos at 50kbps or to disable one and send the other one at 100kbps?  An ML algorithm able to make those decisions (bitrates, framerates, resolutions, codecs...) in real time, based on the uplink/downlink characteristics of all the participants in the call but also on the type of conversation and on who the active speakers are, could provide a much better quality of experience.
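As a rough illustration of that decision, the sketch below splits a bitrate budget across streams. It assumes each stream comes with a weight (for example a hypothetical model's estimate of how important that participant is right now) and that a video below some minimum bitrate is not worth sending. The function name, the weights and the 100 kbps floor are assumptions for the example, not values from any real product.

```python
def allocate_bitrate(available_kbps, streams, min_kbps=100):
    """Split available_kbps across streams; a result of 0 means the stream is disabled.

    streams is a list of (stream_id, weight) pairs. Higher weight = more important.
    The weights stand in for the output of a hypothetical ML model.
    """
    ranked = sorted(streams, key=lambda s: s[1], reverse=True)
    # Keep only as many of the most important streams as can each get min_kbps.
    keep = ranked[: int(available_kbps // min_kbps)]
    allocation = {stream_id: 0.0 for stream_id, _ in streams}
    if not keep:
        return allocation
    leftover = available_kbps - min_kbps * len(keep)
    total_weight = sum(weight for _, weight in keep)
    for stream_id, weight in keep:
        # Every kept stream gets the floor, plus a weighted share of what is left.
        share = weight / total_weight if total_weight > 0 else 1 / len(keep)
        allocation[stream_id] = min_kbps + leftover * share
    return allocation


# With only 100 kbps, one video is disabled and the other gets the whole budget.
print(allocate_bitrate(100, [("speaker", 2.0), ("listener", 1.0)]))
# With 400 kbps, both videos are kept and the active speaker gets more.
print(allocate_bitrate(400, [("speaker", 3.0), ("listener", 1.0)]))
```

The interesting part in practice is where the weights and the floor come from; this only shows the shape of the output such a system would need to produce.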

Resource allocation and planning

Most RTC infrastructures include the concept of Media Servers.  Those are the servers routing the audio and video packets between the different participants, and they are especially important in the case of multiparty calls.

For these calls you can use ML algorithms to decide which server is the best one for a specific call, based on the location of the server, the location of the participants and the status of the servers (basically the load and network status).
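A minimal sketch of that selection, assuming we already have a table of round-trip times between regions and a load figure per server. The hand-written cost function below is only a stand-in for what a learned model would predict, and the names and the 200 ms load penalty are made up for the example.

```python
def pick_server(servers, participant_regions, rtt_ms, load_penalty_ms=200.0):
    """Pick the media server with the lowest estimated cost for this call.

    servers: {name: (region, load)} with load between 0.0 and 1.0.
    rtt_ms: {(server_region, participant_region): round-trip time in ms}.
    The cost formula is a hypothetical placeholder for a learned model.
    """
    def cost(name):
        region, load = servers[name]
        mean_rtt = sum(rtt_ms[(region, p)] for p in participant_regions) / len(participant_regions)
        return mean_rtt + load_penalty_ms * load

    return min(servers, key=cost)


rtt_ms = {("eu", "eu"): 20, ("eu", "us"): 100, ("us", "eu"): 100, ("us", "us"): 20}
participants = ["eu", "eu", "us"]

# The closest server is heavily loaded, so the farther one wins.
print(pick_server({"eu-1": ("eu", 0.9), "us-1": ("us", 0.2)}, participants, rtt_ms))
# With a lightly loaded EU server, location decides.
print(pick_server({"eu-1": ("eu", 0.1), "us-1": ("us", 0.2)}, participants, rtt_ms))
```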

In the same way, ML can be used to forecast load and make sure the right amount of resources is available at any given time.
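As a very simple baseline for that kind of forecasting, the sketch below uses exponential smoothing (a classic statistical method, not a trained ML model) to predict the next load value and turn it into a server count. The function names, the smoothing factor and the 20% headroom are assumptions for illustration.

```python
import math


def forecast_next(history, alpha=0.5):
    """Predict the next value of a series with simple exponential smoothing."""
    if not history:
        raise ValueError("history must contain at least one value")
    level = history[0]
    for value in history[1:]:
        # Blend the newest observation with the running estimate.
        level = alpha * value + (1 - alpha) * level
    return level


def servers_needed(predicted_sessions, sessions_per_server, headroom=0.2):
    """Round up the number of servers, keeping some spare capacity."""
    return math.ceil(predicted_sessions * (1 + headroom) / sessions_per_server)


# Concurrent sessions observed over the last few intervals (made-up numbers).
sessions = [800, 850, 900, 1000, 1100]
predicted = forecast_next(sessions)
print(predicted, servers_needed(predicted, sessions_per_server=150))
```

A real forecaster would also need to handle trend and daily or weekly seasonality, which this baseline ignores.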

Diagnostics and Monitoring

Most people who have been building RTC platforms for a while have probably experienced how painful it is to debug issues.  In the same way that ML is starting to be used for medical diagnostics in e-health, it can also be used to debug problems and find their root cause.  For example, based on quality metrics from the different participants and the status of the servers, a model could diagnose whether a problem is a bug or a network issue and, in the second case, which network was responsible.

We can also use the classification algorithms available in ML to classify calls according to quality scores or other parameters, for reporting or monitoring purposes.

We can also use unsupervised learning to detect anomalies in the system automatically and trigger alerts.
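As a starting point for anomaly detection, here is a small sketch that flags outliers without any labels, using a robust z-score based on the median absolute deviation (MAD). It is a simple statistical baseline rather than a learned model, and the function name, the 3.5 threshold and the packet loss numbers are assumptions for the example.

```python
import statistics


def find_anomalies(values, threshold=3.5):
    """Return the indexes of values that are far from the median of the series.

    Uses the modified z-score 0.6745 * |x - median| / MAD; the threshold
    is a commonly used default, chosen here as an assumption.
    """
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        # No spread to measure against, so nothing can be flagged.
        return []
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - median) / mad > threshold
    ]


# Packet loss percentage per call (made-up numbers); one call is clearly off.
packet_loss = [1.0, 1.2, 0.9, 1.1, 1.0, 9.0, 1.05]
for index in find_anomalies(packet_loss):
    print("alert: call", index, "packet loss", packet_loss[index])
```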


As you can see, there are many ways to use ML to improve the quality and/or reliability of our RTC platforms.  These were only some examples, and I'm sure you have many other ideas.


You can follow me on Twitter if you are interested in Real Time Communications.
