Microsoft has spent the last two years adding flashy new productivity features to Teams, and now, thanks to artificial intelligence, the company is overhauling how the basics work. We’ve all been on a call where someone’s room acoustics made them hard to hear, or where two people tried to talk at the same time, resulting in an awkward “no, you go ahead” moment. Microsoft’s new AI-powered voice quality enhancements should reduce or eliminate these minor annoyances.
Microsoft is now using machine learning models to compensate for poor room acoustics, so you won’t sound like you’re in a cave. “While we’ve been doing our best with digital signal processing to do a really good job in Teams, we’ve now started using machine learning for the first time to build echo cancellation where you can truly reduce echo from all the different devices,” says Robert Aichner, Microsoft’s principal program manager for intelligent conversation and communications cloud, in an interview with The Verge.
Microsoft has been testing this for months, monitoring its models in the real world to verify that Teams users notice the echo reduction and call quality improvements. The company used 30,000 hours of speech to train its models and captured audio from thousands of devices through crowdsourcing, in which Teams users are paid to record their voice and play back audio from their device.
If Teams detects sound bouncing or echoing around a room, resulting in shallow audio, the model will convert and process the captured audio so that Teams participants sound as though they are speaking into a close-range microphone rather than from an echoey room.
The most striking feature is the ability to interrupt each other on Teams calls without the awkward overlap in which echo keeps you from hearing the other person. Microsoft is now delivering all of this work in Teams, along with earlier improvements to AI-based noise suppression. All processing is done locally on client devices rather than in the cloud.
All of these new Microsoft Teams enhancements are now available, along with real-time screen optimization for text in videos and AI-based optimizations for limited bandwidth during video or screen-sharing sessions.