Maxine will process video calls in the cloud using Nvidia’s GPUs and boost call quality in a number of ways with the help of artificial intelligence. Using AI, Maxine can realign callers’ faces and gazes so that they always appear to be looking directly at their camera, reduce the bandwidth requirement for video “down to one-tenth of the requirements of the H.264 streaming video compression standard” by transmitting only “key facial points,” and upscale the resolution of videos. Other features available in Maxine include face re-lighting, real-time translation and transcription, and animated avatars.
While this might warm the hearts of Nvidia fans, many of these features are not new. Video compression and real-time transcription are common enough, and Microsoft and even Apple have introduced gaze alignment, in the Surface Pro X and FaceTime respectively, so that people appear to keep eye contact during video calls.
Maxine is not a consumer platform but a toolkit for third-party firms to improve their own software. So far, though, Nvidia has only announced one partnership — with communications firm Avaya, which will be using select features of Maxine. As indicated in the image below, all major cloud vendors are offering Maxine as part of their Nvidia GPU cloud services.
In a conference call, Nvidia’s general manager for media and entertainment, Richard Kerris, described Maxine as a “really exciting and very timely announcement,” and highlighted its AI-powered video compression as a particularly useful feature.
“We’ve all experienced times where bandwidth has been a limitation in our conferencing we’re doing on a daily basis these days,” said Kerris. “If we apply AI to this problem we can reconstruct the difference scenes on both ends and only transmit what needs to transmit, and thereby reducing that bandwidth significantly.”
Nvidia says its compression feature uses an AI method known as generative adversarial networks, or GANs, to partially reconstruct callers’ faces in the cloud. This is the same technique used in many deepfakes. As Nvidia puts it: “Instead of streaming the entire screen of pixels, the AI software analyses the key facial points of each person on a call and then intelligently re-animates the face in the video on the other side.”
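To give a rough sense of why sending facial keypoints can be so much cheaper than sending pixels, here is a back-of-the-envelope sketch. It is not Nvidia's method or data: the number of landmarks (68, a common face-landmark layout), the 16-bit coordinates, and the 30 frames per second are all illustrative assumptions, and the function name is hypothetical.

```python
# Hypothetical estimate of the raw data rate needed to send only facial keypoints.
# All parameters are illustrative assumptions, not Nvidia's published figures.

def keypoint_bitrate_kbps(num_points: int, coords_per_point: int,
                          bytes_per_coord: int, fps: int) -> float:
    """Uncompressed keypoint payload in kilobits per second."""
    bytes_per_frame = num_points * coords_per_point * bytes_per_coord
    return bytes_per_frame * fps * 8 / 1000  # bytes/s -> bits/s -> kbps


# Assumption: 68 two-dimensional landmarks, each coordinate a 16-bit integer,
# sent 30 times per second.
rate = keypoint_bitrate_kbps(num_points=68, coords_per_point=2,
                             bytes_per_coord=2, fps=30)
print(f"{rate:.2f} kbps")  # prints "65.28 kbps"
```

The estimate ignores the reference image of the caller that the receiving side would need in order to re-animate the face, as well as any protocol overhead, so it only illustrates why the per-frame payload can be small.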