Boston, October 2, 2023: In a major breakthrough, researchers at Northeastern University have developed a machine learning (ML) powered tool that can extract audio from still photos and muted videos.
The researchers did not stop there.
They claim their newly developed ML-powered tool, "Side Eye", can do even more.
“Side Eye can even recognise the gender of the people speaking in the room where the photo or the video was captured along with the exact words and sentences they spoke”, Kevin Fu, a Professor of Electrical and Computer Engineering and Computer Science at Northeastern University said.
This means that if someone records a video and mutes it to dub in music, Side Eye can still extract the audio and identify the gender of the creators or other people speaking in the room, Northeastern Global News reported.
"Imagine someone is doing a TikTok video and they mute it and dub music. Have you ever been curious about what they're really saying? Was it 'Watermelon watermelon' or 'Here's my password?' Was somebody speaking behind them? You can actually pick up what is being spoken off camera", Fu said.
To extract audio from pictures and muted videos, Side Eye exploits the image stabilization technology built into nearly all smartphone cameras.
Smartphone cameras incorporate a lens suspension system with springs submerged in liquid, ensuring that photos remain clear and focused even when the photographer has an unsteady hand. These cameras utilize sensors and an electromagnet to counteract any movement by adjusting the lens in the opposite direction, thereby stabilizing the image.
Interestingly, when someone speaks close to the camera lens while a photo is being taken, the sound causes minute vibrations in the springs, subtly altering the path of light. Extracting audio frequencies from these vibrations may seem nearly impossible, but it becomes feasible thanks to the rolling shutter readout used by most cameras.
"The way cameras work today to reduce cost basically is they don’t scan all pixels of an image simultaneously – they do it one row at a time. [That happens] hundreds of thousands of times in a single photo. What this basically means is you’re able to amplify by over a thousand times how much frequency information you can get, basically the granularity of the audio," Fu said.
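The row-by-row readout Fu describes can be illustrated with a rough sketch. The Python below is not Side Eye's method, which the report does not detail. It assumes an idealized camera in which each row is read at an evenly spaced instant across the frame period, and it treats the lens vibration as a pure tone. The frame rate, row count, tone frequency and all function names are hypothetical values chosen for illustration.

```python
import math

def effective_sample_rate(frame_rate_hz, rows_per_frame):
    """Samples per second when each sensor row counts as its own time sample.

    Assumption: rows are read at evenly spaced instants across the whole
    frame period (real sensors also have blanking gaps between frames).
    """
    return frame_rate_hz * rows_per_frame

def simulate_row_shifts(tone_hz, duration_s, frame_rate_hz, rows_per_frame):
    """Hypothetical per-row image shift caused by a lens vibrating as a pure tone."""
    rate = effective_sample_rate(frame_rate_hz, rows_per_frame)
    n_samples = int(duration_s * rate)
    return [math.sin(2 * math.pi * tone_hz * k / rate) for k in range(n_samples)]

def estimate_tone_hz(samples, sample_rate_hz):
    """Estimate a tone's frequency by counting sign changes (two per cycle)."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    duration_s = len(samples) / sample_rate_hz
    return crossings / (2 * duration_s)

FRAME_RATE_HZ = 30      # assumed video frame rate
ROWS_PER_FRAME = 1080   # assumed number of rows read one at a time
TONE_HZ = 500.0         # assumed vibration frequency, within the speech range

rate = effective_sample_rate(FRAME_RATE_HZ, ROWS_PER_FRAME)
shifts = simulate_row_shifts(TONE_HZ, 1.0, FRAME_RATE_HZ, ROWS_PER_FRAME)
estimate = estimate_tone_hz(shifts, rate)

print(f"frame-level sampling: {FRAME_RATE_HZ} samples per second")
print(f"row-level sampling:   {rate} samples per second")
print(f"gain over frame rate: {rate // FRAME_RATE_HZ}x")
print(f"estimated tone:       {estimate:.1f} Hz")
```

Under these assumptions, sampling once per frame could only represent vibrations up to 15 Hz, half the frame rate, while sampling once per row raises that ceiling to 16,200 Hz. The gain equals the number of rows per frame, which is consistent with the "over a thousand times" figure in the quote.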
For tech enthusiasts, Side Eye may be another major innovation, but it will surely raise security and privacy concerns. Fu acknowledges this but says the tool can also be used in investigations and as evidence in court.
“The most interesting application for Side Eye could be as a new form of digital evidence for lawyers and others working in the criminal legal system”, he said.
"Maybe there's an alibi and it's being admitted to court and somebody wants to prove somebody was or wasn't there," Fu added.
"You might be able to use this technique if you have an authenticated video with a known timestamp to confirm one way or the other. If you hear the person's voice, they're more than likely there", he said.
