In 2017, a startup called “Lyrebird” made headlines with extremely convincing AI-generated replications of celebrity voices.
Tracks posted to SoundCloud featured the voices of Donald Trump, Barack Obama, and Hillary Clinton making a pitch for Lyrebird’s new technology. In one track, a fake President Trump voice says, “They can make us say anything now.”
While the story gathered some attention initially, it quickly disappeared from the news cycle everywhere except one place: The Joe Rogan Experience podcast. Rogan was fascinated by the technology and spoke about it at length on his podcast in the weeks after the news broke.
In the two years since, Rogan has regularly told his guests about the technology, warning them that it’s only a matter of time before real, recognizable voices can be mimicked and manipulated to say specific text for specific, and potentially nefarious, purposes. The sky is likely the limit as the technology advances and produces better results with less data.
Oddly enough, Rogan was the first celebrity target for AI developers wanting to show off how far this technology has come in just two years. A video released this week features a replica of Rogan’s voice talking about training a hockey team made up of intelligent chimps, among other equally ridiculous and amusing rants.
“I just listened to an AI generated audio recording of me talking about chimp hockey teams and it’s terrifyingly accurate. At this point, I’ve long ago left enough content out there that they could basically have me saying anything they want, so my position is to shrug my shoulders and shake my head in awe, and just accept it. The future is gonna be really f***ing weird, kids,” Rogan said on Facebook this week.
Dessa, the AI startup responsible for the video, explained in a blog post that it will get easier and easier for the average person to make these types of replicas.
“Right now, technical expertise, ingenuity, computing power and data are required to make models like RealTalk perform well. So not just anyone can go out and do it. But in the next few years (or even sooner), we’ll see the technology advance to the point where only a few seconds of audio are needed to create a life-like replica of anyone’s voice on the planet,” the post read.
The replica of Rogan’s voice was produced using a text-to-speech deep learning system called RealTalk, which generates life-like speech using only text inputs, according to the developers.