Science speaks with experts about the unique threat posed by AI-generated audio and video.
The “fake news” of 2016 is so passé. This year, fake video and audio clips generated by artificial intelligence (AI) are the new looming threat that could sway voters in the upcoming U.S. presidential election.

Images created by Eliot Higgins using artificial intelligence show a fictitious confrontation between Donald Trump and New York City police officers, posted to Higgins’ Twitter account in March 2023. The highly detailed, sensational images, which are not real, were produced using a sophisticated and widely accessible image generator. (AP Photo/J. David Ake)
Last week, voters in New Hampshire’s presidential primary received a robocall in which President Joe Biden seemed to be discouraging them from participating in the election. “Voting this Tuesday only enables the Republicans in their quest to elect [former President] Donald Trump again,” the recording said.
Only the call wasn’t actually recorded by Biden—it was a deepfake, a form of generative AI in which algorithms collate clips of a person’s face or voice, learn from them, and impersonate the subject saying things they never did. Experts are alarmed at how convincing it seemed. “It has a lot of dangerous things combined in one: disinformation, AI-generated voices, impersonating the president, and essentially discouraging voting, which is an illegal activity,” says data scientist Matt Groh of Northwestern University.
Deepfakes are cheaper and easier to produce than ever, and we’re likely to see many more during the election season. Science talked with several experts about the dangers of AI-generated content, and why we struggle so much to recognize it.
Are deepfakes really any worse than old-fashioned fake news?
False narratives on social media have certainly caused problems for years. Researchers have found that people readily share fake news articles that support their beliefs, even if they know the stories are false. It doesn’t matter if the content is tagged as fake news: The more times we see fake content, the more likely we are to remember it as real. “Through repetition, content becomes ingrained in people’s heads,” says Steven Sloman, a cognitive psychologist at Brown University.
But those problems are heightened with images and video, which tend to stick in the mind in a way that text doesn’t. “When we see something happen, we just naturally believe it,” Sloman says. So deepfakes that look convincing, even if only at a quick glance, could be even more widely shared and believed.
There’s another danger, Sloman says: Deepfakes sow uncertainty. If people can’t tell real content from fake, they may become more likely to claim that real images that don’t support their views are deepfakes, regardless of what expert analyses or detection software says. One study found that alerting people that videos might be AI-generated didn’t make them any better at spotting them. Instead, it made them more likely to disbelieve everything they saw. In that sense, Sloman says, “Deepfakes certainly present more of a threat than any other kind of medium that’s existed.”
But that’s just one study—can’t most people spot deepfakes?
People certainly like to think so—and that’s part of the problem. Another study of 210 volunteers found that most were confident they could distinguish deepfake videos from real ones—but in reality, their guesses were not much more accurate than if they’d simply flipped a coin to decide.
We’re also pretty bad at recognizing audio deepfakes, like the one that went out to voters in New Hampshire. A study last year of more than 500 English and Mandarin speakers found that people only correctly identified speech deepfakes about 73% of the time—and thought that real audio was fake at almost the same rate.
It’s not surprising that we can be fooled, says computational neuroscientist Tijl Grootswagers of Western Sydney University. “In our lives, we never have to think about who is a real or a fake person,” he says. “It’s not a task we’ve been trained on.”
But our brains may be better at detecting deepfakes than we are. When Grootswagers had 22 volunteers look at a deepfake headshot, the image triggered an electrical signal from their brains’ visual cortex that differed from the signal when they looked at a real person. Yet these volunteers were still pretty bad at guessing which images were real. He’s not sure why—perhaps other brain regions interfere with the signals from the visual cortex before they reach our conscious perception, or perhaps these signals simply don’t register with us, because we’ve never really needed to use them before.
Even if that information doesn’t filter into our conscious awareness, we can often sense that something is “off,” especially if given multiple clues. In a recent preprint posted on arXiv, Groh tested how well more than 2200 online volunteers could determine whether political speeches by Biden and Trump, half of which were AI-generated, were real or fake. His team found that people struggle with this task if they only read the transcripts, but perform better when judging deepfake videos of the speeches, and better still when audio and subtitles are added.
Once more kinds of media and interactions are involved, “there’s all these opportunities for failure points to arise and it’s easy for people to spot the inconsistencies,” Groh says. That’s why people are worse at detecting simple AI-generated headshots than complex fake images, such as the one above showing Trump being arrested.
Are there any other tell-tale signs to look out for?
For a long time, AI-generated images tended to contain signature giveaways such as oddly shaped hands with extra fingers. Newer technologies have largely fixed these problems, but their output can still contain subtle inconsistencies: in the viral image of Pope Francis wearing a puffy coat, for instance, the pontiff’s glasses cast an unrealistic shadow.
People looking for malformed body parts are likely to miss the signs of the latest deepfakes, which tend to be too perfect. The algorithms produce “average” faces from their training sets and rarely include the unusual facial features seen in real life, which Grootswagers says may explain why people fall for them so easily. “People consider average faces more attractive and trustworthy,” he says. The same is true of audio deepfakes, Groh says: AI-generated speech, for example, doesn’t tend to contain the lip smacks, “ums” and “uhs,” or poor recording quality that characterize real speech.
Many companies are developing software to detect deepfakes, but it’s not always clear what aspects of an image the software detects—or whether it will keep up as AI technology advances. In a 2021 study by Groh’s group, computer vision software identified fake videos about as well as humans did, although humans and computers made different kinds of mistakes. Humans, but not computers, became worse at detecting the fakes when the face was flipped upside down or the eyes were covered, suggesting the machines were picking up on something other than peculiarities in the face.
Can anything be done to stop deepfakes from causing havoc?
Some governments—particularly their security services—have begun to look into the threat that generative AI poses to elections. The U.S. Federal Election Commission has proposed updating its rules on election campaign fraud to include deceptive AI. But they can’t do much to stop AI technology from advancing overall, and efforts to detect deepfakes will be a continual arms race, Grootswagers says. “I think the problem is going to get worse because algorithms are getting better.”
For the time being, Groh says, deepfake videos aren’t necessarily as easy to make as people think, although he agrees the technology could improve even between now and the November election. Creating truly convincing videos such as the viral Tom Cruise deepfake first released in 2021 still requires a lot of resources, including a look-alike actor and heavy editing.
Groh says that making people more aware of deepfakes’ existence and teaching them how to spot them is key. It’s especially important, he says, to educate diverse groups—younger people might be used to adding fun filters to their videos, but older people may not know how easy it is to, say, sync a person’s lips with a different audio clip.
Still, education can only do so much. Even if most people are aware of the dangers of deepfakes and how to spot them, Groh says, “are they going to be attentive to that when some kind of robocall is coming? And that’s an open question.”
doi: 10.1126/science.zw0xbf6