Seeing is Believing, Hearing is Revealing
An MIT and Northwestern study finds humans are more adept at spotting political deepfakes than previously thought.
As artificial intelligence technologies advance, concerns grow about the potential for hyper-realistic deepfake content — media edited or generated with AI — to mislead viewers, especially voters, and undermine democracy. But new research published on September 2nd, 2024, in Nature Communications finds that individuals between the ages of 18 and 65 are surprisingly adept at distinguishing real political speeches from AI-generated fakes. The research, conducted by scientists at MIT and Northwestern University, provides some reassurance about humans' ability to detect digital deception in an era of rapidly evolving AI.
"There's been a lot of concern that deepfakes would soon be indistinguishable from authentic videos," said Aruna Sankaranarayanan, lead graduate student on the project. "But our findings show that having sensory information — being able to both see and hear the content — actually helps people spot fakes more accurately than we previously thought."
The study involved experiments with over 40,000 participants who were asked to evaluate real and fabricated political speeches by prominent figures, such as former President Donald Trump and President Joe Biden. The speeches were presented in various formats, including text transcripts, audio clips, and videos.
Across multiple experiments, accuracy in distinguishing real from fake content increased significantly when participants had access to audio-visual information, rising from just 57% for text transcripts to 86% for videos with audio.
"We were surprised by how much the addition of audio improved people's ability to detect fakes," said Sankaranarayanan. "It seems that how something is said — the audio-visual cues — matters even more than what is said when it comes to judging authenticity."
Even when not explicitly asked to judge authenticity, participants were more likely to express suspicion about fabricated content when presented with video rather than just text or audio. This suggests that visual cues play an important role in triggering people's natural skepticism.
Sankaranarayanan noted that the findings may inform approaches to media literacy education and social media content moderation; providing full audio-visual content could be key in helping the public distinguish fact from fiction.
Interestingly, the study also found that deepfakes using audio generated by text-to-speech algorithms were harder for participants to detect than those using voice actors, demonstrating the rapid progress in AI voice synthesis technology. Nikhil Singh, a graduate student who worked on crafting the audio for the study, "put a lot of effort into making these as realistic as possible," Sankaranarayanan said, "accounting for factors like environmental acoustics and how voices change when moving closer to or further from a microphone."
The researchers also examined how the prevalence of fake content affected people's ability to detect it, finding no significant overall difference in accuracy between groups exposed to high versus low rates of deepfakes. However, participants in the high-exposure group, who were asked to rate their confidence in each judgment, were less sure of themselves when identifying fakes. This suggests a possible desensitization effect, in which heavy exposure to deepfakes erodes people's confidence in their own judgments.
Even though the experiments showed average identification rates of 80%, the remaining 20% of undetected fakes could still be enough to swing an election. Aya Schwartz, an undergraduate not involved in the study who researches misinformation and its consequences with Dr. Gillian Sinnott at Harvard University, pointed to widespread mistrust of the media: "A technology or policy that identifies deep fakes would allow people to feel more comfortable in determining if media is fact or fiction," she said. "People are able to make informed choices when they believe what they see, a vital part of democracy." Still, the power of deepfakes, particularly their ability to compel attention and elicit strong emotional responses, means that no single solution will be enough to address the problem.
The researchers behind the study acknowledged this limitation and emphasized that their work is just one piece of the puzzle in understanding and combating digital misinformation. "Our study focused on a specific context — political speeches by well-known figures," Sankaranarayanan explained. "The dynamics could be different for less prominent individuals or different types of media content."
The team is now exploring how factors such as political affiliation and media literacy affect people's ability to detect deepfakes. They're also investigating how exposure to fabricated content might influence political attitudes and voting behavior.
As AI continues to evolve, staying ahead of misuse remains a challenge. Harnessing our natural perceptual abilities, particularly our capacity to integrate visual and auditory information, could be a powerful tool in that effort. "The human brain is remarkably good at detecting subtle inconsistencies," Sankaranarayanan noted. "By understanding and leveraging these capabilities, we may be able to build more effective defenses against digital deception."
In an era where seeing is no longer always believing, it seems that seeing and hearing together might be our best bet for discerning truth from fiction in the digital age.