Don’t play it by ear: Audio deepfakes in a year of global elections

From robocalls to voice clones, generative AI is allowing malicious actors to spread misinformation with ease.

Deepfake audio programs can clone human speech with as little as a 15-second audio sample (Getty Images)

Artificial intelligence company OpenAI recently introduced Voice Engine, a natural-sounding speech generator that uses text and a 15-second audio sample to create an “emotive and realistic” imitation of the original speaker.

OpenAI has not yet released Voice Engine to the public, citing concerns over the potential abuse of its generative artificial intelligence (AI) – specifically to produce audio deepfakes – which could contribute to misinformation, especially during elections.

Audio deepfakes and their uses

Audio deepfakes are generated using deep learning techniques: AI models are trained on large datasets of audio samples to learn the characteristics of human speech and reproduce them convincingly. They can be generated in two ways: text-to-speech (typed text is converted into speech in the target voice) and speech-to-speech (an uploaded voice recording is resynthesised in the target voice).
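To illustrate how accessible this has become, the sketch below uses the open-source Coqui TTS library to clone a voice "zero-shot" from a short reference recording. The file names are placeholders, and XTTS v2 is just one of several freely available models with this capability.

```python
# A minimal sketch of text-to-speech voice cloning with the open-source
# Coqui TTS library (pip install TTS). File names here are placeholders.
from TTS.api import TTS

# Load a multilingual model that supports zero-shot voice cloning.
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

# Generate new speech in the voice captured by a short reference clip.
tts.tts_to_file(
    text="This sentence was never spoken by the original speaker.",
    speaker_wav="reference_clip.wav",  # placeholder: sample of the target voice
    language="en",
    file_path="cloned_output.wav",
)
```

The speech-to-speech route works in much the same way, except the input is a recording rather than typed text, which the model resynthesises in the target voice.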

Audio deepfakes have been used in cyber-enabled financial scams in which fraudsters impersonate bank customers to authorise transactions. The same technology is increasingly being used to propagate disinformation. Several audio deepfakes mimicking the voices of politicians have circulated on social media. In 2023, artificially generated audio clips purported to capture UK Labour leader Keir Starmer berating party staffers. Although fact-checkers determined the audio was fake, it surpassed 1.5 million hits on X (formerly Twitter).

In India, voice cloning of children has been used to deceive parents into transferring money. In Singapore, deepfake videos containing voice clones of politicians such as the prime minister and deputy prime minister have been used in cyber-scams.

Commercialisation boom

Anyone can generate an audio deepfake. They are easier and cheaper to make than video deepfakes and simpler to disseminate on social media and messaging platforms.

With advancements in technology, only one or two minutes of audio are needed to generate a convincing deepfake recording. More professional voice clones require payment, but the sums are not prohibitive. OpenAI's Voice Engine has cut the sample required for a realistic recording to just 15 seconds.

The commercialisation of audio deepfake technology has boomed in recent years. Companies such as ElevenLabs offer services to create synthetic copies of voices, generate speech in 29 languages, and match accents of one’s choice.

There has been an uptick in political deepfakes targeting electoral processes in recent years, with the aim of sowing discord and confusion. Audio deepfakes are being deployed in the lead-up to India's elections. AI-generated audio of US President Joe Biden was used in a robocall targeting registered Democrats in New Hampshire ahead of the Democratic Primary in January 2024. While the robocall was quickly identified as a fake, it increased public awareness of the risks of AI-enabled voice cloning.

An AI-enabled audio of US President Joe Biden was used in a robocall targeted at registered Democrat residents in New Hampshire ahead of the Democratic Primary in January 2024 (Adam Schultz/The White House/Flickr)

Days before Slovakia's parliamentary elections, audio deepfake recordings purporting to capture a leading politician and a journalist discussing vote-rigging went viral. The clips divided public opinion even after fact-checkers verified that the recordings were fabricated and the conversations never happened. Experts believe the recordings may have influenced the election outcome.

In the United States, state election officials are concerned about being targeted for voice cloning, which could be used to maliciously announce false election results.

Audio deepfakes can also be weaponised to sow discord and incite violence. An audio deepfake of the Mayor of London, Sadiq Khan, in which he appeared to disparage Remembrance weekend and call for pro-Palestinian marches in London, was crafted to resemble a secret audio recording. It went viral on social media and drew hateful comments directed at Khan.

These uses of generative AI have real potential to influence public opinion and turbo-charge disinformation, allowing it to spread rapidly on social media and messaging platforms.

Solutions in the works

Audio deepfakes contain fewer overt signs of manipulation than deepfake videos or images, and are not easily detected without technical expertise.

Computer security software company McAfee recently announced Project Mockingbird, which aims to detect and expose altered audio in videos. Companies providing AI voice-generation services have also taken measures to ensure their systems can identify altered audio. Watermarking the audio such services generate could go some way towards enabling proactive monitoring of audio deepfakes and how they are used.
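As a rough illustration of the principle (a toy sketch, not McAfee's or any vendor's actual scheme), a generator could mix a key-seeded, low-amplitude pseudorandom pattern into its output, and a detector could later test for that pattern by correlation:

```python
# A toy illustration of audio watermarking via correlation detection.
# Conceptual sketch only: production watermarks must also survive
# compression, editing and re-recording.
import numpy as np

def embed_watermark(audio: np.ndarray, key: int, strength: float = 0.01) -> np.ndarray:
    """Mix a low-amplitude pseudorandom pattern, seeded by a secret key, into the audio."""
    rng = np.random.default_rng(key)
    pattern = rng.standard_normal(audio.shape[0])
    return audio + strength * pattern

def detect_watermark(audio: np.ndarray, key: int, threshold: float = 5.0) -> bool:
    """Correlate the audio with the key's pattern; a high score implies the mark is present."""
    rng = np.random.default_rng(key)
    pattern = rng.standard_normal(audio.shape[0])
    score = np.dot(audio, pattern) / np.linalg.norm(audio)
    return score > threshold

# Demo on one second of synthetic "audio" at a 16 kHz sample rate.
signal = 0.1 * np.random.default_rng(0).standard_normal(16_000)
marked = embed_watermark(signal, key=42)
print(detect_watermark(marked, key=42))   # True: the generator's mark is found
print(detect_watermark(signal, key=42))   # False: unmarked audio shows no correlation
```

Anyone holding the key can later check whether a circulating clip came from the watermarked generator, which is what makes the approach attractive for proactive monitoring.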

On the legislative front, calls for action have grown globally. The United States has moved to regulate audio deepfakes in elections, for instance by banning the use of AI-generated voices in robocalls.

Emphasis should be placed on responding quickly to refute misinformation and disinformation propagated by audio deepfakes. Giving journalists and fact-checkers more resources to tap their collective subject expertise would go a long way towards demystifying deepfakes and exposing the use of AI by malicious actors to generate misleading content.



