AI-generated speech is a type of synthetic media posing a serious and immediate risk, one increasingly exploited in crime. This article introduces ongoing efforts to build public resilience against AI-mediated crime.

Recent technological developments mean that AI-generated speech has skyrocketed in quality, to the point where it is in some cases virtually indistinguishable from genuine human speech. Even more concerningly, speech can be specially engineered to sound like a specific speaker, a phenomenon known as voice spoofing. Spoofs (also known as voice clones or auditory deepfakes) can fool automatic speaker verification (ASV) systems used in secure settings such as banking and government, potentially granting fraudsters access.
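
To make the ASV risk concrete, the sketch below shows the embedding-comparison logic at the heart of many verification systems. Everything here is an illustrative assumption rather than any real system's design: the 192-dimensional random vectors stand in for embeddings from a neural speaker encoder, and the 0.75 threshold is a placeholder. A caller is accepted when their voice embedding sits close enough to the one stored at enrolment, so a clone that lands inside that radius is accepted too.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two speaker embeddings."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify_speaker(enrolled: np.ndarray, test: np.ndarray,
                   threshold: float = 0.75) -> bool:
    """Accept the caller if their voice embedding is close enough to the
    one stored at enrolment. The threshold here is an illustrative guess."""
    return cosine_similarity(enrolled, test) >= threshold

# Toy embeddings. In a real ASV system these would come from a neural
# speaker encoder applied to audio; here they are random vectors.
rng = np.random.default_rng(0)
enrolled = rng.normal(size=192)                       # stored at sign-up
genuine = enrolled + rng.normal(scale=0.3, size=192)  # same speaker, new call
clone = enrolled + rng.normal(scale=0.4, size=192)    # a high-quality spoof

print(verify_speaker(enrolled, genuine))  # True: legitimate caller accepted
print(verify_speaker(enrolled, clone))    # True: the spoof is accepted too
```

The point is not the exact numbers but the mechanism: verification reduces to a similarity threshold, so a spoof only has to be close to the genuine voice, not identical to it.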

Human listeners are similarly at risk. To best illustrate this, we turn now to the ‘3 Ds of Deception’:

Disinform – in which spoofs are used to instigate and disseminate knowingly false information. In successful cases, listeners believe and further spread the information, because to all appearances, the source is an influential, ostensibly trustworthy figure. This D is most commonly seen in politics, where cybercriminals impersonate people in power to suppress voters or manipulate elections. At their worst, spoofs of this nature have the potential to threaten democracy itself.

Defame – unlike the other two Ds, the main victims here are those being spoofed, not the listeners being deceived. Defamatory voice spoofs create fake audio of the victim saying something offensive, inflammatory, or illegal, triggering social and sometimes financial backlash. While high-profile cases exist, the risk to ordinary people can be greater. A celebrity might weather the backlash; a working-class shop assistant might not. A deepfake of the latter making a racist remark would not spark the same level of international outrage as, say, one of Billie Eilish would. However, if both lost their jobs as a result, one would lose their only income, while the other would still be worth $50 million.

Defraud – voice spoofs can be used to impersonate someone with a relationship to a listener, with the aim of extracting money from that listener. AI-mediated fraud is a new type of confidence scam, exploiting pre-existing relationships and trust, rather than building trust over time. Fraud can involve false celebrity endorsements or phone scams, in which an imposter poses as a colleague, friend, or family member in financial need. This D has already been shown to be hugely damaging, with millions of pounds stolen this way in recent years (see Stupp, 2019; Brewster, 2021; Bunn, 2023; Verma, 2023).

 


Presently, attempting to protect human listeners from AI-mediated deception is difficult, since the conditions dictating accurate discrimination between genuine and synthetic speech are little understood. There is, however, one factor that may prove invaluable to increasing this understanding: familiarity. As all 3 Ds above demonstrate, spoofed voices belong to celebrities, colleagues, family members – people who are known to the listener. Listeners will have a developed knowledge base of the spoofed speaker’s genuine voice. Surely, then, this familiarity will make listeners better at recognising when the speaker is being impersonated?

Our preliminary findings on the influence of familiarity on deepfake-voice detection suggest that this is not the case; familiarity may in fact degrade performance. In a study of 120 participants, accuracy in identifying AI-generated samples dropped to 63% when the perceived speaker was a very well-known celebrity (Donald Trump), compared with around 73% when the speaker had a more limited fanbase (Linus Sebastian).
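
For readers curious how per-condition figures like these are derived, the minimal sketch below tallies accuracy from trial-level responses. The records shown are hypothetical placeholders for illustration only, not the study’s data.

```python
from collections import defaultdict

# Hypothetical listener judgements: which perceived speaker the AI-generated
# clip imitated, and whether the listener correctly flagged it as synthetic.
trials = [
    {"speaker": "very_well_known", "correct": False},
    {"speaker": "very_well_known", "correct": True},
    {"speaker": "less_well_known", "correct": True},
    # ... one record per judgement, across all participants ...
]

tally = defaultdict(lambda: [0, 0])  # speaker -> [n_correct, n_trials]
for trial in trials:
    tally[trial["speaker"]][0] += trial["correct"]
    tally[trial["speaker"]][1] += 1

for speaker, (n_correct, n) in tally.items():
    print(f"{speaker}: {n_correct / n:.0%} correct over {n} trials")
```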

One possible explanation for this is that familiarity with someone’s voice isn’t limited to just knowing how they sound. The voice activates a wealth of rich information about that person: knowledge of their experiences, personality, values, and relationships. In the study above, many listeners left comments indicating strong negative feelings towards Donald Trump. One listener described the speaker they were listening to as an ‘idiot’, and therefore decided it must genuinely be Trump, rather than a spoof. In short, they used a highly subjective value judgement to discriminate between human and AI-generated speech, rather than the voice alone.

This suggests that a listener’s personal experience with the speaker might overshadow objectivity in decision-making. Compare this to a stranger’s voice – the listener has had no past contact with them, and so they can assess the voice impartially.

Future work in this research series will move beyond familiar-celebrity spoofs to spoof detection where listener and perceived speaker share a familiar-intimate relationship (e.g. family, friends). From the insights gained – particularly why listeners make their decisions – we can explore the cues that are being registered, and more crucially those that are being missed. Spoofing is not an isolated threat; people and organisations at all levels of society are potential victims. Understanding how and why people are deceived is essential if we wish to develop our resilience against such attacks.

 


 


Hope McVean is a PhD student at Lancaster University, working primarily in FACTOR (Forensic Linguistics, Cybersecurity and Technology Research).

Read more

Brewster, T. (2021, October 14). Fraudsters Cloned Company Director’s Voice In $35 Million Heist, Police Find. Forbes. https://www.forbes.com/sites/thomasbrewster/2021/10/14/huge-bank-fraud-uses-deep-fake-voice-tech-to-steal-millions/

 

Bristow, T. (2023, October 9). Keir Starmer suffers UK politics’ first deepfake moment. It won’t be the last. Politico. https://www.politico.eu/article/uk-keir-starmer-labour-party-deepfake-ai-politics-elections

 

Bunn, A. (2023, May 15). Artificial Imposters—Cybercriminals Turn to AI Voice Cloning for a New Breed of Scam. McAfee. https://www.mcafee.com/blogs/privacy-identity-protection/artificial-imposters-cybercriminals-turn-to-ai-voice-cloning-for-a-new-breed-of-scam/

 

Khan, A., Malik, K. M., Ryan, J., & Saravanan, M. (2023). Battling voice spoofing: A review, comparative analysis, and generalizability evaluation of state-of-the-art voice spoofing countermeasures. Artificial Intelligence Review, 56(1), 513–566. https://doi.org/10.1007/s10462-023-10539-8

 

McVean, H. (2024). Synthetic celebs and deepfake deception: Discrimination between human and AI-generated speech of known vs unknown speakers (Master’s dissertation, Lancaster University). Lancaster University Repository.

 

Meaker, M. (2023, October 3). Slovakia’s Election Deepfakes Show AI Is a Danger to Democracy. Wired. https://www.wired.com/story/slovakias-election-deepfakes-show-ai-is-a-danger-to-democracy

 

Mlot, S. (2023, January 31). People Are Still Terrible: AI Voice-Cloning Tool Misused for Deepfake Celeb Clips. PCMag. https://uk.pcmag.com/news/145199/people-are-still-terrible-ai-voice-cloning-tool-misused-for-deepfake-celeb-clips

 

Rose, R., & Cohen, M. (2024, May 23). Political consultant behind fake Biden AI robocall faces charges in New Hampshire. CNN.

 

Stupp, C. (2019, August 30). Fraudsters Used AI to Mimic CEO’s Voice in Unusual Cybercrime Case. Wall Street Journal. https://www.wsj.com/articles/fraudsters-use-ai-to-mimic-ceos-voice-in-unusual-cybercrime-case-11567157402

 

Verma, P. (2023, March 5). They thought loved ones were calling for help. It was an AI scam. The Washington Post. https://www.washingtonpost.com/technology/2023/03/05/ai-voice-scam