Talks & Media Coverage


  • Invited Talks & Tutorials

Acoustical Society of Japan Kyushu Branch 1st Online Seminar

[Invited Lecture] Countermeasures against exploitation of voice generation AI and voice cloning: From deepfake detection to active defense

  • #DeepfakeDetection
  • #Audio processing
  • #Generative model

Speaker: Junichi Yamagishi
Conference name: 1st Online Seminar of the Kyushu Branch of the Acoustical Society of Japan
Organizer: Kyushu Branch of the Acoustical Society of Japan
Venue: Online
Date: July 2, 2025
URL:

Recent voice generation models, particularly voice cloning technology that reproduces a speaker's identity, have brought new value to entertainment and other fields. If misused, however, their high fidelity can compromise voice-based authentication and other systems. In this presentation, we will introduce our efforts and research results on defensive models against such deepfake impersonation attacks. First, we will introduce a large-scale audio database for training deepfake audio detection models, together with evaluation data for detecting deepfake audio over telephone channels and in compressed form. We will then present findings from an analysis of 50 detection models built on this database. Next, we will introduce a method for detecting deepfakes produced by unknown generation methods, reflecting the fact that media generation technology is constantly evolving and new methods continue to appear.
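For readers unfamiliar with such countermeasure models, the sketch below shows one minimal way a detector can score an utterance as bona fide or spoofed: a log-mel-spectrogram front end feeding a small CNN classifier. This is only an illustrative assumption about how such a detector might be structured, not one of the models analyzed in the talk; the architecture and all hyperparameters are placeholders, and the model is untrained.

import torch
import torch.nn as nn
import torchaudio

class ToySpoofDetector(nn.Module):
    """Toy bona fide vs. spoof classifier: log-mel front end + small CNN (placeholder design)."""
    def __init__(self, n_mels: int = 80):
        super().__init__()
        self.melspec = torchaudio.transforms.MelSpectrogram(
            sample_rate=16000, n_fft=512, hop_length=160, n_mels=n_mels)
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))                      # global pooling over frequency and time
        self.classifier = nn.Linear(32, 2)                # logits: [bona fide, spoof]

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        feats = torch.log(self.melspec(waveform) + 1e-6)  # (batch, n_mels, frames)
        return self.classifier(self.encoder(feats.unsqueeze(1)).flatten(1))

if __name__ == "__main__":
    detector = ToySpoofDetector().eval()                  # untrained, for illustration only
    dummy_wav = torch.randn(1, 16000)                     # one second of dummy 16 kHz audio
    with torch.no_grad():
        spoof_prob = torch.softmax(detector(dummy_wav), dim=-1)[0, 1]
    print(f"spoof probability: {spoof_prob.item():.3f}")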

In the second half of the presentation, we will discuss not only passive defense, which detects deepfakes after the audio has been generated, but also active defense, which builds countermeasures into the early stages of audio generation and publication so as to create an environment in which misuse is difficult. First, we will introduce the usefulness and limitations of neural watermarking, which modifies the model weights of voice generation AI so that watermarks are automatically embedded in its audio output. Finally, we will introduce speaker anonymization, which anonymizes only the speaker-related features of a recording before it is published on social media and elsewhere, reducing the risk that deepfakes can be created from it and protecting the privacy of our voices.
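As rough intuition for watermarking, the self-contained sketch below embeds and detects a key-dependent additive spread-spectrum signature in a waveform using NumPy. It is only a toy illustration of the general idea of embedding and later verifying a hidden mark; the neural watermarking discussed in the talk, where the mark is embedded through the generation model's weights, is considerably more involved, and the key, strength, and interpretation of the score here are arbitrary assumptions.

import numpy as np

def embed_watermark(audio: np.ndarray, key: int, strength: float = 0.005) -> np.ndarray:
    """Add a key-dependent pseudo-random carrier to the waveform (toy embedding)."""
    carrier = np.random.default_rng(key).standard_normal(audio.shape)
    return audio + strength * carrier

def watermark_score(audio: np.ndarray, key: int) -> float:
    """Correlate with the same carrier; a score near `strength` suggests the mark is present."""
    carrier = np.random.default_rng(key).standard_normal(audio.shape)
    return float(np.dot(audio, carrier) / audio.size)

if __name__ == "__main__":
    clean = 0.1 * np.random.default_rng(0).standard_normal(16000)  # one second of dummy audio
    marked = embed_watermark(clean, key=1234)
    print("score (marked):  ", watermark_score(marked, key=1234))   # roughly 0.005
    print("score (unmarked):", watermark_score(clean, key=1234))    # roughly 0.000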