Talks & Media Coverage

  • Invited Talks & Tutorials

Information Processing Society of Japan Seminar Series 2024

[Seminar Talk] Recent Trends in Speech Generative AI and Measures Against Misuse

  • #Generative model
  • #DeepfakeDetection
  • #Audio processing

Speaker: Junichi Yamagishi
Conference name: Information Processing Society of Japan Seminar Series 2024, "New Horizons of Information Technology: Social Change Driven by AI and Quantum", 12th session: "The Future of Symbiotic Interaction in an Intelligent Information Environment"
Organizer: Information Processing Society of Japan
Location: Online
Date: December 20, 2024
URL: https://www.ipsj.or.jp/event/seminar/2024/program12.html

This presentation will first introduce the latest trends in AI speech generation models, which have evolved dramatically. We will cover technologies such as voice cloning, which reproduces a speaker's characteristics, and speech tokenizers, which enable speech input and output in large language models (LLMs), and will also review current research on spoken interaction.

We will then discuss countermeasures against the misuse of speech generation AI. First, we will touch on deepfake detection technology, which distinguishes synthetic speech produced by generative models from human speech, and introduce a large-scale speech database for training and evaluating detection models, along with an analysis of how multiple detection models perform under adverse conditions. Finally, we will introduce neural watermarking, which modifies the model weights of a speech generation model so that watermarks are automatically embedded in its audio output, and discuss its usefulness and limitations.