Talks & Media Coverage
- Invited Talks & Tutorials
11th Workshop on Speech, Acoustics and Signal Processing (SPEASIP)
[Invited Talk] Latest Trends in Speech Generative AI and Misuse Countermeasures
- #Generative model
- #DeepfakeDetection
- #Audio processing
Speaker: Junichi Yamagishi
Conference name: 11th Workshop on Speech, Acoustics and Signal Processing (SPEASIP)
Organizers: IEICE Speech Committee (SP), Applied Acoustics Committee (EA), Acoustical Society of Japan Electroacoustics Committee (EA), IEICE Signal Processing Committee (SIP), APSIPA Japan Chapter (APSIPA JC), Information Processing Society of Japan Spoken Language Processing Committee (SLP)
Location: Okinawa
Date: March 4, 2025
URL: https://www.ipsj.or.jp/event/seminar/2024/program12.html
This presentation first introduces the latest trends in speech generation AI models, which have evolved dramatically. It covers technologies such as voice cloning, which reproduces a speaker's characteristics, and speech tokenizers, which allow large language models (LLMs) to use speech as input and output, and then discusses the latest trends in speech interaction research. The second part addresses countermeasures against the misuse of speech generation AI. It begins with deepfake detection technology, which distinguishes speech produced by generative models from human speech. It then introduces a large-scale speech database for training and evaluating detection models, along with the results of analyzing multiple detection models under adverse conditions. Finally, it introduces neural watermarking, which processes the model weights of a speech generation model so that watermarks are automatically embedded in its speech output, and discusses the usefulness and limitations of this approach.