Survey Talks - Interspeech 2024

September 2

Session A1-O1: L2 Speech, Bilingualism and Code-Switching - Title: Second language (L2) Speech and Perception

Date: September 2

Session A1-O1: L2 Speech, Bilingualism and Code-Switching

Time: 11:30 – 12:10

Location: Panacea Amphitheater

Title: Second language (L2) Speech and Perception

Abstract: 6102

Author/Presenter: Ann Bradlow (Northwestern University)

Session A3-O1: Paralinguistics - Title: Paralinguistics: Past and Present

Date: September 2

Session A3-O1: Paralinguistics

Time: 15:00 – 15:40

Location: Aegle B

Title: Paralinguistics: Past and Present

Abstract: 6302

Author/Presenter: Anton Batliner (University of Munich)

Session A7-O2: Zero-shot TTS - Title: Zero-shot text-to-speech

Date: September 2

Session A7-O2: Zero-shot TTS

Time: 15:00 – 15:40

Location: Aegle A

Title: Zero-shot text-to-speech

Abstract: 6702

Author/Presenter: Zhizheng Wu (Chinese University of Hong Kong, Shenzhen)

September 3

Session A8-O3: LLM in ASR - Title: Development of Spoken Language Models

Date: September 3

Session A8-O3: LLM in ASR

Time: 13:30 – 14:10

Location: Iasso

Title: Development of Spoken Language Models

Abstract: 6802

Author/Presenter: Hung-yi Lee (National Taiwan University)

Session A9-O2: Vision and Speech - Title: Visually Grounded Speech Models

Date: September 3

Session A9-O2: Vision and Speech

Time: 13:30 – 14:10

Location: Aegle B

Title: Visually Grounded Speech Models

Abstract: 6901

Author/Presenter: David Harwath (The University of Texas at Austin)

September 4

Session A3-O3: Speech Emotion Recognition - Title: Automatic Emotion Detection/Recognition

Date: September 4

Session A3-O3: Speech Emotion Recognition

Time: 10:00 – 10:40

Location: Iasso

Title: Automatic Emotion Detection/Recognition

Abstract: 6301

Author/Presenter: Carlos Busso (University of Texas at Dallas)

Session A5-O5: Speech Quality Assessment - Title: Neural Speech Assessment Metrics

Date: September 4

Session A5-O5: Speech Quality Assessment

Time: 10:00 – 10:40

Location: Aegle B

Title: Neural Speech Assessment Metrics

Abstract: 6503

Author/Presenter: Yu Tsao (Academia Sinica)

Session A9-O3: Novel Architectures for ASR - Title: Novel Architecture for ASR

Date: September 4

Session A9-O3: Novel Architectures for ASR

Time: 10:00 – 10:40

Location: Hippocrates

Title: Novel Architecture for ASR

Abstract: 6902

Author/Presenter: Rohit Prabhavalkar (Google)

Session A5-O6: Speech and Audio Modelling - Title: Toward Speech and Audio Foundation Models

Date: September 4

Session A5-O6: Speech and Audio Modelling

Time: 13:30 – 14:10

Location: Melambus

Title: Toward Speech and Audio Foundation Models

Abstract: 6501

Author/Presenter: Shinji Watanabe (Carnegie Mellon University)

September 5

Session A7-O7: Privacy and Security in Speech Communication 2 - Title: Fake Voice and Voice Privacy

Date: September 5

Session A7-O7: Privacy and Security in Speech Communication 2

Time: 10:00 – 10:40

Location: Aegle A

Title: Fake Voice and Voice Privacy

Abstract: 6701

Author/Presenter: Junichi Yamagishi (National Institute of Informatics); Xin Wang (National Institute of Informatics)

Session A4-O7: Self and Weakly-Labelled Speaker Verification - Title: Self-supervised Based Speaker Recognition

Date: September 5

Session A4-O7: Self and Weakly-Labelled Speaker Verification

Time: 13:30 – 14:10

Location: Iasso

Title: Self-supervised Based Speaker Recognition

Abstract: 6404

Author/Presenter: Themos Stafylakis (Omilia – Conversational Intelligence)

Session A6-O7: Deep Learning-Based Speech Enhancement: Approaches, Scalability, and Evaluation - Title: Deep Learning Based Speech Enhancement

Date: September 5

Session A6-O7: Deep Learning-Based Speech Enhancement: Approaches, Scalability, and Evaluation

Time: 13:30 – 14:10

Location: Melambus

Title: Deep Learning Based Speech Enhancement

Abstract: 6601

Author/Presenter: Timo Gerkmann (Universität Hamburg)

Session A8-O8: Self-Supervised Learning for ASR - Title: Multilingual Speech Representations

Date: September 5

Session A8-O8: Self-Supervised Learning for ASR

Time: 13:30 – 14:10

Location: Aegle B

Title: Multilingual Speech Representations

Abstract: 6801

Author/Presenter: Bhuvana Ramabhadran (Google)