Papers shortlisted for the ISCA Best Student Paper Award 2024
1. Zhiqi Ai, Zhiyong Chen & Shugong Xu
MM-KWS: Multi-modal Prompts for Multilingual User-defined Keyword Spotting
PaperID-10, ASR and LLMs, A10-O2
2. Rui Cao, Tianrui Wang, Meng Ge, Andong Li, Longbiao Wang, Jianwu Dang & Yungang Jia
VoiCor: A Residual Iterative Voice Correction Framework for Monaural Speech Enhancement
PaperID-53, Deep Learning-Based Speech Enhancement: Approaches, Scalability, and Evaluation, A6-O7
3. Dong Yang, Tomoki Koriyama & Yuki Saito
Frame-Wise Breath Detection with Self-Training: An Exploration of Enhancing Breath Naturalness in Text-to-Speech
PaperID-168, Speech Synthesis: Other Topics 2, A7-P5-B
4. Haochen Wu, Wu Guo, Zhentao Zhang, Wenting Zhao, Shengyu Peng & Jie Zhang
Spoofing Speech Detection by Modeling Local Spectro-Temporal and Long-term Dependency
PaperID-251, Speaker Recognition: Adversarial and Spoofing Attacks, A4-P1
5. Helin Wang, Jesus Antonio Villalba, Laureano Moro-Velazquez, Jiarui Hai, Thomas Thebaud & Najim Dehak
Noise-robust Speech Separation with Fast Generative Correction
PaperID-327, Source Separation 1, A5-O4
6. Yifei Xin, Xuxin Cheng, Zhihong Zhu, Xusheng Yang & Yuexian Zou
DiffATR: Diffusion-based Generative Modeling for Audio-Text Retrieval
PaperID-405, Audio-Text Retrieval, A5-O3
7. Jie Chi, Electra Wallington & Peter Bell
Characterizing code-switching: Applying Linguistic Principles for Metric Assessment and Development
PaperID-551, L2 Speech, Bilingualism and Code-Switching, A1-O1
8. Junseok Ahn, Youkyum Kim, Yeunju Choi, Doyeop Kwak, Ji-Hoon Kim, Seongkyu Mun & Joon Son Chung
VoxSim: A perceptual voice similarity dataset
PaperID-646, Databases and Progress in Methodology, A1-P1-A
9. Beomseok Lee, Ioan Calapodescu, Marco Gaido, Matteo Negri & Laurent Besacier
Speech-MASSIVE: A Multilingual Speech Dataset for SLU and Beyond
PaperID-957, Spoken Language Understanding, A11-O1
10. Yin-Long Liu, Rui Feng, Jiahong Yuan & Zhen-Hua Ling
Clever Hans Effect Found in Automatic Detection of Alzheimer’s Disease through Speech
PaperID-1018, Pathological Speech Analysis 3, A13-P2-A
11. Alkis Koudounas, Flavio Giobergia, Eliana Pastor & Elena Baralis
A Contrastive Learning Approach to Mitigate Bias in Speech Models
PaperID-1219, Spoken Language Understanding, A11-O1
12. Dongchao Yang, Dingdong Wang, Haohan Guo, Xueyuan Chen, Xixin Wu & Helen Meng
SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models
PaperID-1392, Speech Synthesis: Paradigms and Methods 3, A7-P4-B
13. Fabian Alejandro Ritter-Gutierrez, Kuan-Po Huang, Jeremy H. M. Wong, Dianwen Ng, Hung-yi Lee & Eng Siong Chng
Dataset-Distillation Generative Model for Speech Emotion Recognition
PaperID-1430, Speech Emotion Recognition, A3-O3
14. Yuke Lin, Ming Cheng, Fulin Zhang, Yingying Gao, Shilei Zhang & Ming Li
VoxBlink2: A 100K+ Speaker Recognition Corpus and the Open-Set Speaker-Identification Benchmark
PaperID-1490, Speaker recognition evaluation and resources, A4-O6
15. Dail Kim, Da-Hee Yang, Donghyun Kim, Joon-Hyuk Chang, Jeonghwan Choi, Moa Lee, Jaemo Yang & Han-gil Moon
Guided conditioning with predictive network on score-based diffusion model for speech enhancement
PaperID-1545, Generative Speech Enhancement, A6-O2
16. Jiajun He & Tomoki Toda
2DP-2MRC: 2-Dimensional Pointer-based Machine Reading Comprehension Method for Multimodal Moment Retrieval
PaperID-1633, Spoken Term Detection and Speech Retrieval, A12-O3
17. Ziping Zhao, Tian Gao, Haishuai Wang & Bjoern Schuller
MFDR: Multiple-stage Fusion and Dynamically Refined Network for multimodal emotion recognition
PaperID-1735, New Avenues in Emotion Recognition, A3-O4
18. Kevin Y Huang, Jack Goldberg, Louis Goldstein & Shrikanth Narayanan
Analysis of articulatory setting for L1 and L2 English speakers using MRI data
PaperID-2175, Phonetics and Phonology of Second Language Acquisition, A2-O2
19. Tejes Srivastava, Jiatong Shi, William Chen & Shinji Watanabe
EFFUSE: Efficient Self-Supervised Feature Fusion for E2E ASR in Low Resource and Multilingual Scenarios
PaperID-2199, Cross-Lingual and Multilingual Processing, A9-P2
20. Irene B. R. Smith, Morgan Sonderegger & The Spade Consortium
Modelled Multivariate Overlap: A method for measuring vowel merger
PaperID-2260, Individual and Social Factors in Phonetics, A2-O1
21. Vrushank Changawala & Frank Rudzicz
Whister: Using Whisper’s representations for Stuttering detection
PaperID-2293, Speech Disorders 2, A13-P1-B
22. Tian-Hao Zhang, Xinyuan Qian, Feng Chen & Xu-Cheng Yin
Transmitted and Aggregated Self-Attention for Automatic Speech Recognition
PaperID-2374, Neural Network Architectures for ASR 2, A8-P1
23. Haechan Kim, Junho Myung, Seoyoung Kim, Sungpah Lee, Dongyeop Kang & Juho Kim
LearnerVoice: A Dataset of Non-Native English Learners’ Spontaneous Speech
PaperID-2392, Accented Speech, Prosodic Features, Dialect, Emotion, Sound Classification, A8-P3
24. Ashish Mittal, Darshan Deepak Prabhu, Sunita Sarawagi & Preethi Jyothi
SALSA: Speedy ASR-LLM Synchronous Aggregation
PaperID-2499, Error Correction and Rescoring, A9-O4