ISCA Best Student Paper Award 2024

Papers shortlisted for the ISCA Best Student Paper Award 2024

1. Zhiqi Ai, Zhiyong Chen & Shugong Xu
MM-KWS: Multi-modal Prompts for Multilingual User-defined Keyword Spotting
PaperID-10, ASR and LLMs, A10-O2


2. Rui Cao, Tianrui Wang, Meng Ge, Andong Li, Longbiao Wang, Jianwu Dang & Yungang Jia
VoiCor: A Residual Iterative Voice Correction Framework for Monaural Speech Enhancement
PaperID-53, Deep Learning-Based Speech Enhancement: Approaches, Scalability, and Evaluation, A6-O7


3. Dong Yang, Tomoki Koriyama & Yuki Saito
Frame-Wise Breath Detection with Self-Training: An Exploration of Enhancing Breath Naturalness in Text-to-Speech
PaperID-168, Speech Synthesis: Other Topics 2, A7-P5-B


4. Haochen Wu, Wu Guo, Zhentao Zhang, Wenting Zhao, Shengyu Peng & Jie Zhang
Spoofing Speech Detection by Modeling Local Spectro-Temporal and Long-term Dependency
PaperID-251, Speaker Recognition: Adversarial and Spoofing Attacks, A4-P1


5. Helin Wang, Jesus Antonio Villalba, Laureano Moro-Velazquez, Jiarui Hai, Thomas Thebaud & Najim Dehak
Noise-robust Speech Separation with Fast Generative Correction
PaperID-327, Source Separation 1, A5-O4


6. Yifei Xin, Xuxin Cheng, Zhihong Zhu, Xusheng Yang & Yuexian Zou
DiffATR: Diffusion-based Generative Modeling for Audio-Text Retrieval
PaperID-405, Audio-Text Retrieval, A5-O3


7. Jie Chi, Electra Wallington & Peter Bell
Characterizing code-switching: Applying Linguistic Principles for Metric Assessment and Development
PaperID-551, L2 Speech, Bilingualism and Code-Switching, A1-O1


8. Junseok Ahn, Youkyum Kim, Yeunju Choi, Doyeop Kwak, Ji-Hoon Kim, Seongkyu Mun & Joon Son Chung
VoxSim: A perceptual voice similarity dataset
PaperID-646, Databases and Progress in Methodology, A1-P1-A


9. Beomseok Lee, Ioan Calapodescu, Marco Gaido, Matteo Negri & Laurent Besacier
Speech-MASSIVE: A Multilingual Speech Dataset for SLU and Beyond
PaperID-957, Spoken Language Understanding, A11-O1


10. Yin-Long Liu, Rui Feng, Jiahong Yuan & Zhen-Hua Ling
Clever Hans Effect Found in Automatic Detection of Alzheimer’s Disease through Speech
PaperID-1018, Pathological Speech Analysis 3, A13-P2-A


11. Alkis Koudounas, Flavio Giobergia, Eliana Pastor & Elena Baralis
A Contrastive Learning Approach to Mitigate Bias in Speech Models
PaperID-1219, Spoken Language Understanding, A11-O1


12. Dongchao Yang, Dingdong Wang, Haohan Guo, Xueyuan Chen, Xixin Wu & Helen Meng
SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models
PaperID-1392, Speech Synthesis: Paradigms and Methods 3, A7-P4-B


13. Fabian Alejandro Ritter-Gutierrez, Kuan-Po Huang, Jeremy H. M. Wong, Dianwen Ng, Hung-yi Lee & Eng Siong Chng
Dataset-Distillation Generative Model for Speech Emotion Recognition
PaperID-1430, Speech Emotion Recognition, A3-O3


14. Yuke Lin, Ming Cheng, Fulin Zhang, Yingying Gao, Shilei Zhang & Ming Li
VoxBlink2: A 100K+ Speaker Recognition Corpus and the Open-Set Speaker-Identification Benchmark
PaperID-1490, Speaker recognition evaluation and resources, A4-O6


15. Dail Kim, Da-Hee Yang, Donghyun Kim, Joon-Hyuk Chang, Jeonghwan Choi, Moa Lee, Jaemo Yang & Han-gil Moon
Guided conditioning with predictive network on score-based diffusion model for speech enhancement
PaperID-1545, Generative Speech Enhancement, A6-O2


16. Jiajun He & Tomoki Toda
2DP-2MRC: 2-Dimensional Pointer-based Machine Reading Comprehension Method for Multimodal Moment Retrieval
PaperID-1633, Spoken Term Detection and Speech Retrieval, A12-O3


17. Ziping Zhao, Tian Gao, Haishuai Wang & Bjoern Schuller
MFDR: Multiple-stage Fusion and Dynamically Refined Network for multimodal emotion recognition
PaperID-1735, New Avenues in Emotion Recognition, A3-O4


18. Kevin Y Huang, Jack Goldberg, Louis Goldstein & Shrikanth Narayanan
Analysis of articulatory setting for L1 and L2 English speakers using MRI data
PaperID-2175, Phonetics and Phonology of Second Language Acquisition, A2-O2


19. Tejes Srivastava, Jiatong Shi, William Chen & Shinji Watanabe
EFFUSE: Efficient Self-Supervised Feature Fusion for E2E ASR in Low Resource and Multilingual Scenarios
PaperID-2199, Cross-Lingual and Multilingual Processing, A9-P2


20. Irene B. R. Smith, Morgan Sonderegger & The Spade Consortium
Modelled Multivariate Overlap: A method for measuring vowel merger
PaperID-2260, Individual and Social Factors in Phonetics, A2-O1


21. Vrushank Changawala & Frank Rudzicz
Whister: Using Whisper’s representations for Stuttering detection
PaperID-2293, Speech Disorders 2, A13-P1-B


22. Tian-Hao Zhang, Xinyuan Qian, Feng Chen & Xu-Cheng Yin
Transmitted and Aggregated Self-Attention for Automatic Speech Recognition
PaperID-2374, Neural Network Architectures for ASR 2, A8-P1


23. Haechan Kim, Junho Myung, Seoyoung Kim, Sungpah Lee, Dongyeop Kang & Juho Kim
LearnerVoice: A Dataset of Non-Native English Learners’ Spontaneous Speech
PaperID-2392, Accented Speech, Prosodic Features, Dialect, Emotion, Sound Classification, A8-P3


24. Ashish Mittal, Darshan Deepak Prabhu, Sunita Sarawagi & Preethi Jyothi
SALSA: Speedy ASR-LLM Synchronous Aggregation
PaperID-2499, Error Correction and Rescoring, A9-O4