Keynote Speakers

Prof. Isabel Trancoso

ISCA Medalist 2024

INESC-ID / IST, University of Lisbon, Portugal

Title: Towards Responsible Speech Processing

Responsible AI may not be a consensual concept, and the list of its so-called pillars may not be uniquely defined either. Nonetheless, the message is clear and urgent. In this talk, I will address some of the pillars of responsible speech processing, focusing on privacy, explainability (namely for health applications), fairness/inclusion, and sustainability. Rather than attempting a comprehensive survey of all the efforts in these directions, I will present my own perspective on how these pillars should inform the next generation of speech research.

Biography

Isabel Trancoso is a full professor at Instituto Superior Técnico (IST, Univ. Lisbon) and the former President of the Scientific Council of INESC-ID Lisbon. She received her PhD in ECE from IST in 1987 and chaired the ECE Department of IST. She was Editor-in-Chief of the IEEE Transactions on Speech and Audio Processing and has held many leadership roles in SPS (the IEEE Signal Processing Society) and ISCA (the International Speech Communication Association), namely as President of ISCA and as Chair of the Fellow Evaluation Committees of both SPS and ISCA. She was elevated to IEEE Fellow in 2011 and to ISCA Fellow in 2014.

Her PhD topic was medium-to-low bit rate speech coding. From October 1984 through June 1985, she worked on this topic at AT&T Bell Laboratories, Murray Hill, New Jersey. After her PhD, her research focus shifted to speech synthesis and recognition, with a special emphasis on tools and resources for the Portuguese language. She launched the speech processing group of INESC-ID, later restructured as L2F/HLT.

Her current research scope is much broader, encompassing many areas in spoken language processing. Her recent PhD advising activities cover microblog translation, lexical and prosodic entrainment in spoken dialogues, disfluency detection in spontaneous speech, and conversation quality evaluation. She has a particular interest in speech as a health biomarker and in privacy-preserving speech processing. Her formal retirement (from teaching) in October 2022 did not in any way affect her passion for speech and language research.

Dr. Shoko Araki 

NTT Communication Science Laboratories, NTT Corporation, Japan

Title: Frontier of Frontend for Conversational Speech Processing

To deepen and enrich our daily communications, researchers have made significant efforts over several decades to develop technologies that can recognize and understand natural human conversations. Despite significant progress in both speech/language processing and speech enhancement technology, conversational speech processing remains challenging.

Recordings of conversations with distant microphones contain ambient noise, reverberation, and speaker overlap that changes as the conversation progresses. Consequently, recognizing conversational speech is much more challenging than single-talker speech recognition, and frontend technologies such as speech enhancement and speaker diarization are essential to achieving highly accurate conversational speech processing.

For more than two decades, the presenter's research group has explored frontend techniques (source separation, dereverberation, noise reduction, and diarization) for handling realistic natural conversations recorded with distant microphones. In this talk, I will discuss the evolution and frontier of frontend technologies for conversational signal processing. Specifically, we will trace the evolution of multichannel signal processing and neural network techniques, including beamforming and target-speaker tracking and extraction, which have consistently played an important role in successive cutting-edge frontends, and present the latest achievements.

Biography

Shoko Araki is a Senior Research Scientist at NTT Communication Science Laboratories, NTT Corporation, Japan, where she currently leads the Signal Processing Research Group. Since joining NTT in 2000, she has been engaged in research on acoustic signal processing, microphone array signal processing, blind speech separation, meeting diarization, and auditory scene analysis.

She was a member of the IEEE SPS Audio and Acoustic Signal Processing Technical Committee (AASP-TC) from 2014 to 2019 and currently serves as its Chair. She was a board member of the Acoustical Society of Japan (ASJ) (2017-2020) and served as vice president of the ASJ (2021-2022). She also served on the organizing committees of several international flagship workshops, including ICA 2003, IWAENC 2003, IEEE WASPAA 2007, HSCMA 2017, IEEE WASPAA 2017, IWAENC 2018, and IEEE WASPAA 2021, and as evaluation co-chair of the Signal Separation Evaluation Campaign (SiSEC) in 2008, 2010, and 2011.

She received the 19th Awaya Prize from the ASJ in 2001, the Best Paper Award of IWAENC in 2003, the TELECOM System Technology Award from the Telecommunications Advancement Foundation in 2004 and 2014, the Academic Encouraging Prize from the Institute of Electronics, Information and Communication Engineers (IEICE) in 2006, the Itakura Prize Innovative Young Researcher Award from the ASJ in 2008, the Young Scientists' Prize of the Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology in 2014, the IEEE SPS Best Paper Award in 2014, and the IEEE ASRU 2015 Best Paper Award Honorable Mention. She is an IEEE Fellow.

Prof. Dr.-Ing. Elmar Nöth

Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany

Title: Analysis of Pathological Speech – Pitfalls along the Way

In this talk, I focus on speech as an easy-to-extract biomarker for various diseases and congenital defects. I discuss the motivation for and information gain from the analysis of pathological speech, as well as aspects that carry different weight than in the analysis of typical speech, such as small data collections, data privacy, and the explainability of automatic decisions.

Biography

Elmar Nöth studied computer science at the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) in Erlangen, Germany, and at MIT in Cambridge, USA. He is a professor in FAU's computer science department and retired from active duty in 2022. He led the speech group at the Pattern Recognition Lab and is author or co-author of more than 500 articles. Almost 30 years ago, his group connected the world's first conversational dialogue system to the public telephone line. In his PhD, he focused on prosodic analysis. Since 2000, his group has focused on the analysis and evaluation of pathological speech and has conducted fundamental research in that field. His current research interests are prosody, analysis of pathological speech, computer-aided language learning, emotion analysis, and the analysis of animal communication.

Prof. Barbara Tillmann

Laboratory for Research on Learning and Development, LEAD – CNRS UMR5022, Université de Bourgogne, Dijon, France

Title: Perception of music and speech: Focus on rhythm processing

Research in cognitive neuroscience has revealed similarities in the neural and cognitive correlates of music and language processing. Investigations focusing on temporal processing, in particular rhythmic and metrical processing, have revealed interesting connections between music and speech. These observations have led to several theoretical frameworks and hypotheses about underlying mechanisms and neural functioning, and have motivated applications in clinical research. I will present research that has demonstrated beneficial effects of rhythmic stimulation or training on language processing in adults and children with typical development and in those with developmental language disorder or dyslexia. A recent hypothesis highlights the potential value of early detection of atypical rhythmic processing as an indicator of increased risk for language disorders. This research domain opens perspectives for creating rhythm-based training programs for rehabilitation as well as for early intervention aimed at decreasing language deficits during development.

Biography

After a PhD in cognitive psychology and postdoctoral research in cognitive neuroscience, Barbara Tillmann took up a CNRS research position in France and directed the research group “Auditory Cognition and Psychoacoustics” at the Lyon Neuroscience Research Center before moving to the Laboratory for Research on Learning and Development in Dijon. Her research uses behavioral, neurophysiological, and computational methods to investigate how the brain acquires knowledge about complex sound structures (music, language), and how this knowledge shapes perception and memory via predictions. She also studies perspectives for stimulating cognitive and sensory processes with music, including in pathology (e.g., dyslexia, Alzheimer's disease, disorders of consciousness, hearing impairment). https://scholar.google.com/citations?user=JjJhBEQAAAAJ&hl=en