Computational speech group
The main topic of our group is speaker recognition: the task of connecting a speech sample to an identity (who is speaking?). We work on improving both the robustness of such systems (improved accuracy under varied channel, noise and other disturbances) and their security (detection of presentation attacks, also known as spoofing attacks). Our research group not only takes part regularly in the technology evaluations in our field, but also contributes to open data science as a co-organizer of public evaluation benchmarks (including ASVspoof and the VC challenge) and through the collection of other data, such as the AVOID corpus.
Research group description
One of the most significant areas of research in speech technology is speaker recognition, i.e. the recognition of persons based on their voice. We carry out basic methodology research (e.g. feature extraction, segmentation, classification and data fusion algorithms) and apply the developed techniques as parts of state-of-the-art speaker and language recognition systems that rely heavily on statistical machine learning from big data. One of the central research questions is how to recognize a person robustly from highly uncertain speech data, which often originates from a low-quality telephone call made in an unknown environment and is mixed with unknown background noise or reverberation. In addition, speakers can purposefully modify their own voice. These inherent data-related challenges require intelligent combinations of digital signal processing models of human speech production (domain knowledge) with modern machine learning techniques that are used to infer speaker- and language-specific representations from thousands of hours of unannotated speech material.
Another recent research direction, the recognition and characterization of voice spoofing attacks, is particularly relevant to information security: how can one detect whether a given audio sample was actually produced by a live human or fabricated, for instance, by a speech synthesizer? Such questions are already of key importance in a world of fake news and rapidly evolving speech synthesis technology, which can already be used to create speech that sounds like a particular targeted speaker. The research group has been one of the international forerunners in detecting such spoofing attacks. In the recently concluded H2020-funded OCTAVE project, such methods were implemented in a cloud-based voice biometric system. Besides speaker recognition, other topics in the group include language and accent identification, voice conversion (imitation by a computer), and comparing human listeners and automatic techniques in these tasks.
Group members - UEF
Abraham Woubie Zewoudie Post-Doctoral Researcher
Anssi Kanervisto Early Stage Researcher, Faculty of Science and Forestry, School of Computing
Rosa Gonzalez Hautamäki Post-Doctoral Researcher, Faculty of Science and Forestry, School of Computing
Tomi Kinnunen Professor, Faculty of Science and Forestry, School of Computing
Ville Hautamäki Senior Researcher
Ville Vestman Early Stage Researcher, Faculty of Science and Forestry, School of Computing