Computational speech group

Research group
08.10.2017 -

Group description

One of the most significant areas of research in speech technology is speaker recognition, i.e. recognizing people by their voice. We carry out basic methodology research (e.g. feature extraction, segmentation, classification and data fusion algorithms) and apply the developed techniques as parts of state-of-the-art speaker and language recognition systems that rely heavily on statistical machine learning from big data. One of the central research questions is how to recognize a person robustly from highly uncertain speech data, which often originates from low-quality telephone calls made in an unknown environment and is mixed with unknown background noise or reverberation. In addition, speakers can deliberately modify their own voice. These inherent data-related challenges call for intelligent combinations of digital signal processing models of human speech production (domain knowledge) with modern machine learning techniques, which are used to infer speaker- and language-specific representations from thousands of hours of unannotated speech material. Another recent research direction, the recognition and characterization of voice spoofing attacks, is particularly relevant to information security: how can one detect whether a given audio sample was actually produced by a live human or fabricated, for instance, by a speech synthesizer? Such questions are already of key importance in a world of fake news and rapidly evolving speech synthesis technology, which can already be used to create speech that sounds like a particular targeted speaker. The research group has been one of the international forerunners in detecting such spoofing attacks. In the recently concluded H2020-funded OCTAVE project, such methods were implemented in a cloud-based voice biometric system. Besides speaker recognition, other topics in the group include language and accent identification, voice conversion (imitation by a computer), and comparisons of human listeners and automatic techniques in these tasks.

Keywords