Computational speech group
One of the most significant areas of research in speech technology is speaker recognition, i.e. recognizing persons based on their voice. We carry out basic research (e.g. on feature extraction, segmentation, classification and data fusion algorithms) and apply the developed techniques as parts of state-of-the-art speaker and language recognition systems that rely heavily on statistical machine learning from large datasets. One of the central research questions is how to recognize a person accurately from uncertain speech data, which often originates from low-quality telephone calls (or found data) recorded in an unknown environment and mixed with unknown background noise, reverberation or other extrinsic disturbances. In addition, speakers can purposefully modify their own voice. These inherent data-related challenges require intelligently combining digital signal processing models of human speech production (domain knowledge) with modern machine learning techniques that are used to infer speaker- and language-specific representations from thousands of hours of unannotated speech material. Another research direction, the recognition and characterization of voice spoofing attacks, is particularly relevant to information security: how can one detect whether a given audio sample was actually produced by a live human or was fabricated, for instance by a speech synthesizer? Such questions are already of key importance in the world of fake news, given that rapidly evolving speech synthesis technology can already be used to create speech that sounds like a particular targeted speaker. The research group has been one of the forerunners in detecting such spoofing attacks (see www.asvspoof.org).
Further information about the research group: https://sites.uef.fi/speech/