Master 2018 2019
Internships of the SAR specialization
Spatial Features for Robust Speech Recognition

Website: Spatial Features for Robust Speech Recognition
Location: OFFIS Institute for Information Technology, Escherweg 2, 26121 Oldenburg, Germany
Supervisor: Benjamin Cauchi, researcher in the Biomedical and Systems (BMS) group
Dates: from 15/02/2018 to 15/08/2018
Compensation: according to German regulations (around 450€ per month)
Keywords: ATIAM track: Acoustics, ATIAM track: Computer music, ATIAM track: Signal processing


Spatial Features for Robust Speech Recognition

The accuracy of ASR (Automatic Speech Recognition) systems and the speech intelligibility perceived by a human listener are defined in a similar way, i.e., as the ratio between the number of words correctly identified and the number of words present in the target signal. Additionally, it is well known that binaural cues have a large impact on speech intelligibility [1] and that preprocessing that exploits the spatial characteristics of the signal, e.g., beamforming, can greatly improve the performance of ASR systems [2].
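The ratio described above can be illustrated with a minimal Python sketch. It assumes that "correctly identified" words are counted as the longest common subsequence of words between the target sentence and the recognised one; this is a simplification chosen for illustration, and standard scoring tools typically rely on a Levenshtein alignment instead. The function names are hypothetical.

```python
def count_correct_words(reference, hypothesis):
    """Return (number of correct words, number of target words).

    Assumption: correct words are counted as the longest common
    subsequence (LCS) of the two word sequences, case-insensitively.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # lcs[i][j] holds the LCS length of ref[:i] and hyp[:j].
    lcs = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i, ref_word in enumerate(ref, start=1):
        for j, hyp_word in enumerate(hyp, start=1):
            if ref_word == hyp_word:
                lcs[i][j] = lcs[i - 1][j - 1] + 1
            else:
                lcs[i][j] = max(lcs[i - 1][j], lcs[i][j - 1])
    return lcs[len(ref)][len(hyp)], len(ref)


def word_accuracy(reference, hypothesis):
    """Ratio of correctly identified words to words in the target."""
    correct, total = count_correct_words(reference, hypothesis)
    if total == 0:
        raise ValueError("the target sentence contains no words")
    return correct / total


# One substituted word out of six target words.
accuracy = word_accuracy("the cat sat on the mat", "the cat sat on a mat")
```

The same function applies whether the hypothesis comes from an ASR system or from the response of a human listener, which is what makes the two measures comparable.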

Speech enhancement, which aims at improving speech intelligibility, and preprocessing, which aims at improving ASR accuracy, are often based on similar processing schemes, and approaches designed for one application can be exploited by the other. For example, features used by ASR systems are often derived from psychoacoustic models [3], while recent approaches have used ASR systems to predict speech intelligibility [4].

The aim of this internship is to quantify the impact of binaural cues on the performance of an ASR system. The first stages of the internship will consist of becoming familiar with the literature and extracting standard spatial features using Matlab. These features will then have to be integrated into a provided ASR framework (based on Kaldi), which aims at recognising spoken sentences from binaural signals, in order to quantify the benefits of the considered spatial features.
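As an illustration of what such spatial features can look like, the sketch below computes two classical binaural cues, the interaural level difference (ILD) and the interaural time difference (ITD), from a two-channel signal. It is only a sketch under stated assumptions: the offer does not specify which features are to be used, the cues are computed here on the whole broadband signal rather than per frame and per frequency band, Python with NumPy is used instead of Matlab for the sake of the example, and the function names and the 1 ms search range are hypothetical choices.

```python
import numpy as np


def interaural_level_difference(left, right, eps=1e-12):
    """Broadband ILD in dB; positive when the left channel is louder."""
    energy_left = np.sum(left ** 2)
    energy_right = np.sum(right ** 2)
    return 10.0 * np.log10((energy_left + eps) / (energy_right + eps))


def interaural_time_difference(left, right, fs, max_itd=1e-3):
    """Broadband ITD in seconds, from the cross-correlation peak.

    Positive when the left channel is delayed relative to the right.
    max_itd (assumed value) limits the search to plausible delays.
    """
    max_lag = int(round(max_itd * fs))
    xcorr = np.correlate(left, right, mode="full")
    # Lag k of the full cross-correlation: sum_n left[n + k] * right[n].
    lags = np.arange(-(len(right) - 1), len(left))
    keep = np.abs(lags) <= max_lag
    best_lag = lags[keep][np.argmax(xcorr[keep])]
    return best_lag / fs


# Synthetic example: the left channel is an attenuated copy of the
# right channel, delayed by 8 samples at a 16 kHz sampling rate.
fs = 16000
rng = np.random.default_rng(0)
source = rng.standard_normal(1600)
right = source
left = 0.5 * np.concatenate([np.zeros(8), source[:-8]])

ild = interaural_level_difference(left, right)
itd = interaural_time_difference(left, right, fs)
```

In a recognition front end, cues of this kind would more likely be computed on short frames and appended to the acoustic feature vectors, but how they are integrated into the Kaldi-based framework is part of the internship and is not specified here.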

The work requires a solid knowledge of signal processing, as well as an interest in machine learning and experience in programming with Matlab and C or C++. A good level of English is expected in order to write publications, e.g., the internship report, and to communicate results internally.

The projected work is suited for an internship of 5 to 6 months; the starting date is flexible. The work will be carried out at the OFFIS Institute for Information Technology in Oldenburg, Germany. OFFIS is an associated institute of the University of Oldenburg, focusing on application-oriented research.


[1] J. Blauert, "Spatial Hearing: The Psychophysics of Human Sound Localization", Cambridge, MA, USA: MIT Press, 1997.

[2] Results of the 4th CHiME Speech Separation and Recognition Challenge, /chime2016/results.html

[3] N. Moritz et al., "An Auditory Inspired Amplitude Modulation Filter Bank for Robust Feature Extraction in Automatic Speech Recognition", in IEEE/ACM Trans. on Audio, Speech, and Language Processing, 2015.

[4] B. Kollmeier et al., "Sentence Recognition Prediction for Hearing-impaired Listeners in Stationary and Fluctuation Noise With FADE: Empowering the Attenuation and Distortion Concept by Plomp With a Quantitative Processing Model", in Trends in Hearing, 2016.