Significant improvements in intelligibility of speech in noise can be obtained by modifying the speech signal in the time and/or frequency domains. However, most speech intelligibility enhancement algorithms are designed to use clean speech as an input, and their performance suffers once the input speech signal-to-noise ratio decreases, a common case in face-to-face communication environments such as restaurants or cafés.
In this work, we investigate whether a particularly successful speech intelligibility enhancement system, spectral shaping and dynamic range compression, combined with various front-end noise reduction methods, might be suitable in such environments. Our evaluations suggest that such a complete system would provide an increase in speech intelligibility equivalent to a 10 dB gain in input signal-to-noise ratio in the more challenging face-to-face communication environments.