Deep NN for estimating speech model activations

This paper presents an approach for improving the perceptual quality of speech separated from background noise at low signal-to-noise ratios. Our approach uses two stages of deep neural networks, where the first stage estimates the ideal ratio mask that separates speech from noise, and the second stage maps the ratio-masked speech to the clean speech activation matrices that are used for nonnegative matrix factorization (NMF).
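The data flow between the two stages can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the toy dimensions, and the square-root form of the ratio mask (a commonly used definition, not stated in this abstract) are all assumptions. The second-stage network is not modeled: the true activations stand in for its output so that the NMF reconstruction step can be shown.

```python
import numpy as np


def ideal_ratio_mask(speech_mag, noise_mag, beta=0.5, eps=1e-12):
    """Ratio mask from speech and noise magnitudes (assumed common definition)."""
    speech_pow = speech_mag ** 2
    noise_pow = noise_mag ** 2
    return (speech_pow / (speech_pow + noise_pow + eps)) ** beta


def nmf_reconstruct(basis, activations):
    """NMF model: magnitude spectrogram approximated as basis @ activations."""
    return basis @ activations


# Toy, hypothetical dimensions: 6 frequency bins, 4 frames, 3 basis vectors.
rng = np.random.default_rng(0)
n_freq, n_frames, n_basis = 6, 4, 3

basis = rng.random((n_freq, n_basis))               # speech basis matrix
true_activations = rng.random((n_basis, n_frames))  # clean speech activations
speech_mag = nmf_reconstruct(basis, true_activations)
noise_mag = rng.random((n_freq, n_frames))
mixture_mag = speech_mag + noise_mag                # crude additive mixture (assumption)

# Stage 1 target: the ratio mask that a first network would be trained to estimate.
mask = ideal_ratio_mask(speech_mag, noise_mag)
masked_speech = mask * mixture_mag

# Stage 2 stand-in: a second network would map masked_speech to activations.
# Here the true activations are used as a placeholder for that network's output.
estimated_activations = true_activations
estimated_speech = nmf_reconstruct(basis, estimated_activations)
```

In this sketch the mask values lie between 0 and 1, and the reconstruction is nonnegative because both the basis and the activations are nonnegative.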

Supervised NMF systems make assumptions about the relationship between the activation and basis matrices that do not always hold. Other two-stage approaches that combine masking with NMF reconstruction do not account for mask estimation errors. We show that the proposed algorithm achieves higher objective speech quality and intelligibility than these related methods.
