Most speech enhancement applications perform frequency shaping by multiplication in the frequency domain, which is equivalent to convolution in the time domain. In these algorithms, updating the frequency response alone cannot guarantee the conditions under which multiplication in frequency corresponds to linear rather than circular convolution. As a result, artifacts and distortions may be present in the output of a standard fast Fourier transform (FFT)-based algorithm. Typical methods to deal with these artifacts involve overlapping and windowing. However, even with these strategies, artifacts may be perceptually noticeable under certain signal-to-noise ratio (SNR) conditions and/or when a high sampling frequency is employed. This paper analyzes the efficiency of the standard methods, explains the source of these distortions, provides perceptual evidence of these artifacts, and proposes two alternative methods to perform artifact-free and distortion-free FFT convolution. These methods are based on extending the impulse response and on splitting the impulse response into two impulse responses, both operations being performed in the frequency domain. Computational costs and performance of the proposed techniques are also discussed.
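The linear-versus-circular distinction mentioned above can be illustrated with a minimal sketch. It is not one of the two methods proposed in the paper; it only shows the textbook condition that the transform length N must be at least L + M - 1 (frame length L, impulse response length M) for multiplication in frequency to match linear convolution. The helper names `dft`, `freq_multiply`, and `linear_convolve` are hypothetical, and a naive DFT stands in for an FFT so the example needs no external library.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(N^2) DFT; a stand-in for an FFT to keep the sketch dependency-free."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def freq_multiply(x, h, n):
    """Zero-pad x and h to length n, multiply their spectra, return the real result."""
    xp = list(x) + [0.0] * (n - len(x))
    hp = list(h) + [0.0] * (n - len(h))
    y = dft([a * b for a, b in zip(dft(xp), dft(hp))], inverse=True)
    return [round(v.real, 9) for v in y]


def linear_convolve(x, h):
    """Direct time-domain linear convolution, used as the reference."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, a in enumerate(x):
        for j, b in enumerate(h):
            y[i + j] += a * b
    return y


x = [1.0, 2.0, 3.0, 4.0]  # one signal frame, L = 4 (illustrative values)
h = [1.0, 1.0, 1.0]       # impulse response, M = 3 (illustrative values)

reference = linear_convolve(x, h)                  # length L + M - 1 = 6
circular = freq_multiply(x, h, len(x))             # N = 4 < 6: the tail wraps around
padded = freq_multiply(x, h, len(x) + len(h) - 1)  # N = 6: agrees with the reference
```

With N = 4 the last two samples of the linear result (7 and 4) fold back onto the first two output samples, giving 8 and 7 instead of 1 and 3; this time-domain aliasing is the kind of error the abstract attributes to circular convolution. With N = 6 the frequency-domain product reproduces the reference.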
Topic:
Speech and Audio Processing
Citations:
5
Altmetrics:
0
Source information:
Source: IEEE Transactions on Audio, Speech, and Language Processing