The purpose of this research was to investigate the potential effectiveness of digital speech processing and pattern recognition techniques for the automatic recognition of gender from speech segments. In this paper "coarse" acoustic coefficients (autocorrelation, linear prediction, cepstrum, and reflection) were used to form test and reference templates for vowels, voiced fricatives, and unvoiced fricatives. The effects of different distance measures, filter orders, recognition schemes, and phoneme classes (vowels and fricatives) were compared to determine their effectiveness for gender recognition from speech segments. The results showed that most of the acoustic parameters worked well for gender recognition. A within-gender and within-subject averaging technique was important for generating appropriate test and reference templates. The Euclidean distance measure appeared to be the most robust as well as the simplest of the distance measures. The results from this study implied that gender information is time invariant, phoneme independent, and speaker independent for a given gender. One recognition scheme achieved 100% correct speaker gender classification for a database of 52 talkers (27 male and 25 female). In part II of this paper [J. Acoust. Soc. Am. 90 (1991); hereafter referred to as paper II] the detailed features of ten vowels that appeared responsible for distinguishing a speaker's gender were examined statistically. Included in paper II is a replication of part of the classical study of Peterson and Barney [J. Acoust. Soc. Am. 24, 175–184 (1952)] of vowel characteristics.
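The template-matching idea summarized above (average coefficient vectors into reference templates, then assign a test template to the nearest reference under the Euclidean distance) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `average_template`, `euclidean`, and `classify` are hypothetical, and the three-element coefficient vectors are made-up values standing in for real acoustic coefficients such as linear prediction or cepstral coefficients.

```python
import math


def average_template(vectors):
    """Element-wise mean of equal-length coefficient vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(test_template, references):
    """Return the label of the reference template nearest to the test template."""
    return min(references, key=lambda label: euclidean(test_template, references[label]))


# Made-up coefficient vectors (assumption: one averaged vector per speaker).
male_speakers = [[0.9, -0.4, 0.1], [1.1, -0.6, 0.3]]
female_speakers = [[0.2, 0.5, -0.3], [0.4, 0.3, -0.1]]

# Within-gender averaging produces one reference template per gender.
references = {
    "male": average_template(male_speakers),
    "female": average_template(female_speakers),
}

# Within-subject averaging of a test speaker's segments produces the test template.
test_template = average_template([[0.3, 0.35, -0.25], [0.25, 0.45, -0.15]])

print(classify(test_template, references))
```

With these illustrative values the test template lies closer to the female reference template, so the sketch labels it "female". Real coefficient vectors would come from frame-level analysis of the speech segments, which is outside the scope of this sketch.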
Topic:
Speech Recognition and Synthesis
Citations:
153
Altmetrics:
0
Source Information:
Source: The Journal of the Acoustical Society of America