Abstract: Mel-frequency cepstral coefficients (MFCC) have been the dominant features in speaker recognition as well as in speech
recognition. However, based on theories in speech production,
some speaker characteristics associated with the structure of the
vocal tract, particularly the vocal tract length, are reflected more
in the high frequency range of speech. This insight suggests that
a linear scale in frequency may provide some advantages in
speaker recognition over the mel scale. Using two state-of-the-art
speaker recognition back-end systems (one Joint Factor
Analysis system and one Probabilistic Linear Discriminant
Analysis system), this study compares the performance of
MFCC and LFCC (linear-frequency cepstral coefficients) on the
NIST SRE (Speaker Recognition Evaluation) 2010 extended-core
task. Our results on SRE10 show that, while the two features are
complementary to each other, LFCC consistently outperforms
MFCC, mainly due to its better performance in the female trials.
This can be explained by the relatively shorter vocal tract in
females and the resulting higher formant frequencies in speech.
LFCC benefits female speech more because it better captures the
spectral characteristics in the high-frequency region. In addition,
our results show some advantage of LFCC over MFCC in
reverberant speech. LFCC is as robust as MFCC in babble
noise, but not in white noise. We conclude that LFCC
should be more widely used by the mainstream of the speaker
recognition community, at least for female trials.
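
The frequency-resolution argument above can be illustrated with a minimal sketch that places triangular-filter center frequencies on a mel scale and on a linear scale, then counts how many fall in the upper half of the band. The filter count (24), the 0 to 4000 Hz band, the 2595/700 mel formula, and all function names are assumptions chosen for illustration; they are not taken from the systems evaluated in this study.

```python
import numpy as np


def hz_to_mel(f_hz):
    """Convert Hz to mel using a common 2595*log10(1 + f/700) formula (assumed)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def filter_centers(n_filters, f_min, f_max, scale):
    """Return center frequencies (Hz) of n_filters equally spaced on the given scale."""
    if scale == "linear":
        edges = np.linspace(f_min, f_max, n_filters + 2)
    elif scale == "mel":
        mel_edges = np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_filters + 2)
        edges = mel_to_hz(mel_edges)
    else:
        raise ValueError("scale must be 'linear' or 'mel'")
    # Drop the two outer band edges; the remaining points are filter centers.
    return edges[1:-1]


# Hypothetical settings: 24 filters over an assumed 0-4000 Hz band.
N_FILTERS, F_MIN, F_MAX = 24, 0.0, 4000.0

mel_centers = filter_centers(N_FILTERS, F_MIN, F_MAX, "mel")
lin_centers = filter_centers(N_FILTERS, F_MIN, F_MAX, "linear")

# Count filters whose center lies in the upper half of the band.
split_hz = (F_MIN + F_MAX) / 2.0
mel_high = int(np.sum(mel_centers > split_hz))
lin_high = int(np.sum(lin_centers > split_hz))

print(f"filters centered above {split_hz:.0f} Hz: mel={mel_high}, linear={lin_high}")
```

Under these assumptions the linear spacing assigns more filters to the upper half of the band than the mel spacing does, which is the sense in which LFCC offers finer resolution at high frequencies.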