A NEW APPROACH FOR HEAD RELATED TRANSFER FUNCTION INTERPOLATION/CUSTOMIZATION
Motivation:
A head related impulse response (HRIR) is a function of time, elevation,
azimuth, and the individual. Measuring an HRIR experimentally on a fine sampling
grid of elevation and azimuth is a time-consuming and laborious process, so most
spatial audio systems need some form of interpolation of the HRIR with respect
to elevation and azimuth.
Interpolation is a deceptive word here: a trivial interpolation of the HRIRs
that ignores the physical significance of their various features, and how those
features depend on elevation, azimuth and the individual, would not make sense.
Approaches used for interpolation with respect to elevation and azimuth include
direct time domain interpolation of HRIRs, direct frequency domain interpolation
of HRTFs, and interpolation of principal components in the PCA domain. These
methods do not take into account the perceptual significance of the different
features in the HRIR or the HRTF. For example, linear interpolation in the
frequency domain simply averages the magnitude spectra of the closest sampling
points, which may distort the perceptually significant features.
Ideally, interpolation should be done in the domain of the perceptually
important features. A set of generic HRIRs would also not work satisfactorily,
since it has been shown that HRIRs are specific to an individual: with the HRIRs
of another person, elevation perception is very poor. Because of differences in
human anatomy, and in the shape of the pinna in particular, the HRIRs of
different persons are quite different. To demonstrate this point, Figure 1 shows
the HRTF as a function of elevation at zero azimuth for different subjects,
along with their ear shapes. Significant differences between the HRTFs can be
observed. Psychophysical experiments have also confirmed a large deterioration
in the ability of subjects to localize sounds vertically when they listen to
recordings made with microphones placed in other subjects' ear canals. Several
approaches have been studied for customization of HRIRs. We use the term
customization for both the interpolation of HRIRs with respect to elevation and
azimuth and the adaptation to individual differences, that is, customization of
HRIRs for any elevation, azimuth and person. The goal of our project is, given a
set of HRIRs experimentally measured for a certain number of individuals at a
fixed sampling of elevation and azimuth, to customize this database for any new
elevation, azimuth and individual.
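
The distortion caused by naive frequency domain interpolation, discussed above,
can be illustrated with a small sketch. It does not use the measured database:
the two magnitude responses are synthetic, each flat with a single
Gaussian-shaped notch, and the notch frequencies (6 kHz and 9 kHz), the notch
depth and width, and the helper names notch_db and notch_frequencies are all
assumptions chosen for illustration. The sketch compares averaging the two
magnitude spectra with interpolating the notch frequency itself, as one possible
example of interpolation in a feature domain.

```python
import math

def notch_db(freqs_hz, notch_hz, depth_db=20.0, width_hz=500.0):
    """Toy magnitude response in dB: flat at 0 dB with one Gaussian notch."""
    return [-depth_db * math.exp(-0.5 * ((f - notch_hz) / width_hz) ** 2)
            for f in freqs_hz]

def db_to_linear(db):
    return [10.0 ** (v / 20.0) for v in db]

def linear_to_db(mag):
    return [20.0 * math.log10(v) for v in mag]

def notch_frequencies(freqs_hz, mag_db, threshold_db=-3.0):
    """Frequencies of strict local minima deeper than threshold_db."""
    return [freqs_hz[i] for i in range(1, len(mag_db) - 1)
            if mag_db[i] < mag_db[i - 1] and mag_db[i] < mag_db[i + 1]
            and mag_db[i] < threshold_db]

freqs = [100.0 * k for k in range(40, 121)]  # 4 kHz to 12 kHz in 100 Hz steps

# Two hypothetical neighbouring measurements whose notch moves with elevation.
notch_a_hz, notch_b_hz = 6000.0, 9000.0
mag_a = db_to_linear(notch_db(freqs, notch_a_hz))
mag_b = db_to_linear(notch_db(freqs, notch_b_hz))

# Naive interpolation halfway between the neighbours: average the magnitudes.
naive_db = linear_to_db([0.5 * (a + b) for a, b in zip(mag_a, mag_b)])

# Feature domain interpolation: interpolate the notch frequency itself.
feature_db = notch_db(freqs, 0.5 * (notch_a_hz + notch_b_hz))

print("naive notches (Hz):  ", notch_frequencies(freqs, naive_db))
print("feature notches (Hz):", notch_frequencies(freqs, feature_db))
```

In this toy case the averaged spectrum keeps two shallow notches (roughly 5 dB
deep) at the original 6 kHz and 9 kHz positions, whereas interpolating the notch
frequency gives a single full-depth notch at 7.5 kHz. Which of the two is
perceptually correct for real HRTFs is a question for the measured data; the
sketch only shows that the two procedures produce qualitatively different
spectra.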
Methodology:
We solve this problem in four steps: feature identification, analysis (feature
extraction), relating the features to physical measurements, and synthesis. In
the feature identification phase we study the database of HRIRs for different
elevations, azimuths and individuals and verify that there are indeed features
in the HRIR (in any domain) that are responsible for source localization and
whose physical significance can be conjectured, if not fully explained, at this
stage. These features should also vary with elevation, azimuth and the
individual. Once the dominant features are identified, in the analysis phase we
devise signal processing techniques to extract them from the HRIR automatically.
If the dominant features alone completely modelled our hearing system, the
residual HRIR would be random noise. However, since there will always be many
features that are difficult to explain, we retain the residual HRIRs for all
individuals, elevations and azimuths. The residual HRIRs are used later in the
synthesis phase. Once the dominant set of features has been extracted from the
HRIRs, we need to correlate these features with elevation, azimuth and the
individual, and quantify the dependencies. In the synthesis phase, given a set
of anthropometric measurements for a new person and a request for the HRIR at
any elevation and azimuth, we obtain the dominant features from the elevation,
azimuth and anthropometric measurements and form an impulse response that
contains only the dominant features. The closest residual HRIR is then selected
from the database and superimposed on the synthesized impulse response to give
the final HRIR. Figure 2 shows the analysis and synthesis phases in a block
diagram.
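
The analysis and synthesis phases described above can be sketched in a few lines
of code. This is a minimal illustration under stated assumptions, not the
method itself: the dominant features are represented here as hypothetical
(delay, gain) pairs, "closest" residual is taken to mean nearest in elevation
and azimuth by plain Euclidean distance in degrees (the text does not define the
distance, and azimuth wrap-around is ignored), and the mapping from
anthropometric measurements to features is left out, so the features of the new
person are simply given as input. All function names and numbers are
illustrative.

```python
import math

def feature_response(features, length):
    """Impulse response that contains only the dominant features.

    features is a list of (delay_in_samples, gain) pairs, a hypothetical
    stand-in for whatever the analysis phase actually extracts.
    """
    h = [0.0] * length
    for delay, gain in features:
        h[delay] += gain
    return h

def analyze(hrir, features):
    """Split a measured HRIR into its feature-only model and the residual."""
    model = feature_response(features, len(hrir))
    residual = [x - m for x, m in zip(hrir, model)]
    return model, residual

def closest_residual(database, elevation, azimuth):
    """Return the stored residual whose direction is nearest to the query.

    database maps (elevation_deg, azimuth_deg) to a residual HRIR.
    Euclidean distance in degrees is an assumption made for this sketch.
    """
    key = min(database,
              key=lambda d: math.hypot(d[0] - elevation, d[1] - azimuth))
    return database[key]

def synthesize(features, database, elevation, azimuth):
    """Superimpose the closest residual on the feature-only impulse response."""
    residual = closest_residual(database, elevation, azimuth)
    model = feature_response(features, len(residual))
    return [m + r for m, r in zip(model, residual)]

# Analysis of one toy "measured" HRIR (values are made up).
measured = [0.0, 0.0, 1.05, 0.10, -0.05, -0.38, 0.02, 0.0]
measured_features = [(2, 1.0), (5, -0.4)]
model, residual = analyze(measured, measured_features)

# Residual database indexed by (elevation, azimuth) in degrees.
database = {(0.0, 0.0): residual, (30.0, 0.0): [0.0] * len(measured)}

# Synthesis for a new direction with (assumed) predicted features.
new_features = [(2, 0.9), (6, -0.35)]
hrir = synthesize(new_features, database, elevation=10.0, azimuth=0.0)
print(hrir)
```

Keeping the residual separate from the feature model means the feature-only
part can be regenerated for a new direction or person while the unexplained
detail is borrowed from the nearest measurement.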
Applications:
Psychophysical experiments have confirmed that the features we extracted are
perceptually significant for source localization. Instead of convolving with the
complete HRIR, we can therefore build simplified models based on the extracted
features. This differs from approaches in the literature that build simplified
models from geometrical models; in our case the features are extracted from
experimentally measured HRIRs. Interpolation can also be done in the feature
domain. Finally, the extracted features can be related to the physical
dimensions of the human anatomy and the pinna, which makes customization of the
HRIR possible.