Abstract
A pitch-synchronous (PS) auditory feature extraction method based on ZCPA (zero-crossings with peak amplitudes) has been proposed (Ghulam, M. et al., Proc. ICSLP04, 2004) and shown to be more robust than conventional ZCPA (Kim, D.S. et al., IEEE Trans. Speech Audio Process., vol. 7, no. 1, pp. 55-69, 1999). In this paper, we examine the effect of embedding auditory masking, both simultaneous and temporal, into the PS-ZCPA method. As a step toward finding the method's optimum parameters, we also examine how varying the number of histogram bins affects performance. Experimental results show that embedding auditory masking improves the performance of PS-ZCPA. For example, with both masking methods embedded, performance rises to 73.71% from the 69.92% obtained without masking. Increasing the number of histogram bins, by contrast, yielded little improvement.
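To show where the "number of histogram bins" parameter enters, the following is a minimal sketch of how a ZCPA-style frequency histogram might be accumulated for a single auditory channel. It follows the general idea of conventional ZCPA (Kim et al., 1999): an upward zero-crossing interval gives a frequency estimate, and the log-compressed peak amplitude within that interval is added to the matching frequency bin. It is not the PS-ZCPA method itself, and it includes neither pitch-synchronous framing nor auditory masking. The function name `zcpa_histogram`, the log-spaced bin layout, the `f_min`/`f_max` defaults, and the `log1p` compression are all hypothetical choices made for illustration.

```python
import numpy as np


def zcpa_histogram(x, fs, num_bins=16, f_min=100.0, f_max=4000.0):
    """Accumulate a ZCPA-style frequency histogram for one channel.

    Illustrative assumptions (not taken from the paper): `x` is already
    band-pass filtered, bin edges are log-spaced between f_min and f_max,
    and peak amplitudes are compressed with log1p.
    """
    x = np.asarray(x, dtype=float)
    # Upward zero-crossing between samples i and i+1: x[i] < 0 <= x[i+1].
    up = np.where((x[:-1] < 0) & (x[1:] >= 0))[0]
    # Hypothetical bin layout: num_bins log-spaced frequency bins.
    edges = np.geomspace(f_min, f_max, num_bins + 1)
    hist = np.zeros(num_bins)
    for start, end in zip(up[:-1], up[1:]):
        # Inverse of the interval between successive upward crossings.
        freq = fs / (end - start)
        # Peak amplitude of the signal between the two crossings.
        peak = x[start + 1:end + 1].max()
        b = np.searchsorted(edges, freq, side="right") - 1
        if 0 <= b < num_bins:
            hist[b] += np.log1p(peak)  # log-compressed peak amplitude
    return hist
```

For example, a 1000 Hz sinusoid sampled at 16 kHz produces crossings exactly 16 samples apart, so all of its energy falls into the single bin containing 1000 Hz. Changing `num_bins` only changes how finely this frequency axis is partitioned.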