Abstract
Speaker verification is a promising user authentication model for the Internet of Things (IoT). However, verifying an individual's identity from a speech utterance is challenging, especially in the presence of replay spoofing attacks, in which an attacker records the user's speech and plays it back to gain unauthorized access. In addition, the voice signature is often corrupted by ambient noise, which by itself complicates the design of feature extractors that can distinguish an authentic signal from a replayed one. Results reported to date remain poor: even state-of-the-art systems produce false positives (FPs) on no less than 11.3% of all samples. This gap has so far prevented speaker verification from becoming a satisfactory IoT authentication alternative. To close it, we develop an approach that achieves better precision than state-of-the-art systems. The approach uses Mel-frequency cepstral coefficients (MFCC) and constant-Q cepstral coefficients (CQCC) as input features, a CNN-based front-end feature extractor, and an SVM back-end classifier. Improved training techniques such as batch normalization raise precision significantly. We also implement a formulation that incorporates knowledge for distinguishing genuine speech from a replay attack. Experiments on the ASVspoof 2017 dataset show an improvement over the state of the art, with an Equal Error Rate (EER) of 7.1%.
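EER, the metric reported above, is the operating point at which the false acceptance rate on replayed trials equals the false rejection rate on genuine trials. The minimal Python sketch below shows one way to approximate it from back-end classifier scores by sweeping a decision threshold. The function name `equal_error_rate` and the convention that higher scores indicate genuine speech are assumptions for illustration; this is not the authors' evaluation code, and official ASVspoof scoring tools may interpolate between thresholds differently.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate the EER by sweeping thresholds over the observed scores.

    Hypothetical helper for illustration. Assumes a higher score means
    'more likely genuine', so a trial is accepted when score >= threshold.
    Returns the EER as a fraction in [0, 1].
    """
    if not genuine_scores or not spoof_scores:
        raise ValueError("need at least one genuine and one spoof score")
    thresholds = sorted(set(genuine_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False acceptance: replayed (spoof) trials accepted at this threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # False rejection: genuine trials rejected at this threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

With well-separated scores the sketch returns 0.0; as genuine and replayed score distributions overlap, the returned value rises toward 0.5. Multiplying by 100 gives the percentage form in which EER is usually reported.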