Voice Feature Learning using Convolutional Neural Networks Designed to Avoid Replay Attacks

Salahaldeen Duraibi; Wasim Alhamdani; Frederick T. Sheldon; IEEE

Conference proceeding

2020 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI), pp.1845-1851

01/01/2020

Subjects: Computer Science; Computer Science, Artificial Intelligence; Computer Science, Interdisciplinary Applications; Engineering; Engineering, Electrical & Electronic; Science & Technology; Technology

Abstract

Speaker verification is a promising user authentication model for the Internet of Things (IoT). However, verifying an individual's identity from speech utterances is challenging, especially in the presence of replay spoofing attacks, in which an attacker records the user's speech and plays it back to gain unauthorized access. In speaker verification, the voice signature is often polluted by ambient noise, which alone complicates the design of feature extractors that can distinguish an authentic signal from a replayed one. The literature to date reports poor performance: state-of-the-art systems show verification errors of no less than 11.3% false positives (FPs) across all samples. This gap has so far prevented speaker verification from being a satisfactory IoT authentication alternative. To address it, we developed a new approach that uses MFCC and CQCC as input features, a CNN-based front-end feature extractor, and an SVM back-end classifier. Using better training techniques such as batch normalization, we improved precision significantly. In addition, a formulation that includes knowledge of distinguishing genuine voice from a replay attack is implemented. Experiments were conducted on the ASVspoof 2017 datasets. The approach improves on state-of-the-art performance, achieving an Equal Error Rate (EER) of 7.1.
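The abstract reports its headline result as an Equal Error Rate (EER), the operating point at which the false acceptance rate (replayed audio accepted) equals the false rejection rate (genuine audio rejected). The following is a minimal sketch of how an EER can be estimated from classifier scores; it is not the authors' evaluation code. The function name `equal_error_rate`, the convention that a higher score means "more likely genuine", and the toy scores are all assumptions made for illustration.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Estimate the EER and the threshold at which it occurs.

    Assumption: higher score = more likely genuine. A trial is
    accepted when its score is >= the threshold.
    """
    # Every observed score is a candidate decision threshold.
    thresholds = sorted(set(genuine_scores) | set(spoof_scores))
    best = None
    for t in thresholds:
        # False acceptance rate: replayed trials that pass the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # False rejection rate: genuine trials that fall below it.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Hypothetical scores, not data from the paper.
genuine = [0.9, 0.8, 0.7, 0.6, 0.4]
spoof = [0.5, 0.3, 0.2, 0.1, 0.65]
eer, threshold = equal_error_rate(genuine, spoof)
print(f"EER = {eer:.1%} at threshold {threshold}")
```

With a finite set of trials the two error rates rarely cross exactly, so this sketch reports the average of the two rates at the closest threshold; evaluation toolkits often interpolate between neighboring thresholds instead.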

Details

Title: Voice Feature Learning using Convolutional Neural Networks Designed to Avoid Replay Attacks
Creators - without role: Salahaldeen Duraibi - University of Idaho
Wasim Alhamdani - University of the Cumberlands
Frederick T. Sheldon - University of Idaho
IEEE
Publication Details: 2020 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI), pp.1845-1851
Publisher: IEEE
Number of pages: 7
Identifiers: 9917542308331
Academic Unit: Jazan University
Language: English
Resource Type: Conference proceeding