Abstract
Automatic speaker verification (ASV) systems are vulnerable to a variety of voice spoofing attacks, such as replay and speech synthesis. Imposters use these attacks to fool ASV systems and achieve objectives such as bypassing the security of someone's home or stealing money from a bank account. To counter such fraudulent activities, we propose a robust voice spoofing detection system capable of detecting multiple types of spoofing attacks. For this purpose, we propose a novel feature descriptor, Center Lop-Sided Local Binary Patterns (CLS-LBP), for audio representation. CLS-LBP analyzes the audio bidirectionally to better capture the artifacts of synthetic speech, the microphone distortions of replay, and the dynamic speech attributes of the bonafide signal. The proposed CLS-LBP features are used to train a long short-term memory (LSTM) network for detection of both physical-access (replay) and logical-access (speech synthesis, voice conversion) attacks. We employ the LSTM because of its effectiveness in processing and learning the internal representation of sequential data. We obtained an equal error rate (EER) of 0.06% on logical-access (LA) attacks and 0.58% on physical-access (PA) attacks. Additionally, the proposed system is capable of detecting unseen voice spoofing attacks and is robust enough to classify among the cloning algorithms used to synthesize the speech. Performance evaluation on the ASVspoof 2019 corpus signifies the effectiveness of the proposed system in detecting physical- and logical-access attacks compared with existing state-of-the-art voice spoofing detection systems.
© 2022 The Authors. Published by Elsevier B.V. on behalf of King Saud University. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
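For readers unfamiliar with the EER metric reported in the abstract, the following is a minimal sketch of how an equal error rate can be approximated from detector scores. It is not the authors' evaluation code. The function name `equal_error_rate`, the convention that a higher score means "more likely bonafide", and the toy scores are all assumptions made for illustration.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate the EER by sweeping thresholds over the observed scores.

    Assumption: a higher score means the detector considers the
    utterance more likely to be bonafide.
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    # Add one threshold above every score so "reject everything" is covered.
    thresholds.append(thresholds[-1] + 1.0)

    best_gap, best_eer = None, None
    for t in thresholds:
        # False acceptance rate: spoofed utterances accepted as bonafide.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # False rejection rate: bonafide utterances rejected as spoofed.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            # The EER is taken where the two error rates are closest.
            best_gap, best_eer = gap, (far + frr) / 2.0
    return best_eer


if __name__ == "__main__":
    # Toy scores, invented purely for illustration.
    bonafide = [0.9, 0.8, 0.4, 0.7]
    spoof = [0.1, 0.2, 0.6, 0.3]
    print(f"EER = {100 * equal_error_rate(bonafide, spoof):.2f}%")
```

The threshold sweep is quadratic in the number of scores, which is acceptable for a sketch; an evaluation on a full corpus would typically sort the scores once and interpolate between the false acceptance and false rejection curves.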