Forensic speaker recognition: A new method based on extracting accent and language information from short utterances

Sajid Saleem; Fazli Subhan; Noman Naseer; Abdul Bais; Ammara Imtiaz

doi:10.1016/j.fsidi.2020.300982

Journal article

Peer reviewed

Forensic Science International: Digital Investigation, Vol. 34, p. 300982

01/09/2020

DOI: https://doi.org/10.1016/j.fsidi.2020.300982

Subjects: Computer Science; Computer Science, Information Systems; Computer Science, Interdisciplinary Applications; Science & Technology; Technology

Abstract

This paper presents a new method for Forensic Speaker Recognition (FSR) based on extracting accent and language information from short utterances. Accent Classification (AC) and Language Identification (LI) play an important role in identifying people from different groups, communities and origins, because speaking styles and native languages differ. In a multilingual society, forensic experts use AC and LI to reduce the search space for suspect recognition to regional and ethnic groups. In this paper, we use different baseline and deep learning methods to automate this process. The baseline methods are Gaussian Mixture Model-Universal Background Model (GMM-UBM), i-vector and Gaussian Mixture Model-Support Vector Machine (GMM-SVM), all of which use Mel-Frequency Cepstral Coefficients (MFCC) as speech features. The deep learning methods are based on Convolutional Neural Networks (CNN) and Deep Neural Networks (DNN). The CNN-based methods are the recently proposed VGGVox and GMM-CNN, both of which use speech spectrograms. The DNN-based method is x-vectors, which relies on DNN embeddings. The experimental results show that GMM-SVM gives better FSR performance than the GMM-UBM and i-vector methods, while the x-vectors method performs better than GMM-CNN, VGGVox and GMM-SVM. The x-vectors method achieves 80.4% FSR accuracy on its own, 85.4% with AC, 90.2% with LI, and 95.1% when AC and LI are combined. This shows that the proposed method based on AC and LI gives promising results. © 2020 Elsevier Ltd. All rights reserved.
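
The search-space reduction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that each enrolled speaker already has a fixed-length embedding (for example an x-vector) and known accent and language labels, and that separate AC and LI classifiers have already produced predictions for the query utterance. The names `Enrolled`, `cosine`, `identify`, the labels and the two-dimensional embeddings are all hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Enrolled:
    """Hypothetical record for one enrolled speaker."""
    speaker_id: str
    accent: str
    language: str
    embedding: Sequence[float]  # stand-in for an x-vector


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify(query: Sequence[float],
             enrolled: Sequence[Enrolled],
             accent: Optional[str] = None,
             language: Optional[str] = None) -> Optional[str]:
    """Pick the closest enrolled speaker after optional AC/LI filtering.

    `accent` and `language` are the (assumed) outputs of separate AC and LI
    classifiers; passing None skips that filter.
    """
    candidates = [
        e for e in enrolled
        if (accent is None or e.accent == accent)
        and (language is None or e.language == language)
    ]
    if not candidates:
        return None  # the filters removed every enrolled speaker
    best = max(candidates, key=lambda e: cosine(query, e.embedding))
    return best.speaker_id


# Toy data: three enrolled speakers with made-up labels and embeddings.
ENROLLED = [
    Enrolled("spk_A", "accent_1", "lang_1", [1.0, 0.0]),
    Enrolled("spk_B", "accent_2", "lang_1", [0.9, 0.1]),
    Enrolled("spk_C", "accent_2", "lang_2", [0.0, 1.0]),
]
QUERY = [0.8, 0.3]

print(identify(QUERY, ENROLLED))                     # no filtering
print(identify(QUERY, ENROLLED, accent="accent_1"))  # AC filter only
print(identify(QUERY, ENROLLED, accent="accent_2", language="lang_2"))
```

In this toy setup the filters change which speaker is returned, which mirrors the idea that AC and LI restrict the comparison to a smaller group. How the paper actually combines the AC and LI outputs with the speaker scores is not specified in the abstract, so the hard filtering used here is only one possible choice.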

Details

Title: Forensic speaker recognition: A new method based on extracting accent and language information from short utterances
Creators - without role: Sajid Saleem - National University of Modern Languages
Fazli Subhan - National University of Modern Languages
Noman Naseer - Air University
Abdul Bais - University of Regina
Ammara Imtiaz - National University of Modern Languages
Publication Details: Forensic Science International: Digital Investigation, Vol. 34, p. 300982
Publisher: Elsevier
Number of pages: 8
Identifiers: 9933690608331
Academic Unit: University of Jeddah
Language: English
Resource Type: Journal article