Academic literature on the topic 'Cepstral mean variance normalization (CMVN)'


Journal articles on the topic "Cepstral mean variance normalization (CMVN)"

1

Al-Kaltakchi, Musab T. S., Haithem Abd Al-Raheem Taha, Mohanad Abd Shehab, and Mohamed A. M. Abdullah. "Comparison of feature extraction and normalization methods for speaker recognition using grid-audiovisual database." Indonesian Journal of Electrical Engineering and Computer Science (IJEECS) 18, no. 2 (2020): 782–89. https://doi.org/10.11591/ijeecs.v18.i2.pp782-789.

Abstract:
In this paper, different feature extraction and feature normalization methods are investigated for speaker recognition. To obtain a good representation of acoustic speech signals, Power Normalized Cepstral Coefficients (PNCCs) and Mel Frequency Cepstral Coefficients (MFCCs) are employed for feature extraction. Then, to mitigate linear channel effects, Cepstral Mean-Variance Normalization (CMVN) and feature warping are utilized. The paper investigates a text-independent speaker identification system using 16 coefficients from both the MFCC and PNCC features. Eight speakers, two female and six male, are selected from the GRID-Audiovisual database. The speakers are modeled by coupling a Universal Background Model with Gaussian Mixture Models (GMM-UBM) to obtain fast scoring and better performance. The system achieves 100% speaker identification accuracy. The results show that PNCC features outperform MFCC features, particularly for identifying female speakers, and that feature warping performs better than the CMVN method.
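The CMVN step this abstract evaluates normalizes each cepstral coefficient track to zero mean and unit variance over the utterance, which cancels stationary convolutional (linear-channel) distortion in the log-cepstral domain. A minimal NumPy sketch (the 16-coefficient feature size follows the abstract; everything else is illustrative):

```python
import numpy as np

def cmvn(features, eps=1e-8):
    """Cepstral mean and variance normalization.

    features: (num_frames, num_coeffs) array of cepstral features
    (e.g. MFCCs or PNCCs). Each coefficient track is shifted to zero
    mean and scaled to unit variance over the utterance.
    """
    mean = features.mean(axis=0)   # per-coefficient mean
    std = features.std(axis=0)     # per-coefficient standard deviation
    return (features - mean) / (std + eps)

# Toy utterance: 100 frames of 16 cepstral coefficients with an
# arbitrary channel offset and scale.
rng = np.random.default_rng(0)
feats = 3.0 + 2.0 * rng.standard_normal((100, 16))
normalized = cmvn(feats)
```

Because a time-invariant channel filter adds a constant vector to log-cepstral features, subtracting the utterance mean removes it; dividing by the standard deviation additionally equalizes the dynamic range across coefficients.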
2

Al-Kaltakchi, Musab T. S., Haithem Abd Al-Raheem Taha, Mohanad Abd Shehab, and Mohamed A. M. Abdullah. "Comparison of feature extraction and normalization methods for speaker recognition using grid-audiovisual database." Indonesian Journal of Electrical Engineering and Computer Science 18, no. 2 (2020): 782. http://dx.doi.org/10.11591/ijeecs.v18.i2.pp782-789.

Abstract:
In this paper, different feature extraction and feature normalization methods are investigated for speaker recognition. To obtain a good representation of acoustic speech signals, Power Normalized Cepstral Coefficients (PNCCs) and Mel Frequency Cepstral Coefficients (MFCCs) are employed for feature extraction. Then, to mitigate linear channel effects, Cepstral Mean-Variance Normalization (CMVN) and feature warping are utilized. The paper investigates a text-independent speaker identification system using 16 coefficients from both the MFCC and PNCC features. Eight speakers, two female and six male, are selected from the GRID-Audiovisual database. The speakers are modeled by coupling a Universal Background Model with Gaussian Mixture Models (GMM-UBM) to obtain fast scoring and better performance. The system achieves 100% speaker identification accuracy. The results show that PNCC features outperform MFCC features, particularly for identifying female speakers, and that feature warping performs better than the CMVN method.
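Feature warping, which this abstract reports outperforming CMVN, replaces CMVN's linear shift-and-scale with a rank-based mapping of each coefficient track onto a standard normal distribution. A hedged sketch (applied here over the whole utterance for brevity; the standard formulation uses a sliding window, and all names and defaults are illustrative):

```python
import numpy as np
from statistics import NormalDist

def feature_warp(features):
    """Rank-map each coefficient track onto a standard normal.

    For each coefficient, a frame with rank r among T frames is mapped
    to the normal quantile at probability (r + 0.5) / T, so the warped
    track follows N(0, 1) regardless of the original distribution's
    shape, which makes the mapping robust to additive-noise outliers.
    """
    T, D = features.shape
    inv_cdf = NormalDist().inv_cdf
    quantiles = np.array([inv_cdf((r + 0.5) / T) for r in range(T)])
    warped = np.empty((T, D))
    for d in range(D):
        ranks = features[:, d].argsort().argsort()  # rank of each frame, 0..T-1
        warped[:, d] = quantiles[ranks]
    return warped

# Heavily skewed toy features: warping still yields ~N(0, 1) tracks.
rng = np.random.default_rng(0)
feats = rng.exponential(scale=4.0, size=(200, 16))
warped = feature_warp(feats)
```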
3

Deng, Lei, and Yong Gao. "Gammachirp Filter Banks Applied in Roust Speaker Recognition Based GMM-UBM Classifier." International Arab Journal of Information Technology 17, no. 2 (2019): 170–77. http://dx.doi.org/10.34028/iajit/17/2/4.

Abstract:
In this paper, the authors propose an auditory feature extraction algorithm to improve the performance of speaker recognition systems in noisy environments. In this algorithm, a Gammachirp filter bank is adopted to simulate the auditory model of the human cochlea. In addition, the following three techniques are applied: the cube-root compression method, the Relative Spectral Filtering technique (RASTA), and the Cepstral Mean and Variance Normalization algorithm (CMVN). Subsequently, based on the Gaussian Mixture Model-Universal Background Model (GMM-UBM) framework, simulation experiments were conducted. The experimental results imply that speaker recognition systems with the new auditory feature have better robustness and recognition performance compared to Mel-Frequency Cepstral Coefficients (MFCC), Relative Spectral-Perceptual Linear Predictive (RASTA-PLP), Cochlear Filter Cepstral Coefficients (CFCC), and Gammatone Frequency Cepstral Coefficients (GFCC) features.
4

Misbullah, Alim, Muhammad Saifullah Sani, Husaini, Laina Farsiah, Zahnur, and Kikye Martiwi Sukiakhy. "Sistem Identifikasi Pembicara Berbahasa Indonesia Menggunakan X-Vector Embedding." Jurnal Teknologi Informasi dan Ilmu Komputer 11, no. 2 (2024): 369–76. http://dx.doi.org/10.25126/jtiik.20241127866.

Abstract:
Speaker embeddings are vectors that have proven effective in representing speaker characteristics, yielding high accuracy in speaker recognition. This research focuses on applying x-vectors as speaker embeddings in an Indonesian-language speaker identification system. The model is built using the VoxCeleb dataset as training data and the INF19 dataset, collected from the voices of 2019-intake students of the Informatics Department, Universitas Syiah Kuala, as testing data. To build the model, features are extracted using Mel-Frequency Cepstral Coefficients (MFCC), Voice Activity Detection (VAD) is applied, features are augmented and normalized using Cepstral Mean and Variance Normalization (CMVN), and filtering is applied. The testing process, in contrast, only requires MFCC extraction and VAD. Four models are constructed by combining two MFCC configurations with two Deep Neural Network (DNN) architectures that utilize a Time Delay Neural Network (TDNN). The best model is selected based on the highest accuracy, measured by the Equal Error Rate (EER), and the shortest x-vector extraction time among the four models. The EER values of the best model are 3.51% on the VoxCeleb1 test set, 1.3% on inf19_test_td, and 1.4% on inf19_test_tid. X-vector extraction with the best model takes 6 hours 42 minutes 39 seconds for the training data, 2 minutes 24 seconds for the VoxCeleb1 test set, 18 seconds for inf19_enroll, 25 seconds for inf19_test_td, and 9 seconds for inf19_test_tid. The second DNN architecture and second MFCC configuration yield a smaller model, better accuracy, especially on the Indonesian-language speaker dataset, and a shorter x-vector extraction time.
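The training pipeline described above (MFCC extraction, then VAD, then CMVN) relies on voice activity detection to drop silent frames before normalization. A toy energy-based VAD sketch (common x-vector recipes use a similar energy heuristic; the frame sizes and the 30 dB margin here are illustrative assumptions, not the paper's settings):

```python
import numpy as np

def energy_vad(signal, frame_len=400, hop=160, margin_db=30.0):
    """Mark a frame as speech if its energy is within `margin_db`
    of the loudest frame in the utterance.

    signal: 1-D waveform; frame_len/hop of 400/160 samples correspond
    to 25 ms / 10 ms at 16 kHz. Returns a boolean mask, one entry per frame.
    """
    num_frames = 1 + (len(signal) - frame_len) // hop
    energy = np.array([
        np.sum(signal[t * hop : t * hop + frame_len] ** 2)
        for t in range(num_frames)
    ])
    energy_db = 10.0 * np.log10(energy + 1e-12)  # avoid log(0) on silence
    return energy_db > energy_db.max() - margin_db

# One second of silence followed by one second of a 440 Hz tone at 16 kHz.
t = np.arange(16000) / 16000.0
sig = np.concatenate([np.zeros(16000), 0.5 * np.sin(2 * np.pi * 440 * t)])
mask = energy_vad(sig)
```

Frames passing the mask would then be fed to CMVN, so the normalization statistics are estimated from speech rather than silence.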
5

Huang, Yi Bo, Qiu Yu Zhang, Zhan Ting Yuan, and Peng Fei Xing. "Speech Perception Hash Authentication Algorithm Based on Immittance Spectral Pairs." Applied Mechanics and Materials 610 (August 2014): 385–92. http://dx.doi.org/10.4028/www.scientific.net/amm.610.385.

Abstract:
Since traditional speech authentication algorithms are not well suited to present-day speech communication, we propose a perceptual hashing authentication algorithm based on Immittance Spectral Pairs (ISP) that satisfies the efficiency and robustness requirements of speech authentication. First, the speech signal is pre-processed by framing and windowing, the immittance spectral pairs parameters are obtained for each speech frame, and an ISP parameter matrix is constructed. Then, cepstral mean and variance normalization is applied to the ISP parameter matrix, which effectively improves robustness to Gaussian white noise, and the parameter matrix is decomposed by non-negative matrix factorization. Finally, the resulting weight matrix is quantified to obtain the perceptual hash sequences. Experiments show that the proposed algorithm is robust to content-preserving operations without sacrificing efficiency, and it satisfies the real-time requirements of speech communication.
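The pipeline in this abstract (per-frame parameter matrix → cepstral mean and variance normalization → non-negative matrix factorization → quantization of the weight matrix) can be sketched as follows. The ISP extraction itself is omitted (a random matrix stands in for it), and both the shift to non-negative values before NMF and the median-threshold quantizer are assumptions not specified in the abstract:

```python
import numpy as np

def nmf(V, k=8, iters=200, eps=1e-9):
    """Plain multiplicative-update NMF: V ~= W @ H with all factors >= 0."""
    rng = np.random.default_rng(1)  # fixed seed keeps the hash deterministic
    W = rng.random((V.shape[0], k))
    H = rng.random((k, V.shape[1]))
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def perceptual_hash(param_matrix, k=8):
    """Normalize, factorize, and binarize a (frames x parameters) matrix."""
    # CMVN over the parameter matrix.
    x = (param_matrix - param_matrix.mean(axis=0)) / (param_matrix.std(axis=0) + 1e-8)
    x -= x.min()  # NMF requires a non-negative input (assumed shift)
    W, _ = nmf(x, k=k)
    # Quantize the weight matrix against its median to get hash bits.
    return (W > np.median(W)).astype(np.uint8).ravel()

def bit_error_rate(h1, h2):
    """Fraction of differing hash bits; used to decide authenticity."""
    return float(np.mean(h1 != h2))

# Stand-in for an ISP parameter matrix: 120 frames x 16 parameters.
rng = np.random.default_rng(0)
params = rng.normal(size=(120, 16))
h = perceptual_hash(params)
```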
6

Amara Korba, Mohamed Cherif, Houcine Bourouba, and Rafik Djemili. "FEATURE EXTRACTION ALGORITHM USING NEW CEPSTRAL TECHNIQUES FOR ROBUST SPEECH RECOGNITION." Malaysian Journal of Computer Science 33, no. 2 (2020): 90–101. http://dx.doi.org/10.22452/mjcs.vol33no2.1.

Abstract:
In this work, we propose a novel feature extraction algorithm that improves the robustness of automatic speech recognition (ASR) systems in the presence of various types of noise. The proposed algorithm uses a new cepstral technique based on the differential power spectrum (DPS) instead of the power spectrum (PS), and replaces the logarithmic nonlinearity with a power function. To reduce the mismatch of cepstral coefficients between training and testing conditions, we apply mean and variance normalization followed by auto-regressive moving-average filtering (MVA) in the cepstral domain. The ASR experiments were conducted on two databases: LASA, a digit database designed for recognizing isolated Arabic digits in the presence of different types of noise, and Aurora 2, a noisy speech database designed for recognizing connected English digits in various operating environments. The experimental results show a substantial improvement of the proposed algorithm over the baseline Mel Frequency Cepstral Coefficients (MFCC): the relative improvement is 28.92% on the LASA database and 44.43% on the Aurora 2 database. The performance of the proposed algorithm was verified by extensive comparisons with state-of-the-art noise-robust features on Aurora 2.
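The MVA post-processing used here chains utterance-level mean and variance normalization with ARMA low-pass filtering along the time axis, smoothing out fast, noise-driven fluctuations of the cepstral trajectories. A minimal sketch of the standard MVA recursion (the order-2 default is illustrative):

```python
import numpy as np

def mva(features, order=2, eps=1e-8):
    """Mean/variance normalization followed by ARMA filtering (MVA).

    features: (num_frames, num_coeffs) cepstral features.
    ARMA step: y[t] = (y[t-order] + ... + y[t-1]
                       + x[t] + ... + x[t+order]) / (2 * order + 1),
    applied frame by frame; the first and last `order` frames are
    left as plain normalized features.
    """
    # Step 1: utterance-level mean and variance normalization.
    x = (features - features.mean(axis=0)) / (features.std(axis=0) + eps)

    # Step 2: ARMA smoothing along time.
    y = x.copy()
    for t in range(order, x.shape[0] - order):
        y[t] = (y[t - order:t].sum(axis=0)
                + x[t:t + order + 1].sum(axis=0)) / (2 * order + 1)
    return y

rng = np.random.default_rng(0)
feats = rng.standard_normal((200, 13))
smoothed = mva(feats)
```

Because the autoregressive part reuses already-smoothed frames, the filter attenuates high temporal modulation frequencies more strongly than a plain moving average of the same width.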
7

Kamiński, Kamil Adam, Andrzej Piotr Dobrowolski, Zbigniew Piotrowski, and Przemysław Ścibiorek. "Enhancing Web Application Security: Advanced Biometric Voice Verification for Two-Factor Authentication." Electronics 12, no. 18 (2023): 3791. http://dx.doi.org/10.3390/electronics12183791.

Abstract:
This paper presents a voice biometrics system implemented in a web application as part of two-factor authentication (2FA) user login. The web-based application, via a client interface, runs registration, preprocessing, feature extraction and normalization, classification, and speaker verification procedures based on a modified Gaussian mixture model (GMM) algorithm adapted to the application requirements. The article describes the internal modules of this ASR (Automatic Speaker Recognition) system in detail and compares the performance of competing ASR systems on the commercial NIST 2002 SRE voice dataset under identical conditions. It also presents results on the effect of cepstral mean and variance normalization over a sliding window (WCMVN), which is particularly relevant for recordings made over varying acoustic channels, and on the selection of a reference model representing the alternative hypothesis in the decision-making system, which significantly increases the effectiveness of speaker verification. The final experiment tests the performance achieved in a varying acoustic environment during remote voice login to a web portal by a test group, together with a final adjustment of the decision threshold.
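Windowed CMVN (WCMVN), whose influence the paper evaluates, computes the normalization statistics over a sliding window centred on each frame rather than over the whole recording, so it can track a slowly changing acoustic channel. A minimal sketch (the 301-frame window, roughly 3 s at a 10 ms hop, is a common but here purely illustrative choice):

```python
import numpy as np

def wcmvn(features, win=301, eps=1e-8):
    """Sliding-window cepstral mean and variance normalization.

    Each frame is normalized by the mean and standard deviation of the
    `win` frames centred on it; the window is clipped at the utterance
    edges, so border frames use a shorter one-sided window.
    """
    num_frames = features.shape[0]
    half = win // 2
    out = np.empty_like(features, dtype=float)
    for t in range(num_frames):
        lo, hi = max(0, t - half), min(num_frames, t + half + 1)
        seg = features[lo:hi]
        out[t] = (features[t] - seg.mean(axis=0)) / (seg.std(axis=0) + eps)
    return out

# Features with a slow channel drift that full-utterance CMVN cannot track.
rng = np.random.default_rng(0)
drift = np.linspace(0.0, 2.0, 1000)[:, None]
feats = rng.standard_normal((1000, 13)) + drift
normalized = wcmvn(feats)
```

Unlike utterance-level CMVN, the local statistics follow the drift, so the normalized features stay close to zero mean and unit variance throughout the recording.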
8

Farahani, Gholamreza. "Autocorrelation-based noise subtraction method with smoothing, overestimation, energy, and cepstral mean and variance normalization for noisy speech recognition." EURASIP Journal on Audio, Speech, and Music Processing 2017, no. 1 (2017). http://dx.doi.org/10.1186/s13636-017-0110-8.


Conference papers on the topic "Cepstral mean variance normalization (CMVN)"

1

Prasad, N. Vishnu, and S. Umesh. "Improved cepstral mean and variance normalization using Bayesian framework." In 2013 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU). IEEE, 2013. http://dx.doi.org/10.1109/asru.2013.6707722.
