To see the other types of publications on this topic, follow the link: Mel-frequency Cepstrum Coefficients (MFCC).

Journal articles on the topic 'Mel-frequency Cepstrum Coefficients (MFCC)'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the top 50 journal articles for your research on the topic 'Mel-frequency Cepstrum Coefficients (MFCC).'

Next to every source in the list of references there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse journal articles on a wide variety of disciplines and organise your bibliography correctly.

1

Mahalakshmi, P., M. Muruganandam, and A. Sharmila. "Voice Recognition Security System Using Mel-Frequency Cepstrum Coefficients." Asian Journal of Pharmaceutical and Clinical Research 9, no. 9 (2016): 131. http://dx.doi.org/10.22159/ajpcr.2016.v9s3.13633.

Full text
Abstract:
Objective: Voice recognition is a fascinating field spanning several areas of computer science and mathematics. Reliable speaker recognition is a hard problem, requiring a combination of many techniques; however, modern methods have been able to achieve an impressive degree of accuracy. The objective of this work is to examine various speech and speaker recognition techniques and to apply them to build a simple voice recognition system. Method: The project is implemented in software using techniques such as Mel-frequency cepstrum coefficients (MFCC) and vector quantization (VQ), implemented in MATLAB. Results: MFCC is used to extract the characteristics of the input speech signal with respect to a particular word uttered by a particular speaker. A VQ codebook is generated by clustering the training feature vectors of each speaker and is then stored in the speaker database. Conclusion: Verification of the speaker is carried out using the Euclidean distance. For voice recognition, the MFCC approach is implemented on the MATLAB R2013b software platform. Keywords: Mel-frequency cepstrum coefficient, vector quantization, voice recognition, hidden Markov model, Euclidean distance.
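The verification scheme this abstract describes, a VQ codebook per speaker scored by Euclidean distance, can be sketched as below. The plain k-means clustering, the codebook size of 4, and the toy 2-D "MFCC" clouds are illustrative assumptions, not the paper's exact LBG setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_codebook(features, k=4, iters=20):
    """Cluster a speaker's training feature vectors into a k-entry VQ
    codebook (plain k-means here; LBG splitting behaves similarly)."""
    codebook = features[rng.choice(len(features), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((features[:, None] - codebook) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                codebook[j] = features[labels == j].mean(axis=0)
    return codebook

def distortion(features, codebook):
    """Average Euclidean distance from each test vector to its nearest
    codeword; the claimed identity with the smallest distortion wins."""
    d = np.sqrt(((features[:, None] - codebook) ** 2).sum(-1))
    return d.min(axis=1).mean()

# Two hypothetical speakers as toy 2-D "MFCC" clouds.
spk_a = rng.normal(0.0, 0.5, (200, 2))
spk_b = rng.normal(3.0, 0.5, (200, 2))
cb_a, cb_b = train_codebook(spk_a), train_codebook(spk_b)
test_a = rng.normal(0.0, 0.5, (50, 2))  # unseen utterance from speaker A
```

A real system would compute the same distortion over MFCC vectors of a test utterance and accept the speaker whose codebook yields the minimum (or a value below a threshold).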
2

Varma, V. Sai Nitin, and Abdul Majeed K.K. "Advancements in Speaker Recognition: Exploring Mel Frequency Cepstral Coefficients (MFCC) for Enhanced Performance in Speaker Recognition." International Journal for Research in Applied Science and Engineering Technology 11, no. 8 (2023): 88–98. http://dx.doi.org/10.22214/ijraset.2023.55124.

Full text
Abstract:
Speaker recognition, a fundamental capability of software or hardware systems, involves receiving speech signals, identifying the speaker present in the speech signal, and subsequently recognizing the speaker for future interactions. This process emulates the cognitive task performed by the human brain. At its core, speaker recognition begins with speech as the input to the system. Various techniques have been developed for speech recognition, including Mel frequency cepstral coefficients (MFCC), Linear Prediction Coefficients (LPC), Linear Prediction Cepstral Coefficients (LPCC), Line Spectral Frequencies (LSF), Discrete Wavelet Transform (DWT), and Perceptual Linear Prediction (PLP). Although LPC and several other techniques have been explored, they are often deemed impractical for real-time applications. In contrast, MFCC stands out as one of the most prominent and widely used techniques for speaker recognition. The use of the cepstrum allows the resemblance between two cepstral feature vectors to be computed, making it an effective tool in this domain. Compared with LPC-derived cepstrum features, MFCC features have demonstrated superior performance in metrics such as False Acceptance Rate (FAR) and False Rejection Rate (FRR) for speaker recognition systems. MFCCs leverage the human ear's critical bandwidth variation with frequency: to capture phonetically important characteristics of speech signals, filters are spaced linearly at low frequencies and logarithmically at high frequencies. This design choice is central to the effectiveness of the MFCC technique. The primary objective of the proposed work is to devise efficient techniques that extract pertinent information related to the speaker, thereby enhancing the overall performance of the speaker recognition system. By optimizing feature extraction methods, this research aims to contribute to the advancement of speaker recognition technology.
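The linear-at-low, logarithmic-at-high filter spacing the abstract describes follows from placing filter centres uniformly on the mel scale. A minimal sketch, using the common 2595·log10(1 + f/700) mel formula (the abstract does not specify which mel variant the authors used):

```python
import numpy as np

def hz_to_mel(f):
    """Convert frequency in Hz to the mel scale (O'Shaughnessy formula)."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse mapping: mel back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_edges(n_filters, f_min, f_max):
    """Edge/centre frequencies of a mel filterbank: equally spaced in mel,
    hence nearly linear at low Hz and logarithmic at high Hz."""
    mels = np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_filters + 2)
    return mel_to_hz(mels)

edges = mel_filter_edges(26, 0.0, 8000.0)  # 26 filters up to 8 kHz
```

Printing `np.diff(edges)` shows the Hz spacing between consecutive filter edges growing with frequency, which is exactly the behaviour the abstract attributes to the ear's critical bands.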
3

Kasim, Anita Ahmad, Muhammad Bakri, Irwan Mahmudi, Rahmawati Rahmawati, and Zulnabil Zulnabil. "Artificial Intelligent for Human Emotion Detection with the Mel-Frequency Cepstral Coefficient (MFCC)." JUITA : Jurnal Informatika 11, no. 1 (2023): 47. http://dx.doi.org/10.30595/juita.v11i1.15435.

Full text
Abstract:
Emotions are an important aspect of human communication, and human emotions can be identified through the voice. Voice detection, or speech recognition, is a technology that has developed rapidly to help improve human-machine interaction. This study aims to classify emotions through the detection of human voices. One of the most frequently used methods for voice detection is the Mel-frequency cepstrum coefficient (MFCC), in which sound waves are converted into several types of representation. Mel-frequency cepstral coefficients (MFCCs) are coefficients that collectively represent the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency. The primary data used in this research are recordings made by the authors; the secondary data are 500 voice recordings from the "Berlin Database of Emotional Speech". MFCC can extract implicit information from the human voice, in particular the feelings a person experiences while speaking. In this study, the highest accuracy, 85%, was obtained when training for 10,000 epochs.
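The definition quoted above (a linear cosine transform of a log power spectrum on a nonlinear mel scale) corresponds to the standard single-frame pipeline sketched below. The 512-sample frame, 26 filters, and 13 retained coefficients are conventional choices assumed for illustration, not parameters taken from the paper:

```python
import numpy as np

def triangular_filterbank(n_filters, n_fft, sr):
    """Triangular filters with edges equally spaced on the mel scale."""
    hz2mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel2hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = mel2hz(np.linspace(hz2mel(0.0), hz2mel(sr / 2), n_filters + 2))
    bins = np.floor((n_fft + 1) * edges / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fb[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising slope
        fb[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling slope
    return fb

def mfcc_frame(frame, sr, n_filters=26, n_ceps=13):
    """MFCCs for one frame: window -> power spectrum -> mel filterbank
    -> log -> DCT-II, keeping the first n_ceps coefficients."""
    frame = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(frame)) ** 2
    mel_energy = triangular_filterbank(n_filters, len(frame), sr) @ power
    log_e = np.log(mel_energy + 1e-10)  # floor avoids log(0)
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    return dct @ log_e

sr = 16000
t = np.arange(512) / sr
coeffs = mfcc_frame(np.sin(2 * np.pi * 440 * t), sr)  # one 440 Hz test frame
```

In a full front end this runs per overlapping frame after pre-emphasis; stacking the per-frame vectors gives the feature matrix fed to the classifier.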
4

Lankala, Srinija, and M. Ramana Reddy. "Design and Implementation of Energy-Efficient Floating Point MFCC Extraction Architecture for Speech Recognition Systems." International Journal for Research in Applied Science and Engineering Technology 10, no. 9 (2022): 1217–25. http://dx.doi.org/10.22214/ijraset.2022.46807.

Full text
Abstract:
This brief presents an energy-efficient architecture to extract mel-frequency cepstrum coefficients (MFCCs) for real-time speech recognition systems. Based on the algorithmic properties of MFCC feature extraction, the architecture is designed with floating-point arithmetic units to cover a wide dynamic range with a small bit-width. Moreover, the various operations required in MFCC extraction are examined to optimize the operational bit-width and the lookup tables needed to compute nonlinear functions, such as trigonometric and logarithmic functions. In addition, the dataflow of MFCC extraction is tailored to minimize computation time. As a result, energy consumption is considerably reduced compared with previous MFCC extraction systems.
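One way to read the abstract's "lookup tables ... to compute nonlinear functions" is the classic exponent-plus-mantissa-table trick for logarithms in floating-point hardware. The sketch below is a software illustration of that idea, not the paper's architecture; the 64-entry table size is an arbitrary assumption:

```python
import numpy as np

# Hypothetical 64-entry lookup table for log2 of the mantissa in [0.5, 1.0).
TABLE_BITS = 6
LUT = np.log2(0.5 + np.arange(2 ** TABLE_BITS) / 2 ** (TABLE_BITS + 1))

def log2_lut(x):
    """log2 via exponent split plus a small mantissa table, mimicking the
    kind of LUT a hardware MFCC pipeline could use for the log stage."""
    m, e = np.frexp(x)  # x = m * 2**e with m in [0.5, 1)
    idx = ((m - 0.5) * 2 ** (TABLE_BITS + 1)).astype(int)
    return e + LUT[np.clip(idx, 0, 2 ** TABLE_BITS - 1)]
```

The exponent comes for free from the float representation, so only the mantissa's log needs a table; a larger table (or interpolation between entries) trades area for accuracy, which is exactly the bit-width/LUT optimization space the brief explores.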
5

Rasyid, Muhammad Fahim, Herlina Jayadianti, and Herry Sofyan. "APLIKASI PENGENALAN PENUTUR PADA IDENTIFIKASI SUARA PENELEPON MENGGUNAKAN MEL-FREQUENCY CEPSTRAL COEFFICIENT DAN VECTOR QUANTIZATION (Studi Kasus : Layanan Hotline Universitas Pembangunan Nasional “Veteran” Yogyakarta)." Telematika 17, no. 2 (2020): 68. http://dx.doi.org/10.31315/telematika.v1i1.3380.

Full text
Abstract:
The hotline service of Universitas Pembangunan Nasional "Veteran" Yogyakarta is a service that anyone can use. Lecturers and staff use it to share information with the units located in the rectorate building. A caller can be connected to the intended unit once the hotline operator has identified them. Identity details consisting of name, position, and department or unit of origin are requested during the identification process. No record of caller identification is kept, either on paper or in a database stored on a computer. As a result, there is no documentation that can serve as evidence for following up cases of misidentification. This research focuses on reducing the risk of caller misidentification using speaker recognition technology. Voice frequencies are extracted using the Mel-Frequency Cepstral Coefficient (MFCC) method, producing Mel-frequency cepstrum coefficient values. These values, computed from all training voice data of Universitas Pembangunan Nasional "Veteran" Yogyakarta employees, are then compared with the caller's voice signal using the Vector Quantization (VQ) method. The speaker recognition application was able to identify callers' voices with an accuracy of 80% at a threshold of 25.
6

Yang, Xing Hai, Wen Jie Fu, Yu Tai Wang, Jia Ding, and Chang Zhi Wei. "Heart Sound Clustering Based on Supervised Kohonen Network." Applied Mechanics and Materials 138-139 (November 2011): 1115–20. http://dx.doi.org/10.4028/www.scientific.net/amm.138-139.1115.

Full text
Abstract:
In this paper, a new method based on a supervised Kohonen network (SKN) and Mel-frequency cepstrum coefficients (MFCC) is introduced. The MFCCs of the heart sound signal are extracted first, and features are then obtained by computing the average energy of each MFCC order. Finally, the SKN is used to identify heart sounds. The experimental results show that this algorithm performs well in heart sound clustering and is of significant practical value.
7

de Souza, Edson Florentino, Túlio Nogueira Bittencourt, Diogo Ribeiro, and Hermes Carvalho. "Feasibility of Applying Mel-Frequency Cepstral Coefficients in a Drive-by Damage Detection Methodology for High-Speed Railway Bridges." Sustainability 14, no. 20 (2022): 13290. http://dx.doi.org/10.3390/su142013290.

Full text
Abstract:
In this paper, a drive-by damage detection methodology for high-speed railway (HSR) bridges is addressed, to appraise the application of Mel-frequency cepstral coefficients (MFCC) for extracting a damage index (DI). A 2D finite element (FEM) VTBI model that incorporates the train, ballasted track, and bridge behavior is presented. The formulation includes track irregularities and a damaged condition induced in a specified region of the structure. The feasibility of applying cepstrum analysis components to indirect damage detection in HSR via on-board sensors is evaluated by numerical simulations, in which dynamic analyses are performed using code implemented in MATLAB. Different damage scenarios are simulated, as well as external excitations such as measurement noise and different levels of track irregularities. The results show that MFCC-based DIs are highly sensitive to damage and robust to noise. Bridge stiffness can be recognized satisfactorily at high speeds and under different levels of track irregularities. Moreover, the magnitude of the DI extracted from MFCCs is related to the relative severity of the damage. The results presented in this study should be seen as a first attempt to link cepstrum-based features to an HSR drive-by damage detection approach.
8

Sasilo, Ababil Azies, Rizal Adi Saputra, and Ika Purwanti Ningrum. "Sistem Pengenalan Suara Dengan Metode Mel Frequency Cepstral Coefficients Dan Gaussian Mixture Model." Komputika : Jurnal Sistem Komputer 11, no. 2 (2022): 203–10. http://dx.doi.org/10.34010/komputika.v11i2.6655.

Full text
Abstract:
Biometric technology is becoming a technological trend in many areas of life. It uses parts of the human body, unique to each individual, as the measurement basis of a system. The voice is a part of the human body that is unique and well suited as a measurement basis in systems adopting biometric technology. A voice recognition system is one application of biometric technology that focuses on the human voice. Such a system requires a feature extraction method and a classification method; one feature extraction method is MFCC. MFCC proceeds through the stages of pre-emphasis, frame blocking, windowing, fast Fourier transform, mel-frequency wrapping, and cepstrum computation. Classification uses a GMM, computing the likelihood of similarity between voices. Based on the test results, the MFCC-GMM method achieved an accuracy of 82.22% under ideal conditions and 66.67% under non-ideal conditions.
Keywords – voice, recognition, MFCC, GMM, system
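The MFCC-GMM scoring step this abstract describes (compute the likelihood of test frames under each speaker's GMM and pick the best) can be sketched as follows. The diagonal-covariance models and their parameters here are hand-set toy values, not models trained with EM as a real system would use:

```python
import numpy as np

def gmm_loglik(frames, weights, means, variances):
    """Total log-likelihood of frames under a diagonal-covariance GMM,
    using log-sum-exp across components for numerical stability."""
    comp = (
        np.log(weights)
        - 0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)
        - 0.5 * np.sum((frames[:, None, :] - means) ** 2 / variances, axis=2)
    )
    m = comp.max(axis=1, keepdims=True)
    return float(np.sum(m.squeeze(1) + np.log(np.exp(comp - m).sum(axis=1))))

rng = np.random.default_rng(1)
# Toy two-component models over 2-D "MFCC" vectors (weights, means, variances).
model_a = (np.array([0.5, 0.5]), np.array([[0.0, 0.0], [1.0, 1.0]]), np.ones((2, 2)))
model_b = (np.array([0.5, 0.5]), np.array([[5.0, 5.0], [6.0, 6.0]]), np.ones((2, 2)))
test = rng.normal(0.5, 1.0, (100, 2))  # frames drawn near model A's components
```

The identification decision is simply `argmax` of `gmm_loglik` over the enrolled speaker models; a verification system would instead compare the score against a background model and a threshold.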
9

Chu, Yun Yun, Wei Hua Xiong, Wei Wei Shi, and Yu Liu. "The Extraction of Differential MFCC Based on EMD." Applied Mechanics and Materials 313-314 (March 2013): 1167–70. http://dx.doi.org/10.4028/www.scientific.net/amm.313-314.1167.

Full text
Abstract:
Feature extraction is the key to object recognition. How to obtain effective, reliable characteristic parameters from limited measured data is a question of great importance in feature extraction. This paper presents a method based on empirical mode decomposition (EMD) for extracting Mel-frequency cepstrum coefficients (MFCCs) and their first-order difference from original speech signals containing four kinds of emotion (anger, happiness, surprise, and natural) for emotion recognition. The experiments compare the recognition rates of MFCC, differential MFCC (both extracted based on EMD), and their combination, using a support vector machine (SVM) to recognize speakers' emotional speech. The results show that the combination of MFCC and its first-order difference achieves the highest recognition rate.
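The first-order difference (delta) features this paper combines with MFCC are conventionally computed with a regression over neighbouring frames. A sketch, where the window half-width N=2 is an assumed (typical) choice rather than the paper's stated setting:

```python
import numpy as np

def delta(coeffs, N=2):
    """First-order (delta) features via the standard regression formula
    d_t = sum_n n*(c_{t+n} - c_{t-n}) / (2*sum_n n^2), edge frames
    replicated for padding. Rows of coeffs are frames."""
    padded = np.pad(coeffs, ((N, N), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    T = len(coeffs)
    out = np.zeros_like(coeffs, dtype=float)
    for n in range(1, N + 1):
        out += n * (padded[N + n : N + n + T] - padded[N - n : N - n + T])
    return out / denom

c = np.tile(np.arange(10.0)[:, None], (1, 13))  # linearly rising toy MFCCs
d = delta(c)  # interior deltas of a unit ramp come out as 1.0
```

Applying `delta` a second time gives the acceleration (delta-delta) coefficients; concatenating `[c, delta(c)]` per frame yields the combined feature the paper reports as best.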
10

Zhang, Lanyue, Di Wu, Xue Han, and Zhongrui Zhu. "Feature Extraction of Underwater Target Signal Using Mel Frequency Cepstrum Coefficients Based on Acoustic Vector Sensor." Journal of Sensors 2016 (2016): 1–11. http://dx.doi.org/10.1155/2016/7864213.

Full text
Abstract:
A feature extraction method using Mel frequency cepstrum coefficients (MFCC) based on an acoustic vector sensor is investigated in this paper. Pressure signals and particle velocity signals of underwater targets are simulated, and MFCC features of the targets are extracted to verify the feasibility of the method. A feature extraction experiment on two kinds of underwater targets is carried out, and the targets are classified and recognized by a backpropagation (BP) neural network using multi-information fusion. The results show that MFCC, first-order differential MFCC, and second-order differential MFCC features can serve as effective features for recognizing these underwater targets; the recognition rate using the particle velocity signal is higher than that using the pressure signal, and it can be further improved by using fused features.
11

Gao, Mei Juan, and Zhi Xin Yang. "Research and Realization on the Voice Command Recognition System for Robot Control Based on ARM9." Applied Mechanics and Materials 44-47 (December 2010): 1422–26. http://dx.doi.org/10.4028/www.scientific.net/amm.44-47.1422.

Full text
Abstract:
In this paper, based on a study of two speech recognition algorithms, two designs of a speech recognition system are given to realize an isolated-word speech recognition mobile robot control system based on an ARM9 processor. The speech recognition process includes pretreatment of the speech signal, feature extraction, pattern matching, and post-processing. Mel-frequency cepstrum coefficients (MFCC) and linear prediction cepstrum coefficients (LPCC) are the two most common parameters. Through analysis and comparison of the parameters, MFCC shows more noise immunity than LPCC, so MFCC is selected as the characteristic parameter. Both dynamic time warping (DTW) and the hidden Markov model (HMM) are commonly used algorithms. Given the different characteristics of the DTW and HMM recognition algorithms, two different programs were designed for the mobile robot control system, and the performance and speed of the two speech recognition systems were analyzed and compared.
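Of the two matching algorithms compared, DTW is the simpler to sketch: a dynamic-programming recurrence with Euclidean local cost aligns a test utterance to a stored template despite speaking-rate differences. Below, short 1-D sequences stand in for MFCC frame sequences:

```python
import numpy as np

def dtw_distance(a, b):
    """Classic DTW between two feature sequences (rows = frames),
    Euclidean local cost, symmetric step pattern (diag/left/up)."""
    T, U = len(a), len(b)
    D = np.full((T + 1, U + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, T + 1):
        for j in range(1, U + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[T, U]

# A time-stretched copy of a template should score far closer than a
# flat "different word" stand-in.
template = np.sin(np.linspace(0, 3 * np.pi, 30))[:, None]
warped = np.sin(np.linspace(0, 3 * np.pi, 45))[:, None]  # same shape, stretched
other = np.full((30, 1), 2.0)
```

For command vocabularies of a few dozen words, this template matching is cheap enough for an embedded target, which is one reason DTW remains attractive on processors like the ARM9 discussed here.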
12

Mengistu, Abrham Debasu, and Dagnachew Melesew Alemayehu. "Text Independent Amharic Language Speaker Identification in Noisy Environments using Speech Processing Techniques." Indonesian Journal of Electrical Engineering and Computer Science 5, no. 1 (2017): 109. http://dx.doi.org/10.11591/ijeecs.v5.i1.pp109-114.

Full text
Abstract:
In Ethiopia, the largest ethnic and linguistic groups are the Oromos, Amharas, and Tigrayans. This paper presents a performance analysis of a text-independent speaker identification system for the Amharic language in noisy environments. VQ (vector quantization), GMM (Gaussian mixture models), BPNN (backpropagation neural network), MFCC (Mel-frequency cepstrum coefficients), GFCC (gammatone frequency cepstral coefficients), and a hybrid approach were used as techniques for identifying speakers of Amharic in noisy environments. For the identification process, speech signals were collected from different speakers of both sexes; in total, speech samples of 90 speakers were collected, each 10 seconds in duration. With these speakers, accuracies of 59.2%, 70.9%, and 84.7% were achieved when VQ, GMM, and BPNN, respectively, were used on the combined feature vector of MFCC and GFCC.
13

Eskidere, Ömer, and Ahmet Gürhanlı. "Voice Disorder Classification Based on Multitaper Mel Frequency Cepstral Coefficients Features." Computational and Mathematical Methods in Medicine 2015 (2015): 1–12. http://dx.doi.org/10.1155/2015/956249.

Full text
Abstract:
Mel Frequency Cepstral Coefficients (MFCCs) are widely used to extract essential information from a voice signal and have become a popular feature extractor in audio processing. However, MFCC features are usually calculated from a single window (taper), which is characterized by large variance. This study investigates reducing this variance for the classification of two voice qualities (normal voice and disordered voice) using multitaper MFCC features. We also compare the performance of newly proposed windowing techniques with the conventional single-taper technique. The results demonstrate that the adapted weighted Thomson multitaper method distinguishes between normal and disordered voices better than the conventional single-taper (Hamming window) technique and the two newly proposed windowing methods. Multitaper MFCC features may be helpful in identifying voices at risk for a real pathology that has to be proven later.
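The variance-reduction idea behind multitaper MFCCs can be demonstrated with sine tapers, a simple multitaper family. The paper's Thomson method uses DPSS tapers with adaptive weighting instead, so this is an illustration of the principle rather than their exact estimator:

```python
import numpy as np

def sine_tapers(N, K):
    """K sine tapers of length N, an orthogonal multitaper family:
    w_k[n] = sqrt(2/(N+1)) * sin(pi*k*n/(N+1))."""
    n = np.arange(1, N + 1)
    return np.array(
        [np.sqrt(2.0 / (N + 1)) * np.sin(np.pi * k * n / (N + 1)) for k in range(1, K + 1)]
    )

def multitaper_power(frame, K=6):
    """Average the K single-taper periodograms; variance of the estimate
    drops roughly as 1/K compared with one window."""
    tapers = sine_tapers(len(frame), K)
    specs = np.abs(np.fft.rfft(tapers * frame, axis=1)) ** 2
    return specs.mean(axis=0)

rng = np.random.default_rng(2)
noise = rng.normal(size=(200, 256))  # 200 white-noise frames
single = np.abs(np.fft.rfft(np.hamming(256) * noise, axis=1)) ** 2
multi = np.array([multitaper_power(f) for f in noise])
```

Comparing the relative spread (standard deviation over mean) of the two spectrum estimates across the noise frames shows the multitaper estimate fluctuating far less bin-to-bin, which is exactly the variance the MFCC log/DCT stages then inherit.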
14

Ramashini, Murugaiya, P. Emeroylariffion Abas, Kusuma Mohanchandra, and Liyanage C. De Silva. "Robust cepstral feature for bird sound classification." International Journal of Electrical and Computer Engineering (IJECE) 12, no. 2 (2022): 1477. http://dx.doi.org/10.11591/ijece.v12i2.pp1477-1487.

Full text
Abstract:
Birds are excellent environmental indicators and may indicate the sustainability of an ecosystem; birds may be used to provide provisioning, regulating, and supporting services. Birdlife conservation-related research therefore always takes centre stage. Due to the airborne nature of birds and the density of tropical forests, identifying birds by audio may be a better solution than visual identification. The goal of this study is to find the most appropriate cepstral features for classifying bird sounds more accurately. Fifteen endemic Bornean bird sounds were selected and segmented using an automated energy-based algorithm. Three types of cepstral features were extracted: linear prediction cepstrum coefficients (LPCC), mel frequency cepstral coefficients (MFCC), and gammatone frequency cepstral coefficients (GTCC), each used separately for classification with a support vector machine (SVM). A comparison of their prediction results demonstrates that the model using GTCC features, at 93.3% accuracy, outperforms the models using MFCC and LPCC features, demonstrating the robustness of GTCC for bird sound classification. The result is significant for the advancement of bird sound classification research, which has many applications, such as in eco-tourism and wildlife management.
15

Ramashini, Murugaiya, P. Emeroylariffion Abas, Kusuma Mohanchandra, and Liyanage C. De Silva. "Robust cepstral feature for bird sound classification." International Journal of Electrical and Computer Engineering (IJECE) 12, no. 2 (2022): 1477–87. https://doi.org/10.11591/ijece.v12i2.pp1477-1487.

Full text
Abstract:
Birds are excellent environmental indicators and may indicate the sustainability of an ecosystem; birds may be used to provide provisioning, regulating, and supporting services. Birdlife conservation-related research therefore always takes centre stage. Due to the airborne nature of birds and the density of tropical forests, identifying birds by audio may be a better solution than visual identification. The goal of this study is to find the most appropriate cepstral features for classifying bird sounds more accurately. Fifteen endemic Bornean bird sounds were selected and segmented using an automated energy-based algorithm. Three types of cepstral features were extracted: linear prediction cepstrum coefficients (LPCC), mel frequency cepstral coefficients (MFCC), and gammatone frequency cepstral coefficients (GTCC), each used separately for classification with a support vector machine (SVM). A comparison of their prediction results demonstrates that the model using GTCC features, at 93.3% accuracy, outperforms the models using MFCC and LPCC features, demonstrating the robustness of GTCC for bird sound classification. The result is significant for the advancement of bird sound classification research, which has many applications, such as in eco-tourism and wildlife management.
16

Gupta, Sakshi, Ravi S. Shukla, and Rajesh K. Shukla. "Weighted Mel frequency cepstral coefficient based feature extraction for automatic assessment of stuttered speech using Bi-directional LSTM." Indian Journal of Science and Technology 14, no. 5 (2021): 457–72. https://doi.org/10.17485/IJST/v14i5.2276.

Full text
Abstract:
Objective: To propose a system for the automatic assessment of stuttered speech to help speech-language pathologists during their treatment of a person who stutters. Methods: A novel technique is proposed for the automatic assessment of stuttered speech, comprising feature extraction based on Weighted Mel Frequency Cepstral Coefficients (WMFCC) and classification using a bi-directional long short-term memory (Bi-LSTM) neural network. It mainly focuses on detecting prolongation and syllable, word, and phrase repetition in stuttered events. Findings: This study performs a comparative analysis of the WMFCC feature extraction method against extensions of the widely used MFCC, namely the delta and delta-delta cepstra. The speech parameterization techniques are compared with respect to frame length, window overlap percentage, and pre-emphasis filter alpha value. The experiments show that WMFCC outperforms the other feature extraction methods, with an average recognition accuracy of 96.67%. The 14-dimensional WMFCC achieves low computational overhead compared with the conventional 42-dimensional MFCC including delta and delta-delta cepstra. Application: The integration of WMFCC-based speech feature extraction and Bi-LSTM-based deep learning classification proposed in this study is efficient for building an optimal model to automatically classify stuttered events such as prolongation and repetition. Keywords: stuttering; MFCC; delta MFCC; WMFCC; BiLSTM
17

Cinoglu, Bahadir, Umut Durak, and T. Hikmet Karakoc. "Utilizing Mel-Frequency Cepstral Coefficients for Acoustic Diagnostics of Damaged UAV Propellers." International Journal of Aviation Science and Technology vm05, is02 (2024): 79–89. http://dx.doi.org/10.23890/ijast.vm05is02.0201.

Full text
Abstract:
In this study, the diagnostic potential of the acoustic signatures of unmanned aerial vehicle (UAV) propellers, one of the critical components of these vehicles, was examined under different damage conditions. For this purpose, a test bench was set up and acoustic data for five differently damaged propellers and one undamaged propeller were collected. The methodology involved using an omnidirectional microphone to collect data at three thrust levels: 25%, 50%, and 75%. Propeller sound characteristics were extracted using the Mel Frequency Cepstrum Coefficient (MFCC) technique, which incorporates the Fast Fourier Transform (FFT), and the visual differences between sound patterns were discussed to underline their importance for diagnostics. The results indicated the potential for successfully classifying slightly and symmetrically damaged and undamaged propellers in an artificial-intelligence-based diagnostic application using MFCC. This study aimed to demonstrate how MFCC can be used effectively to detect damaged and undamaged propellers through their sound profiles, and highlighted its potential for future integration into artificial intelligence (AI) methods for UAV diagnostics. The findings provide a foundation for an advanced diagnostic method to increase UAV safety and operational efficiency.
18

Trabelsi, Imen, and Med Salim Bouhlel. "Comparison of Several Acoustic Modeling Techniques for Speech Emotion Recognition." International Journal of Synthetic Emotions 7, no. 1 (2016): 58–68. http://dx.doi.org/10.4018/ijse.2016010105.

Full text
Abstract:
Automatic speech emotion recognition (SER) is a current research topic in the field of human-computer interaction (HCI) with a wide range of applications. The purpose of a speech emotion recognition system is to automatically classify a speaker's utterances into emotional states such as disgust, boredom, sadness, neutral, and happiness. The speech samples in this paper are from the Berlin emotional database. Mel frequency cepstrum coefficients (MFCC), linear prediction coefficients (LPC), linear prediction cepstrum coefficients (LPCC), perceptual linear prediction (PLP), and relative spectral perceptual linear prediction (Rasta-PLP) features are used to characterize the emotional utterances, using a combination of Gaussian mixture models (GMM) and support vector machines (SVM) based on the Kullback-Leibler divergence kernel. In this study, the effects of feature type and feature dimension are comparatively investigated. The best results are obtained with 12-coefficient MFCC: using the proposed features, a recognition rate of 84% is achieved, which is close to human performance on this database.
19

Lee, Ji-Yeoun. "Classification between Elderly Voices and Young Voices Using an Efficient Combination of Deep Learning Classifiers and Various Parameters." Applied Sciences 11, no. 21 (2021): 9836. http://dx.doi.org/10.3390/app11219836.

Full text
Abstract:
The objective of this research was to develop deep learning classifiers and various parameters that provide an accurate and objective system for classifying elderly and young voice signals. This work focused on deep learning methods, such as feedforward neural network (FNN) and convolutional neural network (CNN), for the detection of elderly voice signals using mel-frequency cepstral coefficients (MFCCs) and linear prediction cepstrum coefficients (LPCCs), skewness, as well as kurtosis parameters. In total, 126 subjects (63 elderly and 63 young) were obtained from the Saarbruecken voice database. The highest performance of 93.75% appeared when the skewness was added to the MFCC and MFCC delta parameters, although the fusion of the skewness and kurtosis parameters had a positive effect on the overall accuracy of the classification. The results of this study also revealed that the performance of FNN was higher than that of CNN. Most parameters estimated from male data samples demonstrated good performance in terms of gender. Rather than using mixed female and male data, this work recommends the development of separate systems that represent the best performance through each optimized parameter using data from independent male and female samples.
20

Sasongko, Sudi Mariyanto Al, Shofian Tsaury, Suthami Ariessaputra, and Syafaruddin Ch. "Mel Frequency Cepstral Coefficients (MFCC) Method and Multiple Adaline Neural Network Model for Speaker Identification." JOIV : International Journal on Informatics Visualization 7, no. 4 (2023): 2306. http://dx.doi.org/10.62527/joiv.7.4.1376.

Full text
Abstract:
Speech recognition technology makes human interaction with computers more accessible. There are two phases in the speaker recognition process: capturing or extracting voice features, and identifying the speaker's voice pattern based on the voice characteristics of each speaker. The speakers are men and women whose voices are recorded and stored in a computer database. Mel Frequency Cepstrum Coefficients (MFCC) are used at the voice extraction stage with 13 characteristic coefficients. MFCC is based on the variation of the human ear's critical bandwidth with frequency (linear and logarithmic). The sound frame is converted to the mel frequency scale and processed with several triangular filters to obtain the cepstrum coefficients. At the speech pattern recognition stage, a Madaline (multiple Adaline) artificial neural network (ANN) model is used to compare the test voice characteristics, with the training voices' features input as training data. The Madaline network is trained with BFGS quasi-Newton backpropagation with a goal parameter of 0.0001. The results of the study show that the Madaline neural network model is not recommended for identification research: the speech recognition rate on the database reached only 61% over ten tests, only 14% of tests from outside the database were rejected, and 84% of tests from outside the database with words different from the training data were rejected. The results of this model can be used as a reference for creating an Android-based real-time system.
21

Sasongko, Sudi Mariyanto Al, Shofian Tsaury, Suthami Ariessaputra, and Syafaruddin Ch. "Mel Frequency Cepstral Coefficients (MFCC) Method and Multiple Adaline Neural Network Model for Speaker Identification." JOIV : International Journal on Informatics Visualization 7, no. 4 (2023): 2306. http://dx.doi.org/10.30630/joiv.7.4.01376.

Full text
Abstract:
Speech recognition technology makes human contact with the computer more accessible. There are two phases in the speaker recognition process: capturing or extracting voice features and identifying the speaker's voice pattern based on the voice characteristics of each speaker. The speakers consist of men and women whose voices are recorded and stored in a computer database. Mel Frequency Cepstrum Coefficients (MFCC) are used at the voice extraction stage with 13 characteristic coefficients. MFCC is based on the variation of the human ear's critical bandwidth with frequency (linear and logarithmic). The sound frame is converted to the mel frequency scale and processed with several triangular filters to obtain the cepstrum coefficients. At the speech pattern recognition stage, an artificial neural network (ANN) Madaline model (Multiple Adaline, the plural form of Adaline) is used to compare the test sound characteristics; the training voices' features are input as training data. The Madaline network is trained with BFGS Quasi-Newton Backpropagation with a goal parameter of 0.0001. The results obtained from the study show that the Madaline model of artificial neural networks is not recommended for identification research: the speech recognition rate inside the database reached only 61% over ten tests, only 14% of tests with speakers outside the database were rejected, and 84% of tests outside the database with words different from the training data were rejected. The results of this model can be used as a reference for creating an Android-based real-time system.
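The Hz-to-mel conversion and triangular filter spacing described in the abstract above follow the standard MFCC front end. A minimal Python sketch (using the common HTK-style mel formula; the 0–8000 Hz band and 13 filters are illustrative assumptions, not the authors' exact configuration):

```python
import math

def hz_to_mel(f_hz: float) -> float:
    """Map a frequency in Hz to the mel scale (HTK formula):
    roughly linear below ~1 kHz and logarithmic above."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m: float) -> float:
    """Inverse mapping from mel back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def triangular_filter_centers(n_filters: int, f_low: float, f_high: float):
    """Center frequencies (Hz) of n_filters triangular filters spaced
    uniformly on the mel scale between f_low and f_high."""
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (m_high - m_low) / (n_filters + 1)
    return [mel_to_hz(m_low + step * (i + 1)) for i in range(n_filters)]

# Example: 13 filters over 0-8000 Hz (assuming a 16 kHz sample rate).
centers = triangular_filter_centers(13, 0.0, 8000.0)
```

Because the centers are uniform in mel, they bunch together at low frequencies and spread out at high frequencies, mimicking the ear's critical bands.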
APA, Harvard, Vancouver, ISO, and other styles
22

Pratiwi, Tika, Andi Sunyoto, and Dhani Ariatmanto. "Music Genre Classification Using K-Nearest Neighbor and Mel-Frequency Cepstral Coefficients." Sinkron 8, no. 2 (2024): 861–67. http://dx.doi.org/10.33395/sinkron.v8i2.12912.

Full text
Abstract:
Music genre classification plays a pivotal role in organizing and accessing vast music collections, enhancing user experiences, and enabling efficient music recommendation systems. This study focuses on employing the K-Nearest Neighbors (KNN) algorithm in conjunction with Mel-Frequency Cepstral Coefficients (MFCCs) for accurate music genre classification. MFCCs extract essential spectral features from audio signals, which serve as robust representations of music characteristics. The proposed approach achieves a commendable classification accuracy of 80%, showcasing the effectiveness of KNN-MFCC fusion. Nevertheless, the challenge of overlapping genres, particularly rock and country, demands special attention due to their shared acoustic attributes. The inherent similarities between these genres often lead to misclassification, hampering accuracy. To address this issue, an enhanced feature engineering strategy is devised, leveraging deeper insights into the subtle nuances that differentiate rock and country music. Additionally, a refined KNN distance metric and neighbor selection mechanism are introduced to further refine classification decisions. Experimental results underscore the effectiveness of the refined approach in mitigating genre overlap issues, significantly enhancing classification accuracy for rock and country genres. This study contributes to the advancement of music genre classification techniques, offering an innovative solution for handling overlapping genres and demonstrating the potential of KNN-MFCC synergy in achieving accurate and refined genre classification.
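The KNN-over-MFCC scheme summarized above can be illustrated with a toy nearest-neighbour classifier. This is a pure-Python sketch with made-up 2-D stand-ins for MFCC-derived features; the paper's refined distance metric and neighbor selection are not reproduced:

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two feature vectors (e.g. mean MFCCs)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(train, query, k=3):
    """Predict a genre label by majority vote among the k nearest
    training vectors. `train` is a list of (feature_vector, label)."""
    neighbors = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy training set: 2-D stand-ins for MFCC-derived features.
train = [([0.0, 0.1], "rock"), ([0.2, 0.0], "rock"),
         ([0.1, 0.2], "rock"), ([2.0, 2.1], "country"),
         ([2.2, 1.9], "country"), ([1.9, 2.0], "country")]
label = knn_classify(train, [2.0, 2.0], k=3)  # -> "country"
```

The genre-overlap problem the paper describes shows up here directly: when "rock" and "country" clusters sit close together in feature space, the majority vote near the boundary becomes unreliable.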
APA, Harvard, Vancouver, ISO, and other styles
23

Chandra, Wenripin, Ken Ken, Osfredo Quinn, and Irpan Adiputra Pardosi. "Human Age Estimation Through Audio Utilising MFCC and RNN." SinkrOn 8, no. 3 (2023): 1852–62. http://dx.doi.org/10.33395/sinkron.v8i3.12656.

Full text
Abstract:
Age is one of the main human attributes and an important factor in improving the communication experience. Age estimation has been used in several applications to improve user experience, so an approach is needed to estimate the user's age, one of which is through audio. In this study, Mel Frequency Cepstrum Coefficients (MFCC) and a Recurrent Neural Network (RNN) are used to estimate age through audio: MFCC is used to extract features from the audio data, while the RNN is used to estimate age. The dataset was taken from the corpus of user speech data on the Common Voice website. This study shows that the MFCC and RNN methods are able to estimate human age through audio, with the highest accuracy of 0.5647 obtained with a SimpleRNN and 0.7087 with an LSTM.
APA, Harvard, Vancouver, ISO, and other styles
24

Nursholihatun, Erina, Sudi Mariyanto Sasongko, and Abdullah Zainuddin. "IDENTIFIKASI SUARA MENGGUNAKAN METODE MEL FREQUENCY CEPSTRUM COEFFICIENTS (MFCC) DAN JARINGAN SYARAF TIRUAN BACKPROPAGATION." DIELEKTRIKA 7, no. 1 (2020): 48. http://dx.doi.org/10.29303/dielektrika.v7i1.232.

Full text
Abstract:
The voice is a basic human tool of communication. Speaker identification is the process of recognizing the identity of a speaker by comparing the input voice features with the features of each speaker in the database. There are two steps in the speaker identification process: feature extraction and pattern recognition. The feature extraction phase uses the Mel Frequency Cepstrum Coefficient (MFCC) method. Pattern recognition uses a backpropagation artificial neural network that compares the test data with the reference data in the database based on the variables produced in the learning process. The results of the research show that the SNR (Signal-to-Noise Ratio) value determines the success of the speaker recognition system: the higher the SNR, the higher the percentage of recognition. The average speaker recognition accuracy on data without added noise is 86%, the highest average accuracy is 92% on data with an 80 dB SNR level, and the lowest average accuracy is 45% in the data with 80 dB SNR level. The rejection rate for speakers outside the database is 100%.
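The SNR levels cited in this abstract follow the usual power-ratio definition; a small illustrative sketch (the generic formula, not tied to the paper's noise-generation procedure):

```python
import math

def snr_db(signal, noise):
    """Signal-to-noise ratio in dB from sample sequences:
    10*log10(P_signal / P_noise), where P is the mean squared amplitude."""
    p_sig = sum(x * x for x in signal) / len(signal)
    p_noise = sum(x * x for x in noise) / len(noise)
    return 10.0 * math.log10(p_sig / p_noise)

# A signal with 10x the noise amplitude has 100x its power: 20 dB SNR.
clean = [10.0, -10.0, 10.0, -10.0]
noise = [1.0, -1.0, 1.0, -1.0]
print(snr_db(clean, noise))  # -> 20.0
```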
APA, Harvard, Vancouver, ISO, and other styles
25

Hu, Wen-long, Shun-shan Feng, Bo Zhang, Yue-guang Gao, Xiang Xiao, and Qi-Huang. "Hybrid feature extraction method of MFCC+GFCC helicopter noise based on wavelet decomposition." Journal of Physics: Conference Series 2478, no. 12 (2023): 122008. http://dx.doi.org/10.1088/1742-6596/2478/12/122008.

Full text
Abstract:
Aiming at the issue that the recognition accuracy of traditional acoustic signal features is low for helicopter acoustic signals with wind noise in the near field, a method for extracting hybrid MFCC+GFCC noise features based on wavelet decomposition is proposed. Firstly, three-layer wavelet decomposition and reconstruction are applied to the helicopter acoustic signals; then, the Mel-Frequency Cepstral Coefficients (MFCC) and Gammatone-Frequency Cepstrum Coefficients (GFCC) are extracted for the approximation and detail components, respectively; next, the coefficients of the detail components are averaged and combined with those of the approximation components to form the hybrid feature parameters; finally, a convolutional neural network is used to classify the signals and realize correct recognition of helicopter acoustic signals. Experimental results show that the recognition accuracy is improved by almost 40% in contrast with traditional methods such as MFCC and GFCC when the SNR is equal to -5 dB. Further, when the SNR is -10 dB, the recognition accuracy is more than 49%, while the traditional methods cannot effectively recognize helicopter acoustic targets. The proposed feature extraction method can significantly improve recognition accuracy in low-SNR environments and provides a reference for near-field detection and recognition of helicopter acoustic targets.
APA, Harvard, Vancouver, ISO, and other styles
26

Nagaraja, B. G., and H. S. Jayanna. "Multilingual Speaker Identification by Combining Evidence from LPR and Multitaper MFCC." Journal of Intelligent Systems 22, no. 3 (2013): 241–51. http://dx.doi.org/10.1515/jisys-2013-0038.

Full text
Abstract:
In this work, the significance of combining the evidence from multitaper mel-frequency cepstral coefficients (MFCC), linear prediction residual (LPR), and linear prediction residual phase (LPRP) features for multilingual speaker identification under the constraint of limited data is demonstrated. The LPR is derived from linear prediction analysis, and the LPRP is obtained by dividing the LPR by its Hilbert envelope. The sine-weighted cepstrum estimators (SWCE) with six tapers are considered for multitaper MFCC feature extraction. The Gaussian mixture model-universal background model is used for modeling each speaker for the different evidence. The evidence is then combined at the scoring level to improve performance. The monolingual, crosslingual, and multilingual speaker identification studies were conducted using 30 randomly selected speakers from the IITG multivariability speaker recognition database. The experimental results show that the combined evidence improves the performance by nearly 8-10% compared with individual evidence.
APA, Harvard, Vancouver, ISO, and other styles
27

Chen, Young-Long, Neng-Chung Wang, Jing-Fong Ciou, and Rui-Qi Lin. "Combined Bidirectional Long Short-Term Memory with Mel-Frequency Cepstral Coefficients Using Autoencoder for Speaker Recognition." Applied Sciences 13, no. 12 (2023): 7008. http://dx.doi.org/10.3390/app13127008.

Full text
Abstract:
Recently, neural network technology has shown remarkable progress in speech recognition, including word classification, emotion recognition, and identity recognition. This paper introduces three novel speaker recognition methods to improve accuracy. The first method, called long short-term memory with mel-frequency cepstral coefficients for triplet loss (LSTM-MFCC-TL), utilizes MFCC as input features for the LSTM model and incorporates triplet loss and cluster training for effective training. The second method, bidirectional long short-term memory with mel-frequency cepstral coefficients for triplet loss (BLSTM-MFCC-TL), enhances speaker recognition accuracy by employing a bidirectional LSTM model. The third method, bidirectional long short-term memory with mel-frequency cepstral coefficients and autoencoder features for triplet loss (BLSTM-MFCCAE-TL), utilizes an autoencoder to extract additional AE features, which are then concatenated with MFCC and fed into the BLSTM model. The results showed that the performance of the BLSTM model was superior to the LSTM model, and the method of adding AE features achieved the best learning effect. Moreover, the proposed methods exhibit faster computation times compared to the reference GMM-HMM model. Therefore, utilizing pre-trained autoencoders for speaker encoding and obtaining AE features can significantly enhance the learning performance of speaker recognition. Additionally, it also offers faster computation time compared to traditional methods.
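The triplet loss used by all three methods above has a standard form: it pushes same-speaker embeddings together and different-speaker embeddings apart by at least a margin. A sketch under the assumption of plain Euclidean embedding distances and an illustrative margin value:

```python
import math

def l2(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss: penalize unless the anchor-negative distance
    exceeds the anchor-positive distance by at least `margin`."""
    return max(l2(anchor, positive) - l2(anchor, negative) + margin, 0.0)

# Same-speaker pair already closer than the impostor by more than the
# margin, so the loss is zero and no gradient is produced.
a, p, n = [0.0, 0.0], [0.1, 0.0], [1.0, 0.0]
print(triplet_loss(a, p, n))  # -> 0.0
```

In training, the embeddings come from the LSTM/BLSTM networks described in the abstract; here the vectors are toy values chosen only to show the hinge behaviour.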
APA, Harvard, Vancouver, ISO, and other styles
28

Koolagudi, Shashidhar G., Deepika Rastogi, and K. Sreenivasa Rao. "Identification of Language using Mel-Frequency Cepstral Coefficients (MFCC)." Procedia Engineering 38 (2012): 3391–98. http://dx.doi.org/10.1016/j.proeng.2012.06.392.

Full text
APA, Harvard, Vancouver, ISO, and other styles
29

Urrutia, Robin, Diego Espejo, Natalia Evens, et al. "Clustering Methods for Vibro-Acoustic Sensing Features as a Potential Approach to Tissue Characterisation in Robot-Assisted Interventions." Sensors 23, no. 23 (2023): 9297. http://dx.doi.org/10.3390/s23239297.

Full text
Abstract:
This article provides a comprehensive analysis of the feature extraction methods applied to vibro-acoustic signals (VA signals) in the context of robot-assisted interventions. The primary objective is to extract valuable information from these signals to understand tissue behaviour better and build upon prior research. This study is divided into three key stages: feature extraction using the Cepstrum Transform (CT), Mel-Frequency Cepstral Coefficients (MFCCs), and Fast Chirplet Transform (FCT); dimensionality reduction employing techniques such as Principal Component Analysis (PCA), t-Distributed Stochastic Neighbour Embedding (t-SNE), and Uniform Manifold Approximation and Projection (UMAP); and, finally, classification using a nearest neighbours classifier. The results demonstrate that using feature extraction techniques, especially the combination of CT and MFCC with dimensionality reduction algorithms, yields highly efficient outcomes. The classification metrics (Accuracy, Recall, and F1-score) approach 99%, and the clustering metric is 0.61. The performance of the CT–UMAP combination stands out in the evaluation metrics.
APA, Harvard, Vancouver, ISO, and other styles
30

Reddy, Reddy Phanidhar. "Voice and Face Recognition for Web Browser Security." International Journal for Research in Applied Science and Engineering Technology 9, no. 11 (2021): 199–205. http://dx.doi.org/10.22214/ijraset.2021.38777.

Full text
Abstract:
This paper analyses multimodal authentication mechanisms for browser privacy. Face and voice recognition are used as the authentication methods in this process. The OpenCV library is used in the framework's face recognition section; it detects and recognizes faces from a database using basic eigenface recognition approaches. MFCC (Mel Frequency Cepstrum Coefficients) and a Gaussian Mixture Model are used to recognize voices. Following successful authentication, the cookies on the local hard disc are decrypted, allowing access to the browser cookies. When a user first registers, the browser cookies are encrypted with AES, one of the most secure encryption methods available. Keywords: MFCC, Gaussian Mixture Model, browser cookies, authentication, AES, encryption, decryption, OpenCV, eigenface.
APA, Harvard, Vancouver, ISO, and other styles
31

Sanjaya, WS Mada, and Dyah Anggraeni. "Sistem Kontrol Robot Arm 5 DOF Berbasis Pengenalan Pola Suara Menggunakan Mel-Frequency Cepstrum Coefficients (MFCC) dan Adaptive Neuro-Fuzzy Inference System (ANFIS)." Wahana Fisika 1, no. 2 (2016): 152. http://dx.doi.org/10.17509/wafi.v1i2.4277.

Full text
Abstract:
A study has been carried out describing the implementation of voice pattern recognition to control the motion of a 5-DoF robot arm in picking up and placing objects. The methods used in this research are Mel-Frequency Cepstrum Coefficients (MFCC) and the Adaptive Neuro-Fuzzy Inference System (ANFIS). The MFCC method is used for feature extraction of the voice signal, while ANFIS is used as the learning method for voice pattern recognition. In the ANFIS learning process, six features were used as training data. Trained and untrained voice data were used to test the voice pattern recognition system. The test results show success rates of 87.77% for trained voice data and 78.53% for untrained voice data. This voice pattern recognition system has been successfully applied to drive a 5-DoF robot arm based on an Arduino microcontroller.
APA, Harvard, Vancouver, ISO, and other styles
32

Dadula, Cristina P., and Elmer P. Dadios. "Fuzzy Logic System for Abnormal Audio Event Detection Using Mel Frequency Cepstral Coefficients." Journal of Advanced Computational Intelligence and Intelligent Informatics 21, no. 2 (2017): 205–10. http://dx.doi.org/10.20965/jaciii.2017.p0205.

Full text
Abstract:
This paper presents a fuzzy logic system for audio event detection using mel frequency cepstral coefficients (MFCC). Twelve MFCC of the audio samples were analyzed. The range of values of the MFCC was obtained, including its histogram, and these values were normalized so that the minimum and maximum values lie between 0 and 1. Rules were formulated based on the histogram to classify audio samples as normal, gunshot, or crowd panic. Five MFCC were chosen as input to the fuzzy logic system. The membership functions and rules of the fuzzy logic system are defined based on the normalized histograms of the MFCC. The system was tested with a total of 150 minutes of normal sounds from different buses and 72 seconds of audio clips of abnormal sounds. The designed fuzzy logic system was able to classify audio events with an average accuracy of 99.4%.
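The normalization of MFCC values into [0, 1] before defining fuzzy membership functions can be sketched as a min-max rescaling. This is an assumption about the exact normalization used; the abstract only states that the minimum and maximum map to 0 and 1:

```python
def min_max_normalize(values):
    """Rescale a list of coefficient values so the minimum maps to 0
    and the maximum maps to 1, as a basis for fuzzy membership functions."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # degenerate constant input
    return [(v - lo) / (hi - lo) for v in values]

# Toy MFCC values over a few frames.
mfcc_track = [-12.5, -3.0, 4.5, 18.0, 0.0]
norm = min_max_normalize(mfcc_track)
print(min(norm), max(norm))  # -> 0.0 1.0
```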
APA, Harvard, Vancouver, ISO, and other styles
33

N., H. Mohd Johari, Abdul Malik Noreha, and A. Sidek K. "Distinctive features for normal and crackles respiratory sounds using cepstral coefficients." Bulletin of Electrical Engineering and Informatics 8, no. 3 (2019): 875–81. https://doi.org/10.11591/eei.v8i3.1517.

Full text
Abstract:
Classification of respiratory sounds as normal or abnormal is crucial for screening and diagnosis purposes, since lung-associated diseases can be detected through this technique. With the advancement of computerized auscultation technology, adventitious sounds such as crackles can be detected, and therefore diagnostic tests can be performed earlier. In this paper, Linear Predictive Cepstral Coefficients (LPCC) and Mel-frequency Cepstral Coefficients (MFCC) are used to extract features from normal and crackles respiratory sounds. Statistical computations such as the mean and standard deviation (SD) of the cepstral-based coefficients can differentiate between crackles and normal sounds. The statistical computations of the LPCC and MFCC cepstral coefficients show that the mean LPCC (except for the third coefficient) and the first three statistical coefficient values of the MFCC's SD provide distinctive features between normal and crackles respiratory sounds. Hence, LPCCs and MFCCs can be used as feature extraction methods for respiratory sounds to classify between normal and crackles as a screening and diagnostic tool.
APA, Harvard, Vancouver, ISO, and other styles
34

Chen, Qianru, Zhifeng Wu, Qinghua Zhong, and Zhiwei Li. "Heart Sound Classification Based on Mel-Frequency Cepstrum Coefficient Features and Multi-Scale Residual Recurrent Neural Networks." Journal of Nanoelectronics and Optoelectronics 17, no. 8 (2022): 1144–53. http://dx.doi.org/10.1166/jno.2022.3305.

Full text
Abstract:
A rapid and accurate algorithm model of extracting heart sounds plays a vital role in the early detection of cardiovascular disorders, especially for small primary health care clinics. This paper proposes a heart sound extraction and classification algorithm based on static and dynamic combination of Mel-frequency cepstrum coefficient (MFCC) feature extraction and the multi-scale residual recurrent neural network (MsRes-RNN) algorithm model. The standard MFCC parameters represent the static characteristics of the signal. In contrast, the first-order and second-order MFCC parameters represent the dynamic characteristics of the signal. They are extracted and combined to form the MFCC feature representation. Then, the MFCC-based features are fed to a MsRes-RNN algorithm model for feature learning and classification tasks. The proposed classification model can take advantage of the encoded local characteristics extracted from the multi-scale residual neural network (MsResNet) and the long-term dependencies captured by recurrent neural network (RNN). Model estimation experiments and performance comparisons with other state-of-the-art algorithms are presented in this paper. Experiments indicate that a classification accuracy of 93.9% has been achieved on 2016 PhysioNet/CinC Challenge datasets.
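The first-order and second-order (dynamic) MFCC parameters mentioned above are conventionally computed with a regression formula over neighbouring frames; a sketch of that standard delta computation (the window width n=2 is an assumption, as the paper does not state it):

```python
def delta(coeffs, n=2):
    """First-order (delta) dynamic features from a sequence of static
    coefficients, using the standard regression formula with window n.
    Edges are handled by repeating the first/last frame."""
    denom = 2 * sum(i * i for i in range(1, n + 1))
    padded = [coeffs[0]] * n + list(coeffs) + [coeffs[-1]] * n
    out = []
    for t in range(n, n + len(coeffs)):
        num = sum(i * (padded[t + i] - padded[t - i]) for i in range(1, n + 1))
        out.append(num / denom)
    return out

# A linear ramp has a constant slope, which the interior deltas recover.
static = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
d1 = delta(static)   # first-order dynamics
d2 = delta(d1)       # second-order (delta-delta) dynamics
```

The static, delta, and delta-delta vectors are then concatenated frame by frame to form the combined MFCC representation fed to the classifier.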
APA, Harvard, Vancouver, ISO, and other styles
35

Helmiyah, Siti, Imam Riadi, Rusydi Umar, and Abdullah Hanif. "Ekstraksi Fitur Pengenalan Emosi Berdasarkan Ucapan Menggunakan Linear Predictor Ceptral Coeffecient Dan Mel Frequency Cepstrum Coefficients." Mobile and Forensics 1, no. 2 (2019): 48. http://dx.doi.org/10.12928/mf.v1i2.1259.

Full text
Abstract:
Ucapan suara memiliki informasi penting yang dapat diterima oleh otak melalui gelombang suara. Otak menerima gelombang suara melalui alat pendengaran dan menghasilkan suatu informasi berupa pesan, bahasa, dan emosi. Pengenalan emosi wicara merupakan teknologi yang dirancang untuk mengidentifikasi keadaan emosi seseorang dari sinyal ucapannya. Hal tersebut menarik untuk diteliti, karena berkaitan dengan teknologi zaman sekarang yaitu pada penggunaan smartphone di berbagai macam aktivitas sehari-hari. Penelitian ini membandingkan ekstraksi fitur Metode LPC dan Metode MFCC. Kedua metode ekstraksi tersebut diklasifikasi menggunakan Metode Jaringan Syaraf Tiruan (MLP) untuk pengenalan emosi. Masing-masing metode menggunakan data emosi marah, bosan, bahagia, netral, dan sedih. Data dibagi menjadi dua, yaitu data testing dan data data training dengan perbandingan 80:20. Arsitektur jaringan yang digunakan adalah tiga lapisan yaitu lapisan input, lapisan tersembunyi, dan lapisan output. Parameter MLP yang digunakan learning rate = 0.0001, epsilon = 1e-08, epoch = 500, dan Cross Validation = 5. Hasil akurasi pengenalan emosi dengan ekstraksi fitur LPC sebesar adalah 28%. Sedangkan hasil akurasi dengan ekstraksi fitur MFCC sebesar 61,33%. Hasil akurasi ini bisa ditingkatkan dengan menambahkan data yang lebih banyak lagi, terutama untuk data testing. Perlunya pengujian pada nilai parameter jaringan MLP, yaitu dengan mengubah nilai-nilai parameter, karena dapat mempengaruhi tingkat akurasi pengenalan. Selain itu penentuan ekstraksi fitur dan klasifikasi metode yang lain juga dapat digunakan untuk mencari nilai akurasi pengenalan emosi yang lebih baik lagi.
APA, Harvard, Vancouver, ISO, and other styles
36

Li, Zuge, Haitao Peng, Siwei Tan, and Fangyu Zhu. "Music classification with convolutional and artificial neural network." Journal of Physics: Conference Series 2580, no. 1 (2023): 012059. http://dx.doi.org/10.1088/1742-6596/2580/1/012059.

Full text
Abstract:
Music genre classification focuses on efficiently finding music of a similar genre among numerous melodies, which can better satisfy the tastes and expectations of users listening to music. This paper proposes a new method to classify different kinds of music with Artificial Neural Networks (ANN) and Convolutional Neural Networks (CNNs). First, Mel Frequency Cepstral Coefficients (MFCC) are used to preprocess the Mel-frequency cepstrum (MFC). Then, we upgrade Anupam's CNN model, since the features extracted by the MFC alone are not suitable for a CNN to learn from a small dataset like this. Multiple features are then extracted for each audio file, and the two most correlated features on the dataset are adopted as the input of an ANN. To verify the proposed method's effectiveness, we compare our method with other state-of-the-art methods on the GTZAN dataset. The experimental results show that we achieve higher accuracy than Anupam's model. If using only one MFCC feature, Conv-Conv-Pool, a sub-structure in which we add two convolutional layers before each max pooling layer, performs better than Conv-Pool, and Conv-Pool performs better than the ANN. However, by concatenating another correlated feature, the spectral centroid mean (a measure used in digital signal processing to characterize a spectrum), a simple ANN can achieve much higher accuracy than one utilizing only a single MFCC feature, with an accuracy of about 94.1%.
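The spectral centroid feature concatenated with the MFCCs in this paper is simply the magnitude-weighted mean frequency of a spectrum; a minimal sketch with toy values:

```python
def spectral_centroid(freqs, magnitudes):
    """Magnitude-weighted mean frequency of a spectrum: the 'spectral
    centroid' commonly used to characterize spectral brightness."""
    total = sum(magnitudes)
    return sum(f * m for f, m in zip(freqs, magnitudes)) / total

# All energy at 440 Hz puts the centroid at 440 Hz.
freqs = [220.0, 440.0, 880.0]
mags = [0.0, 1.0, 0.0]
print(spectral_centroid(freqs, mags))  # -> 440.0
```

In practice the frequency bins and magnitudes come from an FFT of each audio frame, and the per-frame centroids are averaged to get the "spectral centroid mean" feature.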
APA, Harvard, Vancouver, ISO, and other styles
37

Huizen, Roy Rudolf, and Florentina Tatrin Kurniati. "Feature extraction with mel scale separation method on noise audio recordings." Indonesian Journal of Electrical Engineering and Computer Science 24, no. 2 (2021): 815. http://dx.doi.org/10.11591/ijeecs.v24.i2.pp815-824.

Full text
Abstract:
This paper focuses on improving the accuracy of recognition from noisy audio recordings. For high-quality audio recordings, extraction using the mel frequency cepstral coefficients (MFCC) method produces high accuracy, while for low-quality recordings the accuracy is low because of noise. Accuracy is improved by investigating the effect of bandwidth on the mel scale. The proposed improvement separates the mel scale into two frequency channels (MFCC dual-channel); the comparison method uses the mel scale bandwidth without separation (MFCC single-channel). Features are analyzed using k-means clustering. The data use a noise variance of up to -16 dB. In testing, the MFCC single-channel method at -16 dB noise has an accuracy of 47.5%, while the MFCC dual-channel method has a better accuracy of 76.25%. The next test used adaptive noise cancelling (ANC) to reduce noise before extraction; here the MFCC single-channel method has an accuracy of 82.5% and the MFCC dual-channel method a better accuracy of 83.75%. In high-quality audio recording testing, the MFCC single-channel method has an accuracy of 92.5% and the MFCC dual-channel method a better accuracy of 97.5%. The test results show the effect of the mel scale bandwidth on accuracy: the MFCC dual-channel method has higher accuracy.
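The dual-channel idea above separates the mel scale into two frequency channels; a sketch of one plausible equal-mel-width split (the paper's exact channel boundaries are not stated here, so the midpoint is an illustrative choice):

```python
import math

def hz_to_mel(f):
    """HTK-style Hz-to-mel mapping."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse mel-to-Hz mapping."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def split_mel_band(f_low, f_high):
    """Split [f_low, f_high] Hz into two channels of equal mel width.
    Illustrative only: the published dual-channel MFCC may use a
    different split point."""
    m_mid = (hz_to_mel(f_low) + hz_to_mel(f_high)) / 2.0
    f_mid = mel_to_hz(m_mid)
    return (f_low, f_mid), (f_mid, f_high)

# Two channels over an assumed 0-8000 Hz band; each channel would get
# its own mel filterbank before the cepstral step.
low_ch, high_ch = split_mel_band(0.0, 8000.0)
```

Note that equal mel widths give unequal Hz widths: the low channel covers a much narrower Hz range than the high channel.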
APA, Harvard, Vancouver, ISO, and other styles
38

Huizen, Roy Rudolf, and Florentina Tatrin Kurniati. "Feature extraction with mel scale separation method on noise audio recordings." Indonesian Journal of Electrical Engineering and Computer Science 24, no. 1 (2021): 815–24. https://doi.org/10.11591/ijeecs.v24.i2.pp815-824.

Full text
Abstract:
This paper focuses on improving the accuracy of recognition from noisy audio recordings. For high-quality audio recordings, extraction using the mel frequency cepstral coefficients (MFCC) method produces high accuracy, while for low-quality recordings the accuracy is low because of noise. Accuracy is improved by investigating the effect of bandwidth on the mel scale. The proposed improvement separates the mel scale into two frequency channels (MFCC dual-channel); the comparison method uses the mel scale bandwidth without separation (MFCC single-channel). Features are analyzed using k-means clustering. The data use a noise variance of up to -16 dB. In testing, the MFCC single-channel method at -16 dB noise has an accuracy of 47.5%, while the MFCC dual-channel method has a better accuracy of 76.25%. The next test used adaptive noise cancelling (ANC) to reduce noise before extraction; here the MFCC single-channel method has an accuracy of 82.5% and the MFCC dual-channel method a better accuracy of 83.75%. In high-quality audio recording testing, the MFCC single-channel method has an accuracy of 92.5% and the MFCC dual-channel method a better accuracy of 97.5%. The test results show the effect of the mel scale bandwidth on accuracy: the MFCC dual-channel method has higher accuracy.
APA, Harvard, Vancouver, ISO, and other styles
39

Umar, Rusydi, Imam Riadi, and Abdullah Hanif. "Analisis Bentuk Pola Suara Menggunakan Ekstraksi Ciri Mel-Frequencey Cepstral Coefficients (MFCC)." CogITo Smart Journal 4, no. 2 (2019): 294. http://dx.doi.org/10.31154/cogito.v4i2.130.294-304.

Full text
Abstract:
Sound is a part of the human body that is unique and distinguishable, so it can be applied in sound pattern recognition technology, one use of which is voice biometrics. This study discusses the analysis of the form of a sound pattern, aiming to determine the shape of the sound pattern of a person's character based on the spoken voice input. This study uses the Mel-Frequency Cepstrum Coefficients (MFCC) method for the feature extraction process on speaker speech signals. The MFCC process converts the sound signal into several feature vectors, which are then displayed in graphical form. The analysis and design of sound patterns use Matlab 2017a software. Tests were carried out on 5 users, consisting of 3 men and 2 women; each user said the predetermined word "LOGIN", for 15 spoken words in total. The result of the test is the form of a sound pattern distinguishing the characteristics of one user from other users. Keywords: Voice, Pattern, Feature Extraction, MFCC
APA, Harvard, Vancouver, ISO, and other styles
40

Hendry, Jans, Aditya Rachman, and Dodi Zulherman. "Recites fidelity detection system of al-Kautsar verse based on words using mel frequency cepstrum coefficients and cosine similarity." Jurnal Teknologi dan Sistem Komputer 8, no. 1 (2019): 27–35. http://dx.doi.org/10.14710/jtsiskom.8.1.2020.27-35.

Full text
Abstract:
In this study, a system has been developed to help detect the accuracy of the recitation of the Koran in Surah Al-Kautsar, based on the accuracy of the number and pronunciation of words in one complete surah. This system depends heavily on the accuracy of word segmentation based on signal envelopes. The feature extraction method used was Mel Frequency Cepstrum Coefficients (MFCC), while the Cosine Similarity method was used to detect the accuracy of the reading. Of 60 data, 30 were used for training, while the rest were for testing. Within each set of 30 training and test data, 15 were correct readings and 15 were incorrect readings. System accuracy was measured by word-for-word recognition, which resulted in 100% recall and 98.96% precision for the training word data, and 100% recall and 99.65% precision for the test word data. For the overall reading of the surah, 15 correct readings and 14 incorrect readings were recognized correctly.
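The Cosine Similarity measure used above to match word features can be sketched in a few lines (the generic formula with toy vectors, not the paper's actual MFCC matrices):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors; a value near
    1.0 means the feature directions match closely, regardless of
    overall signal energy."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

reference = [0.8, 0.1, 0.3]
test_word = [1.6, 0.2, 0.6]   # same direction, different energy
print(cosine_similarity(reference, test_word))  # -> ~1.0
```

The energy invariance is why cosine similarity suits this task: the same word recited louder or softer still scores near 1.0 against its reference.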
APA, Harvard, Vancouver, ISO, and other styles
41

Xie, Tao, Xiaodong Zheng, and Yan Zhang. "Seismic facies analysis based on speech recognition feature parameters." GEOPHYSICS 82, no. 3 (2017): O23—O35. http://dx.doi.org/10.1190/geo2016-0121.1.

Full text
Abstract:
Seismic facies analysis plays an important role in seismic stratigraphy. Seismic attributes have been widely applied to seismic facies analysis, and one of the most important steps is to optimize the attributes most sensitive to reservoir characteristics; using different attribute combinations in multidimensional analyses will yield different solutions. Acoustic waves and seismic waves propagating in an elastic medium follow the same laws of physics, and the generation of a speech signal in the acoustic model is similar to the convolution model of seismic data. We have developed the mel-frequency cepstrum coefficients (MFCCs), which have been successfully applied in speech recognition, as feature parameters for seismic facies analysis. Information about the wavelet and the reflection coefficients is well separated in these cepstrum-domain parameters: information about the wavelet mainly appears in the low-quefrency part, and information about the reflection coefficients mainly appears in the high-quefrency part. In the forward model, the seismic MFCCs are used as feature vectors for synthetic data with noise levels of zero and 5%. A Bayesian network is used to classify the traces, and classification accuracy rates versus different orders of the MFCCs are obtained. The forward-modeling results indicate that high accuracy rates are achieved when the order exceeds 10. For the real field data, the seismic data are decomposed into a set of MFCC parameters. Different information is unfolded in the parameter maps, enabling the interpreter to capture the geologic features of the target interval. The geologic features presented in the three instantaneous attributes and coherence can also be found in the MFCC parameter maps, and the classification results are in accordance with the paleogeomorphy of the target interval as well as the known wells. The results from the synthetic data and real field data demonstrate the information description abilities of the seismic MFCC parameters. Therefore, using speech feature parameters to extract information may be helpful for processing and interpreting seismic data.
APA, Harvard, Vancouver, ISO, and other styles
42

Tran, Thi Thanh. "Analysis of Building the Music Feature Extraction Systems: A Review." Engineering and Technology Journal 9, no. 05 (2024): 4055–60. https://doi.org/10.5281/zenodo.11242886.

Full text
Abstract:
Music genre classification is a basic task in sound processing for music retrieval. Machine learning has become increasingly popular for automatically classifying music genres, and in recent years many methods have been studied and developed to solve this problem. This article presents an overview of the classification process and of several music feature extraction methods. The feature extraction method using Mel Frequency Cepstral Coefficients (MFCC) is discussed in detail, and some representative results in which MFCCs improve classification accuracy are introduced and discussed. The MFCC-based feature extraction method has shown its suitability through high accuracy and holds much potential for further research and development.
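The MFCC extraction chain such reviews discuss (windowing, FFT, mel filterbank, log, DCT) can be sketched in a few lines of NumPy. This single-frame version uses common default parameters (16 kHz sample rate, 26 filters, 13 coefficients) that are illustrative assumptions, not the review's settings:

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, n_mels=26, n_ceps=13):
    """Minimal single-frame MFCC sketch: window -> |FFT|^2 -> mel filterbank -> log -> DCT."""
    frame = signal[:n_fft] * np.hamming(n_fft)
    power = np.abs(np.fft.rfft(frame)) ** 2
    # Triangular mel filterbank with centers equally spaced on the mel scale.
    mel = lambda f: 2595 * np.log10(1 + f / 700.0)
    imel = lambda m: 700 * (10 ** (m / 2595.0) - 1)
    pts = imel(np.linspace(mel(0), mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising slope
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling slope
    logmel = np.log(fbank @ power + 1e-10)
    # Type-II DCT decorrelates the log filterbank energies; keep the first n_ceps.
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_mels)))
    return dct @ logmel

tone = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
print(mfcc(tone).shape)  # (13,)
```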
APA, Harvard, Vancouver, ISO, and other styles
43

Al-Karawi, Khamis A. "Robustness Speaker Recognition Based on Feature Space in Clean and Noisy Condition." International Journal of Sensors, Wireless Communications and Control 9, no. 4 (2019): 497–506. http://dx.doi.org/10.2174/2210327909666181219143918.

Full text
Abstract:
Background &amp; Objective: Speaker Recognition (SR) techniques have developed into a relatively mature field over the past few decades. Existing methods typically use robust features extracted from clean speech signals, and in idealized conditions they can therefore achieve very high recognition accuracy. For critical applications, such as security and forensics, robustness and reliability of the system are crucial. Methods: Background noise and reverberation, as often occur in real-world applications, are known to compromise recognition performance. To improve the performance of speaker verification systems, an effective and robust feature extraction technique is proposed, capable of operating in both clean and noisy conditions. Mel Frequency Cepstrum Coefficients (MFCCs) and Gammatone Frequency Cepstral Coefficients (GFCCs) are mature techniques and the most common features used for speaker recognition. MFCCs are calculated from the log energies in frequency bands distributed over a mel scale, while GFCCs are derived from a bank of Gammatone filters, originally proposed to model human cochlear filtering. This paper investigates the performance of GFCC and conventional MFCC features in clean and noisy conditions, taking into account the effects of the Signal-to-Noise Ratio (SNR) and language mismatch on system performance. Conclusion: Experimental results show significant improvement in system performance in terms of reduced equal error rate and detection error trade-off. Performance in terms of recognition rates under various types of noise and various SNRs was quantified via simulation, and the results of the study are presented and discussed.
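Evaluations like this one control the noise level via the SNR. A small sketch of how a noise signal is scaled so the mixture hits a target SNR (a standard experimental setup, assumed here rather than taken from the paper):

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so that clean + scaled noise has the requested SNR in dB."""
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # SNR(dB) = 10*log10(p_clean / p_noise_scaled)  =>  solve for the scale factor.
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise

rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 300 * np.arange(8000) / 8000)
noisy = mix_at_snr(speech, rng.standard_normal(8000), snr_db=10)

# Verify the achieved SNR from the residual:
residual = noisy - speech
snr = 10 * np.log10(np.mean(speech ** 2) / np.mean(residual ** 2))
print(round(snr, 6))  # 10.0
```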
APA, Harvard, Vancouver, ISO, and other styles
44

Vinaya, Anindita Adikaputri, and Tiffani Febiola Aciandra. "MEL-FREQUENCY CEPSTRAL COEFFICIENTS (MFCC) FEATURE FOR PUMP ANOMALY DETECTION IN NOISY ENVIRONMENTS." Jurnal Rekayasa Mesin 15, no. 2 (2024): 1175–86. http://dx.doi.org/10.21776/jrm.v15i2.1815.

Full text
Abstract:
The continuity of a production process depends on the availability of well-maintained assets, and asset maintenance is one of the key efforts supporting that availability. The pump is one of the important assets in industry. To detect anomalous conditions in a pump, the sound of the machine can be used; however, noisy environmental conditions can change the characteristics of the sound produced, which can lead to errors in identifying the condition of the machine. In this study, Mel Frequency Cepstral Coefficients (MFCC) are used, because MFCC features closely characterize the sound signal and are appropriate for the non-stationary signals encountered in this noisy environment. A Support Vector Machine is used to map inputs (machine features) to outputs (machine condition). The study compares combined time- and frequency-domain features against the time-frequency MFCC features. Improved performance is obtained with the MFCC time-frequency features, with an average accuracy reaching 99.88% on the Medium Gaussian SVM model.
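The abstract does not list the individual time- and frequency-domain features that MFCC is compared against; typical condition-monitoring choices include RMS, crest factor, kurtosis, and spectral centroid. A sketch of those (illustrative assumptions, not the authors' exact feature set):

```python
import numpy as np

def time_freq_features(x, sr):
    """A few common machine-condition features: time-domain stats plus spectral centroid."""
    rms = np.sqrt(np.mean(x ** 2))
    peak = np.max(np.abs(x))
    crest = peak / rms                                     # peakiness, sensitive to impacts
    kurt = np.mean((x - x.mean()) ** 4) / np.var(x) ** 2   # impulsiveness of the waveform
    spec = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), 1 / sr)
    centroid = np.sum(freqs * spec) / np.sum(spec)         # spectral "center of mass" in Hz
    return {"rms": rms, "crest": crest, "kurtosis": kurt, "centroid_hz": centroid}

sr = 8000
t = np.arange(sr) / sr
feats = time_freq_features(np.sin(2 * np.pi * 500 * t), sr)
print(round(feats["centroid_hz"]))  # 500 for a pure 500 Hz tone
```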
APA, Harvard, Vancouver, ISO, and other styles
45

P, S. Subhashini Pedalanka, SatyaSai Ram M, and Sreenivasa Rao Duggirala. "Mel Frequency Cepstral Coefficients based Bacterial Foraging Optimization with DNN-RBF for Speaker Recognition." Indian Journal of Science and Technology 14, no. 41 (2021): 3082–92. https://doi.org/10.17485/IJST/v14i41.1858.

Full text
Abstract:
Objectives: To improve the accuracy and reduce the time complexity of a speaker recognition system using Mel-Frequency Cepstral Coefficients (MFCCs) and Bacterial Foraging Optimization (BFO) with DNN-RBF. Method: The MFCCs of each speech sample are derived by pre-processing the audio speech signal, and the features are optimized with the BFO algorithm. The features are then classified toward the target speaker using a DNN-RBF, and a probability score for each speaker is generated to identify the speaker. The proposed MBFOB speaker recognition system uses the TIMIT read-speech corpus, which contains a total of 6300 sentences, 10 from each of 630 speakers. Findings: The identity of a user is validated in authentication and surveillance applications through speaker recognition. Features are extracted from the audio speech signal. This paper proposes an MBFOB solution based on Mel-frequency Cepstral Coefficients and a DNN-RBF with BFO for speaker identification. Speech utterances from the TIMIT corpus are preprocessed to obtain MFCC feature vectors, a DNN-RBF classifies the speaker, and the feature vectors in the output layers are optimized with Bacterial Foraging Optimization. Finally, the scores for each speaker are calculated to identify the speaker. Output metrics including EER, DCF, Cavg, and accuracy are used to evaluate the proposed technique. The execution time of the proposed method is found to be less than that of other existing methods, and the experimental findings, contrasted with other current methods, show the efficiency of the approach. Novelty: A novel MFCC-based Bacterial Foraging Optimization with Deep Neural Network-Radial Basis Function (DNN-RBF) for identifying the exact speaker is proposed in this study.
Keywords: BFO; DNN; RBF; speech processing; speaker recognition; MFCC extraction; deep neural network; Bacterial Foraging Optimization; scoring
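The abstract does not spell out the DNN-RBF scoring, but the radial-basis-function idea — score each speaker by a Gaussian similarity to a learned center, then normalize into per-speaker probabilities — can be sketched as follows (the centers, dimensions, and `gamma` value are made up for illustration):

```python
import numpy as np

def rbf_scores(x, centers, gamma=1.0):
    """Gaussian RBF activations: similarity of feature vector x to each class center,
    normalized so the outputs behave like per-speaker probability scores."""
    d2 = np.sum((centers - x) ** 2, axis=1)  # squared distance to each center
    act = np.exp(-gamma * d2)                # RBF kernel response
    return act / act.sum()                   # normalized scores

# Two hypothetical speaker centers in a 3-D MFCC feature space.
centers = np.array([[0.0, 0.0, 0.0],
                    [2.0, 2.0, 2.0]])
scores = rbf_scores(np.array([0.1, 0.0, 0.1]), centers)
print(int(np.argmax(scores)))  # 0 -- the test vector is closer to the first center
```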
APA, Harvard, Vancouver, ISO, and other styles
46

Barkana, Buket D., Burak Uzkent, and Inci Saricicek. "Normal and Abnormal Non-Speech Audio Event Detection Using MFCC and PR-Based Feature Sets." Advanced Materials Research 601 (December 2012): 200–208. http://dx.doi.org/10.4028/www.scientific.net/amr.601.200.

Full text
Abstract:
Non-speech audio event detection and classification has become a very active area of research, since it can be applied in many important domains such as audio surveillance and context-awareness systems. In this study, normal and abnormal non-speech audio events were detected using Mel-frequency cepstrum coefficient (MFCC) and pitch-range (PR) based features with artificial neural network (ANN) classifiers. Four abnormal events (glass breaking, dog barking, scream, gunshot) and two normal events (engine noise and rain) were considered. Event detection using the ANN classifiers achieved an accuracy of up to 92%, with overall recognition rates in the range of 78%-87.5%.
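Pitch-range features start from a fundamental-frequency estimate per frame. One simple, standard way to obtain it (an illustrative sketch, not necessarily the authors' method) is the autocorrelation peak within a plausible lag range:

```python
import numpy as np

def pitch_autocorr(x, sr, fmin=50, fmax=500):
    """Estimate fundamental frequency as the autocorrelation peak between
    lags sr/fmax and sr/fmin (i.e., periods between 1/fmax and 1/fmin seconds)."""
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # lags 0 .. N-1
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + np.argmax(ac[lo:hi])
    return sr / lag

sr = 8000
t = np.arange(2048) / sr
f0 = pitch_autocorr(np.sin(2 * np.pi * 200 * t), sr)
print(round(f0))  # 200
```

A pitch-range feature would then be, e.g., the max-min spread of such estimates over an event's frames.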
APA, Harvard, Vancouver, ISO, and other styles
47

Altayeb, Muneera, and Areen Arabiat. "Crack detection based on mel-frequency cepstral coefficients features using multiple classifiers." International Journal of Electrical and Computer Engineering (IJECE) 14, no. 3 (2024): 3332. http://dx.doi.org/10.11591/ijece.v14i3.pp3332-3341.

Full text
Abstract:
Crack detection plays an essential role in evaluating the strength of structures. In recent years, machine learning and deep learning techniques combined with computer vision have emerged to assess the strength of structures and detect cracks. This research uses machine learning (ML) to create a crack detection model based on a dataset of 2432 images of different surfaces, divided into a 70% training set and a 30% testing set. The Orange3 data mining tool was used to build the model, in which a support vector machine (SVM), gradient boosting (GB), naive Bayes (NB), and an artificial neural network (ANN) were trained and verified on three feature sets: mel-frequency cepstral coefficients (MFCC), delta MFCC (DMFCC), and delta-delta MFCC (DDMFCC), extracted using MATLAB. The experimental results showed the superiority of the SVM, with a classification accuracy of 100%, while NB reached 93.9%-99.9%, ANN 99.9%, and GB 99.8%.
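The DMFCC and DDMFCC feature sets are first- and second-order time derivatives of the MFCC matrix. A sketch of the standard regression formula d_t = sum_n n*(c_{t+n} - c_{t-n}) / (2 * sum_n n^2), with the common window size N=2 assumed as a default:

```python
import numpy as np

def delta(feat, N=2):
    """Delta coefficients over a (frames x coeffs) feature matrix, with edge padding
    so the output has the same shape as the input."""
    padded = np.pad(feat, ((N, N), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    return sum(n * (padded[N + n:N + n + len(feat)] - padded[N - n:N - n + len(feat)])
               for n in range(1, N + 1)) / denom

mfccs = np.arange(20, dtype=float).reshape(5, 4)  # toy MFCC matrix: 5 frames x 4 coeffs
d = delta(mfccs)       # first-order deltas (DMFCC)
dd = delta(d)          # second-order deltas (DDMFCC), deltas of the deltas
print(d.shape, dd.shape)  # (5, 4) (5, 4)
```

On this toy matrix each coefficient rises by 4 per frame, so the interior delta values are exactly 4.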
APA, Harvard, Vancouver, ISO, and other styles
48

Rokanatnam, Thurgeaswary, and Hazinah Kutty Mammi. "Study on Gender Identification Based on Audio Recordings Using Gaussian Mixture Model and Mel Frequency Cepstrum Coefficient Technique." International Journal of Innovative Computing 11, no. 2 (2021): 35–41. http://dx.doi.org/10.11113/ijic.v11n2.343.

Full text
Abstract:
Speaker recognition is the ability to identify a speaker's characteristics from spoken language. The purpose of this study is to identify the gender of speakers from audio recordings. The objectives are to evaluate the accuracy rate of this technique in differentiating gender and to determine its performance even on self-acquired recordings. Audio forensics uses voice recordings as part of the evidence to solve cases, and this study is mainly conducted to provide an easier technique for identifying unknown speaker characteristics in the forensic field. The experiment is carried out by training the pattern classifier on gender-dependent data. To train the model, a speech database comprising both male and female speakers is obtained from an online speech corpus. During the testing phase, audio recordings of UTM students are used in addition to the speech corpus data to determine the accuracy rate of the speaker identification experiment. The Mel Frequency Cepstrum Coefficient (MFCC) algorithm is used to extract features from the speech data, while a Gaussian Mixture Model (GMM) is used to model the gender identifier. No noise removal was applied to any speech data in this experiment. Python software is used to extract the MFCC coefficients and model the behavior with the GMM technique. The experimental results show that the GMM-MFCC technique can identify gender regardless of language, but with varying accuracy rates.
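GMM-based gender identification scores a feature vector under a male model and a female model and picks the higher likelihood. A minimal diagonal-covariance sketch, with made-up single-component models standing in for trained GMMs:

```python
import numpy as np

def gmm_loglik(x, weights, means, variances):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM."""
    x = np.asarray(x, float)
    ll = []
    for w, mu, var in zip(weights, means, variances):
        log_comp = (np.log(w)
                    - 0.5 * np.sum(np.log(2 * np.pi * var))
                    - 0.5 * np.sum((x - mu) ** 2 / var))
        ll.append(log_comp)
    return np.logaddexp.reduce(ll)  # log-sum-exp over mixture components

# Two hypothetical single-component "gender models" in a 2-D feature space.
male = dict(weights=[1.0], means=[np.array([0.0, 0.0])], variances=[np.array([1.0, 1.0])])
female = dict(weights=[1.0], means=[np.array([3.0, 3.0])], variances=[np.array([1.0, 1.0])])
x = [2.5, 2.8]
print("female" if gmm_loglik(x, **female) > gmm_loglik(x, **male) else "male")  # female
```

In practice each model has many components and the per-frame log-likelihoods are summed over the utterance.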
APA, Harvard, Vancouver, ISO, and other styles
49

Maulana, Patriaji Ibrahim, Arik Aranta, Fitri Bimantoro, and I. Gede Andika. "KLASIFIKASI MOOD MUSIK BERDASARKAN MEL FREQUENCY CEPSTRAL COEFFICIENTS DENGAN BACKPROPAGATION NEURAL NETWORK." Jurnal RESISTOR (Rekayasa Sistem Komputer) 5, no. 1 (2022): 72–85. http://dx.doi.org/10.31598/jurnalresistor.v5i1.1089.

Full text
Abstract:
In the music industry, music is grouped by type, including genre, artist, instrument, and mood. This gave rise to Music Information Retrieval (MIR), a field of research that retrieves and processes the metadata of music files to perform such groupings. This research builds on the observation that each piece of music carries its own implied mood. By creating a machine learning model using a Backpropagation Neural Network (BPNN) with Mel Frequency Cepstral Coefficients (MFCC) as the input features, music can be classified by mood. Grouping is carried out over four mood classes based on Thayer's model. Several previous studies have shown that MFCCs in voice processing, as well as BPNNs for classification, produce very good accuracy, which is expected to yield a better-performing machine learning model. The data used in this study were obtained from the Internet, with a total dataset of 200 items. The resulting BPNN classification of music mood based on the MFCC features achieved 87.67% accuracy.
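A BPNN learns by propagating the output error backwards through the layers and nudging each weight matrix down its gradient. A minimal one-hidden-layer sketch (13 MFCC inputs and 4 mood outputs match the common setup but are illustrative, not the paper's exact architecture):

```python
import numpy as np

def bpnn_step(x, y, W1, W2, lr=0.1):
    """One backpropagation update for a 1-hidden-layer sigmoid network.
    Mutates W1 and W2 in place; returns the squared error before the update."""
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    h = sig(W1 @ x)                       # hidden activations
    o = sig(W2 @ h)                       # output activations (one per mood class)
    err_o = (o - y) * o * (1 - o)         # output-layer delta (MSE loss, sigmoid)
    err_h = (W2.T @ err_o) * h * (1 - h)  # hidden delta, propagated backwards
    W2 -= lr * np.outer(err_o, h)
    W1 -= lr * np.outer(err_h, x)
    return np.mean((o - y) ** 2)

rng = np.random.default_rng(1)
x = rng.standard_normal(13)                # one 13-dim MFCC feature vector
y = np.array([1.0, 0.0, 0.0, 0.0])         # one-hot target: first of 4 mood classes
W1 = rng.standard_normal((8, 13)) * 0.1    # input -> hidden weights
W2 = rng.standard_normal((4, 8)) * 0.1     # hidden -> output weights
losses = [bpnn_step(x, y, W1, W2) for _ in range(200)]
print(losses[-1] < losses[0])  # True -- the error decreases on this toy example
```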
APA, Harvard, Vancouver, ISO, and other styles
50

Altayeb, Muneera, and Areen Arabiat. "Crack detection based on mel-frequency cepstral coefficients features using multiple classifiers." International Journal of Electrical and Computer Engineering (IJECE) 14, no. 3 (2024): 3332–41. https://doi.org/10.11591/ijece.v14i3.pp3332-3341.

Full text
Abstract:
Crack detection plays an essential role in evaluating the strength of structures. In recent years, the use of machine learning and deep learning techniques combined with computer vision has emerged to assess the strength of structures and detect cracks. This research aims to use machine learning (ML) to create a crack detection model based on a dataset consisting of 2432 images of different surfaces that were divided into two groups: 70% of the training dataset and 30% of the testing dataset. The Orange3 data mining tool was used to build a crack detection model, where the support vector machine (SVM), gradient boosting (GB), naive Bayes (NB), and artificial neural network (ANN) were trained and verified based on 3 sets of features, mel-frequency cepstral coefficients (MFCC), delta MFCC (DMFCC), and delta-delta MFCC (DDMFCC) were extracted using MATLAB. The experimental results showed the superiority of SVM with a classification accuracy of (100%), while for NB the accuracy reached (93.9%-99.9%), and (99.9%) for ANN, and finally in GB the accuracy reached (99.8%).
APA, Harvard, Vancouver, ISO, and other styles