
Journal articles on the topic 'Mel Frequency Cepstral Coefficients (MFCC)'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 journal articles for your research on the topic 'Mel Frequency Cepstral Coefficients (MFCC).'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles on a wide variety of disciplines and organise your bibliography correctly.

1

Eskidere, Ömer, and Ahmet Gürhanlı. "Voice Disorder Classification Based on Multitaper Mel Frequency Cepstral Coefficients Features." Computational and Mathematical Methods in Medicine 2015 (2015): 1–12. http://dx.doi.org/10.1155/2015/956249.

Abstract:
Mel Frequency Cepstral Coefficients (MFCCs) are widely used to extract essential information from a voice signal and have become a popular feature extractor in audio processing. However, MFCC features are usually calculated from a single window (taper), which is characterized by large variance. This study investigates variance reduction for the classification of two voice qualities (normal voice and disordered voice) using multitaper MFCC features, and compares the performance of newly proposed windowing techniques with the conventional single-taper technique. The results demonstrate that the adapted weighted Thomson multitaper method distinguishes normal voice from disordered voice better than the conventional single-taper (Hamming window) technique and the two newly proposed windowing methods. Multitaper MFCC features may thus be helpful in identifying voices at risk for a real pathology that has to be proven later.
2

Varma, V. Sai Nitin, and Abdul Majeed K.K. "Advancements in Speaker Recognition: Exploring Mel Frequency Cepstral Coefficients (MFCC) for Enhanced Performance in Speaker Recognition." International Journal for Research in Applied Science and Engineering Technology 11, no. 8 (August 31, 2023): 88–98. http://dx.doi.org/10.22214/ijraset.2023.55124.

Abstract:
Speaker recognition, a fundamental capability of software or hardware systems, involves receiving speech signals, identifying the speaker present in the speech signal, and subsequently recognizing the speaker for future interactions. This process emulates the cognitive task performed by the human brain. At its core, speaker recognition begins with speech as the input to the system. Various techniques have been developed for speech recognition, including Mel frequency cepstral coefficients (MFCC), Linear Prediction Coefficients (LPC), Linear Prediction Cepstral coefficients (LPCC), Line Spectral Frequencies (LSF), Discrete Wavelet Transform (DWT), and Perceptual Linear Prediction (PLP). Although LPC and several other techniques have been explored, they are often deemed impractical for real-time applications. In contrast, MFCC stands out as one of the most prominent and widely used techniques for speaker recognition. The utilization of cepstrum allows for the computation of resemblance between two cepstral feature vectors, making it an effective tool in this domain. In comparison to LPC-derived cepstrum features, the use of MFCC features has demonstrated superior performance in metrics such as False Acceptance Rate (FAR) and False Rejection Rate (FRR) for speaker recognition systems. MFCCs leverage the human ear's critical bandwidth fluctuations with respect to frequency. To capture phonetically important characteristics of speech signals, filters are linearly separated at low frequencies and logarithmically separated at high frequencies. This design choice is central to the effectiveness of the MFCC technique. The primary objective of the proposed work is to devise efficient techniques that extract pertinent information related to the speaker, thereby enhancing the overall performance of the speaker recognition system. By optimizing feature extraction methods, this research aims to contribute to the advancement of speaker recognition technology.
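The resemblance between two cepstral feature vectors mentioned in this abstract is often scored with a simple geometric distance. A minimal sketch, assuming a Euclidean metric over toy vectors (illustrative values, not the paper's exact measure):

```python
import numpy as np

def cepstral_distance(c1, c2):
    """Euclidean distance between two cepstral feature vectors.

    Smaller values mean more similar short-term spectral envelopes;
    recognition systems use such distances (or statistics over them)
    to score how closely two utterances resemble each other.
    """
    c1 = np.asarray(c1, dtype=float)
    c2 = np.asarray(c2, dtype=float)
    return float(np.linalg.norm(c1 - c2))

# Toy 4-dimensional cepstral vectors (illustrative values only).
a = [1.0, 0.5, -0.2, 0.1]
b = [0.9, 0.6, -0.1, 0.0]
print(cepstral_distance(a, b))                        # small: similar
print(cepstral_distance(a, [-1.0, -0.5, 0.2, -0.1]))  # larger: dissimilar
```

Real systems typically aggregate such frame-level distances, or replace them with statistical models, but the underlying comparison is of this form.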
3

Kasim, Anita Ahmad, Muhammad Bakri, Irwan Mahmudi, Rahmawati Rahmawati, and Zulnabil Zulnabil. "Artificial Intelligent for Human Emotion Detection with the Mel-Frequency Cepstral Coefficient (MFCC)." JUITA : Jurnal Informatika 11, no. 1 (May 6, 2023): 47. http://dx.doi.org/10.30595/juita.v11i1.15435.

Abstract:
Emotions are an important aspect of human communication, and human emotions can be identified through the voice. Voice detection, or speech recognition, is a technology that has developed rapidly to improve human-machine interaction. This study aims to classify emotions through the detection of human voices. One of the most frequently used methods for sound detection is the Mel-Frequency Cepstral Coefficient (MFCC), in which sound waves are converted into several types of representation. Mel-frequency cepstral coefficients (MFCCs) are coefficients that collectively represent the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency. The primary data used in this research were recorded by the authors; the secondary data come from the "Berlin Database of Emotional Speech", comprising 500 voice recordings. MFCC can extract implicit information from the human voice, in particular the feelings a person experiences while speaking. In this study, the highest accuracy, 85%, was obtained when training for 10,000 epochs.
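The definition quoted in this abstract (a linear cosine transform of a log power spectrum on a mel scale) can be sketched directly: given one frame's mel filterbank energies, the MFCCs are the first few DCT-II coefficients of their logarithm. The filter count (26) and coefficient count (13) below are common defaults, not values from the paper:

```python
import numpy as np

def mfcc_from_mel_energies(mel_energies, n_mfcc=13):
    """Return the first `n_mfcc` MFCCs of one frame.

    Implements the quoted definition directly: take the log of the
    mel filterbank energies, then apply an orthonormal DCT-II
    (the "linear cosine transform").
    """
    e = np.log(np.maximum(mel_energies, 1e-10))  # floored log power
    n = e.size
    k = np.arange(n_mfcc)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n)) * np.sqrt(2.0 / n)
    basis[0] /= np.sqrt(2.0)  # orthonormal scaling of the DC row
    return basis @ e

# One frame of 26 made-up mel-band energies (illustrative only).
frame = np.abs(np.random.default_rng(0).normal(size=26)) + 1.0
print(mfcc_from_mel_energies(frame).shape)  # (13,)
```

A flat (constant) band spectrum yields energy only in coefficient 0, which is why the low-order coefficients summarize the spectral envelope.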
4

Chen, Young-Long, Neng-Chung Wang, Jing-Fong Ciou, and Rui-Qi Lin. "Combined Bidirectional Long Short-Term Memory with Mel-Frequency Cepstral Coefficients Using Autoencoder for Speaker Recognition." Applied Sciences 13, no. 12 (June 10, 2023): 7008. http://dx.doi.org/10.3390/app13127008.

Abstract:
Recently, neural network technology has shown remarkable progress in speech recognition, including word classification, emotion recognition, and identity recognition. This paper introduces three novel speaker recognition methods to improve accuracy. The first method, called long short-term memory with mel-frequency cepstral coefficients for triplet loss (LSTM-MFCC-TL), utilizes MFCC as input features for the LSTM model and incorporates triplet loss and cluster training for effective training. The second method, bidirectional long short-term memory with mel-frequency cepstral coefficients for triplet loss (BLSTM-MFCC-TL), enhances speaker recognition accuracy by employing a bidirectional LSTM model. The third method, bidirectional long short-term memory with mel-frequency cepstral coefficients and autoencoder features for triplet loss (BLSTM-MFCCAE-TL), utilizes an autoencoder to extract additional AE features, which are then concatenated with MFCC and fed into the BLSTM model. The results showed that the performance of the BLSTM model was superior to the LSTM model, and the method of adding AE features achieved the best learning effect. Moreover, the proposed methods exhibit faster computation times compared to the reference GMM-HMM model. Therefore, utilizing pre-trained autoencoders for speaker encoding and obtaining AE features can significantly enhance the learning performance of speaker recognition. Additionally, it also offers faster computation time compared to traditional methods.
5

Koolagudi, Shashidhar G., Deepika Rastogi, and K. Sreenivasa Rao. "Identification of Language using Mel-Frequency Cepstral Coefficients (MFCC)." Procedia Engineering 38 (2012): 3391–98. http://dx.doi.org/10.1016/j.proeng.2012.06.392.

6

H. Mohd Johari, N., Noreha Abdul Malik, and K. A. Sidek. "Distinctive features for normal and crackles respiratory sounds using cepstral coefficients." Bulletin of Electrical Engineering and Informatics 8, no. 3 (September 1, 2019): 875–81. http://dx.doi.org/10.11591/eei.v8i3.1517.

Abstract:
Classification of respiratory sounds as normal or abnormal is crucial for screening and diagnosis purposes, since lung-associated diseases can be detected through this technique. With the advancement of computerized auscultation technology, adventitious sounds such as crackles can be detected, and diagnostic tests can therefore be performed earlier. In this paper, Linear Predictive Cepstral Coefficients (LPCC) and Mel-frequency Cepstral Coefficients (MFCC) are used to extract features from normal and crackle respiratory sounds. Statistics such as the mean and standard deviation (SD) of the cepstral coefficients can differentiate crackles from normal sounds. These computations show that the mean LPCCs (except for the third coefficient) and the SDs of the first three MFCCs provide distinctive features between normal and crackle respiratory sounds. Hence, LPCCs and MFCCs can be used as feature extraction methods to classify respiratory sounds as normal or crackles, as a screening and diagnostic tool.
7

INDRAWATY, YOULLIA, IRMA AMELIA DEWI, and RIZKI LUKMAN. "Ekstraksi Ciri Pelafalan Huruf Hijaiyyah Dengan Metode Mel-Frequency Cepstral Coefficients." MIND Journal 4, no. 1 (June 1, 2019): 49–64. http://dx.doi.org/10.26760/mindjournal.v4i1.49-64.

Abstract:
Hijaiyyah letters are the letters that make up the verses of the Qur'an. Each hijaiyyah letter has distinct pronunciation characteristics. In practice, however, readers of hijaiyyah letters sometimes ignore the rules of makhorijul huruf, i.e. how and where each hijaiyyah letter is articulated. With speech recognition technology, differences in the pronunciation of hijaiyyah letters can be observed quantitatively through a system. Two stages are needed for a voice to be recognized: first, features are extracted from the speech signal, and then the voice or recitation is identified. MFCC (Mel Frequency Cepstral Coefficients) is a feature extraction method that produces cepstral values from a speech signal. This study aims to determine the cepstral values of each hijaiyyah letter. The test results show that each hijaiyyah letter has different cepstral values.
8

Mahalakshmi, P. "A REVIEW ON VOICE ACTIVITY DETECTION AND MEL-FREQUENCY CEPSTRAL COEFFICIENTS FOR SPEAKER RECOGNITION (TREND ANALYSIS)." Asian Journal of Pharmaceutical and Clinical Research 9, no. 9 (December 1, 2016): 360. http://dx.doi.org/10.22159/ajpcr.2016.v9s3.14352.

Abstract:
Objective: The objective of this review article is to give a complete review of the various techniques that have been used for speech recognition purposes over two decades. Methods: Voice Activity Detection (VAD) and Speech Activity Detection (SAD) techniques, which distinguish voiced from unvoiced signals, are discussed, along with the Mel Frequency Cepstral Coefficient (MFCC) technique, which detects specific features. Results: The review shows that research on MFCC has been dominant in signal processing in comparison to VAD and other existing techniques. Conclusion: Speaker recognition techniques used previously and those in current research are compared, and a clear idea of the better technique is obtained through a review of the literature spanning more than two decades. Keywords: Cepstral analysis, Mel-frequency cepstral coefficients, signal processing, speaker recognition, voice activity detection.
9

Dadula, Cristina P., and Elmer P. Dadios. "Fuzzy Logic System for Abnormal Audio Event Detection Using Mel Frequency Cepstral Coefficients." Journal of Advanced Computational Intelligence and Intelligent Informatics 21, no. 2 (March 15, 2017): 205–10. http://dx.doi.org/10.20965/jaciii.2017.p0205.

Abstract:
This paper presents a fuzzy logic system for audio event detection using mel frequency cepstral coefficients (MFCC). Twelve MFCCs of audio samples were analyzed; their ranges of values, including histograms, were obtained and normalized so that the minimum and maximum values lie between 0 and 1. Rules were formulated based on the histograms to classify audio samples as normal, gunshot, or crowd panic. Five MFCCs were chosen as input to the fuzzy logic system, whose membership functions and rules are defined based on the normalized MFCC histograms. The system was tested with a total of 150 minutes of normal sounds from different buses and 72 seconds of audio clips of abnormal sounds. The designed fuzzy logic system was able to classify audio events with an average accuracy of 99.4%.
10

Ramashini, Murugaiya, P. Emeroylariffion Abas, Kusuma Mohanchandra, and Liyanage C. De Silva. "Robust cepstral feature for bird sound classification." International Journal of Electrical and Computer Engineering (IJECE) 12, no. 2 (April 1, 2022): 1477. http://dx.doi.org/10.11591/ijece.v12i2.pp1477-1487.

Abstract:
Birds are excellent environmental indicators and may indicate the sustainability of an ecosystem; they provide provisioning, regulating, and supporting services. Birdlife conservation research therefore always takes centre stage. Due to the airborne nature of birds and the density of tropical forest, identifying birds by audio may be a better solution than visual identification. The goal of this study is to find the most appropriate cepstral features for classifying bird sounds more accurately. Fifteen (15) endemic Bornean bird sounds were selected and segmented using an automated energy-based algorithm. Three (3) types of cepstral features were extracted: linear prediction cepstrum coefficients (LPCC), mel frequency cepstral coefficients (MFCC), and gammatone frequency cepstral coefficients (GTCC), each used separately for classification with a support vector machine (SVM). Comparison of their prediction results demonstrates that the model utilising GTCC features, with 93.3% accuracy, outperforms the models utilising MFCC and LPCC features, which shows the robustness of GTCC for bird sound classification. The result is significant for the advancement of bird sound classification research, which has many applications, such as eco-tourism and wildlife management.
11

Anacleto Silva, Harry. "ATRIBUTOS PNCC PARA RECONOCIMIENTO ROBUSTO DE LOCUTOR INDEPENDIENTE DEL TEXTO." INGENIERÍA: Ciencia, Tecnología e Innovación 3, no. 2 (September 12, 2016): 35–40. http://dx.doi.org/10.26495/icti.v3i2.431.

Abstract:
Automatic speaker recognition has been the subject of intense research throughout the past decade. However, the performance of state-of-the-art algorithms degrades drastically in the presence of noise. This article focuses on the application of a new technique called Power-Normalized Cepstral Coefficients (PNCC) to text-independent speaker recognition. The aim of this study is to evaluate the characteristics of this technique in comparison with the conventional Mel Frequency Cepstral Coefficients (MFCC) technique and the Gammatone Frequency Cepstral Coefficients (GFCC) technique.
12

Sasilo, Ababil Azies, Rizal Adi Saputra, and Ika Purwanti Ningrum. "Sistem Pengenalan Suara Dengan Metode Mel Frequency Cepstral Coefficients Dan Gaussian Mixture Model." Komputika : Jurnal Sistem Komputer 11, no. 2 (August 25, 2022): 203–10. http://dx.doi.org/10.34010/komputika.v11i2.6655.

Abstract:
Biometric technology is becoming a technology trend in many areas of life. It uses parts of the human body, unique to each individual, as the measurement for a system. The voice is a part of the human body that is unique and well suited as a measurement in systems adopting biometric technology, and a voice recognition system is one application of biometric technology that focuses on the human voice. Such a system requires a feature extraction method and a classification method; one feature extraction method is MFCC. MFCC proceeds through the stages of pre-emphasis, frame blocking, windowing, fast Fourier transform, mel frequency wrapping, and cepstrum, while classification uses a GMM, computing the likelihood of similarity between voices. Based on the test results, the MFCC-GMM method achieves an accuracy of 82.22% under ideal conditions and 66.67% under non-ideal conditions. Keywords: Voice, Recognition, MFCC, GMM, System
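The GMM classification step described in this abstract scores a voice by its likelihood under each speaker's mixture model. A minimal sketch of diagonal-covariance GMM scoring (the model parameters here are illustrative stand-ins, not trained values from the paper):

```python
import numpy as np

def gmm_log_likelihood(x, weights, means, variances):
    """Average log-likelihood of frames `x` under a diagonal-covariance GMM.

    x:         (n_frames, n_dims) MFCC feature matrix
    weights:   (n_components,) mixture weights summing to 1
    means:     (n_components, n_dims)
    variances: (n_components, n_dims) diagonal covariances
    The speaker model with the highest average log-likelihood is the match.
    """
    x = np.asarray(x, dtype=float)[:, None, :]            # (frames, 1, dims)
    log_norm = -0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)
    log_exp = -0.5 * np.sum((x - means) ** 2 / variances, axis=2)
    log_comp = np.log(weights) + log_norm + log_exp       # (frames, comps)
    m = log_comp.max(axis=1, keepdims=True)               # log-sum-exp trick
    ll = m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))
    return float(ll.mean())

# Stand-in MFCC frames and a made-up 2-component speaker model.
rng = np.random.default_rng(0)
test_frames = rng.normal(size=(100, 13))
weights = np.array([0.6, 0.4])
means = np.stack([np.zeros(13), np.ones(13)])
variances = np.ones((2, 13))
print(gmm_log_likelihood(test_frames, weights, means, variances))
```

In a full system the parameters are fitted per speaker (e.g. by expectation-maximization), and a test utterance is assigned to the speaker whose model scores it highest.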
13

Almanfaluti, Istian Kriya, and Judi Prajetno Sugiono. "Identifikasi Pola Suara Pada Bahasa Jawa Meggunakan Mel Frequency Cepstral Coefficients (MFCC)." JURNAL MEDIA INFORMATIKA BUDIDARMA 4, no. 1 (January 29, 2020): 22. http://dx.doi.org/10.30865/mib.v4i1.1793.

Abstract:
Voice recognition is a process of developing systems for interaction between computers and humans. The purpose of this study is to identify a person's sound pattern based on spoken Javanese. The study used the Mel Frequency Cepstral Coefficients (MFCC) method to extract features from human voices. Tests were carried out on four users, two women and two men, each saying the single word "KUTHO" five times. Testing yields a distinct sound pattern for each person, showing that the MFCC method can produce distinguishing sound patterns.
14

Yan, Hao, Huajun Bai, Xianbiao Zhan, Zhenghao Wu, Liang Wen, and Xisheng Jia. "Combination of VMD Mapping MFCC and LSTM: A New Acoustic Fault Diagnosis Method of Diesel Engine." Sensors 22, no. 21 (October 30, 2022): 8325. http://dx.doi.org/10.3390/s22218325.

Abstract:
Diesel engines have a wide range of functions in the industrial and military fields. An urgent problem is how to diagnose and identify their faults effectively and in a timely manner. In this paper, a diesel engine acoustic fault diagnosis method based on variational mode decomposition (VMD) mapping of Mel frequency cepstral coefficients (MFCC) and a long short-term memory (LSTM) network is proposed. VMD is used to remove noise from the original signal and to decompose the signal into multiple modes. The sound pressure signals of the different modes are mapped to the Mel filter bank in the frequency domain, the Mel frequency cepstral coefficients of the respective mode signals are calculated within the mapped frequency range, and the optimized coefficients are used as the input of the LSTM network, which is trained and validated to obtain the fault diagnosis model of the diesel engine. The experimental part compares the fault diagnosis effects of different feature extraction methods, modal decomposition methods, and classifiers, finally verifying the feasibility and effectiveness of the proposed method and providing a solution to the problem of fault diagnosis using acoustic signals.
15

Heriyanto, Heriyanto, Tenia Wahyuningrum, and Gita Fadila Fitriana. "Classification of Javanese Script Hanacara Voice Using Mel Frequency Cepstral Coefficient MFCC and Selection of Dominant Weight Features." JURNAL INFOTEL 13, no. 2 (May 30, 2021): 84–93. http://dx.doi.org/10.20895/infotel.v13i2.657.

Abstract:
This study investigates the sound of Hanacaraka in Javanese to select the best frame feature for checking reading aloud. Selecting the right frame feature is needed in speech recognition because certain frames have accuracy at their dominant weight, so it is necessary to match the frames with the best accuracy. A common and widely used feature extraction model is the Mel Frequency Cepstral Coefficient (MFCC), which alone has an accuracy of 50% to 60%. This research uses MFCC together with Dominant Weight feature selection for the Javanese Hanacaraka script sound, producing frames and cepstral coefficients as the extracted features. Cepstral coefficients 0 to 23 (24 coefficients) were used, and the captured frames consist of frames 0 to 10 (eleven frames). A total of 300 voice recordings, of both male and female voices, were tested at a sampling rate of 44,100 Hz (16-bit stereo). The accuracy results show that the MFCC method with selection of the ninth frame has a higher accuracy rate, 86%, than the other frames.
16

Noda, Juan J., Carlos M. Travieso-González, David Sánchez-Rodríguez, and Jesús B. Alonso-Hernández. "Acoustic Classification of Singing Insects Based on MFCC/LFCC Fusion." Applied Sciences 9, no. 19 (October 1, 2019): 4097. http://dx.doi.org/10.3390/app9194097.

Abstract:
This work introduces a new approach for the automatic identification of crickets, katydids, and cicadas by analyzing their acoustic signals, and we propose building a tool to identify this biodiversity. The study proposes a sound parameterization technique designed specifically for the identification and classification of acoustic signals of insects using Mel Frequency Cepstral Coefficients (MFCC) and Linear Frequency Cepstral Coefficients (LFCC). These two sets of coefficients are evaluated individually, as in previous studies, and compared with the fusion proposed in this work, which shows an outstanding increase in identification and classification at the species level, reaching a success rate of 98.07% on 343 insect species.
17

Li, Guan Yu, Hong Zhi Yu, Yong Hong Li, and Ning Ma. "Features Extraction for Lhasa Tibetan Speech Recognition." Applied Mechanics and Materials 571-572 (June 2014): 205–8. http://dx.doi.org/10.4028/www.scientific.net/amm.571-572.205.

Abstract:
Speech feature extraction is discussed. The Mel frequency cepstral coefficient (MFCC) and perceptual linear prediction coefficient (PLP) methods are analyzed. These two types of features are extracted in a Lhasa Tibetan large-vocabulary continuous speech recognition system, and the recognition results are compared.
18

Helmiyah, Siti, Abdul Fadlil, and Anton Yudhana. "Pengenalan Pola Emosi Manusia Berdasarkan Ucapan Menggunakan Ekstraksi Fitur Mel-Frequency Cepstral Coefficients (MFCC)." CogITo Smart Journal 4, no. 2 (February 8, 2019): 372. http://dx.doi.org/10.31154/cogito.v4i2.129.372-381.

Abstract:
Human emotion recognition is important because of its use in daily life, which requires human-computer interaction. It is a complex problem because of differences in customs and dialects across ethnic groups, regions, and communities, and it is exacerbated by the difficulty of objectively assessing emotion, since emotion arises unconsciously. This research conducts an experiment to discover patterns of emotion based on features extracted from speech. The feature extraction method used is the Mel-Frequency Cepstral Coefficient (MFCC), which models the human hearing system. The dataset is the Berlin Database of Emotional Speech (Emo-DB), and the emotions used are happiness, boredom, neutral, sadness, and anger, with three Emo-DB samples taken per emotion. The emotion patterns become visible using specific MFCC parameter values: a frame duration of 25, a frame shift of 10, a pre-emphasis coefficient of 0.97, 20 filterbank channels, and 12 cepstral coefficients. The MFCC features are extracted, their mean values are calculated and plotted over time, and the plots are examined for the specific pattern of each emotion. Keywords: Emotion, Speech, Mel-Frequency Cepstral Coefficients (MFCC).
19

Vashkevich, M. I., D. S. Likhachov, and E. S. Azarov. "Voice Analysis and Classification System Based on Perturbation Parameters and Cepstral Presentation in Psychoacoustic Scales." Doklady BGUIR 20, no. 1 (March 1, 2022): 73–82. http://dx.doi.org/10.35596/1729-7648-2022-20-1-73-82.

Abstract:
The paper describes an approach to designing a system for the analysis and classification of a voice signal based on perturbation parameters and a cepstral representation. Two variants of the cepstral representation of the voice signal are considered: one based on mel-frequency cepstral coefficients (MFCC) and one based on bark-frequency cepstral coefficients (BFCC). The work used the generally accepted approach to calculating the MFCC, based on time-frequency analysis by the discrete Fourier transform (DFT) with summation of energy in subbands. This method approximates the frequency resolution of human hearing but has a fixed temporal resolution. As an alternative, a cepstral representation based on the BFCC has been proposed; when calculating the BFCC, a warped DFT-modulated filter bank is used, which approximates both the frequency and temporal resolution of hearing. The aim of the work was to compare the effectiveness of features based on the MFCC and BFCC for designing systems for the analysis and classification of the voice signal. The experiment showed that using acoustic features based on the MFCC yields a voice classification system with an average recall of 80.6%, while features based on the BFCC yield 83.7%. Supplementing the MFCC feature set with perturbation parameters of the voice increased the average recall of the classification to 94.1%; a similar addition to the BFCC feature set increased it to 96.7%.
20

Alasadi, A. A., T. H. Aldhayni, R. R. Deshmukh, A. H. Alahmadi, and A. S. Alshebami. "Efficient Feature Extraction Algorithms to Develop an Arabic Speech Recognition System." Engineering, Technology & Applied Science Research 10, no. 2 (April 4, 2020): 5547–53. http://dx.doi.org/10.48084/etasr.3465.

Abstract:
This paper studies three feature extraction methods, Mel-Frequency Cepstral Coefficients (MFCC), Power-Normalized Cepstral Coefficients (PNCC), and the Modified Group Delay Function (ModGDF), for the development of an Automated Speech Recognition System (ASR) in Arabic. The Support Vector Machine (SVM) algorithm processed the obtained features. These feature extraction algorithms extract speech or voice characteristics, with ModGDF processing the group delay function calculated directly from the voice signal, and were deployed to extract audio features from Arabic speakers. PNCC provided the best recognition results for Arabic speech in comparison with the other methods; simulation results showed that PNCC and ModGDF were more accurate than MFCC in Arabic speech recognition.
21

Astuti, Dwi. "Aplikasi Identifikasi Suara Hewan Menggunakan Metode Mel-Frequency Cepstral Coefficients (MFCC)." Journal of Informatics, Information System, Software Engineering and Applications (INISTA) 1, no. 2 (May 30, 2019): 26–34. http://dx.doi.org/10.20895/inista.v1i2.50.

Abstract:
Voice recognition falls under the field of computational linguistics. It covers the identification, recognition, and translation of detected speech into text by a computer. This research uses a mobile phone and a system designed around voice input. The main goal of the research is to use voice recognition techniques to detect, identify, and translate animal sounds. The system consists of two stages: training and testing. Training teaches the system by building a dictionary and an acoustic model for every word the system needs to recognize (offline analysis). The testing stage uses the acoustic model to recognize isolated words with a classification algorithm. An audio-storage application for identifying various animal sounds could then be built more accurately in the future.
22

Zhu, Qiang, Zhong Wang, Yunfeng Dou, and Jian Zhou. "Whispered Speech Conversion Based on the Inversion of Mel Frequency Cepstral Coefficient Features." Algorithms 15, no. 2 (February 20, 2022): 68. http://dx.doi.org/10.3390/a15020068.

Abstract:
A conversion method based on the inversion of Mel frequency cepstral coefficient (MFCC) features was proposed to convert whispered speech into normal speech. First, the MFCC features of whispered speech and normal speech were extracted and a matching relation between the MFCC feature parameters of whispered speech and normal speech was developed through the Gaussian mixture model (GMM). Then, the MFCC feature parameters of normal speech corresponding to whispered speech were obtained based on the GMM and, finally, whispered speech was converted into normal speech through the inversion of MFCC features. The experimental results showed that the cepstral distortion (CD) of the normal speech converted by the proposed method was 21% less than that of the normal speech converted by the linear predictive coefficient (LPC) features, the mean opinion score (MOS) was 3.56, and a satisfactory outcome in both intelligibility and sound quality was achieved.
23

Heriyanto, Heriyanto, and Dyah Ayu Irawati. "Comparison of Mel Frequency Cepstral Coefficient (MFCC) Feature Extraction, With and Without Framing Feature Selection, to Test the Shahada Recitation." RSF Conference Series: Engineering and Technology 1, no. 1 (December 23, 2021): 335–54. http://dx.doi.org/10.31098/cset.v1i1.395.

Abstract:
This voice research uses MFCC for feature extraction as the first step to obtain features, which are then refined through feature selection. The feature selection in this research used the Dominant Weight feature for the Shahada voice, producing frames and cepstral coefficients as the extracted features. Cepstral coefficients 0 to 23 (24 coefficients) were used, while the frames taken consist of frames 0 to 10 (eleven frames). A total of 300 recorded voice samples were tested against 200 recordings of both male and female voices, at a sampling rate of 44,100 Hz (16-bit stereo). This research aimed to gain accuracy by selecting the right frame features using MFCC feature extraction and matching accuracy with frame feature selection using Dominant Weight Normalization (NBD). The results show that the MFCC method with selection of the ninth frame had a higher accuracy rate, 86%, than the other frames, while MFCC without feature selection averaged 60%. The conclusion is that selecting the right features in the ninth frame improves the accuracy of checking the recitation of the shahada.
APA, Harvard, Vancouver, ISO, and other styles
24

Heriyanto, Heriyanto, Sri Hartati, and Agfianto Eko Putra. "EKSTRAKSI CIRI MEL FREQUENCY CEPSTRAL COEFFICIENT (MFCC) DAN RERATA COEFFICIENT UNTUK PENGECEKAN BACAAN AL-QUR’AN." Telematika 15, no. 2 (October 31, 2018): 99. http://dx.doi.org/10.31315/telematika.v15i2.3123.

Full text
Abstract:
Learning to read the Qur'an using an application aid is very helpful for easing and understanding Qur'anic recitation. One method for checking Qur'anic recitation is MFCC, which performs quite well in speech recognition. The method was introduced by Davis and Mermelstein around 1980. MFCC is a feature extraction method for obtaining cepstral coefficients and frames so that it can be used in speech recognition processing for better accuracy. The MFCC stages run from pre-emphasis, frame blocking, and windowing through the Fast Fourier Transform (FFT), Mel Frequency Warping (MFW), Discrete Cosine Transform (DCT), and cepstral liftering. The Qur'an recitation check was tested on eleven surahs, from Al-Fatihah, Al-Baqarah, Al-Imran, Al-Hadid, Al-Ashr, Ar-Rahman, Al-Alaq, Al-Kautsar, and Al-Ikhlas to Al-Falaq and An-Nas, yielding an average accuracy of 51.8%. Keywords: Voice, Recitation, MFCC, Matching, Feature Extraction, Reference, Weight, Dominant.
APA, Harvard, Vancouver, ISO, and other styles
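The MFCC stages enumerated in the abstract above (pre-emphasis, frame blocking, windowing, FFT, mel-frequency warping, DCT, cepstral liftering) can be sketched in plain NumPy. The sizes chosen here (26 mel filters, 13 coefficients, lifter parameter 22) are common textbook defaults, not values taken from the paper:

```python
import numpy as np

def mfcc(signal, sr=44100, n_fft=512, frame_len=400, hop=160,
         n_mels=26, n_ceps=13):
    """Illustrative MFCC pipeline following the stages named above."""
    # 1. Pre-emphasis: boost high frequencies
    emph = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # 2. Frame blocking + 3. Hamming windowing
    n_frames = 1 + max(0, (len(emph) - frame_len) // hop)
    frames = np.stack([emph[i * hop:i * hop + frame_len] for i in range(n_frames)])
    frames *= np.hamming(frame_len)
    # 4. FFT -> power spectrum
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # 5. Mel-frequency warping: triangular filter bank on the mel scale
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(mel(0), mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * inv_mel(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for j in range(1, n_mels + 1):
        l, c, r = bins[j - 1], bins[j], bins[j + 1]
        fbank[j - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[j - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    log_energy = np.log(power @ fbank.T + 1e-10)
    # 6. DCT-II decorrelates the log filter-bank energies
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2.0 * n_mels)))
    ceps = log_energy @ dct.T
    # 7. Cepstral liftering: re-weight the coefficients
    lift = 1 + 11 * np.sin(np.pi * np.arange(n_ceps) / 22.0)
    return ceps * lift
```

The output is a (frames x coefficients) matrix, which matches the abstract's description of features as frames plus cepstral coefficients.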
25

Huizen, Roy Rudolf, and Florentina Tatrin Kurniati. "Feature extraction with mel scale separation method on noise audio recordings." Indonesian Journal of Electrical Engineering and Computer Science 24, no. 2 (November 1, 2021): 815. http://dx.doi.org/10.11591/ijeecs.v24.i2.pp815-824.

Full text
Abstract:
This paper focuses on improving the accuracy of noisy audio recordings. For high-quality audio recordings, extraction using the mel frequency cepstral coefficients (MFCC) method produces high accuracy, while for low-quality recordings, accuracy is low because of noise. Accuracy is improved by investigating the effect of bandwidth on the mel scale. The proposed improvement separates the mel scale into two frequency channels (MFCC dual-channel); the comparison method uses the mel scale bandwidth without separation (MFCC single-channel). Feature analysis uses k-means clustering. The data use noise variance down to -16 dB. Testing the MFCC single-channel method at -16 dB noise gives an accuracy of 47.5%, while the MFCC dual-channel method gives a better accuracy of 76.25%. The next test used adaptive noise-canceling (ANC) to reduce noise before extraction; the MFCC single-channel method then has an accuracy of 82.5% and the MFCC dual-channel method a better accuracy of 83.75%. In high-quality audio recording testing, the MFCC single-channel method has an accuracy of 92.5% and the MFCC dual-channel method a better accuracy of 97.5%. The test results show the effect of mel scale bandwidth on increasing accuracy: the MFCC dual-channel method has higher accuracy.
APA, Harvard, Vancouver, ISO, and other styles
26

Bhalke, Daulappa Guranna, C. B. Rama Rao, and Dattatraya Bormane. "Hybridisation of Mel Frequency Cepstral Coefficient and Higher Order Spectral Features for Musical Instruments Classification." Archives of Acoustics 41, no. 3 (September 1, 2016): 427–36. http://dx.doi.org/10.1515/aoa-2016-0042.

Full text
Abstract:
Abstract This paper presents the classification of musical instruments using Mel Frequency Cepstral Coefficients (MFCC) and Higher Order Spectral features. MFCC, cepstral, temporal, spectral, and timbral features have been widely used in the task of musical instrument classification. As musical sound signals are generated by non-linear dynamics, the non-linearity and non-Gaussianity of musical instruments are important features which have not been considered in the past. In this paper, a hybridisation of MFCC and Higher Order Spectral (HOS) based features has been used in the task of musical instrument classification. HOS-based features have been used to provide instrument-specific information such as the non-Gaussianity and non-linearity of the musical instruments. The extracted features have been presented to a Counter Propagation Neural Network (CPNN) to identify the instruments and their family. For experimentation, isolated sounds of 19 musical instruments have been used from the McGill University Master Sample (MUMS) sound database. The proposed features show a significant improvement in the classification accuracy of the system.
APA, Harvard, Vancouver, ISO, and other styles
27

Umar, Rusydi, Imam Riadi, and Abdullah Hanif. "Analisis Bentuk Pola Suara Menggunakan Ekstraksi Ciri Mel-Frequencey Cepstral Coefficients (MFCC)." CogITo Smart Journal 4, no. 2 (January 16, 2019): 294. http://dx.doi.org/10.31154/cogito.v4i2.130.294-304.

Full text
Abstract:
Sound is a part of the human body that is unique and distinguishable, so it can be applied in voice pattern recognition technology, one use of which is voice biometrics. This study discusses the analysis of the form of a sound pattern, aiming to determine the shape of the sound pattern of a person's character based on spoken voice input. This study uses the Mel-Frequency Cepstrum Coefficients (MFCC) method for the feature extraction process from speaker speech signals. The MFCC process converts the sound signal into several feature vectors which are then displayed in graphical form. Analysis and design of sound patterns used Matlab 2017a software. Tests were carried out on 5 users consisting of 3 men and 2 women; each user said the predetermined word "LOGIN", for a total of 15 spoken words. The result of the test is the form of the sound pattern distinguishing the characteristics of one user from the other users. Keywords—Voice, Pattern, Feature Extraction, MFCC
APA, Harvard, Vancouver, ISO, and other styles
28

Singh, Moirangthem Tiken. "Automatic Speech Recognition System: A Survey Report." Science & Technology Journal 4, no. 2 (July 1, 2016): 152–55. http://dx.doi.org/10.22232/stj.2016.04.02.10.

Full text
Abstract:
This paper presents a report on Automatic Speech Recognition (ASR) systems for different Indian languages under different accents. The paper is a comparative study of the performance of the systems developed, which use a Hidden Markov Model (HMM) as the classifier and Mel-Frequency Cepstral Coefficients (MFCC) as speech features.
APA, Harvard, Vancouver, ISO, and other styles
29

Mohd Ali, Yusnita, Emilia Noorsal, Nor Fadzilah Mokhtar, Siti Zubaidah Md Saad, Mohd Hanapiah Abdullah, and Lim Chee Chin. "Speech-based gender recognition using linear prediction and mel-frequency cepstral coefficients." Indonesian Journal of Electrical Engineering and Computer Science 28, no. 2 (November 1, 2022): 753. http://dx.doi.org/10.11591/ijeecs.v28.i2.pp753-761.

Full text
Abstract:
Gender discrimination and awareness are essentially practiced in social, education, workplace, and economic sectors across the globe. A person manifests this attribute naturally in gait, body gesture, and face, as well as in speech. For that reason, automatic gender recognition (AGR) has become an interesting sub-topic in speech recognition systems that can be found in many speech technology applications. However, retrieving salient gender-related information from a speech signal is a challenging problem, since speech contains abundant information apart from gender. The paper intends to compare the performance of a human vocal tract-based model, i.e., linear prediction coefficients (LPC), and a human auditory-based model, i.e., Mel-frequency cepstral coefficients (MFCC), which are popularly used in other speech recognition tasks, through experimentation with feature parameters and classifier parameters. The audio data used in this study were obtained from 93 speakers uttering selected words with different vowels. The two feature vectors were tested using two classification algorithms, namely discriminant analysis (DA) and artificial neural network (ANN). Although the experimental results were promising with both feature sets, the best overall accuracy rate of 97.07% was recorded using MFCC-ANN techniques, with almost equal performance for the male and female classes.
APA, Harvard, Vancouver, ISO, and other styles
30

Mengistu, Abrham Debasu, and Dagnachew Melesew Alemayehu. "Text Independent Amharic Language Speaker Identification in Noisy Environments using Speech Processing Techniques." Indonesian Journal of Electrical Engineering and Computer Science 5, no. 1 (January 1, 2017): 109. http://dx.doi.org/10.11591/ijeecs.v5.i1.pp109-114.

Full text
Abstract:
<p>In Ethiopia, the largest ethnic and linguistic groups are the Oromos, Amharas, and Tigrayans. This paper presents the performance analysis of a text-independent speaker identification system for the Amharic language in noisy environments. VQ (Vector Quantization), GMM (Gaussian Mixture Models), BPNN (Back Propagation Neural Network), MFCC (Mel-Frequency Cepstrum Coefficients), GFCC (Gammatone Frequency Cepstral Coefficients), and a hybrid approach had been used as techniques for identifying speakers of the Amharic language in noisy environments. For the identification process, speech signals were collected from different speakers of both sexes; for our data set, a total of 90 speakers' speech samples were collected, with each sample lasting 10 seconds per individual. From these speakers, 59.2%, 70.9%, and 84.7% accuracy were achieved when VQ, GMM, and BPNN, respectively, were used on the combined feature vector of MFCC and GFCC.</p>
APA, Harvard, Vancouver, ISO, and other styles
31

Lalitha, S., and Deepa Gupta. "An Encapsulation of Vital Non-Linear Frequency Features for Various Speech Applications." Journal of Computational and Theoretical Nanoscience 17, no. 1 (January 1, 2020): 303–7. http://dx.doi.org/10.1166/jctn.2020.8666.

Full text
Abstract:
Mel Frequency Cepstral Coefficients (MFCCs) and Perceptual Linear Prediction Coefficients (PLPCs) are widely used nonlinear vocal parameters in the majority of speaker identification, speaker and speech recognition techniques, as well as in the field of emotion recognition. Since the 1980s, significant effort has been put into the development of these features. Considerations such as the use of appropriate frequency estimation approaches, the proposal of appropriate filter banks, and the selection of preferred features play a vital part in the strength of models employing these features. This article presents an overview of MFCC and PLPC features for different speech applications. Insights such as performance metrics of accuracy, background environment, type of data, and size of features are inspected and summarized with the corresponding key references. In addition, the advantages and shortcomings of these features are discussed. This background work will hopefully contribute a first step toward the enhancement of MFCC and PLPC with respect to novelty, higher levels of accuracy, and lower complexity.
APA, Harvard, Vancouver, ISO, and other styles
32

de Souza, Edson Florentino, Túlio Nogueira Bittencourt, Diogo Ribeiro, and Hermes Carvalho. "Feasibility of Applying Mel-Frequency Cepstral Coefficients in a Drive-by Damage Detection Methodology for High-Speed Railway Bridges." Sustainability 14, no. 20 (October 16, 2022): 13290. http://dx.doi.org/10.3390/su142013290.

Full text
Abstract:
In this paper, a drive-by damage detection methodology for high-speed railway (HSR) bridges is addressed, to appraise the application of Mel-frequency cepstral coefficients (MFCC) to extract the Damage Index (DI). A finite element (FEM) 2D VTBI model that incorporates the train, ballasted track and bridge behavior is presented. The formulation includes track irregularities and a damaged condition induced in a specified structure region. The feasibility of applying cepstrum analysis components to the indirect damage detection in HSR by on-board sensors is evaluated by numerical simulations, in which dynamic analyses are performed through a code implemented in MATLAB. Different damage scenarios are simulated, as well as external excitations such as measurement noises and different levels of track irregularities. The results show that MFCC-based DI are highly sensitive regarding damage detection, and robust to the noise. Bridge stiffness can be recognized satisfactorily at high speeds and under different levels of track irregularities. Moreover, the magnitude of DI extracted from MFCC is related to the relative severity of the damage. The results presented in this study should be seen as a first attempt to link cepstrum-based features in an HSR drive-by damage detection approach.
APA, Harvard, Vancouver, ISO, and other styles
33

Rasyid, Muhammad Fahim, Herlina Jayadianti, and Herry Sofyan. "APLIKASI PENGENALAN PENUTUR PADA IDENTIFIKASI SUARA PENELEPON MENGGUNAKAN MEL-FREQUENCY CEPSTRAL COEFFICIENT DAN VECTOR QUANTIZATION (Studi Kasus : Layanan Hotline Universitas Pembangunan Nasional “Veteran” Yogyakarta)." Telematika 17, no. 2 (October 31, 2020): 68. http://dx.doi.org/10.31315/telematika.v1i1.3380.

Full text
Abstract:
The hotline service of Universitas Pembangunan Nasional "Veteran" Yogyakarta is a service that can be used by everyone. The service is used by lecturers and staff to share information with units located in the rectorate building. A caller can communicate with the intended unit once identified by the hotline operator. Identity details consisting of name, position, and department or unit of origin are asked for during the identification process. There is no record of caller identification results, either in physical form or in a database stored on a computer. As a result, there is no documentation that can serve as evidence to follow up on cases of misidentification. This research focuses on reducing the risk of caller misidentification using speaker recognition technology. Voice frequencies were extracted using the Mel-Frequency Cepstral Coefficient (MFCC) method to produce Mel Frequency Cepstrum Coefficient values. The MFCC values from all training voice data of Universitas Pembangunan Nasional "Veteran" Yogyakarta employees were then compared against the caller's voice signal using the Vector Quantization (VQ) method. The speaker recognition application was able to identify the caller's voice with an accuracy of 80% at a threshold of 25.
APA, Harvard, Vancouver, ISO, and other styles
34

Deng, Lei, and Yong Gao. "Gammachirp Filter Banks Applied in Roust Speaker Recognition Based GMM-UBM Classifier." International Arab Journal of Information Technology 17, no. 2 (February 28, 2019): 170–77. http://dx.doi.org/10.34028/iajit/17/2/4.

Full text
Abstract:
In this paper, the authors propose an auditory feature extraction algorithm to improve the performance of speaker recognition systems in noisy environments. In this algorithm, a Gammachirp filter bank is adapted to simulate the auditory model of the human cochlea. In addition, the following three techniques are applied: the cube-root compression method, the Relative Spectral filtering technique (RASTA), and the Cepstral Mean and Variance Normalization algorithm (CMVN). Subsequently, based on the theory of the Gaussian Mixture Model-Universal Background Model (GMM-UBM), a simulated experiment was conducted. The experimental results implied that speaker recognition systems with the new auditory feature have better robustness and recognition performance compared to Mel-Frequency Cepstral Coefficients (MFCC), Relative Spectral-Perceptual Linear Prediction (RASTA-PLP), Cochlear Filter Cepstral Coefficients (CFCC), and Gammatone Frequency Cepstral Coefficients (GFCC).
APA, Harvard, Vancouver, ISO, and other styles
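Of the normalization techniques mentioned in the abstract above, CMVN is the simplest to state: per cepstral coefficient, subtract the utterance mean and divide by the standard deviation, which suppresses stationary channel and level effects. A minimal sketch (the epsilon guard against zero variance is an implementation detail, not from the paper):

```python
import numpy as np

def cmvn(features):
    """Cepstral mean and variance normalization over an utterance.

    `features` is a (frames x coefficients) matrix; each coefficient
    track is normalized to zero mean and unit variance."""
    mu = features.mean(axis=0)
    sigma = features.std(axis=0)
    return (features - mu) / (sigma + 1e-10)
```

After CMVN, every coefficient track has zero mean and (near) unit variance, regardless of the recording channel's fixed spectral tilt.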
35

Thakur, Surendra, Emmanuel Adetiba, Oludayo O. Olugbara, and Richard Millham. "Experimentation Using Short-Term Spectral Features for Secure Mobile Internet Voting Authentication." Mathematical Problems in Engineering 2015 (2015): 1–21. http://dx.doi.org/10.1155/2015/564904.

Full text
Abstract:
We propose a secure mobile Internet voting architecture based on the Sensus reference architecture and report the experiments carried out using short-term spectral features for realizing the voice biometric based authentication module of the proposed architecture. The short-term spectral features investigated are Mel-Frequency Cepstral Coefficients (MFCCs), Mel-Frequency Discrete Wavelet Coefficients (MFDWC), Linear Predictive Cepstral Coefficients (LPCC), and Spectral Histograms of Oriented Gradients (SHOGs). The MFCC, MFDWC, and LPCC usually have higher dimensions that oftentimes lead to high computational complexity of the pattern matching algorithms in automatic speaker recognition systems. In this study, the higher dimensions of each of the short-term features were reduced to an 81-element feature vector per speaker using the Histogram of Oriented Gradients (HOG) algorithm, while a neural network ensemble was utilized as the pattern matching algorithm. Of the four short-term spectral features investigated, LPCC-HOG gave the best statistical results, with an R statistic of 0.9127 and a mean square error of 0.0407. These compact LPCC-HOG features are highly promising for implementing the authentication module of the secure mobile Internet voting architecture we are proposing in this paper.
APA, Harvard, Vancouver, ISO, and other styles
36

Abdul, Zrar Khalid. "Kurdish Spoken Letter Recognition based on k-NN and SVM Model." Journal of University of Raparin 7, no. 4 (November 30, 2020): 1–12. http://dx.doi.org/10.26750/vol(7).no(4).paper1.

Full text
Abstract:
Automatic recognition of spoken letters is one of the most challenging tasks in the area of speech recognition. In this paper, different machine learning approaches, SVM and k-NN, are used to classify the Kurdish alphabet, with both approaches fed by two different features: Linear Predictive Coding (LPC) and Mel Frequency Cepstral Coefficients (MFCCs). Moreover, the features are combined together to train the classifiers. The experiments are evaluated on a dataset collected by the authors, as there is no standard Kurdish dataset; it consists of 2720 samples in total. The results show that the MFCC features outperform the LPC features, as the MFCCs carry more information about the vocal tract. Furthermore, the fusion of the features (MFCC and LPC) does not improve the classification rate significantly.
APA, Harvard, Vancouver, ISO, and other styles
37

Mohammed, Duraid Y., Khamis Al-Karawi, and Ahmed Aljuboori. "Robust speaker verification by combining MFCC and entrocy in noisy conditions." Bulletin of Electrical Engineering and Informatics 10, no. 4 (August 1, 2021): 2310–19. http://dx.doi.org/10.11591/eei.v10i4.2957.

Full text
Abstract:
Automatic speaker recognition may achieve remarkable performance in matched training and test conditions; conversely, results drop significantly in mismatched noisy conditions. Furthermore, feature extraction significantly affects performance. Mel-frequency cepstral coefficients (MFCCs) are most commonly used in this field of study. The literature has reported that training and testing conditions are highly correlated. Taken together, these facts support strong recommendations for using MFCC features in similar environmental (train/test) conditions for speaker recognition. However, with noise and reverberation present, MFCC performance is not reliable. To address this, we propose a new feature, 'entrocy', for accurate and robust speaker recognition, which we mainly employ to support the MFCC coefficients in noisy environments. Entrocy is the Fourier transform of the entropy, a measure of the fluctuation of the information in sound segments over time. Entrocy features are combined with MFCCs to generate a composite feature set, which is tested using the Gaussian mixture model (GMM) speaker recognition method. The proposed method shows improved recognition accuracy over a range of signal-to-noise ratios.
APA, Harvard, Vancouver, ISO, and other styles
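The abstract above defines entrocy as the Fourier transform of an entropy trajectory measured on sound segments over time. A speculative sketch of that idea, using Shannon entropy of each frame's normalized power spectrum — the entropy variant and the frame/FFT sizes are assumptions, since the paper's exact recipe is not given here:

```python
import numpy as np

def entrocy(signal, frame_len=256, hop=128, n_fft=64):
    """Sketch of an 'entrocy'-style feature: frame-wise spectral
    entropy over time, followed by a Fourier transform of that
    entropy sequence (sizes are illustrative assumptions)."""
    ent = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        p = np.abs(np.fft.rfft(frame)) ** 2
        p = p / (p.sum() + 1e-12)                      # normalize to a distribution
        ent.append(-np.sum(p * np.log2(p + 1e-12)))    # Shannon entropy of the frame
    # Fourier transform of the entropy trajectory -> entrocy features
    return np.abs(np.fft.rfft(np.asarray(ent), n_fft))
```

Such a vector could then be concatenated with the MFCCs of the same utterance to form the composite feature set the abstract describes.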
38

Heriyanto, Heriyanto, Herlina Jayadianti, and Juwairiah Juwairiah. "The Implementation Of Mfcc Feature Extraction And Selection of Cepstral Coefficient for Qur’an Recitation in TPA (Qur’an Learning Center) Nurul Huda Plus Purbayan." RSF Conference Series: Engineering and Technology 1, no. 1 (December 23, 2021): 453–78. http://dx.doi.org/10.31098/cset.v1i1.417.

Full text
Abstract:
There are two approaches to Qur’an recitation, namely talaqqi and qira'ati. Both approaches use the science of recitation, containing knowledge of the rules and procedures for reading the Qur'an properly. Talaqqi requires the teacher and students to sit facing each other, while qira'ati is the recitation of the Qur'an with rhythms and tones. Many studies have developed automatic speech recognition systems for Qur’an recitation to help the learning process, with feature extraction models using the Mel Frequency Cepstral Coefficient (MFCC) and Linear Predictive Code (LPC). The MFCC method has an accuracy of 50% to 60%, while the accuracy of Linear Predictive Code (LPC) is only 45% to 50%, so the non-linear MFCC method has higher accuracy than the linear approach. The cepstral coefficient features used run from 0 to 23, i.e., 24 cepstral coefficients, while the frames taken run from 0 to 10, i.e., eleven frames. Voting over 300 recorded voice samples was tested against 200 voice recordings, both male and female. The frequency used was 44,100 Hz, 16-bit stereo. This study aims to obtain good accuracy by selecting the right feature on the cepstral coefficient using MFCC feature extraction and matching accuracy through selection of the cepstral coefficient feature with Dominant Weight Normalization (NBD) at TPA Nurul Huda Plus Purbayan. The accuracy results showed that the MFCC method with selection of the 23rd cepstral coefficient had a higher accuracy rate, 90.2%, than the others. It can be concluded that selecting the right feature, the 23rd cepstral coefficient, affects the accuracy on the voice of Qur’an recitation.
APA, Harvard, Vancouver, ISO, and other styles
39

Sarkar, Swagata, Sanjana R, Rajalakshmi S, and Harini T J. "Simulation and detection of tamil speech accent using modified mel frequency cepstral coefficient algorithm." International Journal of Engineering & Technology 7, no. 3.3 (June 8, 2018): 426. http://dx.doi.org/10.14419/ijet.v7i2.33.14202.

Full text
Abstract:
Automatic speech recognition is a topic of interest to many researchers. Since many online courses have come into the picture, recent researchers are concentrating on speech accent recognition, and much work has been done in this field. This paper addresses speech accent recognition of Tamil speech from different zones of Tamil Nadu. Hidden Markov Models (HMM) and the Viterbi algorithm are very popular in this area, and researchers have worked with Mel Frequency Cepstral Coefficients (MFCC) to identify speech as well as speech accent. In this paper, speech accent features are identified by a modified MFCC algorithm, and the classification of features is done by the back propagation algorithm.
APA, Harvard, Vancouver, ISO, and other styles
40

Bhagat, Bhavesh, and Mohit Dua. "Enhancing Performance of End-to-End Gujarati Language ASR using combination of Integrated Feature Extraction and Improved Spell Corrector Algorithm." ITM Web of Conferences 54 (2023): 01016. http://dx.doi.org/10.1051/itmconf/20235401016.

Full text
Abstract:
A number of intricate deep learning architectures for effective End-to-End (E2E) speech recognition systems have emerged due to recent advancements in algorithms and technical resources. The proposed work develops an ASR system for a publicly accessible dataset in the Gujarati language. The approach provided in this research combines Mel Frequency Cepstral Coefficients (MFCC) and Constant Q Cepstral Coefficients (CQCC) in the front-end feature extraction stage. An enhanced BERT-based spell corrector algorithm and a Gated Recurrent Unit (GRU) based DeepSpeech2 architecture are used to implement the back end of the proposed ASR system. The proposed study shows that combining the MFCC and CQCC features extracted from speech with the GRU-based DeepSpeech2 model and the enhanced spell corrector improves the Word Error Rate (WER) by 17.46% compared to the model without post-processing.
APA, Harvard, Vancouver, ISO, and other styles
41

Lee, Ji-Yeoun. "Classification between Elderly Voices and Young Voices Using an Efficient Combination of Deep Learning Classifiers and Various Parameters." Applied Sciences 11, no. 21 (October 21, 2021): 9836. http://dx.doi.org/10.3390/app11219836.

Full text
Abstract:
The objective of this research was to develop deep learning classifiers and various parameters that provide an accurate and objective system for classifying elderly and young voice signals. This work focused on deep learning methods, such as feedforward neural network (FNN) and convolutional neural network (CNN), for the detection of elderly voice signals using mel-frequency cepstral coefficients (MFCCs) and linear prediction cepstrum coefficients (LPCCs), skewness, as well as kurtosis parameters. In total, 126 subjects (63 elderly and 63 young) were obtained from the Saarbruecken voice database. The highest performance of 93.75% appeared when the skewness was added to the MFCC and MFCC delta parameters, although the fusion of the skewness and kurtosis parameters had a positive effect on the overall accuracy of the classification. The results of this study also revealed that the performance of FNN was higher than that of CNN. Most parameters estimated from male data samples demonstrated good performance in terms of gender. Rather than using mixed female and male data, this work recommends the development of separate systems that represent the best performance through each optimized parameter using data from independent male and female samples.
APA, Harvard, Vancouver, ISO, and other styles
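Skewness and kurtosis, the two distribution-shape parameters fused with the MFCC and LPCC features in the study above, can be computed from any parameter sequence with the standard moment estimators. A sketch under the assumption that the plain (non-bias-corrected) moment definitions are meant, since the paper's exact estimator is not stated here:

```python
import numpy as np

def skewness(x):
    """Third standardized moment: asymmetry of the distribution."""
    x = np.asarray(x, dtype=float)
    m, s = x.mean(), x.std()
    return ((x - m) ** 3).mean() / (s ** 3 + 1e-12)

def kurtosis(x):
    """Fourth standardized moment: tail weight (3.0 for a Gaussian)."""
    x = np.asarray(x, dtype=float)
    m, s = x.mean(), x.std()
    return ((x - m) ** 4).mean() / (s ** 4 + 1e-12)
```

Appending these two scalars per coefficient track to the MFCC feature vector is one straightforward way to realize the kind of parameter fusion the abstract evaluates.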
42

Mahalakshmi, P., Muruganandam M, and Sharmila A. "VOICE RECOGNITION SECURITY SYSTEM USING MEL-FREQUENCY CEPSTRUM COEFFICIENTS." Asian Journal of Pharmaceutical and Clinical Research 9, no. 9 (December 1, 2016): 131. http://dx.doi.org/10.22159/ajpcr.2016.v9s3.13633.

Full text
Abstract:
ABSTRACT Objective: Voice recognition is a fascinating field spanning several areas of computer science and mathematics. Reliable speaker recognition is a hard problem, requiring a combination of many techniques; however, modern methods have been able to achieve an impressive degree of accuracy. The objective of this work is to examine various speech and speaker recognition techniques and to apply them to build a simple voice recognition system. Method: The project is implemented in software using techniques such as Mel Frequency Cepstrum Coefficient (MFCC) and Vector Quantization (VQ), which are implemented using MATLAB. Results: MFCC is used to extract the characteristics from the input speech signal with respect to a particular word uttered by a particular speaker. The VQ codebook is generated by clustering the training feature vectors of each speaker and then stored in the speaker database. Conclusion: Verification of the speaker is carried out using Euclidean distance. For voice recognition, we implement the MFCC approach using the software platform MATLAB R2013b. Keywords: Mel-frequency cepstrum coefficient, Vector quantization, Voice recognition, Hidden Markov model, Euclidean distance.
APA, Harvard, Vancouver, ISO, and other styles
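The MFCC → VQ-codebook → Euclidean-distance pipeline described above can be sketched as follows. This is a generic NumPy illustration of VQ speaker matching, not the authors' MATLAB implementation; the codebook size and iteration count are arbitrary choices:

```python
import numpy as np

def train_codebook(features, k=4, iters=20, seed=0):
    """Build a VQ codebook by k-means clustering of one speaker's
    training feature vectors (features: n_vectors x n_dims)."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        # assign each vector to its nearest codeword, then re-center
        d = np.linalg.norm(features[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = features[labels == j].mean(axis=0)
    return centers

def distortion(features, codebook):
    """Average Euclidean distance from each test vector to its nearest
    codeword; the enrolled speaker with the smallest distortion wins."""
    d = np.linalg.norm(features[:, None] - codebook[None], axis=2)
    return d.min(axis=1).mean()
```

Verification then amounts to comparing the distortion against a decision threshold, or against the distortions of the other codebooks in the speaker database.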
43

Hanafi, Dirman, and Abdul Syafiq Abdul Sukor. "Speaker Identification Using K-means Method Based on Mel Frequency Cepstral Coefficients(MFCC)." i-manager's Journal on Embedded Systems 1, no. 1 (April 15, 2012): 19–28. http://dx.doi.org/10.26634/jes.1.1.1729.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Heriyanto, Heriyanto. "Good Morning to Good Night Greeting Classification Using Mel Frequency Cepstral Coefficient (MFCC) Feature Extraction and Frame Feature Selection." Telematika 18, no. 1 (March 16, 2021): 88. http://dx.doi.org/10.31315/telematika.v18i1.4495.

Full text
Abstract:
Purpose: Select the right features on the frame for good accuracy.
Design/methodology/approach: Extraction of Mel Frequency Cepstral Coefficient (MFCC) features and selection of Dominant Weight Normalized (DWN) features.
Findings/result: The accuracy results show that the MFCC method with the 9th frame selection has a higher accuracy rate of 85% compared to other frames.
Originality/value/state of the art: Selection of the appropriate features on the frame.
APA, Harvard, Vancouver, ISO, and other styles
45

Dirgantoro, Kevin Putra, Bambang Hidayat, and Nur Andini. "PERBANDINGAN STEGANALISIS SINYAL WICARA BERFORMAT .WAV ANTARA METODE ANALISIS CEPSTRAL DAN MEL-FREQUENCY CEPSTRAL COEFFICIENT (MFCC)." TEKTRIKA - Jurnal Penelitian dan Pengembangan Telekomunikasi, Kendali, Komputer, Elektrik, dan Elektronika 3, no. 2 (August 23, 2019): 56. http://dx.doi.org/10.25124/tektrika.v3i2.2224.

Full text
Abstract:
The technique of hiding secret messages inside particular data, commonly known as steganography, has developed very rapidly. However, this message-hiding method also raises problems, among them irresponsible parties using the technique for criminal activity. Therefore, techniques are needed to detect hidden messages inside data; such techniques are known as steganalysis. In this research, speech signal files in .wav format were analyzed using two methods, namely cepstral analysis and Mel-Frequency Cepstral Coefficient (MFCC). The two methods were compared to determine which is better at detecting whether data contain a secret message. Using 45 training and test samples, the accuracy obtained was 51.11% for the cepstral analysis method and 77.78% for MFCC. These accuracy values were derived from statistical features consisting of the kurtosis, skewness, and standard deviation produced by both methods, classified using the Support Vector Machine (SVM) method.
APA, Harvard, Vancouver, ISO, and other styles
46

ARORA, SHRUTI, SUSHMA JAIN, and INDERVEER CHANA. "A FUSION FRAMEWORK BASED ON CEPSTRAL DOMAIN FEATURES FROM PHONOCARDIOGRAM TO PREDICT HEART HEALTH STATUS." Journal of Mechanics in Medicine and Biology 21, no. 04 (April 22, 2021): 2150034. http://dx.doi.org/10.1142/s0219519421500342.

Full text
Abstract:
A great increase in the number of cardiovascular cases has been a cause of serious concern for the medical experts all over the world today. In order to achieve valuable risk stratification for patients, early prediction of heart health can benefit specialists to make effective decisions. Heart sound signals help to know about the condition of heart of a patient. Motivated by the success of cepstral features in speech signal classification, authors have used here three different cepstral features, viz. Mel-frequency cepstral coefficients (MFCCs), gammatone frequency cepstral coefficients (GFCCs), and Mel-spectrogram for classifying phonocardiogram into normal and abnormal. Existing research has explored only MFCCs and Mel-feature set extensively for classifying the phonocardiogram. However, in this work, the authors have used a fusion of GFCCs with MFCCs and Mel-spectrogram, and achieved a better accuracy score of 0.96 with sensitivity and specificity scores as 0.91 and 0.98, respectively. The proposed model has been validated on the publicly available benchmark dataset PhysioNet 2016.
APA, Harvard, Vancouver, ISO, and other styles
47

Sari, Puspita Kartika, Karlisa Priandana, and Agus Buono. "Perbandingan Sistem Perhitungan Suara Tepuk Tangan dengan Metode Berbasis Frekuensi dan Metode Berbasis Amplitudo." Jurnal Ilmu Komputer dan Agri-Informatika 2, no. 1 (May 1, 2013): 29. http://dx.doi.org/10.29244/jika.2.1.29-37.

Full text
Abstract:
<p>Scoring systems based on applause are often used in competition events in Indonesia. However, determining the winner in the conventional way tends to be subjective. This study develops an automatic computer-based scoring system that counts the number of people clapping and determines the winner of a competition based on applause. Two applicable methods are compared: a frequency-based method and an amplitude-based method. The frequency-based method implements Mel Frequency Cepstral Coefficients (MFCC) for feature extraction and a codebook for pattern recognition; the result is a model consisting of classes clustered by K-means clustering. The important parameters in this method are the number of cepstral coefficients, the overlap, the time frame, and the number of clusters. Several tests were conducted to find the optimum parameters with the highest accuracy. The second, amplitude-based method counts the number of signal samples whose amplitude exceeds certain threshold values chosen to yield maximum accuracy. The results show that the accuracy of the frequency-based system was 83.3% for periodic clapping and 50% for random clapping, whereas the simpler threshold-based system achieved 66.7% for random clapping. The amplitude-based method is therefore a good choice.</p><p>Keywords: Codebook, K-means, Mel Frequency Cepstral Coefficients (MFCC), Speech Recognition, Threshold</p>
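The amplitude-based method described in the abstract reduces to counting samples above candidate thresholds. A minimal sketch, where the helper name and the dictionary output are illustrative only:

```python
def count_above_thresholds(samples, thresholds):
    """Amplitude-based method: for each candidate threshold, count the
    signal samples whose absolute amplitude exceeds it."""
    return {t: sum(1 for s in samples if abs(s) > t) for t in thresholds}
```

In the paper's setting the thresholds are tuned on labelled recordings so that the counts track the number of people clapping.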
APA, Harvard, Vancouver, ISO, and other styles
48

Zaporowski, Szymon, Andrzej Czyzewski, and Bozena Kostek. "Audio feature optimization approach towards speaker authentication in banking biometric system." Journal of the Acoustical Society of America 150, no. 4 (October 2021): A349. http://dx.doi.org/10.1121/10.0008549.

Full text
Abstract:
Experiments identifying the most meaningful Mel-Frequency Cepstral Coefficients representing speech excerpts prepared for their classification were carried out using algorithms such as Principal Component Analysis, Feature Importance, and Recursive Parameter Elimination, and the results are presented and discussed. The parameterization was made using Mel Frequency Cepstral Coefficients, Delta MFCC, and Delta-Delta MFCC. In the next stage, feature vectors were passed to the input of the individual algorithms mentioned above in order to reduce the size of the vector. The vectors prepared in this way were used for classifying vocalic segments employing an Artificial Neural Network (ANN) and a Support Vector Machine (SVM). The classification results using both classifiers and the methods applied for reducing the number of parameters are presented. The results of the reduction are also shown explicitly, by indicating parameters proven to be significant and those rejected by particular algorithms. Factors influencing the obtained results were considered, such as difficulties associated with obtaining the data set and its labeling. The broader context of the banking biometrics research carried out and the results obtained in this domain are also discussed. [Project No. POIR.01.01.01-0092/19 entitled: “BIOPUAP—A biometric cloud authentication system” is currently financed by the Polish National Centre for Research and Development (NCBR) from the European Regional Development Fund.]
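Of the three reduction algorithms named, Principal Component Analysis is easy to sketch via the SVD. This is a generic illustration of projecting MFCC feature vectors onto the top-k components, not the authors' pipeline:

```python
import numpy as np

def pca_reduce(X, k):
    """Project rows of X (n_samples x n_features) onto the
    top-k principal components of the centred data."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)            # centre each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T               # scores in the reduced space
```

The reduced vectors would then be fed to the ANN or SVM classifier in place of the full MFCC/delta feature set.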
APA, Harvard, Vancouver, ISO, and other styles
49

Dua, Mohit, Rajesh Kumar Aggarwal, and Mantosh Biswas. "Discriminative Training Using Noise Robust Integrated Features and Refined HMM Modeling." Journal of Intelligent Systems 29, no. 1 (February 20, 2018): 327–44. http://dx.doi.org/10.1515/jisys-2017-0618.

Full text
Abstract:
The classical approach to build an automatic speech recognition (ASR) system uses different feature extraction methods at the front end and various parameter classification techniques at the back end. The Mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP) techniques are the conventional approaches used for many years for feature extraction, and the hidden Markov model (HMM) has been the most obvious selection for feature classification. However, the performance of MFCC-HMM and PLP-HMM-based ASR systems degrades in real-time environments. The proposed work discusses the implementation of a discriminatively trained Hindi ASR system using noise-robust integrated features and a refined HMM model. It sequentially combines MFCC with PLP and MFCC with gammatone-frequency cepstral coefficients (GFCC) to obtain MF-PLP and MF-GFCC integrated feature vectors, respectively. The HMM parameters are refined using a genetic algorithm (GA) and particle swarm optimization (PSO). Discriminative training of the acoustic model using maximum mutual information (MMI) and minimum phone error (MPE) is performed to enhance the accuracy of the proposed system. The results show that discriminative training using MPE with the MF-GFCC integrated feature vector and PSO-HMM parameter refinement gives significantly better results than the other implemented techniques.
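The PSO-based parameter refinement can be illustrated generically: a minimal particle swarm minimising an arbitrary objective over a bounded parameter vector. All hyperparameters here (inertia, acceleration constants, swarm size) are assumptions of this sketch; a real system would plug in an HMM error score as the objective:

```python
import random

def pso_refine(objective, dim, n_particles=20, iters=50, lo=-1.0, hi=1.0, seed=0):
    """Minimal particle swarm optimization: minimise `objective` over
    a dim-dimensional box [lo, hi]^dim. Returns (best_position, best_value)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # per-particle best positions
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]     # swarm-wide best
    w, c1, c2 = 0.7, 1.5, 1.5                        # inertia, cognitive, social
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

In the paper's setting, the particle positions would encode HMM parameters and the objective would be a recognition-error measure rather than the toy function used for testing.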
APA, Harvard, Vancouver, ISO, and other styles
50

Bhalke, Daulappa Guranna, Betsy Rajesh, and Dattatraya Shankar Bormane. "Automatic Genre Classification Using Fractional Fourier Transform Based Mel Frequency Cepstral Coefficient and Timbral Features." Archives of Acoustics 42, no. 2 (June 27, 2017): 213–22. http://dx.doi.org/10.1515/aoa-2017-0024.

Full text
Abstract:
This paper presents the automatic genre classification of Indian Tamil music and Western music using Timbral and Fractional Fourier Transform (FrFT) based Mel Frequency Cepstral Coefficient (MFCC) features. The classifier model for the proposed system has been built using K-NN (K-Nearest Neighbours) and Support Vector Machine (SVM). In this work, the performance of various features extracted from music excerpts has been analysed to identify the appropriate feature descriptors for the two major genres of Indian Tamil music, namely Classical music (Carnatic based devotional hymn compositions) and Folk music, and for the western genres of Rock and Classical music from the GTZAN dataset. The results for Tamil music have shown that the feature combination of Spectral Rolloff, Spectral Flux, Spectral Skewness and Spectral Kurtosis, combined with Fractional MFCC features, outperforms all other feature combinations, yielding a higher classification accuracy of 96.05%, as compared to the accuracy of 84.21% with conventional MFCC. It has also been observed that the FrFT based MFCC efficiently classifies the two western genres of Rock and Classical music from the GTZAN dataset with a higher classification accuracy of 96.25% as compared to the classification accuracy of 80% with MFCC.
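All of the MFCC variants surveyed above rest on the mel scale. The standard Hz-to-mel mapping, using the common 2595·log10(1 + f/700) convention (one of several in use), is:

```python
import math

def hz_to_mel(f):
    """Convert a frequency in Hz to mels (2595*log10(1 + f/700) convention)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse mapping: mels back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
```

These two functions define the centre frequencies of the triangular filterbank from which MFCCs (conventional or FrFT-based) are computed.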
APA, Harvard, Vancouver, ISO, and other styles