Academic literature on the topic 'Mel Frequency Cepstral Coefficients (MFCC)'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Mel Frequency Cepstral Coefficients (MFCC).'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Mel Frequency Cepstral Coefficients (MFCC)"

1

Eskidere, Ömer, and Ahmet Gürhanlı. "Voice Disorder Classification Based on Multitaper Mel Frequency Cepstral Coefficients Features." Computational and Mathematical Methods in Medicine 2015 (2015): 1–12. http://dx.doi.org/10.1155/2015/956249.

Abstract:
Mel Frequency Cepstral Coefficients (MFCCs) are widely used to extract essential information from a voice signal and have become a popular feature extractor in audio processing. However, MFCC features are usually calculated from a single window (taper), which is characterized by large variance. This study investigates reducing that variance for the classification of two voice qualities (normal voice and disordered voice) using multitaper MFCC features. We also compare their performance against two newly proposed windowing techniques and the conventional single-taper technique. The results demonstrate that the adapted weighted Thomson multitaper method distinguishes normal from disordered voice better than the conventional single-taper (Hamming window) technique and the two newly proposed windowing methods. Multitaper MFCC features may thus help identify voices at risk for a pathology that has yet to be confirmed.
2

Varma, V. Sai Nitin, and Abdul Majeed K.K. "Advancements in Speaker Recognition: Exploring Mel Frequency Cepstral Coefficients (MFCC) for Enhanced Performance in Speaker Recognition." International Journal for Research in Applied Science and Engineering Technology 11, no. 8 (August 31, 2023): 88–98. http://dx.doi.org/10.22214/ijraset.2023.55124.

Abstract:
Speaker recognition, a fundamental capability of software or hardware systems, involves receiving speech signals, identifying the speaker present in the speech signal, and subsequently recognizing the speaker in future interactions. This process emulates a cognitive task performed by the human brain. At its core, speaker recognition begins with speech as the input to the system. Various techniques have been developed for speech recognition, including Mel Frequency Cepstral Coefficients (MFCC), Linear Prediction Coefficients (LPC), Linear Prediction Cepstral Coefficients (LPCC), Line Spectral Frequencies (LSF), the Discrete Wavelet Transform (DWT), and Perceptual Linear Prediction (PLP). Although LPC and several other techniques have been explored, they are often deemed impractical for real-time applications. In contrast, MFCC stands out as one of the most prominent and widely used techniques for speaker recognition. The cepstrum allows the resemblance between two cepstral feature vectors to be computed, making it an effective tool in this domain. Compared with LPC-derived cepstrum features, MFCC features have demonstrated superior performance on metrics such as the False Acceptance Rate (FAR) and False Rejection Rate (FRR) of speaker recognition systems. MFCCs exploit the human ear's critical-bandwidth variation with frequency: to capture phonetically important characteristics of speech signals, filters are linearly spaced at low frequencies and logarithmically spaced at high frequencies. This design choice is central to the effectiveness of the MFCC technique. The primary objective of the proposed work is to devise efficient techniques that extract pertinent speaker-related information, thereby enhancing the overall performance of the speaker recognition system. By optimizing feature extraction methods, this research aims to contribute to the advancement of speaker recognition technology.
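
The filter spacing this abstract describes (linear at low frequencies, logarithmic at high frequencies) is easy to inspect numerically. A minimal sketch, assuming librosa is installed; all parameter values are illustrative rather than taken from the paper:

```python
# Sketch: locate each mel filter's centre frequency (illustrative parameters).
import numpy as np
import librosa

sr, n_fft, n_mels = 16000, 512, 26
mel_fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)  # (26, 257)

fft_freqs = librosa.fft_frequencies(sr=sr, n_fft=n_fft)
centres = fft_freqs[mel_fb.argmax(axis=1)]   # FFT bin carrying each filter's peak weight
print(np.diff(centres).round(1))             # near-constant spacing at low frequencies,
                                             # roughly geometric growth at high ones
```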
3

Kasim, Anita Ahmad, Muhammad Bakri, Irwan Mahmudi, Rahmawati Rahmawati, and Zulnabil Zulnabil. "Artificial Intelligent for Human Emotion Detection with the Mel-Frequency Cepstral Coefficient (MFCC)." JUITA : Jurnal Informatika 11, no. 1 (May 6, 2023): 47. http://dx.doi.org/10.30595/juita.v11i1.15435.

Abstract:
Emotions are an important aspect of human communication, and human emotional expression can be identified through sound. Voice detection, or speech recognition, is a technology that has developed rapidly to improve human-machine interaction. This study aims to classify emotions through the detection of human voices. One of the most frequently used methods for sound analysis is the Mel-Frequency Cepstral Coefficient (MFCC) method, in which sound waves are converted into several types of representation. Mel-frequency cepstral coefficients (MFCCs) are coefficients that collectively represent the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency. The primary data used in this research are recordings made by the authors; the secondary data come from the "Berlin Database of Emotional Speech" and comprise 500 voice recordings. MFCC extraction can recover implicit information from the human voice, in particular the emotion a speaker experiences while producing the sound. In this study, the highest accuracy, 85%, was obtained when training for 10,000 epochs.
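
The definition quoted above, a linear cosine transform of a log power spectrum on a mel frequency scale, maps directly onto a short computation. A minimal sketch, assuming numpy, scipy, and librosa; the frame sizes and coefficient count are assumptions, not the study's settings:

```python
# Sketch of the MFCC definition: DCT-II of the log mel power spectrum.
import numpy as np
from scipy.fftpack import dct
import librosa  # used here only for the STFT and the mel filterbank

def mfcc_frames(y, sr, n_fft=512, hop=256, n_mels=26, n_mfcc=13):
    power = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop)) ** 2  # power spectrum
    mel_power = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels) @ power
    # Log compression followed by a linear cosine transform (DCT-II) per frame
    return dct(np.log(mel_power + 1e-10), type=2, axis=0, norm="ortho")[:n_mfcc]

sr = 16000
y = np.random.randn(sr)            # one second of noise as stand-in audio
print(mfcc_frames(y, sr).shape)    # (13, n_frames)
```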
4

Chen, Young-Long, Neng-Chung Wang, Jing-Fong Ciou, and Rui-Qi Lin. "Combined Bidirectional Long Short-Term Memory with Mel-Frequency Cepstral Coefficients Using Autoencoder for Speaker Recognition." Applied Sciences 13, no. 12 (June 10, 2023): 7008. http://dx.doi.org/10.3390/app13127008.

Abstract:
Recently, neural network technology has shown remarkable progress in speech recognition, including word classification, emotion recognition, and identity recognition. This paper introduces three novel speaker recognition methods to improve accuracy. The first method, long short-term memory with mel-frequency cepstral coefficients for triplet loss (LSTM-MFCC-TL), utilizes MFCCs as input features for the LSTM model and incorporates triplet loss and cluster training for effective training. The second method, bidirectional long short-term memory with mel-frequency cepstral coefficients for triplet loss (BLSTM-MFCC-TL), enhances speaker recognition accuracy by employing a bidirectional LSTM model. The third method, bidirectional long short-term memory with mel-frequency cepstral coefficients and autoencoder features for triplet loss (BLSTM-MFCCAE-TL), utilizes an autoencoder to extract additional AE features, which are then concatenated with the MFCCs and fed into the BLSTM model. The results showed that the performance of the BLSTM model was superior to that of the LSTM model, and that adding AE features achieved the best learning effect. Moreover, the proposed methods exhibit faster computation times than the reference GMM-HMM model. Therefore, utilizing pre-trained autoencoders for speaker encoding and obtaining AE features can significantly enhance the learning performance of speaker recognition, while also offering faster computation than traditional methods.
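
As a rough illustration of the third method's front end, the sketch below concatenates a stand-in autoencoder code with MFCC frames before a bidirectional LSTM. PyTorch is assumed, and every size here is hypothetical rather than the paper's configuration:

```python
# Sketch: MFCC frames + autoencoder features feeding a bidirectional LSTM.
import torch
import torch.nn as nn

n_mfcc, ae_dim = 13, 8
encoder = nn.Sequential(nn.Linear(n_mfcc, ae_dim), nn.ReLU())  # stand-in for a
                                                               # pre-trained AE encoder
blstm = nn.LSTM(input_size=n_mfcc + ae_dim, hidden_size=64,
                batch_first=True, bidirectional=True)

x = torch.randn(4, 200, n_mfcc)                 # 4 utterances, 200 MFCC frames each
features = torch.cat([x, encoder(x)], dim=-1)   # (4, 200, 21): MFCC + AE features
embeddings, _ = blstm(features)                 # (4, 200, 128), e.g. for a triplet loss
print(embeddings.shape)
```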
5

Koolagudi, Shashidhar G., Deepika Rastogi, and K. Sreenivasa Rao. "Identification of Language using Mel-Frequency Cepstral Coefficients (MFCC)." Procedia Engineering 38 (2012): 3391–98. http://dx.doi.org/10.1016/j.proeng.2012.06.392.

6

Mohd Johari, N. H., Noreha Abdul Malik, and K. A. Sidek. "Distinctive features for normal and crackles respiratory sounds using cepstral coefficients." Bulletin of Electrical Engineering and Informatics 8, no. 3 (September 1, 2019): 875–81. http://dx.doi.org/10.11591/eei.v8i3.1517.

Abstract:
Classifying respiratory sounds as normal or abnormal is crucial for screening and diagnosis, as lung-associated diseases can be detected through this technique. With the advancement of computerized auscultation technology, adventitious sounds such as crackles can be detected, so diagnostic tests can be performed earlier. In this paper, Linear Predictive Cepstral Coefficients (LPCC) and Mel-frequency Cepstral Coefficients (MFCC) are used to extract features from normal and crackles respiratory sounds. Statistical measures such as the mean and standard deviation (SD) of the cepstral coefficients can differentiate crackles from normal sounds. These statistics show that the mean LPCCs (except for the third coefficient) and the SDs of the first three MFCCs provide distinctive features separating normal and crackles respiratory sounds. Hence, LPCCs and MFCCs can be used as feature extraction methods for classifying respiratory sounds as normal or crackles in a screening and diagnostic tool.
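
The summary statistics described here reduce each recording to one compact feature vector. A minimal sketch, assuming librosa; the file names are hypothetical placeholders:

```python
# Sketch: per-recording mean and SD of MFCCs as a compact feature vector.
import numpy as np
import librosa

def cepstral_stats(path, n_mfcc=12):
    y, sr = librosa.load(path, sr=None)
    c = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)     # (n_mfcc, n_frames)
    return np.concatenate([c.mean(axis=1), c.std(axis=1)])  # (2 * n_mfcc,)

# normal = cepstral_stats("normal_breath.wav")      # hypothetical recordings
# crackle = cepstral_stats("crackles_breath.wav")
```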
7

Indrawaty, Youllia, Irma Amelia Dewi, and Rizki Lukman. "Ekstraksi Ciri Pelafalan Huruf Hijaiyyah Dengan Metode Mel-Frequency Cepstral Coefficients." MIND Journal 4, no. 1 (June 1, 2019): 49–64. http://dx.doi.org/10.26760/mindjournal.v4i1.49-64.

Abstract:
Hijaiyyah letters are the letters that make up the verses of the Qur'an, and each hijaiyyah letter has its own pronunciation characteristics. In practice, however, readers sometimes ignore the reading rules of makhorijul huruf, the points of articulation from which the hijaiyyah letters are pronounced. With speech recognition technology, differences in the pronunciation of hijaiyyah letters can be observed quantitatively through a system. Recognizing speech takes two stages: first the speech signal undergoes feature extraction, then the speech or recitation is identified. MFCC (Mel Frequency Cepstral Coefficients) is a feature extraction method that produces cepstral values from a speech signal. This study aims to determine the cepstral values of each hijaiyyah letter. The tests performed show that each hijaiyyah letter has distinct cepstral values.
8

Mahalakshmi, P. "A Review on Voice Activity Detection and Mel-Frequency Cepstral Coefficients for Speaker Recognition (Trend Analysis)." Asian Journal of Pharmaceutical and Clinical Research 9, no. 9 (December 1, 2016): 360. http://dx.doi.org/10.22159/ajpcr.2016.v9s3.14352.

Abstract:
Objective: The objective of this review article is to give a complete review of the various techniques that have been used for speech recognition over two decades. Methods: Voice activity detection (VAD) and speech activity detection (SAD) techniques, used to distinguish voiced from unvoiced signals, are discussed, along with the mel frequency cepstral coefficient (MFCC) technique, which detects specific features. Results: The review shows that research on MFCC has been dominant in signal processing compared with VAD and other existing techniques. Conclusion: Speaker recognition techniques used previously and those in current research are compared, and a clear picture of the better technique emerges from the review of over two decades of literature. Keywords: Cepstral analysis, mel-frequency cepstral coefficients, signal processing, speaker recognition, voice activity detection.
9

Dadula, Cristina P., and Elmer P. Dadios. "Fuzzy Logic System for Abnormal Audio Event Detection Using Mel Frequency Cepstral Coefficients." Journal of Advanced Computational Intelligence and Intelligent Informatics 21, no. 2 (March 15, 2017): 205–10. http://dx.doi.org/10.20965/jaciii.2017.p0205.

Abstract:
This paper presents a fuzzy logic system for audio event detection using mel frequency cepstral coefficients (MFCC). Twelve MFCCs of audio samples were analyzed; the range of values of each MFCC was obtained, including its histogram. These values were normalized so that the minimum and maximum lie between 0 and 1. Rules were formulated based on the histograms to classify audio samples as normal, gunshot, or crowd panic. Five MFCCs were chosen as input to the fuzzy logic system, whose membership functions and rules are defined based on the normalized MFCC histograms. The system was tested with a total of 150 minutes of normal sounds from different buses and 72 seconds of audio clips of abnormal sounds. The designed fuzzy logic system was able to classify audio events with an average accuracy of 99.4%.
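
The normalization step described above can be sketched in a few lines of numpy; the histogram-based membership functions and rules built on top of it are omitted:

```python
# Sketch: scale each MFCC dimension to [0, 1] across frames, so that
# histogram-derived fuzzy membership functions share one common range.
import numpy as np

def minmax_normalise(mfcc):            # mfcc: (n_coeffs, n_frames)
    lo = mfcc.min(axis=1, keepdims=True)
    hi = mfcc.max(axis=1, keepdims=True)
    return (mfcc - lo) / (hi - lo + 1e-10)   # epsilon guards constant coefficients
```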
10

Ramashini, Murugaiya, P. Emeroylariffion Abas, Kusuma Mohanchandra, and Liyanage C. De Silva. "Robust cepstral feature for bird sound classification." International Journal of Electrical and Computer Engineering (IJECE) 12, no. 2 (April 1, 2022): 1477. http://dx.doi.org/10.11591/ijece.v12i2.pp1477-1487.

Abstract:
Birds are excellent environmental indicators and may indicate the sustainability of an ecosystem; they may also provide provisioning, regulating, and supporting services. Birdlife conservation research therefore always takes centre stage. Due to the airborne nature of birds and the density of tropical forests, identifying birds by audio may be a better solution than visual identification. The goal of this study is to find the most appropriate cepstral features for classifying bird sounds more accurately. Fifteen (15) endemic Bornean bird sounds were selected and segmented using an automated energy-based algorithm. Three (3) types of cepstral features are extracted: linear prediction cepstral coefficients (LPCC), mel frequency cepstral coefficients (MFCC), and gammatone frequency cepstral coefficients (GTCC); each is used separately for classification with a support vector machine (SVM). Comparison of their prediction results demonstrates that the model utilising GTCC features, with 93.3% accuracy, outperforms the models utilising MFCC and LPCC features, demonstrating the robustness of GTCC for bird sound classification. The result is significant for the advancement of bird sound classification research, which has many applications, such as eco-tourism and wildlife management.
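
The comparison protocol, one SVM per feature set, is sketched below with scikit-learn. The feature matrices are random stand-ins, since the abstract names no LPCC or GTCC extractor; nothing here reproduces the study's data or accuracies:

```python
# Sketch: train one SVM per cepstral feature set and compare held-out accuracy.
# Random stand-in features give chance-level scores; real LPCC/MFCC/GTCC
# vectors for the 15 species would replace them.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
y = rng.integers(0, 15, size=600)   # 15 species labels
feature_sets = {name: rng.normal(size=(600, 13)) for name in ("LPCC", "MFCC", "GTCC")}

for name, X in feature_sets.items():
    Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)
    acc = SVC(kernel="rbf").fit(Xtr, ytr).score(Xte, yte)
    print(f"{name}: {acc:.1%}")
```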

Dissertations / Theses on the topic "Mel Frequency Cepstral Coefficients (MFCC)"

1

Alvarenga, Rodrigo Jorge. "Reconhecimento de comandos de voz por redes neurais." Universidade de Taubaté, 2012. http://www.bdtd.unitau.br/tedesimplificado/tde_busca/arquivo.php?codArquivo=587.

Abstract:
Systems for speech recognition have widespread use in industry, in the improvement of human operations and procedures, and in entertainment and recreation. The specific objective of this study was to design and develop a voice recognition system capable of identifying voice commands regardless of the speaker. The main purpose of the system is to control the movement of robots, with applications in industry and in aid of disabled people. We used a decision-making approach based on a neural network trained with distinctive features of the speech of 16 speakers. The samples of the voice commands were collected under a convenience criterion (age and sex) to ensure greater discrimination between voice characteristics and to achieve generalization of the neural network. Preprocessing consisted of determining the endpoints of each command signal and adaptive Wiener filtering. Each speech command was segmented into 200 windows with 25% overlap. The features used were the zero-crossing rate, the short-term energy, and the mel-frequency cepstral coefficients. The first two coefficients of linear predictive coding and its error were also tested. The neural network classifier was a multilayer perceptron trained by the backpropagation algorithm. Several experiments were performed to choose thresholds, practical values, features, and neural network configurations. The results were considered very good, reaching an accuracy of 89.16% under worst-case sampling conditions for the commands.
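
Two of the frame-level features mentioned above, the zero-crossing rate and the short-term energy, can be computed directly. A minimal numpy sketch; the 25% overlap follows the abstract, while the frame length is an assumption:

```python
# Sketch: zero-crossing rate and short-term energy over overlapping frames.
import numpy as np

def frame_features(y, frame_len=400, overlap=0.25):
    hop = int(frame_len * (1 - overlap))
    frames = [y[i:i + frame_len] for i in range(0, len(y) - frame_len + 1, hop)]
    zcr = np.array([np.mean(np.abs(np.diff(np.sign(f))) > 0) for f in frames])
    energy = np.array([np.sum(f ** 2) for f in frames])
    return zcr, energy

y = np.random.randn(16000)        # stand-in command recording
zcr, energy = frame_features(y)
print(zcr.shape, energy.shape)
```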
2

Larsson, Alm Kevin. "Automatic Speech Quality Assessment in Unified Communication : A Case Study." Thesis, Linköpings universitet, Programvara och system, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-159794.

Abstract:
Speech as a medium for communication has always been important in its ability to convey our ideas, personality, and emotions. It is therefore not strange that Quality of Experience (QoE) becomes central to any business relying on voice communication. Using Unified Communication (UC) systems, users can communicate with each other in several ways using many different devices, making QoE an important aspect of such systems. This thesis studies automatic methods for assessing the speech quality of voice calls in Briteback's UC application, including a comparison of the researched methods. Three methods are studied, all using a Gaussian Mixture Model (GMM) as a regressor, paired with extraction of Human Factor Cepstral Coefficients (HFCC), Gammatone Frequency Cepstral Coefficients (GFCC), and Modified Mel Frequency Cepstral Coefficients (MMFCC) features, respectively. The method based on HFCC feature extraction shows better performance in general than the two other methods, but all methods show comparatively low performance relative to the literature. This most likely stems from implementation errors, showing the difference between theory and practice in the literature, together with the lack of reference implementations. Further work with practical aspects in mind, such as reference implementations or verification tools, could make the field more popular and increase its use in the real world.
3

Larsson, Joel. "Optimizing text-independent speaker recognition using an LSTM neural network." Thesis, Mälardalens högskola, Akademin för innovation, design och teknik, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-26312.

Abstract:
In this paper a novel speaker recognition system is introduced. With advances in computer science, automated speaker recognition has become increasingly popular as an aid in crime investigations and authorization processes. Here, a recurrent neural network approach is used to learn to identify ten speakers within a set of 21 audiobooks. Audio signals are processed via spectral analysis into Mel Frequency Cepstral Coefficients that serve as speaker-specific features and are input to the neural network. The Long Short-Term Memory algorithm is examined for the first time within this area, with interesting results. Experiments are made to find the optimal network model for the problem. These show that the network learns to identify the speakers well, text-independently, when the recording situation is the same. However, the system has problems recognizing speakers across different recordings, probably due to the noise sensitivity of the speech processing algorithm in use.
4

Ulrich, Natalja. "Linguistic and speaker variation in Russian fricatives." Electronic Thesis or Diss., Lyon 2, 2022. http://www.theses.fr/2022LYO20031.

Abstract:
This thesis represents an acoustic-phonetic investigation of phonetic details in Russian fricatives. The main aim was to detect acoustic correlates that carry linguistic and idiosyncratic information. The questions addressed were whether the place of articulation and the speaker's gender and identity can be predicted by a set of acoustic cues, and which acoustic measures are the most reliable indicators. Furthermore, the distribution of speaker-specific characteristics and inter- and intra-speaker variation across acoustic cues was studied in more detail. The project started with the generation of a large audio database of Russian fricatives, followed by two analyses. Acoustic recordings were collected from 59 native Russian speakers; the resulting dataset consists of 22,561 tokens covering the fricatives [f], [s], [ʃ], [x], [v], [z], [ʒ], [sj], [ɕ], [vʲ], [zʲ]. The first study employed a data sample of 6,320 tokens (from 40 speakers). Temporal and spectral measurements were extracted using three acoustic cue extraction techniques (the full sound, the noise part, and the middle 30 ms window), and 13 Mel Frequency Cepstral Coefficients (MFCCs) were computed from the middle 30 ms window. Classifiers based on single decision trees, random forests, support vector machines, and neural networks were trained and tested to distinguish the three non-palatalized fricatives [f], [s], and [ʃ]. The results demonstrate that machine learning techniques are very successful at classifying these fricatives using the centre of gravity and the spectral spread, irrespective of contextual and speaker variation. The three extraction techniques performed similarly in terms of classification accuracy (93% to 99%), with the spectral measurements extracted from the noise parts giving slightly better accuracy, and the MFCCs showed marginally higher predictive power than the spectral cues (< 2%). This suggests that both spectral measures and MFCCs provide sufficient information for classifying these fricatives, and the choice between them depends on the particular research question or application. The second study's dataset consists of 15,812 tokens (59 speakers) containing [f], [s], [ʃ], [x], [v], [z], [ʒ], [sj], [ɕ]. As in the first study, two types of acoustic cues were extracted: 11 acoustic speech features (spectral cues, duration, and HNR measures) and 13 MFCCs. Classifiers based on single decision trees and random forests were trained and tested to predict the speakers' gender and identity.
5

Darch, Jonathan J. A. "Robust acoustic speech feature prediction from Mel frequency cepstral coefficients." Thesis, University of East Anglia, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.445206.

6

Okuyucu, Cigdem. "Semantic Classification And Retrieval System For Environmental Sounds." Master's thesis, METU, 2012. http://etd.lib.metu.edu.tr/upload/12615114/index.pdf.

Abstract:
The growth of multimedia content in recent years has motivated research on audio classification and content retrieval. In this thesis, a general environmental audio classification and retrieval approach is proposed in which higher-level semantic classes (outdoor, nature, meeting, and violence) are obtained from lower-level acoustic classes (emergency alarm, car horn, gun-shot, explosion, automobile, motorcycle, helicopter, wind, water, rain, applause, crowd, and laughter). To classify an audio sample into acoustic classes, MPEG-7 audio features, the Mel Frequency Cepstral Coefficients (MFCC) feature, and the Zero Crossing Rate (ZCR) feature are used with Hidden Markov Model (HMM) and Support Vector Machine (SVM) classifiers. Additionally, a new classification method using a Genetic Algorithm (GA) is proposed for the classification of semantic classes. Query by Example (QBE) and keyword-based query capabilities are implemented for content retrieval.
7

Assaad, Firas Souhail. "Biometric Multi-modal User Authentication System based on Ensemble Classifier." University of Toledo / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=toledo1418074931.

8

Edman, Sebastian. "Radar target classification using Support Vector Machines and Mel Frequency Cepstral Coefficients." Thesis, KTH, Optimeringslära och systemteori, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-214794.

Abstract:
In radar applications, one often wants to know not only that there is a target reflecting the transmitted signals but also what kind of target is reflecting them. This project investigates the possibilities of transforming reflected signals from raw radar data and making use of human perception, in particular our hearing, together with a machine learning approach in which patterns and characteristics in the data are used to answer the aforementioned question. More specifically, the investigation treats two kinds of fairly comparable targets, namely smaller Unmanned Aerial Vehicles (UAVs) and birds. Complex-valued radar video, so-called I/Q data, generated by these targets is extracted using signal processing techniques, transformed into real signals, and then transformed into audible signals. A feature set commonly used in speech recognition, namely Mel Frequency Cepstral Coefficients, is used to describe these signals together with two Support Vector Machine classification models. The two models were tested with an independent test set: the linear model achieved an overall prediction accuracy of 93.33%, with 93.33% correct classification on the UAV and 93.33% on the birds individually. The radial basis model achieved an overall prediction accuracy of 98.33%, with 100% correct classification on the UAV and 96.76% on the birds individually. The project is partly done in collaboration with J. Clemedson [2], where the focus is, as mentioned earlier, on transforming the signals to audible signals.
9

Yang, Chenguang. "Security in Voice Authentication." Digital WPI, 2014. https://digitalcommons.wpi.edu/etd-dissertations/79.

Abstract:
We evaluate the security of human voice password databases from an information-theoretical point of view. More specifically, we provide a theoretical estimate of the amount of entropy in human voice when processed using conventional GMM-UBM technologies with MFCCs as the acoustic features. The theoretical estimate gives rise to a methodology for analyzing the security level of a corpus of human voice: given a database of speech signals, we provide a method for estimating the relative entropy (Kullback-Leibler divergence) of the database, thereby establishing the security level of the speaker verification system. To demonstrate this, we analyze the YOHO database, a corpus of voice samples collected from 138 speakers, and show that the amount of entropy extracted is less than 14 bits. We also present a practical attack that impersonates the voice of any speaker within the corpus with 98% success probability in as few as 9 trials, and still succeeds at a rate of 62.50% if only 4 attempts are permitted. Further, based on the same attack rationale, we mount an attack on the ALIZE speaker verification system and show through experimentation that the attacker can impersonate any user in a database of 69 people with about a 25% success rate in only 5 trials; the success rate exceeds 50% when the allowed authentication attempts are increased to 20. Finally, when the practical attack is cast in terms of an entropy metric, we find that the theoretical entropy estimate almost perfectly predicts the success rate of the practical attack, giving further credence to the theoretical model and the associated entropy estimation technique.
10

Pešek, Milan. "Detekce logopedických vad v řeči." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2009. http://www.nusl.cz/ntk/nusl-218106.

Abstract:
The thesis deals with the design and implementation of software for detecting logopaedic speech defects. Because such defects need to be detected early, the software is aimed at child-age speakers. The introductory part covers the theory of speech production, the simulation of speech production for numerical processing, phonetics, logopaedics, and basic logopaedic speech defects. It also describes the methods used for feature extraction, for segmenting words into speech sounds, and for classifying features as either correct or incorrect pronunciation. The next part of the thesis presents the results of testing the selected methods. To recognize logopaedic speech defects, algorithms are used to extract MFCC and PLP features. The segmentation of words into speech sounds is performed using the Differential Function method. The extracted features of a sound are classified as correct or incorrect pronunciation with one of the tested pattern recognition methods: k-NN, SVM, ANN, and GMM.

Book chapters on the topic "Mel Frequency Cepstral Coefficients (MFCC)"

1

Suman, Preetam, Subhdeep Karan, Vrijendra Singh, and R. Maringanti. "Algorithm for Gunshot Detection Using Mel-Frequency Cepstrum Coefficients (MFCC)." In Lecture Notes in Electrical Engineering, 155–66. New Delhi: Springer India, 2014. http://dx.doi.org/10.1007/978-81-322-1823-4_15.

2

Sulistijono, Indra Adji, Renita Chulafa Urrosyda, and Zaqiatud Darojah. "Mel-Frequency Cepstral Coefficient (MFCC) for Music Feature Extraction for the Dancing Robot Movement Decision." In Intelligent Robotics and Applications, 283–94. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-43518-3_28.

3

Sueur, Jérôme. "Mel-Frequency Cepstral and Linear Predictive Coefficients." In Sound Analysis and Synthesis with R, 381–98. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-77647-7_12.

4

Karahoda, Bertan, Krenare Pireva, and Ali Shariq Imran. "Mel Frequency Cepstral Coefficients Based Similar Albanian Phonemes Recognition." In Human Interface and the Management of Information: Information, Design and Interaction, 491–500. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-40349-6_47.

5

Srivastava, Sumit, Mahesh Chandra, and G. Sahoo. "Phase Based Mel Frequency Cepstral Coefficients for Speaker Identification." In Advances in Intelligent Systems and Computing, 309–16. New Delhi: Springer India, 2016. http://dx.doi.org/10.1007/978-81-322-2757-1_31.

6

Mashika, Mpho, and Dustin van der Haar. "Mel Frequency Cepstral Coefficients and Support Vector Machines for Cough Detection." In Digital Human Modeling and Applications in Health, Safety, Ergonomics and Risk Management, 250–59. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-35748-0_18.

7

Palo, Hemanta Kumar, Mahesh Chandra, and Mihir Narayan Mohanty. "Recognition of Human Speech Emotion Using Variants of Mel-Frequency Cepstral Coefficients." In Advances in Systems, Control and Automation, 491–98. Singapore: Springer Singapore, 2017. http://dx.doi.org/10.1007/978-981-10-4762-6_47.

8

Ezeiza, Aitzol, Karmele López de Ipiña, Carmen Hernández, and Nora Barroso. "Combining Mel Frequency Cepstral Coefficients and Fractal Dimensions for Automatic Speech Recognition." In Advances in Nonlinear Speech Processing, 183–89. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011. http://dx.doi.org/10.1007/978-3-642-25020-0_24.

9

Traboulsi, Ahmad, and Michel Barbeau. "Identification of Drone Payload Using Mel-Frequency Cepstral Coefficients and LSTM Neural Networks." In Proceedings of the Future Technologies Conference (FTC) 2020, Volume 1, 402–12. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-63128-4_30.

10

Benkedjouh, Tarak, Taha Chettibi, Yassine Saadouni, and Mohamed Afroun. "Gearbox Fault Diagnosis Based on Mel-Frequency Cepstral Coefficients and Support Vector Machine." In Computational Intelligence and Its Applications, 220–31. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-89743-1_20.


Conference papers on the topic "Mel Frequency Cepstral Coefficients (MFCC)"

1

Ramirez, Angel David Pedroza, Jose Ismael de la Rosa Vargas, Rogelio Rosas Valdez, and Aldonso Becerra. "A comparative between Mel Frequency Cepstral Coefficients (MFCC) and Inverse Mel Frequency Cepstral Coefficients (IMFCC) features for an Automatic Bird Species Recognition System." In 2018 IEEE Latin American Conference on Computational Intelligence (LA-CCI). IEEE, 2018. http://dx.doi.org/10.1109/la-cci.2018.8625230.

2

Martinez, Jorge, Hector Perez, Enrique Escamilla, and Masahisa Mabo Suzuki. "Speaker recognition using Mel frequency Cepstral Coefficients (MFCC) and Vector quantization (VQ) techniques." In 2012 22nd International Conference on Electrical Communications and Computers (CONIELECOMP). IEEE, 2012. http://dx.doi.org/10.1109/conielecomp.2012.6189918.

3

Nurahmad, Chairunissa Atimas, and Mirna Adriani. "Identifying traditional music instruments on polyphonic Indonesian folksong using mel-frequency cepstral coefficients (MFCC)." In the 10th International Conference. New York, New York, USA: ACM Press, 2012. http://dx.doi.org/10.1145/2428955.2428967.

4

Chauhan, Paresh M., and Nikita P. Desai. "Mel Frequency Cepstral Coefficients (MFCC) based speaker identification in noisy environment using wiener filter." In 2014 International Conference on Green Computing Communication and Electrical Engineering (ICGCCEE). IEEE, 2014. http://dx.doi.org/10.1109/icgccee.2014.6921394.

5

Rahmandani, Muhammad, Hanung Adi Nugroho, and Noor Akhmad Setiawan. "Cardiac Sound Classification Using Mel-Frequency Cepstral Coefficients (MFCC) and Artificial Neural Network (ANN)." In 2018 3rd International Conference on Information Technology, Information System and Electrical Engineering (ICITISEE). IEEE, 2018. http://dx.doi.org/10.1109/icitisee.2018.8721007.

6

A. Lopes Neto, Guilherme, Rui Bertho Jr, and Hermes M. G. Castelo Branco. "Localização de Faltas em Redes VSC-HVDC por RNA e Coeficientes de Frequência Mel Cepstrais." In Congresso Brasileiro de Automática - 2020. sbabra, 2020. http://dx.doi.org/10.48011/asba.v2i1.1174.

Abstract:
With the growth of direct-current transmission systems, it is necessary to propose and analyze efficient methods for fault location. This research aims to obtain and evaluate the performance of a fault location algorithm on a 200 km transmission line. The two-terminal VSC-HVDC system was modeled in Simulink, and the data were preprocessed with Mel-Frequency Cepstral Coefficients (MFCC). An Artificial Neural Network (ANN) was then used to estimate the fault location. Several scenarios were simulated, varying the fault resistance and the fault's location along the line. The low error observed in the distance estimate demonstrates the method's high reliability in locating the fault.
7

Yuan, Jianjian, Hua Shao, and Hongcheng Huang. "Recognition Types of Cracked Material under Uniaxial Tension Based on Improved Mel Frequency Cepstral Coefficients (MFCC)." In 2022 IEEE 5th International Conference on Electronics and Communication Engineering (ICECE). IEEE, 2022. http://dx.doi.org/10.1109/icece56287.2022.10048667.

8

Muheidat, Fadi, W. Harry Tyrer, and Mihail Popescu. "Walk Identification using a smart carpet and Mel-Frequency Cepstral Coefficient (MFCC) features." In 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). IEEE, 2018. http://dx.doi.org/10.1109/embc.2018.8513340.

9

Mawadda Warohma, Ayu, Puspa Kurniasari, Suci Dwijayanti, Irmawan, and Bhakti Yudho Suprapto. "Identification of Regional Dialects Using Mel Frequency Cepstral Coefficients (MFCCs) and Neural Network." In 2018 International Seminar on Application for Technology of Information and Communication (iSemantic). IEEE, 2018. http://dx.doi.org/10.1109/isemantic.2018.8549731.

10

Surovi, Nowrin Akter, Audelia G. Dharmawan, and Gim Song Soh. "A Study on the Acoustic Signal Based Frameworks for the Real-Time Identification of Geometrically Defective Wire Arc Bead." In ASME 2021 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference. American Society of Mechanical Engineers, 2021. http://dx.doi.org/10.1115/detc2021-69573.

Abstract:
In Wire Arc Additive Manufacturing (WAAM), weld beads are deposited bead by bead and layer by layer, leading to the final part. A lack of uniformity or a geometrically defective bead will therefore lead to voids in the printed part, with a great impact on overall part quality and mechanical strength. To resolve this, several techniques have been proposed to identify such defects using vision- or thermal-based sensing, so as to aid the implementation of in-situ corrective measures and save time and cost. However, due to the environment they operate in, these sensors are not as effective at picking up irregularities as acoustic sensing. Therefore, in this paper, we study three acoustic-feature-based machine learning frameworks: Principal Component Analysis (PCA) + K-Nearest Neighbors (KNN), Mel Frequency Cepstral Coefficients (MFCC) + Neural Network (NN), and Mel Frequency Cepstral Coefficients (MFCC) + Convolutional Neural Network (CNN). We evaluate their performance for the real-time identification of geometrically defective weld beads. Experiments are carried out on stainless steel (ER316LSi), bronze (ERCuNiAl), and a mixed dataset containing both. The results show that all three frameworks outperform the state-of-the-art acoustic-signal-based ANN approach in terms of accuracy; the best-performing framework, PCA + KNN, outperforms the ANN by more than 15%, 30%, and 30% for the stainless steel, bronze, and mixed datasets, respectively.
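
The best-performing framework named above, PCA + KNN, corresponds to a standard scikit-learn pipeline. A sketch with random stand-in features; the sizes, component count, and binary sound/defective labels are illustrative assumptions:

```python
# Sketch: PCA for dimensionality reduction feeding a KNN classifier.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 40))    # 400 bead sound segments, 40 acoustic features
y = rng.integers(0, 2, size=400)  # 0 = sound bead, 1 = geometrically defective bead

clf = make_pipeline(PCA(n_components=10), KNeighborsClassifier(n_neighbors=5))
clf.fit(X[:300], y[:300])
print("held-out accuracy:", clf.score(X[300:], y[300:]))
```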