Dissertations / Theses on the topic 'Mel Frequency Cepstral Coefficients (MFCC)'
Consult the top 38 dissertations / theses for your research on the topic 'Mel Frequency Cepstral Coefficients (MFCC).'
Alvarenga, Rodrigo Jorge. "Reconhecimento de comandos de voz por redes neurais." Universidade de Taubaté, 2012. http://www.bdtd.unitau.br/tedesimplificado/tde_busca/arquivo.php?codArquivo=587.
Full text
Systems for speech recognition have widespread use in industry, in the improvement of human operations and procedures, and in entertainment and recreation. The specific objective of this study was to design and develop a voice recognition system capable of identifying voice commands regardless of the speaker. The main purpose of the system is to control the movement of robots, with applications in industry and in aid of disabled people. We used a decision-making approach based on a neural network trained with distinctive features of the speech of 16 speakers. The samples of the voice commands were collected under a convenience criterion (age and sex) to ensure greater discrimination between voice characteristics and to achieve generalization of the neural network. Preprocessing consisted of determining the endpoints of each command signal and applying adaptive Wiener filtering. Each speech command was segmented into 200 windows with 25% overlap. The features used were the zero-crossing rate, the short-term energy, and the mel-frequency cepstral coefficients. The first two coefficients of linear predictive coding and its error were also tested. The neural network classifier was a multilayer perceptron trained by the backpropagation algorithm. Several experiments were performed to choose thresholds, practical values, features, and neural network configurations. Results were considered very good, reaching an acceptance rate of 89.16% under the worst-case conditions for the sampling of the commands.
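The zero-crossing rate and short-term energy features mentioned in the abstract are simple per-frame statistics; a minimal NumPy sketch is given below. The frame length, hop size, and test tone are illustrative assumptions, not the thesis's actual parameters.

```python
import numpy as np

def frame_features(signal, frame_len=400, hop=300):
    """Split a 1-D signal into overlapping frames (25% overlap when
    hop = 0.75 * frame_len) and compute two classic features per frame:
    zero-crossing rate and short-term energy."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    zcr = np.empty(n_frames)
    energy = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len]
        # Zero-crossing rate: fraction of consecutive samples whose sign changes.
        zcr[i] = np.mean(np.abs(np.diff(np.signbit(frame).astype(int))))
        # Short-term energy: mean squared amplitude of the frame.
        energy[i] = np.mean(frame ** 2)
    return zcr, energy

# Example: a 1 kHz tone sampled at 8 kHz has 2 zero crossings per period,
# i.e. a rate of about 2*1000/8000 = 0.25 crossings per sample.
t = np.arange(8000) / 8000.0
zcr, energy = frame_features(np.sin(2 * np.pi * 1000 * t))
```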
Larsson, Alm Kevin. "Automatic Speech Quality Assessment in Unified Communication : A Case Study." Thesis, Linköpings universitet, Programvara och system, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-159794.
Full text
Larsson, Joel. "Optimizing text-independent speaker recognition using an LSTM neural network." Thesis, Mälardalens högskola, Akademin för innovation, design och teknik, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-26312.
Full textUlrich, Natalja. "Linguistic and speaker variation in Russian fricatives." Electronic Thesis or Diss., Lyon 2, 2022. http://www.theses.fr/2022LYO20031.
Full text
This thesis presents an acoustic-phonetic investigation of phonetic details in Russian fricatives. The main aim was to detect acoustic correlates that carry linguistic and idiosyncratic information. The questions addressed were whether the place of articulation and the speaker's gender and identity can be predicted by a set of acoustic cues, and which acoustic measures are the most reliable indicators. Furthermore, the distribution of speaker-specific characteristics and of inter- and intra-speaker variation across acoustic cues was studied in more detail. The project started with the generation of a large audio database of Russian fricatives, followed by two analyses. Acoustic recordings were collected from 59 native Russian speakers. The resulting dataset consists of 22,561 tokens covering the fricatives [f], [s], [ʃ], [x], [v], [z], [ʒ], [sʲ], [ɕ], [vʲ], [zʲ]. The first study employed a data sample of 6,320 tokens (from 40 speakers). Temporal and spectral measurements were extracted using three acoustic cue extraction techniques (the full sound, the noise part, and the middle 30 ms window). Furthermore, 13 Mel Frequency Cepstral Coefficients were computed from the middle 30 ms window. Classifiers based on single decision trees, random forests, support vector machines, and neural networks were trained and tested to distinguish between the three non-palatalized fricatives [f], [s], and [ʃ]. The results demonstrate that machine learning techniques are very successful at classifying the Russian voiceless non-palatalized fricatives [f], [s], and [ʃ] by using the centre of gravity and the spectral spread, irrespective of contextual and speaker variation. The three acoustic cue extraction techniques performed similarly in terms of classification accuracy (between 93% and 99%), with the spectral measurements extracted from the noise parts giving slightly better accuracy. Furthermore, Mel Frequency Cepstral Coefficients show marginally higher predictive power than spectral cues (< 2%). This suggests that both spectral measures and Mel Frequency Cepstral Coefficients provide sufficient information for the classification of these fricatives, and that the choice between them depends on the particular research question or application. The second study's dataset consists of 15,812 tokens (59 speakers) containing [f], [s], [ʃ], [x], [v], [z], [ʒ], [sʲ], [ɕ]. As in the first study, two types of acoustic cues were extracted: 11 acoustic speech features (spectral cues, duration, and HNR measures) and 13 Mel Frequency Cepstral Coefficients. Classifiers based on single decision trees and random forests were trained and tested to predict speakers' gender and identity.
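The centre of gravity and spectral spread used as predictors above are the first two moments of the normalized magnitude spectrum; a small illustrative NumPy sketch follows. The window length and sampling rate are arbitrary stand-ins, not the study's settings.

```python
import numpy as np

def spectral_moments(frame, fs):
    """First two spectral moments of a windowed frame:
    centre of gravity (spectral centroid, Hz) and spectral spread (Hz)."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    p = spectrum / spectrum.sum()                       # normalize to a mass
    cog = np.sum(freqs * p)                             # 1st moment: centroid
    spread = np.sqrt(np.sum(((freqs - cog) ** 2) * p))  # 2nd central moment
    return cog, spread

# A pure 1 kHz tone should have its centre of gravity near 1 kHz
# and a small spread (spectral leakage only).
fs = 16000
t = np.arange(480) / fs                                 # a 30 ms window
cog, spread = spectral_moments(np.sin(2 * np.pi * 1000 * t), fs)
```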
Darch, Jonathan J. A. "Robust acoustic speech feature prediction from Mel frequency cepstral coefficients." Thesis, University of East Anglia, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.445206.
Full text
Okuyucu, Cigdem. "Semantic Classification And Retrieval System For Environmental Sounds." Master's thesis, METU, 2012. http://etd.lib.metu.edu.tr/upload/12615114/index.pdf.
Full text
Assaad, Firas Souhail. "Biometric Multi-modal User Authentication System based on Ensemble Classifier." University of Toledo / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=toledo1418074931.
Full text
Edman, Sebastian. "Radar target classification using Support Vector Machines and Mel Frequency Cepstral Coefficients." Thesis, KTH, Optimeringslära och systemteori, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-214794.
Full text
In radar applications it is sometimes not enough to know that the system has observed a target when a reflected signal is detected; it is often also of great interest to know what type of object the signal was reflected from. This project investigates the possibility of transforming the reflected signals from raw radar data and using our human senses, specifically hearing, to distinguish between different targets, and also a machine learning approach in which patterns and characteristics of these signals are used to answer the same question. More specifically, the investigation is limited to two types of targets: small unmanned aerial vehicles (UAVs) and birds. Complex-valued radar video, also known as I/Q data, is extracted from the aforementioned target types and transformed into real-valued signals by signal processing methods; these signals are then transformed into audible signals. To classify them, features typical of speech recognition are used, namely Mel Frequency Cepstral Coefficients, together with two models of a Support Vector Machine classifier. With the linear model, a prediction accuracy of 93.33% was achieved; individually, 93.33% of UAVs and 93.33% of birds were correctly classified. With the radial basis model, a prediction accuracy of 98.33% was achieved; individually, 100% of UAVs and 96.76% of birds were correctly classified. The project was partly carried out with J. Clemedson [2], whose focus was, as mentioned, to transform these signals into audible signals.
Yang, Chenguang. "Security in Voice Authentication." Digital WPI, 2014. https://digitalcommons.wpi.edu/etd-dissertations/79.
Full text
Pešek, Milan. "Detekce logopedických vad v řeči." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2009. http://www.nusl.cz/ntk/nusl-218106.
Full text
Wu, Qiming. "A robust audio-based symbol recognition system using machine learning techniques." University of the Western Cape, 2020. http://hdl.handle.net/11394/7614.
Full text
This research investigates the creation of an audio-shape recognition system that is able to interpret a user's drawn audio shapes—fundamental shapes, digits and/or letters—on a given surface such as a table-top, using a generic stylus such as the back of a pen. The system aims to make use of one, two or three piezo microphones, as required, to capture the sound of the audio gestures, and a combination of the Mel-Frequency Cepstral Coefficients (MFCC) feature descriptor and Support Vector Machines (SVMs) to recognise audio shapes. The novelty of the system is in the use of piezo microphones, which are low-cost, light-weight and portable; the main investigation is around determining whether these microphones provide sufficiently rich information to recognise the audio shapes mentioned in such a framework.
Sklar, Alexander Gabriel. "Channel Modeling Applied to Robust Automatic Speech Recognition." Scholarly Repository, 2007. http://scholarlyrepository.miami.edu/oa_theses/87.
Full textCandel, Ramón Antonio José. "Verificación automática de locutores aplicando pruebas diagnósticas múltiples en serie y en paralelo basadas en DTW (Dynamic Time Warping) y NFCC (Mel-Frequency Cepstral coefficients)." Doctoral thesis, Universidad de Murcia, 2015. http://hdl.handle.net/10803/300433.
Full text
This thesis presents the design of a system capable of performing automatic speaker verification, based on modeling with DTW (Dynamic Time Warping) and MFCC (Mel-Frequency Cepstral Coefficients) procedures. Once designed, the system was evaluated on recordings obtained from the AHUMADA database of the Guardia Civil, both with individual tests (DTW and MFCC separately) and with multiple tests combining the two in series and in parallel. All results were examined for statistical significance, derived from performing a given finite number of tests. Statistical results were obtained for different sizes of the databases used, allowing us to assess their influence on the method and to fix its variables a priori, in order to make the best possible study. To the same end, to identify the best system, consisting of model type and sample size, we use a forensic study based on the intended purpose.
Lindstål, Tim, and Daniel Marklund. "Application of LabVIEW and myRIO to voice controlled home automation." Thesis, Uppsala universitet, Signaler och System, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-380866.
Full text
Neville, Katrina Lee. "Channel Compensation for Speaker Recognition Systems." RMIT University. Electrical and Computer Engineering, 2007. http://adt.lib.rmit.edu.au/adt/public/adt-VIT20080514.093453.
Full text
Hrabina, Martin. "VÝVOJ ALGORITMŮ PRO ROZPOZNÁVÁNÍ VÝSTŘELŮ." Doctoral thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2019. http://www.nusl.cz/ntk/nusl-409087.
Full text
Zezula, Miroslav. "Online detekce jednoduchých příkazů v audiosignálu." Master's thesis, Vysoké učení technické v Brně. Fakulta strojního inženýrství, 2011. http://www.nusl.cz/ntk/nusl-229484.
Full text
Alsouda, Yasser. "An IoT Solution for Urban Noise Identification in Smart Cities : Noise Measurement and Classification." Thesis, Linnéuniversitetet, Institutionen för fysik och elektroteknik (IFE), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-80858.
Full text
Hrušovský, Enrik. "Automatická klasifikace výslovnosti hlásky R." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2018. http://www.nusl.cz/ntk/nusl-377664.
Full text
Kufa, Tomáš. "Rozpoznáváni standardních PILOT-CONTROLLER řídicích povelů v hlasové podobě." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2009. http://www.nusl.cz/ntk/nusl-217849.
Full text
Dušil, Lubomír. "Automatické rozpoznávání logopedických vad v řečovém projevu." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2009. http://www.nusl.cz/ntk/nusl-218161.
Full text
Лавриненко, Олександр Юрійович, Александр Юрьевич Лавриненко, and Oleksandr Lavrynenko. "Методи підвищення ефективності семантичного кодування мовних сигналів." Thesis, Національний авіаційний університет, 2021. https://er.nau.edu.ua/handle/NAU/52212.
Full text
The thesis is devoted to the solution of a pressing scientific and practical problem in telecommunication systems, namely increasing the bandwidth of the semantic speech data transmission channel through efficient coding. The question of increasing the efficiency of semantic coding is formulated as follows: at what minimum rate is it possible to encode the semantic features of speech signals with a given probability of their error-free recognition? It is this question that is answered in this research, an urgent scientific and technical task given the growing trend of remote human interaction with robotic technology through speech, where the accuracy of this type of system directly depends on the effectiveness of the semantic coding of speech signals. The thesis investigates the well-known method of semantic coding of speech signals based on mel-frequency cepstral coefficients, which consists in finding the average values of the coefficients of the discrete cosine transform of the logarithmized energy of the discrete Fourier transform spectrum processed by a triangular filter bank on the mel scale. The problem is that this method does not meet the condition of adaptivity, so the main scientific hypothesis of the study was formulated: the efficiency of semantic coding of speech signals can be increased through the use of an adaptive empirical wavelet transform followed by Hilbert spectral analysis. Coding efficiency here means a decrease in the information transmission rate at a given probability of error-free recognition of the semantic features of speech signals, which significantly reduces the required passband and thereby increases the bandwidth of the communication channel.
In proving the formulated scientific hypothesis, the following results were obtained: 1) for the first time, a method of semantic coding of speech signals based on the empirical wavelet transform was developed; it differs from existing methods by constructing a set of adaptive bandpass Meyer wavelet filters followed by Hilbert spectral analysis to find the instantaneous amplitudes and frequencies of the intrinsic empirical mode functions, which determine the semantic features of speech signals and increase the efficiency of their coding; 2) for the first time, the adaptive empirical wavelet transform was applied to multiscale analysis and semantic coding of speech signals, increasing the efficiency of spectral analysis by decomposing high-frequency speech oscillations into their low-frequency components, namely the intrinsic empirical modes; 3) the method of semantic coding of speech signals based on mel-frequency cepstral coefficients received further development through the basic principles of adaptive spectral analysis with the empirical wavelet transform, which increases its efficiency. Experimental research in MATLAB R2020b showed that the developed method reduces the encoding rate from 320 to 192 bit/s and the required passband from 40 to 24 Hz at a probability of error-free recognition of about 0.96 (96%) and a signal-to-noise ratio of 48 dB, so its efficiency increases 1.6 times relative to the existing method.
The results obtained in the thesis can be used to build systems for remote interaction of people and robotic equipment using speech technologies, such as speech recognition and synthesis, voice control of technical objects, low-speed encoding of speech information, voice translation from foreign languages, etc.
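The Hilbert spectral analysis step described in the abstract above (instantaneous amplitudes and frequencies of already-extracted modes) can be sketched with SciPy's analytic-signal routine. This illustrates only the Hilbert step, not the thesis's empirical wavelet decomposition; the test tone and sampling rate are assumptions.

```python
import numpy as np
from scipy.signal import hilbert

def hilbert_spectrum(mode, fs):
    """Instantaneous amplitude and frequency of one (already extracted)
    empirical mode, via the analytic signal z(t) = x(t) + j*H[x](t)."""
    z = hilbert(mode)
    amplitude = np.abs(z)                     # instantaneous amplitude
    phase = np.unwrap(np.angle(z))            # instantaneous phase
    freq = np.diff(phase) * fs / (2 * np.pi)  # instantaneous frequency (Hz)
    return amplitude, freq

# A 50 Hz tone should yield near-constant amplitude and ~50 Hz
# instantaneous frequency (apart from edge effects).
fs = 1000
t = np.arange(fs) / fs
amp, freq = hilbert_spectrum(np.sin(2 * np.pi * 50 * t), fs)
```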
Sujatha, J. "Improved MFCC Front End Using Spectral Maxima For Noisy Speech Recognition." Thesis, 2005. https://etd.iisc.ac.in/handle/2005/1506.
Full text
Sujatha, J. "Improved MFCC Front End Using Spectral Maxima For Noisy Speech Recognition." Thesis, 2005. http://etd.iisc.ernet.in/handle/2005/1506.
Full text
(6642491), Jingzhao Dai. "SPARSE DISCRETE WAVELET DECOMPOSITION AND FILTER BANK TECHNIQUES FOR SPEECH RECOGNITION." Thesis, 2019.
Find full text
Speech recognition is widely applied to speech-to-text conversion, voice-driven commands, human-machine interfaces and so on [1]-[8]. It has proliferated increasingly into human lives in the modern age. To improve the accuracy of speech recognition, various algorithms such as artificial neural networks and hidden Markov models have been developed [1], [2].
In this thesis work, speech recognition with various classifiers is investigated. The classifiers employed include the support vector machine (SVM), k-nearest neighbors (KNN), random forest (RF) and convolutional neural network (CNN). Two novel feature extraction methods, sparse discrete wavelet decomposition (SDWD) and bandpass filtering (BPF) based on the Mel filter banks [9], are developed and proposed. To accommodate the diversity of classification algorithms, both one-dimensional (1D) and two-dimensional (2D) features are obtained. The 1D features are arrays of power coefficients in frequency bands, dedicated to training the SVM, KNN and RF classifiers, while the 2D features capture both the frequency domain and temporal variation: each 2D feature consists of the power values in the decomposed bands versus consecutive speech frames. Most importantly, the 2D features, with geometric transformation, are adopted to train the CNN.
The speech material, from both male and female speakers, comes from a recorded data set as well as a standard data set. First, recordings with little noise and clear pronunciation are processed with the proposed feature extraction methods. After many trials and experiments using this dataset, a high recognition accuracy is achieved. Then, the feature extraction methods are further applied to the standard recordings, which have random characteristics with ambient noise and unclear pronunciation. Many experimental results validate the effectiveness of the proposed feature extraction techniques.
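The 2D features described above (band powers versus consecutive frames) can be illustrated with a plain STFT power map. The band count, frame sizes, and noise input below are arbitrary stand-ins for the thesis's wavelet/filter-bank settings.

```python
import numpy as np

def band_power_map(signal, n_bands=8, frame_len=256, hop=128):
    """2D feature: power in n_bands equal-width frequency bands (rows)
    for each consecutive analysis frame (columns)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    feat = np.empty((n_bands, n_frames))
    window = np.hanning(frame_len)
    for j in range(n_frames):
        frame = signal[j * hop : j * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        # Sum the bin powers inside each of the equal-width bands.
        edges = np.linspace(0, len(power), n_bands + 1, dtype=int)
        for i in range(n_bands):
            feat[i, j] = power[edges[i]:edges[i + 1]].sum()
    return feat

# Stand-in input: 4000 samples of white noise.
feat = band_power_map(np.random.default_rng(0).standard_normal(4000))
```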
Tang, Chu-Liang, and 唐曲亮. "Improved Mel Frequency Cepstral Coefficients Combined with Multiple Speech Features." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/57856949340151071584.
Full text
國立中央大學
電機工程學系
103
This thesis studies speech feature extraction and feature compensation in speech recognition. Several speech features are selected for combination; the best one is the cascade of Linear Prediction Cepstral Coefficients (LPCC) and Mel-Frequency Cepstral Coefficients (MFCC). The MFCCs used here are obtained with a Gaussian Mel-frequency band instead of a triangular filter bank, and experiments show that the best combination ratio of LPCC to MFCC is 1:1. The thesis also shows that further improved performance is possible if Cepstral Mean and Variance Normalization (CMVN) is added.
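Cepstral Mean and Variance Normalization, the post-processing step credited above with the further improvement, simply standardizes each feature dimension over the utterance. A minimal sketch follows; the cascaded LPCC+MFCC vectors are simulated by random data, not taken from the thesis.

```python
import numpy as np

def cmvn(features):
    """Cepstral Mean and Variance Normalization: standardize each feature
    dimension (column) to zero mean and unit variance over the utterance."""
    mean = features.mean(axis=0)
    std = features.std(axis=0) + 1e-10   # guard against constant dimensions
    return (features - mean) / std

# Stand-in for an utterance of 100 frames of cascaded LPCC+MFCC features.
frames = np.random.default_rng(1).normal(3.0, 2.0, size=(100, 24))
normalized = cmvn(frames)
```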
Kuo, Yo-zhen, and 郭又禎. "Improved Mel-scale Frequency Cepstral Coefficients for Keyword Spotting Technique." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/27592493670347223949.
Full text
國立中央大學
電機工程學系
102
In speech recognition systems, Mel frequency cepstral coefficients (MFCCs) are the most widely used feature parameters. Because of the wide application of MFCCs in audio signal processing, many studies on their improvement have been presented. In this study, we use a particle swarm optimization algorithm to optimize the weights of the MFCC filter bank, using the difference between the energy statistics curve of the voice training database and the envelope of the MFCC filter bank as the fitness function. Experimental results show that the proposed MFCC method improves the recognition rate; in noisy-environment experiments it also improves recognition performance.
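Particle swarm optimization as used above can be sketched generically. In the toy example below the swarm fits filter-bank-style weights to a target curve; the quadratic fitness and all constants are stand-ins for the thesis's database-derived fitness function, not its actual setup.

```python
import numpy as np

rng = np.random.default_rng(4)
target = np.hanning(20)            # stand-in for the energy statistics curve

def fitness(weights):
    """Squared distance between candidate weights and the target curve."""
    return np.sum((weights - target) ** 2)

n_particles, dim, iters = 30, 20, 200
pos = rng.uniform(0, 1, (n_particles, dim))
vel = np.zeros((n_particles, dim))
pbest = pos.copy()
pbest_val = np.array([fitness(p) for p in pos])
gbest = pbest[np.argmin(pbest_val)].copy()
initial_best = pbest_val.min()

for _ in range(iters):
    r1, r2 = rng.uniform(size=(2, n_particles, dim))
    # Standard PSO update: inertia + cognitive + social terms.
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = pos + vel
    vals = np.array([fitness(p) for p in pos])
    improved = vals < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[np.argmin(pbest_val)].copy()

final_best = pbest_val.min()
```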
Lin, Shih-Fen, and 林士棻. "Bird songs recognition using two-dimensional Mel-scale frequency cepstral coefficients." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/94553686394732089037.
Full text
林士棻. "Bird songs recognition using two-dimensional Mel-scale frequency cepstral coefficients." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/38302762655714685237.
Full text中華大學
資訊工程學系(所)
94
We propose a method to automatically identify birds from their sounds. First, each syllable corresponding to a piece of vocalization is segmented. The average LPCC (ALPCC), average MFCC (AMFCC), static MFCC (SMFCC), two-dimensional MFCC (TDMFCC), dynamic two-dimensional MFCC (DTDMFCC) and TDMFCC+DTDMFCC over all frames in a syllable are calculated as the vocalization features. Linear discriminant analysis (LDA) is exploited to increase the classification accuracy in a lower-dimensional feature space. A clustering algorithm, called the progressive constructive clustering (PCC) algorithm, is used to divide the feature vectors computed from the same bird species into several subclasses. In our experiments, TDMFCC+DTDMFCC achieves average classification accuracies of 90% and 89% for 420 and 561 bird species, respectively.
HUANG, CHUAN-HAO, and 黃川豪. "Multi-feature Speaker Verification Based on Mel-frequency cepstral coefficients and Formants." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/4nbqev.
Full text
Xu, Sheng-Bin, and 徐勝斌. "Continuous Birdsong Recognition Using Dynamic and Temporal Two-Dimensional Mel-Frequency Cepstral Coefficients." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/21749503795140776068.
Full text中華大學
資訊工程學系(所)
97
In this paper, we propose an approach to the classification of bird species using fixed-duration sound segments extracted from continuous birdsong recordings. First, each sound segment is divided into a number of overlapping texture windows. Each texture window is classified individually, and a fusion approach is then employed to determine the classification result for the input segment. Features derived from the static, transitional, and temporal information of two-dimensional Mel-frequency cepstral coefficients (TDMFCC) are extracted for the classification of each texture window. TDMFCC describes both the static and dynamic characteristics of a texture window; dynamic TDMFCC (DTDMFCC) describes sharp transitions within a texture window; and global dynamic TDMFCC (GDTDMFCC) is developed to describe long-time temporal variations in a texture window. The concepts of DTDMFCC, which computes local regression coefficients, and GDTDMFCC, which evaluates global contrast information, can be integrated to form a new feature vector, called global and local DTDMFCC (GLDTDMFCC). Furthermore, we use principal component analysis (PCA) to reduce the feature dimension, Gaussian mixture models (GMMs) to model the sounds of different bird species, and linear discriminant analysis (LDA) to improve the classification accuracy in a lower-dimensional feature space. In our experiments, the highest average classification accuracy is 94.62% for the classification of 28 bird species.
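The PCA dimension-reduction step mentioned above can be sketched with a plain SVD. The feature matrix is simulated and the component count is arbitrary; this is not the thesis's pipeline, only the generic technique.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project the rows of X onto the top principal components, obtained
    from the SVD of the mean-centred data. Returns the reduced data and
    the component directions."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]        # top right-singular vectors
    return Xc @ components.T, components

# Stand-in for 200 texture-window feature vectors of dimension 60.
X = np.random.default_rng(5).standard_normal((200, 60))
Z, components = pca_reduce(X, n_components=10)
```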
CHIANG, MING-DA, and 蔣明達. "Speaker Recognition Using Mel-Scale Frequency Cepstral Coefficients by Time Domain Filtering method." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/13444981721982290438.
Full text中華技術學院
電子工程研究所碩士班
96
According to past papers, we find that algorithms based on Mel-frequency cepstral coefficients (MFCCs) perform better than algorithms based on other feature parameters [1-7]. The MFCCs are obtained by the following procedure: framing, multiplication by a Hamming window, the fast Fourier transform (FFT), filtering in the frequency domain by a Mel-frequency triangular filter bank, calculation of the logarithmic energy of the filter outputs, and the discrete cosine transform (DCT). In this thesis, the conventional frequency-domain filtering procedure [1] for finding the Mel-frequency cepstral coefficients is replaced by a new, direct time-domain filtering procedure. The simulation results show that the performance of our new method and that of the previous approach [1] are quite similar.
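The frequency-domain pipeline recapped above (framing, Hamming window, FFT, Mel triangular filters, log energies, DCT) can be sketched end-to-end for a single frame. The filter count, cepstral order, and frequency range below are common defaults, not necessarily those of the thesis.

```python
import numpy as np
from scipy.fft import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_frame(frame, fs, n_filters=26, n_ceps=13):
    """MFCCs of one frame: Hamming window -> power spectrum -> triangular
    Mel filter bank -> log energies -> DCT-II."""
    n_fft = len(frame)
    power = np.abs(np.fft.rfft(frame * np.hamming(n_fft))) ** 2
    # Triangular filters with centres equally spaced on the Mel scale.
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(fs / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fbank = np.zeros((n_filters, len(power)))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)  # rising edge
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)  # falling edge
    log_energy = np.log(fbank @ power + 1e-10)
    return dct(log_energy, type=2, norm='ortho')[:n_ceps]

ceps = mfcc_frame(np.random.default_rng(2).standard_normal(512), fs=16000)
```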
Lin, Bo-Zhi, and 林柏志. "Speaker Recognition Algorithm Using Mel-Scale Frequency Cepstral Coefficients with Two Stages Linear Prediction Filters." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/18209732501243789128.
Full text中華技術學院
電子工程研究所碩士班
94
The development of computer and communication technologies hastens the application requirements of speaker recognition and speech recognition. The purpose of this thesis is to present a new algorithm to improve the performance of speaker recognition. The algorithm uses two-stage linear prediction error filters to estimate the spectrogram of the processed speech signal. It then uses a Mel-scale triangular bandpass filter bank to obtain the Mel-scale frequency cepstral coefficients (MFCC) used to build the Gaussian mixture model needed for speaker recognition. To verify that the algorithm works well and to compare its performance with other algorithms, we use the Mandarin speech database MAT-400, purchased from the Association for Computational Linguistics and Chinese Language Processing. The experimental results show that the proposed algorithm has the best performance at higher signal-to-noise ratios.
Yang-Ming, Cheng, and 鄭陽銘. "A Mel-Scale Frequency Cepstral Coefficients Speaker Recognition Algorithm Based on Linear Prediction Spectrum Estimation." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/38345345070598427641.
Full text中華技術學院
電子工程研究所碩士班
93
According to past research, spectrum estimation based on linear prediction is more robust than spectrum estimation based on the FFT at lower SNR. In this paper, we propose a new speaker identification algorithm based on linear prediction spectrum estimation. In this algorithm, spectrum estimation based on the short-time fast Fourier transform is replaced by linear prediction spectrum estimation; the Mel-scale frequency cepstral coefficients are then obtained using the Mel-scale triangular filter bank. Experimental results show that the new algorithm performs better than the FFT-based algorithm at lower SNR.
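Linear-prediction spectrum estimation as referred to above rests on solving the autocorrelation normal equations, typically by the Levinson-Durbin recursion. A compact illustrative sketch follows; the model order, test tone, and noise level are arbitrary assumptions.

```python
import numpy as np

def lpc(signal, order):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.
    Returns the coefficient vector a (with a[0] = 1) and the final
    prediction-error power."""
    r = np.correlate(signal, signal, mode='full')[len(signal) - 1:]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                      # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= (1.0 - k * k)
    return a, err

# LP spectral estimate of a noisy 500 Hz tone: the all-pole spectrum
# err / |A(f)|^2 should peak near 500 Hz.
fs = 8000
t = np.arange(2048) / fs
x = np.sin(2 * np.pi * 500 * t) \
    + 0.05 * np.random.default_rng(3).standard_normal(2048)
a, err = lpc(x, order=8)
spec = err / np.abs(np.fft.rfft(a, 1024)) ** 2
peak_hz = np.argmax(spec) * fs / 1024
```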
Chu, Feng-Seng, and 朱峰森. "Improved Approaches of Processing Perceptual Linear Prediction(PLP)and Mel Frequency Cepstrum Coefficient(MFCC)Parameters for Robust Speech Recognition." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/26578739886453071884.
Full textBowman, Casady. "Perceiving Emotion in Sounds: Does Timbre Play a Role?" Thesis, 2011. http://hdl.handle.net/1969.1/ETD-TAMU-2011-12-10656.
Full textWu, Sunrise, and 吳尚叡. "Design Time Domain Filter Banks Using Least Squares Method to Calculate the Mel-Frequency Cepstral Coefficients for Speaker Recognition." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/08178129842426697899.
Full text中華技術學院
電子工程研究所碩士班
96
Up to now, the best speaker recognition techniques have been based on Mel-frequency cepstral coefficients (MFCCs) [1-4,11]. The main procedure for obtaining MFCCs is: framing, Hamming windowing, the FFT (Fast Fourier Transform) [7], filtering by a Mel-scale triangular filter bank, taking the logarithmic energies of the outputs, and the DCT (Discrete Cosine Transform) [1-8]. After these processes, the MFCCs are obtained. The main topic of this thesis is to replace the previous procedure of the FFT [7] followed by filtering with a frequency-domain Mel-scale triangular filter bank [15] by filtering with a time-domain Mel-scale triangular filter bank. The time-domain Mel-scale triangular filter bank [1-8,14] is obtained by the least squares method [10,13] and is used to obtain the Mel-frequency cepstral coefficients of speaker speech. From the results of our experiments, we find that the successful speaker recognition rates of the conventional MFCC method [2,3,6,14] and of our new approach are very similar.
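A time-domain filter approximating one Mel-scale triangular band can be obtained by least-squares FIR design. The sketch below uses SciPy's `firls`; the band edges, sampling rate, and tap count are illustrative assumptions, not the thesis's design.

```python
import numpy as np
from scipy.signal import firls, freqz

fs = 8000
# One triangular Mel-style band: gain rises 300->600 Hz, falls 600->1000 Hz.
# firls interpolates the desired gain linearly across each band pair,
# which directly yields the triangular shape.
numtaps = 101                      # firls requires an odd number of taps
bands   = [0, 300, 300, 600, 600, 1000, 1000, fs / 2]
desired = [0, 0,   0,   1,   1,   0,    0,    0]
h = firls(numtaps, bands, desired, fs=fs)

# The peak of the realized magnitude response should sit near 600 Hz.
w, H = freqz(h, worN=2048, fs=fs)
peak_hz = w[np.argmax(np.abs(H))]
```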
Yuan, Hor, and 原禾. "Design Time Domain Filter Banks Using Least Squares Method to Calculate the Mel-Frequency Cepstral Coefficients for Non-Continuous Speech Recognition." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/76162451347630250736.
Full text中華技術學院
電子工程研究所碩士班
97
In speech recognition, Mel frequency cepstral coefficients (MFCC) are currently popular for both speech recognition and speaker recognition [2,8-11,14,15]. To obtain the MFCC, the main procedure is to filter the speech signal with a set of triangular Mel-scale filters in the frequency domain, take the logarithm of the output powers of the filter bank, and then apply the Discrete Cosine Transform. In this thesis, the frequency-domain triangular Mel-scale filter bank is replaced by a newly designed time-domain triangular Mel-scale filter bank. The experimental results show that the performance of speech recognition algorithms extracting MFCC with the conventional triangular Mel-scale filter bank and with the newly designed time-domain Mel-scale filter bank is very similar.