To see the other types of publications on this topic, follow the link: Mel Frequency Cepstral Coefficients (MFCC).

Dissertations / Theses on the topic 'Mel Frequency Cepstral Coefficients (MFCC)'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Select a source type:

Consult the top 38 dissertations / theses for your research on the topic 'Mel Frequency Cepstral Coefficients (MFCC).'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Alvarenga, Rodrigo Jorge. "Reconhecimento de comandos de voz por redes neurais." Universidade de Taubaté, 2012. http://www.bdtd.unitau.br/tedesimplificado/tde_busca/arquivo.php?codArquivo=587.

Full text
Abstract:
Speech recognition systems have widespread use in industry, in the improvement of human operations and procedures, and in entertainment and recreation. The specific objective of this study was to design and develop a voice recognition system capable of identifying voice commands regardless of the speaker. The main purpose of the system is to control the movement of robots, with applications in industry and in assisting people with physical disabilities. We used a decision-making approach based on a neural network trained with distinctive features of the speech of 16 speakers. The samples of the voice commands were collected under a convenience criterion (in age and sex) to ensure greater discrimination between voice characteristics and thus achieve generalization of the neural network. Preprocessing consisted of determining the endpoints of each command utterance and of adaptive Wiener filtering. Each speech command was segmented into 200 windows with 25% overlap. The features used were the zero-crossing rate, the short-term energy and the mel-frequency cepstral coefficients. The first two linear predictive coding coefficients and the prediction error were also tested. The neural network classifier was a multilayer perceptron trained with the backpropagation algorithm. Several experiments were performed to choose thresholds, practical values, features and neural network configurations. The results were considered very good, reaching an accuracy rate of 89.16% under the worst-case conditions of command sampling.
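The front end described above (200 windows with 25% overlap, plus a zero-crossing rate and short-term energy per window) can be sketched as follows. This is a minimal illustration in NumPy; the function names and framing strategy are assumptions, not taken from the thesis:

```python
import numpy as np

def frame_signal(x, n_frames=200, overlap=0.25):
    """Split a signal into n_frames windows with the given fractional overlap."""
    # choose the frame length so that n_frames frames with this overlap
    # roughly tile the whole signal: frame_len * (n_frames*(1-overlap) + overlap) ~ len(x)
    frame_len = int(np.ceil(len(x) / (n_frames * (1 - overlap) + overlap)))
    hop = int(frame_len * (1 - overlap))
    frames = [x[i * hop : i * hop + frame_len] for i in range(n_frames)]
    return [f for f in frames if len(f) > 0]

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.sign(frame)
    return float(np.mean(signs[:-1] != signs[1:]))

def short_term_energy(frame):
    """Sum of squared samples in the window."""
    return float(np.sum(np.asarray(frame, dtype=float) ** 2))
```

The MFCCs mentioned in the abstract would be computed per frame on top of this framing (mel filterbank on the power spectrum, log, then a DCT), which is omitted here for brevity.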
APA, Harvard, Vancouver, ISO, and other styles
2

Larsson, Alm Kevin. "Automatic Speech Quality Assessment in Unified Communication : A Case Study." Thesis, Linköpings universitet, Programvara och system, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-159794.

Full text
Abstract:
Speech as a medium for communication has always been important in its ability to convey our ideas, personality and emotions. It is therefore not strange that Quality of Experience (QoE) becomes central to any business relying on voice communication. Using Unified Communication (UC) systems, users can communicate with each other in several ways using many different devices, making QoE an important aspect of such systems. This thesis studies and compares automatic methods for assessing the speech quality of voice calls in Briteback's UC application. Three methods are studied, all using a Gaussian Mixture Model (GMM) as a regressor, paired with the extraction of Human Factor Cepstral Coefficients (HFCC), Gammatone Frequency Cepstral Coefficients (GFCC) and Modified Mel Frequency Cepstrum Coefficients (MMFCC) features respectively. The method based on HFCC features shows better performance in general than the two other methods, but all methods perform comparatively poorly against the literature. This most likely stems from implementation errors, showing the gap between theory and practice in the literature, together with the lack of reference implementations. Further work with practical aspects in mind, such as reference implementations or verification tools, could make the field more popular and increase its use in the real world.
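All three methods share the pattern of regressing a quality score from cepstral features with a GMM. One common way to do this is to fit a joint GMM over (feature, score) pairs and predict the conditional mean of the score given the features. The sketch below illustrates that pattern with scikit-learn's `GaussianMixture` on synthetic data; the data, component count and function names are illustrative assumptions, not Briteback's setup:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_gmm_regressor(X, y, n_components=2, seed=0):
    """Fit a GMM on the joint (feature, target) vectors; the target is the last dim."""
    Z = np.column_stack([X, y])
    return GaussianMixture(n_components=n_components, covariance_type="full",
                           random_state=seed).fit(Z)

def gmm_predict(gmm, X):
    """E[y | x] under the joint GMM: responsibility-weighted conditional means."""
    d = X.shape[1]
    preds = np.empty(len(X))
    for i, x in enumerate(X):
        num, den = 0.0, 0.0
        for k in range(gmm.n_components):
            mu, S = gmm.means_[k], gmm.covariances_[k]
            mu_x, mu_y = mu[:d], mu[d]
            S_xx, S_yx = S[:d, :d], S[d, :d]
            # conditional mean of y given x under component k
            cond = mu_y + S_yx @ np.linalg.solve(S_xx, x - mu_x)
            # marginal density of x under component k (up to shared constants)
            diff = x - mu_x
            p = gmm.weights_[k] * np.exp(-0.5 * diff @ np.linalg.solve(S_xx, diff)) \
                / np.sqrt(np.linalg.det(S_xx))
            num += p * cond
            den += p
        preds[i] = num / den
    return preds

# demo on synthetic data: y ~ 2x, so E[y | x=0.5] should be close to 1
rng = np.random.RandomState(0)
X_demo = np.linspace(0.0, 1.0, 200).reshape(-1, 1)
y_demo = 2.0 * X_demo.ravel() + 0.01 * rng.randn(200)
gmm = fit_gmm_regressor(X_demo, y_demo, n_components=1)
pred = gmm_predict(gmm, np.array([[0.5]]))
```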
APA, Harvard, Vancouver, ISO, and other styles
3

Larsson, Joel. "Optimizing text-independent speaker recognition using an LSTM neural network." Thesis, Mälardalens högskola, Akademin för innovation, design och teknik, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-26312.

Full text
Abstract:
In this paper a novel speaker recognition system is introduced. With the advances in computer science, automated speaker recognition has become increasingly popular as an aid in crime investigations and authorization processes. Here, a recurrent neural network approach is used to learn to identify ten speakers within a set of 21 audio books. Audio signals are processed via spectral analysis into Mel Frequency Cepstral Coefficients that serve as speaker-specific features, which are input to the neural network. The Long Short-Term Memory algorithm is examined for the first time within this area, with interesting results. Experiments are performed to find the optimal network model for the problem. These show that the network learns to identify the speakers well, text-independently, when the recording conditions are the same. However, the system has difficulty recognizing speakers across different recordings, which is probably due to the noise sensitivity of the speech processing algorithm in use.
APA, Harvard, Vancouver, ISO, and other styles
4

Ulrich, Natalja. "Linguistic and speaker variation in Russian fricatives." Electronic Thesis or Diss., Lyon 2, 2022. http://www.theses.fr/2022LYO20031.

Full text
Abstract:
This thesis presents an acoustic-phonetic investigation of phonetic detail in Russian fricatives. The main aim was to detect acoustic correlates that carry linguistic and idiosyncratic information. The questions addressed were whether the place of articulation and the speaker's gender and identity can be predicted from a set of acoustic cues, and which acoustic measures are the most reliable indicators. Furthermore, the distribution of speaker-specific characteristics and of inter- and intra-speaker variation across acoustic cues was studied in more detail. The project started with the creation of a large audio database of Russian fricatives. Acoustic recordings were collected from 59 native Russian speakers, yielding a dataset of 22,561 tokens covering the fricatives [f], [s], [ʃ], [x], [v], [z], [ʒ], [sj], [ɕ], [vʲ], [zʲ]. Two follow-up analyses were conducted. The first study employed a data sample of 6,320 tokens (from 40 speakers). Temporal and spectral measurements were extracted using three acoustic cue extraction techniques (the full sound, the noise part, and the middle 30 ms window). In addition, 13 Mel Frequency Cepstral Coefficients (MFCCs) were computed from the middle 30 ms window. Classifiers based on single decision trees, random forests, support vector machines and neural networks were trained and tested to distinguish the three non-palatalized fricatives [f], [s] and [ʃ]. The results demonstrate that machine learning techniques are very successful at classifying the Russian voiceless non-palatalized fricatives [f], [s] and [ʃ] using the centre of gravity and the spectral spread, irrespective of contextual and speaker variation. The three extraction techniques performed similarly in terms of classification accuracy (93%–99%), with the spectral measurements extracted from the noise parts giving slightly better accuracy. Furthermore, the MFCCs showed marginally higher predictive power than the spectral cues (< 2%). This suggests that both spectral measures and MFCCs provide sufficient information for classifying these fricatives, and the choice between them depends on the particular research question or application. The second study's dataset consists of 15,812 tokens (59 speakers) containing [f], [s], [ʃ], [x], [v], [z], [ʒ], [sj], [ɕ]. As in the first study, two types of acoustic cues were extracted: 11 acoustic speech features (spectral cues, duration and HNR measures) and 13 MFCCs. Classifiers based on single decision trees and random forests were trained and tested to predict the speakers' gender and identity.
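The centre of gravity and spectral spread used as predictors above are simply the first two moments of the normalised power spectrum. A minimal sketch in NumPy (function name illustrative):

```python
import numpy as np

def spectral_moments(frame, sr):
    """Centre of gravity (spectral centroid) and spectral spread, both in Hz."""
    spec = np.abs(np.fft.rfft(frame)) ** 2              # power spectrum
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)     # bin frequencies in Hz
    p = spec / spec.sum()                               # normalise to a distribution
    cog = float(np.sum(freqs * p))                      # first moment
    spread = float(np.sqrt(np.sum(((freqs - cog) ** 2) * p)))  # sqrt of second central moment
    return cog, spread

# demo: a pure 1 kHz tone should have its centre of gravity at ~1000 Hz
sr = 16000
tone = np.sin(2 * np.pi * 1000 * np.arange(1600) / sr)
cog, spread = spectral_moments(tone, sr)
```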
APA, Harvard, Vancouver, ISO, and other styles
5

Darch, Jonathan J. A. "Robust acoustic speech feature prediction from Mel frequency cepstral coefficients." Thesis, University of East Anglia, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.445206.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Okuyucu, Cigdem. "Semantic Classification And Retrieval System For Environmental Sounds." Master's thesis, METU, 2012. http://etd.lib.metu.edu.tr/upload/12615114/index.pdf.

Full text
Abstract:
The growth of multimedia content in recent years has motivated research in audio classification and content retrieval. In this thesis, a general environmental audio classification and retrieval approach is proposed in which higher-level semantic classes (outdoor, nature, meeting and violence) are obtained from lower-level acoustic classes (emergency alarm, car horn, gun-shot, explosion, automobile, motorcycle, helicopter, wind, water, rain, applause, crowd and laughter). In order to classify an audio sample into acoustic classes, MPEG-7 audio features, the Mel Frequency Cepstral Coefficients (MFCC) feature and the Zero Crossing Rate (ZCR) feature are used with Hidden Markov Model (HMM) and Support Vector Machine (SVM) classifiers. Additionally, a new classification method using a Genetic Algorithm (GA) is proposed for the classification of semantic classes. Query by Example (QBE) and keyword-based query capabilities are implemented for content retrieval.
APA, Harvard, Vancouver, ISO, and other styles
7

Assaad, Firas Souhail. "Biometric Multi-modal User Authentication System based on Ensemble Classifier." University of Toledo / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=toledo1418074931.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Edman, Sebastian. "Radar target classification using Support Vector Machines and Mel Frequency Cepstral Coefficients." Thesis, KTH, Optimeringslära och systemteori, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-214794.

Full text
Abstract:
In radar applications, one often wants to know not only that a target is reflecting the transmitted signals but also what kind of target is reflecting them. This project investigates the possibility of transforming reflected signals from raw radar data and answering that question both through human perception, in particular our hearing, and through a machine learning approach in which patterns and characteristics in the data are used. More specifically, the investigation treats two fairly comparable kinds of targets, namely smaller Unmanned Aerial Vehicles (UAVs) and birds. Complex-valued radar video, so-called I/Q data, generated by these targets is extracted using signal processing techniques, transformed into real-valued signals, and then transformed into audible signals. A feature set commonly used in speech recognition, namely Mel Frequency Cepstral Coefficients, is used to describe these signals together with two Support Vector Machine classification models. The two models were tested on an independent test set. The linear model achieved an overall prediction accuracy of 93.33%, with 93.33% correct classification on the UAVs and 93.33% on the birds. The radial basis function model achieved an overall prediction accuracy of 98.33%, with 100% correct classification on the UAVs and 96.76% on the birds. The project is partly done in collaboration with J. Clemedson [2], whose focus is, as mentioned earlier, transforming the signals into audible signals.
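The classification stage described above, a linear and an RBF SVM over MFCC-derived feature vectors, can be sketched with scikit-learn as follows. The synthetic data below merely stands in for the thesis's radar-derived features; the separation, dimensionality and variable names are assumptions:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# toy stand-in for per-recording MFCC summary vectors (e.g. means over frames)
rng = np.random.default_rng(0)
uav = rng.normal(loc=0.0, scale=1.0, size=(60, 13))
bird = rng.normal(loc=1.5, scale=1.0, size=(60, 13))
X = np.vstack([uav, bird])
y = np.array([0] * 60 + [1] * 60)   # 0 = UAV, 1 = bird

# the two model families compared in the thesis: linear and RBF kernels,
# each preceded by feature standardisation
linear_svm = make_pipeline(StandardScaler(), SVC(kernel="linear")).fit(X, y)
rbf_svm = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X, y)
```

In practice the models would be evaluated on a held-out test set, as the thesis does, rather than on the training data.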
APA, Harvard, Vancouver, ISO, and other styles
9

Yang, Chenguang. "Security in Voice Authentication." Digital WPI, 2014. https://digitalcommons.wpi.edu/etd-dissertations/79.

Full text
Abstract:
We evaluate the security of human voice password databases from an information-theoretic point of view. More specifically, we provide a theoretical estimate of the amount of entropy in the human voice when processed using the conventional GMM-UBM technologies with MFCCs as the acoustic features. The theoretical estimate gives rise to a methodology for analyzing the security level of a corpus of human voice. That is, given a database of speech signals, we provide a method for estimating the relative entropy (Kullback-Leibler divergence) of the database, thereby establishing the security level of the speaker verification system. To demonstrate this, we analyze the YOHO database, a corpus of voice samples collected from 138 speakers, and show that the amount of entropy extracted is less than 14 bits. We also present a practical attack that succeeds in impersonating the voice of any speaker within the corpus with 98% success probability in as few as 9 trials; the attack still succeeds at a rate of 62.50% if only 4 attempts are permitted. Further, based on the same attack rationale, we mount an attack on the ALIZE speaker verification system. We show through experimentation that the attacker can impersonate any user in a database of 69 people with about a 25% success rate in only 5 trials, and with more than a 50% success rate when the allowed authentication attempts are increased to 20. Finally, when the practical attack is cast in terms of an entropy metric, we find that the theoretical entropy estimate almost perfectly predicts the success rate of the practical attack, giving further credence to the theoretical model and the associated entropy estimation technique.
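The Kullback-Leibler divergence between GMM speaker models has no closed form, so it is commonly estimated by Monte Carlo sampling: draw samples from one model and average the log-density difference. A hedged sketch of such an estimate (in bits) on synthetic 1-D data follows; nothing here reproduces the thesis's YOHO experiments, and the function name is illustrative:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def kl_divergence_mc(p, q, n_samples=20000):
    """Monte-Carlo estimate of KL(p || q) in bits between two fitted GMMs."""
    X, _ = p.sample(n_samples)
    # score_samples returns natural-log densities; divide by ln 2 to get bits
    return float(np.mean(p.score_samples(X) - q.score_samples(X)) / np.log(2))

# demo: two well-separated 1-D Gaussians; the true KL is
# (mu_p - mu_q)^2 / 2 = 4.5 nats, i.e. about 6.5 bits
rng = np.random.RandomState(0)
p = GaussianMixture(n_components=1, random_state=0).fit(rng.normal(0.0, 1.0, (2000, 1)))
q = GaussianMixture(n_components=1, random_state=0).fit(rng.normal(3.0, 1.0, (2000, 1)))
kl_bits = kl_divergence_mc(p, q)
```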
APA, Harvard, Vancouver, ISO, and other styles
10

Pešek, Milan. "Detekce logopedických vad v řeči." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2009. http://www.nusl.cz/ntk/nusl-218106.

Full text
Abstract:
The thesis deals with the design and implementation of software for detecting logopaedic speech defects. Because such defects need to be detected early, the software is aimed at child speakers. The introductory part describes the theory of speech production, the modelling of speech production for numerical processing, phonetics, speech therapy and the basic logopaedic speech defects. The methods used for feature extraction, for segmenting words into speech sounds and for classifying features as either correct or incorrect pronunciation are also described. The next part presents the results of testing the selected methods. MFCC and PLP features are extracted for the recognition of logopaedic speech defects. Words are segmented into speech sounds using the differential function method. The extracted features of a sound are classified as correct or incorrect pronunciation with one of the tested pattern recognition methods: k-NN, SVM, ANN and GMM.
APA, Harvard, Vancouver, ISO, and other styles
11

Wu, Qiming. "A robust audio-based symbol recognition system using machine learning techniques." University of the Western Cape, 2020. http://hdl.handle.net/11394/7614.

Full text
Abstract:
Masters of Science
This research investigates the creation of an audio-shape recognition system that is able to interpret a user’s drawn audio shapes—fundamental shapes, digits and/or letters— on a given surface such as a table-top using a generic stylus such as the back of a pen. The system aims to make use of one, two or three Piezo microphones, as required, to capture the sound of the audio gestures, and a combination of the Mel-Frequency Cepstral Coefficients (MFCC) feature descriptor and Support Vector Machines (SVMs) to recognise audio shapes. The novelty of the system is in the use of piezo microphones which are low cost, light-weight and portable, and the main investigation is around determining whether these microphones are able to provide sufficiently rich information to recognise the audio shapes mentioned in such a framework.
APA, Harvard, Vancouver, ISO, and other styles
12

Sklar, Alexander Gabriel. "Channel Modeling Applied to Robust Automatic Speech Recognition." Scholarly Repository, 2007. http://scholarlyrepository.miami.edu/oa_theses/87.

Full text
Abstract:
In automatic speech recognition systems (ASRs), training is a critical phase for the system's success. Communication media, whether analog (such as analog landline phones) or digital (VoIP), distort the speaker's speech signal, often in very complex ways: linear distortion occurs in all channels, in either the magnitude or the phase spectrum. Non-linear but time-invariant distortion will always appear in all real systems. In digital systems we also have network effects, which produce packet losses, delays and repeated packets. Finally, one cannot really assert what path a signal will take, so error or distortion along the way is almost a certainty. The channel introduces an acoustic mismatch between the speaker's signal and the training data in the ASR, which results in poor recognition performance. The approach so far has been to try to undo the havoc produced by the channel, i.e. to compensate for the channel's behavior. In this thesis, we try to characterize the effects of different transmission media and use that as an inexpensive and repeatable way to train ASR systems.
APA, Harvard, Vancouver, ISO, and other styles
13

Candel, Ramón Antonio José. "Verificación automática de locutores aplicando pruebas diagnósticas múltiples en serie y en paralelo basadas en DTW (Dynamic Time Warping) y MFCC (Mel-Frequency Cepstral Coefficients)." Doctoral thesis, Universidad de Murcia, 2015. http://hdl.handle.net/10803/300433.

Full text
Abstract:
This doctoral thesis presents the design of a system capable of performing automatic speaker verification, based on modelling with the DTW (Dynamic Time Warping) and MFCC (Mel-Frequency Cepstral Coefficients) procedures. Once designed, the system was evaluated both with individual tests, DTW and MFCC separately, and with multiple tests combining the two in series and in parallel, on recordings obtained from the AHUMADA database of the Guardia Civil. All results were assessed for statistical significance, derived from performing a finite number of tests. Statistical results were obtained for different sizes of the databases used, which allows us to draw conclusions about their influence on the method. In conclusion, we can identify the best system, defined by the type of model and the sample size, to use in a forensic study depending on the intended purpose.
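The DTW matching underlying such a verification test scores how well a test utterance's feature sequence aligns with a stored template; a verification decision then compares the cost against a threshold. A minimal illustrative implementation (not the thesis's code):

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping cost between feature sequences a (n, d) and b (m, d)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)   # accumulated-cost matrix
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])   # local frame distance
            D[i, j] = cost + min(D[i - 1, j],      # insertion
                                 D[i, j - 1],      # deletion
                                 D[i - 1, j - 1])  # match
    return float(D[n, m])
```

For speaker verification, `a` and `b` would be MFCC sequences of the claimed speaker's template and the test utterance; a low warped cost supports the claimed identity.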
APA, Harvard, Vancouver, ISO, and other styles
14

Lindstål, Tim, and Daniel Marklund. "Application of LabVIEW and myRIO to voice controlled home automation." Thesis, Uppsala universitet, Signaler och System, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-380866.

Full text
Abstract:
The aim of this project is to use the NI myRIO and LabVIEW for voice-controlled home automation. The NI myRIO is an embedded device with a Xilinx FPGA and a dual-core ARM Cortex-A9 processor as well as analog and digital input/output, and it is programmed with LabVIEW, a graphical programming language. The voice control is implemented in two different systems. The first system is based on an Amazon Echo Dot for voice recognition, a commercial smart speaker developed by Amazon Lab126. The Echo Dot devices are connected via the Internet to the voice-controlled intelligent personal assistant service known as Alexa (developed by Amazon), which is capable of voice interaction, music playback, and controlling smart devices for home automation. In this first system, the project focuses on the myRIO used for the wireless control of smart home devices; smart lamps, sensors, speakers and an LCD display were implemented. The second system focuses on the myRIO for speech recognition and was built on the myRIO with a microphone connected. The speech recognition was implemented using mel frequency cepstral coefficients and dynamic time warping. A few commands could be recognized, including the wake word "Bosse" as well as four other commands for controlling the colors of a smart lamp. The thesis project was successful, demonstrating that the implementation of home automation using the NI myRIO with two voice-controlled systems can correctly control home devices such as smart lamps, sensors, speakers and an LCD display.
APA, Harvard, Vancouver, ISO, and other styles
15

Neville, Katrina Lee. "Channel Compensation for Speaker Recognition Systems." RMIT University. Electrical and Computer Engineering, 2007. http://adt.lib.rmit.edu.au/adt/public/adt-VIT20080514.093453.

Full text
Abstract:
This thesis addresses the problem of how best to remedy different types of channel distortion of speech when that speech is to be used in automatic speaker recognition and verification systems. Automatic speaker recognition is when a person's voice is analysed by a machine and the person's identity is worked out by comparing speech features to a known set of speech features. Automatic speaker verification is when a person claims an identity and the machine determines whether that claimed identity is correct or the person is an impostor. Channel distortion occurs whenever information is sent electronically through any type of channel, whether a basic wired telephone channel or a wireless channel. The distortions that can corrupt the information include time-variant or time-invariant filtering and the addition of 'thermal noise'; both can cause varying degrees of error in the information being received and analysed. The experiments presented in this thesis investigate the effects of channel distortion on the average speaker recognition rates and test the effectiveness of various channel compensation algorithms designed to mitigate those effects. The speaker recognition system was represented by a basic recognition algorithm consisting of speech analysis, extraction of feature vectors in the form of mel-cepstral coefficients, and a classification part based on the minimum distance rule.
Two types of channel distortion were investigated:
• convolutional (lowpass filtering) effects
• addition of white Gaussian noise

Three methods of channel compensation were tested:
• Cepstral Mean Subtraction (CMS)
• RelAtive SpecTrAl (RASTA) processing
• Constant Modulus Algorithm (CMA)

The results showed that, for both CMS and RASTA processing, filtering at low cutoff frequencies (3 or 4 kHz) improved the average speaker recognition rates compared to speech with no compensation. The improvement from RASTA processing was larger than that from CMS. Neither the CMS nor the RASTA method was able to improve the accuracy of the speaker recognition system for cutoff frequencies of 5, 6 or 7 kHz. In the case of noisy speech, all of the analysed methods were able to compensate at high SNRs of 40 dB and 30 dB, and only RASTA processing was able to compensate and improve the average recognition rate for speech corrupted with a high level of noise (SNRs of 20 dB and 10 dB).
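Cepstral Mean Subtraction, the first compensation method listed, exploits the fact that a time-invariant convolutional channel is additive in the cepstral domain, so removing each coefficient's utterance-level mean cancels the channel term. A minimal sketch (NumPy, with synthetic data standing in for real cepstra):

```python
import numpy as np

def cepstral_mean_subtraction(C):
    """Subtract each cepstral coefficient's utterance-level mean.

    C has shape (n_frames, n_coeffs). A stationary convolutional channel adds
    a constant vector to every cepstral frame, so removing the time mean
    cancels the channel's contribution.
    """
    return C - C.mean(axis=0, keepdims=True)

# demo: the same cepstra passed through a constant "channel" offset
rng = np.random.default_rng(0)
clean = rng.normal(size=(50, 12))
channel_offset = rng.normal(size=(1, 12))   # stands in for the channel's cepstrum
distorted = clean + channel_offset
```

After CMS, the clean and channel-distorted versions become identical, which is exactly the invariance the method relies on.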
APA, Harvard, Vancouver, ISO, and other styles
16

Hrabina, Martin. "VÝVOJ ALGORITMŮ PRO ROZPOZNÁVÁNÍ VÝSTŘELŮ." Doctoral thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2019. http://www.nusl.cz/ntk/nusl-409087.

Full text
Abstract:
This thesis deals with gunshot recognition and related problems. First, the task is introduced and broken down into smaller steps. An overview of sound databases, significant publications, events and the current state of the art is then provided, together with a survey of possible applications of gunshot detection. The second part compares features using various metrics, together with a comparison of their recognition performance. A comparison of recognition algorithms follows, and new features usable for recognition are introduced. The thesis culminates in the design of a two-stage gunshot recognition system that monitors its surroundings in real time. The conclusion summarizes the achieved results and outlines further work.
APA, Harvard, Vancouver, ISO, and other styles
17

Zezula, Miroslav. "Online detekce jednoduchých příkazů v audiosignálu." Master's thesis, Vysoké učení technické v Brně. Fakulta strojního inženýrství, 2011. http://www.nusl.cz/ntk/nusl-229484.

Full text
Abstract:
This thesis describes the development of a voice module that can recognize simple speech commands by comparing the input sound with recorded templates. The first part of the thesis contains a description of the algorithm used and a verification of its functionality. The algorithm is based on mel-frequency cepstral coefficients and dynamic time warping. The hardware of the voice module is then designed around the Freescale 56F805 signal controller. The signal from the microphone is conditioned by operational amplifiers and a digital filter. The third part deals with the development of software for the controller and describes the fixed-point implementation of the algorithm, respecting the limited capabilities of the controller. A final test proves the usability of the voice module in a low-noise environment.
APA, Harvard, Vancouver, ISO, and other styles
18

Alsouda, Yasser. "An IoT Solution for Urban Noise Identification in Smart Cities : Noise Measurement and Classification." Thesis, Linnéuniversitetet, Institutionen för fysik och elektroteknik (IFE), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-80858.

Full text
Abstract:
Noise is defined as any undesired sound. Urban noise and its effect on citizens are a significant environmental problem, and the increasing level of noise has become a critical problem in some cities. Fortunately, noise pollution can be mitigated by better planning of urban areas or controlled by administrative regulations. However, the execution of such actions requires well-established systems for noise monitoring. In this thesis, we present a solution for noise measurement and classification using a low-power and inexpensive IoT unit. To measure the noise level, we implement an algorithm for calculating the sound pressure level in dB, achieving a measurement error of less than 1 dB. Our machine-learning-based method for noise classification uses Mel-frequency cepstral coefficients for audio feature extraction and four supervised classification algorithms (support vector machine, k-nearest neighbors, bootstrap aggregating, and random forest). We evaluate our approach experimentally on a dataset of about 3000 sound samples grouped into eight sound classes (such as car horn, jackhammer, or street music). We explore the parameter space of the four algorithms to estimate the optimal parameter values for classifying the sound samples in the dataset under study, and achieve noise classification accuracy in the range of 88-94%.
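As an illustration of the classification stage, a minimal hand-rolled k-nearest-neighbors classifier is sketched below. The feature vectors are synthetic stand-ins for the per-clip MFCC summaries used in the thesis, and the function name is our own.

```python
import numpy as np

def knn_predict(train_X, train_y, x, k=3):
    """Classify feature vector x by majority vote among its k nearest
    training vectors under Euclidean distance."""
    d = np.linalg.norm(train_X - x, axis=1)      # distances to all training points
    nearest = np.argsort(d)[:k]                  # indices of the k closest
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]             # majority label
```

In practice each training vector would be, for example, the per-coefficient mean of a clip's MFCC frames, with k tuned on held-out data as the thesis does when exploring the parameter space.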
APA, Harvard, Vancouver, ISO, and other styles
19

Hrušovský, Enrik. "Automatická klasifikace výslovnosti hlásky R." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2018. http://www.nusl.cz/ntk/nusl-377664.

Full text
Abstract:
This diploma thesis deals with the automatic classification of the pronunciation of the sound R. Its purpose is to create a program for detecting speech defects in children's pronunciation of the sound R. The thesis covers speech production, speech therapy, and dyslalia, followed by methods of speech signal processing and analysis. In the last part, software for automatic detection of the pronunciation of the sound R is designed. MFCC features are extracted for pronunciation recognition; these features are subsequently classified by a neural network into correct or incorrect pronunciation, and the classification accuracy is evaluated.
APA, Harvard, Vancouver, ISO, and other styles
20

Kufa, Tomáš. "Rozpoznáváni standardních PILOT-CONTROLLER řídicích povelů v hlasové podobě." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2009. http://www.nusl.cz/ntk/nusl-217849.

Full text
Abstract:
The subject of this graduation thesis is the application of speech recognition to ATC commands. The selection of methods and approaches for automatic recognition of ATC commands arises from detailed studies of air traffic. Because there is no definitive solution in a field as extensive as speech recognition, this thesis focuses on a speech recognizer based on comparison with templates (DTW). This recognizer is implemented and compared with the freely available HTK system from Cambridge University, which is based on statistical methods using hidden Markov models. The suitability of both methods is verified by practical testing and evaluation of the results.
APA, Harvard, Vancouver, ISO, and other styles
21

Dušil, Lubomír. "Automatické rozpoznávání logopedických vad v řečovém projevu." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2009. http://www.nusl.cz/ntk/nusl-218161.

Full text
Abstract:
The thesis is aimed at the analysis and automatic detection of logopaedic defects in speech utterances. Its objective is to facilitate and accelerate the work of logopaedists and to increase the percentage of logopaedic defects detected in children at the youngest possible age, followed by the most successful treatment. It presents methods of speech processing, a classification of the defects within individual stages of child development, and words appropriate for identifying the speech defects and their subsequent remedy. It then analyzes methods of calculating coefficients that best reflect human speech, as well as the classifiers that use those coefficients to determine whether a speech defect is present. Combinations of coefficients and classifiers are tested to find the one that achieves the highest success rate of automatic detection of speech defects. All programming and testing was conducted in Matlab.
APA, Harvard, Vancouver, ISO, and other styles
22

Лавриненко, Олександр Юрійович, Александр Юрьевич Лавриненко, and Oleksandr Lavrynenko. "Методи підвищення ефективності семантичного кодування мовних сигналів." Thesis, Національний авіаційний університет, 2021. https://er.nau.edu.ua/handle/NAU/52212.

Full text
Abstract:
The thesis is devoted to the solution of an actual scientific and practical problem in telecommunication systems, namely increasing the throughput of a channel carrying semantic speech data by encoding those data efficiently. The question of semantic coding efficiency is thus formulated: at what minimum rate can the semantic features of speech signals be encoded with a given probability of error-free recognition? This question is answered in the present research, which is an urgent scientific and technical task given the growing trend of remote interaction between people and robotic equipment through speech, where the reliability of such systems depends directly on the effectiveness of semantic coding of speech signals. The thesis investigates the well-known method of semantic coding of speech signals based on mel-frequency cepstral coefficients, which consists in finding the average values of the coefficients of the discrete cosine transform of the logarithmic energy of the discrete Fourier transform spectrum processed by a triangular filter bank on the mel scale. The problem is that this method does not satisfy the condition of adaptivity, so the main scientific hypothesis of the study was formulated: the efficiency of semantic coding of speech signals can be increased by using an adaptive empirical wavelet transform followed by Hilbert spectral analysis. Coding efficiency here means a decrease in the information transmission rate at a given probability of error-free recognition of semantic features of speech signals, which significantly reduces the required passband and thereby increases the capacity of the communication channel.
In the course of proving the formulated scientific hypothesis, the following results were obtained: 1) for the first time, a method of semantic coding of speech signals based on the empirical wavelet transform was developed; it differs from existing methods by constructing a set of adaptive Meyer bandpass wavelet filters followed by Hilbert spectral analysis to find the instantaneous amplitudes and frequencies of the intrinsic empirical mode functions, which makes it possible to determine the semantic features of speech signals and increase the efficiency of their coding; 2) for the first time, it was proposed to use the adaptive empirical wavelet transform in problems of multiscale analysis and semantic coding of speech signals, which increases the efficiency of spectral analysis by decomposing a high-frequency speech oscillation into its low-frequency components, namely the intrinsic empirical modes; 3) the method of semantic coding of speech signals based on mel-frequency cepstral coefficients was further developed using the basic principles of adaptive spectral analysis via the empirical wavelet transform, which increases the efficiency of that method. Experimental research in MATLAB R2020b showed that the developed method reduces the coding rate from 320 to 192 bit/s and the required passband from 40 to 24 Hz, with a probability of error-free recognition of about 0.96 (96%) at a signal-to-noise ratio of 48 dB; its efficiency thus increases by a factor of 1.6 compared with the existing method.
The results obtained in the thesis can be used to build systems for remote interaction between people and robotic equipment using speech technologies, such as speech recognition and synthesis, voice control of technical objects, low-rate coding of speech information, and voice translation from foreign languages.
APA, Harvard, Vancouver, ISO, and other styles
23

Sujatha, J. "Improved MFCC Front End Using Spectral Maxima For Noisy Speech Recognition." Thesis, 2005. https://etd.iisc.ac.in/handle/2005/1506.

Full text
APA, Harvard, Vancouver, ISO, and other styles
24

Sujatha, J. "Improved MFCC Front End Using Spectral Maxima For Noisy Speech Recognition." Thesis, 2005. http://etd.iisc.ernet.in/handle/2005/1506.

Full text
APA, Harvard, Vancouver, ISO, and other styles
25

(6642491), Jingzhao Dai. "SPARSE DISCRETE WAVELET DECOMPOSITION AND FILTER BANK TECHNIQUES FOR SPEECH RECOGNITION." Thesis, 2019.

Find full text
Abstract:

Speech recognition is widely applied to speech-to-text translation, voice-driven commands, human-machine interfaces, and so on [1]-[8], and has become increasingly pervasive in modern life. To improve the accuracy of speech recognition, various algorithms such as artificial neural networks and hidden Markov models have been developed [1], [2].

In this thesis work, speech recognition with various classifiers is investigated. The classifiers employed include the support vector machine (SVM), k-nearest neighbors (KNN), random forest (RF), and convolutional neural network (CNN). Two novel feature extraction methods, sparse discrete wavelet decomposition (SDWD) and bandpass filtering (BPF) based on the Mel filter banks [9], are developed and proposed. To meet the diversity of the classification algorithms, both one-dimensional (1D) and two-dimensional (2D) features are obtained. The 1D features are arrays of power coefficients in frequency bands, dedicated to training the SVM, KNN, and RF classifiers, while the 2D features capture both the frequency domain and temporal variations: each 2D feature consists of the power values in the decomposed bands versus consecutive speech frames. Most importantly, the 2D features with geometric transformation are adopted to train the CNN.

Recordings of male and female speech are taken from a recorded dataset as well as a standard dataset. First, recordings with little noise and clear pronunciation are processed with the proposed feature extraction methods; after many trials and experiments on this dataset, high recognition accuracy is achieved. These feature extraction methods are then applied to the standard recordings, which have random characteristics with ambient noise and unclear pronunciation. Many experimental results validate the effectiveness of the proposed feature extraction techniques.
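A simplified stand-in for the wavelet band-power features described above can be sketched with a plain one-level Haar DWT applied recursively; the thesis's sparse decomposition and Mel-based bandpass filtering are more elaborate, and the helper names here are our own.

```python
import numpy as np

def haar_dwt(x):
    """One level of the Haar discrete wavelet transform:
    returns (approximation, detail) coefficient arrays."""
    x = np.asarray(x, dtype=float)
    a = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # low-pass (approximation)
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)   # high-pass (detail)
    return a, d

def band_powers(x, levels=3):
    """1D feature vector of per-band powers from a multilevel decomposition:
    one detail-band power per level plus the final approximation power."""
    feats = []
    for _ in range(levels):
        x, d = haar_dwt(x)
        feats.append(np.mean(d ** 2))
    feats.append(np.mean(x ** 2))
    return np.array(feats)
```

Stacking such band-power vectors over consecutive frames yields a 2D band-versus-frame array of the kind the thesis feeds to the CNN.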

APA, Harvard, Vancouver, ISO, and other styles
26

Tang, Chu-Liang, and 唐曲亮. "Improved Mel Frequency Cepstral Coefficients Combined with Multiple Speech Features." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/57856949340151071584.

Full text
Abstract:
Master's thesis
National Central University
Department of Electrical Engineering
103
This thesis studies speech feature extraction and feature compensation in speech recognition. Several speech features are selected and combined; the best combination cascades Linear Prediction Cepstral Coefficients (LPCC) with Mel-Frequency Cepstral Coefficients (MFCC). The MFCCs used here are obtained with a Gaussian mel-frequency filter bank instead of the usual triangular filter bank, and experiments show that the best combination ratio of LPCC to MFCC is 1:1. The thesis also shows that further performance improvement is possible when Cepstral Mean and Variance Normalization (CMVN) is added.
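The CMVN step mentioned at the end of the abstract can be sketched as below; the per-frame cascading of LPCC and MFCC features is shown only as a comment, and all names are illustrative rather than taken from the thesis.

```python
import numpy as np

def cmvn(features, eps=1e-8):
    """Cepstral mean and variance normalization: give each cepstral
    coefficient zero mean and unit variance over the frames of one
    utterance (rows = frames, columns = coefficients)."""
    mu = features.mean(axis=0)
    sigma = features.std(axis=0)
    return (features - mu) / (sigma + eps)

# Cascading features, as in the thesis, would concatenate per frame:
# combined = np.hstack([lpcc_frames, mfcc_frames])  # hypothetical arrays
```

CMVN removes stationary channel effects, which is why it compensates the cascaded features rather than replacing them.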
APA, Harvard, Vancouver, ISO, and other styles
27

Kuo, Yo-zhen, and 郭又禎. "Improved Mel-scale Frequency Cepstral Coefficients for Keyword Spotting Technique." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/27592493670347223949.

Full text
Abstract:
Master's thesis
National Central University
Department of Electrical Engineering
102
In speech recognition systems, Mel-frequency cepstral coefficients (MFCCs) are the most widely used feature parameters, and because of their wide application in audio signal processing, many studies on improving them have been presented. In this study, we use a particle swarm optimization algorithm to optimize the weights of the MFCC filter bank, taking the difference between the energy statistics curve of the voice training database and the envelope of the MFCC filter bank as the fitness function. Experimental results show that the proposed MFCC method improves the recognition rate, and it also improves recognition performance in noisy environments.
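A minimal particle swarm optimizer of the kind used to tune filter-bank weights might look like the sketch below. The inertia and acceleration constants are common textbook values, not taken from the thesis, and the fitness function (here left abstract) would in the thesis measure the mismatch between the database's energy statistics curve and the filter-bank envelope.

```python
import numpy as np

rng = np.random.default_rng(0)

def pso(fitness, dim, n_particles=20, iters=60, lo=0.0, hi=1.0):
    """Minimal particle swarm optimization (minimization) over a box
    [lo, hi]^dim; returns the best position (e.g. per-band weights)."""
    x = rng.uniform(lo, hi, (n_particles, dim))   # particle positions
    v = np.zeros_like(x)                          # particle velocities
    pbest = x.copy()                              # personal bests
    pbest_f = np.array([fitness(p) for p in x])
    g = pbest[np.argmin(pbest_f)].copy()          # global best
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # inertia + cognitive pull (pbest) + social pull (gbest)
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        f = np.array([fitness(p) for p in x])
        better = f < pbest_f
        pbest[better], pbest_f[better] = x[better], f[better]
        g = pbest[np.argmin(pbest_f)].copy()
    return g
```

With a fitness that penalizes the envelope-versus-statistics gap, the returned vector plays the role of the optimized filter-bank weights.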
APA, Harvard, Vancouver, ISO, and other styles
28

Lin, Shih-Fen, and 林士棻. "Bird songs recognition using two-dimensional Mel-scale frequency cepstral coefficients." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/94553686394732089037.

Full text
APA, Harvard, Vancouver, ISO, and other styles
29

林士棻. "Bird songs recognition using two-dimensional Mel-scale frequency cepstral coefficients." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/38302762655714685237.

Full text
Abstract:
Master's thesis
Chung Hua University
Department of Computer Science and Information Engineering
94
We propose a method to automatically identify birds from their sounds. First, each syllable corresponding to a piece of vocalization is segmented. The average LPCC (ALPCC), average MFCC (AMFCC), static MFCC (SMFCC), two-dimensional MFCC (TDMFCC), dynamic two-dimensional MFCC (DTDMFCC), and TDMFCC+DTDMFCC over all frames in a syllable are calculated as the vocalization features. Linear discriminant analysis (LDA) is exploited to increase classification accuracy in a lower-dimensional feature space. A clustering algorithm, called the progressive constructive clustering (PCC) algorithm, is used to divide the feature vectors computed from the same bird species into several subclasses. In our experiments, TDMFCC+DTDMFCC achieves average classification accuracies of 90% for 420 bird species and 89% for 561 bird species.
APA, Harvard, Vancouver, ISO, and other styles
30

HUANG, CHUAN-HAO, and 黃川豪. "Multi-feature Speaker Verification Based on Mel-frequency cepstral coefficients and Formants." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/4nbqev.

Full text
APA, Harvard, Vancouver, ISO, and other styles
31

Xu, Sheng-Bin, and 徐勝斌. "Continuous Birdsong Recognition Using Dynamic and Temporal Two-Dimensional Mel-Frequency Cepstral Coefficients." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/21749503795140776068.

Full text
Abstract:
Master's thesis
Chung Hua University
Department of Computer Science and Information Engineering
97
In this paper, we propose an approach for classifying bird species using fixed-duration sound segments extracted from continuous birdsong recordings. First, each sound segment is divided into a number of overlapping texture windows; each texture window is classified individually, and a fusion approach then determines the classification result for the input segment. Features derived from the static, transitional, and temporal information of two-dimensional Mel-frequency cepstral coefficients (TDMFCC) are extracted for the classification of each texture window. TDMFCC describes both static and dynamic characteristics of a texture window; dynamic TDMFCC (DTDMFCC) describes sharp transitions within a texture window; and global dynamic TDMFCC (GDTDMFCC) describes long-time temporal variations in a texture window. The concepts of DTDMFCC, which computes local regression coefficients, and GDTDMFCC, which evaluates global contrast information, can be integrated to form a new feature vector, called global and local DTDMFCC (GLDTDMFCC). Furthermore, we use principal component analysis (PCA) to reduce the feature dimension, Gaussian mixture models (GMMs) to model the sounds of different bird species, and linear discriminant analysis (LDA) to improve classification accuracy in a lower-dimensional feature space. In our experiments, the highest average classification accuracy is 94.62% for the classification of 28 bird species.
APA, Harvard, Vancouver, ISO, and other styles
32

CHIANG, MING-DA, and 蔣明達. "Speaker Recognition Using Mel-Scale Frequency Cepstral Coefficients by Time Domain Filtering method." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/13444981721982290438.

Full text
Abstract:
Master's thesis
China Institute of Technology
Graduate Institute of Electronic Engineering
96
According to past papers, we find that algorithms based on Mel-frequency cepstral coefficients (MFCCs) perform better than algorithms based on other feature parameters [1-7]. The MFCCs are obtained by the following procedure: framing, multiplication by a Hamming window, the fast Fourier transform (FFT), filtering in the frequency domain by a Mel-frequency triangular filter bank, calculation of the logarithmic energy of the filter outputs, and a discrete cosine transform (DCT) yielding the Mel-frequency cepstral coefficients. In this thesis, the conventional frequency-domain filtering procedure [1] for computing the MFCCs is replaced by a new, direct time-domain filtering procedure. The simulation results show that the performance of our new method and that of the previous approach [1] are quite similar.
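The frequency-domain MFCC pipeline enumerated above (framing, Hamming window, FFT, Mel triangular filter bank, log energies, DCT) can be sketched as follows. The parameter values (8 kHz rate, 256-sample frames, 20 filters, 12 coefficients) are illustrative defaults, not those of the thesis.

```python
import numpy as np

def mel(f):      # Hz -> mel
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_inv(m):  # mel -> Hz
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters with centers evenly spaced on the mel scale."""
    edges = mel_inv(np.linspace(mel(0.0), mel(sr / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * edges / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        for k in range(l, c):                    # rising slope
            fb[i, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):                    # falling slope
            fb[i, k] = (r - k) / max(r - c, 1)
    return fb

def mfcc(signal, sr=8000, frame=256, hop=128, n_filters=20, n_ceps=12):
    """Frame -> Hamming window -> |FFT|^2 -> mel filter bank -> log -> DCT-II."""
    frames = [signal[i:i + frame] for i in range(0, len(signal) - frame + 1, hop)]
    win = np.hamming(frame)
    fb = mel_filterbank(n_filters, frame, sr)
    ceps = []
    for fr in frames:
        power = np.abs(np.fft.rfft(fr * win)) ** 2
        energies = np.log(fb @ power + 1e-10)
        n = np.arange(n_filters)
        dct = np.array([np.sum(energies * np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters)))
                        for k in range(n_ceps)])  # DCT-II of log energies
        ceps.append(dct)
    return np.array(ceps)   # shape: (num_frames, n_ceps)
```

The thesis's contribution is to fold the filter-bank step into the time domain; this sketch shows the conventional frequency-domain baseline it is compared against.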
APA, Harvard, Vancouver, ISO, and other styles
33

Lin, Bo-Zhi, and 林柏志. "Speaker Recognition Algorithm Using Mel-Scale Frequency Cepstral Coefficients with Two Stages Linear Prediction Filters." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/18209732501243789128.

Full text
Abstract:
Master's thesis
China Institute of Technology
Graduate Institute of Electronic Engineering
94
The development of computer and communication technologies has accelerated demand for applications of speaker recognition and speech recognition. The purpose of this thesis is to present a new algorithm that improves speaker recognition performance. The algorithm uses two stages of linear prediction error filters to estimate the spectrum of the processed speech signal, and then uses a Mel-scale triangular bandpass filter bank to obtain the Mel-scale frequency cepstral coefficients (MFCCs) needed to build the Gaussian mixture model for speaker recognition. To verify that the algorithm works well and to compare its performance with other algorithms, we use the Mandarin speech database MAT-400, purchased from the Association for Computational Linguistics and Chinese Language Processing. The experimental results show that the proposed algorithm has the best performance at higher signal-to-noise ratios.
APA, Harvard, Vancouver, ISO, and other styles
34

Yang-Ming, Cheng, and 鄭陽銘. "A Mel-Scale Frequency Cepstral Coefficients Speaker Recognition Algorithm Based on Linear Prediction Spectrum Estimation." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/38345345070598427641.

Full text
Abstract:
Master's thesis
China Institute of Technology
Graduate Institute of Electronic Engineering
93
According to past research, spectrum estimation based on linear prediction is more robust than spectrum estimation based on the FFT at lower SNR. In this thesis, we propose a new speaker identification algorithm based on linear prediction spectrum estimation. In this algorithm, spectrum estimation based on the short-time fast Fourier transform is replaced by linear prediction spectrum estimation, and the Mel-scale frequency cepstral coefficients are then obtained using the Mel-scale triangular filter bank. Experimental results show that the new algorithm performs better than the FFT-based algorithm at lower SNR.
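The linear-prediction spectrum estimate that replaces the short-time FFT periodogram can be sketched via the textbook autocorrelation method; this is a generic formulation (Yule-Walker equations solved directly), not the thesis's exact implementation.

```python
import numpy as np

def lpc(x, order):
    """Linear prediction coefficients by the autocorrelation method:
    solve the Yule-Walker equations R a = r."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:order + 1])    # predictor coefficients a_1..a_p

def lp_spectrum(x, order=10, n_fft=256):
    """All-pole power spectrum estimate 1/|A(f)|^2 from the
    prediction error filter A(z) = 1 - sum_k a_k z^{-k}."""
    a = lpc(x, order)
    A = np.fft.rfft(np.concatenate(([1.0], -a)), n_fft)
    return 1.0 / (np.abs(A) ** 2 + 1e-12)
```

Feeding this smooth all-pole spectrum, instead of the FFT periodogram, into the Mel-scale triangular filter bank yields the noise-robust MFCC variant the abstract describes.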
APA, Harvard, Vancouver, ISO, and other styles
35

Chu, Feng-Seng, and 朱峰森. "Improved Approaches of Processing Perceptual Linear Prediction(PLP)and Mel Frequency Cepstrum Coefficient(MFCC)Parameters for Robust Speech Recognition." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/26578739886453071884.

Full text
APA, Harvard, Vancouver, ISO, and other styles
36

Bowman, Casady. "Perceiving Emotion in Sounds: Does Timbre Play a Role?" Thesis, 2011. http://hdl.handle.net/1969.1/ETD-TAMU-2011-12-10656.

Full text
Abstract:
Acoustic features of sound such as pitch, loudness, perceived duration, and timbre have been shown to be related to emotion, yet an account of the connection between perceived emotions and timbre is lacking. This study investigates the relationship between acoustic features of sound and emotion with regard to timbre. In two experiments we investigated whether particular acoustic components of sound can predict timbre and particular categories of emotion, and how these attributes are related. Two behavioral experiments related perceived emotion ratings to synthetically created sounds and to the International Affective Digitized Sounds (IADS; Bradley & Lang, 2007), and two timbre experiments identified acoustic components of the synthetic sounds and the IADS. Regression analyses uncovered relationships between emotion, timbre, and acoustic features of sound. Results indicate that emotion is perceived differently for synthetic instrumental sounds and for the IADS. Mel-frequency cepstral coefficients were a strong predictor of the perceived emotion of instrumental sounds, but not of the IADS. This difference suggests a strong relationship between emotion and timbre for instrumental sounds, perhaps in part because of their relationship to speech and the way these different sounds are processed.
APA, Harvard, Vancouver, ISO, and other styles
37

Wu, Sunrise, and 吳尚叡. "Design Time Domain Filter Banks Using Least Squares Method to Calculate the Mel-Frequency Cepstral Coefficients for Speaker Recognition." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/08178129842426697899.

Full text
Abstract:
Master's thesis
China Institute of Technology
Graduate Institute of Electronic Engineering
96
Up to now, the best speaker recognition techniques have been based on Mel-frequency cepstral coefficients (MFCCs) [1-4,11]. The main procedure for computing MFCCs is: framing, Hamming windowing, the fast Fourier transform (FFT) [7], filtering by a Mel-scale triangular filter bank, taking the logarithmic energies of the outputs, and applying the discrete cosine transform (DCT) [1-8]. The main topic of this thesis is replacing the FFT [7] and the frequency-domain Mel-scale triangular filter bank [15] with filtering by a time-domain Mel-scale triangular filter bank [1-8,14], obtained by the least-squares method [10,13], which is then used to compute the Mel-frequency cepstral coefficients of speaker utterances. Our experiments show that the speaker recognition rates of the conventional MFCC method [2,3,6,14] and of our new approach are very similar.
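A least-squares design of a time-domain (FIR) filter approximating one triangular Mel-band magnitude response can be sketched as follows. The linear-phase target, grid density, and tap count are our own illustrative choices, not the thesis's design parameters.

```python
import numpy as np

def ls_fir(mag, n_taps):
    """Least-squares FIR design: choose taps h minimizing ||F h - target||^2
    over a frequency grid, where F[k, n] = exp(-j 2*pi*f_k*n) evaluates the
    filter response and the target is the desired magnitude with a
    linear-phase delay of (n_taps - 1)/2 samples."""
    f = np.linspace(0.0, 0.5, len(mag))                       # normalized freqs
    target = mag * np.exp(-2j * np.pi * f * (n_taps - 1) / 2.0)
    F = np.exp(-2j * np.pi * np.outer(f, np.arange(n_taps)))
    A = np.vstack([F.real, F.imag])        # stack so the unknowns stay real
    b = np.concatenate([target.real, target.imag])
    h, *_ = np.linalg.lstsq(A, b, rcond=None)
    return h
```

One such FIR filter per Mel band lets the band energies be computed by time-domain convolution, which is the substitution the thesis evaluates against frequency-domain filtering.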
APA, Harvard, Vancouver, ISO, and other styles
38

Yuan, Hor, and 原禾. "Design Time Domain Filter Banks Using Least Squares Method to Calculate the Mel-Frequency Cepstral Coefficients for Non-Continuous Speech Recognition." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/76162451347630250736.

Full text
Abstract:
Master's thesis
China Institute of Technology
Graduate Institute of Electronic Engineering
97
In speech recognition, Mel-frequency cepstral coefficients (MFCCs) are currently popular for both speech recognition and speaker recognition [2,8-11,14,15]. To obtain the MFCCs, the main procedure is to filter the speech signal by a set of triangular Mel-scale filters in the frequency domain, take the logarithm of the filter-bank output powers, and then apply the discrete cosine transform. In this thesis, the frequency-domain triangular Mel-scale filter bank is replaced by a newly designed time-domain triangular Mel-scale filter bank. The experimental results show that the performance of speech recognition using MFCCs extracted with the conventional triangular Mel-scale filter bank and with the new time-domain filter bank is very similar.
APA, Harvard, Vancouver, ISO, and other styles
We offer discounts on all premium plans for authors whose works are included in thematic literature selections. Contact us to get a unique promo code!

To the bibliography