Academic literature on the topic 'Speech Recognition (SR). MFCC (Mel Frequency Cepstral Coefficient)'

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Speech Recognition (SR). MFCC (Mel Frequency Cepstral Coefficient).'

Journal articles on the topic "Speech Recognition (SR). MFCC (Mel Frequency Cepstral Coefficient)"

1

Al-Karawi, Khamis A. "Robustness Speaker Recognition Based on Feature Space in Clean and Noisy Condition." International Journal of Sensors, Wireless Communications and Control 9, no. 4 (2019): 497–506. http://dx.doi.org/10.2174/2210327909666181219143918.

Abstract:
Background & Objective: Speaker Recognition (SR) techniques have been developed into a relatively mature status over the past few decades through development work. Existing methods typically use robust features extracted from clean speech signals, and therefore in idealized conditions can achieve very high recognition accuracy. For critical applications, such as security and forensics, robustness and reliability of the system are crucial. Methods: The background noise and reverberation as often occur in many real-world applications are known to compromise recognition performance. To improve the performance of speaker verification systems, an effective and robust technique is proposed to extract features for speech processing, capable of operating in the clean and noisy condition. Mel Frequency Cepstrum Coefficients (MFCCs) and Gammatone Frequency Cepstral Coefficients (GFCC) are the mature techniques and the most common features, which are used for speaker recognition. MFCCs are calculated from the log energies in frequency bands distributed over a mel scale. While GFCC has been acquired from a bank of Gammatone filters, which was originally suggested to model human cochlear filtering. This paper investigates the performance of GFCC and the conventional MFCC feature in clean and noisy conditions. The effects of the Signal-to-Noise Ratio (SNR) and language mismatch on the system performance have been taken into account in this work. Conclusion: Experimental results have shown significant improvement in system performance in terms of reduced equal error rate and detection error trade-off. Performance in terms of recognition rates under various types of noise, various Signal-to-Noise Ratios (SNRs) was quantified via simulation. Results of the study are also presented and discussed.
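The abstract's description of MFCCs (log energies in bands spaced on a mel scale, then converted to cepstral coefficients) can be illustrated with a minimal sketch. The mel formula used here (2595 * log10(1 + f/700)) and the DCT-II step are common textbook choices, not details confirmed by the paper, and all function names are hypothetical.

```python
import math

def hz_to_mel(f_hz):
    # Assumed mel mapping (a widely used variant; the paper may use another).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_band_edges(n_bands, f_min, f_max):
    # Edges of n_bands overlapping bands, equally spaced on the mel scale.
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 2)]

def cepstrum_from_log_energies(log_energies, n_coeffs):
    # Unnormalized DCT-II of the per-band log energies.
    n = len(log_energies)
    return [
        sum(e * math.cos(math.pi * k * (j + 0.5) / n)
            for j, e in enumerate(log_energies))
        for k in range(n_coeffs)
    ]

edges = mel_band_edges(n_bands=26, f_min=0.0, f_max=8000.0)
coeffs = cepstrum_from_log_energies([1.0] * 26, n_coeffs=13)
```

Because the band edges are uniform in mel rather than in hertz, the bands are narrow at low frequencies and wide at high frequencies. A flat log-energy profile yields a nonzero zeroth coefficient only, which is one way to sanity-check the transform.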
2

Nikita, Dhanvijay, and P. R. Badadapure. "Hindi Speech Recognition System Using MFCC and HTK Toolkit." International Journal of Engineering Sciences & Research Technology 5, no. 12 (2016): 690–95. https://doi.org/10.5281/zenodo.212079.

Abstract:
This paper presents an approach to a Hindi fruit-name recognizer system. Every person's speech is unique, so the database of speech samples was collected from 20 different speakers, with two iterations each. These recordings are used to train the acoustic model, which is trained on the 20-speaker database with a vocabulary of 45 words. The HTK toolkit is used to train on the input data and to evaluate the results. The proposed system gives a recognition rate of 94.28% at the sentence level and 98.09% at the word level.
3

Rudramurthy, M. S., V. Kamakshi Prasad, and R. Kumaraswamy. "Speaker Verification Under Degraded Conditions Using Empirical Mode Decomposition Based Voice Activity Detection Algorithm." Journal of Intelligent Systems 23, no. 4 (2014): 359–78. http://dx.doi.org/10.1515/jisys-2013-0085.

Abstract:
The performance of most of the state-of-the-art speaker recognition (SR) systems deteriorates under degraded conditions, owing to mismatch between the training and testing sessions. This study focuses on the front end of the speaker verification (SV) system to reduce the mismatch between training and testing. An adaptive voice activity detection (VAD) algorithm using zero-frequency filter assisted peaking resonator (ZFFPR) was integrated into the front end of the SV system. The performance of this proposed SV system was studied under degraded conditions with 50 selected speakers from the NIST 2003 database. The degraded condition was simulated by adding different types of noises to the original speech utterances. The different types of noises were chosen from the NOISEX-92 database to simulate degraded conditions at signal-to-noise ratio levels from 0 to 20 dB. In this study, widely used 39-dimension Mel frequency cepstral coefficient (MFCC; i.e., 13-dimension MFCCs augmented with 13-dimension velocity and 13-dimension acceleration coefficients) features were used, and Gaussian mixture model–universal background model was used for speaker modeling. The proposed system’s performance was studied against the energy-based VAD used as the front end of the SV system. The proposed SV system showed some encouraging results when EMD-based VAD was used at its front end.
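The 39-dimension feature described in this abstract (13 static MFCCs plus 13 velocity and 13 acceleration coefficients) can be sketched as follows. The abstract does not say how the velocity and acceleration terms are computed; this sketch assumes the common regression formula over a window of two frames on each side, with edge frames repeated, and the function names are hypothetical.

```python
def delta(frames, width=2):
    # Regression-based time derivative of each coefficient (assumed formula):
    # d[t] = sum_n n * (c[t+n] - c[t-n]) / (2 * sum_n n^2), n = 1..width.
    n_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = []
    for t in range(n_frames):
        row = []
        for d in range(len(frames[t])):
            acc = 0.0
            for n in range(1, width + 1):
                nxt = frames[min(t + n, n_frames - 1)][d]  # repeat last frame at the end
                prv = frames[max(t - n, 0)][d]             # repeat first frame at the start
                acc += n * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out

def stack_with_deltas(static):
    # Concatenate static, velocity, and acceleration coefficients per frame.
    velocity = delta(static)
    acceleration = delta(velocity)
    return [s + v + a for s, v, a in zip(static, velocity, acceleration)]

# Ten frames of 13 placeholder coefficients that rise by 1 per frame.
static = [[float(t)] * 13 for t in range(10)]
features = stack_with_deltas(static)
```

Each output frame has 13 + 13 + 13 = 39 values. For the linearly rising placeholder input, interior frames have a velocity of 1 and an acceleration of 0, as expected for a constant slope.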
4

Isaac, Samson, Khalid Haruna, Muhammad Aminu Ahmad, and Rabi Mustapha. "Deep Reinforcement Learning with Hidden Markov Model for Speech Recognition." Journal of Technology & Innovation 3, no. 1 (2023): 01–05. http://dx.doi.org/10.26480/jtin.01.2023.01.05.

Abstract:
Nowadays, many applications use speech recognition, especially in the fields of computer science and electronics. Speech Recognition (SR) is the interpretation of spoken words into text. It is also known as Speech-To-Text (STT), Automatic Speech Recognition (ASR), or just Word Recognition (WR). The Hidden Markov Model (HMM) is a type of Markov model, which means that the future state of the model depends on the current state, not on the entire history of the system; the goal of an HMM is to learn a sequence of hidden states from a set of known states. The Long Short-Term Memory (LSTM) network is a type of Recurrent Neural Network (RNN) that can learn long-term dependencies between time steps of sequence data. The LSTM network is trained to predict the values of subsequent time steps in a series-to-series regression. Deep Neural Network (DNN) models are better classifiers than Gaussian Mixture Models (GMMs): they can generalize much better with a smaller number of parameters over complex distributions. They model distributions of different classes jointly, called “distributed” learning, or, more properly, “tied” learning. This work is aimed at developing a speech recognition model that will predict isolated speech of some selected fruits in the Hausa, Igbo, and Yoruba languages by using the predictive power of the Mel-Frequency Cepstral Coefficient (MFCC), LSTM, and HMM algorithms. The findings of the study would improve the development of better automatic speech application systems and would benefit the academic and research community in the field of Natural Language Processing.
5

Li, Guan Yu, Hong Zhi Yu, Yong Hong Li, and Ning Ma. "Features Extraction for Lhasa Tibetan Speech Recognition." Applied Mechanics and Materials 571-572 (June 2014): 205–8. http://dx.doi.org/10.4028/www.scientific.net/amm.571-572.205.

Abstract:
Speech feature extraction is discussed. The Mel frequency cepstral coefficient (MFCC) and perceptual linear prediction (PLP) coefficient methods are analyzed. These two types of features are extracted in a Lhasa large-vocabulary continuous speech recognition system, and the recognition results are then compared.
6

Mahalakshmi, P. "A Review on Voice Activity Detection and Mel-Frequency Cepstral Coefficients for Speaker Recognition (Trend Analysis)." Asian Journal of Pharmaceutical and Clinical Research 9, no. 9 (2016): 360. http://dx.doi.org/10.22159/ajpcr.2016.v9s3.14352.

Abstract:
Objective: The objective of this review article is to give a complete review of the various techniques that have been used for speech recognition over two decades. Methods: Voice Activity Detection (VAD) and Speech Activity Detection (SAD) techniques, which are used to distinguish voiced from unvoiced signals, are discussed, along with the Mel Frequency Cepstral Coefficient (MFCC) technique, which detects specific features. Results: The review results show that research in MFCC has been dominant in signal processing in comparison to VAD and other existing techniques. Conclusion: Speaker recognition techniques used previously and those in current research were compared, and a clear idea of the better technique was identified through a review of the literature spanning over two decades. Keywords: Cepstral analysis, Mel-frequency cepstral coefficients, signal processing, speaker recognition, voice activity detection.
7

Dua, Mohit, Rajesh Kumar Aggarwal, and Mantosh Biswas. "Optimizing Integrated Features for Hindi Automatic Speech Recognition System." Journal of Intelligent Systems 29, no. 1 (2018): 959–76. http://dx.doi.org/10.1515/jisys-2018-0057.

Abstract:
An automatic speech recognition (ASR) system translates spoken words or utterances (isolated, connected, continuous, and spontaneous) into text format. State-of-the-art ASR systems mainly use Mel frequency (MF) cepstral coefficient (MFCC), perceptual linear prediction (PLP), and Gammatone frequency (GF) cepstral coefficient (GFCC) for extracting features in the training phase of the ASR system. Initially, the paper proposes a sequential combination of all three feature extraction methods, taking two at a time. Six combinations, MF-PLP, PLP-MFCC, MF-GFCC, GF-MFCC, GF-PLP, and PLP-GFCC, are used, and the accuracy of the proposed system using all these combinations was tested. The results show that the GF-MFCC and MF-GFCC integrations outperform all other proposed integrations. Further, these two feature vector integrations are optimized using three different optimization methods, particle swarm optimization (PSO), PSO with crossover, and PSO with quadratic crossover (Q-PSO). The results demonstrate that the Q-PSO-optimized GF-MFCC integration shows significant improvement over all other optimized combinations.
8

Abbasi, Muhammad Daud, Zubair Sajid, Shahzad Karim Khawer, Syed Zain Mir, Abdul Basit, and Muhammad Kashif. "Automatic Speech Recognition by Using Neural Network Based on Mel Frequency Cepstral Coefficient." Asian Bulletin of Big Data Management 5, no. 2 (2025): 63–85. https://doi.org/10.62019/vs3esy64.

Abstract:
This paper discusses and evaluates a neural network Automatic Speech Recognition (ASR) system based on an isolated, small-vocabulary, speaker-independent manual cropping technique, from the training stage to the recognition stage. The paper also examines three distinct blocks of speech recognition: a Speech Preprocessor, a Feature Extractor, and a Recognizer. Speech preprocessing involves windowing, framing, short-term energy and zero-crossing thresholds, and end point detection. Mel Frequency Cepstral Coefficients (MFCC) are extracted to represent the speech signal in frames and then passed through a Mel frequency filter. The recognizer is a multi-layer feed-forward network trained by the back-propagation method.
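The preprocessing steps named in this abstract (framing, short-term energy, zero crossings, end point detection) can be sketched in a few lines. The abstract gives no frame sizes, thresholds, or decision rule, so the values below and the energy-only endpoint rule are illustrative assumptions, and the function names are hypothetical.

```python
def frame_signal(samples, frame_len, hop):
    # Split the signal into overlapping frames; trailing samples that do not
    # fill a whole frame are dropped.
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def short_term_energy(frame):
    # Sum of squared sample values in one frame.
    return sum(x * x for x in frame)

def zero_crossing_rate(frame):
    # Fraction of adjacent sample pairs whose signs differ.
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def detect_endpoints(samples, frame_len, hop, energy_threshold):
    # Return (start, end) sample indices spanning the first through last
    # frame whose energy exceeds the threshold, or None if no frame does.
    frames = frame_signal(samples, frame_len, hop)
    active = [i for i, f in enumerate(frames)
              if short_term_energy(f) > energy_threshold]
    if not active:
        return None
    return active[0] * hop, active[-1] * hop + frame_len

# Synthetic input: silence, an alternating burst, then silence again.
signal = [0.0] * 100 + [1.0, -1.0] * 50 + [0.0] * 100
endpoints = detect_endpoints(signal, frame_len=20, hop=10, energy_threshold=0.5)
```

A practical detector would usually combine the energy and zero-crossing measures, since unvoiced speech tends to have low energy but a high zero-crossing rate. The energy-only rule is kept here for brevity.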
9

Bhuvaneshwari, Jolad, and Rajashri Khanai. "Different Feature Extraction Techniques for Automatic Speech Recognition: A Review." International Journal of Engineering Sciences & Research Technology 7, no. 2 (2018): 181–88. https://doi.org/10.5281/zenodo.1165872.

Abstract:
Automatic speech recognition, which allows natural and user-friendly communication between a person and a device, is an active research area. Speech recognition is the ability to listen to what is being said, to interpret it, and to perform actions based on the spoken information. This article presents a short outline of speech recognition and of the feature extraction techniques MFCC, LPC, and PLP used in speech recognition systems. Among these three techniques, Mel frequency cepstral coefficients (MFCC) are the most frequently used in speech recognition because they most closely approximate human auditory perception of speech.
10

Sarkar, Swagata, Sanjana R, Rajalakshmi S, and Harini T J. "Simulation and Detection of Tamil Speech Accent Using Modified Mel Frequency Cepstral Coefficient Algorithm." International Journal of Engineering & Technology 7, no. 3.3 (2018): 426. http://dx.doi.org/10.14419/ijet.v7i2.33.14202.

Abstract:
Automatic speech reconstruction systems are a topic of interest for many researchers. As many online courses have emerged, recent research has concentrated on speech accent recognition, and much work has been done in this field. This paper addresses accent recognition of Tamil speech from different zones of Tamil Nadu. The Hidden Markov Model (HMM) and the Viterbi algorithm are very popular. Researchers have worked with Mel Frequency Cepstral Coefficients (MFCC) to identify speech as well as speech accent. In this paper, speech accent features are identified by a modified MFCC algorithm, and the features are classified by the back-propagation algorithm.

Dissertations / Theses on the topic "Speech Recognition (SR). MFCC (Mel Frequency Cepstral Coefficient)"

1

Sklar, Alexander Gabriel. "Channel Modeling Applied to Robust Automatic Speech Recognition." Scholarly Repository, 2007. http://scholarlyrepository.miami.edu/oa_theses/87.

Abstract:
In automatic speech recognition systems (ASRs), training is a critical phase to the system's success. Communication media, either analog (such as analog landline phones) or digital (VoIP) distort the speaker's speech signal often in very complex ways: linear distortion occurs in all channels, either in the magnitude or phase spectrum. Non-linear but time-invariant distortion will always appear in all real systems. In digital systems we also have network effects which will produce packet losses and delays and repeated packets. Finally, one cannot really assert what path a signal will take, and so having error or distortion in between is almost a certainty. The channel introduces an acoustical mismatch between the speaker's signal and the trained data in the ASR, which results in poor recognition performance. The approach so far, has been to try to undo the havoc produced by the channels, i.e. compensate for the channel's behavior. In this thesis, we try to characterize the effects of different transmission media and use that as an inexpensive and repeatable way to train ASR systems.

Book chapters on the topic "Speech Recognition (SR). MFCC (Mel Frequency Cepstral Coefficient)"

1

Nidhyananthan, S. Selva, Joe Virgin A., and Shantha Selva Kumari R. "Wireless Enhanced Security Based on Speech Recognition." In Handbook of Research on Information Security in Biomedical Signal Processing. IGI Global, 2018. http://dx.doi.org/10.4018/978-1-5225-5152-2.ch012.

Abstract:
Security is the most notable aspect of all computerized control devices. In this chapter, a computerized voice ID device based on speech recognition is used for security purposes. Mostly, voices are trained by extracting the mel frequency cepstral coefficient (MFCC) feature, but it is very sensitive to noise interference, which degrades performance; hence, dynamic MFCC is used for speech and speaker recognition. The registered voices are stored in a database. When the device senses any voice, it cross-checks it against the registered voices. If a mismatch occurs, it alerts the authorized person through the global system for mobile communication (GSM) to report the unauthorized access. GSM works at a rate of 168 Kb/s up to 40 km and operates at different frequencies, such as 800 MHz and 900 MHz. The proposed work helps security systems trap unauthorized persons through efficient communication.