Academic literature on the topic 'Robust speaker identification'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Robust speaker identification.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "Robust speaker identification"

1

Aung, Zaw Win. "A Robust Speaker Identification System." International Journal of Trend in Scientific Research and Development 2, no. 5 (2018): 2057–64. http://dx.doi.org/10.31142/ijtsrd18274.

2

Shah, Shahid Munir, Muhammad Moinuddin, and Rizwan Ahmed Khan. "A Robust Approach for Speaker Identification Using Dialect Information." Applied Computational Intelligence and Soft Computing 2022 (March 7, 2022): 1–16. http://dx.doi.org/10.1155/2022/4980920.

Abstract:
The present research is an effort to enhance the performance of voice processing systems, in our case the speaker identification system (SIS), by addressing the variability caused by the dialectal variations of a language. We present an effective solution to reduce dialect-related variability in voice processing systems. The proposed method minimizes the system's complexity by reducing the search space during the testing phase of speaker identification: the speaker is searched for within the set of speakers of the identified dialect instead of among all the speakers used in system training. The study is conducted on the Pashto language, and the voice data samples are collected from native Pashto speakers of specific regions of Pakistan and Afghanistan where Pashto is spoken with different dialectal variations. The task of speaker identification is achieved with the help of a novel hierarchical framework that works in two steps. In the first step, the speaker's dialect is identified; for automated dialect identification, spectral and prosodic features are used in conjunction with a Gaussian mixture model (GMM). In the second step, the speaker is identified using a multilayer perceptron (MLP)-based speaker identification system, which takes as aggregated input the identified dialect from the first step along with the prosodic and spectral features. The robustness of the proposed SIS is compared with traditional state-of-the-art methods from the literature. The results show that the proposed framework is better in terms of average speaker recognition accuracy (84.5% identification accuracy) and consumes 39% less time for speaker identification.
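To make the two-step idea concrete, here is a minimal, purely illustrative sketch in Python. The toy 2-D features, the nearest-centroid classifiers standing in for the paper's GMM dialect model and MLP speaker model, and all names are invented for illustration; the point is only how step 2 restricts its search to the speakers of the dialect found in step 1.

```python
import math

# Toy 2-D "feature vectors" standing in for the paper's spectral/prosodic
# features; all centroids and names below are hypothetical.
DIALECT_CENTROIDS = {"north": (0.0, 0.0), "south": (10.0, 10.0)}

# Enrolled speakers grouped by dialect, each with one reference centroid.
SPEAKERS = {
    "north": {"spk1": (0.5, 0.2), "spk2": (-0.4, 0.6)},
    "south": {"spk3": (9.8, 10.1), "spk4": (10.6, 9.5)},
}

def identify_dialect(x):
    """Step 1: nearest-centroid dialect classification (GMM stand-in)."""
    return min(DIALECT_CENTROIDS, key=lambda d: math.dist(x, DIALECT_CENTROIDS[d]))

def identify_speaker(x):
    """Step 2: search only among speakers of the identified dialect,
    which is exactly how the hierarchy shrinks the search space."""
    dialect = identify_dialect(x)
    candidates = SPEAKERS[dialect]
    speaker = min(candidates, key=lambda s: math.dist(x, candidates[s]))
    return dialect, speaker

print(identify_speaker((0.4, 0.3)))  # → ('north', 'spk1')
```

With N speakers spread evenly over D dialects, step 2 scores roughly N/D candidates instead of N, which is the source of the reported time saving.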
3

Zaw, Win Aung. "A Robust Speaker Identification System." International Journal of Trend in Scientific Research and Development 2, no. 5 (2018): 2057–64. https://doi.org/10.31142/ijtsrd18274.

Abstract:
This paper aims to implement a robust speaker identification system: a software architecture that identifies the current talker out of a set of speakers. The system emphasizes text-dependent speaker identification and contains three main modules: endpoint detection, feature extraction, and feature matching. The endpoint detection module removes unwanted signal and background noise from the input speech before subsequent processing; the proposed system uses short-term energy analysis for this purpose. Mel-frequency cepstral coefficients (MFCC) are applied for feature extraction to derive a small amount of data from the voice signal that can later represent each speaker. For feature matching, a vector quantization (VQ) approach using the Linde-Buzo-Gray (LBG) clustering algorithm is proposed because it reduces the amount of data and the complexity. The experimental study shows that the proposed system is more robust than the original system and faster in computation than the existing one. The system is implemented in MATLAB.
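The LBG clustering used here for feature matching can be sketched compactly. Below is a hedged, self-contained toy implementation in Python (2-D points rather than real MFCC vectors; the splitting factor and data are invented). The grow-by-splitting-then-refine loop is the standard LBG scheme, and the distortion function shows how a test utterance would be matched against a speaker's codebook.

```python
import math

def _mean(vectors):
    dim = len(vectors[0])
    return tuple(sum(v[i] for v in vectors) / len(vectors) for i in range(dim))

def _nearest(codebook, v):
    return min(range(len(codebook)), key=lambda i: math.dist(codebook[i], v))

def lbg_codebook(vectors, size, eps=0.01, iters=10):
    """Grow a codebook by repeated centroid splitting (Linde-Buzo-Gray),
    refining with k-means-style updates after each split."""
    codebook = [_mean(vectors)]
    while len(codebook) < size:
        # Split every centroid into a slightly perturbed +eps / -eps pair.
        codebook = [tuple(c * (1 + s) for c in cw)
                    for cw in codebook for s in (eps, -eps)]
        for _ in range(iters):
            cells = [[] for _ in codebook]
            for v in vectors:
                cells[_nearest(codebook, v)].append(v)
            codebook = [_mean(cell) if cell else codebook[i]
                        for i, cell in enumerate(cells)]
    return codebook

def vq_distortion(codebook, vectors):
    """Average quantisation error of an utterance against one speaker's
    codebook; the speaker with the lowest distortion is identified."""
    return sum(math.dist(v, codebook[_nearest(codebook, v)])
               for v in vectors) / len(vectors)
```

In a full system one codebook is trained per enrolled speaker, and identification picks the codebook with minimum distortion on the test features.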
4

Wang, Jia-Ching, Chung-Hsien Yang, Jhing-Fa Wang, and Hsiao-Ping Lee. "Robust Speaker Identification and Verification." IEEE Computational Intelligence Magazine 2, no. 2 (2007): 52–59. http://dx.doi.org/10.1109/mci.2007.353420.

5

Zhao, Xiaojia, Yang Shao, and DeLiang Wang. "CASA-Based Robust Speaker Identification." IEEE Transactions on Audio, Speech, and Language Processing 20, no. 5 (2012): 1608–16. http://dx.doi.org/10.1109/tasl.2012.2186803.

6

Bose, Smarajit, Amita Pal, Anish Mukherjee, and Debasmita Das. "Robust Speaker Identification Using Fusion of Features and Classifiers." International Journal of Machine Learning and Computing 7, no. 5 (2017): 133–38. http://dx.doi.org/10.18178/ijmlc.2017.7.5.635.

7

Jayanna, H. S., and B. G. Nagaraja. "An Experimental Comparison of Modeling Techniques and Combination of Speaker – Specific Information from Different Languages for Multilingual Speaker Identification." Journal of Intelligent Systems 25, no. 4 (2016): 529–38. http://dx.doi.org/10.1515/jisys-2014-0128.

Abstract:
Most state-of-the-art speaker identification systems work in a monolingual (preferably English) scenario, so predominantly English-speaking countries can use such systems efficiently for speaker recognition. However, many countries, including India, are multilingual in nature, and people in such countries habitually speak multiple languages. An existing speaker identification system may yield poor performance if a speaker's training and test data are in different languages. Thus, developing a robust multilingual speaker identification system is an issue in many countries. In this work, an experimental evaluation of modeling techniques, including self-organizing map (SOM), learning vector quantization (LVQ), and Gaussian mixture model-universal background model (GMM-UBM) classifiers for multilingual speaker identification, is presented. The monolingual and crosslingual speaker identification studies are conducted using 50 speakers from our own database. The experimental results show that the GMM-UBM classifier gives better identification performance than the SOM and LVQ classifiers. Furthermore, we propose a combination of speaker-specific information from different languages for crosslingual speaker identification, and the combined feature gives better performance in all crosslingual speaker identification experiments.
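As a rough illustration of GMM-based scoring (not the authors' full GMM-UBM pipeline, which also involves a universal background model and adaptation), a closed-set decision reduces to asking which speaker's mixture assigns the test frames the highest log-likelihood. The toy diagonal-covariance mixtures below are hypothetical.

```python
import math

def diag_gauss_logpdf(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def gmm_loglik(x, gmm):
    """log p(x | GMM) for a mixture given as [(weight, mean, var), ...]."""
    return math.log(sum(w * math.exp(diag_gauss_logpdf(x, m, v))
                        for w, m, v in gmm))

def identify(frames, speaker_gmms):
    """Closed-set decision: the speaker whose model gives the frames the
    highest total log-likelihood wins."""
    return max(speaker_gmms,
               key=lambda s: sum(gmm_loglik(x, speaker_gmms[s]) for x in frames))
```

A production system would work in the log domain throughout (log-sum-exp) to avoid underflow on long utterances; the direct `exp` here is acceptable only for a toy example.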
8

Fredj, Ines Ben, Youssef Zouhir, and Kaïs Ouni. "Fusion features for robust speaker identification." International Journal of Signal and Imaging Systems Engineering 11, no. 2 (2018): 65. http://dx.doi.org/10.1504/ijsise.2018.091881.

9

Fredj, Ines Ben, Youssef Zouhir, and Kaïs Ouni. "Fusion features for robust speaker identification." International Journal of Signal and Imaging Systems Engineering 11, no. 2 (2018): 65. http://dx.doi.org/10.1504/ijsise.2018.10013027.

10

Milošević, M., Ž. Nedeljković, U. Glavitsch, and Ž. Đurović. "Speaker Modeling Using Emotional Speech for More Robust Speaker Identification." Journal of Communications Technology and Electronics 64, no. 11 (2019): 1256–65. http://dx.doi.org/10.1134/s1064226919110184.

More sources

Dissertations / Theses on the topic "Robust speaker identification"

1

Zhao, Xiaojia. "CASA-Based Robust Speaker Identification." The Ohio State University, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=osu1402620178.

2

Haider, Zargham. "Robust speaker identification against computer aided voice impersonation." Thesis, University of Surrey, 2011. http://epubs.surrey.ac.uk/770387/.

Abstract:
Speaker Identification (SID) systems offer good performance in the case of noise-free speech, and most ongoing research aims at improving their reliability in noisy environments. In ideal operating conditions, very low identification error rates can be achieved, which suggests that SID systems can be used in real-life applications as an extra layer of security alongside existing secure layers. They can, for instance, be used alongside a Personal Identification Number (PIN) or password. SID systems can also be used by law enforcement agencies as a detection system to track wanted people over voice communications networks. In this thesis, the performance of existing SID systems against impersonation attacks is analysed and strategies to counteract them are discussed. A voice impersonation system is developed using Gaussian Mixture Modelling (GMM), utilizing Line Spectral Frequencies (LSF) as the features representing the spectral parameters of the source-target pair. Voice conversion systems based on probabilistic approaches suffer from over-smoothing of the converted spectrum. A hybrid scheme using Linear Multivariate Regression and GMM, together with posterior probability smoothing, is proposed to reduce over-smoothing and alleviate discontinuities in the converted speech. The converted voices are used to intrude on a closed-set SID system in the scenarios of identity disguise and targeted speaker impersonation. The results of the intrusion suggest that, in their present form, SID systems are vulnerable to deliberate voice conversion attacks. For impostors to transform their voices, a large volume of speech data is required, which may not be easily accessible. In the context of improving the performance of SID against deliberate impersonation attacks, the use of multiple classifiers is explored. The Linear Prediction (LP) residual of the speech signal is also analysed for speaker-specific excitation information.
A speaker identification system based on a multiple classifier system, using features describing the vocal tract and the LP residual, is targeted by the impersonation system. The identification results show an improvement in rejecting impostor claims when the system is presented with converted voices. It is hoped that the findings in this thesis can lead to the development of speaker identification systems that are better equipped to deal with deliberate voice impersonation.
3

Vale, Eduardo Esteves. "Robust Text-Independent Speaker Identification Using Multiple Classifiers in Sub-Bands." Pontifícia Universidade Católica do Rio de Janeiro, 2010. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=17227@1.

Abstract:
This thesis develops new classifier combination techniques applied in sub-bands in order to improve robust, text-independent speaker identification. The advantages observed in previous research using multiple classifiers in sub-bands for robust speaker recognition motivated the development of combination techniques for these algorithms. New approaches for combining the classifier responses in the sub-bands are proposed. The main objective is to improve the recognition rate in situations where nothing is known about the type of noise that may be corrupting the speech signals used to test the system. The proposals consist of employing non-uniform weights, the null space, multicondition training, dynamic features, and autocorrelation-based MFCC coefficients. The new proposals contribute significantly to improving the recognition rate of the system. For example, compared with the Sum combination technique reported in the literature, an increase in recognition rate of approximately 47% was obtained in tests with white noise, and of 32% in tests with non-white noise, on 15 seconds of speech at 10 dB SNR (signal-to-noise ratio), using only a new strategy that employs the null space for combining the sub-band classifiers. Even more significant results were obtained with the other proposals presented in this work.
4

Al-Kaltakchi, Musab Tahseen Salahaldeen. "Robust text independent closed set speaker identification systems and their evaluation." Thesis, University of Newcastle upon Tyne, 2018. http://hdl.handle.net/10443/3978.

Abstract:
This thesis focuses upon text-independent closed-set speaker identification. The contributions relate to evaluation studies in the presence of various types of noise and handset effects. Extensive evaluations are performed on four databases. The first contribution is in the context of the use of the Gaussian Mixture Model-Universal Background Model (GMM-UBM) with original speech recordings from only the TIMIT database. Four main simulations for Speaker Identification Accuracy (SIA) are presented, including different fusion strategies: late fusion (score based), early fusion (feature based), early-late fusion (a combination of feature- and score-based fusion), late fusion using concatenated static and dynamic features (features with temporal derivatives, such as first-order delta and second-order delta-delta features, namely acceleration features), and finally fusion of statistically independent normalized scores. The second contribution is again based on the GMM-UBM approach. Comprehensive evaluations of the effect of Additive White Gaussian Noise (AWGN) and Non-Stationary Noise (NSN) (with and without a G.712-type handset) upon identification performance are undertaken. In particular, three NSN types with varying Signal-to-Noise Ratios (SNRs) were tested, corresponding to street traffic, a bus interior, and a crowded talking environment. The performance evaluation also considered the effect of late fusion techniques based on score fusion, namely mean, maximum, and linear weighted sum fusion. The databases employed were TIMIT, SITW, and NIST 2008; 120 speakers were selected from each database to yield 3,600 speech utterances. The third contribution is based on the use of the i-vector; four combinations of i-vectors with 100 and 200 dimensions were employed. Then, various fusion techniques using maximum, mean, weighted sum, and cumulative fusion with the same i-vector dimension were used to improve the SIA.
Similarly, both interleaved and concatenated i-vector fusion were exploited to produce 200 and 400 i-vector dimensions. The system was evaluated with four different databases using 120 speakers from each database. The TIMIT, SITW, and NIST 2008 databases were evaluated for various types of NSN, namely street-traffic NSN, bus-interior NSN, and crowd-talking NSN, and the G.712-type handset at 16 kHz was also applied. As recommendations from the study in terms of the GMM-UBM approach, mean fusion is found to yield the overall best performance in terms of the SIA with noisy speech, whereas linear weighted sum fusion is overall best for original database recordings. However, in the i-vector approach the best SIA was obtained from the weighted sum and the concatenated fusion.
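The late score-fusion rules compared in this thesis (mean, maximum, and linear weighted sum) are simple to state in code. Here is a minimal sketch with invented score values; each inner list is one classifier's scores over the same ordered speaker set.

```python
def mean_fusion(score_lists):
    """Average the classifiers' scores speaker by speaker."""
    return [sum(col) / len(col) for col in zip(*score_lists)]

def max_fusion(score_lists):
    """Keep, per speaker, the single most confident classifier score."""
    return [max(col) for col in zip(*score_lists)]

def weighted_sum_fusion(score_lists, weights):
    """Linear weighted sum; the weights would be tuned on held-out data."""
    return [sum(w * s for w, s in zip(weights, col)) for col in zip(*score_lists)]

def decide(fused_scores, speakers):
    """Identify the speaker with the highest fused score."""
    return speakers[max(range(len(fused_scores)), key=fused_scores.__getitem__)]
```

Note how the rules can disagree: a single over-confident classifier dominates `max_fusion`, which is one reason mean fusion tends to be more stable on noisy speech.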
5

Mtibaa, Aymen. "Towards robust and privacy-preserving speaker verification systems." Electronic Thesis or Diss., Institut polytechnique de Paris, 2022. http://www.theses.fr/2022IPPAS002.

Abstract:
Speaker verification systems are a key technology in many devices and services such as smartphones, intelligent digital assistants, healthcare, and banking applications. Additionally, with the COVID pandemic, access control systems based on fingerprint scanners or keypads increase the risk of virus propagation, so companies are now rethinking their employee access control systems and considering touchless authorization technologies such as speaker verification. However, a speaker verification system requires users to transmit recordings, features, or models derived from their voice samples, without any obfuscation, over untrusted public networks, to be stored and processed on cloud-based infrastructure. If the system is compromised, an adversary can use this biometric information to impersonate the genuine user and extract personal information. The voice samples may also contain information about the user's gender, accent, ethnicity, and health status, which raises several privacy issues. In this context, the present PhD thesis addresses the privacy and security issues of speaker verification systems based on Gaussian mixture models (GMM), i-vectors, and x-vectors as speaker modeling. The objective is the development of speaker verification systems that perform biometric verification while preserving the privacy and the security of the user.
To that end, we propose biometric protection schemes for speaker verification systems that achieve the privacy requirements (revocability, unlinkability, irreversibility) described in the ISO/IEC IS 24745 standard on biometric information protection and improve the robustness of the systems against different attack scenarios.
6

Nicolson, Aaron M. "Deep Learning for Minimum Mean-Square Error and Missing Data Approaches to Robust Speech Processing." Thesis, Griffith University, 2020. http://hdl.handle.net/10072/399974.

Abstract:
Speech corrupted by background noise (or noisy speech) can cause misinterpretation and fatigue during phone and conference calls, and for hearing aid users. Noisy speech can also severely impact the performance of speech processing systems such as automatic speech recognition (ASR), automatic speaker verification (ASV), and automatic speaker identification (ASI) systems. Currently, deep learning approaches are employed in an end-to-end fashion to improve robustness. The target speech (or clean speech) is used as the training target or large noisy speech datasets are used to facilitate multi-condition training. In this dissertation, we propose competitive alternatives to the preceding approaches by updating two classic robust speech processing techniques using deep learning. The two techniques include minimum mean-square error (MMSE) and missing data approaches. An MMSE estimator aims to improve the perceived quality and intelligibility of noisy speech. This is accomplished by suppressing any background noise without distorting the speech. Prior to the introduction of deep learning, MMSE estimators were the standard speech enhancement approach. MMSE estimators require the accurate estimation of the a priori signal-to-noise ratio (SNR) to attain a high level of speech enhancement performance. However, current methods produce a priori SNR estimates with a large tracking delay and a considerable amount of bias. Hence, we propose a deep learning approach to a priori SNR estimation that is significantly more accurate than previous estimators, called Deep Xi. Through objective and subjective testing across multiple conditions, such as real-world non-stationary and coloured noise sources at multiple SNR levels, we show that Deep Xi allows MMSE estimators to produce the highest quality enhanced speech amongst all clean speech magnitude spectrum estimators. 
Missing data approaches improve robustness by performing inference only on noisy speech features that reliably represent clean speech. In particular, the marginalisation method was able to significantly increase the robustness of Gaussian mixture model (GMM)-based speech classification systems (e.g. GMM-based ASR, ASV, or ASI systems) in the early 2000s. However, deep neural networks (DNNs) used in current speech classification systems are non-probabilistic, whereas marginalisation requires a probabilistic model. Hence, multi-condition training or noisy speech pre-processing is used to increase the robustness of DNN-based speech classification systems. Recently, sum-product networks (SPNs) were proposed, which are deep probabilistic graphical models that can perform the probabilistic queries required for missing data approaches. While available toolkits for SPNs are in their infancy, we show through an ASI task that SPNs using missing data approaches could be a strong alternative for robust speech processing in the future. This dissertation demonstrates that MMSE estimators and missing data approaches are still relevant approaches to robust speech processing when assisted by deep learning.
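The a priori SNR estimation that Deep Xi improves upon is the classic decision-directed recursion. The following is a hedged single-frequency-bin sketch: the frame powers are toy values, the noise PSD is assumed known, `alpha` and the flooring constant are conventional illustrative choices, and a simple Wiener gain stands in for a full MMSE amplitude estimator.

```python
def decision_directed_xi(noisy_power, noise_power, alpha=0.98, floor=1e-3):
    """Track the a priori SNR (xi) of one frequency bin across frames.
    noisy_power: per-frame |Y|^2 values; noise_power: assumed-known noise PSD."""
    xi_track, prev_clean_power = [], 0.0
    for y2 in noisy_power:
        gamma = y2 / noise_power                      # a posteriori SNR
        xi = (alpha * prev_clean_power / noise_power
              + (1.0 - alpha) * max(gamma - 1.0, 0.0))  # decision-directed rule
        xi = max(xi, floor)                           # floor against musical noise
        gain = xi / (1.0 + xi)                        # Wiener gain as the estimator
        prev_clean_power = (gain ** 2) * y2           # clean-power estimate fed back
        xi_track.append(xi)
    return xi_track
```

The heavy smoothing (`alpha` near 1) is exactly the source of the tracking delay and bias mentioned above: when the true SNR jumps, the recursion needs several frames to catch up.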
7

Tahon, Marie. "Analyse acoustique de la voix émotionnelle de locuteurs lors d’une interaction humain-robot." Thesis, Paris 11, 2012. http://www.theses.fr/2012PA112275/document.

Abstract:
This thesis deals with emotional voices in the context of human-robot interaction. In a realistic interaction we define at least four major types of variability: the environment (room, microphone); the speaker, with their physical characteristics (gender, age, voice type) and personality; their emotional states; and finally the type of interaction (game, emergency, or everyday-life situation). From audio signals collected under different conditions, we sought, by means of acoustic features, to intertwine the characterisation of a speaker and of their emotional state while taking these variabilities into account. Determining which features are essential and which should be avoided is a complex challenge, since it requires working across a large number of variabilities and therefore having rich and varied corpora at one's disposal. The main results concern both the collection and annotation of realistic emotional corpora with varied speakers (children, adults, elderly people) in several environments, and the robustness of acoustic features across these four variabilities. Two interesting results follow from this acoustic analysis: the sound characterisation of a corpus, and the establishment of a 'black list' of highly variable features. Emotions are only one of the paralinguistic cues carried by the audio signal; personality and stress in the voice were also studied. We also implemented a module for automatic emotion recognition and speaker characterisation, which was tested during realistic human-robot interactions. An ethical reflection on this work was also conducted.
8

Lin, Wei-Lun. "Study on Discriminative and Robust Speaker Identification." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/97646650632506001615.

Abstract:
Master's thesis, National Chi Nan University, Department of Electrical Engineering, 2007. In this thesis, we investigate several discriminative and robust speech feature extraction techniques for speaker identification. For the issue of discrimination, the related approaches include linear discriminant analysis (LDA), modified linear discriminant analysis (MOLDA), principal component analysis (PCA), and the orthogonal Gaussian mixture model (OGMM). These four approaches are applied to the mel-frequency cepstral coefficients to improve the discriminating capability of a speaker identification system under a noise-free condition. For the issue of robustness, four approaches are considered, including the relative autocorrelation sequence (RAS), the differentiated power spectrum (DPS), exponentiated log-MelFBS (ExpoMFCC), and the differentiated autocorrelation sequence (DAS). These approaches are shown to reduce the environmental mismatch caused by additive noise, and thus improve the performance of a speaker identification system under a noise-corrupted environment.
9

Yuo, Kuo-Hwei. "Robust Features and Efficient Models for Speaker Identification." Thesis, 2000. http://ndltd.ncl.edu.tw/handle/58137971312172176468.

Abstract:
博士<br>國立清華大學<br>電機工程學系<br>88<br>The objective of this dissertation is to find robust features and efficient models to improve the speaker recognition performance. Two types of robust features are presented. One is robust to additive noise, and the other is robust to the coexistence of additive and convolutional noises. In addition, we present two statistical models that depict a speaker’s feature space more efficiently than the classical method using Gaussian mixture model with diagonal covariance matrices. The first robust feature is based on filtering the temporal trajectories of short-time one-sided autocorrelation sequences of speech to remove the additive noise. The filtered sequences are denoted the relative autocorrelation sequences (RAS), and the mel-scale frequency cepstral coefficients (MFCC) are extracted from RAS instead of the original speech. This new speech feature set is denoted RAS-MFCC. The second robust feature is based on involving two steps of temporal trajectory filtering. The first filtering is applied in autocorrelation domain to remove the additive noise, and the second filtering is applied in logarithmic spectrum domain to remove the convolutional noise. The filtered sequence is called CHAnnel-Normalization Relative Autocorrelation Sequence (CHANRAS). The MFCCs are extracted from CHARAS and called CHARAS-MFCC. The RAS-MFCC is a special case of CHARAS-MFCC. We conduct experiments under a variety of noisy environments including additive and convolutional noises. The RAS-MFCC and CHARAS-MFCC are shown to be superior to projection method. The RAS-MFCC and the CHARAS-MFCC combining with projection measure can further improve identification accuracy. Next, we present a new GMM structure that can depict the speaker’s feature space more efficiently than the traditional GMM structure. This is based on that we embed a common uncorrelated transformation matrix to all Gaussian pdfs. 
The idea is similar to a classical approach derived from the Karhunen-Loève transformation; however, the two algorithms for deriving the transformation matrix are inherently different. The proposed new GMM is called the transformation-embedded GMM (TE-GMM). The transformation matrix of the TE-GMM, as well as the other model parameters, can be trained simultaneously using maximum likelihood estimation. We then generalize the single transformation used in the TE-GMM to multiple transformations, deriving a new model called the General Covariance GMM (GC-GMM). The GMM with diagonal covariance matrices is denoted DC-GMM (Diagonal Covariance GMM), and the GMM with full covariance matrices is denoted FC-GMM (Full Covariance GMM); both are special cases of the GC-GMM. The experimental results show that the TE-GMM achieves better accuracy than the classical Karhunen-Loève transformation method. They also show that, in comparison with the traditional GMM, the GC-GMM can significantly reduce the computational complexity and the number of parameters without degrading system performance.
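The RAS-MFCC pipeline described in the abstract (frame the signal, take a short-time one-sided autocorrelation per frame, filter each lag's temporal trajectory, then extract MFCCs from the filtered sequences) can be sketched as follows. This is a minimal illustration, not the dissertation's implementation: the frame sizes, the first-order difference used as the trajectory filter, and the `mel_filterbank` helper are all assumptions made for the sketch.

```python
import numpy as np
from scipy.fftpack import dct

def mel_filterbank(n_filters, n_bins, sr):
    # Standard triangular mel filterbank (construction details are an assumption).
    def hz2mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel2hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = np.floor((n_bins - 1) * mel2hz(
        np.linspace(hz2mel(0.0), hz2mel(sr / 2.0), n_filters + 2)) / (sr / 2.0)).astype(int)
    fb = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        l, c, r = pts[i], pts[i + 1], pts[i + 2]
        for k in range(l, c):
            fb[i, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fb[i, k] = (r - k) / max(r - c, 1)
    return fb

def ras_mfcc(x, sr=16000, frame_len=400, hop=160, max_lag=256, n_mel=26, n_ceps=13):
    # 1) Frame the signal.
    n = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] for i in range(n)])
    # 2) Short-time one-sided autocorrelation per frame (lags 0..max_lag-1).
    ac = np.stack([np.correlate(f, f, mode="full")[frame_len - 1:frame_len - 1 + max_lag]
                   for f in frames])
    # 3) RAS: filter the temporal trajectory of each lag; a first-order
    #    difference is used here as a simple stand-in that suppresses
    #    slowly varying (stationary additive-noise) components.
    ras = np.diff(ac, axis=0)
    # 4) Extract MFCCs from the filtered sequences instead of the raw speech.
    spec = np.abs(np.fft.rfft(ras, axis=1)) ** 2
    fb = mel_filterbank(n_mel, spec.shape[1], sr)
    logmel = np.log(spec @ fb.T + 1e-10)
    return dct(logmel, axis=1, norm="ortho")[:, :n_ceps]
```

The CHARAS-MFCC variant would add a second trajectory filter in the log-spectrum domain between steps 3 and 4 to remove convolutional noise.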
APA, Harvard, Vancouver, ISO, and other styles
10

林廷翰. "Spectro-temporal Smoothed Auditory Spectra for Robust Speaker Identification." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/93426071838577958845.

Full text
Abstract:
Master's thesis, National Chiao Tung University, Institute of Communications Engineering, academic year 98. The performance of conventional speaker recognition systems is severely compromised by interference such as additive or convolutional noise. High-level information about the speaker is considered a more robust cue for recognizing speakers. This paper proposes auditory-model-based spectral features, auditory cepstral coefficients (ACCs), together with a spectro-temporal modulation filtering (STMF) process, to capture high-level information for robust speaker recognition. Text-independent closed-set speaker recognition experiments are conducted on the TIMIT and GRID corpora to evaluate the robustness of ACCs and the benefits of the STMF process. Experimental results show that ACCs significantly improve over conventional MFCCs in all SNR conditions. The superiority of STMF over the recently developed ANTCCs is also demonstrated in low SNR conditions.
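The core idea of spectro-temporal smoothing can be sketched in a few lines: low-pass the joint spectro-temporal modulations of a (frequency x time) auditory or log spectrogram, keeping only the slow modulations that carry speaker-level cues. This is only an illustrative stand-in; the thesis's actual auditory model and modulation filter design differ, and the separable 2-D Gaussian below is an assumption.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def spectro_temporal_smooth(log_spec, sigma_freq=1.0, sigma_time=2.0):
    """Smooth a (n_freq, n_time) log spectrogram jointly along frequency
    and time. The sigmas control which spectro-temporal modulation rates
    survive: larger values discard faster modulations."""
    return gaussian_filter(log_spec, sigma=(sigma_freq, sigma_time))
```

Cepstral features (such as the ACCs above) would then be computed from the smoothed spectra rather than the raw ones.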
APA, Harvard, Vancouver, ISO, and other styles

Book chapters on the topic "Robust speaker identification"

1

Li, Qi. "Auditory-Based Feature Extraction and Robust Speaker Identification." In Speaker Authentication. Springer Berlin Heidelberg, 2011. http://dx.doi.org/10.1007/978-3-642-23731-7_8.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Holambe, Raghunath S., and Mangesh S. Deshpande. "Noise Robust Speaker Identification: Using Nonlinear Modeling Techniques." In Forensic Speaker Recognition. Springer New York, 2011. http://dx.doi.org/10.1007/978-1-4614-0263-3_7.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Lee, Younjeong, Joohun Lee, and Ki Yong Lee. "Efficient Speaker Identification Based on Robust VQ-PCA." In Computational Science and Its Applications — ICCSA 2003. Springer Berlin Heidelberg, 2003. http://dx.doi.org/10.1007/3-540-44843-8_69.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Sekkate, Sara, Mohammed Khalil, and Abdellah Adib. "A Feature Level Fusion Scheme for Robust Speaker Identification." In Communications in Computer and Information Science. Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-96292-4_23.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Lee, Younjeong, Hernsoo Hahn, Youngjoon Han, and Joohun Lee. "Robust Speaker Identification Based on t-Distribution Mixture Model." In AI 2005: Advances in Artificial Intelligence. Springer Berlin Heidelberg, 2005. http://dx.doi.org/10.1007/11589990_105.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Ayhan, Bulent, and Chiman Kwan. "Robust Speaker Identification Algorithms and Results in Noisy Environments." In Advances in Neural Networks – ISNN 2018. Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-92537-0_51.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Yang, IL-Ho, Min-Seok Kim, Byung-Min So, Myung-Jae Kim, and Ha-Jin Yu. "Robust Speaker Identification Using Ensembles of Kernel Principal Component Analysis." In Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2012. http://dx.doi.org/10.1007/978-3-642-28942-2_7.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Kim, Min-Seok, Ha-Jin Yu, Keun-Chang Kwak, and Su-Young Chi. "Robust Text-Independent Speaker Identification Using Hybrid PCA&LDA." In Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2006. http://dx.doi.org/10.1007/11925231_102.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Biagetti, Giorgio, Paolo Crippa, Laura Falaschetti, Simone Orcioni, and Claudio Turchetti. "Robust Speaker Identification in a Meeting with Short Audio Segments." In Intelligent Decision Technologies 2016. Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-39627-9_41.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Pawar, M. D., and Rajendra Kokate. "A Robust Wavelet Based Decomposition and Multilayer Neural Network for Speaker Identification." In Lecture Notes in Networks and Systems. Springer Singapore, 2019. http://dx.doi.org/10.1007/978-981-13-3765-9_21.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Robust speaker identification"

1

Schönherr, Lea, Dennis Orth, Martin Heckmann, and Dorothea Kolossa. "Environmentally robust audio-visual speaker identification." In 2016 IEEE Spoken Language Technology Workshop (SLT). IEEE, 2016. http://dx.doi.org/10.1109/slt.2016.7846282.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Carey, M. J., E. S. Parris, H. Lloyd-Thomas, and S. J. Bennett. "Robust prosodic features for speaker identification." In 4th International Conference on Spoken Language Processing (ICSLP 1996). ISCA, 1996. http://dx.doi.org/10.21437/icslp.1996-457.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Deshpande, M. S., and R. S. Holambe. "Robust speaker identification in babble noise." In the International Conference & Workshop. ACM Press, 2011. http://dx.doi.org/10.1145/1980022.1980160.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Deshpande, Mangesh S., and Raghunath S. Holambe. "Robust Q Features for Speaker Identification." In 2009 International Conference on Advances in Recent Technologies in Communication and Computing. IEEE, 2009. http://dx.doi.org/10.1109/artcom.2009.75.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Nidhyananthan, S. Selva, R. Shantha Selva Kumari, and G. Jaffino. "Robust speaker identification using vocal source information." In 2012 International Conference on Devices, Circuits and Systems (ICDCS 2012). IEEE, 2012. http://dx.doi.org/10.1109/icdcsyst.2012.6188700.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Matsumoto, Kizuki, Noboru Hayasaka, and Youji Iiguni. "Noise robust speaker identification by dividing MFCC." In 2014 6th International Symposium on Communications, Control and Signal Processing (ISCCSP). IEEE, 2014. http://dx.doi.org/10.1109/isccsp.2014.6877959.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Mitra, Vikramjit, Mitchell McLaren, Horacio Franco, Martin Graciarena, and Nicolas Scheffer. "Modulation features for noise robust speaker identification." In Interspeech 2013. ISCA, 2013. http://dx.doi.org/10.21437/interspeech.2013-695.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Kim, Min-Seok, Il-Ho Yang, and Ha-Jin Yu. "Robust Speaker Identification Using Greedy Kernel PCA." In 2008 20th IEEE International Conference on Tools with Artificial Intelligence (ICTAI). IEEE, 2008. http://dx.doi.org/10.1109/ictai.2008.105.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Sadjadi, Seyed Omid, and John H. L. Hansen. "Blind reverberation mitigation for robust speaker identification." In ICASSP 2012 - 2012 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2012. http://dx.doi.org/10.1109/icassp.2012.6288851.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Liu, Gang, Yun Lei, and John H. L. Hansen. "Robust feature front-end for speaker identification." In ICASSP 2012 - 2012 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2012. http://dx.doi.org/10.1109/icassp.2012.6288853.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Reports on the topic "Robust speaker identification"

1

Jin, Qin, and Yun Wang. Integrated Robust Open-Set Speaker Identification System (IROSIS). Defense Technical Information Center, 2012. http://dx.doi.org/10.21236/ada562148.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Chong, Alberto E. Does It Matter How People Speak? Inter-American Development Bank, 2006. http://dx.doi.org/10.18235/0010970.

Full text
Abstract:
Language serves two key functions. It enables communication between agents, which allows for the establishment and operation of formal and informal institutions. It also serves a less obvious function, a reassuring quality more closely related to issues linked with trust, social capital, and cultural identification. While research on the role of language as a learning process is widespread, there is no evidence on the role of language as a signal of cultural affinity. I pursue this latter avenue of research and show that subtle language affinity is positively linked with change in earnings when using English-speaking data for cities in the Golden Horseshoe area in Southern Ontario during the period 1991 to 2001. The results are robust to changes in specification, a broad number of empirical tests, and a diverse set of outcome variables.
APA, Harvard, Vancouver, ISO, and other styles