Dissertations / Theses on the topic 'Adaptation du locuteur'
Consult the top 25 dissertations / theses for your research on the topic 'Adaptation du locuteur.'
Bonneau, Hélène. "Quantification vectorielle et adaptation au locuteur." Grenoble 2 : ANRT, 1987. http://catalogue.bnf.fr/ark:/12148/cb37603148c.
Bonneau, Hélène. "Quantification vectorielle et adaptation au locuteur." Paris 11, 1987. http://www.theses.fr/1987PA112306.
Gilles, Philippe. "Décodage phonétique de la parole et adaptation au locuteur." Avignon, 1993. http://www.theses.fr/1993AVIG0105.
Teng, Wen Xuan. "Adaptation rapide au locuteur par sous-espace variable de modèles de référence." Rennes 1, 2008. ftp://ftp.irisa.fr/techreports/theses/2008/teng.pdf.
The work presented in this thesis addresses the problem of acoustic model adaptation for automatic speech recognition using very little data. We define the concept of a reference model subspace in order to unify, within a common formalism, most of the fast adaptation techniques proposed in the literature. This formalism helps us study the limits of current techniques and explore new adaptation algorithms. We showed experimentally that adaptation with fixed subspaces cannot yield stable improvements across different adaptation targets (e.g. speakers). To address this problem, we proposed the use of variable subspaces, implemented through a new adaptation algorithm, reference model interpolation (IMR). This technique allows the a posteriori selection of reference models under various selection criteria. The proposed technique is applied to phonetic decoding and large-vocabulary continuous speech recognition systems. Experiments on three databases, namely IDIOLOGOS, PAIDIOLOGOS and ESTER, show the effectiveness of the IMR technique with instantaneous adaptation. Moreover, progressive adaptation is also achieved by combining slow updating of the reference models with fast IMR adaptation.
Su, Huan-Yu. "Reconnaissance acoustico-phonétique en parole continue par quantification vectorielle : adaptation du dictionnaire au locuteur." Grenoble 2 : ANRT, 1987. http://catalogue.bnf.fr/ark:/12148/cb37610109z.
Lauri, Fabrice. "Adaptation au locuteur de modèles acoustiques markoviens pour la reconnaissance automatique de la parole." Nancy 2, 2004. http://www.theses.fr/2004NAN2A001.
SU, HUANG-YU. "Reconnaissance acoustico-phonetique en parole continue par quantification vectorielle : adaptation du dictionnaire au locuteur." Rennes 1, 1987. http://www.theses.fr/1987REN10127.
Bellot, Olivier. "Adaptation au locuteur des modèles acoustiques dans le cadre de la reconnaissance automatique de la parole." Avignon, 2006. http://www.theses.fr/2006AVIG0154.
Speaker-dependent HMM-based recognizers have lower word error rates (WER) than speaker-independent ones. Nevertheless, in the speaker-dependent case, the requirement of a large amount of training data for each test speaker reduces the utility and portability of such systems. The aim of speaker adaptation techniques is to enhance the speaker-independent acoustic models so as to bring their recognition accuracy as close as possible to that obtained with speaker-dependent models. In this work, we present two different approaches to increase the robustness of a speech recognizer with respect to speaker acoustic variability. The first is a method using test and training data for acoustic model adaptation. It operates in two steps: the first performs an a priori adaptation using the transcribed training data of the training speakers closest to the test speaker; the second performs an a posteriori adaptation using the MLLR procedure on the test data. This adaptation strategy was evaluated in a large-vocabulary speech recognition task and leads to a relative gain of 15% with respect to the baseline system. The second method is based on a tree structure. To avoid poor estimation of the transformation parameters due to insufficient adaptation data in a node, we propose a new technique based on the maximum a posteriori approach and PDF Gaussian merging. The basic idea behind this new technique is to estimate affine transformations that bring the training acoustic models as close as possible to the test acoustic models, rather than transformations maximizing the likelihood of the adaptation data. In this manner, even with a very small amount of adaptation data, the transformation parameters are accurately estimated for means and variances. This method leads to a relative gain of 16% with respect to the baseline system, and a relative gain of 19.5% combined with MLLR adaptation.
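The mean-transformation step shared by MLLR-style adaptation schemes such as the ones above can be sketched as follows. This is a minimal illustration with toy values: the estimation of the transform itself is omitted, and all names and numbers are made up for the example.

```python
import numpy as np

def mllr_adapt_means(means, W):
    """Apply an MLLR transform W = [b | A] to a set of Gaussian means.

    means : (n_gauss, d) array of original mean vectors
    W     : (d, d+1) affine transform; adapted mean = A @ mu + b
    """
    # Extended mean vectors [1, mu] so the bias b is absorbed into W
    ext = np.hstack([np.ones((means.shape[0], 1)), means])
    return ext @ W.T

# Toy transform: identity rotation plus a +0.5 shift on every dimension
d = 3
A = np.eye(d)
b = 0.5 * np.ones(d)
W = np.hstack([b[:, None], A])

means = np.zeros((4, d))          # four toy Gaussian means at the origin
adapted = mllr_adapt_means(means, W)
```

In practice W is estimated from adaptation data (often one transform per regression class of Gaussians) rather than given, which is what makes the method usable with little data.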
Barras, Claude. "Reconnaissance de la parole continue : adaptation au locuteur et controle temporel dans les modeles de markov caches." Paris 6, 1996. http://www.theses.fr/1996PA066019.
Ferràs, Font Marc. "Utilisation des coefficients de régression linéaire par maximum de vraisemblance comme paramètres pour la reconnaissance automatique du locuteur." Phd thesis, Université Paris Sud - Paris XI, 2009. http://tel.archives-ouvertes.fr/tel-00616673.
Full textLelong, Amélie. "Convergence phonétique en interaction Phonetic convergence in interaction." Thesis, Grenoble, 2012. http://www.theses.fr/2012GRENT079/document.
The work presented in this manuscript is based on the study of a phenomenon called phonetic convergence, which postulates that two people in interaction will tend to adapt the way they talk to their partner for communicative purposes. We have developed a paradigm called "Verbal Dominoes" to collect a large corpus for characterizing this phenomenon, the ultimate goal being to endow a conversational agent with this adaptability in order to improve the quality of human-machine interactions. We carried out several studies investigating the phenomenon between pairs of strangers, good friends, and members of the same family, on the expectation that the amplitude of convergence is proportional to the social distance between the two speakers; our results confirmed this. We then studied the impact of knowledge of the linguistic target on adaptation. To characterize phonetic convergence, we developed two methods: the first is based on a linear discriminant analysis between the MFCC coefficients of each speaker, and the second uses speech recognition techniques, which allows the phenomenon to be studied in less controlled conditions. Finally, we characterized phonetic convergence with a subjective measurement using a new perceptual test called speaker switching. The test was performed using signals coming from real interactions, but also with synthetic data obtained with the harmonic plus
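The first characterization method mentioned above relies on a linear discriminant analysis between the two speakers' MFCC coefficients. A minimal sketch of such a separability measure, on synthetic stand-in "frames" rather than the thesis's actual data, might look like this (all dimensions and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

def lda_separability(X1, X2):
    """Project two speakers' MFCC frames on the Fisher discriminant axis and
    measure how separable they remain; a drop over the course of an
    interaction would indicate convergence."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter (sum of the two per-speaker covariances)
    Sw = np.cov(X1, rowvar=False) + np.cov(X2, rowvar=False)
    w = np.linalg.solve(Sw, m1 - m2)      # Fisher discriminant direction
    p1, p2 = X1 @ w, X2 @ w               # 1-D projections of the frames
    return abs(p1.mean() - p2.mean()) / np.sqrt(p1.var() + p2.var())

# Toy 12-dimensional "MFCC" frames: well-separated vs nearly merged speakers
far1 = rng.normal(0.0, 1.0, size=(500, 12))
far2 = rng.normal(2.0, 1.0, size=(500, 12))
close1 = rng.normal(0.0, 1.0, size=(500, 12))
close2 = rng.normal(0.1, 1.0, size=(500, 12))
sep_far = lda_separability(far1, far2)
sep_close = lda_separability(close1, close2)
```

A large value means the two speakers' frames are still easy to tell apart; convergence would show up as this value decreasing between early and late portions of the interaction.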
Valdés, Vargas Julian Andrés. "Adaptation de clones orofaciaux à la morphologie et aux stratégies de contrôle de locuteurs cibles pour l'articulation de la parole." Thesis, Grenoble, 2013. http://www.theses.fr/2013GRENT105/document.
The capacity of producing speech is learned and maintained by means of a perception-action loop that allows speakers to correct their own production as a function of the perceptive feedback received. This auto-feedback is auditory and proprioceptive, but not visual. Thus, speech sounds may be complemented by augmented speech systems, i.e. speech accompanied by the virtual display of speech articulator shapes on a computer screen, including those that are typically hidden such as the tongue or velum. This kind of system has applications in domains such as speech therapy, phonetic correction or language acquisition in the framework of computer-aided pronunciation training (CAPT). This work has been conducted in the framework of the development of a visual articulatory feedback system, based on the morphology and articulatory strategies of a reference speaker, which automatically animates a 3D talking head from the speech sound. The motivation of this research was to make this system suitable for several speakers. Thus, the twofold objective of this thesis was to acquire knowledge about inter-speaker variability, and to propose vocal tract models to adapt a reference clone, composed of models of the speech articulators' contours (lips, tongue, velum, etc.), to other speakers that may have different morphologies and different articulatory strategies. In order to build articulatory models of various vocal tract contours, we first acquired data covering the whole articulatory space of the French language. Midsagittal magnetic resonance images (MRI) of eleven French speakers, pronouncing 63 articulations, were collected. One of the main contributions of this study is a more detailed and larger database compared to those in the literature, containing information on several vocal tract contours, speakers and consonants, whereas previous studies are mostly based on vowels.
The vocal tract contours visible in the MRI were outlined by hand following the same protocol for all speakers. In order to acquire knowledge about inter-speaker variability, we characterised our speakers in terms of the articulatory strategies of various vocal tract contours: tongue, lips and velum. We observed that each speaker has his or her own strategy to achieve sounds that are considered equivalent, across speakers, for speech communication purposes. By means of principal component analysis (PCA), the variability of the tongue, lips and velum contours was decomposed into a set of principal movements. We noticed that these movements are performed in different proportions depending on the speaker. For instance, for a given displacement of the jaw, the tongue may globally move in a proportion that depends on the speaker. We also noticed that lip protrusion, lip opening, the influence of the jaw movement on the lips, and the velum's articulatory strategy can also vary according to the speaker. For example, some speakers roll up their uvula against the tongue to produce the consonant /ʁ/ in vocalic contexts. These findings also constitute an important contribution to the knowledge of inter-speaker variability in speech production. In order to extract a set of common articulatory patterns that different speakers employ when producing speech sounds (normalisation), we based our approach on linear models built from articulatory data. Multilinear decomposition methods were applied to the contours of the tongue, lips and velum. The evaluation of our models was based on two criteria: the explained variance and the root mean square error (RMSE) between the original and recovered articulatory coordinates. Models were also assessed using a leave-one-out cross-validation procedure.
Lelong, Amelie. "Convergence phonétique en interaction / Phonetic convergence in interaction." Phd thesis, Université de Grenoble, 2012. http://tel.archives-ouvertes.fr/tel-00822871.
Full textTomashenko, Natalia. "Speaker adaptation of deep neural network acoustic models using Gaussian mixture model framework in automatic speech recognition systems." Thesis, Le Mans, 2017. http://www.theses.fr/2017LEMA1040/document.
Differences between training and testing conditions may significantly degrade recognition accuracy in automatic speech recognition (ASR) systems. Adaptation is an efficient way to reduce the mismatch between models and data from a particular speaker or channel. There are two dominant types of acoustic models (AMs) used in ASR: Gaussian mixture models (GMMs) and deep neural networks (DNNs). The GMM hidden Markov model (GMM-HMM) approach has been one of the most common techniques in ASR systems for many decades. Speaker adaptation is very effective for these AMs, and various adaptation techniques have been developed for them. On the other hand, DNN-HMM AMs have recently achieved major advances and outperformed GMM-HMM models on various ASR tasks. However, speaker adaptation is still very challenging for these AMs: many adaptation algorithms that work well for GMM systems cannot be easily applied to DNNs because of the different nature of these models. The main purpose of this thesis is to develop a method for efficient transfer of adaptation algorithms from the GMM framework to DNN models. A novel approach for speaker adaptation of DNN AMs is proposed and investigated. The idea of this approach is based on using so-called GMM-derived features as input to a DNN. The proposed technique provides a general framework for transferring adaptation algorithms developed for GMMs to DNN adaptation. It is explored for various state-of-the-art ASR systems and is shown to be effective in comparison with other speaker adaptation techniques and complementary to them.
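A minimal sketch of what GMM-derived features can look like: per-component scores of each acoustic frame under a GMM, stacked over time as DNN input, so that any GMM-space adaptation (e.g. of the means) directly changes the features the DNN sees. The toy diagonal-covariance GMM below is illustrative only; the thesis's exact feature definition may differ.

```python
import numpy as np

def gmm_log_likes(x, means, variances, weights):
    """Per-component log-likelihoods of one frame x under a
    diagonal-covariance GMM."""
    d = x.shape[0]
    quad = ((x - means) ** 2 / variances).sum(axis=1)
    log_det = np.log(variances).sum(axis=1)
    return np.log(weights) - 0.5 * (d * np.log(2 * np.pi) + log_det + quad)

def gmm_posteriors(x, means, variances, weights):
    """Component posteriors of frame x, a common GMM-derived feature vector."""
    ll = gmm_log_likes(x, means, variances, weights)
    ll -= ll.max()                      # numerical stability before exp
    p = np.exp(ll)
    return p / p.sum()

# Two-component toy GMM; the frame lies near the first component's mean
means = np.array([[0.0, 0.0], [5.0, 5.0]])
variances = np.ones((2, 2))
weights = np.array([0.5, 0.5])
post = gmm_posteriors(np.array([0.1, -0.1]), means, variances, weights)
```

Adapting the GMM (for instance with MLLR) and recomputing these vectors is what lets GMM-era adaptation algorithms act on a DNN through its input features.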
Ben, Youssef Atef. "Contrôle de têtes parlantes par inversion acoustico-articulatoire pour l’apprentissage et la réhabilitation du langage." Thesis, Grenoble, 2011. http://www.theses.fr/2011GRENT088/document.
Speech sounds may be complemented by displaying speech articulator shapes on a computer screen, hence producing augmented speech, a signal that is potentially useful in all instances where the sound itself might be difficult to understand, for physical or perceptual reasons. In this thesis, we introduce a system called visual articulatory feedback, in which the visible and hidden articulators of a talking head are controlled from the speaker's speech sound. The motivation of this research was to develop such a system that could be applied to computer-aided pronunciation training (CAPT) for learning foreign languages, or in the domain of speech therapy. We based our approach to this mapping problem on statistical models built from acoustic and articulatory data. In this thesis we developed and evaluated two statistical learning methods trained on parallel synchronous acoustic and articulatory data recorded from a French speaker by means of an electromagnetic articulograph. Our hidden Markov model (HMM) approach combines HMM-based acoustic recognition and HMM-based articulatory synthesis techniques to estimate the articulatory trajectories from the acoustic signal. Gaussian mixture models (GMMs) estimate articulatory features directly from the acoustic ones. We based our evaluation of the improvements brought to these models on several criteria: the root mean square error between the original and recovered EMA coordinates, the Pearson product-moment correlation coefficient, displays of the articulatory spaces and articulatory trajectories, as well as some acoustic or articulatory recognition rates. Experiments indicate that the use of state tying and multiple Gaussians per state in the acoustic HMM improves the recognition stage, and that updating the articulatory HMM parameters with the minimum generation error (MGE) criterion results in a more accurate inversion than conventional maximum likelihood estimation (MLE) training.
In addition, the GMM mapping using the MLE criterion is more efficient than using the minimum mean square error (MMSE) criterion. In conclusion, we found that the HMM inversion system has greater accuracy than the GMM one. Besides, experiments using the same statistical methods and data have shown that the face-to-tongue inversion problem, i.e. predicting tongue shapes from face and lip shapes, cannot be solved in a general way, and that it is impossible for some phonetic classes. In order to extend our single-speaker system to a multi-speaker speech inversion system, we implemented a speaker adaptation method based on maximum likelihood linear regression (MLLR). In MLLR, a linear regression-based transform that adapts the original acoustic HMMs to those of the new speaker is calculated so as to maximise the likelihood of the adaptation data. This speaker adaptation stage has been evaluated using an articulatory phonetic recognition system, as no original articulatory data are available for the new speakers. Finally, using this adaptation procedure, we developed a complete articulatory feedback demonstrator, which can work for any speaker. This system should be assessed by perceptual tests in realistic conditions.
Ben, youssef Atef. "Contrôle de têtes parlantes par inversion acoustico-articulatoire pour l'apprentissage et la réhabilitation du langage." Phd thesis, Université de Grenoble, 2011. http://tel.archives-ouvertes.fr/tel-00721957.
Full textLe, Lan Gaël. "Analyse en locuteurs de collections de documents multimédia." Thesis, Le Mans, 2017. http://www.theses.fr/2017LEMA1020/document.
The task of speaker diarization and linking aims at answering the question "who speaks and when?" in a collection of multimedia recordings. It is an essential step for indexing audiovisual contents. Speaker diarization and linking first consists in segmenting each recording in terms of speakers, before linking them across the collection. The aim is to identify each speaker with a unique anonymous label, even for speakers appearing in multiple recordings, without any knowledge of their identity or number. The challenge of cross-recording linking is the modeling of the within-speaker/across-recording variability: depending on the recording, the same speaker can appear in multiple acoustic conditions (in a studio, in the street...). The thesis proposes two methods to overcome this issue. First, a novel neural variability compensation method is proposed, trained with the triplet-loss paradigm. Second, an iterative unsupervised domain adaptation process is presented, in which the system exploits the information (even inaccurate) about the data it processes to enhance its performance on the target acoustic domain. Moreover, novel ways of analyzing the results in terms of speakers are explored, to understand the actual performance of a diarization and linking system beyond the well-known diarization error rate (DER). Systems and methods are evaluated on two TV shows of about 40 episodes, using either a global or a longitudinal linking architecture, and state-of-the-art speaker modeling (i-vectors).
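The triplet-loss paradigm used above for variability compensation can be sketched as follows, on toy speaker embeddings. In the actual system a neural network is trained with this loss; here only the loss itself is shown, with illustrative values:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss on speaker embeddings: same-speaker pairs (anchor,
    positive) are pulled together, different-speaker pairs (anchor,
    negative) are pushed at least `margin` further apart."""
    d_ap = ((anchor - positive) ** 2).sum(axis=1)   # squared same-speaker distance
    d_an = ((anchor - negative) ** 2).sum(axis=1)   # squared cross-speaker distance
    return np.maximum(0.0, d_ap - d_an + margin).mean()

# Toy 2-D embeddings (one triplet per row)
a = np.array([[0.0, 0.0]])
p = np.array([[0.1, 0.0]])
far = np.array([[3.0, 0.0]])
near = np.array([[0.2, 0.0]])
loss_easy = triplet_loss(a, p, far)    # negative already far: no gradient
loss_hard = triplet_loss(a, p, near)   # negative too close: positive loss
```

Minimizing this loss over many triplets drawn across recordings is what pushes embeddings of the same speaker together even when the acoustic conditions differ.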
Borges, Liselene de Abreu. "Sistemas de adaptação ao locutor utilizando autovozes." Universidade de São Paulo, 2001. http://www.teses.usp.br/teses/disponiveis/3/3142/tde-05052003-104044/.
This work describes two speaker adaptation techniques using a small amount of adaptation data for a speech recognition system: maximum likelihood linear regression (MLLR) and eigenvoices. Both re-estimate the means of a continuous-density hidden Markov model system. The MLLR technique estimates a set of linear transformations for the mean parameters of a Gaussian system. The eigenvoice technique is based on prior knowledge about speaker variation; this prior knowledge, retained in the eigenvoices, is obtained by applying principal component analysis (PCA). We ran adaptation tests on an isolated-word, restricted-vocabulary recognition system. When a large amount of adaptation data is available (up to 70% of the vocabulary), the eigenvoice technique does not compare favourably with MLLR. However, when only a small amount of adaptation data is available (less than 15% of the vocabulary), the eigenvoice technique obtains better results than MLLR.
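A minimal sketch of the eigenvoice idea: the new speaker's mean supervector is constrained to the low-dimensional subspace spanned by a few eigenvoices (obtained by PCA over training speakers), with the subspace weights fitted from adaptation data. All dimensions below are toy values, and the weight estimation is done by plain least squares rather than the maximum-likelihood estimation used in practice:

```python
import numpy as np

rng = np.random.default_rng(1)

def eigenvoice_adapt(eigenvoices, mean_sv, adapt_sv):
    """Constrain a new speaker's supervector to the eigenvoice subspace:
    mean_sv + E^T w, with weights w fitted by least squares to the
    (noisy, data-derived) adaptation supervector."""
    w, *_ = np.linalg.lstsq(eigenvoices.T, adapt_sv - mean_sv, rcond=None)
    return mean_sv + eigenvoices.T @ w

# Toy setup: 3 eigenvoices in a 30-dimensional supervector space
E = rng.normal(size=(3, 30))
mean_sv = rng.normal(size=30)
true_sv = mean_sv + E.T @ np.array([0.5, -1.0, 2.0])   # a speaker in the subspace
noisy_obs = true_sv + 0.001 * rng.normal(size=30)       # stand-in adaptation stats
est_sv = eigenvoice_adapt(E, mean_sv, noisy_obs)
```

Because only a handful of weights must be estimated instead of every Gaussian mean, the method remains usable with very little adaptation data, which matches the small-data regime where the abstract reports eigenvoices beating MLLR.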
Ben, Youssef Atef. "Contrôle de têtes parlantes par inversion acoustico-articulatoire pour l'apprentissage et la réhabilitation du langage." Phd thesis, Université de Grenoble, 2011. http://tel.archives-ouvertes.fr/tel-00699008.
Full textDias, Raquel de Souza Ferreira. "Normalização de locutor em sistema de reconhecimento de fala." [s.n.], 2000. http://repositorio.unicamp.br/jspui/handle/REPOSIP/261949.
Full textDissertação (mestrado) - Universidade Estadual de Campinas, Faculdade de Engenharia Eletrica e de Computação
Ottens, Kévin. "Un système multi-agent adaptatif pour la construction d'ontologies à partir de textes." Phd thesis, Université Paul Sabatier - Toulouse III, 2007. http://tel.archives-ouvertes.fr/tel-00176883.
Because an ontology must be maintained, and because it can be seen as a complex system made up of concepts, we propose using adaptive multi-agent systems to semi-automate the process of building ontologies from text. The stable state of these systems results from the cooperative interactions between the software agents that constitute them. In our case, the agents use distributed statistical analysis algorithms to find the most satisfactory structure according to a syntactic and distributional analysis of the texts. The user can then validate, criticise or modify parts of this agent structure, which is the basis of the emerging ontology, to make it conform to his objectives and to his vision of the modelled domain. In return, the agents reorganise themselves to satisfy the new constraints. Ontologies, which are usually fixed, here become dynamic; their design becomes "living". These are the principles underlying our system, named Dynamo.
The relevance of this approach was put to the test through experiments designed to evaluate the algorithmic complexity of our system, and through its use in real conditions. In this dissertation, we present and analyse the results obtained.
Sidi-Hida, Mouna. "L'adaptation cinématographique d'oeuvres littéraires françaises et l'enseignement du français au secondaire au Maroc : constats, enjeux et propositions." Thesis, Grenoble, 2012. http://www.theses.fr/2012GRENL031.
Exploring film adaptations of literary works in French class opens up interesting didactic perspectives for teaching French. In the case of Morocco, this experience enabled the implementation of a set of didactic sequences for secondary classes (middle and high school). Students learned to re-evaluate the expressive scope of each art. The introductory work on film, approached through film adaptation, demonstrated to students the complexity of this particular artistic language. The film thus acquired the status of an object of knowledge among the components of the French course. Equally, understanding literature through film rehabilitated literary language in the students' eyes: instead of being put off by literature on account of the complexity of its language, students discovered in it the depth of human thought, and sometimes even of their own. In this way, they set out on a long path: learning how to receive word and image.
Detey, Sylvain. "Interphonologie et représentations orthographiques : du rôle de l'écrit dans l'enseignement-apprentissage du français oral chez des étudiants japonais." Phd thesis, Université Toulouse le Mirail - Toulouse II, 2005. http://tel.archives-ouvertes.fr/tel-00458366.
Full textSivasankaran, Sunit. "Séparation de la parole guidée par la localisation." Electronic Thesis or Diss., Université de Lorraine, 2020. http://www.theses.fr/2020LORR0078.
Voice-based personal assistants are part of our daily lives. Their performance suffers in the presence of signal distortions such as noise, reverberation, and competing speakers. This thesis addresses the problem of extracting the signal of interest in such challenging conditions by first localizing the target speaker and then using the location to extract the target speech. In a first stage, a common situation is considered in which the target speaker utters a known word or sentence, such as the wake-up word of a distant-microphone voice command system. A method that exploits this text information in order to improve speaker localization performance in the presence of competing speakers is proposed. The proposed solution uses a speech recognition system to align the wake-up word to the corrupted speech signal. A model spectrum representing the aligned phones is used to compute an identifier, which is then used by a deep neural network to localize the target speaker. Results on simulated data show that the proposed method reduces the localization error rate compared to the classical GCC-PHAT method; similar improvements are observed on real data. Given the estimated location of the target speaker, speech separation is performed in three stages. In the first stage, a simple delay-and-sum (DS) beamformer is used to enhance the signal impinging from that location, which is then used in the second stage to estimate, with a neural network, a time-frequency mask corresponding to the localized speaker. This mask is used to compute the second-order statistics and to derive an adaptive beamformer in the third stage. A multichannel, multispeaker, reverberated, noisy dataset, inspired by the famous WSJ0-2mix dataset, was generated, and the performance of the proposed pipeline was investigated in terms of the word error rate (WER).
To make the system robust to localization errors, a Speaker LOcalization Guided Deflation (SLOGD) approach, which estimates the sources iteratively, is proposed. At each iteration, the location of one speaker is estimated and used to estimate a mask corresponding to that speaker; the estimated source is then removed from the mixture before estimating the location and mask of the next source. The proposed method is shown to outperform Conv-TasNet. Finally, we consider the problem of explaining the robustness of the neural networks used to compute time-frequency masks to mismatched noise conditions. We employ the so-called SHAP method to quantify the contribution of every time-frequency bin in the input signal to the estimated time-frequency mask. We define a metric that summarizes the SHAP values and show that it correlates with the WER achieved on the separated speech. To the best of our knowledge, this is the first study on neural network explainability in the context of speech separation.
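The classical GCC-PHAT baseline mentioned in this thesis estimates the time difference of arrival between two microphones from the phase of their cross-spectrum. A minimal single-microphone-pair sketch, with made-up signals rather than the thesis's data, could look like this:

```python
import numpy as np

def gcc_phat(sig, ref, fs):
    """Delay (in seconds) of `sig` relative to `ref`, via the generalized
    cross-correlation with phase transform (GCC-PHAT)."""
    n = sig.size + ref.size
    R = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    R /= np.abs(R) + 1e-12            # PHAT weighting: keep phase, drop magnitude
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2
    # Reorder so that lag 0 sits in the middle of the correlation array
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs

fs = 16000
rng = np.random.default_rng(2)
ref = rng.normal(size=2048)                     # toy microphone signal
sig = np.concatenate((np.zeros(5), ref[:-5]))   # same signal delayed by 5 samples
tau = gcc_phat(sig, ref, fs)
```

With several microphone pairs, the delays obtained this way constrain the source direction; the thesis's contribution is to improve on this purely signal-based estimate by exploiting the known wake-up-word text.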
Valdes, Julian. "Adaptation de clones orofaciaux à la morphologie et aux stratégies de contrôle de locuteurs cibles pour l'articulation de la parole." Phd thesis, 2013. http://tel.archives-ouvertes.fr/tel-00843693.