A selection of scholarly literature on the topic „Parole audio-visuelle“
Consult the lists of current articles, books, dissertations, reports and other scholarly sources on the topic "Parole audio-visuelle".
Journal articles on the topic "Parole audio-visuelle"
AOUCHICHE AIT-YALA, Ouardia. „Narration numérique et prise de parole : améliorer les compétences orales en FLE à travers la création d’histoires multimodales“. Journal of Languages and Translation 5, no. 1 (16.01.2025): 358–70. https://doi.org/10.70204/jlt.v5i1.454.
Vampé, Anne, and Véronique Aubergé. „Prosodie expressive audio-visuelle de l'interaction personne-machine. Etats mentaux, attitudes, intentions et affects (Feeling of Thinking) en dehors du tour de parole“. Techniques et sciences informatiques 29, no. 7 (20.09.2010): 807–32. http://dx.doi.org/10.3166/tsi.29.807-832.
Chouraqui, Floriane, and Maylis Asté. „La voie audio-visuelle en sciences sociales : donner voix par le film, donner corps à la recherche“. Revue Française des Méthodes Visuelles HS1 (2024). http://dx.doi.org/10.4000/12mq3.
Dissertations on the topic "Parole audio-visuelle"
Musti, Utpala. „Synthèse acoustico-visuelle de la parole par sélection d'unités bimodales“. Thesis, Université de Lorraine, 2013. http://www.theses.fr/2013LORR0003.
This work deals with audio-visual speech synthesis. In the vast literature available in this direction, many approaches divide the task into two synthesis problems: acoustic speech synthesis on the one hand, and the generation of the corresponding facial animation on the other. However, this does not guarantee perfectly synchronous and coherent audio-visual speech. To overcome this drawback implicitly, we proposed a different approach to acoustic-visual speech synthesis based on the selection of naturally synchronous bimodal units. The synthesis is based on the classical unit-selection paradigm. The main idea behind this technique is to keep the natural association between the acoustic and visual modalities intact. We describe the audio-visual corpus acquisition technique and the database preparation for our system. We present an overview of our system and detail the various aspects of bimodal unit selection that need to be optimized for good synthesis. The main focus of this work is to synthesize the speech dynamics well, rather than a comprehensive talking head. We describe the visual target features that we designed. We subsequently present an algorithm for target feature weighting; this algorithm performs target feature weighting and redundant feature elimination iteratively, based on the comparison of a target-cost-based ranking with a distance calculated from the acoustic and visual speech signals of the units in the corpus. Finally, we present the perceptual and subjective evaluation of the final synthesis system. The results show that we have achieved the goal of synthesizing the speech dynamics reasonably well.
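The bimodal unit-selection idea summarized in this abstract can be pictured with a short sketch. The Python fragment below is not the thesis code: it is a hypothetical illustration of a weighted audio-visual target cost computed over whole recorded units, with invented feature vectors, weights and function names (the thesis derives its feature weights iteratively from ranking comparisons, which is not reproduced here).

from dataclasses import dataclass
from typing import List, Sequence, Tuple

@dataclass
class BimodalUnit:
    phone: str
    acoustic: List[float]  # acoustic features of the recorded unit
    visual: List[float]    # visual (lip) features of the same recording, naturally in sync

def target_cost(unit: BimodalUnit, target_acoustic: Sequence[float],
                target_visual: Sequence[float], weights: Tuple[float, float]) -> float:
    # Weighted squared distance between a candidate unit and the target specification.
    wa, wv = weights
    da = sum((u - t) ** 2 for u, t in zip(unit.acoustic, target_acoustic))
    dv = sum((u - t) ** 2 for u, t in zip(unit.visual, target_visual))
    return wa * da + wv * dv

def select_unit(candidates, target_acoustic, target_visual, weights=(0.5, 0.5)):
    # Pick the candidate with the lowest joint audio-visual target cost; because each
    # candidate is a single recorded bimodal unit, audio and video remain synchronous.
    return min(candidates, key=lambda u: target_cost(u, target_acoustic, target_visual, weights))

# Hypothetical usage with invented two-dimensional features:
units = [BimodalUnit("a", [0.2, 0.1], [0.3, 0.4]), BimodalUnit("a", [0.5, 0.6], [0.1, 0.2])]
print(select_unit(units, target_acoustic=[0.4, 0.5], target_visual=[0.2, 0.2]))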
Huyse, Aurélie. „Intégration audio-visuelle de la parole: le poids de la vision varie-t-il en fonction de l'âge et du développement langagier?“ Doctoral thesis, Université Libre de Bruxelles, 2012. http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/209690.
During face-to-face conversation, perception of auditory speech is influenced by the visual speech cues contained in lip movements. Indeed, previous research has highlighted the ability of lip-reading to enhance and even modify speech perception. This phenomenon is known as audio-visual integration. The aim of this doctoral thesis is to study the possibility of modifying this audio-visual integration according to several variables. This work falls within the scope of an important debate between invariant and subject-dependent audio-visual integration in speech processing. Each study of this dissertation investigates the impact of a specific variable on bimodal integration: the quality of the visual input, the age of the participants, the use of a cochlear implant, the age at cochlear implantation and the presence of specific language impairments.
The paradigm always consisted of a syllable identification task, with syllables presented in three modalities: auditory only, visual only and audio-visual (congruent and incongruent). There was also a condition in which the quality of the visual input was reduced, in order to prevent high-quality lip-reading. The aim of each of the five studies was not only to examine whether performance varied according to the variable under study, but also to ascertain that any differences indeed arose from the integration process itself. To that end, our results were analyzed in the framework of a model predictive of audio-visual speech performance, the weighted fuzzy-logical model of perception, in order to disentangle unisensory effects from audio-visual integration effects.
Taken together, our data suggest that speech integration is not automatic but rather depends on the context. We propose a new architecture of bimodal fusion that takes these considerations into account. Finally, there are also practical implications, suggesting the need to incorporate not only auditory but also visual exercises in the rehabilitation programs of older adults and of children with cochlear implants or specific language impairments.
Doctorate in Psychological Sciences and Education
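For readers unfamiliar with the weighted fuzzy-logical model of perception mentioned in the abstract above, the short Python sketch below shows how such a model can turn unimodal identification scores into predicted audio-visual probabilities. It is not the authors' implementation: the exponent-based weighting, the parameter w and the example scores are assumptions made for illustration.

def wflmp_predict(audio_support, visual_support, w=0.5):
    # Weighted fuzzy-logical-model-style prediction (illustrative sketch only).
    # audio_support / visual_support map each response (e.g. a syllable) to its
    # unimodal support in [0, 1]; w is an assumed relative weight of audition.
    joint = {r: (audio_support[r] ** w) * (visual_support[r] ** (1.0 - w))
             for r in audio_support}
    total = sum(joint.values())
    return {r: s / total for r, s in joint.items()}

# Hypothetical unimodal scores for an incongruent (McGurk-type) trial:
audio = {"ba": 0.80, "da": 0.15, "ga": 0.05}
visual = {"ba": 0.10, "da": 0.30, "ga": 0.60}
print(wflmp_predict(audio, visual, w=0.6))

Comparing such model predictions with observed audio-visual scores is what allows unisensory differences to be separated from differences in the integration process itself.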
Musti, Utpala. „Synthèse Acoustico-Visuelle de la Parole par Séléction d'Unités Bimodales“. Phd thesis, Université de Lorraine, 2013. http://tel.archives-ouvertes.fr/tel-00927121.
Abel, Louis. „Co-speech gesture synthesis : Towards a controllable and interpretable model using a graph deterministic approach“. Electronic Thesis or Diss., Université de Lorraine, 2025. http://www.theses.fr/2025LORR0020.
Human communication is a multimodal process combining verbal and non-verbal dimensions, designed to foster mutual understanding. Gestures, in particular, enrich speech by clarifying meanings, expressing emotions and conveying abstract ideas. However, although advances in speech synthesis have made it possible to produce artificial voices close to human speech, existing systems often neglect visual cues, limiting the effectiveness and immersion of human-machine interactions. To fill this gap, this research developed STARGATE, an innovative model designed to integrate appropriate co-speech gestures into speech synthesis systems. The aim was to overcome the challenges of efficiency, interpretability and training on limited data, while producing gestures synchronized with speech. The STARGATE architecture is based on an autoregressive framework combining CNN networks for audio and text inputs, an ST-GCN network to encode the gesture history, and a biRNN decoder to generate gestures from these latent spaces. A key feature is the addition of an extra speaker embedding input, enabling gesture style customization according to the identity provided. The evaluation confirmed STARGATE's advantages over reference models such as StyleGestures and ZeroEGGS. Objectively, STARGATE achieved a significantly lower FGD, indicating better gesture quality. Subjectively, participants judged the gestures generated to be more humanlike and appropriate, particularly our 9-speaker model, which demonstrated speaker-dependent personalization capability. These results also showed that audio-only models are not sufficient to produce semantically relevant gestures, underlining the importance of integrating textual information. In addition, feedback from linguistic experts highlighted areas for improvement, notably gesture coherence and dynamics. This feedback guided the optimization of the model, reducing the FGD and improving its speed of inference, making it ideal for real-time use. Enhancements have strengthened STARGATE's capabilities, in particular the integration of finger movements, enabling the production of more expressive gestures, such as deictic gestures. The model was also generalized to several speakers, improving the quality of the gestures generated and enabling precise adaptation to each individual's style. This personalization was validated by a further evaluation, designed to analyze the ability of a multi-speaker model to produce different, yet consistent gesture identities. In-depth analysis revealed that STARGATE establishes a correspondence between textual embeddings and generated gestures. This proved the consistency of our model in its generation and understanding of semantic concepts when illustrated in metaphorical gestures. Our analyses of adjacency matrices have also led to the development of a gesture detector to identify gesture boundaries, with promising results. In conclusion, we think that STARGATE represents a major advance in co-verbal gesture synthesis, combining efficiency, control and interpretability into a deterministic model. It enhances human-machine interactions by enriching multimodal communication, while offering promising prospects for embodied conversational agents and assistive technologies.
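As a rough illustration of the kind of architecture described in the abstract above (convolutional audio and text encoders, a gesture-history encoder, a speaker embedding and a bidirectional recurrent decoder), a PyTorch sketch follows. It is not the STARGATE implementation: the layer sizes, the module names and the plain GRU standing in for the ST-GCN gesture-history encoder are all assumptions made for illustration.

import torch
import torch.nn as nn

class CoSpeechGestureSketch(nn.Module):
    # Simplified autoregressive co-speech gesture generator (illustrative only).
    def __init__(self, n_speakers=9, audio_dim=80, text_dim=300, pose_dim=57, hidden=256):
        super().__init__()
        # 1-D convolutional encoders over the audio and text feature sequences
        self.audio_enc = nn.Conv1d(audio_dim, hidden, kernel_size=3, padding=1)
        self.text_enc = nn.Conv1d(text_dim, hidden, kernel_size=3, padding=1)
        # Plain GRU standing in for a graph-based (ST-GCN-like) gesture-history encoder
        self.history_enc = nn.GRU(pose_dim, hidden, batch_first=True)
        # Speaker embedding used to condition the gesture style on an identity
        self.speaker_emb = nn.Embedding(n_speakers, hidden)
        # Bidirectional recurrent decoder producing one pose vector per frame
        self.decoder = nn.GRU(4 * hidden, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, pose_dim)

    def forward(self, audio, text, history, speaker_id):
        # audio: (B, T, audio_dim), text: (B, T, text_dim),
        # history: (B, T_hist, pose_dim), speaker_id: (B,)
        a = self.audio_enc(audio.transpose(1, 2)).transpose(1, 2)
        t = self.text_enc(text.transpose(1, 2)).transpose(1, 2)
        _, h = self.history_enc(history)              # summary of the past gestures
        h = h[-1].unsqueeze(1).expand(-1, a.size(1), -1)
        s = self.speaker_emb(speaker_id).unsqueeze(1).expand(-1, a.size(1), -1)
        y, _ = self.decoder(torch.cat([a, t, h, s], dim=-1))
        return self.out(y)                            # (B, T, pose_dim) pose sequence

model = CoSpeechGestureSketch()
poses = model(torch.randn(2, 50, 80), torch.randn(2, 50, 300),
              torch.randn(2, 20, 57), torch.tensor([0, 3]))
print(poses.shape)  # torch.Size([2, 50, 57])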
Erjavec, Grozdana. „Apport des mouvements buccaux, des mouvements extra-buccaux et du contexte facial à la perception de la parole chez l'enfant et chez l'adulte“. Thesis, Paris 8, 2015. http://www.theses.fr/2015PA080118/document.
The present thesis work falls within the framework of research on audio-visual (AV) speech perception. Its objective is to answer the following questions: (i) What is the nature of visual input processing (holistic vs analytic) in AV speech perception? (ii) What is the implication of extra-oral facial movement in AV speech perception? (iii) What are the oculomotor patterns in AV speech perception? (iv) What are the developmental changes in aspects (i), (ii) and (iii)? The classic noise degradation paradigm was applied in two experiments conducted in the framework of the present thesis. Each experiment was conducted on participants from four age groups (adults, adolescents, pre-adolescents and children), with 16 participants per group. The participants' task was to repeat consonant-vowel (/a/) syllables. The syllables were degraded by pink noise at two levels (mild and strong) and were presented in four audio(-visual) conditions: one purely auditory (AO) and three audio-visual. The AV conditions were the following: (i) AV face (AVF), (ii) AV « mouth extraction » (AVM-E; mouth format without visual contrasts), (iii) AV « mouth window » (AVM-W; mouth format with high visual contrasts) in experiment 1, and (i) AVF, (ii) AVF « mouth active (and facial frame static) » (AVF-MA), (iii) AVF « extra-oral regions active (and mouth absent) » (AVF-EOA) in experiment 2. The data analyzed were (i) the total number of correct repetitions (total performance), (ii) the difference in the correct-repetition score between each AV condition and the AO condition (AV gain), and (iii) the total fixation duration in the oral area and the other facial areas (for the AV formats). The main results showed that the mechanisms involved in AV speech perception reach their maturity before late childhood. The vision of the talker's full face does not seem to be advantageous in this context; it may even disturb AV speech processing in adults, possibly because it triggers the processing of other types of information (identity, facial expressions) which could in turn interfere with the processing of the acoustic aspects of speech. The contribution of extra-oral articulatory movement to AV speech perception was poor and limited to the condition of highly degraded auditory information. For ecologically presented facial information, the oculomotor patterns in AV speech perception varied as a function of the level of auditory degradation, but appeared rather stable across the four groups. Finally, the modalities of featural (mouth) facial information presentation affected the oculomotor behavior patterns in adults, pre-adolescents and children, suggesting a certain sensitivity of visuo-attentional processing to low-level visual stimulus characteristics in AV speech perception. These variations in visuo-attentional processing seemed to be associated, to a certain extent, with variations in AV speech perception.
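The "AV gain" analysed in this work is simply the difference between audio-visual and audio-only repetition accuracy. A trivial Python sketch with hypothetical counts (not the study's data):

def av_gain(av_correct, ao_correct, n_trials):
    # Audio-visual gain: difference in the proportion of correctly repeated
    # syllables between an audio-visual condition and the audio-only baseline.
    return (av_correct - ao_correct) / n_trials

# Hypothetical example: 42/60 correct in the AVF condition vs 30/60 audio-only.
print(av_gain(42, 30, 60))  # 0.2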
Book chapters on the topic "Parole audio-visuelle"
Dupont, Malika, and Brigitte Lejeune. „Lecture Labiale et Perception Audio-Visuelle de la Parole“. In Rééducation De la Boucle Audio-phonatoire, 17–18. Elsevier, 2010. http://dx.doi.org/10.1016/b978-2-294-70754-4.50003-9.