Journal articles on the topic 'Multimodal annotation'

Consult the top 50 journal articles for your research on the topic 'Multimodal annotation.'

1

Kleida, Danae. "Entering a dance performance through multimodal annotation: annotating with scores." International Journal of Performance Arts and Digital Media 17, no. 1 (January 2, 2021): 19–30. http://dx.doi.org/10.1080/14794713.2021.1880182.

2

Chou, Chien-Li, Hua-Tsung Chen, and Suh-Yin Lee. "Multimodal Video-to-Near-Scene Annotation." IEEE Transactions on Multimedia 19, no. 2 (February 2017): 354–66. http://dx.doi.org/10.1109/tmm.2016.2614426.

3

Zhu, Songhao, Xiangxiang Li, and Shuhan Shen. "Multimodal deep network learning‐based image annotation." Electronics Letters 51, no. 12 (June 2015): 905–6. http://dx.doi.org/10.1049/el.2015.0258.

4

Brunner, Marie-Louise, and Stefan Diemer. "Multimodal meaning making: The annotation of nonverbal elements in multimodal corpus transcription." Research in Corpus Linguistics 9, no. 1 (2021): 63–88. http://dx.doi.org/10.32714/ricl.09.01.05.

Abstract:
The article discusses how to integrate annotation for nonverbal elements (NVE) from multimodal raw data as part of a standardized corpus transcription. We argue that it is essential to include multimodal elements when investigating conversational data, and that in order to integrate these elements, a structured approach to complex multimodal data is needed. We discuss how to formulate a structured corpus-suitable standard syntax and taxonomy for nonverbal features such as gesture, facial expressions, and physical stance, and how to integrate it in a corpus. Using corpus examples, the article describes the development of a robust annotation system for spoken language in the corpus of Video-mediated English as a Lingua Franca Conversations (ViMELF 2018) and illustrates how the system can be used for the study of spoken discourse. The system takes into account previous research on multimodality, transcribes salient nonverbal features in a concise manner, and uses a standard syntax. While such an approach introduces a degree of subjectivity through the criteria of salience and conciseness, the system also offers considerable advantages: it is versatile and adaptable, flexible enough to work with a wide range of multimodal data, and it allows both quantitative and qualitative research on the pragmatics of interaction.
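To make the idea of a structured, corpus-suitable tag syntax more concrete, the short Python sketch below parses hypothetical inline NVE tags out of a transcript line. The tag format, attribute names, and example utterance are invented for illustration; they are not the ViMELF annotation scheme itself.

    import re

    # Hypothetical inline tag syntax for nonverbal elements (NVE), e.g.
    #   "okay <nve type=gesture form=nod scope=okay> so let's start"
    # The ViMELF conventions differ; this only illustrates the parsing idea.
    NVE_TAG = re.compile(
        r"<nve\s+type=(?P<type>\w+)\s+form=(?P<form>[\w-]+)"
        r"(?:\s+scope=(?P<scope>[^>]+))?>")

    def extract_nve(utterance):
        """Return the plain text and a list of structured NVE annotations."""
        annotations = [m.groupdict() for m in NVE_TAG.finditer(utterance)]
        plain_text = " ".join(NVE_TAG.sub("", utterance).split())
        return plain_text, annotations

    line = "okay <nve type=gesture form=nod scope=okay> so let's start"
    text, nves = extract_nve(line)
    print(text)   # okay so let's start
    print(nves)   # [{'type': 'gesture', 'form': 'nod', 'scope': 'okay'}]
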
5

Völkel, Thorsten. "Personalized and adaptive navigation based on multimodal annotation." ACM SIGACCESS Accessibility and Computing, no. 86 (September 2006): 4–7. http://dx.doi.org/10.1145/1196148.1196149.

6

Podlasov, Alexey, Sabine Tan, and Kay O'Halloran. "Interactive state-transition diagrams for visualization of multimodal annotation." Intelligent Data Analysis 16, no. 4 (July 11, 2012): 683–702. http://dx.doi.org/10.3233/ida-2012-0544.

7

Tian, Feng, Quge Wang, Xin Li, and Ning Sun. "Heterogeneous multimedia cooperative annotation based on multimodal correlation learning." Journal of Visual Communication and Image Representation 58 (January 2019): 544–53. http://dx.doi.org/10.1016/j.jvcir.2018.12.028.

8

Debras, Camille. "How to prepare the video component of the Diachronic Corpus of Political Speeches for multimodal analysis." Research in Corpus Linguistics 9, no. 1 (2021): 132–51. http://dx.doi.org/10.32714/ricl.09.01.08.

Abstract:
The Diachronic Corpus of Political Speeches (DCPS) is a collection of 1,500 full-length political speeches in English. It includes speeches delivered in countries where English is an official language (the US, Britain, Canada, Ireland) by English-speaking politicians in various settings from 1800 up to the present time. Enriched with semi-automatic morphosyntactic annotations and with discourse-pragmatic manual annotations, the DCPS is designed to achieve maximum representativeness and balance for political English speeches from major national English varieties in time, preserve detailed metadata, and enable corpus-based studies of syntactic, semantic and discourse-pragmatic variation and change on political corpora. For speeches given from 1950 onwards, video-recordings of the original delivery are often retrievable online. This opens up avenues of research in multimodal linguistics, in which studies on the integration of speech and gesture in the construction of meaning can include analyses of recurrent gestures and of multimodal constructions. This article discusses the issues at stake in preparing the video-recorded component of the DCPS for linguistic multimodal analysis, namely the exploitability of recordings, the segmentation and alignment of transcriptions, the annotation of gesture forms and functions in the software ELAN and the quantity of available gesture data.
9

Martin, J. C., G. Caridakis, L. Devillers, K. Karpouzis, and S. Abrilian. "Manual annotation and automatic image processing of multimodal emotional behaviors: validating the annotation of TV interviews." Personal and Ubiquitous Computing 13, no. 1 (May 3, 2007): 69–76. http://dx.doi.org/10.1007/s00779-007-0167-y.

10

Carletta, Jean, Stefan Evert, Ulrich Heid, Jonathan Kilgour, Judy Robertson, and Holger Voormann. "The NITE XML Toolkit: Flexible annotation for multimodal language data." Behavior Research Methods, Instruments, & Computers 35, no. 3 (August 2003): 353–63. http://dx.doi.org/10.3758/bf03195511.

11

O’Halloran, Kay, Sabine Tan, Bradley Smith, and Alexey Podlasov. "Challenges in designing digital interfaces for the study of multimodal phenomena." Information Design Journal 18, no. 1 (June 9, 2010): 2–21. http://dx.doi.org/10.1075/idj.18.1.02hal.

Abstract:
The paper discusses the challenges faced by researchers in developing effective digital interfaces for analyzing the meaning-making processes of multimodal phenomena. The authors propose a social semiotic approach as the underlying theoretical foundation, because interactive digital technology is the embodiment of multimodal social semiotic communication. The paper outlines the complex issues with which researchers are confronted in designing digital interface frameworks for modeling, analyzing, and retrieving meaning from multimodal data, giving due consideration to the multiplicity of theoretical frameworks and theories which have been developed for the study of multimodal text within social semiotics, and their impact on the development of a computer-based tool for the exploration, annotation, and analysis of multimodal data.
12

Hardison, Debra M. "Visualizing the acoustic and gestural beats of emphasis in multimodal discourse." Journal of Second Language Pronunciation 4, no. 2 (December 31, 2018): 232–59. http://dx.doi.org/10.1075/jslp.17006.har.

Abstract:
Perceivers’ attention is entrained to the rhythm of a speaker’s gestural and acoustic beats. When different rhythms (polyrhythms) occur across the visual and auditory modalities of speech simultaneously, attention may be heightened, enhancing memorability of the sequence. In this three-stage study, Stage 1 analyzed videorecordings of native English-speaking instructors, focusing on frame-by-frame analysis of time-aligned annotations from Praat and Anvil (video annotation tool) of polyrhythmic sequences. Stage 2 explored the perceivers’ perspective on the sequences’ discourse role. Stage 3 analyzed 10 international teaching assistants’ gestures, and implemented a multistep technology-assisted program to enhance verbal and nonverbal communication skills. Findings demonstrated (a) a dynamic temporal gesture-speech relationship involving perturbations of beat intervals surrounding pitch-accented vowels, (b) the sequences’ important role as highlighters of information, and (c) improvement of ITA confidence, teaching effectiveness, and ability to communicate important points. Findings support the joint production of gesture and prosodically prominent features.
13

Saneiro, Mar, Olga C. Santos, Sergio Salmeron-Majadas, and Jesus G. Boticario. "Towards Emotion Detection in Educational Scenarios from Facial Expressions and Body Movements through Multimodal Approaches." Scientific World Journal 2014 (2014): 1–14. http://dx.doi.org/10.1155/2014/484873.

Abstract:
We report current findings when considering video recordings of facial expressions and body movements to provide affective personalized support in an educational context from an enriched multimodal emotion detection approach. In particular, we describe an annotation methodology to tag facial expression and body movements that conform to changes in the affective states of learners while dealing with cognitive tasks in a learning process. The ultimate goal is to combine these annotations with additional affective information collected during experimental learning sessions from different sources such as qualitative, self-reported, physiological, and behavioral information. These data altogether are to train data mining algorithms that serve to automatically identify changes in the learners’ affective states when dealing with cognitive tasks which help to provide emotional personalized support.
14

Diete, Alexander, Timo Sztyler, and Heiner Stuckenschmidt. "Exploring Semi-Supervised Methods for Labeling Support in Multimodal Datasets." Sensors 18, no. 8 (August 11, 2018): 2639. http://dx.doi.org/10.3390/s18082639.

Abstract:
Working with multimodal datasets is a challenging task as it requires annotations which often are time consuming and difficult to acquire. This includes in particular video recordings which often need to be watched as a whole before they can be labeled. Additionally, other modalities like acceleration data are often recorded alongside a video. For that purpose, we created an annotation tool that enables to annotate datasets of video and inertial sensor data. In contrast to most existing approaches, we focus on semi-supervised labeling support to infer labels for the whole dataset. This means, after labeling a small set of instances our system is able to provide labeling recommendations. We aim to rely on the acceleration data of a wrist-worn sensor to support the labeling of a video recording. For that purpose, we apply template matching to identify time intervals of certain activities. We test our approach on three datasets, one containing warehouse picking activities, one consisting of activities of daily living and one about meal preparations. Our results show that the presented method is able to give hints to annotators about possible label candidates.
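As a rough illustration of the template-matching step described in this abstract, the following Python sketch slides a labeled acceleration snippet over an unlabeled signal and proposes intervals where the normalized correlation is high. It is not the authors' implementation; the window handling, threshold, and synthetic data are assumptions made for the example.

    import numpy as np

    def propose_label_candidates(signal, template, threshold=0.8):
        """Slide a labeled template over a 1-D acceleration signal and return
        (start, end, score) triples whose normalized cross-correlation exceeds
        the threshold. Purely illustrative; a real multimodal pipeline would
        work on 3-axis data and merge overlapping proposals."""
        t = (template - template.mean()) / (template.std() + 1e-9)
        n = len(template)
        candidates = []
        for start in range(len(signal) - n + 1):
            w = signal[start:start + n]
            w = (w - w.mean()) / (w.std() + 1e-9)
            score = float(np.dot(w, t) / n)
            if score >= threshold:
                candidates.append((start, start + n, score))
        return candidates

    # Example with synthetic data: a sine-like "activity" embedded in noise.
    rng = np.random.default_rng(0)
    template = np.sin(np.linspace(0, 4 * np.pi, 100))
    signal = rng.normal(0, 0.3, 1000)
    signal[400:500] += template
    print(propose_label_candidates(signal, template)[:3])
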
15

Relyea, Robert, Darshan Bhanushali, Karan Manghi, Abhishek Vashist, Clark Hochgraf, Amlan Ganguly, Andres Kwasinski, Michael E. Kuhl, and Raymond Ptucha. "Improving Multimodal Localization Through Self-Supervision." Electronic Imaging 2020, no. 6 (January 26, 2020): 14–1. http://dx.doi.org/10.2352/issn.2470-1173.2020.6.iriacv-014.

Abstract:
Modern warehouses utilize fleets of robots for inventory management. To ensure efficient and safe operation, real-time localization of each agent is essential. Most robots follow metal tracks buried in the floor and use a grid of precisely mounted RFID tags for localization. As robotic agents in warehouses and manufacturing plants become ubiquitous, it would be advantageous to eliminate the need for these metal wires and RFID tags. Not only do they suffer from significant installation costs, the removal of wires would allow agents to travel to any area inside the building. Sensors including cameras and LiDAR have provided meaningful localization information for many different positioning system implementations. Fusing localization features from multiple sensor sources is a challenging task especially when the target localization task’s dataset is small. We propose a deep-learning based localization system which fuses features from an omnidirectional camera image and a 3D LiDAR point cloud to create a robust robot positioning model. Although the usage of vision and LiDAR eliminate the need for the precisely installed RFID tags, they do require the collection and annotation of ground truth training data. Deep neural networks thrive on lots of supervised data, and the collection of this data can be time consuming. Using a dataset collected in a warehouse environment, we evaluate the performance of two individual sensor models for localization accuracy. To minimize the need for extensive ground truth data collection, we introduce a self-supervised pretraining regimen to populate the image feature extraction network with meaningful weights before training on the target localization task with limited data. In this research, we demonstrate how our self-supervision improves accuracy and convergence of localization models without the need for additional sample annotation.
16

Lazaridis, Michalis, Apostolos Axenopoulos, Dimitrios Rafailidis, and Petros Daras. "Multimedia search and retrieval using multimodal annotation propagation and indexing techniques." Signal Processing: Image Communication 28, no. 4 (April 2013): 351–67. http://dx.doi.org/10.1016/j.image.2012.04.001.

17

Cantalini, Giorgina, and Massimo Moneglia. "Annotation of gesture and gesture/prosody synchronization in multimodal speech corpora." Journal of Speech Sciences 9 (September 9, 2020): 7–30. http://dx.doi.org/10.20396/joss.v9i00.14956.

Abstract:
This paper was written with the aim of highlighting the functional and structural correlations between gesticulation and prosody, focusing on gesture / prosody synchronization in spontaneous spoken Italian. The gesture annotation used follows the LASG model (Bressem et al. 2013), while the prosodic annotation focuses on the identification of terminal and non-terminal prosodic breaks which, according to L-AcT (Cresti, 2000; Moneglia & Raso 2014), determine speech act boundaries and the information structure, respectively. Gesticulation co-occurs with speech in about 90% of the speech flow examined and gestural arcs are synchronous with prosodic boundaries. Gesture Phrases, which contain the expressive phase (Stroke) never cross terminal prosodic boundaries, finding in the utterance the maximum unit for gesture / speech correlation. Strokes may correlate with all information unit types, however only infrequently with Dialogic Units (i.e. those functional to the management of the communication). The identification of linguistic units via the marking of prosodic boundaries allows us to understand the linguistic scope of the gesture, supporting its interpretation. Gestures may be linked at different linguistic levels, namely those of: a) the word level; b) the information unit phrase; c) the information unit function; d) the illocutionary value.
18

Ladilova, Anna. "Multimodal Metaphors of Interculturereality / Metaforas multimodais da interculturealidade." REVISTA DE ESTUDOS DA LINGUAGEM 28, no. 2 (May 5, 2020): 917. http://dx.doi.org/10.17851/2237-2083.28.2.917-955.

Abstract:
The present paper looks at the interactive construction of multimodal metaphors of interculturereality – a term coined by the author from interculturality and intercorporeality, assuming that intercultural interaction is always an embodied phenomenon, shared among its participants. For this, two videotaped sequences of a group conversation are analyzed drawing upon interaction analysis (Couper-Kuhlen; Selting, 2018). The data was transcribed following the GAT2 (Selting et al., 2011) guidelines, including gesture form annotation, which relied on the system described by Bressem (2013). Gesture function was interpreted drawing on the interactional context and on the system proposed by Kendon (2004) and Bressem and Müller (2013). The results question the validity of the classical conduit metaphor of communication (Reddy, 1979) in the intercultural context and instead propose an embodied approach to the conceptualization of the understanding process among the participants. The analysis also shows that even though the metaphors are multimodal, the metaphoric content is not always evenly distributed among the different modalities (speech, gesture). Apart from that, the metaphorical content is constructed sequentially, referring to preceding metaphors used by the same or different interlocutors and associated with metaphorical blends. Keywords: metaphors; multimodality; interculturality; intercorporeality; migration.
19

Nguyen, Nhu Van, Alain Boucher, and Jean-Marc Ogier. "Keyword Visual Representation for Image Retrieval and Image Annotation." International Journal of Pattern Recognition and Artificial Intelligence 29, no. 06 (August 12, 2015): 1555010. http://dx.doi.org/10.1142/s0218001415550101.

Abstract:
Keyword-based image retrieval is more comfortable for users than content-based image retrieval. Because of the lack of semantic description of images, image annotation is often used a priori by learning the association between the semantic concepts (keywords) and the images (or image regions). This association issue is particularly difficult but interesting because it can be used for annotating images but also for multimodal image retrieval. However, most of the association models are unidirectional, from image to keywords. In addition to that, existing models rely on a fixed image database and prior knowledge. In this paper, we propose an original association model, which provides image-keyword bidirectional transformation. Based on the state-of-the-art Bag of Words model dealing with image representation, including a strategy of interactive incremental learning, our model works well with a zero-or-weak-knowledge image database and evolving from it. Some objective quantitative and qualitative evaluations of the model are proposed, in order to highlight the relevance of the method.
20

Landolsi, Mohamed Yassine, Hela Haj Mohamed, and Lotfi Ben Romdhane. "Image annotation in social networks using graph and multimodal deep learning features." Multimedia Tools and Applications 80, no. 8 (January 8, 2021): 12009–34. http://dx.doi.org/10.1007/s11042-020-09730-8.

21

Chang, E., Kingshy Goh, G. Sychay, and Gang Wu. "CBSA: content-based soft annotation for multimodal image retrieval using Bayes point machines." IEEE Transactions on Circuits and Systems for Video Technology 13, no. 1 (January 2003): 26–38. http://dx.doi.org/10.1109/tcsvt.2002.808079.

22

Diemer, Stefan, Marie-Louise Brunner, and Selina Schmidt. "Compiling computer-mediated spoken language corpora." Compilation, transcription, markup and annotation of spoken corpora 21, no. 3 (September 19, 2016): 348–71. http://dx.doi.org/10.1075/ijcl.21.3.03die.

Abstract:
This paper discusses key issues in the compilation of spoken language corpora in a computer-mediated communication (CMC) environment, using data from the Corpus of Academic Spoken English (CASE), a corpus of Skype conversations currently being compiled at Saarland University, Germany, in cooperation with European and US partners. Based on first findings, Skype is presented as a suitable tool for collecting informal spoken data. In addition, new recommendations concerning data compilation and transcription are put forward to supplement existing best practice as presented in Wynne (2005). We recommend the preservation of multimodal features during anonymisation, and the addition of annotation elements already at the transcription stage, particularly CMC-related discourse features, English as a Lingua Franca (ELF) features (e.g. non-standard language and code-switching), as well as the inclusion of prosodic, paralinguistic, and non-verbal annotation. Additionally, we propose a layered corpus design in order to allow researchers to focus on specific annotation features.
23

Chen, Yu-Hua, and Radovan Bruncak. "Transcribear – Introducing a secure online transcription and annotation tool." Digital Scholarship in the Humanities 35, no. 2 (March 25, 2019): 265–75. http://dx.doi.org/10.1093/llc/fqz016.

Abstract:
Reliable high-quality transcription and/or annotation (a.k.a. ‘coding’) is essential for research in a variety of areas in Humanities and Social Sciences which make use of qualitative data such as interviews, focus groups, classroom observations, or any other audio/video recordings. A good tool can facilitate the work of transcription and annotation because the process is notoriously time-consuming and challenging. However, our survey indicates that few existing tools can accommodate the requirements for transcription and annotation (e.g. audio/video playback, spelling checks, keyboard shortcuts, adding tags of annotation) in one place so that a user does not need to constantly switch between multiple windows, for example, an audio player and a text editor. ‘Transcribear’ (https://transcribear.com) is therefore developed as an easy-to-use online tool which facilitates transcription and annotation on the same interface while this web tool operates offline so that a user’s recordings and transcripts can remain secure and confidential. To minimize human errors, the functionality of tag validation is also added. Originally designed for a multimodal corpus project UNNC CAWSE (https://www.nottingham.edu.cn/en/english/research/cawse/), this browser-based application can be customized for individual users’ needs in terms of the annotation scheme and corresponding shortcut keys. This article will explain how this new tool can make tedious and repetitive manual work faster and easier and at the same time improve the quality of outputs as the process of transcription and annotation tends to be prone to human errors. The limitations of Transcribear and future work will also be discussed.
24

Drăgan, Nicolae Sorin. "Left/Right Polarity in Gestures and Politics." Romanian Journal of Communication and Public Relations 20, no. 3 (December 1, 2018): 53. http://dx.doi.org/10.21018/rjcpr.2018.3.265.

Abstract:
In this article we investigate how political actors involved in TV debates during the 2009 and 2014 presidential elections in Romania manage the relationship between handedness (left/right polarity in hand gestures) and political orientation (left/right polarity in politics). For this purpose we developed a multimodal analysis of some relevant sequences from these debates. The practice of integrating the meanings of different semiotic resources allows a better understanding of the meaning of verbal discourse, actions and behavior of political actors involved in a particular communication situation. In addition, the multimodal professional analysis tool ELAN allows the annotation and dynamic analysis of the semiotic behavior of the political actors involved in the analyzed sequences.
25

MARTIN, JEAN-CLAUDE, RADOSLAW NIEWIADOMSKI, LAURENCE DEVILLERS, STEPHANIE BUISINE, and CATHERINE PELACHAUD. "MULTIMODAL COMPLEX EMOTIONS: GESTURE EXPRESSIVITY AND BLENDED FACIAL EXPRESSIONS." International Journal of Humanoid Robotics 03, no. 03 (September 2006): 269–91. http://dx.doi.org/10.1142/s0219843606000825.

Abstract:
One of the challenges of designing virtual humans is the definition of appropriate models of the relation between realistic emotions and the coordination of behaviors in several modalities. In this paper, we present the annotation, representation and modeling of multimodal visual behaviors occurring during complex emotions. We illustrate our work using a corpus of TV interviews. This corpus has been annotated at several levels of information: communicative acts, emotion labels, and multimodal signs. We have defined a copy-synthesis approach to drive an Embodied Conversational Agent from these different levels of information. The second part of our paper focuses on a model of complex (superposition and masking of) emotions in facial expressions of the agent. We explain how the complementary aspects of our work on corpus and computational model is used to specify complex emotional behaviors.
26

Chapman, Roger J., and Philip J. Smith. "Asynchronous Communications to Support Distributed Work in the National Airspace System." Proceedings of the Human Factors and Ergonomics Society Annual Meeting 46, no. 1 (September 2002): 41–45. http://dx.doi.org/10.1177/154193120204600109.

Abstract:
This research involved the evaluation of a multimodal asynchronous communications tool to support collaborative analysis of post-operations in the National Airspace System (NAS). Collaborating authors have been shown to provide different feedback with asynchronous speech based communications compared to text. Voice synchronized with pointing in asynchronous annotation systems has been found to be more efficient in scheduling tasks, than voice-only, or text only communication. This research investigated how synchronized voice and pointing annotation over asynchronously shared slide shows composed of post operations graphical and tabular data differs in its effect compared to text based annotation, as collections of flights ranked low by standard performance metrics are discussed by FAA (Federal Aviation Administration) and airline representatives. The results showed the combined problem solving and message creation time was shorter when working in the voice and pointing mode than the text based mode, without having an effect on the number and type of ideas generated for improving performance. In both modes the system was also considered useful and usable to both dispatchers and traffic managers.
27

Szekrényes, István. "Annotation and interpretation of prosodic data in the HuComTech corpus for multimodal user interfaces." Journal on Multimodal User Interfaces 8, no. 2 (April 29, 2014): 143–50. http://dx.doi.org/10.1007/s12193-013-0140-1.

28

Millet, Agnès, and Isabelle Estève. "Transcribing and annotating multimodality." Gesture and Multimodal Development 10, no. 2-3 (December 31, 2010): 297–320. http://dx.doi.org/10.1075/gest.10.2-3.09mil.

Abstract:
This paper deals with the central question of transcribing deaf children’s productions. We present the annotation grid we created on Elan®, explaining in detail how and why the observation of the narrative productions of 6 to 12 year-old deaf children led us to modify the annotation schemes previously available. Deaf children resort to every resource available in both modalities: voice and gesture. Thus, these productions are fundamentally multimodal and bilingual. In order to describe these specific practices, we propose considering verbal and non-verbal, vocal and gestural, materials as parts of one integrated production. A linguistic-centered transcription is not efficient in describing such bimodal productions, since describing bimodal utterances implies taking into account the ‘communicative desire’ (‘vouloir-dire’) of the children. For this reason, both the question of the transcription unit and the issue of the complexity of semiotic interactions in bimodal utterances need to be reconsidered.
29

Bolly, Catherine T., and Dominique Boutet. "The multimodal CorpAGEst corpus: keeping an eye on pragmatic competence in later life." Corpora 13, no. 3 (November 2018): 279–317. http://dx.doi.org/10.3366/cor.2018.0151.

Abstract:
The CorpAGEst project aims to study the pragmatic competence of very old people (75 years old and more), by looking at their use of verbal and gestural pragmatic markers in real-world settings (versus laboratory conditions). More precisely, we hypothesise that identifying multimodal pragmatic patterns in language use, as produced by older adults at the gesture–speech interface, helps to better characterise language variation and communication abilities in later life. The underlying assumption is that discourse markers (e.g., tu sais ‘you know’) and pragmatic gestures (e.g., an exaggerated opening of the eyes) are relevant indicators of stance in discourse. This paper's objective is mainly methodological. It aims to demonstrate how the pragmatic profile of older adults can be established by analysing audio and video data. After a brief theoretical introduction, we describe the annotation protocol that has been developed to explore issues in multimodal pragmatics and ageing. Lastly, first results from a corpus-based study are given, showing how multimodal approaches can tackle important aspects of communicative abilities, at the crossroads of language and ageing research in linguistics.
30

Ladewig, Silva, and Lena Hotze. "Zur temporalen Entfaltung und multimodalen Orchestrierung von konzeptuellen Räumen am Beispiel einer Erzählung." Linguistik Online 104, no. 4 (November 15, 2020): 109–36. http://dx.doi.org/10.13092/lo.104.7320.

Abstract:
The study presented in this article investigates the temporal unfolding and multimodal orchestration of meaning in a narration. Two aspects are focused on. First, the temporal and multimodal orchestration of conceptual spaces in the entire narrative is described. Five conceptual spaces were identified which were construed by multiple visual-kinesic modalities and speech. Moreover, the study showed that the conceptual spaces are often created simultaneously, which, however, does not lead to communication problems due to the media properties of the modalities involved (see also Schmitt 2005). The second part of the analysis zoomed in onto the phase of the narrative climax in which the multimodal production of the narrative space with role shift dominated. By applying a timeline-annotation procedure for gestures (Müller/Ladewig 2013) a temporally unfolding salience structure (Müller/Tag 2010) could be reconstructed which highlights certain semantic aspects in the creation and flow of multimodal meaning. Thus, specific information “necessary” to understand the climax of the narration was foregrounded and made prominent for a co-participant. By focusing methodically and theoretically on the temporal structure and the interplay of different modalities, the paper offers a further contribution to the current discussion about temporality, dynamics and multimodality of language (Deppermann/Günthner 2015; Müller 2008b).
31

Partarakis, Nikolaos, Xenophon Zabulis, Antonis Chatziantoniou, Nikolaos Patsiouras, and Ilia Adami. "An Approach to the Creation and Presentation of Reference Gesture Datasets, for the Preservation of Traditional Crafts." Applied Sciences 10, no. 20 (October 19, 2020): 7325. http://dx.doi.org/10.3390/app10207325.

Abstract:
A wide spectrum of digital data are becoming available to researchers and industries interested in the recording, documentation, recognition, and reproduction of human activities. In this work, we propose an approach for understanding and articulating human motion recordings into multimodal datasets and VR demonstrations of actions and activities relevant to traditional crafts. To implement the proposed approach, we introduce Animation Studio (AnimIO) that enables visualisation, editing, and semantic annotation of pertinent data. AnimIO is compatible with recordings acquired by Motion Capture (MoCap) and Computer Vision. Using AnimIO, the operator can isolate segments from multiple synchronous recordings and export them in multimodal animation files. AnimIO can be used to isolate motion segments that refer to individual craft actions, as described by practitioners. The proposed approach has been iteratively designed for use by non-experts in the domain of 3D motion digitisation.
32

Caicedo, Juan C., Jaafar BenAbdallah, Fabio A. González, and Olfa Nasraoui. "Multimodal representation, indexing, automated annotation and retrieval of image collections via non-negative matrix factorization." Neurocomputing 76, no. 1 (January 2012): 50–60. http://dx.doi.org/10.1016/j.neucom.2011.04.037.

33

Larkey, Edward. "Narratological Approaches to Multimodal Cross-Cultural Comparisons of Global TV Formats." Audiovisual Data in Digital Humanities 7, no. 14 (December 31, 2018): 38. http://dx.doi.org/10.18146/2213-0969.2018.jethc152.

Abstract:
This article cross-culturally compares different versions of the Quebec sitcom/sketch comedy television series Un Gars, Une Fille (1997-2002) by examining the various gender roles and family conflict management strategies in a scene in which the heterosexual couple visits the male character’s mother-in-law. The article summarizes similarities and differences in the narrative structure, sequencing and content of several format adaptations by compiling computer-generated quantitative and qualitative data on the length of segments. To accomplish this, I have used the annotation function of Adobe Premiere, and visualized the findings using Microsoft Excel bar graphs and tables. This study applies a multimodal methodology to reveal the textual organization of scenes, shots and sequences which guide viewers toward culturally proxemic interpretations. This article discusses the benefits of applying the notion of discursive proximity suggested by Uribe-Jongbloed and Espinosa-Medina (2014) to gain a more comprehensive and complex understanding of the multimodal nature of cross-cultural comparison of global television format adaptations.
34

Race, Alan M., Daniel Sutton, Gregory Hamm, Gareth Maglennon, Jennifer P. Morton, Nicole Strittmatter, Andrew Campbell, et al. "Deep Learning-Based Annotation Transfer between Molecular Imaging Modalities: An Automated Workflow for Multimodal Data Integration." Analytical Chemistry 93, no. 6 (February 3, 2021): 3061–71. http://dx.doi.org/10.1021/acs.analchem.0c02726.

35

Belhi, Abdelhak, Abdelaziz Bouras, and Sebti Foufou. "Leveraging Known Data for Missing Label Prediction in Cultural Heritage Context." Applied Sciences 8, no. 10 (September 30, 2018): 1768. http://dx.doi.org/10.3390/app8101768.

Abstract:
Cultural heritage represents a reliable medium for history and knowledge transfer. Cultural heritage assets are often exhibited in museums and heritage sites all over the world. However, many assets are poorly labeled, which decreases their historical value. If an asset’s history is lost, its historical value is also lost. The classification and annotation of overlooked or incomplete cultural assets increase their historical value and allows the discovery of various types of historical links. In this paper, we tackle the challenge of automatically classifying and annotating cultural heritage assets using their visual features as well as the metadata available at hand. Traditional approaches mainly rely only on image data and machine-learning-based techniques to predict missing labels. Often, visual data are not the only information available at hand. In this paper, we present a novel multimodal classification approach for cultural heritage assets that relies on a multitask neural network where a convolutional neural network (CNN) is designed for visual feature learning and a regular neural network is used for textual feature learning. These networks are merged and trained using a shared loss. The combined networks rely on both image and textual features to achieve better asset classification. Initial tests related to painting assets showed that our approach performs better than traditional CNNs that only rely on images as input.
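The two-branch idea sketched in this abstract (a CNN for the image, a plain feed-forward network for the textual metadata, merged into one head trained with a shared loss) can be written down compactly. The PyTorch sketch below is a toy version under assumed layer sizes and input shapes, not the authors' architecture.

    import torch
    import torch.nn as nn

    class TwoBranchClassifier(nn.Module):
        """Toy version of the image+metadata fusion idea described above:
        a small CNN encodes the image, an MLP encodes a bag-of-words metadata
        vector, and the concatenated features feed one classification head
        trained with a single (shared) loss. All sizes are illustrative."""
        def __init__(self, vocab_size=500, num_classes=10):
            super().__init__()
            self.cnn = nn.Sequential(
                nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())      # -> 32 dims
            self.text = nn.Sequential(
                nn.Linear(vocab_size, 64), nn.ReLU())        # -> 64 dims
            self.head = nn.Linear(32 + 64, num_classes)

        def forward(self, image, metadata):
            fused = torch.cat([self.cnn(image), self.text(metadata)], dim=1)
            return self.head(fused)

    model = TwoBranchClassifier()
    image = torch.randn(4, 3, 64, 64)     # batch of RGB images
    metadata = torch.rand(4, 500)         # batch of textual feature vectors
    loss = nn.CrossEntropyLoss()(model(image, metadata), torch.tensor([0, 1, 2, 3]))
    loss.backward()
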
36

Poggi, Isabella. "Signals of intensification and attenuation in orchestra and choir conduction." Normas 7, no. 1 (June 23, 2017): 33. http://dx.doi.org/10.7203/normas.7.10423.

Abstract:
Based on a model of communication according to which not only words but also body signals constitute lexicons (Poggi, 2007), the study presented aims at building a lexicon of conductors’ multimodal behaviours requesting intensification and attenuation of sound intensity. In a corpus of concerts and rehearsals, the conductors’ body signals requesting to play or sing forte, piano, crescendo, diminuendo were analysed through an annotation scheme describing the body signals, their meanings, and their semiotic devices: generic codified (the same as in everyday language); specific codified (shared with laypeople but with specific meanings in conduction); direct iconic (resemblance between visual and acoustic modality); indirect iconic (evoking the technical movement by connected movements or emotion expressions). The work outlines a lexicon of the conductors’ signals that in gesture, head, face, gaze, posture, body convey attenuation and intensification in music.
37

Celata, Chiara, Chiara Meluzzi, and Irene Ricci. "The sociophonetics of rhotic variation in Sicilian dialects and Sicilian Italian: corpus, methodology and first results." Loquens 3, no. 1 (September 29, 2016): 025. http://dx.doi.org/10.3989/loquens.2016.025.

Abstract:
SoPhISM (The SocioPhonetics of verbal Interaction: Sicilian Multimodal corpus) is an acoustic and articulatory sociophonetic corpus focused on within-speaker variation as a function of stylistic/communicative factors. The corpus is particularly intended for the study of rhotics as a sociolinguistic variable in the production of Sicilian speakers. Rhotics are analyzed according to the distinction between single-phase and multiple-phase rhotics along with the presence of constriction and aperture articulatory phases. Based on these parameters, the annotation protocol seeks to classify rhotic variants within a sufficiently granular, but internally consistent, phonetic perspective. The proposed descriptive parameters allow for the discussion of atypical realizations in terms of phonetic derivations (or simplifications) of typical closure–aperture sequences. The distribution of fricative variants in the speech repertoire of one speaker and his interlocutors shows the potential provided by SoPhISM for sociophonetic variation to be studied at the ‘micro’ level of individual speakers’ idiolects.
38

Kalir, Jeremiah H., and Antero Garcia. "Civic Writing on Digital Walls." Journal of Literacy Research 51, no. 4 (October 3, 2019): 420–43. http://dx.doi.org/10.1177/1086296x19877208.

Abstract:
Civic writing has appeared on walls over centuries, across cultures, and in response to political concerns. This article advances a civic interrogation of how civic writing is publicly authored, read, and discussed as openly accessible and multimodal texts on digital walls. Drawing upon critical literacy perspectives, we examine how a repertoire of 10 civic writing practices associated with open web annotation (OWA) helped educators develop critical literacy. We introduce a social design experiment in which educators leveraged OWA to discuss educational equity across sociopolitical texts and contexts. We then describe a single case of OWA conversation among educators and use discourse analysis to examine shifting situated meanings and political expressions present in educators’ civic writing practices. We conclude by considering implications for theorizing the marginality of critical literacy, designing learning environments that foster educators’ civic writing, and facilitating learning opportunities that encourage educators’ civic writing across digital walls.
39

Tidke, Bharat, Rupa Mehta, and Jenish Dhanani. "Multimodal ensemble approach to identify and rank top-k influential nodes of scholarly literature using Twitter network." Journal of Information Science 46, no. 4 (March 18, 2019): 437–58. http://dx.doi.org/10.1177/0165551519837190.

Abstract:
Scholarly literature is an immense network of activities, linked via collaborations or information propagation. Analysing such a network can be leveraged by harnessing the rich semantic meaning of the scholarly graph. Identifying and ranking top-k influential nodes from various domains of scholarly literature using social media data is still in its infancy. Social networking sites like Twitter provide an opportunity to create inventive graph-based measures to identify and rank influential nodes such as scholars, articles, journals, information-spreading media and academic institutions of scholarly literature. Many network-based models such as centrality measures have been proposed to identify influential nodes. The empirical annotation shows that centrality measures for finding influential nodes are high in computational complexity. In addition, these measures have high variance, which signifies that an influential node deviates with changes in application and in the nature of information flows in the network. The research proposes an ensemble learning approach based on multimodal majority voting influence (MMMVI) to identify, and weighted multimodal ensemble average influence (WMMEAI) to rank, top-k influential nodes in a Twitter network data set of three well-known influential node types, that is, academic institution, scholar and journal. The empirical analysis has been carried out to assess the practicability and efficiency of the proposed approaches when compared with state-of-the-art approaches. The experimental result shows that the ensemble approach using surface learning models (SLMs) can lead to better identification and ranking of influential nodes with low computational complexity.
40

Dumont, Émilie, and Georges Quénot. "Automatic Story Segmentation for TV News Video Using Multiple Modalities." International Journal of Digital Multimedia Broadcasting 2012 (2012): 1–11. http://dx.doi.org/10.1155/2012/732514.

Abstract:
While video content is often stored in rather large files or broadcasted in continuous streams, users are often interested in retrieving only a particular passage on a topic of interest to them. It is, therefore, necessary to split video documents or streams into shorter segments corresponding to appropriate retrieval units. We propose here a method for the automatic segmentation of TV news videos into stories. A multiple-descriptor-based segmentation approach is proposed. The selected multimodal features are complementary and give good insights about story boundaries. Once extracted, these features are expanded with a local temporal context and combined by an early fusion process. The story boundaries are then predicted using machine learning techniques. We investigate the system by experiments conducted using TRECVID 2003 data and protocol of the story boundary detection task, and we show that the proposed approach outperforms the state-of-the-art methods while requiring a very small amount of manual annotation.
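The early-fusion-with-temporal-context scheme described here can be illustrated in a few lines. The sketch below, using NumPy and scikit-learn, stacks each shot's (already concatenated) multimodal descriptor with those of its neighbours and trains a simple classifier on the result; the feature dimensionality, context size, and classifier choice are illustrative assumptions, not the paper's configuration.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def add_temporal_context(features, context=2):
        """Early fusion with local temporal context: for each shot, stack the
        (already concatenated multimodal) feature vectors of the `context`
        neighbouring shots on each side, padding at the sequence edges."""
        padded = np.pad(features, ((context, context), (0, 0)), mode="edge")
        return np.hstack([padded[i:i + len(features)]
                          for i in range(2 * context + 1)])

    # Toy data: 200 shots, 12 multimodal descriptors per shot, and binary
    # labels marking whether a shot starts a new story.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 12))
    y = rng.integers(0, 2, size=200)

    clf = LogisticRegression(max_iter=1000).fit(add_temporal_context(X), y)
    print(clf.predict(add_temporal_context(X))[:10])
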
41

Trujillo, James P., and Judith Holler. "The Kinematics of Social Action: Visual Signals Provide Cues for What Interlocutors Do in Conversation." Brain Sciences 11, no. 8 (July 28, 2021): 996. http://dx.doi.org/10.3390/brainsci11080996.

Abstract:
During natural conversation, people must quickly understand the meaning of what the other speaker is saying. This concerns not just the semantic content of an utterance, but also the social action (i.e., what the utterance is doing—requesting information, offering, evaluating, checking mutual understanding, etc.) that the utterance is performing. The multimodal nature of human language raises the question of whether visual signals may contribute to the rapid processing of such social actions. However, while previous research has shown that how we move reveals the intentions underlying instrumental actions, we do not know whether the intentions underlying fine-grained social actions in conversation are also revealed in our bodily movements. Using a corpus of dyadic conversations combined with manual annotation and motion tracking, we analyzed the kinematics of the torso, head, and hands during the asking of questions. Manual annotation categorized these questions into six more fine-grained social action types (i.e., request for information, other-initiated repair, understanding check, stance or sentiment, self-directed, active participation). We demonstrate, for the first time, that the kinematics of the torso, head and hands differ between some of these different social action categories based on a 900 ms time window that captures movements starting slightly prior to or within 600 ms after utterance onset. These results provide novel insights into the extent to which our intentions shape the way that we move, and provide new avenues for understanding how this phenomenon may facilitate the fast communication of meaning in conversational interaction, social action, and conversation.
42

Maeng, Jun-Ho, Dong-Hyun Kang, and Deok-Hwan Kim. "Deep Learning Method for Selecting Effective Models and Feature Groups in Emotion Recognition Using an Asian Multimodal Database." Electronics 9, no. 12 (November 24, 2020): 1988. http://dx.doi.org/10.3390/electronics9121988.

Abstract:
Emotional awareness is vital for advanced interactions between humans and computer systems. This paper introduces a new multimodal dataset called MERTI-Apps based on Asian physiological signals and proposes a genetic algorithm (GA)—long short-term memory (LSTM) deep learning model to derive the active feature groups for emotion recognition. This study developed an annotation labeling program for observers to tag the emotions of subjects by their arousal and valence during dataset creation. In the learning phase, a GA was used to select effective LSTM model parameters and determine the active feature group from 37 features and 25 brain lateralization features extracted from the electroencephalogram (EEG) time, frequency, and time–frequency domains. The proposed model achieved a root-mean-square error (RMSE) of 0.0156 in terms of the valence regression performance in the MAHNOB-HCI dataset, and RMSE performances of 0.0579 and 0.0287 in terms of valence and arousal regression performance, and 65.7% and 88.3% in terms of valence and arousal accuracy in the in-house MERTI-Apps dataset, which uses Asian-population-specific 12-channel EEG data and adds an additional brain lateralization (BL) feature. The results revealed 91.3% and 94.8% accuracy in the valence and arousal domain in the DEAP dataset owing to the effective model selection of a GA.
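To show the general shape of a genetic algorithm searching over binary feature masks, as described above, the following Python sketch evolves masks whose fitness is the cross-validated regression error of a cheap stand-in model (ridge regression instead of the paper's LSTM). Population size, mutation rate, and the synthetic data are assumptions for illustration only.

    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 37))                        # 37 candidate features
    y = X[:, :5].sum(axis=1) + rng.normal(0, 0.5, 300)    # synthetic valence target

    def fitness(mask):
        """Negative cross-validated RMSE of a ridge model on the masked features."""
        if mask.sum() == 0:
            return -np.inf
        score = cross_val_score(Ridge(), X[:, mask.astype(bool)], y,
                                scoring="neg_root_mean_squared_error", cv=3)
        return score.mean()

    pop = rng.integers(0, 2, size=(20, X.shape[1]))       # random feature masks
    for generation in range(15):
        scores = np.array([fitness(m) for m in pop])
        parents = pop[np.argsort(scores)[-10:]]           # keep the best half
        children = []
        for _ in range(10):
            a, b = parents[rng.integers(0, 10, 2)]
            cut = rng.integers(1, X.shape[1])
            child = np.concatenate([a[:cut], b[cut:]])    # one-point crossover
            flip = rng.random(X.shape[1]) < 0.05          # mutation
            children.append(np.where(flip, 1 - child, child))
        pop = np.vstack([parents, children])

    best = pop[np.argmax([fitness(m) for m in pop])]
    print("selected features:", np.flatnonzero(best))
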
43

Zhao, Bin, Zhiyang Liu, Guohua Liu, Chen Cao, Song Jin, Hong Wu, and Shuxue Ding. "Deep Learning-Based Acute Ischemic Stroke Lesion Segmentation Method on Multimodal MR Images Using a Few Fully Labeled Subjects." Computational and Mathematical Methods in Medicine 2021 (January 29, 2021): 1–13. http://dx.doi.org/10.1155/2021/3628179.

Abstract:
Acute ischemic stroke (AIS) has been a common threat to human health and may lead to severe outcomes without proper and prompt treatment. To precisely diagnose AIS, it is of paramount importance to quantitatively evaluate the AIS lesions. By adopting a convolutional neural network (CNN), many automatic methods for ischemic stroke lesion segmentation on magnetic resonance imaging (MRI) have been proposed. However, most CNN-based methods should be trained on a large amount of fully labeled subjects, and the label annotation is a labor-intensive and time-consuming task. Therefore, in this paper, we propose to use a mixture of many weakly labeled and a few fully labeled subjects to relieve the thirst of fully labeled subjects. In particular, a multifeature map fusion network (MFMF-Network) with two branches is proposed, where hundreds of weakly labeled subjects are used to train the classification branch, and several fully labeled subjects are adopted to tune the segmentation branch. By training on 398 weakly labeled and 5 fully labeled subjects, the proposed method is able to achieve a mean dice coefficient of 0.699 ± 0.128 on a test set with 179 subjects. The lesion-wise and subject-wise metrics are also evaluated, where a lesion-wise F1 score of 0.886 and a subject-wise detection rate of 1 are achieved.
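For reference, the Dice coefficient quoted above measures the overlap between a predicted lesion mask and the ground-truth mask. A minimal NumPy version (not the authors' evaluation code) is given below.

    import numpy as np

    def dice_coefficient(pred, truth, eps=1e-8):
        """Dice = 2 * |P intersect T| / (|P| + |T|), on binary lesion masks."""
        pred, truth = pred.astype(bool), truth.astype(bool)
        intersection = np.logical_and(pred, truth).sum()
        return 2.0 * intersection / (pred.sum() + truth.sum() + eps)

    # Toy example on a 4x4 slice: 3 of 4 predicted voxels overlap the lesion.
    pred  = np.array([[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]])
    truth = np.array([[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]])
    print(round(dice_coefficient(pred, truth), 3))   # 2*3 / (4+3) = 0.857
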
44

Boutsi, Argyro-Maria, Charalabos Ioannidis, and Sofia Soile. "An Integrated Approach to 3D Web Visualization of Cultural Heritage Heterogeneous Datasets." Remote Sensing 11, no. 21 (October 26, 2019): 2508. http://dx.doi.org/10.3390/rs11212508.

Abstract:
The evolution of the high-quality 3D archaeological representations from niche products to integrated online media has not yet been completed. Digital archives of the field often lack multimodal data interoperability, user interaction and intelligibility. A web-based cultural heritage archive that compensates for these issues is presented in this paper. The multi-resolution 3D models constitute the core of the visualization, on top of which supportive documentation data and multimedia content are spatially and logically connected. Our holistic approach focuses on the dynamic manipulation of the 3D scene through the development of advanced navigation mechanisms and information retrieval tools. Users parse the multi-modal content in a geo-referenced way through interactive annotation systems over cultural points of interest and automatic narrative tours. Multiple 3D and 2D viewpoints are enabled in real-time to support data inspection. The implementation exploits front-end programming languages, 3D graphic libraries and visualization frameworks to handle efficiently the asynchronous operations and preserve the initial assets’ accuracy. The choice of Greece’s Meteora, a UNESCO World Heritage Site, as a case study accounts for the platform’s applicability to complex geometries and large-scale historical environments.
45

Harrison, Simon. "The organisation of kinesic ensembles associated with negation." Gesture 14, no. 2 (December 31, 2014): 117–40. http://dx.doi.org/10.1075/gest.14.2.01har.

Abstract:
This paper describes the organisation of kinesic ensembles associated with negation in speech through a qualitative study of negative utterances identified in face-to-face conversations between English speakers. All the utterances contain a verbal negative particle (no, not, nothing, etc.) and the kinesic ensembles comprise Open Hand Prone gestures and head shakes, both associated with the expression of negation in previous studies (e.g., Kendon, 2002, 2004; Calbris, 1990, 2011; Harrison, 2009, 2010). To analyse how these elements relate to each other, the utterances were studied in ELAN annotation software with separate analytical tiers for aspects of form in both speech and gestures. The micro-analysis of the temporal and semantic coordination between tiers shows that kinesic ensembles are organized in relation to the node, scope, and focus of negation in speech. Speakers coordinate gesture phrase structures of both head and hand gestures in relation to the grammar of verbal negation, and the gestures they use share a core formational feature that expresses a negative semantic theme in line with the expression of negation in the verbal utterance. The paper demonstrates these connections between grammar and gesture and sheds light on the mechanics of ‘multimodal negation’ at the utterance level.
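Analyses of this kind are typically built on time-aligned ELAN tiers. The Python sketch below reads aligned annotations from an ELAN .eaf file with the standard library XML parser and lists overlapping annotation pairs from two tiers; the tier names and file name are hypothetical, and referential (non-aligned) annotations are deliberately ignored to keep the example short.

    import xml.etree.ElementTree as ET

    def read_aligned_tier(eaf_path, tier_id):
        """Read time-aligned annotations from one tier of an ELAN .eaf file
        and return (start_ms, end_ms, value) triples. Handles only
        ALIGNABLE_ANNOTATIONs, which is enough for this illustration."""
        root = ET.parse(eaf_path).getroot()
        slots = {ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE", 0))
                 for ts in root.find("TIME_ORDER")}
        tier = root.find(f"TIER[@TIER_ID='{tier_id}']")
        out = []
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots[ann.get("TIME_SLOT_REF1")]
            end = slots[ann.get("TIME_SLOT_REF2")]
            out.append((start, end, ann.find("ANNOTATION_VALUE").text or ""))
        return out

    def overlaps(a, b):
        """Pairs of annotations from two tiers that overlap in time."""
        return [(x, y) for x in a for y in b if x[0] < y[1] and y[0] < x[1]]

    # Hypothetical file and tier names; the real tier structure is project-specific.
    negations = read_aligned_tier("conversation.eaf", "speech-negation")
    gestures = read_aligned_tier("conversation.eaf", "open-hand-prone")
    print(len(overlaps(negations, gestures)))
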
46

Goncharuk, O. S. "Perioperative analgesia and assessment of pain in children (literature review)." Reports of Vinnytsia National Medical University 25, no. 2 (June 24, 2021): 329–35. http://dx.doi.org/10.31393/reports-vnmedical-2021-25(2)-26.

Abstract:
Adequacy of postoperative analgesia and pain assessment remains a pressing issue in children. In order to provide effective pain management to this population, it is important to consider some specific features such as the age of the child, cognitive impairment, mechanisms of pain, and the traumatic circumstances resulting in nociceptive responses. Therefore, it is essential for clinicians to be able to choose the appropriate tools for pain assessment in different age groups of children and clinical situations, and to interpret the obtained data correctly. Hence, our study aimed to systematize existing problematic aspects of postoperative pain assessment in children and to analyze the evidence on perioperative analgesia in paediatric practice. For this purpose, we systematically searched MEDLINE, the Cochrane Library and Google Scholar for trials published between 2002 and 2020. We paid particular attention to the correct choice of pain assessment tools in children of different age groups, and proper interpretation of the data obtained. The study contains updated recommendations for postoperative pain management in children. There is a special emphasis on the priority of multimodal analgesia in children. Analysis of recent publications shows that newborns and children under 5 years of age should be assessed with comprehensive pain scales that include behavioral characteristics and physiological parameters. It is advisable to use self-assessment pain scales for children older than 5 years of age. In order to manage acute pain effectively, it should be assessed at least every 4–6 hours. Sufficient perioperative analgesia promotes rapid rehabilitation and prevents postoperative homeostatic disruption in children.
APA, Harvard, Vancouver, ISO, and other styles
47

Da Fonte, Renata Fonseca Lima, and Késia Vanessa Nascimento da Silva. "MULTIMODALIDADE NA LINGUAGEM DE CRIANÇAS AUTISTAS: O "NÃO" EM SUAS DIVERSAS MANIFESTAÇÕES." PROLÍNGUA 14, no. 2 (May 6, 2020): 250–62. http://dx.doi.org/10.22478/ufpb.1983-9979.2019v14n2.48829.

Full text
Abstract:
This study analyses the multimodal aspects of the language of autistic children in interactive contexts of negation, from the multimodal perspective of language, in which gesture and vocal production are two facets of a single matrix of meaning. Methodologically, it is a qualitative and quantitative study in which the data were drawn from the observation and analysis of the interactions of three autistic children aged five to six years, participants in the Grupo de Estudos e Atendimento ao Espectro Autista – GEAUT/UNICAP. For transcription, the ELAN (EUDICO Linguistic Annotator) software was used, which allows simultaneous transcription of audio and video. The data showed a semantic and temporal synchrony of different multimodal aspects of language ("gesture", "vocalization/prosody", and "gaze") in the negative utterances of the autistic children. Among them, motor stereotypies, gaze aversion, and the action of turning one's back stood out as multimodal aspects peculiar to the "no" of autistic children.
APA, Harvard, Vancouver, ISO, and other styles
48

Kim, Donghyun, Kuniaki Saito, Kate Saenko, Stan Sclaroff, and Bryan Plummer. "MULE: Multimodal Universal Language Embedding." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 11254–61. http://dx.doi.org/10.1609/aaai.v34i07.6785.

Full text
Abstract:
Existing vision-language methods typically support at most two languages at a time. In this paper, we present a modular approach which can easily be incorporated into existing vision-language methods in order to support many languages. We accomplish this by learning a single shared Multimodal Universal Language Embedding (MULE) which has been visually-semantically aligned across all languages. Then we learn to relate MULE to visual data as if it were a single language. Our method is not architecture specific, unlike prior work which typically learned separate branches for each language, enabling our approach to easily be adapted to many vision-language methods and tasks. Since MULE learns a single language branch in the multimodal model, we can also scale to support many languages, and languages with fewer annotations can take advantage of the good representation learned from other (more abundant) language data. We demonstrate the effectiveness of our embeddings on the bidirectional image-sentence retrieval task, supporting up to four languages in a single model. In addition, we show that Machine Translation can be used for data augmentation in multilingual learning, which, combined with MULE, improves mean recall by up to 20.2% on a single language compared to prior work, with the most significant gains seen on languages with relatively few annotations. Our code is publicly available.
APA, Harvard, Vancouver, ISO, and other styles
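As a rough illustration of the idea summarised above (a single shared language branch aligned with visual features in a joint space), the following toy sketch uses PyTorch. It is not the authors' MULE architecture or training code; the vocabulary size, dimensions, pooling, and loss are assumptions made for the example.

```python
# Toy sketch: sentences from several languages pass through one shared
# branch, and the result is aligned with image features in a joint space.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedLanguageBranch(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int = 300, joint_dim: int = 512):
        super().__init__()
        # One multilingual vocabulary and one projection shared by all languages.
        self.word_embed = nn.Embedding(vocab_size, embed_dim)
        self.project = nn.Linear(embed_dim, joint_dim)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # Mean-pool word embeddings into a sentence vector, then project.
        sent = self.word_embed(token_ids).mean(dim=1)
        return F.normalize(self.project(sent), dim=-1)

# Image features (e.g. from a CNN) are projected into the same joint space.
image_project = nn.Linear(2048, 512)

branch = SharedLanguageBranch(vocab_size=50_000)
tokens = torch.randint(0, 50_000, (4, 12))   # 4 sentences, any language
image_feats = torch.randn(4, 2048)           # 4 matching images

sentence_vec = branch(tokens)
image_vec = F.normalize(image_project(image_feats), dim=-1)

# Alignment objective: matching pairs should have high cosine similarity.
similarity = (sentence_vec * image_vec).sum(dim=-1)
loss = (1.0 - similarity).mean()
print(float(loss))
```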
49

Weiß, Christof, Frank Zalkow, Vlora Arifi-Müller, Meinard Müller, Hendrik Vincent Koops, Anja Volk, and Harald G. Grohganz. "Schubert Winterreise Dataset." Journal on Computing and Cultural Heritage 14, no. 2 (June 2021): 1–18. http://dx.doi.org/10.1145/3429743.

Full text
Abstract:
This article presents a multimodal dataset comprising various representations and annotations of Franz Schubert's song cycle Winterreise. Schubert's seminal work constitutes an outstanding example of the Romantic song cycle—a central genre within Western classical music. Our dataset unifies several public sources and annotations carefully created by music experts, compiled in a comprehensive and consistent way. The multimodal representations comprise the singer's lyrics, sheet music in different machine-readable formats, and audio recordings of nine performances, two of which are freely accessible for research purposes. By means of explicit musical measure positions, we establish a temporal alignment between the different representations, thus enabling a detailed comparison across different performances and modalities. Using these alignments, we provide for the different versions various musicological annotations describing tonal and structural characteristics. This metadata comprises chord annotations in different granularities, local and global annotations of musical keys, and segmentations into structural parts. From a technical perspective, the dataset allows for evaluating algorithmic approaches to tasks such as automated music transcription, cross-modal music alignment, or tonal analysis, and for testing these algorithms' robustness across songs, performances, and modalities. From a musicological perspective, the dataset enables the systematic study of Schubert's musical language and style in Winterreise and the comparison of annotations regarding different annotators and granularities. Beyond the research domain, the data may serve further purposes such as the didactic preparation of Schubert's work and its presentation to a wider public by means of an interactive multimedia experience. With this article, we provide a detailed description of the dataset, indicate its potential for computational music analysis by means of several studies, and point out possibilities for future research.
APA, Harvard, Vancouver, ISO, and other styles
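The measure-based alignment described above can be illustrated with a minimal sketch: a score-level chord annotation keyed to measure numbers is transferred onto the timelines of different performances via measure-onset tables. The data values and function below are invented for illustration and do not reflect the dataset's actual file formats.

```python
# Sketch of using explicit measure positions to transfer a score-level
# annotation (a chord label attached to a measure) onto the timelines of
# different recorded performances. All values are hypothetical.

# Hypothetical measure-onset tables (measure number -> seconds) for two performances.
performance_a = {1: 0.0, 2: 2.1, 3: 4.3, 4: 6.4}
performance_b = {1: 0.0, 2: 1.8, 3: 3.7, 4: 5.5}

# A chord annotation defined on the score level: (measure, chord label).
chord_annotations = [(1, "D:min"), (2, "A:maj"), (3, "D:min"), (4, "G:min")]

def to_performance_time(annotations, measure_onsets):
    """Attach performance time stamps to score-level annotations."""
    return [(measure_onsets[measure], label) for measure, label in annotations]

for name, onsets in [("performance A", performance_a), ("performance B", performance_b)]:
    timed = to_performance_time(chord_annotations, onsets)
    print(name, timed)
```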
50

Sinte, Aurélie. "Répéter, redire, reformuler : analyse plurisémiotique de conférences TEDx." SHS Web of Conferences 46 (2018): 01001. http://dx.doi.org/10.1051/shsconf/20184601001.

Full text
Abstract:
This proposal is part of a larger project analysing multimodal reformulations (MR) in the construction of discourse: describing the relations between three multimodal semiotic channels (speech (S1), co-verbal gesture (S2), and presentation slides (S3)) in scientific talks. The aim is to describe how multimodal reformulations contribute to the effectiveness of the discourse and to the construction of its coherence. The MRs are studied from the perspective internal to each semiotic system (S1, S2, S3) and from the perspective of crossings from one system to another (S1/S2, S1/S3, S2/S3, and S1/S2/S3 relations). The ongoing analysis proceeds as follows: identifying the passages containing MRs and the channels mobilised, annotating the data, quantitative and qualitative analysis of the MRs and crossings, and identification of usage paradigms (from performances without MRs to those that make abundant use of crossings at all three levels). Contrary to what has been claimed by others, my hypothesis is that these are not two (or even three) distinct and simultaneous discourses. I consider that the linearity (of S1 on the one hand and of S3 on the other) and the simultaneity of the three sources of information (S1, S2, and S3) intertwine in the construction of a single but plurisemiotic discourse.
APA, Harvard, Vancouver, ISO, and other styles
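The quantitative side of the analysis sketched above (counting which combinations of channels are mobilised in annotated reformulation passages) can be illustrated with a short example. The passage list and channel labels below are hypothetical and are not the author's annotation scheme or tooling.

```python
# Toy cross-tabulation of semiotic channels (S1 speech, S2 co-verbal
# gesture, S3 presentation slides) involved in annotated multimodal
# reformulation passages. The passage list is invented for illustration.
from collections import Counter
from itertools import combinations

# Each annotated reformulation passage lists the channels it mobilises.
passages = [
    {"S1"}, {"S1", "S2"}, {"S1", "S3"}, {"S1", "S2", "S3"}, {"S1", "S2"},
]

pair_counts = Counter()
for channels in passages:
    for pair in combinations(sorted(channels), 2):
        pair_counts[pair] += 1

print("Channel pairings across passages:")
for pair, count in pair_counts.most_common():
    print(f"{pair[0]}/{pair[1]}: {count}")
```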