Journal articles on the topic "Imagem audio-visual"

Browse the top 50 journal articles for research on the topic "Imagem audio-visual".


1

Martins, Carlos Pereira. "Memórias em movimento: a tela como ferramenta no ensino de História." Txt: Leituras Transdisciplinares de Telas e Textos 4, no. 8 (2008): 59. http://dx.doi.org/10.17851/1809-8150.4.8.59-67.

Annotation:
This article addresses the on-screen image, reflecting on the tradition of non-readers in Brazilian society, which favors a cinematographic culture, and deals with discourse analysis and memory at a time when the dissemination of knowledge through audiovisual media is more prevalent. Without exhausting the subject, the text treats the image as a teaching tool in the classroom, enabling the student to move from screen to text and vice versa.
2

Pereira do Nascimento, Edivaldo Jeronimo, and Ernani Nunes Ribeiro. "Audiodescrição no ensino de ciências biológicas: uma experiência no ensino médio com o ensino sobre células." Educação Online 18, no. 42 (2023): e231810. http://dx.doi.org/10.36556/eol.v18i42.1304.

Annotation:
The article demonstrates the importance of audio description, one of the assistive communication technologies, in educational contexts, specifically in biology teaching. We analyze students with visual impairment in the translation of static images of cells in the first-year high school textbook. The study arose from the problem of school experiences in the use of image-based discourses in teaching situations. The method was a case study in the city of Recife-PE, through an interview with one subject, selected according to the criteria: being a first-year high school student at a school [...]
3

Marins, Líliam Cristina, and Fernanda Gritti. "Livros infantojuvenis Pra Cego Ver : a imagem materializada na audiodescrição." FronteiraZ. Revista do Programa de Estudos Pós-Graduados em Literatura e Crítica Literária, no. 24 (July 6, 2020): 104–18. http://dx.doi.org/10.23925/1983-4373.2020i24p104-118.

Annotation:
This article aims to analyze the audio description (AD) of illustrations in two literary texts aimed at children and young readers: Simplesmente Diferente (2011), by Mônica Picavêa, and Cinderela (2014), adapted by Anna Cláudia Ramos. Although they are distinct narrative proposals, both books belong to the scarce production of accessible literature for blind or low-vision readers. Considering the importance of illustration in the constitution of this genre, analyzing the production of audio descriptions of such materials in a semiotic medium other than the visual may be a [...]
4

Matsumoto, Roberta. "Variações sobre teatro e audiovisual." Repertório, no. 28 (December 5, 2017): 47. http://dx.doi.org/10.9771/r.v0i28.24998.

Annotation:
Over the last 30 years, the dialogue between theater and image-production technologies has become increasingly intense, especially with the advent of digital systems, which have enabled the miniaturization of devices and their dissemination. In this article, we point out some paths of dialogue between theater and the audiovisual with regard to the presence of such technologies as a staging element of the theatrical performance and to their use as tools that assist the process of [...]
5

Duarte, Josimar Faria. "Festa de Devoção a Santa Rita de Cássia em Viçosa." Mosaico 11, no. 1 (2018): 5. http://dx.doi.org/10.18224/mos.v11i1.5928.

Annotation:
The aim of this article is to analyze the regional memories and identities constructed in the devotional practices to the Italian Margherita Lotti, or Saint Rita of Cascia, in the city of Viçosa, Minas Gerais. Generally, in the days around May 22, the city's population gathers in festivities, always marked by masses celebrated by several priests, music, dances, and processions full of allegories and fireworks. The processions for the Saint take to the city's streets, always accompanied by secular and religious clergy, civil and political authorities, choirs and orchestras, and people of the most diverse conditions [...]
6

Thai, P. Kamakshi, P. Manisha, L. Abhigyna Reddy, and M. Nagendhar Reddy. "Deep Face Gen: Speech-Driven Face Image Synthesis." International Scientific Journal of Engineering and Management 04, no. 02 (2025): 1–7. https://doi.org/10.55041/isjem02266.

Annotation:
A framework based on Generative Adversarial Networks (GANs) is proposed to synthesize facial images from audio inputs. The system aims to automatically translate large volumes of audio into understandable facial images without human intervention. By using a GAN architecture, the model generates image features from audio waveforms to reconstruct facial images. It is trained on a dataset of labeled examples, producing facial images corresponding to the identities of the speakers. The method achieves an accuracy of 96.88% for ungrouped data and 93.91% for grouped data. This approach demonstrates
7

Jahan, Ayesha, Sanobar Shadan, Yasmeen Fatima, and Naheed Sultana. "Image Orator - Image to Speech Using CNN, LSTM and GTTS." International Journal for Research in Applied Science and Engineering Technology 11, no. 6 (2023): 4473–81. http://dx.doi.org/10.22214/ijraset.2023.54470.

Annotation:
Abstract: This report presents an image to audio system that utilizes a combination of Long Short-Term Memory (LSTM) and Convolutional Neural Networks (CNN) for image captioning and Google Text-to-Speech (GTTS) for generating audio output. The aim of the project is to create an accessible system that converts images into descriptive audio signals for visually impaired individuals. The proposed system has the potential to provide meaningful context and information about the image through descriptive audio output, making it easier for visually impaired individuals to engage with visual content.
8

Scarborough, E., J. Brandt, S. Rogers, P. Amburn, D. Ruck, and M. Ericson. "A Prototype Visual and Audio Display." Presence: Teleoperators and Virtual Environments 1, no. 4 (1992): 459–67. http://dx.doi.org/10.1162/pres.1992.1.4.459.

Annotation:
A display is described that provides a three-dimensional perspective view with spatially correlated audio. The system is developed around an optics device that projects a three-dimensional perspective view from a CRT to a concave mirror that focuses the energy at an image plane above the mirror. The result is that the objects displayed on the CRT appear to be floating in space. The directional audio is provided from an audio localization cue synthesizer that encodes pinna filtering and an interaural time delay onto an input audio signal. A magnetic head tracker is used to keep the audio images
9

E J, Dinakar. "Context Aware Visual Analysis for Dynamic Audio Narration." INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT 09, no. 05 (2025): 1–9. https://doi.org/10.55041/ijsrem48530.

Annotation:
Abstract: Due to the exponential growth of multimedia content, there is a growing demand for advanced image captioning systems that go beyond static descriptions and provide deep, dynamic audio narratives. This paper introduces "Context-Aware Visual Analysis for Dynamic Audio Narration," a pipeline where computer vision and natural language processing synergize to convert images into contextually informed, user-controlled audio descriptions. In this work, the network leverages the robust architecture of `salesforce/BLIP-image-captioning-large` alongside a fine-tuned `google/FLAN-T5-large` m
10

Seo, June-Seok, Sung-Dae Hong, and Jin-Wan Park. "Configuration of Audio-Visual System using Visual Image." Journal of the Korea Contents Association 8, no. 6 (2008): 121–29. http://dx.doi.org/10.5392/jkca.2008.8.6.121.

11

Retsikas, Konstantinos. "Unleashing the Future: Deleuze's Crystals of Time and Theo Angelopoulos's The Travelling Players." Deleuze and Guattari Studies 18, no. 4 (2024): 570–94. http://dx.doi.org/10.3366/dlgs.2024.0573.

Annotation:
This article discusses The Travelling Players (1975), Theo Angelopoulos's provocative film, by upholding the priority the director's audio-visual images assign to the future as what Deleuze calls ‘the pure and empty form of time’. It also argues that The Travelling Players supplements the Deleuzian quartet of crystal-images with a new type of crystal-image: Angelopoulos crafts images of extraordinary consistency that act as filters, splitting and sieving time at once, a kind of audio-visual refinery laid out on the silver screen, with the distillation process dedicated to blocking repression,
12

Vyawahare, Prof D. G. "Image to Audio Conversion for Blind People Using Neural Network." International Journal for Research in Applied Science and Engineering Technology 11, no. 12 (2023): 1949–57. http://dx.doi.org/10.22214/ijraset.2023.57712.

Annotation:
Abstract: The development of an image-to-audio conversion system represents a significant stride towards enhancing accessibility and autonomy for visually impaired individuals. This innovative technology leverages computer vision and audio synthesis techniques to convert visual information from images into auditory cues, enabling blind users to interpret and comprehend their surroundings more effectively. The core of this system relies on advanced computer vision algorithms that process input images, recognizing objects, text, and scene elements. These algorithms employ deep learning models to
13

A, Mr Balaji. "Extracting Audio from Image Using Machine Learning." INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT 08, no. 04 (2024): 1–5. http://dx.doi.org/10.55041/ijsrem31532.

Annotation:
This study introduces a new method for extracting sound from pictures by utilizing machine learning. Lately, there has been a lot of excitement around multi-modal learning because of its ability to reveal valuable information from various sources, like images and sound. Our research is centered on using the unique qualities of visual and auditory signals to predict sound content from pictures. This opens up possibilities for enhancing accessibility, creating content, and providing immersive user experiences. We start by exploring previous research in multi-modal learning, audio-visual processi
14

Orynbay, Laura, Bibigul Razakhova, Peter Peer, Blaž Meden, and Žiga Emeršič. "Recent Advances in Synthesis and Interaction of Speech, Text, and Vision." Electronics 13, no. 9 (2024): 1726. http://dx.doi.org/10.3390/electronics13091726.

Annotation:
In recent years, there has been increasing interest in the conversion of images into audio descriptions. This is a field that lies at the intersection of Computer Vision (CV) and Natural Language Processing (NLP), and it involves various tasks, including creating textual descriptions of images and converting them directly into auditory representations. Another aspect of this field is the synthesis of natural speech from text. This has significant potential to improve accessibility, user experience, and the applications of Artificial Intelligence (AI). In this article, we reviewed a wide range
15

Xu, Xin, and Su Mei Xi. "Cross-Media Retrieval Method Based on Space Mapping." Advanced Materials Research 756-759 (September 2013): 1898–902. http://dx.doi.org/10.4028/www.scientific.net/amr.756-759.1898.

Annotation:
This paper puts forward a novel cross-media retrieval approach, which can process multimedia data of different modalities and measure cross-media similarity, such as image-audio similarity. Both image and audio data are selected for experiments and comparisons. Given the same visual and auditory features the new approach outperforms ICA, PCA and PLS methods both in precision and recall performance. Overall cross-media retrieval results between images and audios are very encouraging.
16

Musse, Christina Ferraz, and Ana Clara Campos dos Santos. "Memória e filmes domésticos em Super 8: a família Assis em Juiz de Fora - MG." Revista Observatório 1, no. 3 (2015): 181. http://dx.doi.org/10.20873/uft.2447-4266.2015v1n3p181.

Annotation:
In this work we address memory as an object of study, Super 8 cinematographic film, and home movies. For the theoretical foundation, we draw on authors such as Andreas Huyssen and Pierre Nora (memory) and Roger Odin and Lila Foster (home movies). Our aim is to present the Super 8 films made by photographer Márcio Assis in the 1970s, in order to examine the relationship between his present-day oral narrative, recorded as audio commentary on the film, and his visual narrative about his own family in the 1970s. We also study how the [...]
17

Braun, Sabine. "Creating Coherence in Audio Description." Meta 56, no. 3 (2012): 645–62. http://dx.doi.org/10.7202/1008338ar.

Annotation:
As an emerging form of intermodal translation, audio description (AD) raises many new questions for Translation Studies and related disciplines. This paper will investigate the question of how the coherence of a multimodal source text such as a film can be re-created in audio description. Coherence in film characteristically emerges from links within and across different modes of expression (e.g., links between visual images, image-sound links and image-dialogue links). Audio describing a film is therefore not simply a matter of substituting visual images with verbal descriptions. It involves
18

Mahamad Amin, Nurrul Akma, Nilam Nur Amir Sjarif, and Siti Sophiayati Yuhaniz. "A Review of Convolutional Neural Network Model for Audio-Visual Features Extraction in Personality Traits Recognition." International Journal of Innovative Computing 15, no. 1 (2025): 45–52. https://doi.org/10.11113/ijic.v15n1.498.

Annotation:
In the field of personality computing research, the analysis on audio-visual input is predominantly used to detect human personality behaviors. With the advancement of computer vision technology, there has been significant enhancement in personality computing. Personality trait recognition is one of the applications under personality computing where the machine can analyze human behaviors and recognize personality traits via video analysis. In video input, there are different audio-visual features characteristics, consisting of visual (images) and audio (sounds) elements. Therefore, it is crit
19

Nakamura, Takumi, Yagi Daichi, Kuangzhe Xu, Toshihiko Matsuka, and Keita Hirai. "Investigating Effects of Visual and Auditory Adaptation on Metallic Material Appearance." Color and Imaging Conference 2020, no. 28 (2020): 130–35. http://dx.doi.org/10.2352/issn.2169-2629.2020.28.20.

Annotation:
In this paper, we investigated the effects of visual and auditory adaptation on material appearance. The target in this study was metallic perception. First, participants evaluated CG images using sounds and other images. In the experiment, we prepared metallic stimulus under various adaptation conditions with different combinations of metal image, non-metal image, metal sound, and non-metal sound stimuli. After these adaptations, the participants answered "metal" or "non-metal" after viewing a displayed reference image. The reference images were generated by interpolating metal and non-metal
20

Zhang, Jiaqi, Dongpo Zhang, and Shang Zhang. "A Digital Image Steganographic Detection Method for LSB Steganography." International Journal of Computer Science and Information Technology 4, no. 2 (2024): 256–63. http://dx.doi.org/10.62051/ijcsit.v4n2.33.

Annotation:
With the advancement of communication technology, communication information can no longer be transmitted merely through text but can be covertly conveyed via various types of communication carriers like images, audio, and video. This brings forth new challenges to traditional detection techniques that mainly target text information. Digital image steganalysis aims to detect and uncover communication information by recognizing steganographic behaviors contained within digital image files. This paper takes digital image steganographic activities based on least significant bit (LSB) as the main r
21

Zheliezniak, Serafym. "Problems of Definition and Classification of Sonic Image in the Audio-visual Culture." Culturology Ideas, no. 14 (2'2018) (2018): 199–204. http://dx.doi.org/10.37627/2311-9489-14-2018-2.199-204.

Annotation:
The purpose of the research is to identify the basis of the sonic image in audio-visual culture, to substantiate the interrelation of its elements and to demonstrate the peculiarities of its functioning. The following specific methods were used to obtain the desirable scientific results: the analysis was used to dissect the subject of the research into individual components, to study their properties, that helped to create a coherent idea of the notion of the sonic image; systematic method was used for the building of a certain structure and typology, which would allow to organize the knowledg
22

Wu, Xinyi, Zhenyao Wu, Lili Ju, and Song Wang. "Binaural Audio-Visual Localization." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 4 (2021): 2961–68. http://dx.doi.org/10.1609/aaai.v35i4.16403.

Annotation:
Localizing sound sources in a visual scene has many important applications and quite a few traditional or learning-based methods have been proposed for this task. Humans have the ability to roughly localize sound sources within or beyond the range of the vision using their binaural system. However most existing methods use monaural audio, instead of binaural audio, as a modality to help the localization. In addition, prior works usually localize sound sources in the form of object-level bounding boxes in images or videos and evaluate the localization accuracy by examining the overlap between t
23

Lavanya, K., B. Jayamala, C. Jeyasri, and A. Sakthivel. "Automatic Audio and Image Caption Generation with Deep Learning." Shanlax International Journal of Arts, Science and Humanities 11, S3-July (2024): 34–39. http://dx.doi.org/10.34293/sijash.v11is3-july.7916.

Annotation:
A novel approach to image caption generation tailored specifically for visually impaired individuals. The proposed system employs advanced computer vision algorithms to analyze images and generate descriptive textual captions. Furthermore, it integrates seamless text-to-speech conversion functionality, allowing for the automatic transformation of these captions into spoken audio, thereby enabling access to visual content for individuals with visual impairments. The goal of this project is to generate descriptive captions for a given photograph or image. We achieve this by employing Convolution
24

Wang, Suzhen, Lincheng Li, Yu Ding, and Xin Yu. "One-Shot Talking Face Generation from Single-Speaker Audio-Visual Correlation Learning." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 3 (2022): 2531–39. http://dx.doi.org/10.1609/aaai.v36i3.20154.

Annotation:
Audio-driven one-shot talking face generation methods are usually trained on video resources of various persons. However, their created videos often suffer unnatural mouth shapes and asynchronous lips because those methods struggle to learn a consistent speech style from different speakers. We observe that it would be much easier to learn a consistent speech style from a specific speaker, which leads to authentic mouth movements. Hence, we propose a novel one-shot talking face generation framework by exploring consistent correlations between audio and visual motions from a specific speaker and
25

Sanguineti, Valentina, Pietro Morerio, Alessio Del Bue, and Vittorio Murino. "Audio-Visual Localization by Synthetic Acoustic Image Generation." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 3 (2021): 2523–31. http://dx.doi.org/10.1609/aaai.v35i3.16354.

Annotation:
Acoustic images constitute an emergent data modality for multimodal scene understanding. Such images have the peculiarity to distinguish the spectral signature of sounds coming from different directions in space, thus providing richer information than the one derived from mono and binaural microphones. However, acoustic images are typically generated by cumbersome microphone arrays, which are not as widespread as ordinary microphones mounted on optical cameras. To exploit this empowered modality while using standard microphones and cameras we propose to leverage the generation of synthetic aco
26

Zhang, Dongxu, Hao Chen, Xinyi Zhang, and Lingge Tan. "Evaluation of Landscapes and Soundscapes in Traditional Villages in the Hakka Region of Guangdong Province Based on Audio-Visual Interactions." Buildings 15, no. 2 (2025): 259. https://doi.org/10.3390/buildings15020259.

Annotation:
Traditional villages in the Hakka region of Guangdong Province have attracted significant attention for their unique cultural heritage and traditional lifestyles. Their favourable audio-visual environments offer immersive and realistic experiences for both residents and visitors. Thus, we selected four representative villages and used semantic segmentation to extract the core visual elements (sky, vegetation, construction, and dynamic) from visual landscape images. Audio-visual interaction experiments and subjective surveys were conducted to investigate the participants’ evaluations of the vis
27

Wardani, Kiky Rizky nova. "ANALISA EFEKTIVITAS PENGGUNAAN MULTIMEDIA SEBAGAI ALTERNATIF MODEL PEMBELAJARAN." Jurnal Teknologi Informasi MURA 11, no. 1 (2019): 37–45. http://dx.doi.org/10.32767/jti.v11i1.436.

Annotation:
Abstract: This study aims to determine how effective the learning achievement is of students at SMK N 1 INTAN who are taught with the aid of images and audio-visual media, compared with the learning outcomes of students taught using conventional methods. The research subjects were grade 12 students of SMK N 1 INTAN in the Computer Network Engineering subject in 2018/2019, divided into a trial (experimental) group and a control group. Each group consisted of 17 students (8 male and 9 female), and the control group of 16 students (9 male and 7 [...]
28

Domniţeanu, Aurelia. "Development of reading competence through multimodal texts." Journal of Educational Theory and Practice DIDACTICA PRO... 20, no. 2-3 (120-121) (2020): 54–56. https://doi.org/10.5281/zenodo.3931115.

Annotation:
The multimodal text is a novelty of the school curriculum and contains two or more semiotic systems: linguistic, visual, audio, gesture, spatial. Multimodality involves interference between word, image, sound, gesture and movement. Multimodal texts contribute to the development of reading competence through their attractiveness, using images and digital elements.  
29

Udayashree, Dr S. "Deep Fake Using Audio and Image Detection." International Journal for Research in Applied Science and Engineering Technology 13, no. 5 (2025): 2716–23. https://doi.org/10.22214/ijraset.2025.70706.

Annotation:
In the AI-driven era, deep fakes, generated through advanced techniques like Generative Adversarial Networks (GANs), present significant threats by creating highly realistic yet fabricated media. While audio deep fakes have received considerable attention, the detection of manipulated images remains underexplored, creating a critical gap in comprehensive deep fake identification. Our proposed system bridges this gap by integrating transfer learning for enhanced detection across both fake audio and manipulated images. For image analysis, we utilize the VGG19 architecture, leveraging its deep co
30

Indira, D. N. V. S. L. S., et al. "An Enhanced CNN-2D for Audio-Visual Emotion Recognition (AVER) Using ADAM Optimizer." Turkish Journal of Computer and Mathematics Education (TURCOMAT) 12, no. 5 (2021): 1378–88. http://dx.doi.org/10.17762/turcomat.v12i5.2030.

Annotation:
The importance of integrating visual components into the speech recognition process for improving robustness has been identified by recent developments in audio visual emotion recognition (AVER). Visual characteristics have a strong potential to boost the accuracy of current techniques for speech recognition and have become increasingly important when modelling speech recognizers. CNN is very good to work with images. An audio file can be converted into image file like a spectrogram with good frequency to extract hidden knowledge. This paper provides a method for emotional expression recogniti
31

Kunzendorf, Robert G., Scott S. Lyman, Brenda Sousa, and Emily Hilly. "“Imageless” Spatial and Temporal Rules Can Be Tested and Refined by Constructing Vivid Visual and Auditory Images." Imagination, Cognition and Personality 32, no. 2 (2012): 115–50. http://dx.doi.org/10.2190/ic.32.2.c.

Annotation:
In this computerized study, research participants completed both Marks' (1973) Vividness of Visual Imagery Questionnaire and Kunzendorf's (1979) Vividness of Auditory Imagery Questionnaire and, immediately thereafter, completed either a visuo-spatial rule-development exercise or an audio-temporal rule-development exercise. During the visuo-spatial exercise, participants were administered 20 four-alternative quizzes regarding the schematic rules of 3-point perspective (3PP) and, between quizzes, were instructed to figure out the rules by constructing visual images that serve to test their devel
32

Hu, Hwai-Tsu, and Tung-Tsun Lee. "Hiding Full-Color Images into Audio with Visual Enhancement via Residual Networks." Cryptography 7, no. 4 (2023): 47. http://dx.doi.org/10.3390/cryptography7040047.

Annotation:
Watermarking is a viable approach for safeguarding the proprietary rights of digital media. This study introduces an innovative fast Fourier transform (FFT)-based phase modulation (PM) scheme that facilitates efficient and effective blind audio watermarking at a remarkable rate of 508.85 numeric values per second while still retaining the original quality. Such a payload capacity makes it possible to embed a full-color image of 64 × 64 pixels within an audio signal of just 24.15 s. To bolster the security of watermark images, we have also implemented the Arnold transform in conjunction with ch
33

Zhang, Jingyu, Xinyi Yan, Yi Xiang, Yingyi Zhang, and Chengzhi Zhang. "Building a Multimodal Dataset of Academic Paper for Keyword Extraction." Proceedings of the Association for Information Science and Technology 61, no. 1 (2024): 435–46. http://dx.doi.org/10.1002/pra2.1040.

Annotation:
Up to this point, keyword extraction task typically relies solely on textual data. Neglecting visual details and audio features from image and audio modalities leads to deficiencies in information richness and overlooks potential correlations, thereby constraining the model's ability to learn representations of the data and the accuracy of model predictions. Furthermore, the currently available multimodal datasets for keyword extraction task are particularly scarce, further hindering the progress of research on multimodal keyword extraction task. Therefore, this study constructs a mult
34

Hryhorenko, N., N. Larionov, and V. Bredikhin. "RESEARCH OF THE PROCESS OF VISUAL ART TRANSMISSION IN MUSIC AND THE CREATION OF COLLECTIONS FOR PEOPLE WITH VISUAL IMPAIRMENTS." Municipal economy of cities 6, no. 180 (2023): 2–6. http://dx.doi.org/10.33042/2522-1809-2023-6-180-2-6.

Annotation:
This article explores the creation of music through the automated generation of sounds from images. The developed automatic image sound generation method is based on the joint use of neural networks and light-music theory. Translating visual art into music using machine learning models can be used to make extensive museum collections accessible to the visually impaired by translating artworks from an inaccessible sensory modality (sight) to an accessible one (hearing). Studies of other audio-visual models have shown that previous research has focused on improving model performance with multimo
35

Bhosale, Swapnil, Haosen Yang, Diptesh Kanojia, Jiankang Deng, and Xiatian Zhu. "Unsupervised Audio-Visual Segmentation with Modality Alignment." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 15 (2025): 15567–75. https://doi.org/10.1609/aaai.v39i15.33709.

Annotation:
Audio-Visual Segmentation (AVS) aims to identify, at the pixel level, the object in a visual scene that produces a given sound. Current AVS methods rely on costly fine-grained annotations of mask-audio pairs, making them impractical for scalability. To address this, we propose the Modality Correspondence Alignment (MoCA) framework, which seamlessly integrates off-the-shelf foundation models like DINO, SAM, and ImageBind. Our approach leverages existing knowledge within these models and optimizes their joint usage for multimodal associations. Our approach relies on estimating positive and negat
36

Upadhyay, Shrikant, Mohit Kumar, Aditi Upadhyay, et al. "Digital Image Identification and Verification Using Maximum and Preliminary Score Approach with Watermarking for Security and Validation Enhancement." Electronics 12, no. 7 (2023): 1609. http://dx.doi.org/10.3390/electronics12071609.

Annotation:
Digital face approaches have recently received considerable attention because of their wide variety of digital audio and visual applications. Digitized images are increasingly communicated over insecure media such as the Internet. Consequently, defence, clinical, medical, and other specially supervised images need to be protected against tampering, since such tampering could undermine decisions based on those images. So, to protect the originality of digital audio/visual images, several approaches have been proposed. Such techniques incorporate tr
37

Setiawati, Hani Martha Puji, Steaven Octavianus, and Dwi Novita Sari. "Penggunaan Media Audio Visual dalam Pengajaran Sekolah Minggu di Gereja Kemah Tabernakel, Bumiayu, Salatiga." Jurnal EFATA: Jurnal Teologi dan Pelayanan 8, no. 1 (2022): 59–70. http://dx.doi.org/10.47543/efata.v8i1.58.

Annotation:
Education in the era of Industrial Revolution 4.0 has advanced in terms of technology. Sunday school teaching, which initially used only the lecture method, now uses a variety of methods. The audio-visual teaching method combines image and sound; by engaging the child's senses of hearing and sight, it makes the material presented easier to understand. Audio-visual media does not depend only on laptops or LCD projectors; teachers can provide alternative audio-visual methods using painted or printed images accompanied by spiritual song instruments using speakers or gadgets. It makes Sund
38

Chaturvedi, Rajnish Kumar, Dinesh Prasad Sahu, Manoj Kumar Tyagi, et al. "Visual object detection using audio data." Journal of Physics: Conference Series 2664, no. 1 (2023): 012006. http://dx.doi.org/10.1088/1742-6596/2664/1/012006.

Annotation:
Abstract: Nowadays, the Internet of Things (IoT) and Machine Learning (ML) are growing fields. One application of these two fields is object detection, which detects semantic objects using digital images and videos of classes like humans, vehicles, buildings, etc. Visual object detection systems are very effective and accurate due to the appearance information obtained from the cameras. But they face the problem of a limited Field of View. This paper aims to tackle this issue by using audio data to localize the object. A microphone is used to estimate the angular position of the object emitting the
39

Li, Jialu. "Integrating Multimodal Data for Deep Learning-Based Facial Emotion Recognition." Highlights in Science, Engineering and Technology 124 (February 18, 2025): 362–67. https://doi.org/10.54097/gpy08650.

Annotation:
With the rapid development of neural networks, emotion recognition has become a research area of great concern. It has important applications not only in marketing and human-computer interaction but also holds significant importance for improving emotional computing and user experience. This paper studies various methods for emotion recognition in images and videos, utilizing convolutional neural networks (CNN), multi-layer perceptron (MLP), and fusion models. The Facial Expression Recognition 2013 (FER2013) image dataset and the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVD
40

Muhammad Ikhsan and Muhammad Syafiq Humaisi. "PEMANFAATAN MEDIA PEMBELAJARAN AUDIO VISUAL DALAM MENGEMBANGKAN MOTIVASI BELAJAR SISWA PADA MATA PELAJARAN IPS TERPADU." JIIPSI: Jurnal Ilmiah Ilmu Pengetahuan Sosial Indonesia 1, no. 1 (2021): 1–12. http://dx.doi.org/10.21154/jiipsi.v1i1.45.

Annotation:
Abstract: This research was motivated by the researcher's curiosity about the use of audio-visual learning media at SMP Negeri 1 Jenangan. Audio-visual media are media that have both sound and image elements. This type of media can provide a better learning experience because it includes sound and images. To support the success of this media, adequate and serviceable facilities and infrastructure are required. A number of tools used by teachers in conveying concepts, ideas, and experiences that are captured by the senses of sight and hearing
41

Akiyama, Asahi, Miki Yonemura, and Shinichi Sakamoto. "Experimental study on the effect of visual information of source image to evaluate the annoyance of aircraft noise." INTER-NOISE and NOISE-CON Congress and Conference Proceedings 268, no. 8 (2023): 419–25. http://dx.doi.org/10.3397/in_2023_0074.

Annotation:
Laboratory tests in anechoic chambers are effective for evaluating environmental noise. In the actual environment, it is said that human reactions are greatly influenced not only by a sense of hearing but also by various senses, especially visual sensation. In this study, we experimentally evaluated the annoyance of environmental noise using an audio-visual simulation system that combines a three-dimensional sound field system in an anechoic room and a video playback system equipped with a dome-shaped screen. Previous studies have shown that the annoyance of traffic noise changes depending on
42

Stanton, Polly. "Sound, listening and the moving image." Qualitative Research Journal 19, no. 1 (2019): 65–71. http://dx.doi.org/10.1108/qrj-12-2018-0019.

Annotation:
Purpose: As an artist working with sound and the moving image, an in-between space is revealed, a flux between two distinct mediums that intersect as temporal experience and sensory synchronisation. The audio–visual relationship is a pattern of constantly shifting moments of connection and discordance, an ephemeral dance of timing and rhythm that binds together to create a cinematic expression of time and event. The paper aims to discuss this issue. Design/methodology/approach: In this paper, the author will consider the audio-visual event and the space that exists between the visual and the son
43

Zhu, Ying-Xin, and Hao-Ran Jin. "Speaker Localization Based on Audio-Visual Bimodal Fusion." Journal of Advanced Computational Intelligence and Intelligent Informatics 25, no. 3 (2021): 375–82. http://dx.doi.org/10.20965/jaciii.2021.p0375.

Annotation:
The demand for fluency in human–computer interaction is increasing globally; thus, the active localization of the speaker by the machine has become a problem worth exploring. Considering that the stability and accuracy of the single-mode localization method are low, while the multi-mode localization method can utilize the redundancy of information to improve accuracy and anti-interference, a speaker localization method based on voice and image multimodal fusion is proposed. First, the voice localization method based on time differences of arrival (TDOA) in a microphone array and the face d
44

Park, Se Jin, Minsu Kim, Joanna Hong, Jeongsoo Choi, and Yong Man Ro. "SyncTalkFace: Talking Face Generation with Precise Lip-Syncing via Audio-Lip Memory." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 2 (2022): 2062–70. http://dx.doi.org/10.1609/aaai.v36i2.20102.

Annotation:
The challenge of talking face generation from speech lies in aligning two different modal information, audio and video, such that the mouth region corresponds to input audio. Previous methods either exploit audio-visual representation learning or leverage intermediate structural information such as landmarks and 3D models. However, they struggle to synthesize fine details of the lips varying at the phoneme level as they do not sufficiently provide visual information of the lips at the video synthesis step. To overcome this limitation, our work proposes Audio-Lip Memory that brings in visual in
45

Farenza, Gita, Edy Susanto, and Yenni Fitria. "BASED LEARNING MEDIA IN CLASS X STUDENTS OF SMAN 8 BENGKULU CITY." Journal Of Dehasen Educational Review 3, no. 01 (2022): 9–15. http://dx.doi.org/10.33258/joder.v3i01.2137.

Annotation:
The purpose of this research is to understand the utilization of audio-visual-based learning media among Class X students of SMAN 8 Bengkulu City. The qualitative research method examines this utilization practically and objectively. The steps of this research are observation, interviews, and documentation. The results show that the media used and applied in ICT subjects at SMA Negeri 8 Bengkulu City are audio-visual media, because audio-visual media is a medium that involves all the se
46

Geyer, Carolin, Luzia Zimmermann, and Melanie Wyss. "Projected and perceived destination image of the Lake Lucerne Region." Journal of Destination Marketing & Management 33 (July 13, 2024): 100921. https://doi.org/10.5281/zenodo.13777884.

Annotation:
Tourists play an important role in shaping the destination image by posting their experiences on visual and audio-visual channels. Thus, it is increasingly important for a successful destination marketing strategy that destination managers know how their destination is perceived and whether this is aligned with the image they project. This comparative content analysis evaluates the potential gap between the projected and perceived destination image for the Lake Lucerne Region in Switzerland on Instagram. From a database of over 10,000 posts, this study takes a closer look at 300 randomly selec
47

Jamil, Sonain, Fawad, MuhibUr Rahman, et al. "Malicious UAV Detection Using Integrated Audio and Visual Features for Public Safety Applications." Sensors 20, no. 14 (2020): 3923. http://dx.doi.org/10.3390/s20143923.

Annotation:
Unmanned aerial vehicles (UAVs) have become popular in surveillance, security, and remote monitoring. However, they also pose serious security threats to public privacy. The timely detection of a malicious drone is currently an open research issue for security provisioning companies. Recently, the problem has been addressed by a plethora of schemes. However, each plan has a limitation, such as extreme weather conditions and huge dataset requirements. In this paper, we propose a novel framework consisting of the hybrid handcrafted and deep feature to detect and localize malicious drones from th
48

Widiastuti, Rina, and Lastria Nurtanzila. "Membaca Citra Indonesia Dalam Arsip Audio Visual Kementerian Pariwisata." Diplomatika: Jurnal Kearsipan Terapan 2, no. 1 (2018): 44. http://dx.doi.org/10.22146/diplomatika.35300.

Annotation:
This article aims to explore the image of Indonesia created by the Ministry of Tourism through the audio-visual archive, which is stored and published on its official YouTube channel. This study responds to the Indonesian government's policy to improve the image of Indonesia and build a strong nation brand for Indonesia to compete at the global level. Using the content analysis method, I analyzed the videos on the Indonesia.Travel channel to reveal the image of Indonesia promoted by the Indonesian government. Based on the icons that appear in the 218 videos on the YouTube channel, we can fin
49

Tan, Yun, Chunzhi Li, Jiaohua Qin, Youyuan Xue, and Xuyu Xiang. "Medical Image Description Based on Multimodal Auxiliary Signals and Transformer." International Journal of Intelligent Systems 2024 (February 13, 2024): 1–12. http://dx.doi.org/10.1155/2024/6680546.

Annotation:
Medical image description can be applied to clinical medical diagnosis, but the field still faces serious challenges. There is a serious problem of visual and textual data bias in medical datasets, namely the imbalanced distribution of health and disease data. This can greatly affect the learning performance of data-driven neural networks and finally lead to errors in the generated medical image descriptions. To address this problem, we propose a new medical image description network architecture named multimodal data-assisted knowledge fusion network (MDAKF), which introduces multimodal au
50

Lax-López, María. "Ecclesiastic Audio Description: The Church from a Semiotic and Translation Perspective." Íkala, Revista de Lenguaje y Cultura 28, no. 3 (2023): 1–15. http://dx.doi.org/10.17533/udea.ikala.351603.

Annotation:
Audio description (AD) is a type of accessible translation consisting of a process of intersemiotic translations from images into words. Its objective is, mainly, allowing people with visual disabilities to create a mental image of the things they cannot perceive visually. In this study, we will address a type of AD that has not been explored in academic and professional contexts: church AD, where the source text is the church, understood to be the architectonic structure used for Christian worship. Our aim is to provide a basis for the study and practice of church AD. To this end, we propose