Journal articles on the topic 'Zero-shot style transfer'

Consult the top 50 journal articles for your research on the topic 'Zero-shot style transfer.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online, whenever these are available in the metadata.

Browse journal articles from a wide variety of disciplines and organise your bibliography correctly.

1. Yao, Jixun, Yuguang Yang, Yu Pan, et al. "StableVC: Style Controllable Zero-Shot Voice Conversion with Conditional Flow Matching." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 24 (2025): 25669–77. https://doi.org/10.1609/aaai.v39i24.34758.

Abstract: Zero-shot voice conversion (VC) aims to transfer the timbre from the source speaker to an arbitrary unseen speaker while preserving the original linguistic content. Despite recent advancements in zero-shot VC using language-model-based or diffusion-based approaches, several challenges remain: 1) current approaches primarily focus on adapting timbre from unseen speakers and are unable to transfer style and timbre to different unseen speakers independently; 2) these approaches often suffer from slower inference speeds due to the autoregressive modeling methods or the need for numerous sampling …
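Entry 1's conditional flow matching can be pictured as learning a velocity field over speech features and integrating it from noise toward the target. The sketch below is a generic Euler ODE sampler under that framing, not StableVC itself; `v_theta` and the contents of `cond` are illustrative assumptions.

```python
import torch

@torch.no_grad()
def flow_matching_sample(v_theta, cond, shape, steps=16):
    """Generic conditional flow-matching sampler: integrate dx/dt = v(x, t, cond).

    v_theta: trained velocity-field network (hypothetical signature).
    cond:    conditioning, e.g. content, timbre and style embeddings.
    shape:   output shape, e.g. (batch, mel_bins, frames).
    """
    x = torch.randn(shape)                    # t = 0: pure Gaussian noise
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((shape[0],), i * dt)   # current time, one scalar per sample
        x = x + v_theta(x, t, cond) * dt      # Euler step along the learned flow
    return x                                  # t = 1: e.g. a mel-spectrogram

```

Because sampling is a short deterministic ODE solve rather than token-by-token autoregression, inference cost grows only with `steps`, which is the speed advantage the abstract alludes to.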
2. Zhang, Yu, Rongjie Huang, Ruiqi Li, et al. "StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 17 (2024): 19597–605. http://dx.doi.org/10.1609/aaai.v38i17.29932.

Abstract: Style transfer for out-of-domain (OOD) singing voice synthesis (SVS) focuses on generating high-quality singing voices with unseen styles (such as timbre, emotion, pronunciation, and articulation skills) derived from reference singing voice samples. However, the endeavor to model the intricate nuances of singing voice styles is an arduous task, as singing voices possess a remarkable degree of expressiveness. Moreover, existing SVS methods encounter a decline in the quality of synthesized singing voices in OOD scenarios, as they rest upon the assumption that the target vocal attributes are …
3. Wang, Zhen, Zihang Lin, Meng Yuan, Yuehu Liu, and Chi Zhang. "Style Nursing with Spatial and Semantic Guidance for Zero-Shot Traffic Scene Style Transfer." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 8 (2025): 8214–22. https://doi.org/10.1609/aaai.v39i8.32886.

Abstract: Recent advances in text-to-image diffusion models have shown an outstanding ability in zero-shot style transfer. However, existing methods often struggle to balance preserving the semantic content of the input image and faithfully transferring the target style in line with the edit prompt. Especially when applied to complex traffic scenes with diverse objects, layouts, and stylistic variations, current diffusion models tend to exhibit Style Neglection, i.e., failing to generate the required style in the prompt. To address this issue, we propose Style Nursing, which directs the model to focus on …
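A common way to make a text-to-image U-Net "focus" on style words is to upweight their cross-attention logits. The sketch below shows only that plausible mechanism, not the published Style Nursing method; all tensor names and the boosting scheme are assumptions.

```python
import torch
import torch.nn.functional as F

def boosted_cross_attention(q, k, v, style_token_ids, boost=2.0):
    """Cross-attention where logits for style-related prompt tokens are amplified.

    q: (batch, n_img, d) image-patch queries; k, v: (batch, n_txt, d) prompt keys/values.
    style_token_ids: indices of the prompt tokens describing the target style.
    Illustrative only -- not the paper's guidance scheme.
    """
    logits = q @ k.transpose(-1, -2) / q.shape[-1] ** 0.5  # (batch, n_img, n_txt)
    logits[..., style_token_ids] *= boost                  # push attention toward style words
    attn = F.softmax(logits, dim=-1)
    return attn @ v
```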
4. Xi, Jier, Xiufen Ye, and Chuanlong Li. "Sonar Image Target Detection Based on Style Transfer Learning and Random Shape of Noise under Zero Shot Target." Remote Sensing 14, no. 24 (2022): 6260. http://dx.doi.org/10.3390/rs14246260.

Abstract: With the development of sonar technology, sonar images have been widely used to detect targets. However, sonar images present many challenges for object detection. For example, the detectable targets in sonar data are sparser than those in optical images, real underwater scanning experiments are complicated, and the sonar image styles produced by different types of sonar equipment are inconsistent owing to their differing characteristics, which makes them difficult to use in sonar object detection and recognition algorithms. In order to solve these problems, we propose …
5. Wang, Wenjing, Jizheng Xu, Li Zhang, Yue Wang, and Jiaying Liu. "Consistent Video Style Transfer via Compound Regularization." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (2020): 12233–40. http://dx.doi.org/10.1609/aaai.v34i07.6905.

Abstract: Recently, neural style transfer has drawn much attention and significant progress has been made, especially for image style transfer. However, flexible and consistent style transfer for videos remains a challenging problem. Existing training strategies, which either use a significant amount of video data with optical flow or introduce single-frame regularizers, have limited performance on real videos. In this paper, we propose a novel interpretation of temporal consistency, based on which we analyze the drawbacks of existing training strategies and then derive a new compound regularization …
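Temporal consistency in video style transfer is conventionally measured by warping the previous stylized frame with optical flow and penalizing the difference in non-occluded regions. Here is a minimal sketch of that standard loss (the paper derives a different, compound regularizer); `warp` and `occlusion_mask` are assumed helpers.

```python
import torch

def temporal_consistency_loss(stylized_t, stylized_prev, flow, occlusion_mask, warp):
    """Penalize flicker: stylized frame t should match the flow-warped frame t-1.

    stylized_t, stylized_prev: (B, C, H, W) stylized frames.
    flow:           optical flow from frame t-1 to frame t.
    occlusion_mask: (B, 1, H, W), 1 where the flow is valid (no occlusion).
    warp:           a differentiable backward-warping function (assumed given).
    """
    warped_prev = warp(stylized_prev, flow)
    diff = (stylized_t - warped_prev).abs() * occlusion_mask
    return diff.sum() / occlusion_mask.sum().clamp(min=1.0)
```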
6. Park, Jangkyoung, Ammar Ul Hassan, and Jaeyoung Choi. "CCFont: Component-Based Chinese Font Generation Model Using Generative Adversarial Networks (GANs)." Applied Sciences 12, no. 16 (2022): 8005. http://dx.doi.org/10.3390/app12168005.

Abstract: Font generation using deep learning has made considerable progress using image style transfer, but the automatic conversion/generation of Chinese characters remains a difficult task owing to the complex character shapes and the large number of Chinese characters. Most known Chinese character generation models use image conversion of the Chinese character shape itself; however, it is difficult to reproduce complex Chinese characters this way. Recent methods have utilized character compositionality by separating characters into up to three or four components to improve the quality of generated characters, but …
7. An, Tianbo, Pingping Yan, Jiaai Zuo, Xing Jin, Mingliang Liu, and Jingrui Wang. "Enhancing Cross-Lingual Sarcasm Detection by a Prompt Learning Framework with Data Augmentation and Contrastive Learning." Electronics 13, no. 11 (2024): 2163. http://dx.doi.org/10.3390/electronics13112163.

Abstract: Given their intricate nature and inherent ambiguity, sarcastic texts often mask deeper emotions, making it challenging to discern the genuine feelings behind the words. The sarcasm detection task was proposed to help us understand the speaker's true intention more accurately. Advanced methods, such as deep learning and neural networks, are widely used in the field of sarcasm detection. However, most research focuses on sarcastic texts in English, as other languages lack corpora and annotated datasets. To address the challenge of low-resource languages in sarcasm detection …
8. Azizah, Kurniawati, and Wisnu Jatmiko. "Transfer Learning, Style Control, and Speaker Reconstruction Loss for Zero-Shot Multilingual Multi-Speaker Text-to-Speech on Low-Resource Languages." IEEE Access 10 (2022): 5895–911. http://dx.doi.org/10.1109/access.2022.3141200.

9. Güitta-López, Lucía, Lionel Güitta-López, Jaime Boal, and Álvaro J. López-López. "Sim-to-real transfer via a Style-Identified Cycle Consistent Generative Adversarial Network: Zero-shot deployment on robotic manipulators through visual domain adaptation." Engineering Applications of Artificial Intelligence 159 (November 2025): 111510. https://doi.org/10.1016/j.engappai.2025.111510.

10. Cho, Kyusik, Dong Yeop Kim, and Euntai Kim. "Zero-Shot Scene Change Detection." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 3 (2025): 2509–17. https://doi.org/10.1609/aaai.v39i3.32253.

Abstract: We present a novel, training-free approach to scene change detection. Our method leverages tracking models, which inherently perform change detection between consecutive frames of video by identifying common objects and detecting new or missing objects. Specifically, our method takes advantage of the change-detection effect of the tracking model by inputting reference and query images instead of consecutive frames. Furthermore, we focus on the content gap and style gap between the two input images in change detection, and address both issues by proposing an adaptive content threshold and style bridging …
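Entry 10 repurposes a tracker by feeding it a (reference, query) pair instead of consecutive frames: objects matched across the pair are unchanged, unmatched ones are changes. The toy sketch below reproduces only that matching logic with plain IoU association; the tracker outputs (`ref_boxes`, `query_boxes`) are assumed inputs, and this is not the paper's pipeline.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def detect_changes(ref_boxes, query_boxes, thr=0.5):
    """Boxes that fail to match across the reference/query pair are flagged as changes."""
    matched_q = set()
    for r in ref_boxes:
        best = max(range(len(query_boxes)),
                   key=lambda j: iou(r, query_boxes[j]), default=None)
        if best is not None and iou(r, query_boxes[best]) >= thr:
            matched_q.add(best)
    appeared = [q for j, q in enumerate(query_boxes) if j not in matched_q]
    disappeared = [r for r in ref_boxes
                   if all(iou(r, q) < thr for q in query_boxes)]
    return appeared, disappeared
```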
11. Li, Wenqian, Pengfei Fang, and Hui Xue. "SVasP: Self-Versatility Adversarial Style Perturbation for Cross-Domain Few-Shot Learning." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 15 (2025): 15275–83. https://doi.org/10.1609/aaai.v39i15.33676.

Abstract: Cross-Domain Few-Shot Learning (CD-FSL) aims to transfer knowledge from seen source domains to unseen target domains, which is crucial for evaluating the generalization and robustness of models. Recent studies focus on utilizing visual styles to bridge the gap between different domains. However, style-based CD-FSL methods suffer from a serious dilemma of gradient instability and a local-optimization problem. This paper addresses these issues and proposes a novel crop-global style perturbation method, called Self-Versatility Adversarial Style Perturbation (SVasP), which enhances the …
12. Yang, Zhenhua, Dezhi Peng, Yuxin Kong, Yuyi Zhang, Cong Yao, and Lianwen Jin. "FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 7 (2024): 6603–11. http://dx.doi.org/10.1609/aaai.v38i7.28482.

Abstract: Automatic font generation is an imitation task, which aims to create a font library that mimics the style of reference images while preserving the content from source images. Although existing font generation methods have achieved satisfactory performance, they still struggle with complex characters and large style variations. To address these issues, we propose FontDiffuser, a diffusion-based image-to-image one-shot font generation method, which innovatively models the font imitation task as a noise-to-denoise paradigm. In our method, we introduce a Multi-scale Content Aggregation (MCA) block …
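FontDiffuser (entry 12) casts font generation as a noise-to-denoise process. Below is a generic conditional DDPM reverse step as a hedged illustration of that paradigm only; the conditioning carried by `cond` (content glyph and style reference features) and all signatures are assumptions, not the paper's architecture.

```python
import torch

@torch.no_grad()
def ddpm_reverse_step(eps_model, x_t, t, cond, alphas, alphas_bar, betas):
    """One ancestral sampling step x_t -> x_{t-1} of a conditional DDPM.

    eps_model: predicts the noise eps_theta(x_t, t, cond) (hypothetical network).
    alphas, alphas_bar, betas: the usual DDPM schedules, indexed by integer t.
    """
    eps = eps_model(x_t, t, cond)
    # Posterior mean: (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
    mean = (x_t - betas[t] / (1 - alphas_bar[t]).sqrt() * eps) / alphas[t].sqrt()
    if t == 0:
        return mean                                   # final, noise-free sample
    return mean + betas[t].sqrt() * torch.randn_like(x_t)
```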
13. Cheng, Jikang, Zhen Han, Zhongyuan Wang, and Liang Chen. "“One-Shot” Super-Resolution via Backward Style Transfer for Fast High-Resolution Style Transfer." IEEE Signal Processing Letters 28 (2021): 1485–89. http://dx.doi.org/10.1109/lsp.2021.3098230.

14. Yu, Yong. "Few Shot POP Chinese Font Style Transfer using CycleGAN." Journal of Physics: Conference Series 2171, no. 1 (2022): 012031. http://dx.doi.org/10.1088/1742-6596/2171/1/012031.

Abstract: Designing new styles for Chinese fonts is an arduous task, because there are many types of commonly used Chinese characters and the composition of Chinese characters is complicated. Therefore, GAN-based style transfer of Chinese characters has become a research hotspot in the past two years. This line of research is dedicated to using a small number of artificially designed new-style fonts and learning the mapping from the source font style domain to the target style domain. However, such methods have two problems: 1. the performance on POP (point-of-purchase) fonts with exaggerated …
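CycleGAN (entry 14) learns unpaired mappings between two font domains with adversarial losses plus a cycle-consistency term. A compact sketch of the cycle term follows; the generator modules `g_xy` and `g_yx` are assumed to be built and trained elsewhere, and this omits the adversarial part.

```python
import torch.nn.functional as F

def cycle_consistency_loss(g_xy, g_yx, real_x, real_y, lam=10.0):
    """CycleGAN cycle term: lam * (||G_yx(G_xy(x)) - x||_1 + ||G_xy(G_yx(y)) - y||_1).

    g_xy: generator mapping source-style glyph images to the new (POP) style.
    g_yx: generator mapping back; both are assumed nn.Module instances.
    """
    rec_x = g_yx(g_xy(real_x))   # x -> target style -> reconstructed x
    rec_y = g_xy(g_yx(real_y))   # y -> source style -> reconstructed y
    return lam * (F.l1_loss(rec_x, real_x) + F.l1_loss(rec_y, real_y))
```

The cycle term is what lets training proceed without paired glyphs: each generator must preserve enough character structure for the other to undo it.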
15. Zhang, Xin, Peiliang Zhang, Jingling Yuan, and Lin Li. "Zero-Shot Learning for Materials Science Texts: Leveraging Duck Typing Principles." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 1 (2025): 1129–37. https://doi.org/10.1609/aaai.v39i1.32100.

Abstract: Materials science text mining (MSTM), involving tasks like property extraction and synthesis-action retrieval, is pivotal for advancing research by deriving critical insights from the scientific literature. Descriptors, serving as essential task labels, often vary in meaning depending on researchers' purposes across different mining tasks (e.g., 'Material' can refer both to synthesis components and to participants in a fuel cell experiment). This difference in meaning makes it difficult for existing methods, fine-tuned to a specific task, to handle the same descriptors in other tasks. To overcome …
16. Zhu, Anna, Xiongbo Lu, Xiang Bai, Seiichi Uchida, Brian Kenji Iwana, and Shengwu Xiong. "Few-Shot Text Style Transfer via Deep Feature Similarity." IEEE Transactions on Image Processing 29 (2020): 6932–46. http://dx.doi.org/10.1109/tip.2020.2995062.

17. Feng, Wancheng, Yingchao Liu, Jiaming Pei, Wenxuan Liu, Chunpeng Tian, and Lukun Wang. "Local Consistency Guidance: Personalized Stylization Method of Face Video (Student Abstract)." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 21 (2024): 23486–87. http://dx.doi.org/10.1609/aaai.v38i21.30440.

Abstract: Face video stylization aims to convert real face videos into specified reference styles. While one-shot methods perform well in single-image stylization, ensuring continuity between frames and retaining the original facial expressions present challenges in video stylization. To address these issues, our approach employs a personalized diffusion model with pixel-level control. We propose a Local Consistency Guidance (LCG) strategy, composed of local cross-attention and local style transfer, to ensure temporal consistency. This framework enables the synthesis of high-quality stylized face videos with …
18. Pang, Sizhe, Xinyuan Chen, Yangchen Xie, Hongjian Zhan, Bing Yin, and Yue Lu. "Diff-TST: Diffusion model for one-shot text-image style transfer." Expert Systems with Applications 263 (March 2025): 125747. http://dx.doi.org/10.1016/j.eswa.2024.125747.

19. Cífka, Ondřej, Umut Şimşekli, and Gaël Richard. "Groove2Groove: One-Shot Music Style Transfer With Supervision From Synthetic Data." IEEE/ACM Transactions on Audio, Speech, and Language Processing 28 (2020): 2638–50. http://dx.doi.org/10.1109/taslp.2020.3019642.

20. Wu, Xinyi, Zhenyao Wu, Yuhang Lu, Lili Ju, and Song Wang. "Style Mixing and Patchwise Prototypical Matching for One-Shot Unsupervised Domain Adaptive Semantic Segmentation." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 3 (2022): 2740–49. http://dx.doi.org/10.1609/aaai.v36i3.20177.

Abstract: In this paper, we tackle the problem of one-shot unsupervised domain adaptation (OSUDA) for semantic segmentation, where the segmentor sees only one unlabeled target image during training. In this case, traditional unsupervised domain adaptation models usually fail, since they cannot adapt to the target domain while over-fitting to one (or few) target samples. To address this problem, existing OSUDA methods usually integrate a style-transfer module to perform domain randomization based on the unlabeled target sample, with which multiple domains around the target sample can be explored during training …
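Style mixing of the kind entry 20 builds on is usually implemented over feature statistics: a feature map's channel-wise mean and standard deviation encode style, so interpolating them between source and target randomizes style while keeping content. A hedged AdaIN-style sketch (a generic mechanism, not the paper's exact module):

```python
import torch

def mix_style(feat_src, feat_tgt, alpha=0.5, eps=1e-5):
    """Re-normalize source features with statistics interpolated toward the target.

    feat_src, feat_tgt: (B, C, H, W) intermediate CNN feature maps.
    alpha: 0 keeps the source style, 1 fully adopts the target style.
    """
    mu_s = feat_src.mean(dim=(2, 3), keepdim=True)
    sd_s = feat_src.std(dim=(2, 3), keepdim=True) + eps
    mu_t = feat_tgt.mean(dim=(2, 3), keepdim=True)
    sd_t = feat_tgt.std(dim=(2, 3), keepdim=True) + eps
    mu_mix = (1 - alpha) * mu_s + alpha * mu_t
    sd_mix = (1 - alpha) * sd_s + alpha * sd_t
    return (feat_src - mu_s) / sd_s * sd_mix + mu_mix
```

Sampling `alpha` randomly per batch is one cheap way to explore "multiple domains around the target sample" from a single target image.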
21. Ibrahim, Bekkouch Imad Eddine, Victoria Eyharabide, Valérie Le Page, and Frédéric Billiet. "Few-Shot Object Detection: Application to Medieval Musicological Studies." Journal of Imaging 8, no. 2 (2022): 18. http://dx.doi.org/10.3390/jimaging8020018.

Abstract: Detecting objects with a small representation in images is a challenging task, especially when the style of the images is very different from recent photos, which is the case for cultural heritage datasets. This problem is commonly known as few-shot object detection and is still a new field of research. This article presents a simple and effective method for black-box few-shot object detection that works with all the current state-of-the-art object detection models. We also present a new dataset called MMSD for medieval musicological studies that contains five classes and 693 samples, manually …
22. Liang, Shixiong, Ruohua Zhou, and Qingsheng Yuan. "ECE-TTS: A Zero-Shot Emotion Text-to-Speech Model with Simplified and Precise Control." Applied Sciences 15, no. 9 (2025): 5108. https://doi.org/10.3390/app15095108.

Abstract: Significant advances have been made in emotional speech synthesis technology; however, existing models still face challenges in achieving fine-grained emotion style control and simple yet precise emotion intensity regulation. To address these issues, we propose Easy-Control Emotion Text-to-Speech (ECE-TTS), a zero-shot TTS model built upon the F5-TTS architecture, simplifying emotion modeling while maintaining accurate control. ECE-TTS leverages pretrained emotion recognizers to extract Valence, Arousal, and Dominance (VAD) values, transforming them into Emotion-Adaptive Spherical Vectors …
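Entry 22 maps Valence–Arousal–Dominance (VAD) predictions onto a spherical representation. The published Emotion-Adaptive Spherical Vector is not specified here, so the following is only a hedged guess at what such a mapping could look like: a plain Cartesian-to-spherical conversion where the radius plays the role of emotion intensity.

```python
import math

def vad_to_spherical(valence, arousal, dominance):
    """Convert a VAD triple (each roughly in [-1, 1]) to (r, theta, phi).

    r approximates emotion intensity; theta/phi encode the emotion "direction".
    Illustrative only -- not the published EASV definition.
    """
    r = math.sqrt(valence**2 + arousal**2 + dominance**2)
    theta = math.atan2(arousal, valence)                # azimuth in the V-A plane
    phi = math.acos(dominance / r) if r > 0 else 0.0    # polar angle from the D axis
    return r, theta, phi

print(vad_to_spherical(0.8, 0.5, -0.2))  # e.g. a pleasant, fairly aroused state
```

A spherical form decouples "how intense" (radius) from "which emotion" (angles), which is one plausible reading of the intensity-control claim in the abstract.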
23. Rong, Yan, and Li Liu. "Seeing Your Speech Style: A Novel Zero-Shot Identity-Disentanglement Face-based Voice Conversion." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 23 (2025): 25092–100. https://doi.org/10.1609/aaai.v39i23.34694.

Abstract: Face-based Voice Conversion (FVC) is a novel task that leverages facial images to generate the target speaker's voice style. Previous work has two shortcomings: (1) it struggles to obtain facial embeddings that are well aligned with the speaker's voice identity, and (2) it is inadequate at decoupling content and speaker-identity information from the audio input. To address these issues, we present a novel FVC method, Identity-Disentanglement Face-based Voice Conversion (ID-FaceVC), which overcomes the above two limitations. More precisely, we propose an Identity-Aware Query-based …
24. Yang, Ze, Yali Wang, Xianyu Chen, Jianzhuang Liu, and Yu Qiao. "Context-Transformer: Tackling Object Confusion for Few-Shot Detection." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (2020): 12653–60. http://dx.doi.org/10.1609/aaai.v34i07.6957.

Abstract: Few-shot object detection is a challenging but realistic scenario, where only a few annotated training images are available for training detectors. A popular approach to this problem is transfer learning, i.e., fine-tuning a detector pretrained on a source-domain benchmark. However, such a transferred detector often fails to recognize new objects in the target domain, due to the low data diversity of training samples. To tackle this problem, we propose a novel Context-Transformer within a concise deep transfer framework. Specifically, Context-Transformer can effectively leverage source-domain …
25. Weng, Shao-En, Hong-Han Shuai, and Wen-Huang Cheng. "Zero-Shot Face-Based Voice Conversion: Bottleneck-Free Speech Disentanglement in the Real-World Scenario." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 11 (2023): 13718–26. http://dx.doi.org/10.1609/aaai.v37i11.26607.

Abstract: Often, a face has a voice: appearance sometimes has a strong relationship with one's voice. In this work, we study how a face can be converted to a voice, i.e., face-based voice conversion. Since there is no clean dataset that contains both face and speech, voice conversion faces difficult learning and low-quality problems caused by background noise or echo. Too much redundant information for face-to-voice also causes synthesis of a generic style of speech. Furthermore, previous work tried to disentangle speech with bottleneck adjustment; however, it is hard to decide on the size of the bottleneck …
26. Dou, Zi-Yi, and Nanyun Peng. "Zero-Shot Commonsense Question Answering with Cloze Translation and Consistency Optimization." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 10 (2022): 10572–80. http://dx.doi.org/10.1609/aaai.v36i10.21301.

Abstract: Commonsense question answering (CQA) aims to test if models can answer questions regarding commonsense knowledge that everyone knows. Prior works that incorporate external knowledge bases have shown promising results, but knowledge bases are expensive to construct and are often limited to a fixed set of relations. In this paper, we instead focus on better utilizing the implicit knowledge stored in pre-trained language models. While researchers have found that the knowledge embedded in pre-trained language models can be extracted by having them fill in the blanks of carefully designed prompts …
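The cloze-translation idea in entry 26 rewrites a multiple-choice question as a fill-in statement and scores each candidate answer with a pretrained LM. A minimal sketch with Hugging Face transformers, ranking candidates by causal-LM log-likelihood; the cloze template and candidates are invented examples, and this is not the paper's exact pipeline (which also adds consistency optimization).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def score(sentence):
    """Negative average cross-entropy of the sentence under the LM (higher = more plausible)."""
    ids = tok(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        out = lm(ids, labels=ids)   # HF computes the shifted LM loss internally
    return -out.loss.item()

# "Where would you put a plate after washing it?" rewritten as a cloze statement:
cloze = "After washing a plate, you would put it in the {}."
answers = ["cupboard", "oven", "garden"]
best = max(answers, key=lambda a: score(cloze.format(a)))
print(best)  # expected: "cupboard"
```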
27. Men, Yifang, Yuan Yao, Miaomiao Cui, Zhouhui Lian, and Xuansong Xie. "DCT-Net: Domain-Calibrated Translation for Portrait Stylization." ACM Transactions on Graphics 41, no. 4 (2022): 1–9. http://dx.doi.org/10.1145/3528223.3530159.

Abstract: This paper introduces DCT-Net, a novel image translation architecture for few-shot portrait stylization. Given limited style exemplars (~100), the new architecture can produce high-quality style transfer results with an advanced ability to synthesize high-fidelity contents and strong generality to handle complicated scenes (e.g., occlusions and accessories). Moreover, it enables full-body image translation via one elegant evaluation network trained on partial observations (i.e., stylized heads). Few-shot-learning-based style transfer is challenging, since the learned model can easily become overfitted …
28. Wang, Xin, Jiawei Wu, Da Zhang, Yu Su, and William Yang Wang. "Learning to Compose Topic-Aware Mixture of Experts for Zero-Shot Video Captioning." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 8965–72. http://dx.doi.org/10.1609/aaai.v33i01.33018965.

Abstract: Although promising results have been achieved in video captioning, existing models are limited to the fixed inventory of activities in the training corpus and do not generalize to open-vocabulary scenarios. Here we introduce a novel task, zero-shot video captioning, that aims at describing out-of-domain videos of unseen activities. Videos of different activities usually require different captioning strategies in many aspects, i.e., word selection, semantic construction, style expression, etc., which poses a great challenge to depicting novel activities without paired training data. But meanwhile …
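A topic-aware mixture of experts, as in entry 28, weights expert sub-networks by a topic distribution. The following is a generic soft-gating sketch with invented dimensions, not the paper's captioning architecture:

```python
import torch
import torch.nn as nn

class SoftMoE(nn.Module):
    """Output = sum_k gate_k(topic) * expert_k(x): topic-conditioned soft gating."""

    def __init__(self, d_in, d_out, d_topic, n_experts=4):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(d_in, d_out) for _ in range(n_experts))
        self.gate = nn.Linear(d_topic, n_experts)

    def forward(self, x, topic_emb):
        w = torch.softmax(self.gate(topic_emb), dim=-1)           # (B, K) expert weights
        outs = torch.stack([e(x) for e in self.experts], dim=1)   # (B, K, d_out)
        return (w.unsqueeze(-1) * outs).sum(dim=1)

moe = SoftMoE(d_in=16, d_out=8, d_topic=4)
y = moe(torch.randn(2, 16), torch.randn(2, 4))  # -> shape (2, 8)
```

Because the gate is driven by the topic embedding rather than learned per activity class, unseen activities can still be captioned by recombining existing experts, which is the zero-shot angle.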
29. Hsiao, Teng-Fang, Bo-Kai Ruan, and Hong-Han Shuai. "Training-and-Prompt-Free General Painterly Harmonization via Zero-Shot Disentanglement on Style and Content References." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 4 (2025): 3545–53. https://doi.org/10.1609/aaai.v39i4.32368.

Abstract: Painterly image harmonization aims at seamlessly blending disparate visual elements within a single image. However, previous approaches often struggle due to limitations in training data or reliance on additional prompts, leading to inharmonious and content-disrupted output. To surmount these hurdles, we design a Training-and-prompt-Free General Painterly Harmonization method (TF-GPH). TF-GPH incorporates a novel “Similarity Disentangle Mask”, which disentangles the foreground content and background image by redirecting their attention to corresponding reference images, enhancing the attention …
30. Zhang, Chao, Hongbin Dong, and Baosong Deng. "Improving Pre-Training and Fine-Tuning for Few-Shot SAR Automatic Target Recognition." Remote Sensing 15, no. 6 (2023): 1709. http://dx.doi.org/10.3390/rs15061709.

Abstract: SAR-ATR (synthetic aperture radar automatic target recognition) is a hot topic in remote sensing. This work suggests a few-shot target recognition approach (FTL) based on the concept of transfer learning to accomplish accurate target recognition of SAR images in a few-shot scenario, since the classic SAR-ATR method has significant data reliance. At the same time, the strategy introduces a model distillation method to further improve the model's performance. This method is composed of three parts. First, the data engine uses the style conversion model and optical image data to generate images …
31. Yao, Mingshuai, Yabo Zhang, Xianhui Lin, Xiaoming Li, and Wangmeng Zuo. "VQ-FONT: Few-Shot Font Generation with Structure-Aware Enhancement and Quantization." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 15 (2024): 16407–15. http://dx.doi.org/10.1609/aaai.v38i15.29577.

Abstract: Few-shot font generation is challenging, as it needs to capture the fine-grained stroke styles from a limited set of reference glyphs and then transfer them to other characters, which are expected to have similar styles. However, due to the diversity and complexity of Chinese font styles, the synthesized glyphs of existing methods usually exhibit visible artifacts, such as missing details and distorted strokes. In this paper, we propose a VQGAN-based framework (i.e., VQ-Font) to enhance glyph fidelity through token prior refinement and structure-aware enhancement. Specifically, we pre-train a VQGAN …
32. Guo, Wei, Yuqi Zhang, De Ma, and Qian Zheng. "Learning to Manipulate Artistic Images." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 3 (2024): 1994–2002. http://dx.doi.org/10.1609/aaai.v38i3.27970.

Abstract: Recent advancement in computer vision has significantly lowered the barriers to artistic creation. Exemplar-based image translation methods have attracted much attention due to their flexibility and controllability. However, these methods hold assumptions regarding semantics or require semantic information as input, while accurate semantics is not easy to obtain in artistic images. Besides, these methods suffer from cross-domain artifacts due to a training data prior and generate imprecise structure due to feature compression in the spatial domain. In this paper, we propose …
33

Pham Ngoc, Phuong, Chung Tran Quang, and Mai Luong Chi. "ADAPT-TTS: HIGH-QUALITY ZERO-SHOT MULTI-SPEAKER TEXT-TO-SPEECH ADAPTIVE-BASED FOR VIETNAMESE." Journal of Computer Science and Cybernetics 39, no. 2 (2023): 159–73. http://dx.doi.org/10.15625/1813-9663/18136.

Abstract: Current adaptive speech synthesis techniques follow two main streams: 1. fine-tuning the model using small amounts of adaptive data, and 2. conditionally training the entire model through a speaker embedding of the target speaker. However, both of these methods require adaptive data to appear during training, which makes the training cost of generating new voices quite expensive. In addition, the traditional TTS model uses a simple loss function to reproduce the acoustic features. However, this optimization is based on incorrect distribution assumptions, leading to noisy composite …
34. V, Sandeep Kumar, Hari Kishore R, Guru Prasadh M, and Divakar R. "One Shot Face Stylization Using GANs." International Journal of Scientific Research in Engineering and Management 07, no. 10 (2023): 1–11. http://dx.doi.org/10.55041/ijsrem26061.

Abstract: One-shot face stylization is an interesting and challenging subject in computer vision and deep learning. This work deals with the art of manipulating a target face using a reference image as inspiration, which requires controlling facial recognition while specifying important style characteristics. This project has attracted a lot of interest due to its potential applications in digital art, entertainment, and personal products. In this abstract, we examine the important features of one-shot face stylization. Deep neural networks, especially generative adversarial networks (GANs), are widely …
35. Wang, Suzhen, Lincheng Li, Yu Ding, and Xin Yu. "One-Shot Talking Face Generation from Single-Speaker Audio-Visual Correlation Learning." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 3 (2022): 2531–39. http://dx.doi.org/10.1609/aaai.v36i3.20154.

Abstract: Audio-driven one-shot talking face generation methods are usually trained on video resources of various persons. However, their created videos often suffer from unnatural mouth shapes and asynchronous lips, because those methods struggle to learn a consistent speech style from different speakers. We observe that it would be much easier to learn a consistent speech style from a specific speaker, which leads to authentic mouth movements. Hence, we propose a novel one-shot talking face generation framework by exploring consistent correlations between audio and visual motions from a specific speaker and …
36. Lee, Suhyeon, Junhyuk Hyun, Hongje Seong, and Euntai Kim. "Unsupervised Domain Adaptation for Semantic Segmentation by Content Transfer." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 9 (2021): 8306–15. http://dx.doi.org/10.1609/aaai.v35i9.17010.

Abstract: In this paper, we tackle unsupervised domain adaptation (UDA) for semantic segmentation, which aims to segment unlabeled real data using labeled synthetic data. The main problem of UDA for semantic segmentation lies in reducing the domain gap between the real and synthetic images. To solve this problem, we focused on separating the information in an image into content and style. Here, only the content has cues for semantic segmentation, and the style creates the domain gap. Thus, precise separation of content and style in an image can act as supervision for real data even when …
37. Song, Yun-Zhu, Yi-Syuan Chen, Lu Wang, and Hong-Han Shuai. "General then Personal: Decoupling and Pre-training for Personalized Headline Generation." Transactions of the Association for Computational Linguistics 11 (2023): 1588–607. http://dx.doi.org/10.1162/tacl_a_00621.

Abstract: Personalized headline generation aims to generate unique headlines tailored to users' browsing history. In this task, understanding user preferences from click history and incorporating them into headline generation pose challenges. Existing approaches typically rely on predefined styles as control codes, but personal style lacks explicit definition or enumeration, making it difficult to leverage traditional techniques. To tackle these challenges, we propose General Then Personal (GTP), a novel framework comprising user modeling, headline generation, and customization. We train the framework …
38. Gong, Rui, Dengxin Dai, Yuhua Chen, Wen Li, Danda Pani Paudel, and Luc Van Gool. "Analogical Image Translation for Fog Generation." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 2 (2021): 1433–41. http://dx.doi.org/10.1609/aaai.v35i2.16233.

Abstract: Image-to-image translation maps images from one given style to another. While exceptionally successful, current methods assume the availability of training images in both the source and target domains, which does not always hold in practice. Inspired by humans' capability of reasoning by analogy, we propose analogical image translation (AIT), which exploits the concept of gist for the first time. Given images of two styles in the source domain, A and A', along with images B of the first style in the target domain, AIT learns a model to translate B to B' in the target domain, such that A:A' :: B:B' …
39. Deng, Lujuan, Jieqing Tan, and Fangmei Liu. "Adapting CLIP for Action Recognition via Dual Semantic Supervision and Temporal Prompt Reparameterization." Electronics 13, no. 16 (2024): 3348. http://dx.doi.org/10.3390/electronics13163348.

Abstract: The contrastive vision–language pre-trained model CLIP, driven by large-scale open-vocabulary image–text pairs, has recently demonstrated remarkable zero-shot generalization in diverse downstream image tasks, which has made numerous models dominated by the “image pre-training followed by fine-tuning” paradigm exhibit promising results on standard video benchmarks. However, as models scale up, a full fine-tuning adaptive strategy for specific tasks becomes difficult in terms of training and storage. In this work, we propose a novel method that adapts CLIP to the video domain for efficient …
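The zero-shot behavior entry 39 builds on is CLIP's: embed class names as text prompts, embed the image (or video frames), and classify by cosine similarity. A minimal sketch with OpenAI's open-source `clip` package follows; the labels, prompt template, and `frame.jpg` path are placeholder assumptions, and the paper's video-specific adaptation is more involved.

```python
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

labels = ["playing guitar", "riding a bike", "cooking"]
text = clip.tokenize([f"a video frame of {l}" for l in labels]).to(device)
image = preprocess(Image.open("frame.jpg")).unsqueeze(0).to(device)

with torch.no_grad():
    img_f = model.encode_image(image)
    txt_f = model.encode_text(text)
    img_f = img_f / img_f.norm(dim=-1, keepdim=True)   # cosine similarity =
    txt_f = txt_f / txt_f.norm(dim=-1, keepdim=True)   # dot product of unit vectors
    probs = (100.0 * img_f @ txt_f.T).softmax(dim=-1)

print(dict(zip(labels, probs[0].tolist())))  # zero-shot label scores
```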
40. Wang, Dingmin, Qiuyuan Huang, Matthew Jackson, and Jianfeng Gao. "Retrieve What You Need: A Mutual Learning Framework for Open-domain Question Answering." Transactions of the Association for Computational Linguistics 12 (2024): 247–63. http://dx.doi.org/10.1162/tacl_a_00646.

Abstract: An open-domain question answering (QA) system usually follows a retrieve-then-read paradigm, in which a retriever is used to retrieve relevant passages from a large corpus, and a reader then generates answers based on the retrieved passages and the original question. In this paper, we propose a simple and novel mutual learning framework to improve the performance of retrieve-then-read-style models via an intermediate module named the knowledge selector, which we train with reinforcement learning. The key benefits of our proposed intermediate module are: 1) no requirement for additional …
41. Zuo, Heda, Weitao You, Junxian Wu, et al. "GVMGen: A General Video-to-Music Generation Model with Hierarchical Attentions." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 21 (2025): 23099–107. https://doi.org/10.1609/aaai.v39i21.34474.

Abstract: Composing music for video is essential yet challenging, leading to a growing interest in automating music generation for video applications. Existing approaches often struggle to achieve robust music-video correspondence and generative diversity, primarily due to inadequate feature-alignment methods and insufficient datasets. In this study, we present the General Video-to-Music Generation model (GVMGen), designed to generate music highly related to the video input. Our model employs hierarchical attentions to extract and align video features with music in both spatial and temporal dimensions, ensuring …
42. Kumar, Vinod K. C., Thamer A. Altaim, Shenbaga Sundaram Subramanian, et al. "Effect of lower body, core and upper body kinematic chain exercise protocol on throwing performance among university shot put athletes: A pilot study." Fizjoterapia Polska 23, no. 3 (2023): 108–15. http://dx.doi.org/10.56984/8zg143r1m.

Abstract: A coordinated sequence of movements is required to generate maximum power and velocity in the shot put. Kinematic chains emphasize the interactions between various body segments during a movement; they suggest that force production and transfer are optimized by coordinating multiple joints and muscle groups. In previous research, the kinematic chain has been linked to shot put performance. Few studies have examined the effects of a comprehensive kinematic-chain exercise protocol on throwing performance among shot put athletes, particularly at universities. This pilot study investigates lower-body, …
43. Zaitsu, Wataru, Mingzhe Jin, Shunichi Ishihara, Satoru Tsuge, and Mitsuyuki Inaba. "Can we spot fake public comments generated by ChatGPT(-3.5, -4)?: Japanese stylometric analysis expose emulation created by one-shot learning." PLOS ONE 19, no. 3 (2024): e0299031. http://dx.doi.org/10.1371/journal.pone.0299031.

Abstract: Public comments are an important channel for civic opinion when the government establishes rules. However, recent AI can easily generate large quantities of disinformation, including fake public comments. We attempted to distinguish between human public comments and ChatGPT-generated public comments (including ChatGPT comments that emulated those of humans) using Japanese stylometric analysis. Study 1 conducted multidimensional scaling (MDS) to compare 500 texts across five classes: human public comments, and GPT-3.5- and GPT-4-generated public comments produced only by presenting the titles of human public comments (i.e., zero-shot …
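Entry 43 separates human from LLM-generated text with stylometric features and multidimensional scaling (MDS). A toy scikit-learn sketch of that analysis pattern follows; the English sample comments and character-n-gram features are stand-in assumptions (the study uses Japanese-specific stylometric features).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.manifold import MDS
from sklearn.metrics import pairwise_distances

texts = [
    "I strongly oppose this rule because it burdens small businesses.",
    "This regulation is harmful to small firms and should be repealed.",
    "The proposed rule fails to account for regional economic variance.",
    "We believe the policy framework requires further deliberation.",
]

# Character 2-3-grams crudely capture style (function words, punctuation habits)
vec = TfidfVectorizer(analyzer="char", ngram_range=(2, 3))
X = vec.fit_transform(texts).toarray()

dist = pairwise_distances(X, metric="cosine")
coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(dist)
print(coords)  # nearby points = stylometrically similar comments
```

Plotting the 2-D coordinates by class (human vs. model-generated) is what lets the study visualize whether the two populations form separable stylometric clusters.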
44. Fittall, A. M., and R. G. Cowley. "The HV11 3-D Seismic Survey: Skua–Swift Area Geology Revealed?" APPEA Journal 32, no. 1 (1992): 159. http://dx.doi.org/10.1071/aj91013.

Abstract: The 4630 km of HV11 3-D seismic survey data, shot over the Skua and Swift fault blocks in Timor Sea licence AC/L4, reveals details of Tithonian faulting not evident previously. The HV11 survey provided 10 times the data density of previous coverage and significantly improved data quality through the recording of lower frequencies and the use of accurate navigation systems and high-resolution processing parameters. Tithonian faulting is revealed as a series of northeast-trending en echelon faults overprinting a deeper, north-northeastern, possibly latest Triassic, trend which defines the major fault …
45. Bao, Yuyan, Guannan Wei, Oliver Bračevac, Yuxuan Jiang, Qiyang He, and Tiark Rompf. "Reachability types: tracking aliasing and separation in higher-order functional programs." Proceedings of the ACM on Programming Languages 5, OOPSLA (2021): 1–32. http://dx.doi.org/10.1145/3485516.

Abstract: Ownership type systems, based on the idea of enforcing unique access paths, have been primarily focused on objects and top-level classes. However, existing models do not as readily reflect the finer aspects of nested lexical scopes, capturing, or escaping closures in higher-order functional programming patterns, which are increasingly adopted even in mainstream object-oriented languages. We present a new type system, λ*, which enables expressive ownership-style reasoning across higher-order functions. It tracks sharing and separation through reachability sets, and layers additional mechanism …
46. Moyse Ferreira, Lucy. "Colour, movement and modernity in Sonia Delaunay’s (1926) fashion film." Journal of Visual Culture 19, no. 3 (2020): 391–404. http://dx.doi.org/10.1177/1470412920965997.

Abstract: Sonia Delaunay is best known for her abstract and colourful style, which is manifested across her artwork, fashion, textile and interior designs alike. In 1926, this culminated in a fashion film, titled ‘L’Elégance’. Shot using the Keller-Dorian colour process, the film features a succession of Delaunay’s simultaneous fashion and textile designs. This article explores the implications and origins of the film, considering technological, cultural and social factors. It focuses on the themes of colour and movement that were essential both to the film and to Delaunay’s philosophy at large and were …
47. Su, Kun, Judith Yue Li, Qingqing Huang, et al. "V2Meow: Meowing to the Visual Beat via Video-to-Music Generation." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 5 (2024): 4952–60. http://dx.doi.org/10.1609/aaai.v38i5.28299.

Abstract: Video-to-music generation demands both a temporally localized, high-quality listening experience and globally aligned video-acoustic signatures. While recent music generation models excel at the former through advanced audio codecs, the exploration of video-acoustic signatures has been confined to specific visual scenarios. In contrast, our research confronts the challenge of learning globally aligned signatures between video and music directly from paired music and videos, without explicitly modeling domain-specific rhythmic or semantic relationships. We propose V2Meow, a video-to-music generation …
48. Li, Liang, Yiping Li, Hailin Wang, et al. "Side-Scan Sonar Image Generation Under Zero and Few Samples for Underwater Target Detection." Remote Sensing 16, no. 22 (2024): 4134. http://dx.doi.org/10.3390/rs16224134.

Abstract: The acquisition of side-scan sonar (SSS) images is complex, expensive, and time-consuming, making it difficult and sometimes impossible to obtain rich image data. Therefore, we propose a novel image generation algorithm to solve the problem of insufficient training datasets for SSS-based target detection. For zero-sample detection, we propose a two-step style transfer approach. The ray tracing method is first used to obtain an optically rendered image of the target. Subsequently, UA-CycleGAN, which combines U-net, soft attention, and HSV loss, is proposed for generating high-quality SSS images …
49. Li, Yundong, Yi Liu, Han Dong, Wei Hu, and Chen Lin. "Intrusion detection of railway clearance from infrared images using generative adversarial networks." Journal of Intelligent & Fuzzy Systems 40, no. 3 (2021): 3931–43. http://dx.doi.org/10.3233/jifs-192141.

Abstract: The intrusion detection of railway clearance is crucial for avoiding railway accidents caused by the invasion of abnormal objects, such as pedestrians, falling rocks, and animals. However, detecting intrusions using deep learning methods from infrared images captured at night remains a challenging task because of the lack of sufficient training samples. To address this issue, a transfer strategy that migrates daytime RGB images to the nighttime style of infrared images is proposed in this study. The proposed method consists of two stages. In the first stage, a data generation model is trained …
50. Ren, Mengchao. "Advancements and Applications of Large Language Models in Natural Language Processing: A Comprehensive Review." Applied and Computational Engineering 97, no. 1 (2024): 55–63. http://dx.doi.org/10.54254/2755-2721/97/20241406.

Abstract: Large language models (LLMs) have revolutionized the field of natural language processing (NLP), demonstrating remarkable capabilities in understanding, generating, and manipulating human language. This comprehensive review explores the development, applications, optimizations, and challenges of LLMs. The paper begins by tracing the evolution of these models and their foundational architectures, such as the Transformer, GPT, and BERT. We then delve into the applications of LLMs in natural language understanding tasks, including sentiment analysis, named entity recognition, question answering …