Journal articles on the topic 'Transformer Architecture'

Author: Grafiati

Published: 10 December 2022

Last updated: 25 July 2025

Consult the top 50 journal articles for your research on the topic 'Transformer Architecture.'

1

Alharthi, Musleh, and Ausif Mahmood. "Enhanced Linear and Vision Transformer-Based Architectures for Time Series Forecasting." Big Data and Cognitive Computing 8, no. 5 (2024): 48. http://dx.doi.org/10.3390/bdcc8050048.

Abstract:

Time series forecasting has been a challenging area in the field of Artificial Intelligence. Various approaches such as linear neural networks, recurrent linear neural networks, Convolutional Neural Networks, and recently transformers have been attempted for the time series forecasting domain. Although transformer-based architectures have been outstanding in the Natural Language Processing domain, especially in autoregressive language modeling, the initial attempts to use transformers in the time series arena have met mixed success. A recent important work indicating simple linear networks out

2

Wijaya, Bryan Christofer, and Hendrik Santoso Sugiarto. "Transformer+transformer architecture for image captioning in Indonesian language." IAES International Journal of Artificial Intelligence (IJ-AI) 14, no. 3 (2025): 2338. https://doi.org/10.11591/ijai.v14.i3.pp2338-2346.

Abstract:

Image captioning in Indonesian language poses a significant challenge due to the complex interplay between visual and linguistic comprehension, as well as the scarcity of publicly available datasets. Despite considerable advancements in this field, research specifically targeting the Indonesian language remains scarce. In this paper, we propose a novel image captioning model employing a transformer-based architecture for both the encoder and decoder components. Our model is trained and evaluated on the pre-translated Flickr30k dataset in the Indonesian language. We conduct a comparative analys

3

Selitskiy, Stanislav. "Batch Transformer Architecture: Case of Synthetic Image Generation for Emotion Expression Facial Recognition." Athens Journal of Sciences 12, no. 2 (2025): 129–50. https://doi.org/10.30958/ajs.12-2-4.

Abstract:

A novel Transformer variation architecture is proposed in the implicit sparse style. Unlike “traditional” Transformers, instead of attention to sequential or batch entities in their entirety of whole dimensionality, in the proposed Batch Transformers, attention to the “important” dimensions (primary components) is implemented. In such a way, the “important” dimensions or feature selection allows for a significant reduction of the bottleneck size in the encoder-decoder ANN architectures. The proposed architecture is tested on the synthetic image generation for the face recognition task in the c

4

Jaiswal, Sushma, Harikumar Pallthadka, Rajesh P. Chinchewadi, and Tarun Jaiswal. "Optimized Image Captioning: Hybrid Transformers Vision Transformers and Convolutional Neural Networks: Enhanced with Beam Search." International Journal of Intelligent Systems and Applications 16, no. 2 (2024): 53–61. http://dx.doi.org/10.5815/ijisa.2024.02.05.

Abstract:

Deep learning has improved image captioning. Transformer, a neural network architecture built for natural language processing, excels at image captioning and other computer vision applications. This paper reviews Transformer-based image captioning methods in detail. Convolutional neural networks (CNNs) extracted image features and RNNs or LSTM networks generated captions in traditional image captioning. This method often has information bottlenecks and trouble capturing long-range dependencies. Transformer architecture revolutionized natural language processing with its attention strategy and

5

Havrylovych, Mariia, and Valeriy Danylov. "Research on hybrid transformer-based autoencoders for user biometric verification." System research and information technologies, no. 3 (September 29, 2023): 42–53. http://dx.doi.org/10.20535/srit.2308-8893.2023.3.03.

Abstract:

Our current study extends previous work on motion-based biometric verification using sensory data by exploring new architectures and more complex input from various sensors. Biometric verification offers advantages like uniqueness and protection against fraud. The state-of-the-art transformer architecture in AI is known for its attention block and applications in various fields, including NLP and CV. We investigated its potential value for applications involving sensory data. The research proposes a hybrid architecture, integrating transformer attention blocks with different autoencoders, to e

6

Borgohain, Indraneel. "Cross-Modal AI Transformer Architecture: Bridging Multiple Data Modalities Through Advanced Neural Networks." Journal of Computer Science and Technology Studies 7, no. 4 (2025): 541–45. https://doi.org/10.32996/jcsts.2025.7.4.64.

Abstract:

This article explores the Cross-Modal AI Transformer architecture, a sophisticated framework designed to process and integrate information across multiple data modalities. The article examines the architectural framework, technical implementation, advanced features, and practical applications of these transformers. Through comprehensive analysis of various research findings, the article demonstrates how these architectures effectively bridge different modalities, including text, images, audio, and video. The article highlights the significance of multi-modal encoders, cross-modal attention mec

7

S., S., Thulasi Bikku, P. Muthukumar, K. Sandeep, Jampani Chandra Sekhar, and V. Krishna Pratap. "Enhanced Intrusion Detection Using Stacked FT-Transformer Architecture." Journal of Cybersecurity and Information Management 8, no. 2 (2024): 19–29. http://dx.doi.org/10.54216/jcim.130202.

Abstract:

The function of network intrusion detection systems (NIDS) in protecting networks from cyberattacks is crucial. Many of the more conventional techniques rely on signature-based approaches, which have a hard time distinguishing between various types of assaults. Using stacked FT-Transformer architecture, this research suggests a new way to identify intrusions in networks. When it comes to dealing with complicated tabular data, FT-Transformers—a variant of the Transformer model—have shown outstanding performance. Because of the inherent tabular nature of network traffic data, FT-Transformers are

8

Lei, Zhenxin, Man Yao, Jiakui Hu, et al. "Spike2Former: Efficient Spiking Transformer for High-performance Image Segmentation." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 2 (2025): 1364–72. https://doi.org/10.1609/aaai.v39i2.32126.

Abstract:

Spiking Neural Networks (SNNs) have a low-power advantage but perform poorly in image segmentation tasks. The reason is that directly converting neural networks with complex architectural designs for segmentation tasks into spiking versions leads to performance degradation and non-convergence. To address this challenge, we first identify the modules in the architecture design that lead to the severe reduction in spike firing, make targeted improvements, and propose Spike2Former architecture. Second, we propose normalized integer spiking neurons to solve the training stability problem of SNNs w

9

Nabi, Muneeb, Rohit Pachauri, Shouaib Ahmad, Kanishk Varshney, Prachi Goel, and Apurva Jain. "Visual Image Captioning through Transformer." International Journal for Research in Applied Science and Engineering Technology 11, no. 12 (2023): 2047–50. http://dx.doi.org/10.22214/ijraset.2023.57766.

Abstract:

The convergence of computer vision and natural language processing in Artificial Intelligence has sparked significant interest in recent years, largely propelled by the advancements in deep learning. One notable application born from this synergy is the automatic description of images in English. Image captioning involves the computer's ability to interpret visual information from an image and translate it into one or more descriptive phrases. Generating meaningful descriptions requires understanding the state, properties, and relationships between the depicted objects, demanding a g

10

Vu, Minh Tri, Motoaki Hiraga, Nanako Miura, and Arata Masuda. "Failure Mode Classification for Rolling Element Bearings Using Time-Domain Transformer-Based Encoder." Sensors 24, no. 12 (2024): 3953. http://dx.doi.org/10.3390/s24123953.

Abstract:

In this paper, we propose a Transformer-based encoder architecture integrated with an unsupervised denoising method to learn meaningful and sparse representations of vibration signals without the need for data transformation or pre-trained data. Existing Transformer models often require transformed data or extensive computational resources, limiting their practical adoption. We propose a simple yet competitive modification of the Transformer model, integrating a trainable noise reduction method specifically tailored for failure mode classification using vibration data directly in the time doma

11

Wang, Chuanzhi, Jun Huang, Mingyun Lv, Yongmei Wu, and Ruiru Qin. "Dual-Branch Adaptive Convolutional Transformer for Hyperspectral Image Classification." Remote Sensing 16, no. 9 (2024): 1615. http://dx.doi.org/10.3390/rs16091615.

Abstract:

In hyperspectral image (HSI) classification, convolutional neural networks (CNNs) and transformer architectures have each contributed to considerable advancements. CNNs possess potent local feature representation skills, whereas transformers excel in learning global features, offering a complementary strength. Nevertheless, both architectures are limited by static receptive fields, which hinder their accuracy in delineating subtle boundary discrepancies. To mitigate the identified limitations, we introduce a novel dual-branch adaptive convolutional transformer (DBACT) network architecture feat

12

Rahali, Abir, and Moulay A. Akhloufi. "End-to-End Transformer-Based Models in Textual-Based NLP." AI 4, no. 1 (2023): 54–110. http://dx.doi.org/10.3390/ai4010004.

Abstract:

Transformer architectures are highly expressive because they use self-attention mechanisms to encode long-range dependencies in the input sequences. In this paper, we present a literature review on Transformer-based (TB) models, providing a detailed overview of each model in comparison to the Transformer’s standard architecture. This survey focuses on TB models used in the field of Natural Language Processing (NLP) for textual-based tasks. We begin with an overview of the fundamental concepts at the heart of the success of these models. Then, we classify them based on their architecture and tr

13

Kumari, Rekha, Gurpreet Kaur, Aditya Rawat, Harshit Chauhan, Kartik Singh Negi, and Rishi Mishra. "Analysis of Transformer-Deep Neural Network Using Deep Learning." International Journal of Engineering Applied Sciences and Technology 8, no. 2 (2023): 313–19. http://dx.doi.org/10.33564/ijeast.2023.v08i02.048.

Abstract:

Transformers were first used for natural language processing (NLP) tasks, but they quickly spread to other deep learning fields, including computer vision. They assess the interdependence of pairs. Attention is a part that enables to dynamically highlight relevant features of the input data (words in the case of text strings, parts of images in the case of visual Transformers). The cost grows continually with the number of tokens. The most common Transformer Architecture for image classification uses only the Transformer Encoder to transform the various input tokens. However, the decoder com

14

Peter, Ojonugwa, Daniel Emakporuena, Bamidele Tunde, Maryam Abdulkarim, and Abdullahi Umar. "Transformer-Based Explainable Deep Learning for Breast Cancer Detection in Mammography: The MammoFormer Framework." American Journal of Computer Science and Technology 8, no. 2 (2025): 121–37. https://doi.org/10.11648/j.ajcst.20250802.16.

Abstract:

Breast cancer detection through mammography interpretation remains difficult because of the minimal nature of abnormalities that experts need to identify alongside the variable interpretations between readers. The potential of CNNs for medical image analysis faces two limitations: they fail to process both local information and wide contextual data adequately and do not provide explainable AI (XAI) operations which doctors need to accept them in clinics. The researcher developed the MammoFormer framework which unites transformer-based architecture with multi-feature enhancement components and

15

Dordevic, Danilo, Vukasin Bozic, Joseph Thommes, Daniele Coppola, and Sidak Pal Singh. "Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers (Student Abstract)." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 21 (2024): 23477–79. http://dx.doi.org/10.1609/aaai.v38i21.30436.

Abstract:

This work presents an analysis of the effectiveness of using standard shallow feed-forward networks to mimic the behavior of the attention mechanism in the original Transformer model, a state-of-the-art architecture for sequence-to-sequence tasks. We substitute key elements of the attention mechanism in the Transformer with simple feed-forward networks, trained using the original components via knowledge distillation. Our experiments, conducted on the IWSLT2017 dataset, reveal the capacity of these “attentionless Transformers” to rival the performance of the original architecture. Through rigo

16

Chi, Ye, Haikun Liu, Ganwei Peng, Xiaofei Liao, and Hai Jin. "Transformer: An OS-Supported Reconfigurable Hybrid Memory Architecture." Applied Sciences 12, no. 24 (2022): 12995. http://dx.doi.org/10.3390/app122412995.

Abstract:

Non-volatile memories (NVMs) have aroused vast interest in hybrid memory systems due to their promising features of byte-addressability, high storage density, low cost per byte, and near-zero standby energy consumption. However, since NVMs have limited write endurance, high write latency, and high write energy consumption, it is still challenging to directly replace traditional dynamic random access memory (DRAM) with NVMs. Many studies propose to utilize NVM and DRAM in a hybrid memory system, and explore sophisticated memory management schemes to alleviate the impact of slow NVM on the perfo

17

Li, Junshuai. "Transformer Based News Text Classification." Applied and Computational Engineering 160, no. 1 (2025): 141–53. https://doi.org/10.54254/2755-2721/2025.tj23582.

Abstract:

With the exponential growth of online news, Transformer models based on self-attention mechanisms (e.g., BERT, GPT) have demonstrated theoretical advantages over traditional methods (e.g., SVM, Naïve Bayes, and CNN) in news text classification by capturing global semantic relationships. The encoder-only Transformer architecture developed in this study, integrating multi-head self-attention, dynamic positional encoding, and global average pooling, achieved an initial accuracy of 69.52% on the 20 Newsgroups dataset (significantly higher than CNN's 57.59%), showcasing its superior global feature e

18

Cui, Liyuan, Guoqiang Zhong, Xiang Liu, and Hongwei Xu. "A Compact Object Detection Architecture with Transformer Enhancing." Journal of Physics: Conference Series 2278, no. 1 (2022): 012034. http://dx.doi.org/10.1088/1742-6596/2278/1/012034.

Abstract:

With the advancements in rising computer vision processing, Transformer has attracted increasing interesting in this field. However, it is limited because of its unprecedented storage, heavy reliance on data size and intolerable computational power consumption. While lightweight network is in other extreme, pursuing the compact architectures accompanied by performance loss. In this paper, we enhance an architecture as the backbone of object detection networks through combining right-size Transformer, i.e. Vision Transformer module. Specifically, based on GhostNet, a well-known lightwe

19

Sarraf, Saman, Arman Sarraf, Danielle D. DeSouza, John A. E. Anderson, and Milton Kabia. "OViTAD: Optimized Vision Transformer to Predict Various Stages of Alzheimer’s Disease Using Resting-State fMRI and Structural MRI Data." Brain Sciences 13, no. 2 (2023): 260. http://dx.doi.org/10.3390/brainsci13020260.

Abstract:

Advances in applied machine learning techniques for neuroimaging have encouraged scientists to implement models to diagnose brain disorders such as Alzheimer’s disease at early stages. Predicting the exact stage of Alzheimer’s disease is challenging; however, complex deep learning techniques can precisely manage this. While successful, these complex architectures are difficult to interrogate and computationally expensive. Therefore, using novel, simpler architectures with more efficient pattern extraction capabilities, such as transformers, is of interest to neuroscientists. This study introdu

20

Song, Bofan, Dharma Raj KC, Rubin Yuchan Yang, Shaobai Li, Chicheng Zhang, and Rongguang Liang. "Classification of Mobile-Based Oral Cancer Images Using the Vision Transformer and the Swin Transformer." Cancers 16, no. 5 (2024): 987. http://dx.doi.org/10.3390/cancers16050987.

Abstract:

Oral cancer, a pervasive and rapidly growing malignant disease, poses a significant global health concern. Early and accurate diagnosis is pivotal for improving patient outcomes. Automatic diagnosis methods based on artificial intelligence have shown promising results in the oral cancer field, but the accuracy still needs to be improved for realistic diagnostic scenarios. Vision Transformers (ViT) have outperformed learning CNN models recently in many computer vision benchmark tasks. This study explores the effectiveness of the Vision Transformer and the Swin Transformer, two cutting-edge vari

21

Jwalin, Thaker. "Hybrid Transformer-Based Architecture for Multi-Horizon Time Series Forecasting with Uncertainty Quantification." International Journal of Innovative Research in Engineering & Multidisciplinary Physical Sciences 11, no. 6 (2023): 1–6. https://doi.org/10.5281/zenodo.15086753.

Abstract:

Time series forecasting remains a critical challenge across numerous domains, with recent transformer-based architectures demonstrating remarkable capabilities in capturing complex temporal dependencies. This paper introduces a novel hybrid architecture that integrates state-of-the-art transformer models—including PatchTST, Temporal Fusion Transformers (TFT) [2], and Informer [3]—with traditional statistical methods to enhance multi-horizon forecasting performance. Our approach leverages specialized multi-head attention mechanisms for temporal data, patch embedding techniques,

22

Liu, Junchi, Xiang Zhang, and Zhigang Luo. "TransConv: Transformer Meets Contextual Convolution for Unsupervised Domain Adaptation." Entropy 26, no. 6 (2024): 469. http://dx.doi.org/10.3390/e26060469.

Abstract:

Unsupervised domain adaptation (UDA) aims to reapply the classifier to be ever-trained on a labeled source domain to a related unlabeled target domain. Recent progress in this line has evolved with the advance of network architectures from convolutional neural networks (CNNs) to transformers or both hybrids. However, this advance has to pay the cost of high computational overheads or complex training processes. In this paper, we propose an efficient alternative hybrid architecture by marrying transformer to contextual convolution (TransConv) to solve UDA tasks. Different from previous transfor

23

Rahman, Tahsin, Ali Bilgin, and Sergio D. Cabrera. "Multi-channel MRI reconstruction using cascaded Swinμ transformers with overlapped attention." Physics in Medicine & Biology 70, no. 7 (2025): 075002. https://doi.org/10.1088/1361-6560/adb933.

Abstract:

Objective. Deep neural networks have been shown to be very effective at artifact reduction tasks such as magnetic resonance imaging (MRI) reconstruction from undersampled k-space data. In recent years, attention-based vision transformer models have been shown to outperform purely convolutional models at a wide variety of tasks, including MRI reconstruction. Our objective is to investigate the use of different transformer architectures for multi-channel cascaded MRI reconstruction. Approach. In this work, we explore the effective use of cascades of small transformers in multi-channel u

24

Garg, Manav, Pranshav Gajjar, Pooja Shah, et al. "Comparative Analysis of Deep Learning Architectures and Vision Transformers for Musical Key Estimation." Information 14, no. 10 (2023): 527. http://dx.doi.org/10.3390/info14100527.

Abstract:

The musical key serves as a crucial element in a piece, offering vital insights into the tonal center, harmonic structure, and chord progressions while enabling tasks such as transposition and arrangement. Moreover, accurate key estimation finds practical applications in music recommendation systems and automatic music transcription, making it relevant across academic and industrial domains. This paper presents a comprehensive comparison between standard deep learning architectures and emerging vision transformers, leveraging their success in various domains. We evaluate their performance on a

25

Manasa, Smt B. "Hybrid CNN-Transformer Architecture for Robust Deepfake Detection: A Keyframe-Based Evaluation." International Journal of Scientific Research in Engineering and Management 09, no. 05 (2025): 1–9. https://doi.org/10.55041/ijsrem46782.

Abstract:

The proliferation of Deepfake content presents a significant threat to digital integrity and media authenticity. To address this challenge, we present a comprehensive evaluation of four deep learning architectures—Convolutional Neural Networks (CNN), Transformer-based models, CNN integrated with Long Short-Term Memory (CNN+LSTM), and a novel hybrid CNN–Transformer model—specifically applied to Deepfake detection using keyframes. Keyframes were extracted from the FaceForensics++ dataset, preserving high-resolution information crucial for robust detection. Each model was trained and t

26

Lopez-Cabrejos, Josue, Thuanne Paixão, Ana Beatriz Alvarez, and Diodomiro Baldomero Luque. "An Efficient and Low-Complexity Transformer-Based Deep Learning Framework for High-Dynamic-Range Image Reconstruction." Sensors 25, no. 5 (2025): 1497. https://doi.org/10.3390/s25051497.

Abstract:

High-dynamic-range (HDR) image reconstruction involves creating an HDR image from multiple low-dynamic-range images as input, providing a computational solution to enhance image quality. This task presents several challenges, such as frame misalignment, overexposure, and motion, which are addressed using deep learning algorithms. In this context, various architectures with different approaches exist, such as convolutional neural networks, diffusion networks, generative adversarial networks, and Transformer-based architectures, with the latter offering the best quality but at a high computation

27

Lorenzo, Javier, Ignacio Parra Alonso, Rubén Izquierdo, et al. "CAPformer: Pedestrian Crossing Action Prediction Using Transformer." Sensors 21, no. 17 (2021): 5694. http://dx.doi.org/10.3390/s21175694.

Abstract:

Anticipating pedestrian crossing behavior in urban scenarios is a challenging task for autonomous vehicles. Early this year, a benchmark comprising JAAD and PIE datasets have been released. In the benchmark, several state-of-the-art methods have been ranked. However, most of the ranked temporal models rely on recurrent architectures. In our case, we propose, as far as we are concerned, the first self-attention alternative, based on transformer architecture, which has had enormous success in natural language processing (NLP) and recently in computer vision. Our architecture is composed of vario

28

Li, Zeen, Shuanghong Liu, Zhihua Fang, and Liang He. "Branch-Transformer: A Parallel Branch Architecture to Capture Local and Global Features for Language Identification." Applied Sciences 14, no. 11 (2024): 4681. http://dx.doi.org/10.3390/app14114681.

Abstract:

Currently, an increasing number of people are opting to use transformer models or conformer models for language identification, achieving outstanding results. Among them, transformer models based on self-attention can only capture global information, lacking finer local details. There are also approaches that employ conformer models by concatenating convolutional neural networks and transformers to capture both local and global information. However, this static single-branch architecture is difficult to interpret and modify, and it incurs greater inference difficulty and computational costs co

29

Shao, Ran, Xiao-Jun Bi, and Zheng Chen. "A novel hybrid transformer-CNN architecture for environmental microorganism classification." PLOS ONE 17, no. 11 (2022): e0277557. http://dx.doi.org/10.1371/journal.pone.0277557.

Abstract:

The success of vision transformers (ViTs) has given rise to their application in classification tasks of small environmental microorganism (EM) datasets. However, due to the lack of multi-scale feature maps and local feature extraction capabilities, the pure transformer architecture cannot achieve good results on small EM datasets. In this work, a novel hybrid model is proposed by combining the transformer with a convolution neural network (CNN). Compared to traditional ViTs and CNNs, the proposed model achieves state-of-the-art performance when trained on small EM datasets. This is accomplish

30

Xiong, Chuhao. "A Survey of Transformer Optimization Techniques: Progress and Challenges from Computational Efficiency to Multimodal Fusion." Applied and Computational Engineering 157, no. 1 (2025): 139–46. https://doi.org/10.54254/2755-2721/2025.po24682.

Abstract:

Since its proposal in 2017, the Transformer model has achieved revolutionary breakthroughs in natural language processing and even in computer vision tasks. However, its huge number of parameters and high computational complexity have posed substantial difficulties in training and inference efficiency, model knowledge updating, and multimodal information fusion. This paper reviews recent research progress on Transformer optimization techniques, including: (1) Structural optimization and computational efficiency model architecture improvements, pruning compression, and efficient attention mecha

31

Li, Mingchen, Xuechen Zhang, Yixiao Huang, and Samet Oymak. "On the Power of Convolution-Augmented Transformer." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 17 (2025): 18393–402. https://doi.org/10.1609/aaai.v39i17.34024.

Abstract:

The transformer architecture has catalyzed revolutionary advances in language modeling. However, recent architectural recipes, such as state-space models, have bridged the performance gap. Motivated by this, we examine the benefits of Convolution-Augmented Transformer (CAT) for recall, copying, and length generalization tasks. CAT incorporates convolutional filters in the K/Q/V embeddings of an attention layer. Through CAT, we show that the locality of the convolution synergizes with the global view of the attention. Unlike comparable architectures, such as Mamba or transformer, CAT can provab

32

Chakrabarty, Kingsuk. "Enhancing Transformer Architecture: Techniques for Efficient Inference." International Journal of Scientific Research in Computer Science, Engineering and Information Technology 11, no. 2 (2025): 2749–56. https://doi.org/10.32628/cseit25112757.

Abstract:

This paper explores recent advancements in optimizing transformer architectures for efficient inference. We investigate various techniques including pruning, quantization, knowledge distillation, and architectural modifications. Our experimental results demonstrate that combining these approaches can reduce inference time by up to 74% while maintaining over 95% of the original performance. We also introduce a novel attention mechanism that dynamically allocates computational resources based on input complexity. Our implementation shows promise for edge device deployment where computational res

33

Ibrahem, Hatem, Ahmed Salem, and Hyun-Soo Kang. "RT-ViT: Real-Time Monocular Depth Estimation Using Lightweight Vision Transformers." Sensors 22, no. 10 (2022): 3849. http://dx.doi.org/10.3390/s22103849.

Abstract:

The latest research in computer vision highlighted the effectiveness of the vision transformers (ViT) in performing several computer vision tasks; they can efficiently understand and process the image globally unlike the convolution which processes the image locally. ViTs outperform the convolutional neural networks in terms of accuracy in many computer vision tasks but the speed of ViTs is still an issue, due to the excessive use of the transformer layers that include many fully connected layers. Therefore, we propose a real-time ViT-based monocular depth estimation (depth estimation from sin

34

Bogdanov, M. R., G. R. Shakhmametova, and N. N. Oskin. "Possibility of Using the Attention Mechanism in Multimodal Recognition of Cardiovascular Diseases." Programmnaya Ingeneria 15, no. 11 (2024): 578–88. http://dx.doi.org/10.17587/prin.15.578-588.

Abstract:

The paper is about studying the possibility of using the attention mechanism in diagnosing various cardiovascular diseases. Biomedical data were presented in different modalities (text, images, and time series). A comparison of the efficiency of 5 transformers based on the attention mechanism (Dosovitsky transformer, compact convolutional transformer, transformer with external attention, transformer based on tokenization with patch shift and local self-attention, transformer based on multiple deep attention) was carried out with the Xception convolutional neural network, three fully connecte

35

Cheng, Kun, Lei Yu, Zhijun Tu, et al. "Effective Diffusion Transformer Architecture for Image Super-Resolution." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 3 (2025): 2455–63. https://doi.org/10.1609/aaai.v39i3.32247.

Abstract:

Recent advances indicate that diffusion models hold great promise in image super-resolution. While the latest methods are primarily based on latent diffusion models with convolutional neural networks, there are few attempts to explore transformers, which have demonstrated remarkable performance in image generation. In this work, we design an effective diffusion transformer for image super-resolution (DiT-SR) that achieves the visual quality of prior-based methods, but through a training-from-scratch manner. In practice, DiT-SR leverages an overall U-shaped architecture, and adopts uniform isotropi

36

Kumar, Yulia, Kuan Huang, Chin-Chien Lin, et al. "Applying Swin Architecture to Diverse Sign Language Datasets." Electronics 13, no. 8 (2024): 1509. http://dx.doi.org/10.3390/electronics13081509.

Abstract:

In an era where artificial intelligence (AI) bridges crucial communication gaps, this study extends AI’s utility to American and Taiwan Sign Language (ASL and TSL) communities through advanced models like the hierarchical vision transformer with shifted windows (Swin). This research evaluates Swin’s adaptability across sign languages, aiming for a universal platform for the unvoiced. Utilizing deep learning and transformer technologies, it has developed prototypes for ASL-to-English translation, supported by an educational framework to facilitate learning and comprehension, with the intention

37

Diash Firdaus, Idi Sumardi, Chalifa Chazar, and Muhamad Zufar Dafy. "Image-Based Malware Multiclass Classification Using Vision Transformer Architecture." Cyber Security dan Forensik Digital 8, no. 1 (2025): 72–79. https://doi.org/10.14421/csecurity.2025.8.1.5107.

Abstract:

The increasingly sophisticated development of malware has become a serious threat to global cybersecurity, resulting in significant financial losses. Traditional detection methods such as signature-based detection and dynamic analysis are limited in their ability to detect new malware variants. As an innovative solution, image-based malware analysis converts malware binary files into image representations, leveraging digital image processing and machine learning for more efficient identification. This study uses the Vision Transformer (ViT) architecture for malware classification m

38

Ampazis, Nicholas, and Flora Sakketou. "Diversifying Multi-Head Attention in the Transformer Model." Machine Learning and Knowledge Extraction 6, no. 4 (2024): 2618–38. http://dx.doi.org/10.3390/make6040126.

Abstract:

Recent studies have shown that, due to redundancy, some heads of the Transformer model can be pruned without diminishing the efficiency of the model. In this paper, we propose a constrained optimization algorithm based on Hebbian learning, which trains specific layers in the Transformer architecture in order to enforce diversification between the different heads in the multi-head attention module. The diversification of the heads is achieved through a single-layer feed-forward neural network that is added to the Transformer architecture and is trained with the proposed algorithm. We utilize th

39

Chen, Jindou, and Yiqing Shen. "Memorizing Swin-Transformer Denoising Network for Diffusion Model." Electronics 13, no. 20 (2024): 4050. http://dx.doi.org/10.3390/electronics13204050.

Abstract:

Diffusion models have garnered significant attention in the field of image generation. However, existing denoising architectures, such as U-Net, face limitations in capturing the global context, while Vision Transformers (ViTs) may struggle with local receptive fields. To address these challenges, we propose a novel Swin-Transformer-based denoising network architecture that leverages the strengths of both U-Net and ViT. Moreover, our approach integrates the k-Nearest Neighbor (kNN) based memorizing attention module into the Swin-Transformer, enabling it to effectively harness crucial contextua

40

Wang, Junpu, Guili Xu, Fuju Yan, Jinjin Wang, and Zhengsheng Wang. "Defect transformer: An efficient hybrid transformer architecture for surface defect detection." Measurement 211 (April 2023): 112614. http://dx.doi.org/10.1016/j.measurement.2023.112614.

41

Raikwar, Piyush, Renato Cardoso, Nadezda Chernyavskaya, et al. "Transformers for Generalized Fast Shower Simulation." EPJ Web of Conferences 295 (2024): 09039. http://dx.doi.org/10.1051/epjconf/202429509039.

Abstract:

Recently, transformer-based foundation models have proven to be a generalized architecture applicable to various data modalities, ranging from text to audio and even a combination of multiple modalities. Transformers by design should accurately model the non-trivial structure of particle showers thanks to the absence of strong inductive bias, better modeling of long-range dependencies, and interpolation and extrapolation capabilities. In this paper, we explore a transformer-based generative model for detector-agnostic fast shower simulation, where the goal is to generate synthetic particle sho

42

Zeng, Yiliang, Na Meng, Jinlin Zou, and Wenbin Liu. "PICT-Net: A Transformer-Based Network with Prior Information Correction for Hyperspectral Image Unmixing." Remote Sensing 17, no. 5 (2025): 869. https://doi.org/10.3390/rs17050869.

Abstract:

Transformers have performed favorably in recent hyperspectral unmixing studies in which the self-attention mechanism possesses the ability to retain spectral information and spatial details. However, the lack of reliable prior information for correction guidance has resulted in an inadequate accuracy and robustness of the network. To benefit from the advantages of the Transformer architecture and to improve the interpretability and robustness of the network, a dual-branch network with prior information correction, incorporating a Transformer network (PICT-Net), is proposed. The upper branch ut

43

Pasquali, Dominic, Michele Grossi, and Sofia Vallecorsa. "Measurements With A Quantum Vision Transformer: A Naive Approach." EPJ Web of Conferences 295 (2024): 12003. http://dx.doi.org/10.1051/epjconf/202429512003.

Abstract:

In mainstream machine learning, transformers are gaining widespread usage. As Vision Transformers rise in popularity in computer vision, they now aim to tackle a wide variety of machine learning applications. In particular, transformers for High Energy Physics (HEP) experiments continue to be investigated for tasks including jet tagging, particle reconstruction, and pile-up mitigation. An improved Quantum Vision Transformer (QViT) with a quantum-enhanced self-attention mechanism is introduced and discussed. A shallow circuit is proposed for each component of self-attention to leverage current

44

Alharthi, Musleh, and Ausif Mahmood. "xLSTMTime: Long-Term Time Series Forecasting with xLSTM." AI 5, no. 3 (2024): 1482–95. http://dx.doi.org/10.3390/ai5030071.

Abstract:

In recent years, transformer-based models have gained prominence in multivariate long-term time series forecasting (LTSF), demonstrating significant advancements despite facing challenges such as high computational demands, difficulty in capturing temporal dynamics, and managing long-term dependencies. The emergence of LTSF-Linear, with its straightforward linear architecture, has notably outperformed transformer-based counterparts, prompting a reevaluation of the transformer’s utility in time series forecasting. In response, this paper presents an adaptation of a recent architecture, termed e

45

Li, Zonghan. "A comparative study of KAN transformer and traditional vision transformer for ultrasound image-based diagnosis." Applied and Computational Engineering 103, no. 1 (2024): 127–34. http://dx.doi.org/10.54254/2755-2721/103/20241181.

Abstract:

In recent years, the MLP architecture has held a near-monopoly in the field of deep learning, and its success is undeniable, but it also has some problems. The Kolmogorov-Arnold Network (KAN) is a new neural network architecture based on an implementation of Kolmogorov-Arnold theory. Compared to traditional MLPs, KANs have higher interpretability, faster training, and more efficient usability. In this paper, based on the theory of KAN, the author tries to replace the Multi-Layer Perceptron (MLP) architecture in Vision Transformer (ViT) with the better performing KAN, author cond

46

Polson, Sarah, and Vadim Sokolov. "Kolmogorov GAM Networks Are All You Need!" Entropy 27, no. 6 (2025): 593. https://doi.org/10.3390/e27060593.

Abstract:

Kolmogorov GAM (K-GAM) networks have been shown to be an efficient architecture for both training and inference. They are additive models with embeddings that are independent of the target function of interest. They provide an alternative to Transformer architectures. They are the machine learning version of Kolmogorov’s superposition theorem (KST), which provides an efficient representation of multivariate functions. Such representations are useful in machine learning for encoding dictionaries (a.k.a. “look-up” tables). KST theory also provides a representation based on translates of the Köpp

47

Lee, Jaewoo, Sungjun Lee, Wonki Cho, Zahid Ali Siddiqui, and Unsang Park. "Vision Transformer-Based Tailing Detection in Videos." Applied Sciences 11, no. 24 (2021): 11591. http://dx.doi.org/10.3390/app112411591.

Abstract:

Tailing is defined as an event where a suspicious person follows someone closely. We define the problem of tailing detection from videos as an anomaly detection problem, where the goal is to find abnormalities in the walking pattern of the pedestrians (victim and follower). We, therefore, propose a modified Time-Series Vision Transformer (TSViT), a method for anomaly detection in video, specifically for tailing detection with a small dataset. We introduce an effective way to train TSViT with a small dataset by regularizing the prediction model. To do so, we first encode the spatial information

48

Azhahudurai, K., and V. Veeramanikandan. "Time Series Forecasting of Air Pollutant PM2.5 Using Transformer Architecture." International Journal of Science and Research (IJSR) 12, no. 11 (2023): 2075–82. http://dx.doi.org/10.21275/sr231125192357.

49

Jianwen, Mo, Mo Lunlin, Yuan Hua, Lin Leping, and Chen Lingping. "CNN with Embedding Transformers for Person Reidentification." Mathematical Problems in Engineering 2023 (July 14, 2023): 1–12. http://dx.doi.org/10.1155/2023/4591991.

Abstract:

For person reidentification (ReID), most slicing methods (such as part-based convolutional baseline (PCB) and AlignedReID) introduce a lot of background devoid of pedestrian parts, resulting in the cross-aliasing of features in the deep network. Besides, the resulting component features are not perfectly aligned with each other, thus affecting model performance. We propose a convolutional neural network (CNN) with embedding transformers (CET) person ReID network architecture based on the respective advantages of CNN and transformer. In CET, first, the residual transformer (RT) structure is fir

50

Prawiratama, Rifqi Alfaesta. "Design of a Generative AI Image Similarity Test Application and Handmade Images Using Deep Learning Methods." Telematika 20, no. 3 (2023): 326. http://dx.doi.org/10.31315/telematika.v20i3.10096.

Abstract:

Purpose: The aim of this research is to develop a classification model using the Transformer approach, specifically the BEiT architecture, to differentiate between handmade images and AI Generative Art. The objective is to ensure the authenticity of art and address ethical and legal concerns related to AI Generative Art. Design/methodology/approach: The study utilizes the BEiT architecture within the Transformer approach to create a classification model. The training process uses Bidirectional Encoder representation from Image Transformers (BEiT) to improve image classification. The primary dat
