Journal articles on the topic 'Kernel architecture'

Consult the top 50 journal articles for your research on the topic 'Kernel architecture.'

1

Torres-Huitzil, Cesar. "Resource Efficient Hardware Architecture for Fast Computation of Running Max/Min Filters." Scientific World Journal 2013 (2013): 1–10. http://dx.doi.org/10.1155/2013/108103.

Abstract:
Running max/min filters on rectangular kernels are widely used in many digital signal and image processing applications. Filtering with a k×k kernel requires k²−1 comparisons per sample for a direct implementation; thus, performance scales expensively with the kernel size k. Faster computations can be achieved by kernel decomposition and using constant-time one-dimensional algorithms on custom hardware. This paper presents a hardware architecture for real-time computation of running max/min filters based on the van Herk/Gil-Werman (HGW) algorithm. The proposed architecture design uses less computation and memory resources than previously reported architectures when targeted to Field Programmable Gate Array (FPGA) devices. Implementation results show that the architecture is able to compute max/min filters, on 1024×1024 images with up to 255×255 kernels, in around 8.4 milliseconds (120 frames per second) at a clock frequency of 250 MHz. The implementation is highly scalable for the kernel size with a good performance/area tradeoff suitable for embedded applications. The applicability of the architecture is shown for local adaptive image thresholding.
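For readers unfamiliar with the HGW trick the architecture implements in hardware, a minimal pure-Python sketch of the one-dimensional case may help; this is an illustration only, not the paper's design, and the function name and interface are hypothetical:

```python
def hgw_running_max(x, k):
    # van Herk/Gil-Werman 1D running max over every full window of
    # length k: roughly three comparisons per sample, independent of
    # k, versus k - 1 comparisons for the direct approach.
    n = len(x)
    g = [0] * n  # prefix maxima within each k-sized block
    h = [0] * n  # suffix maxima within each k-sized block
    for i in range(n):
        g[i] = x[i] if i % k == 0 else max(g[i - 1], x[i])
    for i in range(n - 1, -1, -1):
        if i == n - 1 or (i + 1) % k == 0:
            h[i] = x[i]
        else:
            h[i] = max(h[i + 1], x[i])
    # Each window [i, i + k) spans at most two blocks; merge the
    # suffix max of the left block with the prefix max of the right.
    return [max(h[i], g[i + k - 1]) for i in range(n - k + 1)]
```

The two block-wise passes plus the merge keep the comparison count independent of k, which is what lets the hardware cost scale well with kernel size.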
2

Giannoula, Christina, Ivan Fernandez, Juan Gómez-Luna, Nectarios Koziris, Georgios Goumas, and Onur Mutlu. "Towards Efficient Sparse Matrix Vector Multiplication on Real Processing-In-Memory Architectures." ACM SIGMETRICS Performance Evaluation Review 50, no. 1 (2022): 33–34. http://dx.doi.org/10.1145/3547353.3522661.

Abstract:
Several manufacturers have already started to commercialize near-bank Processing-In-Memory (PIM) architectures, after decades of research efforts. Near-bank PIM architectures place simple cores close to DRAM banks. Recent research demonstrates that they can yield significant performance and energy improvements in parallel applications by alleviating data access costs. Real PIM systems can provide high levels of parallelism, large aggregate memory bandwidth and low memory access latency, thereby being a good fit to accelerate the Sparse Matrix Vector Multiplication (SpMV) kernel. SpMV has been characterized as one of the most significant and thoroughly studied scientific computation kernels. It is primarily a memory-bound kernel with intensive memory accesses due to its algorithmic nature, the compressed matrix format used, and the sparsity patterns of the input matrices given. This paper provides the first comprehensive analysis of SpMV on a real-world PIM architecture, and presents SparseP, the first SpMV library for real PIM architectures. We make two key contributions. First, we design efficient SpMV algorithms to accelerate the SpMV kernel in current and future PIM systems, while covering a wide variety of sparse matrices with diverse sparsity patterns. Second, we provide the first comprehensive analysis of SpMV on a real PIM architecture. Specifically, we conduct our rigorous experimental analysis of SpMV kernels in the UPMEM PIM system, the first publicly-available real-world PIM architecture. Our extensive evaluation provides new insights and recommendations for software designers and hardware architects to efficiently accelerate the SpMV kernel on real PIM systems. For more information about our thorough characterization of SpMV PIM execution, results, insights and the open-source SparseP software package [21], we refer the reader to the full version of the paper [3, 4].
The SparseP software package is publicly and freely available at https://github.com/CMU-SAFARI/SparseP.
3

Muhammad, Ali, Weicheng Hu, Zhaoyang Li, et al. "Appraising the Genetic Architecture of Kernel Traits in Hexaploid Wheat Using GWAS." International Journal of Molecular Sciences 21, no. 16 (2020): 5649. http://dx.doi.org/10.3390/ijms21165649.

Abstract:
Kernel morphology is one of the major yield traits of wheat, the genetic architecture of which is always important in crop breeding. In this study, we performed a genome-wide association study (GWAS) to appraise the genetic architecture of the kernel traits of 319 wheat accessions using 22,905 single nucleotide polymorphism (SNP) markers from a wheat 90K SNP array. As a result, 111 and 104 significant SNPs for kernel traits were detected using four multi-locus GWAS models (mrMLM, FASTmrMLM, FASTmrEMMA, and pLARmEB) and three single-locus models (FarmCPU, MLM, and MLMM), respectively. Among the 111 SNPs detected by the multi-locus models, 24 SNPs were simultaneously detected across multiple models, including seven for kernel length, six for kernel width, six for kernels per spike, and five for thousand kernel weight. Interestingly, the five most stable SNPs (RAC875_29540_391, Kukri_07961_503, tplb0034e07_1581, BS00074341_51, and BobWhite_049_3064) were simultaneously detected by at least three multi-locus models. Integrating these newly developed multi-locus GWAS models to unravel the genetic architecture of kernel traits, the mrMLM approach detected the maximum number of SNPs. Furthermore, a total of 41 putative candidate genes were predicted to likely be involved in the genetic architecture underlying kernel traits. These findings can facilitate a better understanding of the complex genetic mechanisms of kernel traits and may lead to the genetic improvement of grain yield in wheat.
4

Patel, Chirag, Dulari Bhatt, Urvashi Sharma, et al. "DBGC: Dimension-Based Generic Convolution Block for Object Recognition." Sensors 22, no. 5 (2022): 1780. http://dx.doi.org/10.3390/s22051780.

Abstract:
The object recognition concept is widely used as a result of increasing CCTV surveillance and the need for automatic object or activity detection from images or video. Increases in the use of various sensor networks have also raised the need for lightweight processing frameworks. Much research has been carried out in this area, but the research scope is colossal, as it deals with open-ended problems such as achieving high accuracy in little time using lightweight processing frameworks. Convolutional Neural Networks (CNNs) and their variants are widely used in various computer vision activities, but most CNN architectures are application-specific. There is always a need for generic architectures with better performance. This paper introduces the Dimension-Based Generic Convolution Block (DBGC), which can be used with any CNN to make the architecture generic and provide a dimension-wise selection of various height, width, and depth kernels. This single unit, which uses the separable convolution concept, provides multiple combinations using various dimension-based kernels. It can be used for height-based, width-based, or depth-based dimensions; the same unit can even be used for height and width, width and depth, or depth and height dimensions, as well as for combinations involving all three dimensions of height, width, and depth. The main novelty of DBGC lies in the dimension selector block included in the proposed architecture. The proposed unoptimized kernel dimensions reduce FLOPs by around one third but also reduce accuracy by around one half; semi-optimized kernel dimensions yield almost the same or higher accuracy with half the FLOPs of the original architecture, while optimized kernel dimensions provide 5 to 6% higher accuracy with around a 10 M reduction in FLOPs.
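The FLOP savings that separable (depthwise plus pointwise) convolution offers, which DBGC builds on, can be counted directly. The layer sizes below are hypothetical and multiply-accumulates are counted as single operations:

```python
def conv_flops(h, w, c_in, c_out, k):
    # Multiply-accumulates for a standard k x k convolution over an
    # h x w feature map with c_in input and c_out output channels
    # (stride 1, 'same' padding).
    return h * w * c_in * c_out * k * k

def separable_conv_flops(h, w, c_in, c_out, k):
    # Depthwise k x k convolution followed by a pointwise 1 x 1
    # convolution, the decomposition behind separable kernels.
    depthwise = h * w * c_in * k * k
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise
```

For a hypothetical 32×32×32 feature map mapped to 64 channels with 3×3 kernels, the separable form needs roughly an eighth of the standard convolution's multiply-accumulates.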
5

Kumar, Anish. "Linux Kernel Input Subsystem: Architecture and Programming Interface." International Journal of Science and Research (IJSR) 12, no. 3 (2023): 1852–54. http://dx.doi.org/10.21275/sr230311123408.

6

Solomon, D. A. "The Windows NT kernel architecture." Computer 31, no. 10 (1998): 40–47. http://dx.doi.org/10.1109/2.722284.

7

Zhou, Xiaojian, Qianqian Geng, and Ting Jiang. "Boosting RBFNN performance in regression tasks with quantum kernel methods." Journal of Statistical Mechanics: Theory and Experiment 2025, no. 6 (2025): 063101. https://doi.org/10.1088/1742-5468/add0a3.

Abstract:
Quantum and classical machine learning are fundamentally connected through kernel methods, with kernels serving as inner products of feature vectors in high-dimensional spaces, forming their foundation. Among commonly used kernels, the Gaussian kernel plays a prominent role in radial basis function neural network (RBFNN) for regression tasks. Nonetheless, the localized response property of the Gaussian kernel, which emphasizes relationships between nearby data points, limits its capacity to model interactions among more distant data points. As a result, it may potentially overlook the broader structural dependencies present within the dataset. In contrast, quantum kernels are commonly evaluated by explicitly generating quantum states and computing their inner products, thus leveraging additional quantum dimensions and capturing more intricate and complex data patterns. With the motivation of overcoming the problem above, we develop a hybrid quantum–classical model, called quantum kernel-based feedforward neural network (QKFNN) by leveraging quantum kernel methods (QKMs) to improve the prediction accuracy of RBFNN. In this study, we begin with a comprehensive introduction to QKMs, after which we present the architecture of QKFNN. To further refine model performance, an optimization strategy based on the general unitary transformation that involves a rotation factor is employed to obtain an optimized quantum kernel. The effectiveness of QKFNN is validated through experiments on synthetic and real-world datasets.
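The localized response of the Gaussian kernel that the abstract contrasts with quantum kernels is easy to see numerically; a minimal sketch (the gamma default is an arbitrary choice):

```python
import math

def gaussian_kernel(x, y, gamma=1.0):
    # k(x, y) = exp(-gamma * ||x - y||^2): an inner product in an
    # implicit feature space, but one whose value decays rapidly with
    # distance, so far-apart points barely interact.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)
```

For identical points the kernel is exactly 1, while for points a few units apart it is already vanishingly small, which is the locality the abstract describes.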
8

Ganjdanesh, Alireza, Shangqian Gao, and Heng Huang. "EffConv: Efficient Learning of Kernel Sizes for Convolution Layers of CNNs." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 6 (2023): 7604–12. http://dx.doi.org/10.1609/aaai.v37i6.25923.

Abstract:
Determining kernel sizes of a CNN model is a crucial and non-trivial design choice and significantly impacts its performance. The majority of kernel size design methods rely on complex heuristic tricks or leverage neural architecture search that requires extreme computational resources. Thus, learning kernel sizes, using methods such as modeling kernels as a combination of basis functions, jointly with the model weights has been proposed as a workaround. However, previous methods cannot achieve satisfactory results or are inefficient for large-scale datasets. To fill this gap, we design a novel efficient kernel size learning method in which a size predictor model learns to predict optimal kernel sizes for a classifier given a desired number of parameters. It does so in collaboration with a kernel predictor model that predicts the weights of the kernels - given kernel sizes predicted by the size predictor - to minimize the training objective, and both models are trained end-to-end. Our method only needs a small fraction of the training epochs of the original CNN to train these two models and find proper kernel sizes for it. Thus, it offers an efficient and effective solution for the kernel size learning problem. Our extensive experiments on MNIST, CIFAR-10, STL-10, and ImageNet-32 demonstrate that our method can achieve the best training time vs. accuracy trade-off compared to previous kernel size learning methods and significantly outperform them on challenging datasets such as STL-10 and ImageNet-32. Our implementations are available at https://github.com/Alii-Ganjj/EffConv.
9

Liu, Dake, Joar Sohl, and Jian Wang. "Parallel Programming and Its Architectures Based on Data Access Separated Algorithm Kernels." International Journal of Embedded and Real-Time Communication Systems 1, no. 1 (2010): 64–85. http://dx.doi.org/10.4018/jertcs.2010103004.

Abstract:
A novel master-multi-SIMD architecture and its kernel (template) based parallel programming flow is introduced as a parallel signal processing platform. The name of the platform is ePUMA (embedded Parallel DSP processor architecture with Unique Memory Access). The essential technology is to separate data accessing kernels from arithmetic computing kernels so that the run-time cost of data access can be minimized by running it in parallel with algorithm computing. The SIMD memory subsystem architecture based on the proposed flow dramatically improves the total computing performance. The hardware system and programming flow introduced in this article will primarily aim at low-power high-performance embedded parallel computing with low silicon cost for communications and similar real-time signal processing.
10

Qureshi, Yasir Mahmood, William Andrew Simon, Marina Zapater, Katzalin Olcoz, and David Atienza. "Gem5-X." ACM Transactions on Architecture and Code Optimization 18, no. 4 (2021): 1–27. http://dx.doi.org/10.1145/3461662.

Abstract:
The increasing adoption of smart systems in our daily life has led to the development of new applications with varying performance and energy constraints, and suitable computing architectures need to be developed for these new applications. In this article, we present gem5-X, a system-level simulation framework, based on gem5, for architectural exploration of heterogeneous many-core systems. To demonstrate the capabilities of gem5-X, real-time video analytics is used as a case study. It is composed of two kernels, namely, video encoding and image classification using convolutional neural networks (CNNs). First, we explore through gem5-X the benefits of the latest 3D high bandwidth memory (HBM2) in different architectural configurations. Then, using a two-step exploration methodology, we develop a new optimized clustered-heterogeneous architecture with HBM2 in gem5-X for the video analytics application. In this proposed clustered-heterogeneous architecture, an ARMv8 in-order cluster with an in-cache computing engine executes the video encoding kernel, giving 20% performance and 54% energy benefits compared to baseline ARM in-order and Out-of-Order systems, respectively. Furthermore, thanks to gem5-X, we conclude that ARM Out-of-Order clusters with HBM2 are the best choice to run visual recognition using CNNs, as they outperform a DDR4-based system by up to 30% both in terms of performance and energy savings.
11

Giannoula, Christina, Ivan Fernandez, Juan Gómez-Luna, Nectarios Koziris, Georgios Goumas, and Onur Mutlu. "SparseP." Proceedings of the ACM on Measurement and Analysis of Computing Systems 6, no. 1 (2022): 1–49. http://dx.doi.org/10.1145/3508041.

Abstract:
Several manufacturers have already started to commercialize near-bank Processing-In-Memory (PIM) architectures, after decades of research efforts. Near-bank PIM architectures place simple cores close to DRAM banks. Recent research demonstrates that they can yield significant performance and energy improvements in parallel applications by alleviating data access costs. Real PIM systems can provide high levels of parallelism, large aggregate memory bandwidth and low memory access latency, thereby being a good fit to accelerate the Sparse Matrix Vector Multiplication (SpMV) kernel. SpMV has been characterized as one of the most significant and thoroughly studied scientific computation kernels. It is primarily a memory-bound kernel with intensive memory accesses due to its algorithmic nature, the compressed matrix format used, and the sparsity patterns of the input matrices given. This paper provides the first comprehensive analysis of SpMV on a real-world PIM architecture, and presents SparseP, the first SpMV library for real PIM architectures. We make three key contributions. First, we implement a wide variety of software strategies on SpMV for a multithreaded PIM core, including (1) various compressed matrix formats, (2) load balancing schemes across parallel threads and (3) synchronization approaches, and characterize the computational limits of a single multithreaded PIM core. Second, we design various load balancing schemes across multiple PIM cores, and two types of data partitioning techniques to execute SpMV on thousands of PIM cores: (1) 1D-partitioned kernels to perform the complete SpMV computation only using PIM cores, and (2) 2D-partitioned kernels to strike a balance between computation and data transfer costs to PIM-enabled memory. Third, we compare SpMV execution on a real-world PIM system with 2528 PIM cores to an Intel Xeon CPU and an NVIDIA Tesla V100 GPU to study the performance and energy efficiency of various devices, i.e., both memory-centric PIM systems and conventional processor-centric CPU/GPU systems, for the SpMV kernel. The SparseP software package provides 25 SpMV kernels for real PIM systems supporting the four most widely used compressed matrix formats, i.e., CSR, COO, BCSR and BCOO, and a wide range of data types. SparseP is publicly and freely available at https://github.com/CMU-SAFARI/SparseP. Our extensive evaluation using 26 matrices with various sparsity patterns provides new insights and recommendations for software designers and hardware architects to efficiently accelerate the SpMV kernel on real PIM systems.
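As background for the compressed formats mentioned (CSR among them), here is a reference SpMV in plain Python over the CSR layout; the library's actual PIM kernels are far more elaborate:

```python
def spmv_csr(row_ptr, col_idx, vals, x):
    # y = A @ x for A stored in Compressed Sparse Row form:
    # row_ptr[r]..row_ptr[r + 1] delimits row r's nonzeros, whose
    # column indices and values live in col_idx and vals.
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        for j in range(row_ptr[r], row_ptr[r + 1]):
            acc += vals[j] * x[col_idx[j]]  # irregular access into x
        y.append(acc)
    return y
```

Each nonzero is touched exactly once while x is gathered irregularly, which is why SpMV is memory-bound and sensitive to the sparsity pattern, as the abstract notes.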
12

Qu, Bo, and Zhao Zhi Wu. "Design of ARM Based Embedded Operating System Micro Kernel." Applied Mechanics and Materials 347-350 (August 2013): 1799–803. http://dx.doi.org/10.4028/www.scientific.net/amm.347-350.1799.

Abstract:
This paper describes, in technical detail, the design and implementation of an ARM-based embedded operating system micro kernel developed on a Linux platform with the GNU tool chain, including the three-layer architecture of the kernel (boot layer, core layer and task layer), multi-task scheduling (priority for real-time and round-robin for time-sharing), the IRQ handler, the SWI handler, system calls, and the inter-task communication on which the micro-kernel architecture is constructed. On the foundation of this micro kernel, more components essential to a practical operating system, such as a file system and TCP/IP processing, can be added to form a real and practical multi-task micro-kernel embedded operating system.
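The mixed scheduling policy described above (priority for real-time tasks, round-robin for time-sharing tasks) can be sketched as follows; the task representation and the lower-number-means-higher-priority convention are assumptions for illustration, not the paper's implementation:

```python
from collections import deque

def pick_next(realtime, timeshare):
    # Real-time tasks always run ahead of time-sharing ones; among
    # real-time tasks, the highest priority (lowest number) wins.
    if realtime:
        return min(realtime, key=lambda t: t["prio"])
    # Time-sharing tasks take turns: pop the head, requeue it at the
    # tail so the next call picks the following task (round-robin).
    if timeshare:
        task = timeshare.popleft()
        timeshare.append(task)
        return task
    return None  # nothing runnable: idle
```

Repeated calls with an empty real-time list cycle through the time-sharing queue in order.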
13

Kawamura, Koji, Laurence Hibrand-Saint Oyant, Fabrice Foucher, Tatiana Thouroude, and Sébastien Loustau. "Kernel methods for phenotyping complex plant architecture." Journal of Theoretical Biology 342 (February 2014): 83–92. http://dx.doi.org/10.1016/j.jtbi.2013.10.016.

14

Hidaka, Yasuo, Hanpei Koike, and Hidehiko Tanaka. "Architecture of parallel management kernel for PIE64." Future Generation Computer Systems 10, no. 1 (1994): 29–43. http://dx.doi.org/10.1016/0167-739x(94)90049-3.

15

Ermann, L., A. D. Chepelianskii, and D. L. Shepelyansky. "Fractal Weyl law for Linux Kernel architecture." European Physical Journal B 79, no. 1 (2010): 115–20. http://dx.doi.org/10.1140/epjb/e2010-10774-7.

16

Mohr-Daurat, Hubert, Xuan Sun, and Holger Pirk. "BOSS - An Architecture for Database Kernel Composition." Proceedings of the VLDB Endowment 17, no. 4 (2023): 877–90. http://dx.doi.org/10.14778/3636218.3636239.

Abstract:
Composable Database System Research has yielded components such as Apache Arrow for storage, Meta's Velox for processing and Apache Calcite for query planning. What is lacking, however, is a design for a general, efficient and easy-to-use architecture to connect them. We propose such an architecture. Our proposal is based on the ideas of partial query evaluation and a carefully designed, unified exchange format for query plans and data. We implement the architecture in a system called BOSS that combines Apache Arrow, the GPU-accelerated compute kernel ArrayFire and the CPU-oriented Velox kernel into a fully-featured relational Data Management System (DMS). We demonstrate that the architecture is general enough to incorporate practically any DMS component, easy to use and virtually overhead-free. Based on the architecture, BOSS achieves significant performance improvement over the CPU-only Velox kernel and even outperforms the highly-optimized GPU-only DMS HeavyDB for some queries.
17

Mohr-Daurat, Hubert, Xuan Sun, and Holger Pirk. "BOSS - An Architecture for Database Kernel Composition." ACM SIGMOD Record 54, no. 1 (2025): 37–46. https://doi.org/10.1145/3733620.3733629.

Abstract:
Composable Database System Research has yielded components such as Apache Arrow for storage, Meta's Velox for processing and Apache Calcite for query planning. What is lacking, however, is a design for a general, efficient and easy-to-use architecture to connect them. We propose such an architecture. Our proposal is based on the ideas of partial query evaluation and a carefully designed, unified exchange format for query plans and data. We implement the architecture in a system called BOSS that combines Apache Arrow, the GPU-accelerated compute kernel ArrayFire and the CPU-oriented Velox kernel into a fully-featured relational Data Management System (DMS). We demonstrate that the architecture is general enough to incorporate practically any DMS component, easy to use and virtually overhead-free. Based on the architecture, BOSS achieves significant performance improvement over the CPU-only Velox kernel and even outperforms the highly-optimized GPU-only DMS HeavyDB for some queries.
18

Metzger, Paul, Volker Seeker, Christian Fensch, and Murray Cole. "Device Hopping." ACM Transactions on Architecture and Code Optimization 18, no. 4 (2021): 1–25. http://dx.doi.org/10.1145/3471909.

Abstract:
Existing OS techniques for homogeneous many-core systems make it simple for single and multithreaded applications to migrate between cores. Heterogeneous systems do not benefit so fully from this flexibility, and applications that cannot migrate in mid-execution may lose potential performance. The situation is particularly challenging when a switch of language runtime would be desirable in conjunction with a migration. We present a case study in making heterogeneous CPU + GPU systems more flexible in this respect. Our technique for fine-grained application migration allows switches between OpenMP, OpenCL, and CUDA execution, in conjunction with migrations from GPU to CPU, and CPU to GPU. To achieve this, we subdivide iteration spaces into slices, and consider migration on a slice-by-slice basis. We show that slice sizes can be learned offline by machine learning models. To further improve performance, memory transfers are made migration-aware. The complexity of the migration capability is hidden from programmers behind a high-level programming model. We present a detailed evaluation of our mid-kernel migration mechanism with the First Come, First Served scheduling policy. We compare our technique in a focused evaluation scenario against idealized kernel-by-kernel scheduling, which is typical for current systems, and makes perfect kernel-to-device scheduling decisions, but cannot migrate kernels mid-execution. Models show that up to 1.33× speedup can be achieved over these systems by adding fine-grained migration. Our experimental results with all nine applicable SHOC and Rodinia benchmarks achieve speedups of up to 1.30× (1.08× on average) over an implementation of a perfect but kernel-migration-incapable scheduler when migrated to a faster device. Our mechanism and slice size choices introduce an average slowdown of only 2.44% if kernels never migrate.
Lastly, our programming model reduces the code size by at least 88% if compared to manual implementations of migratable kernels.
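The slice-by-slice migration idea can be illustrated schematically. Everything below (function names, slice size, device labels) is hypothetical; it only shows how slicing an iteration space creates migration points at slice boundaries:

```python
def slice_iteration_space(n, slice_size):
    # Subdivide the kernel's iteration space [0, n) into half-open
    # slices; each slice boundary is a potential migration point.
    return [(i, min(i + slice_size, n)) for i in range(0, n, slice_size)]

def run_with_migration(n, slice_size, schedule):
    # schedule(slice_index) -> device name, so the scheduler may
    # switch devices mid-kernel, e.g. from "gpu" to "cpu", between
    # slices. Returns the (device, lo, hi) execution trace.
    trace = []
    for idx, (lo, hi) in enumerate(slice_iteration_space(n, slice_size)):
        trace.append((schedule(idx), lo, hi))
    return trace
```

A scheduler that moves work to the CPU after two slices would produce a trace whose final slices run on "cpu" while earlier ones ran on "gpu".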
19

Mego, Roman, and Tomas Fryza. "Instruction mapping techniques for processors with very long instruction word architectures." Journal of Electrical Engineering 73, no. 6 (2022): 387–95. http://dx.doi.org/10.2478/jee-2022-0053.

Abstract:
This paper presents an instruction mapping technique for generating low-level assembly code for digital signal processing algorithms. This technique helps developers implement retargetable kernel functions with the performance benefits of low-level assembly languages. The approach is aimed at very long instruction word (VLIW) architectures, which benefit the most from the proposed method. Mapped algorithms are described by signal-flow graphs, which are used to find possible parallel operations. The algorithm is converted into low-level code and mapped to the target architecture. This process also introduces an optimization of instruction mapping priority, which leads to more effective code. The technique was verified on selected kernels, compared to common programming methods, and proved suitable for VLIW architectures and for portability to other systems.
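Finding operations that can issue together, as the signal-flow-graph analysis does, amounts to leveling a dependency graph. The sketch below is a simplified, resource-unconstrained illustration (the function name and data layout are hypothetical):

```python
def issue_levels(deps):
    # deps maps each operation to the set of operations it depends
    # on. Operations in the same level are mutually independent and
    # could, resources permitting, share one VLIW bundle.
    levels, placed, remaining = [], set(), set(deps)
    while remaining:
        ready = {op for op in remaining if deps[op] <= placed}
        if not ready:
            raise ValueError("dependency cycle")
        levels.append(sorted(ready))
        placed |= ready
        remaining -= ready
    return levels
```

Two multiplications feeding one addition, for instance, level into one bundle of multiplies followed by the add, then any dependent store.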
20

Timmers, T., J. H. van Bemmel, and E. M. van Mulligen. "A New Architecture for Integration of Heterogeneous Software Components." Methods of Information in Medicine 32, no. 04 (1993): 292–301. http://dx.doi.org/10.1055/s-0038-1634934.

Abstract:
An architecture is described that integrates existing applications in a network-wide system. The architecture follows the new open software paradigm, and defines kernel and application services that collaboratively solve the tasks of end-users and provide them with an intuitive user interface. This paper describes the message language and the kernel mechanism for addressing application services. The architecture has been developed to conform as much as possible with current standards.
21

Hamzah, Muhammad Amir, and Siti Hajar Othman. "Performance Evaluation of Support Vector Machine Kernels in Intrusion Detection System for Wireless Sensor Network." International Journal of Innovative Computing 12, no. 1 (2021): 9–15. http://dx.doi.org/10.11113/ijic.v12n1.334.

Abstract:
Wireless sensor networks (WSNs) are popular in industrial applications because they are infrastructure-less, self-configuring wireless networks suited to monitoring physical and environmental conditions. However, the dynamic environment of wireless networks exposes WSNs to vulnerabilities. Intrusion Detection Systems (IDS) have been used to mitigate network vulnerabilities, and research on improving WSN-IDS efficiency has been extensive because the rapid growth of technologies drives the growth of network attacks. The Support Vector Machine (SVM) was found to be one of the optimum algorithms for improving WSN-IDS. Yet the classification efficiency of SVM depends on the kernel function used, because different kernels give different SVM architectures. Linear SVM classification is limited in maximizing the margin in the dynamic wireless environment, which consists of nonlinear data. Since maximizing the margin is the primary goal of SVM, it is crucial to use the optimum kernel for classifying nonlinear data. Each SVM model in this research uses a different kernel: Linear, RBF, Polynomial, or Sigmoid. The NSL-KDD dataset was used for the experiments. The performance of each kernel was evaluated on the experimental results, and the RBF kernel was found to provide the best classification accuracy, with a score of 91%. Finally, the findings are discussed.
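The four SVM kernels compared have standard closed forms, sketched below with hypothetical hyperparameter defaults (the paper's actual settings may differ):

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def linear_kernel(x, y):
    # k(x, y) = <x, y>: a linear decision boundary.
    return dot(x, y)

def polynomial_kernel(x, y, degree=3, coef0=1.0):
    # k(x, y) = (<x, y> + c)^d: polynomial boundaries of degree d.
    return (dot(x, y) + coef0) ** degree

def rbf_kernel(x, y, gamma=0.5):
    # k(x, y) = exp(-gamma * ||x - y||^2): flexible, nonlinear
    # boundaries, the kernel that scored best in this study.
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def sigmoid_kernel(x, y, gamma=0.5, coef0=0.0):
    # k(x, y) = tanh(gamma * <x, y> + c): a neural-network-style kernel.
    return math.tanh(gamma * dot(x, y) + coef0)
```

Swapping one of these functions into an SVM's Gram-matrix computation is what "choosing a kernel" means; only the RBF and polynomial kernels can separate the nonlinear traffic patterns the abstract refers to.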
22

An, Fubang, Lingli Wang, and Xuegong Zhou. "A High Performance Reconfigurable Hardware Architecture for Lightweight Convolutional Neural Network." Electronics 12, no. 13 (2023): 2847. http://dx.doi.org/10.3390/electronics12132847.

Abstract:
Since the lightweight convolutional neural network EfficientNet was proposed by Google in 2019, this series of models has quickly become very popular due to its superior performance with a small number of parameters. However, the existing convolutional neural network hardware accelerators for EfficientNet still have much room to improve the performance of the depthwise convolution, squeeze-and-excitation module and nonlinear activation functions. In this paper, we first design a reconfigurable register array and computational kernel to accelerate the depthwise convolution. Next, we propose a vector unit to implement the nonlinear activation functions and the scale operation. An exchangeable-sequence dual-computational-kernel architecture is proposed to improve the performance and the utilization. In addition, the memory architectures are designed to complete the hardware accelerator for the above computing architecture. Finally, in order to evaluate the performance of the hardware accelerator, the accelerator is implemented on a Xilinx XCVU37P. The results show that the proposed accelerator can work at a main system clock frequency of 300 MHz with the DSP kernel at 600 MHz. The performance of EfficientNet-B3 in our architecture can reach 69.50 FPS and 255.22 GOPS. Compared with the latest EfficientNet-B3 accelerator, which uses the same FPGA development board, the accelerator proposed in this paper achieves a 1.28-fold improvement in single-core performance and a 1.38-fold improvement in per-DSP performance.
23

Kenter, Tobias, Henning Schmitz, and Christian Plessl. "Exploring Trade-Offs between Specialized Dataflow Kernels and a Reusable Overlay in a Stereo Matching Case Study." International Journal of Reconfigurable Computing 2015 (2015): 1–24. http://dx.doi.org/10.1155/2015/859425.

Abstract:
FPGAs are known to permit huge gains in performance and efficiency for suitable applications but still require reduced design efforts and shorter development cycles for wider adoption. In this work, we compare the resulting performance of two design concepts that in different ways promise such increased productivity. As common starting point, we employ a kernel-centric design approach, where computational hotspots in an application are identified and individually accelerated on FPGA. By means of a complex stereo matching application, we evaluate two fundamentally different design philosophies and approaches for implementing the required kernels on FPGAs. In the first implementation approach, we designed individually specialized data flow kernels in a spatial programming language for a Maxeler FPGA platform; in the alternative design approach, we target a vector coprocessor with large vector lengths, which is implemented as a form of programmable overlay on the application FPGAs of a Convey HC-1. We assess both approaches in terms of overall system performance, raw kernel performance, and performance relative to invested resources. After compensating for the effects of the underlying hardware platforms, the specialized dataflow kernels on the Maxeler platform are around 3x faster than kernels executing on the Convey vector coprocessor. In our concrete scenario, due to trade-offs between reconfiguration overheads and exposed parallelism, the advantage of specialized dataflow kernels is reduced to around 2.5x.
APA, Harvard, Vancouver, ISO, and other styles
24

K, Saranprabhakaran, Senthil A, Sritharan N, and senthil N. "Variability Studies in Maize (Zea Mays L.) inbreds through Morpho Physiological Traits, Principal Component Analysis and their relationship between yield components." Madras Agricultural Journal 108 (2021): 1–5. http://dx.doi.org/10.29321/maj.10.000473.

Full text
Abstract:
The presence of high genetic diversity in physiological traits among maize inbreds offers scope for improving the inbreds for better canopy architecture. Eight maize inbreds were characterized by twelve morpho-physiological traits and four yield-related traits. Among the physiological traits, photosynthetically active radiation (PAR) was evenly distributed at the canopy level in the S38, S157, S289 and S322 inbreds. Leaf Dry Matter (LDM) had a positive association (r = 0.734*) with 100-kernel weight. In the Principal Component Analysis (PCA), the first two PCs were used to construct the biplot, where the total number of kernels, cob girth, Average Growth Rate (AGR) and leaf dry matter had a positive association with the S157, S322 and D164 inbreds. The inbred S157 recorded high leaf dry matter (47.55 g), greater cob length (20.43 cm), higher 100-kernel weight (39.32 g) and a higher average growth rate (6.18 g/day). Hence, S157 is considered the best ideotype for developing high-yielding maize hybrids based on better canopy architecture.
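The PCA step used above (the first two PCs forming a biplot) reduces to standard linear algebra on the centered trait matrix; a hedged sketch on synthetic data (the real study scored eight inbreds on sixteen traits, so the shapes below mirror that, but the values are made up):

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Project rows of X (samples x traits) onto the top principal components."""
    Xc = X - X.mean(axis=0)                 # center each trait
    # SVD of the centered data gives the principal axes in Vt
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T       # sample coordinates for the biplot
    explained = (S ** 2) / np.sum(S ** 2)   # variance ratio per component
    return scores, explained[:n_components]

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 16))                # e.g., 8 inbreds x 16 traits
scores, ratio = pca_scores(X)
```

The biplot then plots the rows of `scores` as points and the first two rows of `Vt` as trait arrows.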
APA, Harvard, Vancouver, ISO, and other styles
25

Wang, Zi, Yuqing Lan, Xinlei He, and Jianghua Lv. "A Formal Verification Approach for Linux Kernel Designing." Technologies 12, no. 8 (2024): 132. http://dx.doi.org/10.3390/technologies12080132.

Full text
Abstract:
Although the Linux kernel is widely used, its complexity makes errors common and potentially serious. Traditional formal verification methods often have high overhead and rely heavily on manual coding. They typically verify only specific functionalities of the kernel or target microkernels, and do not support continuous verification of the entire kernel. To address these limitations, we introduce LMVM (Linux Kernel Modeling and Verification Method), a formal method based on type theory that ensures the correct design of the Linux architecture. In the model, the kernel is treated as a top-level type, subdivided into the following sublevels: subsystem, dentry, file, struct, function and base. Each type is defined by its structure and its relationships to other types. The verification process includes checking the design specifications for both the type relationships and the presence of each type. Our contribution lies primarily in the following two points: 1. The verification is lightweight: as long as the modeling is complete, architectural errors can be identified promptly in the design phase. 2. The designed “model refactor” module supports kernel updating, so the kernel can be continuously verified by extending the kernel model. To test its usefulness, we develop a set of secure communication mechanisms in the kernel, which are verified using our method.
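The layered type model described above (kernel → subsystem → dentry → file → struct → function → base) can be pictured as a tree whose well-formedness is checked mechanically: an edge is legal only if it steps one level down the hierarchy. A toy sketch of that idea (the class and function names are illustrative, not the LMVM formalism):

```python
# Toy model: each node carries a level, and a parent/child edge is legal
# only if the child sits exactly one level below the parent.
LEVELS = ["kernel", "subsystem", "dentry", "file", "struct", "function", "base"]

class Node:
    def __init__(self, name, level):
        assert level in LEVELS
        self.name, self.level, self.children = name, level, []

    def add(self, child):
        self.children.append(child)
        return child

def check(node):
    """Verify every parent/child pair respects the declared level ordering."""
    want = LEVELS.index(node.level) + 1
    for c in node.children:
        if LEVELS.index(c.level) != want:
            return False
        if not check(c):
            return False
    return True

root = Node("linux", "kernel")
fs = root.add(Node("fs", "subsystem"))
ok = check(root)
```

A "model refactor" step in this picture would re-run `check` after the tree is extended, which is the sense in which verification can be continuous.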
APA, Harvard, Vancouver, ISO, and other styles
26

Videau, Brice, Kevin Pouget, Luigi Genovese, et al. "BOAST." International Journal of High Performance Computing Applications 32, no. 1 (2017): 28–44. http://dx.doi.org/10.1177/1094342017718068.

Full text
Abstract:
The portability of real high-performance computing (HPC) applications to new platforms is an open and very delicate problem. In particular, the performance portability of the underlying computing kernels is problematic, as they need to be tuned for each and every platform the application encounters. This article presents BOAST, a metaprogramming framework dedicated to computing kernels. BOAST allows the description of a kernel and its possible optimizations using a domain-specific language. The BOAST runtime then compares the performance of the different versions and verifies their exactness. BOAST is applied to three use cases: a Laplace kernel in OpenCL and two HPC applications, BigDFT (electronic density computation) and SPECFEM3D (seismic wave propagation).
APA, Harvard, Vancouver, ISO, and other styles
27

Liu, Jing, Tingting Wang, and Yulong Qiao. "Depth and Width Changeable Network-Based Deep Kernel Learning-Based Hyperspectral Sensor Data Analysis." Wireless Communications and Mobile Computing 2021 (February 20, 2021): 1–8. http://dx.doi.org/10.1155/2021/8842396.

Full text
Abstract:
Sensor data analysis is used in many application areas, for example, the Artificial Intelligence of Things (AIoT), and the rapid development of deep neural network learning has broadened its scope. In this work, we propose a depth- and width-changeable deep kernel learning-based algorithm for hyperspectral sensing data analysis. Compared with traditional kernel learning-based hyperspectral data classification, the proposed method has advantages for hyperspectral data classification: with deep kernel learning, features pass through multiple successive mappings and become more discriminative, so deep kernel learning performs better than multiple kernel learning. The method can also adjust the network architecture to the hyperspectral data space by optimizing the span bound. Experiments measuring classification accuracy on hyperspectral data verify the feasibility and performance of the algorithm, and a comprehensive analysis of the results shows that the proposed algorithm is feasible for hyperspectral sensor data analysis and is a promising classification method for data analysis in many areas.
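A "deep" kernel in the sense above composes mappings: instead of applying a single kernel k(x, y) directly, the inputs pass through successive feature maps before the base kernel is evaluated. A minimal sketch with an RBF base kernel and two toy mapping layers (illustrative, not the paper's exact construction):

```python
import numpy as np

def rbf(x, y, gamma=1.0):
    """Gaussian (RBF) base kernel."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

def deep_kernel(x, y, layers, gamma=1.0):
    """Apply each feature map in turn, then evaluate the base RBF kernel."""
    for f in layers:
        x, y = f(x), f(y)
    return rbf(x, y, gamma)

# Two toy mapping layers (random linear map + tanh), as in a small network
rng = np.random.default_rng(1)
W1, W2 = rng.normal(size=(5, 4)), rng.normal(size=(3, 5))
layers = [lambda v: np.tanh(W1 @ v), lambda v: np.tanh(W2 @ v)]

a, b = rng.normal(size=4), rng.normal(size=4)
k_ab = deep_kernel(a, b, layers)
```

Changing the number of layers (depth) or the mapping widths is what the proposed span-bound optimization adjusts.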
APA, Harvard, Vancouver, ISO, and other styles
28

Khalifa, Khaled Ben, Ahmed Ghazi Blaiech, Mehdi Abadi, and Mohamed Hedi Bedoui. "New Hardware Architecture for Self-Organizing Map Used for Color Vector Quantization." Journal of Circuits, Systems and Computers 29, no. 01 (2019): 2050002. http://dx.doi.org/10.1142/s0218126620500024.

Full text
Abstract:
In this paper, we present a new generic architectural approach to the Self-Organizing Map (SOM). The proposed architecture, called the Diagonal-SOM (D-SOM), is described in a Hardware Description Language as an intellectual-property kernel with easily adjustable parameters. The D-SOM architecture is based on a generic formalism that exploits two levels of nested parallelism, over neurons and over connections. This solution is therefore considered a system based on the cooperation of a distributed set of independent computations. The organization and structure of these computations process an oriented data flow in order to find a better distribution of the processing between the different neuroprocessors. To validate the D-SOM architecture, we evaluate the performance of several SOM network architectures after their integration on a Xilinx Virtex-7 Field Programmable Gate Array. The proposed solution allows learning to be adapted easily to a large number of SOM topologies without any considerable design effort. The [Formula: see text] SOM hardware is validated through FPGA implementation, where temporal performance is almost twice as fast as that reported in the recent literature. The suggested D-SOM architecture is also validated through simulation on variable-sized SOM networks applied to color vector quantization.
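The computation such SOM hardware parallelizes is the classic two-phase update: find the best-matching unit (BMU) for an input, then pull every neuron's weight vector toward the input with a strength that decays with grid distance from the BMU. A scalar NumPy sketch of one update step (parameter names and values are illustrative):

```python
import numpy as np

def som_step(weights, x, lr=0.5, sigma=1.0):
    """One SOM update: locate the BMU, then update all neurons with a
    Gaussian neighborhood function over grid distance to the BMU."""
    rows, cols, dim = weights.shape
    # BMU = neuron whose weight vector is closest to the input
    d = np.linalg.norm(weights - x, axis=2)
    bmu = np.unravel_index(np.argmin(d), (rows, cols))
    for r in range(rows):
        for c in range(cols):
            grid_d2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
            h = np.exp(-grid_d2 / (2 * sigma ** 2))  # neighborhood strength
            weights[r, c] += lr * h * (x - weights[r, c])
    return bmu

rng = np.random.default_rng(2)
w = rng.random((4, 4, 3))       # 4x4 map of 3-dimensional weight vectors
w_before = w.copy()             # kept to observe the effect of one step
x = np.array([0.9, 0.1, 0.5])   # e.g., an RGB color for vector quantization
bmu = som_step(w, x)
```

The nested loops over neurons and over vector components are exactly the two levels of parallelism the D-SOM maps onto hardware.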
APA, Harvard, Vancouver, ISO, and other styles
29

SELIVORSTOVA, TATJANA, SERGEY KLISHCH, SERHII KYRYCHENKO, ANTON GUDA, and KATERYNA OSTROVSKAYA. "ANALYSIS OF MONOLITHIC AND MICROSERVICE ARCHITECTURES FEATURES AND METRICS." Computer systems and information technologies, no. 3 (April 14, 2022): 59–65. http://dx.doi.org/10.31891/csit-2021-5-8.

Full text
Abstract:
In this paper, the information technology stack used during network architecture deployment is presented. The analysis of the technological advantages and drawbacks of monolithic and microservice architectures is useful for cyber security analysis in telecom networks. The main numeric characteristics were analyzed with the aid of Kubectl. The results of a series of numerical experiments evaluating response speed to requests and fault tolerance are presented, and the scalability characteristics of monolithic and microservice-based architectures are investigated. For the time series characterizing the network server load, the value of the Hurst exponent was calculated.
 The main goal of the research is the analysis of the main characteristics of monolithic and microservice architectures, the acquisition of time series data from the network server, and their statistical analysis.
 A methodology for deploying Kubernetes clusters using Minikube, Kubectl and Docker was used. The application was deployed on an AWS ECS virtual machine with a monolithic architecture and on a Kubernetes cluster (AWS EKS).
 The investigation results confirm that the microservice architecture is more fault-tolerant and flexible than the monolithic architecture. Fractal analysis of time series of the server equipment load showed the presence of long-term dependence, so the traffic can be treated as a self-similar process.
 The scientific novelty of the article lies in the application of fractal analysis to real time series (kernel usage in user space, kernel latency, RAM usage and RAM caching, collected over 6 months at 10-second intervals) and in establishing the long-term dependence of the time series data.
 The practical significance of the research is the creation of a methodology for deploying and operating monolithic and microservice architectures, as well as the use of fractal analysis of time series to explore the network equipment load.
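The Hurst exponent mentioned above is commonly estimated by rescaled-range (R/S) analysis: the statistic R/S grows like n^H with window size n, so the exponent is the slope of log(R/S) against log(n). A simplified sketch of that estimator (not necessarily the authors' exact procedure; window sizes are illustrative):

```python
import numpy as np

def hurst_rs(series, window_sizes=(16, 32, 64, 128, 256)):
    """Estimate the Hurst exponent by fitting log(R/S) against log(n)."""
    series = np.asarray(series, dtype=float)
    log_n, log_rs = [], []
    for n in window_sizes:
        rs_vals = []
        for start in range(0, len(series) - n + 1, n):
            w = series[start:start + n]
            z = np.cumsum(w - w.mean())   # cumulative deviations from the mean
            r = z.max() - z.min()         # range of the cumulative deviations
            s = w.std()                   # standard deviation of the window
            if s > 0:
                rs_vals.append(r / s)
        log_n.append(np.log(n))
        log_rs.append(np.log(np.mean(rs_vals)))
    H, _ = np.polyfit(log_n, log_rs, 1)   # slope of the fit = Hurst exponent
    return H

rng = np.random.default_rng(3)
H_noise = hurst_rs(rng.normal(size=4096))  # white noise: H near 0.5
```

An estimate well above 0.5, as reported for the server-load series, indicates long-term dependence and hence self-similar traffic.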
APA, Harvard, Vancouver, ISO, and other styles
30

Ahmad, Ola, and Freddy Lecue. "FisheyeHDK: Hyperbolic Deformable Kernel Learning for Ultra-Wide Field-of-View Image Recognition." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 6 (2022): 5968–75. http://dx.doi.org/10.1609/aaai.v36i6.20542.

Full text
Abstract:
Conventional convolution neural networks (CNNs) trained on narrow Field-of-View (FoV) images are the state-of-the-art approaches for object recognition tasks. Some methods proposed the adaptation of CNNs to ultra-wide FoV images by learning deformable kernels. However, they are limited by the Euclidean geometry and their accuracy degrades under strong distortions caused by fisheye projections. In this work, we demonstrate that learning the shape of convolution kernels in non-Euclidean spaces is better than existing deformable kernel methods. In particular, we propose a new approach that learns deformable kernel parameters (positions) in hyperbolic space. FisheyeHDK is a hybrid CNN architecture combining hyperbolic and Euclidean convolution layers for positions and features learning. First, we provide intuition of hyperbolic space for wide FoV images. Using synthetic distortion profiles, we demonstrate the effectiveness of our approach. We select two datasets - Cityscapes and BDD100K 2020 - of perspective images, which we transform to fisheye equivalents at different scaling factors (analogous to focal lengths). Finally, we provide an experiment on data collected by a real fisheye camera. Validations and experiments show that our approach improves existing deformable kernel methods for CNN adaptation on fisheye images.
APA, Harvard, Vancouver, ISO, and other styles
31

Coppolino, Gabriele, Carlo Condo, Guido Masera, and Warren J. Gross. "A Multi-Kernel Multi-Code Polar Decoder Architecture." IEEE Transactions on Circuits and Systems I: Regular Papers 65, no. 12 (2018): 4413–22. http://dx.doi.org/10.1109/tcsi.2018.2855679.

Full text
APA, Harvard, Vancouver, ISO, and other styles
32

Arguello, F., J. D. Bruguera, R. Doallo, and E. L. Zapata. "Parallel architecture for fast transforms with trigonometric kernel." IEEE Transactions on Parallel and Distributed Systems 5, no. 10 (1994): 1091–99. http://dx.doi.org/10.1109/71.313124.

Full text
APA, Harvard, Vancouver, ISO, and other styles
33

Sequeira, João, and M. Isabel Ribeiro. "A Behaviour-Based Kernel Architecture for Robot Control ⋆." IFAC Proceedings Volumes 30, no. 20 (1997): 787–92. http://dx.doi.org/10.1016/s1474-6670(17)44352-7.

Full text
APA, Harvard, Vancouver, ISO, and other styles
34

You-lei, Chen, and Shen Chang-xiang. "A security kernel architecture based trusted computing platform." Wuhan University Journal of Natural Sciences 10, no. 1 (2005): 1–4. http://dx.doi.org/10.1007/bf02828604.

Full text
APA, Harvard, Vancouver, ISO, and other styles
35

Pramanik, P., P. K. Das, A. K. Bandyopadhyay, and D. Q. M. Fay. "A deadlock-free communication kernel for loop architecture." Information Processing Letters 38, no. 3 (1991): 157–61. http://dx.doi.org/10.1016/0020-0190(91)90239-e.

Full text
APA, Harvard, Vancouver, ISO, and other styles
36

AMAMIYA, MAKOTO, HIDEO TANIGUCHI, and TAKANORI MATSUZAKI. "AN ARCHITECTURE OF FUSING COMMUNICATION AND EXECUTION FOR GLOBAL DISTRIBUTED PROCESSING." Parallel Processing Letters 11, no. 01 (2001): 7–24. http://dx.doi.org/10.1142/s0129626401000397.

Full text
Abstract:
We are pursuing the FUCE architecture project at Kyushu University. FUCE means FUsion of Communication and Execution. The main objective of our research is, as the name shows, to develop a new architecture that truly fuses communication and computation. The FUCE project develops a new on-chip multiprocessor and kernel software for it. We name the processor the FUCE processor and the kernel software CEFOS (Communication and Execution Fusion OS). The FUCE processor is designed as a network node processor that mainly performs the switching/transmission of messages and transactions and the handling of their contents. The FUCE processor architecture is designed as a multiprocessor-on-chip to support fine-grain multithreading. The kernel software CEFOS is also developed on the concept of multithreading. User and system processes are constructed as sets of threads, which are executed concurrently according to their dependences.
APA, Harvard, Vancouver, ISO, and other styles
37

Ranganadh, Narayanam. "NOVEL QUAD PARALLELIZED ARCHITECTURE FOR DIGITAL IMAGE PROCESSING CONVOLUTION ON FPGAs." INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY 9, no. 4 (2020): 162–67. https://doi.org/10.5281/zenodo.3778567.

Full text
Abstract:
The digital image processing convolution is the core block of Convolutional Neural Networks (CNNs), which are used in deep CNNs for advanced applications such as feature extraction and image recognition. This paper introduces two novel hardware architectures for the convolution process, one of which is a quad-parallelized hardware architecture. A performance comparison shows that the parallelization is highly beneficial. The parallel architecture can speed up the transmission of convolution-filtered image data, for example through a telemedicine communication network. The implementation is done for a 64×64 matrix and a 3×3 kernel using the Xilinx Vivado 2015.2 tool on Xilinx Artix-7 FPGAs using Verilog HDL.
APA, Harvard, Vancouver, ISO, and other styles
38

Wang, Cheng, Huangai Li, Yan Long, et al. "A Systemic Investigation of Genetic Architecture and Gene Resources Controlling Kernel Size-Related Traits in Maize." International Journal of Molecular Sciences 24, no. 2 (2023): 1025. http://dx.doi.org/10.3390/ijms24021025.

Full text
Abstract:
Grain yield is the most critical and complex quantitative trait in maize. Kernel length (KL), kernel width (KW), kernel thickness (KT) and hundred-kernel weight (HKW) associated with kernel size are essential components of yield-related traits in maize. With the extensive use of quantitative trait locus (QTL) mapping and genome-wide association study (GWAS) analyses, thousands of QTLs and quantitative trait nucleotides (QTNs) have been discovered for controlling these traits. However, only some of them have been cloned and successfully utilized in breeding programs. In this study, we exhaustively collected reported genes, QTLs and QTNs associated with the four traits, performed cluster identification of QTLs and QTNs, then combined QTL and QTN clusters to detect consensus hotspot regions. In total, 31 hotspots were identified for kernel size-related traits. Their candidate genes were predicted to be related to well-known pathways regulating the kernel developmental process. The identified hotspots can be further explored for fine mapping and candidate gene validation. Finally, we provided a strategy for high yield and quality maize. This study will not only facilitate causal genes cloning, but also guide the breeding practice for maize.
APA, Harvard, Vancouver, ISO, and other styles
39

Shin, Kilho, and Tetsuji Kuboyama. "A Generalization of Haussler's Convolution Kernel — Mapping Kernel and Its Application to Tree Kernels." Journal of Computer Science and Technology 25, no. 5 (2010): 1040–54. http://dx.doi.org/10.1007/s11390-010-9386-1.

Full text
APA, Harvard, Vancouver, ISO, and other styles
40

Danesh, Ahmad Reza, and Mehdi Habibi. "A signed pulse-train-based image processor-array for parallel kernel convolution in vision sensors." Sensor Review 40, no. 4 (2020): 521–28. http://dx.doi.org/10.1108/sr-10-2019-0242.

Full text
Abstract:
Purpose The purpose of this paper is to design a kernel convolution processor. High-speed image processing is a challenging task for real-time applications such as product quality control of manufacturing lines. Smart image sensors use an array of in-pixel processors to facilitate high-speed real-time image processing. These sensors are usually used to perform the initial low-level bulk image filtering and enhancement. Design/methodology/approach In this paper, using pulse-width modulated signals and regular nearest neighbor interconnections, a convolution image processor is presented. The presented processor is not only capable of processing arbitrary size kernels but also the kernel coefficients can be any arbitrary positive or negative floating number. Findings The performance of the proposed architecture is evaluated on a Xilinx Virtex-7 field programmable gate array platform. The peak signal-to-noise ratio metric is used to measure the computation error for different images, filters and illuminations. Finally, the power consumption of the circuit in different operating conditions is presented. Originality/value The presented processor array can be used for high-speed kernel convolution image processing tasks including arbitrary size edge detection and sharpening functions, which require negative and fractional kernel values.
APA, Harvard, Vancouver, ISO, and other styles
41

Liedtke, J. "On micro-kernel construction." ACM SIGOPS Operating Systems Review 29, no. 5 (1995): 237–50. http://dx.doi.org/10.1145/224057.224075.

Full text
APA, Harvard, Vancouver, ISO, and other styles
42

A, Mahalingam, Robin S, Pushpam R, and Mohanasundaram K. "Genetic Architecture of Grain Quality Traits in the Biparental Progenies (BIPs) of Rice (Oryza sativa. L)." Madras Agricultural Journal 99, December (2012): 657–61. http://dx.doi.org/10.29321/maj.10.100164.

Full text
Abstract:
An investigation was carried out with the objective of studying the genetic architecture of biparental progenies (BIPs) of the JGL 384 x Rasi cross combination of rice in terms of grain quality parameters. Intermated progenies (BIPs) showed superior mean performance than their parents and the F1, F2 and F3 generations for hulling (76.24%), milling (72.24%), head rice recovery (67.48%), kernel length after cooking (9.78), kernel L/B ratio after cooking (3.08), linear elongation ratio (1.66), volume expansion ratio (4.94) and amylose content (22.64%). The enhanced trait mean values might be due to pooling of favorable alleles through recombination, considerable heterozygosity, accumulation of favorable alleles of low frequency, and the breaking up of undesirable linkages made possible by intermating. Combining ability analysis of NCD II revealed that cooking traits like kernel length after cooking and volume expansion ratio were governed by additive genes. Hence, for the improvement of these traits, pure-line selection, mass selection and/or progeny selection are suggested, and selection might be effective at this level. Other grain quality traits, viz., hulling percentage, milling percentage, head rice recovery, kernel length, kernel breadth, kernel L/B ratio, kernel breadth after cooking, kernel L/B ratio after cooking, linear elongation ratio, breadth-wise expansion ratio, gel consistency, gelatinization temperature and amylose content, were governed by dominant genes. Hence, these traits could be improved by recombination breeding by taking up selection at later generations; alternatively, one or two more cycles of intermating may break the undesirable linkages among the traits of interest.
APA, Harvard, Vancouver, ISO, and other styles
43

Ou, Chien Min, Wen Jyi Hwang, and Ssu Min Yang. "Efficient Hardware Architecture for Kernel Fuzzy C-Means Algorithm." Applied Mechanics and Materials 284-287 (January 2013): 3079–86. http://dx.doi.org/10.4028/www.scientific.net/amm.284-287.3079.

Full text
Abstract:
A novel VLSI architecture for the kernel fuzzy c-means algorithm is presented in this paper. The architecture consists of efficient circuits for the computation of kernel functions, membership coefficients and cluster centers. In addition, the usual iterative operations for updating the membership matrix and cluster centers are merged into a single updating process to avoid a large storage requirement. The circuit is used as a hardware accelerator for a softcore processor in a system-on-programmable-chip for physical performance measurement. Experimental results show that the proposed solution is an effective alternative for cluster analysis with low computational cost and high performance.
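The iteration such a circuit implements alternates the two updates of kernel fuzzy c-means: recompute memberships from kernel-induced distances, then recompute centers from kernel-weighted memberships. A NumPy sketch of the common Gaussian-kernel formulation (illustrative, not the paper's exact circuit-level scheme; for the RBF kernel the induced distance is ||φ(x)−φ(v)||² = 2(1 − K(x, v))):

```python
import numpy as np

def kfcm_iterate(X, V, m=2.0, gamma=1.0, iters=10):
    """Kernel fuzzy c-means with a Gaussian kernel.
    X: (N, d) samples; V: (c, d) initial cluster centers."""
    def K(a, b):  # Gaussian kernel matrix between two row sets
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    for _ in range(iters):
        # kernel-induced squared distance in feature space
        d2 = 2.0 * (1.0 - K(X, V)) + 1e-12
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)   # memberships, rows sum to 1
        # kernel-weighted center update
        w = (U ** m) * K(X, V)                     # (N, c) weights
        V = (w.T @ X) / w.sum(axis=0)[:, None]
    return U, V

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 0.2, (20, 2)), rng.normal(3, 0.2, (20, 2))])
U, V = kfcm_iterate(X, X[[0, -1]].copy())
```

Merging the two updates into one pass over the data, as the paper's architecture does, avoids storing the full membership matrix between phases.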
APA, Harvard, Vancouver, ISO, and other styles
44

Wu, Ya Hu, Ning Han, and Di Yan. "The Research and Implementation of KNX Communication Kernel Based on ATMega32." Advanced Materials Research 433-440 (January 2012): 3269–75. http://dx.doi.org/10.4028/www.scientific.net/amr.433-440.3269.

Full text
Abstract:
Based on a review of current research on KNX technology, a hardware platform built around an ATMega32 core processor is designed to implement a KNX communication kernel. The software architecture of the KNX communication kernel is the key subject of this paper and is divided into three parts: communication, management and application. The working principle and the functions achieved by the KNX communication kernel are discussed in detail. Finally, an application example is used to test the functionality of the kernel. Research on and implementation of a KNX communication kernel based on a new hardware platform will significantly reduce the development cost of KNX devices, benefit KNX applications, and promote market penetration.
APA, Harvard, Vancouver, ISO, and other styles
45

Habibi, Mehdi, and Ahmad Reza Danesh. "A digital arbitrary size kernel convolution smart image sensor based on in-pixel pulse width processors." Sensor Review 37, no. 4 (2017): 468–77. http://dx.doi.org/10.1108/sr-03-2017-0035.

Full text
Abstract:
Purpose The purpose of this study is to propose a pulse width based, in-pixel, arbitrary size kernel convolution processor. When image sensors are used in machine vision tasks, large amount of data need to be transferred to the output and fed to a processor. Basic and low-level image processing functions such as kernel convolution is used extensively in the early stages of most machine vision tasks. These low-level functions are usually computationally extensive and if the computation is performed inside every pixel, the burden on the external processor will be greatly reduced. Design/methodology/approach In the proposed architecture, digital pulse width processing is used to perform kernel convolution on the image sensor data. With this approach, while the photocurrent fluctuations are expressed with changes in the pulse width of an output signal, the small processor incorporated in each pixel receives the output signal of the corresponding pixel and its neighbors and produces a binary coded output result for that specific pixel. The process is commenced in parallel among all pixels of the image sensor. Findings It is shown that using the proposed architecture, not only kernel convolution can be performed in the digital domain inside smart image sensors but also arbitrary kernel coefficients are obtainable simply by adjusting the sampling frequency at different phases of the processing. Originality/value Although in-pixel digital kernel convolution has been previously reported however with the presented approach no in-pixel analog to binary coded digital converter is required. Furthermore, arbitrary kernel coefficients and scaling can be deployed in the processing. The given architecture is a suitable choice for smart image sensors which are to be used in high-speed machine vision tasks.
APA, Harvard, Vancouver, ISO, and other styles
46

Joseph Raj, Alex Noel, Lianhong Cai, Wei Li, Zhemin Zhuang, and Tardi Tjahjadi. "FPGA-based systolic deconvolution architecture for upsampling." PeerJ Computer Science 8 (May 11, 2022): e973. http://dx.doi.org/10.7717/peerj-cs.973.

Full text
Abstract:
A deconvolution accelerator is proposed to upsample n × n input to 2n × 2n output by convolving with a k × k kernel. Its architecture avoids the need for insertion and padding of zeros and thus eliminates the redundant computations to achieve high resource efficiency with reduced number of multipliers and adders. The architecture is systolic and governed by a reference clock, enabling the sequential placement of the module to represent a pipelined decoder framework. The proposed accelerator is implemented on a Xilinx XC7Z020 platform, and achieves a performance of 3.641 giga operations per second (GOPS) with resource efficiency of 0.135 GOPS/DSP for upsampling 32 × 32 input to 256 × 256 output using a 3 × 3 kernel at 200 MHz. Furthermore, its high peak signal to noise ratio of almost 80 dB illustrates that the upsampled outputs of the bit truncated accelerator are comparable to IEEE double precision results.
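The deconvolution described above is mathematically equivalent to inserting zeros between input samples and then running an ordinary convolution; the accelerator's contribution is computing the same result without the inserted zeros and their redundant multiplications. A reference sketch of the naive zero-insertion form, useful for checking outputs (not the paper's systolic architecture):

```python
import numpy as np

def upsample_deconv(x, kernel):
    """Upsample an n x n input to 2n x 2n: insert zeros between samples,
    zero-pad the borders, then run a plain 'same'-style k x k convolution."""
    n = x.shape[0]                 # assumes a square input
    k = kernel.shape[0]
    up = np.zeros((2 * n, 2 * n))
    up[::2, ::2] = x               # zero insertion between samples
    p = k // 2
    padded = np.pad(up, p)         # zero padding at the borders
    out = np.zeros_like(up)
    for i in range(2 * n):
        for j in range(2 * n):
            out[i, j] = np.sum(padded[i:i+k, j:j+k] * kernel)
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
y = upsample_deconv(x, np.ones((3, 3)))
```

In this naive form, three of every four multiply-accumulate inputs are inserted zeros, which is exactly the redundancy the proposed architecture eliminates.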
APA, Harvard, Vancouver, ISO, and other styles
47

Hiri, Mustafa, Nabil Ourdani, Mohamed Chrayah, Abeer Alsadoon, and Noura Aknin. "Adaptive kernel integration in visual geometry group 16 for enhanced classification of diabetic retinopathy stages in retinal images." IAES International Journal of Artificial Intelligence (IJ-AI) 14, no. 2 (2025): 1484. https://doi.org/10.11591/ijai.v14.i2.pp1484-1495.

Full text
Abstract:
Diabetic retinopathy (DR) is a major cause of vision impairment globally, with early detection remaining a significant challenge. The limitations of current diagnostic methods, particularly in identifying early-stage DR, highlight a pressing need for more accurate diagnostic technologies. In response, our research introduces an innovative model that enhances the visual geometry group 16 (VGG16) architecture with adaptive kernel techniques. Traditionally, the VGG16 model deploys consistent kernel sizes throughout its convolutional layers. In this study, multiple convolutional branches with varying kernel sizes (3×3, 5×5, and 7×7) were seamlessly integrated after the 'block5_conv1' layer of VGG16. These branches were adaptively merged using a softmax-weighted combination, enabling the model to automatically prioritize kernel sizes based on the image's intricate features. To combat the challenge of imbalanced datasets, the synthetic minority over-sampling technique (SMOTE) was employed before training, harmonizing the distribution of the five DR stages. Our results are promising: with a training accuracy above 94.17% and a validation accuracy over 90.24%, our model significantly outperforms traditional methods. This study represents a significant stride in applying adaptive kernels to deep learning for precise medical imaging tasks. The model's accuracy in classifying DR stages highlights its potential as a valuable diagnostic tool, paving the way for future enhancements in DR detection and management.
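The adaptive merge described above amounts to a convex combination of branch outputs: each branch (3×3, 5×5, 7×7 kernels with 'same' padding) produces a same-shaped feature map, and softmax-normalized weights decide how much each contributes. A shape-level NumPy sketch with stand-in feature maps (in the real model the branch outputs come from convolutions and the logits are learned):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # shift for numerical stability
    return e / e.sum()

def merge_branches(branch_outputs, logits):
    """Combine same-shaped branch feature maps with softmax weights."""
    w = softmax(logits)          # one weight per kernel size
    merged = sum(wi * b for wi, b in zip(w, branch_outputs))
    return merged, w

rng = np.random.default_rng(5)
# Stand-ins for the 3x3 / 5x5 / 7x7 branch outputs (H, W, channels)
branches = [rng.normal(size=(8, 8, 16)) for _ in range(3)]
merged, w = merge_branches(branches, logits=np.array([2.0, 0.5, -1.0]))
```

Because the weights sum to 1, the merged map stays on the same scale as the branches while letting gradient descent shift emphasis between kernel sizes.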
APA, Harvard, Vancouver, ISO, and other styles
48

Di, Bang, Daokun Hu, Zhen Xie, et al. "TLB-pilot: Mitigating TLB Contention Attack on GPUs with Microarchitecture-Aware Scheduling." ACM Transactions on Architecture and Code Optimization 19, no. 1 (2022): 1–23. http://dx.doi.org/10.1145/3491218.

Full text
Abstract:
Co-running GPU kernels on a single GPU can provide high system throughput and improve hardware utilization, but this raises concerns on application security. We reveal that translation lookaside buffer (TLB) attack, one of the common attacks on CPU, can happen on GPU when multiple GPU kernels co-run. We investigate conditions or principles under which a TLB attack can take effect, including the awareness of GPU TLB microarchitecture, being lightweight, and bypassing existing software and hardware mechanisms. This TLB-based attack can be leveraged to conduct Denial-of-Service (or Degradation-of-Service) attacks. Furthermore, we propose a solution to mitigate TLB attacks. In particular, based on the microarchitecture properties of GPU, we introduce a software-based system, TLB-pilot, that binds thread blocks of different kernels to different groups of streaming multiprocessors by considering hardware isolation of last-level TLBs and the application’s resource requirement. TLB-pilot employs lightweight online profiling to collect kernel information before kernel launches. By coordinating software- and hardware-based scheduling and employing a kernel splitting scheme to reduce load imbalance, TLB-pilot effectively mitigates TLB attacks. The result shows that when under TLB attack, TLB-pilot mitigates the attack and provides on average 56.2% and 60.6% improvement in average normalized turnaround times and overall system throughput, respectively, compared to the traditional Multi-Process Service based co-running solution. When under TLB attack, TLB-pilot also provides up to 47.3% and 64.3% improvement (41% and 42.9% on average) in average normalized turnaround times and overall system throughput, respectively, compared to a state-of-the-art co-running solution for efficiently scheduling of thread blocks.
APA, Harvard, Vancouver, ISO, and other styles
49

Lou, Li, and Yong Li. "A Seismic Image Denoising Method Based on Kernel-prediction CNN Architecture." International Journal on Artificial Intelligence Tools 29, no. 07n08 (2020): 2040012. http://dx.doi.org/10.1142/s0218213020400126.

Full text
Abstract:
To filter noise while preserving the details of seismic images, a denoising method based on a kernel-prediction convolutional neural network (CNN) architecture is proposed. The method consists of two convolution layers and a residual connection, containing a source-sensing encoder, a spatial feature extractor and a kernel predictor. The scalar kernel is normalized by the softmax function to obtain the denoised images. In addition, to avoid excessive blur at the expense of image details, the authors put forward an asymmetric loss function, which enables users to control the level of residual noise and make a trade-off between variance and deviation. The experimental results show the proposed method achieved a good denoising effect. Compared with some other excellent methods, the proposed method increased the peak signal-to-noise ratio (PSNR) by about 1.0–3.2 dB for seismic images without discontinuity, and about 1.8–3.9 dB for seismic images with discontinuity.
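In kernel-prediction denoising, the network outputs one small filter per pixel; softmax normalization makes each filter's weights sum to 1, so applying it is a per-pixel weighted average of the neighborhood. A sketch of the application step only (the CNN that predicts the kernel scores is omitted; shapes and the edge-padding choice are illustrative):

```python
import numpy as np

def apply_predicted_kernels(img, kernels):
    """img: (H, W); kernels: (H, W, k, k) raw scores, one kernel per pixel.
    Softmax-normalize each kernel, then take the weighted neighborhood mean."""
    H, W = img.shape
    k = kernels.shape[2]
    p = k // 2
    padded = np.pad(img, p, mode="edge")
    out = np.empty_like(img)
    for i in range(H):
        for j in range(W):
            z = kernels[i, j].ravel()
            w = np.exp(z - z.max())
            w /= w.sum()                       # softmax: weights sum to 1
            patch = padded[i:i+k, j:j+k].ravel()
            out[i, j] = patch @ w              # convex neighborhood average
    return out

rng = np.random.default_rng(6)
noisy = rng.normal(size=(6, 6))
# All-zero scores give uniform kernels, i.e., a plain box filter
denoised = apply_predicted_kernels(noisy, np.zeros((6, 6, 3, 3)))
```

Because each output pixel is a convex combination of its neighbors, the result can never overshoot the local range; the asymmetric loss then controls how aggressively the predicted kernels average.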
APA, Harvard, Vancouver, ISO, and other styles
50

Sari, Dewi Novita, Muh Arif Rahman, and Randy Cahya Wihandika. "Uji Parameter dan Arsitektur Convolutional Neural Network untuk Mendeteksi Citra Wajah Bermasker." Jurnal Teknologi Informasi dan Ilmu Komputer 9, no. 7 (2022): 1707. http://dx.doi.org/10.25126/jtiik.2022976776.

Full text
Abstract:
Detection of masked face images was needed during the COVID-19 pandemic by institutions in direct contact with the public, because the human resources available for conventional masked-face detection were limited. Wearing a mask in daily activities is one of the mandatory self-protection protocols against COVID-19. Masked face images are used as input data, and the detection process uses a Convolutional Neural Network (CNN). Masked-face image detection has been carried out with various model architectures, but usually without an explanation of the parameters chosen. Model building can be made efficient by understanding the relationships between the applied parameters. Therefore, this study aims to determine the relationships between parameters in the CNN model architecture so that the best performance in detecting masked face images can be achieved. The study of parameter relationships is limited to the kernel size and the number of kernels because of their active role in training. The two kernel sizes used are 3×3 and 5×5, with 3 and 6 kernels. Four model architectures were built with seven layers using combinations of these parameters. The models were trained on 3150 masked and unmasked face images for 15 epochs, then tested on 1350 images. The best performance was obtained with 6 kernels of size 5×5 in each convolutional layer: an f1-score of 0.95, an accuracy of 0.95, and an average loss of 0.1692. Based on these results, it is concluded that the kernel size and the number of kernels are interrelated in producing the best CNN architecture performance for masked face image detection.
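The four compared configurations (kernel sizes 3×3 and 5×5, kernel counts 3 and 6) differ in capacity, which a quick parameter count makes concrete. The sketch below is a hypothetical simplification: the paper's models have seven layers whose exact composition is not given in the abstract, so only a two-convolutional-layer stack on a single-channel input is counted here.

```python
def conv_params(in_ch, out_ch, k):
    """Trainable parameters of one conv layer: each of the out_ch filters
    has k*k*in_ch weights plus one bias."""
    return out_ch * (k * k * in_ch + 1)

def two_conv_stack(k, n, in_ch=1):
    """Two convolutional layers, each with n kernels of size k x k,
    mirroring the abstract's compared kernel-size/count combinations."""
    return conv_params(in_ch, n, k) + conv_params(n, n, k)

for k in (3, 5):
    for n in (3, 6):
        print(f"{k}x{k} kernels, {n} per layer: {two_conv_stack(k, n)} params")
```

Under this simplification, the winning 5×5/6-kernel combination has roughly nine times the convolutional parameters of the 3×3/3-kernel one, which is one plausible reason the larger configuration reached the best f1-score on this dataset.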
APA, Harvard, Vancouver, ISO, and other styles