Academic literature on the topic 'Architecture manycore'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Architecture manycore.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Architecture manycore"

1

Muddukrishna, Ananya, Peter A. Jonsson, and Mats Brorsson. "Locality-Aware Task Scheduling and Data Distribution for OpenMP Programs on NUMA Systems and Manycore Processors." Scientific Programming 2015 (2015): 1–16. http://dx.doi.org/10.1155/2015/981759.

Full text
Abstract:
Performance degradation due to nonuniform data access latencies has worsened on NUMA systems and can now be felt on-chip in manycore processors. Distributing data across NUMA nodes and manycore processor caches is necessary to reduce the impact of nonuniform latencies. However, techniques for distributing data are error-prone and fragile and require low-level architectural knowledge. Existing task scheduling policies favor quick load-balancing at the expense of locality and ignore NUMA node/manycore cache access latencies while scheduling. Locality-aware scheduling, in conjunction with or as a replacement for existing scheduling, is necessary to minimize NUMA effects and sustain performance. We present a data distribution and locality-aware scheduling technique for task-based OpenMP programs executing on NUMA systems and manycore processors. Our technique relieves the programmer from thinking of NUMA system/manycore processor architecture details by delegating data distribution to the runtime system and uses task data dependence information to guide the scheduling of OpenMP tasks to reduce data stall times. We demonstrate our technique on a four-socket AMD Opteron machine with eight NUMA nodes and on the TILEPro64 processor and identify that data distribution and locality-aware task scheduling improve performance up to 69% for scientific benchmarks compared to default policies and yet provide an architecture-oblivious approach for programmers.
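As a hedged illustration of the task data-dependence information mentioned above, the C/OpenMP sketch below exposes per-block dependences that a locality-aware runtime could use when placing tasks; the block size, array, and function names are assumptions for illustration, not taken from the paper.

```c
/* Minimal sketch (assumed example, not the paper's benchmark): a blocked
 * vector update whose task dependences tell the runtime which block each
 * task reads and writes, so a locality-aware scheduler can place the task
 * near the NUMA node or cache slice holding that block. */
#include <stddef.h>

#define N     (1 << 20)
#define BLOCK (1 << 14)

void scale_blocks(double *x, double alpha)
{
    #pragma omp parallel
    #pragma omp single
    {
        for (size_t b = 0; b < N; b += BLOCK) {
            /* The depend clause names the block this task touches. */
            #pragma omp task depend(inout: x[b])
            for (size_t i = b; i < b + BLOCK; i++)
                x[i] *= alpha;
        }
    } /* implicit barrier: all generated tasks have completed here */
}
```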
APA, Harvard, Vancouver, ISO, and other styles
2

Choudhury, Dwaipayan, Aravind Sukumaran Rajam, Ananth Kalyanaraman, and Partha Pratim Pande. "High-Performance and Energy-Efficient 3D Manycore GPU Architecture for Accelerating Graph Analytics." ACM Journal on Emerging Technologies in Computing Systems 18, no. 1 (January 31, 2022): 1–19. http://dx.doi.org/10.1145/3482880.

Full text
Abstract:
Recent advances in GPU-based manycore accelerators provide the opportunity to efficiently process large-scale graphs on chip. However, real-world graphs have a diverse range of topology and connectivity patterns (e.g., degree distributions) that make the design of input-agnostic hardware architectures a challenge. Network-on-Chip (NoC)-based architectures provide a way to overcome this challenge, as the architectural topology can be used to approximately model the expected traffic patterns that emerge from graph application workloads. In this paper, we first study the mix of long- and short-range traffic patterns generated on-chip by graph workloads, and subsequently use the findings to adapt the design of an optimal NoC-based architecture. In particular, by leveraging emerging three-dimensional (3D) integration technology, we propose the design of a small-world NoC (SWNoC)-enabled manycore GPU architecture, where the placement of the links connecting the streaming multiprocessors (SMs) and the memory controllers (MCs) follows a power-law distribution. The proposed 3D manycore GPU architecture outperforms its traditional planar (2D) counterparts in both performance and energy consumption. Moreover, by adopting a joint performance-thermal optimization strategy, we address the thermal concerns of a 3D design without noticeably compromising the achievable performance. The 3D integration technology is also leveraged to incorporate Near Data Processing (NDP) to complement the performance benefits introduced by the SWNoC architecture. As graph applications are inherently memory intensive, off-chip data movement gives rise to latency and energy overheads in the presence of external DRAM. In conventional GPU architectures, as the main memory layer is not integrated with the logic, off-chip data movement negatively impacts overall performance and energy consumption. We demonstrate that NDP significantly reduces the overheads associated with such frequent and irregular memory accesses in graph-based applications. The proposed SWNoC-enabled NDP framework, which integrates 3D memory (like Micron's HMC) with a massive number of GPU cores, achieves 29.5% performance improvement and 30.03% less energy consumption on average compared to a conventional planar mesh-based design with external DRAM.
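The power-law placement of link lengths mentioned in this abstract can be illustrated with a small sampling routine; the exponent, length bounds, and use of rand() below are assumptions for illustration, not the construction used in the paper.

```c
/* Hedged sketch (not the paper's algorithm): draw candidate inter-router
 * link lengths from a truncated power-law distribution, the kind of
 * distribution the abstract says governs SWNoC link placement. */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

/* Inverse-transform sampling of P(l) ~ l^(-alpha) on [lmin, lmax]. */
static double powerlaw_length(double lmin, double lmax, double alpha)
{
    double u = (rand() + 1.0) / (RAND_MAX + 2.0);   /* u in (0, 1) */
    double a = pow(lmin, 1.0 - alpha);
    double b = pow(lmax, 1.0 - alpha);
    return pow(a + u * (b - a), 1.0 / (1.0 - alpha));
}

int main(void)
{
    srand(42);
    for (int i = 0; i < 10; i++)                    /* 10 candidate links */
        printf("link %d: length %.2f hops\n", i,
               powerlaw_length(1.0, 8.0, 2.3));     /* alpha = 2.3 assumed */
    return 0;
}
```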
APA, Harvard, Vancouver, ISO, and other styles
3

Korolija, Nenad, and Kent Milfeld. "Towards hybrid supercomputing architectures." Journal of Computer and Forensic Sciences 1, no. 1 (2022): 47–54. http://dx.doi.org/10.5937/1-42710.

Full text
Abstract:
In light of recent work on combining control-flow and dataflow architectures on the same chip die, a new architecture based on an asymmetric multicore processor is proposed. Control-flow architectures are described as the most commonly used computer architecture today. Both multicore and manycore architectures are explained, as they are based on the same principles. A dataflow computing model assumes that input data flow through the hardware, in either a software or a hardware dataflow implementation. In software dataflow, processors based on the control-flow paradigm pick up tasks from a shared queue as they become available (if there are any). In hardware dataflow architectures, the hardware is configured for a particular algorithm; input data are streamed into the hardware, and the output is streamed back to the multicore processor for further processing. Hardware dataflow architectures are usually implemented with FPGAs. Hybrid architectures employ asymmetric multicore and manycore computer architectures based on the control-flow and hardware dataflow paradigms, all combined on the same chip die. Advantages include faster processing time, lower power consumption (and heat), and less space needed for the hardware.
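As a minimal illustration of the software-dataflow model described above, where control-flow cores take whatever task is available from a shared queue, the pthreads sketch below runs a fixed set of tasks over a small worker pool; the task payload and counts are assumptions for illustration, not taken from the article.

```c
/* Minimal sketch of the software-dataflow idea: control-flow cores
 * (threads) repeatedly take the next available task from a shared queue. */
#include <pthread.h>
#include <stdio.h>

#define NTASKS   16
#define NWORKERS 4

static int next_task = 0;                       /* index of the next ready task */
static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    long id = (long)arg;
    for (;;) {
        pthread_mutex_lock(&qlock);
        int t = (next_task < NTASKS) ? next_task++ : -1;
        pthread_mutex_unlock(&qlock);
        if (t < 0)
            break;                              /* queue empty: stop */
        printf("worker %ld runs task %d\n", id, t);   /* process the data */
    }
    return NULL;
}

int main(void)
{
    pthread_t th[NWORKERS];
    for (long i = 0; i < NWORKERS; i++)
        pthread_create(&th[i], NULL, worker, (void *)i);
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(th[i], NULL);
    return 0;
}
```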
APA, Harvard, Vancouver, ISO, and other styles
4

Arka, Aqeeb Iqbal, Biresh Kumar Joardar, Ryan Gary Kim, Dae Hyun Kim, Janardhan Rao Doppa, and Partha Pratim Pande. "HeM3D." ACM Transactions on Design Automation of Electronic Systems 26, no. 2 (February 2021): 1–21. http://dx.doi.org/10.1145/3424239.

Full text
Abstract:
Heterogeneous manycore architectures are the key to executing compute- and data-intensive applications efficiently. A through-silicon-via (TSV)-based 3D manycore system is a promising solution in this direction as it enables the integration of disparate computing cores in a single system. Recent industry trends show the viability of 3D integration in real products (e.g., the Intel Lakefield SoC, the AMD Radeon R9 Fury X graphics card, and the Xilinx Virtex-7 2000T/H580T). However, the achievable performance of conventional TSV-based 3D systems is ultimately bottlenecked by the horizontal wires (wires in each planar die). Moreover, current TSV 3D architectures suffer from thermal limitations. Hence, TSV-based architectures do not realize the full potential of 3D integration. Monolithic 3D (M3D) integration, a breakthrough technology to achieve “More Moore and More Than Moore,” opens up the possibility of designing cores and associated network routers using multiple layers by utilizing monolithic inter-tier vias (MIVs) and hence reducing the effective wire length. Compared to TSV-based 3D integrated circuits (ICs), M3D offers the “true” benefits of the vertical dimension for system integration: the size of an MIV used in M3D is over 100× smaller than a TSV. This dramatic reduction in via size and the resulting increase in density open up numerous opportunities for design optimization in 3D manycore systems: designers can use up to millions of MIVs for ultra-fine-grained 3D optimization, where individual cores and routers can be spread across multiple tiers for extreme power and performance optimization. In this work, we demonstrate how M3D-enabled vertical core and uncore elements offer significant performance and thermal improvements in manycore heterogeneous architectures compared to their TSV-based counterparts. To overcome the difficult optimization challenges due to the large design space and complex interactions among the heterogeneous components (CPU, GPU, Last Level Cache, etc.) in an M3D-based manycore chip, we leverage novel design-space exploration algorithms to trade off different objectives. The proposed M3D-enabled heterogeneous architecture, called HeM3D, outperforms its state-of-the-art TSV-equivalent counterpart by up to 18.3% in execution time while being up to 19°C cooler.
APA, Harvard, Vancouver, ISO, and other styles
5

Lahdhiri, Habiba, Jordane Lorandel, Salvatore Monteleone, Emmanuelle Bourdel, and Maurizio Palesi. "Framework for Design Exploration and Performance Analysis of RF-NoC Manycore Architecture." Journal of Low Power Electronics and Applications 10, no. 4 (November 3, 2020): 37. http://dx.doi.org/10.3390/jlpea10040037.

Full text
Abstract:
The Network-on-chip (NoC) paradigm has been proposed as a promising solution to enable the handling of a high degree of integration in multi-/many-core architectures. Despite their advantages, wired NoC infrastructures are facing several performance issues regarding multi-hop long-distance communications. RF-NoC is an attractive solution offering high performance and multicast/broadcast capabilities. However, managing RF links is a critical aspect that relies on both application-dependent and architectural parameters. This paper proposes a design space exploration framework for OFDMA-based RF-NoC architecture, which takes advantage of both real application benchmarks simulated using Sniper and RF-NoC architecture modeled using Noxim. We adopted the proposed framework to finely configure a routing algorithm, working with real traffic, achieving up to 45% of delay reduction, compared to a wired NoC setup in similar conditions.
APA, Harvard, Vancouver, ISO, and other styles
6

LI, Hongliang, Fang ZHENG, Ziyu HAO, Hongguang GAO, Feng GUO, Yong TANG, Hui LV, Xin LIU, and Fangyuan CHEN. "Research on homegrown manycore architecture for intelligent computing." SCIENTIA SINICA Informationis 49, no. 3 (March 1, 2019): 247–55. http://dx.doi.org/10.1360/n112018-00283.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Dévigne, Clément, Jean-Baptiste Bréjon, Quentin L. Meunier, and Franck Wajsbürt. "Executing secured virtual machines within a manycore architecture." Microprocessors and Microsystems 48 (February 2017): 21–35. http://dx.doi.org/10.1016/j.micpro.2016.09.008.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Li, Mingzhen, Yi Liu, Hailong Yang, Zhongzhi Luan, Lin Gan, Guangwen Yang, and Depei Qian. "Accelerating Sparse Cholesky Factorization on Sunway Manycore Architecture." IEEE Transactions on Parallel and Distributed Systems 31, no. 7 (July 1, 2020): 1636–50. http://dx.doi.org/10.1109/tpds.2019.2953852.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Hosseini, Morteza, and Tinoosh Mohsenin. "Binary Precision Neural Network Manycore Accelerator." ACM Journal on Emerging Technologies in Computing Systems 17, no. 2 (April 2021): 1–27. http://dx.doi.org/10.1145/3423136.

Full text
Abstract:
This article presents a low-power, programmable, domain-specific manycore accelerator, the Binarized neural Network Manycore Accelerator (BiNMAC), which adopts and efficiently executes binary precision weight/activation neural network models. Such networks have compact models in which weights are constrained to only 1 bit and can be packed several to one memory entry, minimizing the memory footprint. Packing weights also facilitates executing single instruction, multiple data with simple circuitry that allows maximizing performance and efficiency. The proposed BiNMAC has light-weight cores that support domain-specific instructions, and a router-based memory access architecture that helps with efficient implementation of layers in binary precision weight/activation neural networks of proper size. With only 3.73% and 1.98% area and average power overhead, respectively, novel instructions such as Combined Population-Count-XNOR, Patch-Select, and Bit-based Accumulation are added to the instruction set architecture of the BiNMAC, each of which replaces execution cycles of frequently used functions with 1 clock cycle that otherwise would have taken 54, 4, and 3 clock cycles, respectively. Additionally, customized logic is added to every core to transpose 16×16-bit blocks of memory on a bit-level basis, which expedites reshaping intermediate data to be well-aligned for bitwise operations. A 64-cluster architecture of the BiNMAC is fully placed and routed in 65-nm TSMC CMOS technology, where a single cluster occupies an area of 0.53 mm² with an average power of 232 mW at 1-GHz clock frequency and 1.1 V. The 64-cluster architecture takes 36.5 mm² area and, if fully exploited, consumes a total power of 16.4 W and can perform 1,360 Giga Operations Per Second (GOPS) while providing full programmability. To demonstrate its scalability, four binarized case studies including ResNet-20 and LeNet-5 for high-performance image classification, as well as a ConvNet and a multilayer perceptron for low-power physiological applications, were implemented on BiNMAC. The implementation results indicate that the population-count instruction alone can expedite performance by approximately 5×. When the other new instructions are added to a RISC machine with an existing population-count instruction, performance increases by 58% on average. To compare the performance of the BiNMAC with other commercial-off-the-shelf platforms, the case studies with their double-precision floating-point models were also implemented on the NVIDIA Jetson TX2 SoC (CPU+GPU). The results indicate that, within a margin of ∼2.1%–9.5% accuracy loss, BiNMAC on average outperforms the TX2 GPU by approximately 1.9× (or 7.5× with fabrication technology scaled) in energy consumption for image classification applications. In low-power settings and within a margin of ∼3.7%–5.5% accuracy loss compared to the ARM Cortex-A57 CPU implementation, BiNMAC is roughly ∼9.7×–17.2× (or 38.8×–68.8× with fabrication technology scaled) more energy efficient for physiological applications while meeting the application deadline.
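The Combined Population-Count-XNOR instruction described above fuses the two core steps of a binarized dot product; the portable C sketch below performs the same computation with separate XNOR and popcount operations, using the GCC/Clang __builtin_popcountll builtin as a stand-in (an assumption about the toolchain, not BiNMAC's ISA).

```c
/* Hedged sketch of a binarized dot product: with weights and activations
 * packed 64 per word, XNOR marks matching sign bits and popcount sums them.
 * BiNMAC fuses these steps into one instruction; here they are written out
 * in plain C with a compiler builtin. */
#include <stdint.h>
#include <stdio.h>

/* Dot product of two {-1,+1} vectors packed one bit per element. */
static int binary_dot(const uint64_t *w, const uint64_t *a, int nwords)
{
    int matches = 0;
    for (int i = 0; i < nwords; i++)
        matches += __builtin_popcountll(~(w[i] ^ a[i]));   /* XNOR + popcount */
    int nbits = 64 * nwords;
    return 2 * matches - nbits;     /* map the match count back to a +/-1 sum */
}

int main(void)
{
    uint64_t w[2] = { 0xF0F0F0F0F0F0F0F0ull, 0x123456789ABCDEF0ull };
    uint64_t a[2] = { 0x0F0F0F0F0F0F0F0Full, 0x123456789ABCDEF0ull };
    printf("dot = %d\n", binary_dot(w, a, 2));
    return 0;
}
```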
APA, Harvard, Vancouver, ISO, and other styles
10

Silva, Bruno A. da, Arthur M. Lima, Janier Arias-Garcia, Michael Huebner, and Jones Yudi. "A Manycore Vision Processor for Real-Time Smart Cameras." Sensors 21, no. 21 (October 27, 2021): 7137. http://dx.doi.org/10.3390/s21217137.

Full text
Abstract:
Real-time image processing and computer vision systems are now in the mainstream of technologies enabling applications for cyber-physical systems, Internet of Things, augmented reality, and Industry 4.0. These applications bring the need for Smart Cameras for local real-time processing of images and videos. However, the massive amount of data to be processed within short deadlines cannot be handled by most commercial cameras. In this work, we show the design and implementation of a manycore vision processor architecture to be used in Smart Cameras. With massive parallelism exploration and application-specific characteristics, our architecture is composed of distributed processing elements and memories connected through a Network-on-Chip. The architecture was implemented as an FPGA overlay, focusing on optimized hardware utilization. The parameterized architecture was characterized by its hardware occupation, maximum operating frequency, and processing frame rate. Different configurations ranging from one to eighty-one processing elements were implemented and compared to several works from the literature. Using a System-on-Chip composed of an FPGA integrated into a general-purpose processor, we showcase the flexibility and efficiency of the hardware/software architecture. The results show that the proposed architecture successfully allies programmability and performance, being a suitable alternative for future Smart Cameras.
APA, Harvard, Vancouver, ISO, and other styles
More sources

Dissertations / Theses on the topic "Architecture manycore"

1

Cho, Myong Hyon Ph D. Massachusetts Institute of Technology. "On-chip networks for manycore architecture." Thesis, Massachusetts Institute of Technology, 2013. http://hdl.handle.net/1721.1/84885.

Full text
Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2013.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 109-116).
Over the past decade, increasing the number of cores on a single processor has successfully enabled continued improvements of computer performance. Further scaling these designs to tens and hundreds of cores, however, still presents a number of hard problems, such as scalability, power efficiency and effective programming models. A key component of manycore systems is the on-chip network, which faces increasing efficiency demands as the number of cores grows. In this thesis, we present three techniques for improving the efficiency of on-chip interconnects. First, we present PROM (Path-based, Randomized, Oblivious, and Minimal routing) and BAN (Bandwidth Adaptive Networks), techniques that offer efficient intercore communication for bandwidth-constrained networks. Next, we present ENC (Exclusive Native Context), the first deadlock-free, fine-grained thread migration protocol developed for on-chip networks. ENC demonstrates that a simple and elegant technique in the on-chip network can provide critical functional support for higher-level application and system layers. Finally, we provide a realistic context by sharing our hands-on experience in the physical implementation of the on-chip network for the Execution Migration Machine, an ENC-based 110-core processor fabricated in 45nm ASIC technology.
by Myong Hyon Cho.
Ph.D.
APA, Harvard, Vancouver, ISO, and other styles
2

Stubbfält, Erik. "Hardware Architecture Impact on Manycore Programming Model." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-441739.

Full text
Abstract:
This work investigates how certain processor architectures can affect the implementation and performance of a parallel programming model. The Ericsson Many-Core Architecture (EMCA) is compared and contrasted to general-purpose multicore processors, highlighting differences in their memory systems and processor cores. A proof-of-concept implementation of the Concurrency Building Blocks (CBB) programming model is developed for x86-64 using MPI. Benchmark tests show how CBB on EMCA handles compute-intensive and memory-intensive scenarios, compared to a high-end x86-64 machine running the proof-of-concept implementation. EMCA shows its strengths in heavy computations while x86-64 performs at its best with high degrees of data reuse. Both systems are able to utilize locality in their memory systems to achieve great performance benefits.
APA, Harvard, Vancouver, ISO, and other styles
3

Dévigne, Clément. "Exécution sécurisée de plusieurs machines virtuelles sur une plateforme Manycore." Thesis, Paris 6, 2017. http://www.theses.fr/2017PA066138/document.

Full text
Abstract:
Manycore architectures, which comprise a large number of cores, are a way to answer the ever-growing demand for digital data processing, especially in the context of cloud computing infrastructures. These data, which may belong to companies as well as private individuals, are sensitive by nature, which is why isolation is a primary concern. Since the beginning of cloud computing, virtualization techniques have been increasingly used to allow different users to physically share the same hardware resources. This is all the more true for manycore architectures, and it is therefore partly up to the architecture to guarantee the confidentiality and integrity of the data of the software it executes. In this thesis we propose a secured virtualization environment for a manycore architecture. Our mechanism relies on hardware components and a hypervisor to isolate several operating systems running on the same architecture. The hypervisor is in charge of allocating resources to the virtualized operating systems but has no access rights to the resources it allocates to them; thus, a security flaw in the hypervisor does not imperil the confidentiality or integrity of the virtualized systems' data. Our solution is evaluated on a cycle-accurate virtual prototype and has been implemented in a cache-coherent shared-memory manycore architecture. Our evaluations cover the hardware overhead and the performance degradation induced by our mechanisms. Finally, we analyze the security provided by our solution.
APA, Harvard, Vancouver, ISO, and other styles
4

Azar, Céline. "On the design of a distributed adaptive manycore architecture for embedded systems." Lorient, 2012. http://www.theses.fr/2012LORIS268.

Full text
Abstract:
Chip design challenges have emerged recently at many levels: the increase in the number of cores on a chip at the hardware level, the complexity of parallel programming models at the software level, and the dynamic requirements of current applications. Facing this evolution, this thesis proposes a distributed adaptive manycore architecture named CEDAR (Configurable Embedded Distributed ARchitecture), whose main assets are scalability, flexibility, and simplicity. The CEDAR platform is an array of homogeneous, small-footprint RISC processors, each connected to its four nearest neighbors through buffers shared by adjacent cores. There is no global control; control is distributed among the cores. Two versions of the platform are designed, along with a simple, user-familiar programming model. The software version, CEDAR-S, is the basic implementation in which adjacent cores communicate via shared buffers. A co-processor called DMC (Direct Management of Communications) is added in the CEDAR-H (hardware) version to optimize the routing protocol; the DMCs are interconnected in a 2D mesh. Two novel concepts are proposed to enhance the adaptiveness of CEDAR. First, a distributed, dynamic, bio-inspired routing algorithm handles routing in a non-supervised fashion and is independent of the physical placement of communicating tasks. Second, dynamic distributed task migration responds to system and application requirements. Results show that CEDAR achieves high performance with its optimized routing strategy compared to state-of-the-art networks-on-chip. The migration cost is evaluated and adequate protocols are presented. CEDAR proves to be a promising design for future manycore architectures.
APA, Harvard, Vancouver, ISO, and other styles
5

Dévigne, Clément. "Exécution sécurisée de plusieurs machines virtuelles sur une plateforme Manycore." Electronic Thesis or Diss., Paris 6, 2017. http://www.theses.fr/2017PA066138.

Full text
Abstract:
Manycore architectures, which comprise a large number of cores, are a way to answer the ever-growing demand for digital data processing, especially in the context of cloud computing infrastructures. These data, which may belong to companies as well as private individuals, are sensitive by nature, which is why isolation is a primary concern. Since the beginning of cloud computing, virtualization techniques have been increasingly used to allow different users to physically share the same hardware resources. This is all the more true for manycore architectures, and it is therefore partly up to the architecture to guarantee the confidentiality and integrity of the data of the software it executes. In this thesis we propose a secured virtualization environment for a manycore architecture. Our mechanism relies on hardware components and a hypervisor to isolate several operating systems running on the same architecture. The hypervisor is in charge of allocating resources to the virtualized operating systems but has no access rights to the resources it allocates to them; thus, a security flaw in the hypervisor does not imperil the confidentiality or integrity of the virtualized systems' data. Our solution is evaluated on a cycle-accurate virtual prototype and has been implemented in a cache-coherent shared-memory manycore architecture. Our evaluations cover the hardware overhead and the performance degradation induced by our mechanisms. Finally, we analyze the security provided by our solution.
APA, Harvard, Vancouver, ISO, and other styles
6

Gallet, Camille. "Étude de transformations et d’optimisations de code parallèle statique ou dynamique pour architecture "many-core"." Thesis, Paris 6, 2016. http://www.theses.fr/2016PA066747/document.

Full text
Abstract:
Since the 1960s, the evolution of supercomputers has gone through three revolutions: (i) the arrival of transistors to replace triodes, (ii) the appearance of vector computing, and (iii) cluster organization. Clusters currently consist of standard processors that have benefited from increased computing power through higher frequencies, the multiplication of cores on a chip, and wider computing units (SIMD instruction sets). A recent example combining a large number of cores and wide (512-bit) vector units is the Intel Xeon Phi co-processor. To maximize computing performance on these chips by better exploiting SIMD instructions, it is necessary to reorganize the body of loop nests while taking irregular aspects (control flow and data flow) into account. To this end, this thesis proposes to extend the transformation named Deep Jam to extract regularity from irregular code and thereby facilitate vectorization. The thesis presents our extension and its application to a multi-material hydrodynamics mini-application, HydroMM. These studies show that a significant performance gain can be achieved on irregular codes.
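Deep Jam itself is the thesis's transformation; as a much simpler illustration of the general idea of exposing regularity in irregular control flow so that SIMD units can be used, the C sketch below manually if-converts a branchy loop into a branch-free form. It is a generic example, not the Deep Jam algorithm.

```c
/* Generic illustration (not Deep Jam): turning data-dependent control flow
 * into data flow so the loop body becomes uniform and vectorizable. */
#include <stddef.h>

/* Branchy version: the if statement makes each iteration irregular. */
void clamp_branchy(float *x, size_t n, float lim)
{
    for (size_t i = 0; i < n; i++)
        if (x[i] > lim)
            x[i] = lim;
}

/* If-converted version: a select expression replaces the branch, so the
 * compiler can emit SIMD compare-and-blend instructions for the whole loop. */
void clamp_selected(float *x, size_t n, float lim)
{
    for (size_t i = 0; i < n; i++)
        x[i] = (x[i] > lim) ? lim : x[i];
}
```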
APA, Harvard, Vancouver, ISO, and other styles
7

Bechara, Charly. "Study and design of a manycore architecture with multithreaded processors for dynamic embedded applications." Phd thesis, Université Paris Sud - Paris XI, 2011. http://tel.archives-ouvertes.fr/tel-00713536.

Full text
Abstract:
Embedded systems are getting more complex and require more intensive processing capabilities. They must be able to adapt to the rapid evolution of high-end embedded applications that are characterized by their high computation-intensive workloads (on the order of TOPS: Tera Operations Per Second) and their high level of parallelism. Moreover, since the dynamism of the applications is becoming more significant, powerful computing solutions should be designed accordingly. By exploiting the dynamism efficiently, the load will be balanced between the computing resources, which will greatly improve the overall performance. To tackle the challenges of these future high-end massively-parallel dynamic embedded applications, we have designed the AHDAM architecture, which stands for "Asymmetric Homogeneous with Dynamic Allocator Manycore architecture". Its architecture permits processing applications with large data sets by efficiently hiding the processors' stall time using multithreaded processors. In addition, it exploits the parallelism of the applications at multiple levels so that they are accelerated efficiently on dedicated resources, hence improving the overall performance. The AHDAM architecture tackles the dynamism of these applications by dynamically balancing the load between its computing resources using a central controller to increase their utilization rate. The AHDAM architecture has been evaluated using a relevant embedded application from the telecommunication domain called "spectrum radio-sensing". With 136 cores running at 500 MHz, the AHDAM architecture reaches a peak performance of 196 GOPS and meets the computation requirements of the application.
APA, Harvard, Vancouver, ISO, and other styles
8

Park, Seo Jin. "Analyzing performance and usability of broadcast-based inter-core communication (ATAC) on manycore architecture." Thesis, Massachusetts Institute of Technology, 2013. http://hdl.handle.net/1721.1/85219.

Full text
Abstract:
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2013.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 55-56).
In this thesis, I analyze the performance and usability benefits of broadcast-based inter-core communication on manycore architecture. The problem of high communication cost on manycore architecture was tackled by a new architecture which allows efficient broadcasting by leveraging an on-chip optical network. I designed the new architecture and API for the new broadcasting feature and implemented them on a multicore simulator called Graphite. I also re-implemented common parallel APIs (barrier and work-stealing) which benefit from the cheap broadcasting and showed their ease of use and superior performance versus existing parallel programming libraries by running well-known benchmarks on the Graphite simulator.
by Seo Jin Park.
M. Eng.
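The ATAC broadcast API is specific to the simulated architecture; as a rough, hedged analogue of a barrier whose release phase is a single broadcast, the MPI sketch below gathers arrival notices at rank 0 and frees all ranks with one MPI_Bcast. It illustrates the idea only and is not the implementation evaluated in the thesis.

```c
/* Rough analogue (not the ATAC API): a barrier whose release phase is a
 * single broadcast, mirroring the idea that cheap broadcast simplifies
 * collective primitives. Build with an MPI toolchain (e.g., mpicc). */
#include <mpi.h>
#include <stdio.h>

static void bcast_barrier(MPI_Comm comm)
{
    int rank, size, token = 0;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);
    int arrivals[size];                 /* receive buffer, used only at rank 0 */
    /* Arrival phase: every rank reports to rank 0. */
    MPI_Gather(&token, 1, MPI_INT, arrivals, 1, MPI_INT, 0, comm);
    /* Release phase: one broadcast frees all ranks at once. */
    MPI_Bcast(&token, 1, MPI_INT, 0, comm);
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("rank %d before barrier\n", rank);
    bcast_barrier(MPI_COMM_WORLD);
    printf("rank %d after barrier\n", rank);
    MPI_Finalize();
    return 0;
}
```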
APA, Harvard, Vancouver, ISO, and other styles
9

Gao, Yang. "Contrôleur de cache générique pour une architecture manycore massivement parallèle à mémoire partagée cohérente." Paris 6, 2011. http://www.theses.fr/2011PA066296.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Karaoui, Mohamed Lamine. "Système de fichiers scalable pour architectures many-cores à faible empreinte énergétique." Thesis, Paris 6, 2016. http://www.theses.fr/2016PA066186/document.

Full text
Abstract:
This thesis studies the problems of implementing a scalable UNIX-like file system on a hardware cache-coherent NUMA manycore architecture with a low energy footprint. For this study, we use the general-purpose TSAR manycore architecture and ALMOS, a UNIX-like operating system. The TSAR architecture presents, from the operating system's point of view, three problems to which we offer solutions after reviewing existing approaches. One of these problems is specific to TSAR, while the other two are common to existing cache-coherent NUMA manycores. The first problem is supporting a physical memory larger than the virtual memory, due to TSAR's extended physical address space, which is 256 times larger than its virtual address space. To resolve this, we deeply restructured the kernel to decompose it into multiple communicating units that interact mainly through message passing. The second problem is the placement strategy for the file system structures across the many memory banks; to solve it, we implemented a strategy that distributes the data evenly across the different memory banks. The third problem is synchronizing concurrent accesses to the file system. Our solution combines several mechanisms, including an efficient lock-free mechanism that we designed to synchronize accesses between several readers and a single writer. Experimental results show that: (1) structuring the kernel into multiple communicating units does not degrade performance and may even improve it; (2) our combined solutions scale better than the NetBSD kernel; and (3) the placement strategy best suited to file systems on manycore architectures is the one that distributes data evenly.
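The lock-free several-readers/single-writer mechanism mentioned above is not detailed here; a classic way to achieve that pattern is a sequence lock, sketched below with C11 atomics as a textbook illustration under our own assumptions, not the mechanism designed in the thesis.

```c
/* Textbook single-writer/multiple-reader sequence lock (NOT the thesis's
 * mechanism): readers retry if the counter was odd (write in progress) or
 * changed while they copied the data. A production version would also need
 * atomic or fenced accesses to the payload itself. */
#include <stdatomic.h>
#include <stdint.h>

struct seq_protected {
    atomic_uint seq;      /* even: stable, odd: write in progress */
    uint64_t    value;    /* shared data guarded by the sequence counter */
};

static void writer_update(struct seq_protected *s, uint64_t v)
{
    atomic_fetch_add(&s->seq, 1);   /* make the counter odd: write begins */
    s->value = v;
    atomic_fetch_add(&s->seq, 1);   /* make the counter even: write done */
}

static uint64_t reader_read(struct seq_protected *s)
{
    unsigned before, after;
    uint64_t v;
    do {
        before = atomic_load(&s->seq);
        v = s->value;
        after = atomic_load(&s->seq);
    } while ((before & 1u) || before != after);   /* retry on concurrent write */
    return v;
}
```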
APA, Harvard, Vancouver, ISO, and other styles
More sources

Books on the topic "Architecture manycore"

1

Levesque, John M., and Aaron Vose. Programming for Hybrid Multi/manycore Mpp Systems. Taylor & Francis Group, 2020.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
2

Levesque, John, and Aaron Vose. Programming for Hybrid Multi/Manycore MPP Systems. Taylor & Francis Group, 2017.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
3

Programming for Hybrid Multi/Manycore MPP Systems. Taylor & Francis Group, 2017.

Find full text
APA, Harvard, Vancouver, ISO, and other styles

Book chapters on the topic "Architecture manycore"

1

Fernández-Pascual, Ricardo, Alberto Ros, and Manuel E. Acacio. "Optimization of a Linked Cache Coherence Protocol for Scalable Manycore Coherence." In Architecture of Computing Systems – ARCS 2016, 100–112. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-30695-7_8.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Tan, Guangming, Vugranam C. Sreedhar, and Guang R. Gao. "Just-In-Time Locality and Percolation for Optimizing Irregular Applications on a Manycore Architecture." In Languages and Compilers for Parallel Computing, 331–42. Berlin, Heidelberg: Springer Berlin Heidelberg, 2008. http://dx.doi.org/10.1007/978-3-540-89740-8_23.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Abduljabbar, Mustafa, Mohammed Al Farhan, Rio Yokota, and David Keyes. "Performance Evaluation of Computation and Communication Kernels of the Fast Multipole Method on Intel Manycore Architecture." In Lecture Notes in Computer Science, 553–64. Cham: Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-64203-1_40.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Monazzah, Amir Mahdi Hosseini, Amir M. Rahmani, Antonio Miele, and Nikil Dutt. "Exploiting Memory Resilience for Emerging Technologies: An Energy-Aware Resilience Exemplar for STT-RAM Memories." In Dependable Embedded Systems, 505–26. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-52017-5_21.

Full text
Abstract:
Due to the constant pressure for larger on-chip memories and caches in multicore and manycore architectures, Spin Transfer Torque Magnetic RAM (STT-MRAM or STT-RAM) has been proposed as a promising technology to replace classical SRAMs in near-future devices. The main advantages of STT-RAMs are a considerably higher transistor density and negligible leakage power compared with SRAM technology. However, the drawback of this technology is the high probability of errors, especially in write operations. Such errors are asymmetric and transition-dependent: 0 → 1 is the most critical transition, and its error rate depends strongly on the current (voltage) supplied to the memory during the write operation. As a consequence, STT-RAMs present an intrinsic trade-off between energy consumption and reliability that needs to be properly tuned with respect to the currently running application and its reliability requirement. This chapter proposes FlexRel, an energy-aware reliability-improvement architectural scheme for STT-RAM cache memories. FlexRel considers a memory architecture provided with Error Correction Codes (ECCs) and a custom current regulator for the various cache ways, and trades off reliability against energy consumption. The FlexRel cache controller dynamically profiles the number of 0 → 1 transitions of each individual bit write operation in a cache block and, based on that, selects the most suitable cache way and current level to guarantee the required error-rate threshold (in terms of occurred write errors) while minimizing energy consumption. We experimentally evaluated the efficiency of FlexRel against the most efficient uniform protection scheme from the reliability, energy, area, and performance perspectives. Experimental simulations performed using gem5 demonstrate that, while FlexRel satisfies the given error-rate threshold, it delivers up to 13.2% energy savings. From the area-footprint perspective, FlexRel delivers up to 7.9% cache-way area savings. Furthermore, the performance overhead of the FlexRel algorithm, which changes the traffic patterns of the cache ways during execution, is 1.7% on average.
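The core profiling step described in this abstract, counting a write's 0 → 1 transitions to choose a protection level, can be sketched in a few lines of C; the block size, the popcount builtin, and the helper name are illustrative assumptions rather than FlexRel's actual controller logic.

```c
/* Hedged sketch of the profiling step the abstract describes: counting the
 * 0 -> 1 transitions of a cache-block write (the error-critical transition
 * in STT-RAM), which the controller can use to pick a cache way and write
 * current. Block size and builtin choice are assumptions. */
#include <stdint.h>

#define WORDS_PER_BLOCK 8            /* a 64-byte block as 8 x 64-bit words */

static int count_zero_to_one(const uint64_t *old_blk, const uint64_t *new_blk)
{
    int transitions = 0;
    for (int i = 0; i < WORDS_PER_BLOCK; i++)
        /* bits that were 0 in the old block and become 1 in the new one */
        transitions += __builtin_popcountll(new_blk[i] & ~old_blk[i]);
    return transitions;
}
```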
APA, Harvard, Vancouver, ISO, and other styles
5

Bodin, François. "Keynote: Compilers in the Manycore Era." In High Performance Embedded Architectures and Compilers, 2–3. Berlin, Heidelberg: Springer Berlin Heidelberg, 2009. http://dx.doi.org/10.1007/978-3-540-92990-1_2.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Pavlovic, Milan, Yoav Etsion, and Alex Ramirez. "Can Manycores Support the Memory Requirements of Scientific Applications?" In Computer Architecture, 65–76. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011. http://dx.doi.org/10.1007/978-3-642-24322-6_7.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Chandru, Vishwanathan, and Frank Mueller. "Reducing NoC and Memory Contention for Manycores." In Architecture of Computing Systems – ARCS 2016, 293–305. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-30695-7_22.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Anzt, Hartwig, Dimitar Lukarski, Stanimire Tomov, and Jack Dongarra. "Self-adaptive Multiprecision Preconditioners on Multicore and Manycore Architectures." In Lecture Notes in Computer Science, 115–23. Cham: Springer International Publishing, 2015. http://dx.doi.org/10.1007/978-3-319-17353-5_10.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Diao, Mamadou, and Jongman Kim. "Multimedia Mining on Manycore Architectures: The Case for GPUs." In Advances in Visual Computing, 619–30. Berlin, Heidelberg: Springer Berlin Heidelberg, 2009. http://dx.doi.org/10.1007/978-3-642-10520-3_59.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Goubier, Thierry, Renaud Sirdey, Stéphane Louise, and Vincent David. "ΣC: A Programming Model and Language for Embedded Manycores." In Algorithms and Architectures for Parallel Processing, 385–94. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011. http://dx.doi.org/10.1007/978-3-642-24650-0_33.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Architecture manycore"

1

Pal, Rajesh Kumar, Kolin Paul, and Sanjiva Prasad. "ReKonf: A Reconfigurable Adaptive ManyCore Architecture." In 2012 IEEE 10th International Symposium on Parallel and Distributed Processing with Applications (ISPA). IEEE, 2012. http://dx.doi.org/10.1109/ispa.2012.32.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Morari, Alessandro, Antonino Tumeo, Oreste Villa, Simone Secchi, and Mateo Valero. "Efficient Sorting on the Tilera Manycore Architecture." In 2012 24th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD). IEEE, 2012. http://dx.doi.org/10.1109/sbac-pad.2012.41.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Ben Abdelhamid, Riadh, Yoshiki Yamaguchi, and Taisuke Boku. "MITRACA: Manycore Interlinked Torus Reconfigurable Accelerator Architecture." In 2019 IEEE 30th International Conference on Application-specific Systems, Architectures and Processors (ASAP). IEEE, 2019. http://dx.doi.org/10.1109/asap.2019.00-35.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Pawlowski, Steve. "Petascale Computing Research Challenges - A Manycore Perspective." In 2007 IEEE 13th International Symposium on High Performance Computer Architecture. IEEE, 2007. http://dx.doi.org/10.1109/hpca.2007.346188.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Devigne, Clement, Jean-Baptiste Brejon, Quentin Meunier, and Franck Wajsburt. "Executing secured virtual machines within a manycore architecture." In 2015 Nordic Circuits and Systems Conference (NORCAS): NORCHIP & International Symposium on System-on-Chip (SoC). IEEE, 2015. http://dx.doi.org/10.1109/norchip.2015.7364380.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

da Silva, Bruno Almeida, Arthur Mendes Lima, and Jones Yudi. "A manycore vision processor architecture for embedded applications." In 2020 X Brazilian Symposium on Computing Systems Engineering (SBESC). IEEE, 2020. http://dx.doi.org/10.1109/sbesc51047.2020.9277867.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Kondo, M., S. T. Nguyen, T. Hirao, T. Soga, H. Sasaki, and K. Inoue. "SMYLEref: A reference architecture for manycore-processor SoCs." In 2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC 2013). IEEE, 2013. http://dx.doi.org/10.1109/aspdac.2013.6509656.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Ramey, Carl. "TILE-Gx100 ManyCore processor: Acceleration interfaces and architecture." In 2011 IEEE Hot Chips 23 Symposium (HCS). IEEE, 2011. http://dx.doi.org/10.1109/hotchips.2011.7477491.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Daglis, Alexandros, Stanko Novaković, Edouard Bugnion, Babak Falsafi, and Boris Grot. "Manycore network interfaces for in-memory rack-scale computing." In ISCA '15: The 42nd Annual International Symposium on Computer Architecture. New York, NY, USA: ACM, 2015. http://dx.doi.org/10.1145/2749469.2750415.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Chandru, Vishwanathan, and Frank Mueller. "Hybrid MPI/OpenMP programming on the Tilera manycore architecture." In 2016 International Conference on High Performance Computing & Simulation (HPCS). IEEE, 2016. http://dx.doi.org/10.1109/hpcsim.2016.7568353.

Full text
APA, Harvard, Vancouver, ISO, and other styles