Academic literature on the topic 'Multi-Core and many-Core'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Multi-Core and many-Core.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Press it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Multi-Core and many-Core"

1

Kumar, Neetesh, and Deo Prakash Vidyarthi. "Improved scheduler for multi-core many-core systems." Computing 96, no. 11 (August 3, 2014): 1087–110. http://dx.doi.org/10.1007/s00607-014-0420-y.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Maliţa, Mihaela, Gheorghe Ştefan, and Dominique Thiébaut. "Not multi-, but many-core." ACM SIGARCH Computer Architecture News 35, no. 5 (December 2007): 32–38. http://dx.doi.org/10.1145/1360464.1360474.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Kirschenmann, W., L. Plagne, A. Ponçot, and S. Vialle. "Parallel SPN on Multi-Core CPUs and Many-Core GPUs." Transport Theory and Statistical Physics 39, no. 2-4 (March 2010): 255–81. http://dx.doi.org/10.1080/00411450.2010.533741.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Datta, Amitava, Amardeep Kaur, Tobias Lauer, and Sami Chabbouh. "Exploiting multi–core and many–core parallelism for subspace clustering." International Journal of Applied Mathematics and Computer Science 29, no. 1 (March 1, 2019): 81–91. http://dx.doi.org/10.2478/amcs-2019-0006.

Full text
Abstract:
Finding clusters in high dimensional data is a challenging research problem. Subspace clustering algorithms aim to find clusters in all possible subspaces of the dataset, where a subspace is a subset of dimensions of the data. But the exponential increase in the number of subspaces with the dimensionality of data renders most of the algorithms inefficient as well as ineffective. Moreover, these algorithms have ingrained data dependency in the clustering process, which means that parallelization becomes difficult and inefficient. SUBSCALE is a recent subspace clustering algorithm which is scalable with the dimensions and contains independent processing steps which can be exploited through parallelism. In this paper, we aim to leverage the computational power of widely available multi-core processors to improve the runtime performance of the SUBSCALE algorithm. The experimental evaluation shows linear speedup. Moreover, we develop an approach using graphics processing units (GPUs) for fine-grained data parallelism to accelerate the computation further. First tests of the GPU implementation show very promising results.
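The parallelization strategy this abstract describes (independent per-subspace work distributed across CPU cores) can be pictured with a minimal OpenMP sketch. Everything below is hypothetical: the Candidate structure and score_candidate_subspace merely stand in for SUBSCALE's real, independent processing steps, which the paper itself describes.

// Minimal sketch of exploiting independent processing steps across cores
// with OpenMP. Hypothetical: score_candidate_subspace() stands in for the
// real per-subspace work described in the abstract.
#include <omp.h>
#include <cstdio>
#include <vector>

struct Candidate { int dimension; double density; };

static double score_candidate_subspace(const Candidate& c) {
    // Placeholder for the actual (independent) clustering computation.
    return c.density * (c.dimension + 1);
}

int main() {
    std::vector<Candidate> candidates(1000);
    for (int i = 0; i < 1000; ++i) candidates[i] = {i % 32, i * 0.001};

    std::vector<double> scores(candidates.size());

    // Each candidate subspace is processed independently, so the loop
    // parallelizes without synchronization between iterations.
    #pragma omp parallel for schedule(dynamic)
    for (long i = 0; i < (long)candidates.size(); ++i)
        scores[i] = score_candidate_subspace(candidates[i]);

    std::printf("processed %zu candidates on up to %d threads\n",
                scores.size(), omp_get_max_threads());
    return 0;
}

Compiled with -fopenmp, the independent iterations are spread over the available hardware threads, which is the coarse-grained multi-core part of the approach; the GPU path mentioned in the abstract targets finer-grained data parallelism.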
APA, Harvard, Vancouver, ISO, and other styles
5

Benner, Peter, Pablo Ezzatti, Hermann Mena, Enrique Quintana-Ortí, and Alfredo Remón. "Solving Matrix Equations on Multi-Core and Many-Core Architectures." Algorithms 6, no. 4 (November 25, 2013): 857–70. http://dx.doi.org/10.3390/a6040857.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Markall, G. R., A. Slemmer, D. A. Ham, P. H. J. Kelly, C. D. Cantwell, and S. J. Sherwin. "Finite element assembly strategies on multi-core and many-core architectures." International Journal for Numerical Methods in Fluids 71, no. 1 (January 19, 2012): 80–97. http://dx.doi.org/10.1002/fld.3648.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Chitty, Darren M. "Fast parallel genetic programming: multi-core CPU versus many-core GPU." Soft Computing 16, no. 10 (June 9, 2012): 1795–814. http://dx.doi.org/10.1007/s00500-012-0862-0.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Castells-Rufas, David, Eduard Fernandez-Alonso, and Jordi Carrabina. "Performance Analysis Techniques for Multi-Soft-Core and Many-Soft-Core Systems." International Journal of Reconfigurable Computing 2012 (2012): 1–14. http://dx.doi.org/10.1155/2012/736347.

Full text
Abstract:
Multi-soft-core systems are a viable and interesting solution for embedded systems that need a particular tradeoff between performance, flexibility and development speed. As the growing capacity allows it, many-soft-cores are also expected to have relevance to future embedded systems. As a consequence, parallel programming methods and tools will be necessarily embraced as a part of the full system development process. Performance analysis is an important part of the development process for parallel applications. It is usually mandatory when you want to get a desired performance or to verify that the system is meeting some real-time constraints. One of the usual techniques used by the HPC community is the postmortem analysis of application traces. However, this is not easily transported to the embedded systems based on FPGA due to the resource limitations of the platforms. We propose several techniques and some hardware architectural support to be able to generate traces on multiprocessor systems based on FPGAs and use them to optimize the performance of the running applications.
APA, Harvard, Vancouver, ISO, and other styles
9

Xie, Zhen, Guangming Tan, Weifeng Liu, and Ninghui Sun. "A Pattern-Based SpGEMM Library for Multi-Core and Many-Core Architectures." IEEE Transactions on Parallel and Distributed Systems 33, no. 1 (January 1, 2022): 159–75. http://dx.doi.org/10.1109/tpds.2021.3090328.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Lessley, Brenton, Shaomeng Li, and Hank Childs. "HashFight: A Platform-Portable Hash Table for Multi-Core and Many-Core Architectures." Electronic Imaging 2020, no. 1 (January 26, 2020): 376–1. http://dx.doi.org/10.2352/issn.2470-1173.2020.1.vda-376.

Full text
Abstract:
We introduce a new platform-portable hash table and collision-resolution approach, HashFight, for use in visualization and data analysis algorithms. Designed entirely in terms of data-parallel primitives (DPPs), HashFight is atomics-free and consists of a single code base that can be invoked across a diverse range of architectures. To evaluate its hashing performance, we compare the single-node insert and query throughput of HashFight to that of two best-in-class GPU and CPU hash table implementations, using several experimental configurations and factors. Overall, HashFight maintains competitive performance across both modern and older generation GPU and CPU devices, which differ in computational and memory abilities. In particular, HashFight achieves stable performance across all hash table sizes, and has leading query throughput for the largest sets of queries, while remaining within a factor of 1.5X of the comparator GPU implementation on all smaller query sets. Moreover, HashFight performs better than the comparator CPU implementation across all configurations. Our findings reveal that our platform-agnostic implementation can perform as well as optimized, platform-specific implementations, which demonstrates the portable performance of our DPP-based design.
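The "fight" idea behind HashFight can be pictured with a toy, sequential simulation of the data-parallel rounds: every unplaced key writes itself into its hashed slot, the surviving writer keeps the slot, and the losers retry in the next round with a shifted hash. The sketch below only mimics that high-level behaviour under assumed details (the per-round offset rehash, the table layout are inventions); it is not the authors' DPP-based, GPU-portable implementation.

// Toy, sequential illustration of an atomics-free "fight" for hash slots.
// Only the high-level idea from the abstract is mimicked here.
#include <cstdint>
#include <cstdio>
#include <vector>

int main() {
    const std::size_t table_size = 64;
    std::vector<uint32_t> keys = {7, 71, 135, 12, 76, 99, 31};
    std::vector<int64_t> table(table_size, -1);   // -1 marks an empty slot
    std::vector<std::size_t> active(keys.size());
    for (std::size_t i = 0; i < keys.size(); ++i) active[i] = i;

    for (uint32_t round = 0; !active.empty(); ++round) {
        // Snapshot of slots that are free at the start of this round.
        std::vector<char> was_free(table_size);
        for (std::size_t s = 0; s < table_size; ++s) was_free[s] = (table[s] < 0);

        // Phase 1: every active key "fights" by writing into a free slot.
        for (std::size_t idx : active) {
            std::size_t slot = (keys[idx] + round * 17) % table_size;
            if (was_free[slot]) table[slot] = (int64_t)idx;   // last write wins
        }
        // Phase 2: keys that did not end up owning their slot stay active.
        std::vector<std::size_t> losers;
        for (std::size_t idx : active) {
            std::size_t slot = (keys[idx] + round * 17) % table_size;
            if (table[slot] != (int64_t)idx) losers.push_back(idx);
        }
        active.swap(losers);
    }
    for (std::size_t s = 0; s < table_size; ++s)
        if (table[s] >= 0) std::printf("slot %zu <- key %u\n", s, keys[table[s]]);
    return 0;
}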
APA, Harvard, Vancouver, ISO, and other styles
More sources

Dissertations / Theses on the topic "Multi-Core and many-Core"

1

Kanellou, Eleni. "Data structures for current multi-core and future many-core architectures." Thesis, Rennes 1, 2015. http://www.theses.fr/2015REN1S171/document.

Full text
Abstract:
Though a majority of current processor architectures relies on shared, cache-coherent memory, current prototypes that integrate large amounts of cores, connected through a message-passing substrate, indicate that architectures of the near future may have these characteristics. Either of those tendencies requires that processes execute in parallel, making concurrent programming a necessary tool. The inherent difficulty of reasoning about concurrency, however, may make the new processor architectures hard to program. In order to deal with issues such as this, we explore approaches for providing ease of programmability. We propose WFR-TM, an approach based on transactional memory (TM), which is a concurrent programming paradigm that employs transactions in order to synchronize the access to shared data. A transaction may either commit, making its updates visible, or abort, discarding its updates. WFR-TM combines desirable characteristics of pessimistic and optimistic TM. In a pessimistic TM, no transaction ever aborts; however, in order to achieve that, existing TM algorithms employ locks in order to execute update transactions sequentially, decreasing the degree of achieved parallelism. Optimistic TMs execute all transactions concurrently but commit them only if they have encountered no conflict during their execution. WFR-TM provides read-only transactions that are wait-free, without ever executing expensive synchronization operations (like CAS, LL/SC, etc), or sacrificing the parallelism between update transactions. We further present Dense, a concurrent graph implementation. Graphs are versatile data structures that allow the implementation of a variety of applications. However, multi-process applications that rely on graphs still largely use a sequential implementation. We introduce an innovative concurrent graph model that provides addition and removal of any edge of the graph, as well as atomic traversals of a part (or the entirety) of the graph. Dense achieves wait-freedom by relying on light-weight helping and provides the inbuilt capability of performing a partial snapshot on a dynamically determined subset of the graph. We finally aim at predicted future architectures. In the interest of code reuse and of a common paradigm, there is recent momentum towards porting software runtime environments, originally intended for shared-memory settings, onto non-cache-coherent machines. JVM, the runtime environment of the high-productivity language Java, is a notable example. Concurrent data structure implementations are important components of the libraries that environments like these incorporate. With the goal of contributing to this effort, we study general techniques for implementing distributed data structures assuming they have to run on many-core architectures that offer either partially cache-coherent memory or no cache coherence at all, and present implementations of stacks, queues, and lists.
APA, Harvard, Vancouver, ISO, and other styles
2

Serpa, Matheus da Silva. "Source code optimizations to reduce multi core and many core performance bottlenecks." Biblioteca Digital de Teses e Dissertações da UFRGS, 2018. http://hdl.handle.net/10183/183139.

Full text
Abstract:
Nowadays, there are several different architectures available not only for the industry but also for final consumers. Traditional multi-core processors, GPUs, accelerators such as the Xeon Phi, or even energy efficiency-driven processors such as the ARM family, present very different architectural characteristics. This wide range of characteristics presents a challenge for the developers of applications. Developers must deal with different instruction sets, memory hierarchies, or even different programming paradigms when programming for these architectures. To optimize an application, it is important to have a deep understanding of how it behaves on different architectures. Related work proved to have a wide variety of solutions. Most of them focused on improving only memory performance. Others focus on load balancing, vectorization, and thread and data mapping, but perform them separately, losing optimization opportunities. In this master thesis, we propose several optimization techniques to improve the performance of a real-world seismic exploration application provided by Petrobras, a multinational corporation in the petroleum industry. In our experiments, we show that loop interchange is a useful technique to improve the performance of different cache memory levels, improving the performance by up to 5.3× and 3.9× on the Intel Broadwell and Intel Knights Landing architectures, respectively. By changing the code to enable vectorization, performance was increased by up to 1.4× and 6.5×. Load balancing improved the performance by up to 1.1× on Knights Landing. Thread and data mapping techniques were also evaluated, with a performance improvement of up to 1.6× and 4.4×. We also compared the best version of each architecture and showed that we were able to improve the performance of Broadwell by 22.7× and Knights Landing by 56.7× compared to a naive version, but, in the end, Broadwell was 1.2× faster than Knights Landing.
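The loop interchange the abstract credits for the cache-level gains is a standard source-level transformation. The generic sketch below (not code from the thesis or from the Petrobras application) shows the usual pattern: swapping the loop order turns strided accesses over a row-major grid into unit-stride ones, which improves cache-line reuse and lets the compiler vectorize the inner loop.

// Generic illustration of loop interchange for cache locality. 'grid' is
// stored row-major, so iterating j in the inner loop touches memory
// contiguously and makes the inner loop vectorizable.
#include <cstddef>
#include <cstdio>
#include <vector>

void smooth_bad(std::vector<float>& grid, std::size_t ni, std::size_t nj) {
    for (std::size_t j = 0; j < nj; ++j)        // inner dimension outside:
        for (std::size_t i = 0; i < ni; ++i)    // stride-nj accesses, poor locality
            grid[i * nj + j] *= 0.5f;
}

void smooth_good(std::vector<float>& grid, std::size_t ni, std::size_t nj) {
    for (std::size_t i = 0; i < ni; ++i) {      // interchanged loops:
        #pragma omp simd                        // unit-stride, vectorizable inner loop
        for (std::size_t j = 0; j < nj; ++j)
            grid[i * nj + j] *= 0.5f;
    }
}

int main() {
    const std::size_t ni = 2048, nj = 2048;
    std::vector<float> grid(ni * nj, 1.0f);
    smooth_bad(grid, ni, nj);                   // strided traversal, for comparison
    smooth_good(grid, ni, nj);                  // interchanged traversal
    std::printf("grid[0] = %f\n", grid[0]);
    return 0;
}

Applied to an application's real stencil loops, this kind of reordering is typically where cache-level speedups of the sort reported above come from.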
APA, Harvard, Vancouver, ISO, and other styles
3

Martins, André Luís Del Mestre. "Multi-objective resource management for many-core systems." Pontifícia Universidade Católica do Rio Grande do Sul, 2018. http://tede2.pucrs.br/tede2/handle/tede/8096.

Full text
Abstract:
Many-core systems integrate several cores in a single die to provide high-performance computing in multiple market segments. The newest technology nodes introduce restricted power caps, which results in the utilization-wall (also known as dark silicon), i.e., the on-chip power dissipation prevents the use of all resources at full performance simultaneously. The workload of many-core systems includes real-time (RT) applications, which bring the application throughput as another constraint to meet. Also, dynamic workloads generate valleys and peaks of resource utilization over time. This scenario, complex high-performance systems subject to power and performance constraints, creates the need for multi-objective resource management (RM) able to dynamically adapt the system goals while respecting the constraints. Concerning RT applications, related works apply a design-time analysis of the expected workload to ensure throughput constraints. To cover this limitation of design-time decisions, this Thesis proposes a hierarchical Runtime Energy Management (REM) for RT applications as the first work to link the execution of RT applications and RM under a power cap without design-time analysis of the application set. REM employs different mapping and DVFS (Dynamic Voltage Frequency Scaling) heuristics for RT and non-RT tasks to save energy. Besides not considering RT applications, related works do not consider the workload variation and propose single-objective RMs. To tackle this second limitation of single-objective RMs, this Thesis presents a hierarchical adaptive multi-objective resource management (MORM) for many-core systems under a power cap. MORM addresses dynamic workloads with peaks and valleys of resource utilization. MORM can dynamically shift the goals to prioritize energy or performance according to the workload behavior. Both RMs (REM and MORM) are multi-objective approaches. This Thesis employs the Observe-Decide-Act (ODA) paradigm as the design methodology to implement REM and MORM. The Observing consists of characterizing the cores and integrating hardware monitors to provide accurate and fast power-related information for an efficient RM. The Actuation configures the system actuators at runtime to enable the RMs to follow the multi-objective decisions. The Decision corresponds to REM and MORM, which share the Observing and Actuation infrastructure. REM and MORM stand out from related works regarding scalability, comprehensiveness, and accurate power and energy estimation. Concerning REM, evaluations on many-core systems up to 144 cores show energy savings from 15% to 28% while keeping timing violations below 2.5%. Regarding MORM, results show it can drive applications to dynamically follow distinct objectives. Compared to a state-of-the-art RM targeting performance, MORM speeds up the workload valley by 11.56% and the workload peak by up to 49%.
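The Observe-Decide-Act structure described above can be pictured with a small, generic control loop. The monitored quantities, the power-cap value, and the actuator stub below are placeholders invented for illustration; they do not reflect the REM/MORM interfaces from the thesis.

// Hypothetical Observe-Decide-Act (ODA) control loop for a resource manager.
// Sensors, power cap, and actuators are placeholders for illustration only.
#include <chrono>
#include <cstdio>
#include <thread>

enum class Goal { SaveEnergy, MaxPerformance };

struct Observation { double chip_power_w; double utilization; };

static Observation observe() {
    // Placeholder: would read hardware power monitors and core utilization.
    return {85.0, 0.45};
}

static Goal decide(const Observation& o, double power_cap_w) {
    // Prioritize performance during utilization peaks, energy in valleys,
    // but never exceed the power cap.
    if (o.chip_power_w > power_cap_w) return Goal::SaveEnergy;
    return (o.utilization > 0.7) ? Goal::MaxPerformance : Goal::SaveEnergy;
}

static void act(Goal g) {
    // Placeholder: would set DVFS levels and task mapping accordingly.
    std::printf("goal: %s\n", g == Goal::MaxPerformance ? "performance" : "energy");
}

int main() {
    const double power_cap_w = 100.0;
    for (int epoch = 0; epoch < 5; ++epoch) {     // periodic management epochs
        act(decide(observe(), power_cap_w));
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }
    return 0;
}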
APA, Harvard, Vancouver, ISO, and other styles
4

Tekić, Jelena. "Оптимизација CFD симулације на групама вишејезгарних хетерогених архитектура." PhD thesis, Univerzitet u Novom Sadu, Prirodno-matematički fakultet u Novom Sadu, 2019. https://www.cris.uns.ac.rs/record.jsf?recordId=110976&source=NDLTD&language=en.

Full text
Abstract:
The case study of this dissertation belongs to the field of parallel programming: the implementation of the CFD (Computational Fluid Dynamics) method on several heterogeneous multi-core devices simultaneously. The thesis presents several algorithms aimed at accelerating CFD simulation on common computers. It has also been shown that the described solution achieves satisfactory performance on HPC devices (Tesla graphics cards). The simulation is implemented in a micro-service architecture that is portable and flexible and makes it easy to test CFD simulations on common computers.
APA, Harvard, Vancouver, ISO, and other styles
5

Singh, Ajeet. "GePSeA: A General-Purpose Software Acceleration Framework for Lightweight Task Offloading." Thesis, Virginia Tech, 2009. http://hdl.handle.net/10919/34264.

Full text
Abstract:
Hardware-acceleration techniques continue to be used to boost the performance of scientific codes. To do so, software developers identify portions of these codes that are amenable for offloading and map them to hardware accelerators. However, offloading such tasks to specialized hardware accelerators is non-trivial. Furthermore, these accelerators can add significant cost to a computing system.

Consequently, this thesis proposes a framework called GePSeA (General Purpose Software Acceleration Framework), which uses a small fraction of the computational power on multi-core architectures to offload complex application-specific tasks. Specifically, GePSeA provides a lightweight process that acts as a helper agent to the application by executing application-specific tasks asynchronously and efficiently. GePSeA is not meant to replace hardware accelerators but to extend them. GePSeA provides several utilities called core components that offload tasks onto the core or to special-purpose hardware when available, in a way that is transparent to the application. Examples of such core components include reliable communication service, distributed lock management, global memory management, dynamic load distribution and network protocol processing. We then apply the GePSeA framework to two applications, namely mpiBLAST, an open-source computational biology application, and a Reliable Blast UDP (RBUDP) based file transfer application. We observe significant speed-up for both applications.
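The helper-agent idea described above (a lightweight process that executes application-specific tasks asynchronously on spare cores) can be approximated in portable C++ with a dedicated thread draining a task queue. This is a generic sketch of the concept only; the class name, its interface, and the example task are invented for illustration and are not the GePSeA API.

// Generic sketch of a helper agent that runs application-specific tasks
// asynchronously on a spare core; illustrative only, not the GePSeA API.
#include <condition_variable>
#include <cstdio>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

class HelperAgent {
public:
    HelperAgent() : worker_(&HelperAgent::run, this) {}
    ~HelperAgent() {
        { std::lock_guard<std::mutex> lk(m_); done_ = true; }
        cv_.notify_one();
        worker_.join();
    }
    void offload(std::function<void()> task) {     // called by the application
        { std::lock_guard<std::mutex> lk(m_); tasks_.push(std::move(task)); }
        cv_.notify_one();
    }
private:
    void run() {                                   // helper thread main loop
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return done_ || !tasks_.empty(); });
                if (tasks_.empty()) return;        // done_ set and queue drained
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();                                // e.g. checksums, protocol processing
        }
    }
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    bool done_ = false;
    std::thread worker_;
};

int main() {
    HelperAgent agent;
    agent.offload([] { std::printf("offloaded task ran asynchronously\n"); });
    // The application would continue with its main computation here.
    return 0;
}

In this sketch, work such as protocol processing or checksum computation would be enqueued through offload() while the main application thread keeps computing.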
APA, Harvard, Vancouver, ISO, and other styles
6

Singh, Kunal. "High-Performance Sparse Matrix-Multi Vector Multiplication on Multi-Core Architecture." The Ohio State University, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=osu1524089757826551.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Lo, Moustapha. "Application des architectures many core dans les systèmes embarqués temps réel." Thesis, Université Grenoble Alpes (ComUE), 2019. http://www.theses.fr/2019GREAM002/document.

Full text
Abstract:
Traditional single-cores are no longer sufficient to meet the growing needs of performance in the avionics domain. Multi-core and many-core processors have emerged in recent years in order to integrate several functions thanks to resource sharing. In contrast, all multi-core and many-core processors do not necessarily satisfy the avionic constraints. We prefer to have more determinism than computing power because the certification of such processors depends on mastering the determinism. The aim of this thesis is to evaluate the many-core processor (MPPA-256) from Kalray in an avionic context. We choose the maintenance function HMS (Health Monitoring System), which requires an important bandwidth and a response time guarantee. In addition, this function also has parallelism properties. It computes data from sensors that are functionally independent and, therefore, their processing can be parallelized on several cores. This study focuses on deploying the existing sequential HMS on a many-core processor, from the data acquisition to the computation of the health indicators, with a strong emphasis on the input flow. Our research led to five main contributions:
• Transformation of the existing global algorithms into real-time ones which can process data as soon as they are available.
• Management of the input flow of vibration samples from the sensors to the computation of the health indicators, the availability of raw vibration data in the internal cluster, when they are consumed, and finally the workload estimation.
• Implementation of lightweight timing measurements directly on the MPPA-256 by adding timestamps in the data flow.
• A software architecture that respects real-time constraints even in the worst cases; it is based on three pipeline stages.
• Illustration of the limits of the existing function: our experiments have shown that the contextual parameters of the helicopter, such as the rotor speed, must be correlated with the health indicators to reduce false alarms.
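The three-stage software pipeline mentioned in the contributions (acquisition, incremental processing, health-indicator computation) can be sketched generically with one thread per stage connected by blocking queues. The sketch below is only an illustration of that structure under invented names; the thesis targets the MPPA-256 and its own inter-cluster communication, not plain std::thread.

// Generic three-stage pipeline (acquire -> process -> compute indicator),
// one thread per stage connected by queues. Illustrative only.
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>

template <class T>
class Channel {                                    // minimal blocking queue
public:
    void push(std::optional<T> v) {
        { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(v)); }
        cv_.notify_one();
    }
    std::optional<T> pop() {                       // empty optional = end of stream
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !q_.empty(); });
        auto v = std::move(q_.front());
        q_.pop();
        return v;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::optional<T>> q_;
};

int main() {
    Channel<double> samples, features;

    std::thread acquire([&] {                      // stage 1: vibration samples
        for (int i = 0; i < 8; ++i) samples.push(0.1 * i);
        samples.push(std::nullopt);
    });
    std::thread process([&] {                      // stage 2: incremental processing
        while (auto s = samples.pop()) features.push(*s * *s);
        features.push(std::nullopt);
    });
    std::thread indicate([&] {                     // stage 3: health indicator
        double energy = 0.0;
        while (auto f = features.pop()) energy += *f;
        std::printf("health indicator (signal energy): %.3f\n", energy);
    });

    acquire.join(); process.join(); indicate.join();
    return 0;
}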
APA, Harvard, Vancouver, ISO, and other styles
8

Lukarski, Dimitar [Verfasser]. "Parallel Sparse Linear Algebra for Multi-core and Many-core Platforms : Parallel Solvers and Preconditioners / Dimitar Lukarski." Karlsruhe : KIT-Bibliothek, 2012. http://d-nb.info/1020663480/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Júnior, Manoel Baptista da Silva. "Portabilidade com eficiência de trechos da dinâmica do modelo BRAMS entre arquiteturas multi-core e many-core." Instituto Nacional de Pesquisas Espaciais (INPE), 2015. http://urlib.net/sid.inpe.br/mtc-m21b/2015/04.28.19.21.

Full text
Abstract:
The continuous growth of spatial and temporal resolutions in current meteorological models demands increasing processing power. The prompt execution of these models requires the use of supercomputers with hundreds or thousands of nodes. Currently, these models are executed at the operational environment of CPTEC on a supercomputer composed of nodes with CPUs with tens of cores (multi-core). Newer supercomputer generations have nodes with CPUs coupled to processing accelerators, typically graphics cards (GPGPUs), containing hundreds of cores (many-core). The rewriting of the model codes in order to use such nodes efficiently, with or without graphics cards (portable code), represents a challenge. The OpenMP programming interface has been the established standard for decades for efficiently exploiting multi-core architectures. A new programming interface, OpenACC, was recently proposed to exploit many-core architectures. These two programming interfaces are similar, since they are based on parallelization directives for the concurrent execution of threads. This work shows the feasibility of writing a single code embedding both interfaces that presents acceptable efficiency when executed on nodes with either a multi-core or a many-core architecture. The code chosen as a case study is the advection of scalars, a part of the dynamics of the regional meteorological model BRAMS (Brazilian Regional Atmospheric Modeling System).
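The abstract's central point (one source file annotated with both OpenMP and OpenACC directives) can be illustrated with the simplified sketch below: a generic 1-D upwind-style update, not the BRAMS advection code. Whichever directive family the compiler is asked to honor is used and the other is skipped, so the same source targets a multi-core CPU node or a GPU-equipped node.

// Simplified sketch of a single source annotated for both OpenMP and
// OpenACC, in the spirit of the abstract. Generic 1-D upwind update,
// not the BRAMS advection code.
#include <cstdio>
#include <vector>

void advect(const float* q, float* q_new, int n, float courant) {
#if defined(_OPENACC)
    // OpenACC path: offload the loop to the accelerator.
    #pragma acc parallel loop copyin(q[0:n]) copyout(q_new[0:n])
#elif defined(_OPENMP)
    // OpenMP path: spread the iterations over the CPU cores.
    #pragma omp parallel for
#endif
    for (int i = 1; i < n; ++i)
        q_new[i] = q[i] - courant * (q[i] - q[i - 1]);
    q_new[0] = q[0];                               // simple boundary handling
}

int main() {
    const int n = 1 << 20;
    std::vector<float> q(n, 1.0f), q_new(n, 0.0f);
    q[n / 2] = 2.0f;                               // small perturbation to advect
    advect(q.data(), q_new.data(), n, 0.5f);
    std::printf("q_new at the perturbation: %f\n", q_new[n / 2]);
    return 0;
}

Building with an OpenMP flag (e.g. -fopenmp) targets the multi-core node, while an OpenACC-capable compiler invoked with its accelerator flag offloads the same loop to the GPU; the loop body itself is unchanged.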
APA, Harvard, Vancouver, ISO, and other styles
10

Thucanakkenpalayam, Sundararajan Karthik. "Energy efficient cache architectures for single, multi and many core processors." Thesis, University of Edinburgh, 2013. http://hdl.handle.net/1842/9916.

Full text
Abstract:
With each technology generation we get more transistors per chip. Whilst processor frequencies have increased over the past few decades, memory speeds have not kept pace. Therefore, more and more transistors are devoted to on-chip caches to reduce latency to data and help achieve high performance. On-chip caches consume a significant fraction of the processor energy budget but need to deliver high performance. Therefore cache resources should be optimized to meet the requirements of the running applications. Fixed configuration caches are designed to deliver low average memory access times across a wide range of potential applications. However, this can lead to excessive energy consumption for applications that do not require the full capacity or associativity of the cache at all times. Furthermore, in systems where the clock period is constrained by the access times of level-1 caches, the clock frequency for all applications is effectively limited by the cache requirements of the most demanding phase within the most demanding application. This motivates the need for dynamic adaptation of cache configurations in order to optimize performance while minimizing energy consumption, on a per-application basis. First, this thesis proposes an energy-efficient cache architecture for a single core system, along with a run-time support framework for dynamic adaptation of cache size and associativity through the use of machine learning. The machine learning model, which is trained offline, profiles the application's cache usage and then reconfigures the cache according to the program's requirement. The proposed cache architecture has, on average, 18% better energy-delay product than the prior state-of-the-art cache architectures proposed in the literature. Next, this thesis proposes cooperative partitioning, an energy-efficient cache partitioning scheme for multi-core systems that share the Last Level Cache (LLC), with a core to LLC cache way ratio of 1:4. The proposed cache partitioning scheme uses small auxiliary tags to capture each core's cache requirements, and partitions the LLC according to the individual cores' cache requirements. The proposed partitioning uses a way-aligned scheme that helps in the reduction of both dynamic and static energy. This scheme, on average, offers 70% and 30% reductions in dynamic and static energy respectively, while maintaining high performance on par with state-of-the-art cache partitioning schemes. Finally, when Last Level Cache (LLC) ways are equal to or less than the number of cores present in many-core systems, cooperative partitioning cannot be used for partitioning the LLC. This thesis proposes a region aware cache partitioning scheme as an energy-efficient approach for many core systems that share the LLC, with a core to LLC way ratio of 1:2 and 1:1. The proposed partitioning, on average, offers 68% and 33% reductions in dynamic and static energy respectively, while again maintaining high performance on par with state-of-the-art LLC cache management techniques.
APA, Harvard, Vancouver, ISO, and other styles
More sources

Books on the topic "Multi-Core and many-Core"

1

Pllana, Sabri, and Fatos Xhafa, eds. Programming multi-core and many-core computing systems. Hoboken, NJ, USA: John Wiley & Sons, Inc., 2017. http://dx.doi.org/10.1002/9781119332015.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Fornaciari, William, and Dimitrios Soudris, eds. Harnessing Performance Variability in Embedded and High-performance Many/Multi-core Platforms. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-319-91962-1.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Design Space Exploration and Resource Management of Multi/Many-Core Systems. MDPI, 2021. http://dx.doi.org/10.3390/books978-3-0365-0877-1.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Fornaciari, William, and Dimitrios Soudris. Harnessing Performance Variability in Embedded and High-performance Many/Multi-core Platforms: A Cross-layer Approach. Springer, 2018.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
5

Harnessing Performance Variability in Embedded and High-performance Many/Multi-core Platforms: A Cross-layer Approach. Springer, 2018.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
6

Nuberg, Ian, Brendan George, and Rowan Reid, eds. Agroforestry for Natural Resource Management. CSIRO Publishing, 2009. http://dx.doi.org/10.1071/9780643097100.

Full text
Abstract:
In its early days, agroforestry may have been viewed as the domain of the 'landcare enthusiast'. Today, integrating trees and shrubs into productive farming systems is seen as a core principle of sustainable agriculture. Agroforestry for Natural Resource Management provides the foundation for an understanding of agroforestry practice in both high and low rainfall zones across Australia. Three major areas are discussed: environmental functions of trees in the landscape (ecosystem mimicry, hydrology, protection of crops, animals and soil, biodiversity, aesthetics); productive functions of trees (timber, firewood, pulp, fodder, integrated multi-products); and the implementation of agroforestry (design, evaluation, establishment, adoption, policy support). The book also includes a DVD that features videos on forest measurement and harvesting, a Farm Forestry Toolbox and many regionally specific agroforestry resources. Written by leading researchers and practitioners from around Australia, Agroforestry for Natural Resource Management will be an essential resource for students in agroforestry courses, as well as a valuable introduction to the field for professionals in related areas.
APA, Harvard, Vancouver, ISO, and other styles
7

Spies, Dennis C. Immigration and Welfare State Retrenchment. Oxford University Press, 2018. http://dx.doi.org/10.1093/oso/9780198812906.001.0001.

Full text
Abstract:
Is large-scale immigration to Europe incompatible with the continent’s generous and encompassing welfare states? Are Europeans willing to share welfare benefits with ethnically different and often less well-off immigrants? Or do they regard the newcomers as undeserving and their claim for welfare rights as unjustified? These questions are at the heart of what has become known as the “New Progressive Dilemma” (NPD) debate—and the predominant answers given to them are rather pessimistic. Pointing to the experiences of the US, where a multi-racial society in combination with a longstanding history of immigration encounters very limited welfare provision, many Europeans fear that the continent’s new immigrant-based heterogeneity may push it toward more American levels of redistribution. But are the conflictual US experiences really reflected in the European context? Immigration and Welfare State Retrenchment addresses this question by connecting the New Progressive Dilemma debate with comparative welfare state and party research in order to analyze the role ethnic diversity plays in welfare reforms in the US and Europe. Whereas the combination of racial patterns and party politics had and still has serious consequences for the US welfare system, the general message of the book is that these are not echoed in the Western European context. In addition, while many Europeans are very critical of immigration and prepared to ban immigrants from welfare benefits, both the institutional design of European welfare programs and the economically divided anti-immigrant movement prevent immigration concerns from translating into actual retrenchment in the core areas of welfare.
APA, Harvard, Vancouver, ISO, and other styles

Book chapters on the topic "Multi-Core and many-Core"

1

Vajda, András. "Multi-core and Many-core Processor Architectures." In Programming Many-Core Chips, 9–43. Boston, MA: Springer US, 2011. http://dx.doi.org/10.1007/978-1-4419-9739-5_2.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Gliwa, Peter. "Multi-Core, Many-Core, and Multi-ECU Timing." In Embedded Software Timing, 189–211. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-64144-3_7.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Natvig, Lasse, Alexandru Iordan, Mujahed Eleyat, Magnus Jahre, and Jorn Amundsen. "Multi- and Many-Cores, Architectural Overview for Programmers." In Programming multi-core and many-core computing systems, 1–27. Hoboken, NJ, USA: John Wiley & Sons, Inc., 2017. http://dx.doi.org/10.1002/9781119332015.ch1.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Kessler, Christoph, Sergei Gorlatch, Johan Enmyren, Usman Dastgeer, Michel Steuwer, and Philipp Kegel. "Skeleton Programming for Portable Many-Core Computing." In Programming multi-core and many-core computing systems, 121–41. Hoboken, NJ, USA: John Wiley & Sons, Inc., 2017. http://dx.doi.org/10.1002/9781119332015.ch6.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Varbanescu, Ana Lucia, Rob V. van Nieuwpoort, Pieter Hijma, Henri E. Bal, Rosa M. Badia, and Xavier Martorell. "Programming Models for Multicore and Many-Core Computing Systems." In Programming multi-core and many-core computing systems, 29–58. Hoboken, NJ, USA: John Wiley & Sons, Inc., 2017. http://dx.doi.org/10.1002/9781119332015.ch2.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Tian, Chen, Min Feng, and Rajiv Gupta. "Software-Based Speculative Parallelization." In Programming multi-core and many-core computing systems, 205–25. Hoboken, NJ, USA: John Wiley & Sons, Inc., 2017. http://dx.doi.org/10.1002/9781119332015.ch10.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Schubert, Lutz, Stefan Wesner, Daniel Rubio Bonilla, and Tommaso Cucinotta. "Autonomic Distribution and Adaptation." In Programming multi-core and many-core computing systems, 227–40. Hoboken, NJ, USA: John Wiley & Sons, Inc., 2017. http://dx.doi.org/10.1002/9781119332015.ch11.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Benkner, Siegfried, Sabri Pllana, Jesper Larsson Träff, Philippas Tsigas, Andrew Richards, George Russell, Samuel Thibault, et al. "Peppher: Performance Portability and Programmability for Heterogeneous Many-Core Architectures." In Programming multi-core and many-core computing systems, 241–60. Hoboken, NJ, USA: John Wiley & Sons, Inc., 2017. http://dx.doi.org/10.1002/9781119332015.ch12.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Aldinucci, Marco, Marco Danelutto, Peter Kilpatrick, and Massimo Torquati. "Fastflow: High-Level and Efficient Streaming on Multicore." In Programming multi-core and many-core computing systems, 261–80. Hoboken, NJ, USA: John Wiley & Sons, Inc., 2017. http://dx.doi.org/10.1002/9781119332015.ch13.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Roma, Nuno, António Rodrigues, and Leonel Sousa. "Parallel Programming Framework for H.264/AVC Video Encoding in Multicore Systems." In Programming multi-core and many-core computing systems, 281–300. Hoboken, NJ, USA: John Wiley & Sons, Inc., 2017. http://dx.doi.org/10.1002/9781119332015.ch14.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Multi-Core and many-Core"

1

Parkhurst, Jeff. "From single core to multi-core to many core." In the 16th ACM Great Lakes symposium. New York, New York, USA: ACM Press, 2006. http://dx.doi.org/10.1145/1127908.1127910.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Dehne, Frank, and Stephan Jou. "Parallel algorithms for multi-core and many-core processors." In the 2010 Conference of the Center for Advanced Studies. New York, New York, USA: ACM Press, 2010. http://dx.doi.org/10.1145/1923947.1924009.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Uribe-Paredes, Roberto, Pedro Valero-Lara, Enrique Arias, Jose L. Sanchez, and Diego Cazorla. "Similarity search implementations for multi-core and many-core processors." In Simulation (HPCS). IEEE, 2011. http://dx.doi.org/10.1109/hpcsim.2011.5999889.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Varadarajan, Aravind Krishnan, and Michael S. Hsiao. "RTL Test Generation on Multi-core and Many-Core Architectures." In 2019 32nd International Conference on VLSI Design and 2019 18th International Conference on Embedded Systems (VLSID). IEEE, 2019. http://dx.doi.org/10.1109/vlsid.2019.00036.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Singh, Amit Kumar, Muhammad Shafique, Akash Kumar, and Jörg Henkel. "Mapping on multi/many-core systems." In the 50th Annual Design Automation Conference. New York, New York, USA: ACM Press, 2013. http://dx.doi.org/10.1145/2463209.2488734.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Herrmann, Edward C., Prudhvi Janga, and Philip A. Wilsey. "Pre-computing Function Results in Multi-Core and Many-Core Processors." In 2011 International Conference on Parallel Processing Workshops (ICPPW). IEEE, 2011. http://dx.doi.org/10.1109/icppw.2011.46.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Lee, Victor W., Yen-Kuang Chen, J. Chhugani, C. Kim, D. Kim, C. J. Hughes, N. Rajagopalan Satish, M. Smelyanskiy, and P. Dubey. "Emerging applications for multi/many-core processors." In 2011 IEEE International Symposium on Circuits and Systems. IEEE, 2011. http://dx.doi.org/10.1109/iscas.2011.5937865.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Porterfield, Allan, Nassib Nassar, and Rob Fowler. "Multi-threaded library for many-core systems." In Distributed Processing (IPDPS). IEEE, 2009. http://dx.doi.org/10.1109/ipdps.2009.5161104.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Mirsoleimani, S. Ali, Aske Plaat, Jaap van den Herik, and Jos Vermaseren. "Parallel Monte Carlo Tree Search from Multi-core to Many-core Processors." In 2015 IEEE Trustcom/BigDataSE/ISPA. IEEE, 2015. http://dx.doi.org/10.1109/trustcom.2015.615.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Wassal, Amr G., Moataz A. Abdelfattah, and Yehea I. Ismail. "Ecosystems for the development of multi-core and many-core SoC models." In 2010 International Conference on Microelectronics (ICM). IEEE, 2010. http://dx.doi.org/10.1109/icm.2010.5696134.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Reports on the topic "Multi-Core and many-Core"

1

Deveci, Mehmet, Christian Robert Trott, and Sivasankaran Rajamanickam. Multi-threaded Sparse Matrix Sparse Matrix Multiplication for Many-Core and GPU Architectures. Office of Scientific and Technical Information (OSTI), January 2018. http://dx.doi.org/10.2172/1417260.

Full text
APA, Harvard, Vancouver, ISO, and other styles