Dissertations / Theses on the topic 'Computer architecture. Cache memory'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the top 50 dissertations / theses for your research on the topic 'Computer architecture. Cache memory.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Press on it, and we will generate automatically the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as pdf and read online its abstract whenever available in the metadata.
Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.
Gieske, Edmund Joseph. "Critical Words Cache Memory." University of Cincinnati / OhioLINK, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1208368190.
Full textSorenson, Elizabeth S. "Cache characterization and performance studies using locality surfaces /." Diss., CLICK HERE for online access, 2005. http://contentdm.lib.byu.edu/ETD/image/etd950.pdf.
Full textKim, Donglok. "Extended data cache prefetching using a reference prediction table /." Thesis, Connect to this title online; UW restricted, 1997. http://hdl.handle.net/1773/6127.
Full textBani, Ruchi Rastogi Mohanty Saraju. "A new N-way reconfigurable data cache architecture for embedded systems." [Denton, Tex.] : University of North Texas, 2009. http://digital.library.unt.edu/ark:/67531/metadc12079.
Full textBani, Ruchi Rastogi. "A New N-way Reconfigurable Data Cache Architecture for Embedded Systems." Thesis, University of North Texas, 2009. https://digital.library.unt.edu/ark:/67531/metadc12079/.
Full textBeg, Azam Muhammad. "Improving instruction fetch rate with code pattern cache for superscalar architecture." Diss., Mississippi State : Mississippi State University, 2005. http://library.msstate.edu/etd/show.asp?etd=etd-06202005-103032.
Full textSOHONI, SOHUM. "IMPROVING L2 CACHE PERFORMANCE THROUGH STREAM-DIRECTED OPTIMIZATIONS." University of Cincinnati / OhioLINK, 2004. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1092932892.
Full textJanapsatya, Andhi Computer Science & Engineering Faculty of Engineering UNSW. "Optimization of instruction memory for embedded systems." Awarded by:University of New South Wales. School of Computer Science and Engineering, 2005. http://handle.unsw.edu.au/1959.4/24210.
Full textZhang, Xiushan. "L2 cache replacement based on inter-access time per access count prediction." Diss., Online access via UMI:, 2009.
Find full textElver, Marco Iskender. "Memory consistency directed cache coherence protocols for scalable multiprocessors." Thesis, University of Edinburgh, 2016. http://hdl.handle.net/1842/22073.
Full textNayyar, Raman. "Performance Analysis of a Hierarchical, Cache-Coherent, Shared Memory Based, Multi-processor System." PDXScholar, 1993. https://pdxscholar.library.pdx.edu/open_access_etds/4695.
Full textAlmeida, Henrique Dante de 1982. "Implementação de cache no projeto ArchC." [s.n.], 2012. http://repositorio.unicamp.br/jspui/handle/REPOSIP/275701.
Full textDissertação (mestrado) - Universidade Estadual de Campinas, Instituto de Computação
Made available in DSpace on 2018-08-20T15:21:59Z (GMT). No. of bitstreams: 1 Almeida_HenriqueDantede_M.pdf: 506967 bytes, checksum: ca41d5af5008feeb442f3b9d9322af51 (MD5) Previous issue date: 2012
Resumo: O projeto ArchC visa criar uma linguagem de descrição de arquiteturas, com o objetivo de se construir simuladores e toolchains de arquiteturas computacionais completas. O objetivo deste trabalho é dotar ArchC com capacidade para gerar simuladores de caches. Para tanto foi realizado um estudo detalhado das caches (tipos, organizações, configurações etc) e do funcionamento e do código do ArchC. O resultado foi a descrição de uma coleção de caches parametrizáveis que podem ser adicionadas 'as arquiteturas descritas em ArchC. A implementação das caches é modular, possuindo código isolado para a memória de armazenamento da cache e políticas de operação. A corretude da cache foi verificada utilizando uma sequ¿encia de simulações de diversas configurações de cache e com comparações com o simulador dinero. A cache resultante apresentou um overhead, no tempo de simulaçao, que varia entre 10% e 60%, quando comparada a um simulador sem cache
Abstract: The ArchC project aims to create an architecture description language, with the goal of building complete computer architecture simulators and toolchains. The goal of this project is to add support in ArchC for simulating caches. To achieve this, a detailed study about caches (types, organization, configuration etc) and about the ArchC code was done. The result was a collection of parameterized caches that may be included on the architectures described with ArchC. The cache implementation is modular, having isolated code for the storage and operation policies. Implementation correctness was verified using a set of many cache configurations and with comparisons with the results from dinero simulator. The resulting cache showed an overhead varying between 10% and 60%, when compared to a simulator without caches
Mestrado
Ciência da Computação
Mestre em Ciência da Computação
Baek, Seungcheol. "High-performance memory system architectures using data compression." Diss., Georgia Institute of Technology, 2014. http://hdl.handle.net/1853/51863.
Full textLodde, Mario. "Smart Memory and Network-On-Chip Design for High-Performance Shared-Memory Chip Multiprocessors." Doctoral thesis, Universitat Politècnica de València, 2014. http://hdl.handle.net/10251/35325.
Full textLodde, M. (2014). Smart Memory and Network-On-Chip Design for High-Performance Shared-Memory Chip Multiprocessors [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/35325
TESIS
Kiriwas, Anton. "Directory-based Cache Coherence in SMTp Machines without Memory Overhead using Sparse Directories." Honors in the Major Thesis, University of Central Florida, 2004. http://digital.library.ucf.edu/cdm/ref/collection/ETH/id/714.
Full textBachelors
Engineering and Computer Science
Computer Science
Ghosh, Mrinmoy. "Microarchitectural techniques to reduce energy consumption in the memory hierarchy." Diss., Atlanta, Ga. : Georgia Institute of Technology, 2009. http://hdl.handle.net/1853/28265.
Full textCommittee Chair: Lee, Hsien-Hsin S.; Committee Member: Cahtterjee,Abhijit; Committee Member: Mukhopadhyay, Saibal; Committee Member: Pande, Santosh; Committee Member: Yalamanchili, Sudhakar.
Ballapuram, Chinnakrishnan S. "Semantics-oriented low power architecture." Diss., Atlanta, Ga. : Georgia Institute of Technology, 2008. http://hdl.handle.net/1853/22567.
Full textCommittee Chair: Hsien-Hsin Sean Lee; Committee Member: Abhijit Chatterjee; Committee Member: Bernard Kippelen; Committee Member: Gabriel H. Loh; Committee Member: SungKyu Lim.
Choudhary, Dhruv. "Micro-scheduling and its interaction with cache partitioning." Thesis, Georgia Institute of Technology, 2011. http://hdl.handle.net/1853/41167.
Full textGoldstein, Felipe Portavales. "Um modelo de memória transacional para arquiteturas heterogêneas baseado em software Cache." [s.n.], 2010. http://repositorio.unicamp.br/jspui/handle/REPOSIP/275780.
Full textDissertação (mestrado) - Universidade Estadual de Campinas, Instituto de Matemática, Estatística e Computação Científica
Made available in DSpace on 2018-08-17T02:02:14Z (GMT). No. of bitstreams: 1 Goldstein_FelipePortavales_M.pdf: 2303926 bytes, checksum: c44512059a990654552904a0f94d74f2 (MD5) Previous issue date: 2010
Resumo: A adoção de processadores com múltiplos núcleos pela indústria, levou à necessidade de novas técnicas para facilitar a programação de software paralelo. A técnica chamada memórias transacionais é uma das mais promissoras. Esta técnica é capaz de executar tarefas concorrentemente de forma otimista, o que permite um bom desempenho. Outra vantagem é que a sua utilização é muito mais simples comparada com a técnica clássica de exclusão mútua. Neste trabalho é proposto o primeiro modelo de memória transacional para arquiteturas híbridas, neste caso a arquitetura alvo é o processador Cell BE. O processador Cell BE é especialmente complexo por causa das dificuldades que a arquitetura deste processador impõe ao programador quando se necessita acessar a memória global compartilhada. O modelo proposto age como uma camada entre o programa e a memória principal, permitindo um acesso transparente aos dados, garantindo coerência e realizando o controle de concorrência de forma automática. O modelo proposto utiliza Software Cache combinado com a memória transacional para facilitar o acesso à memória externa a partir dos SPEs. Ele foi implementado e testado utilizando 8 aplicativos benchmark diferentes, mostrando sua viabilidade para casos de uso reais. Foi feita uma análise detalhada de cada parte da arquitetura proposta com relação ao impacto no desempenho geral do sistema. Este modelo foi capaz de obter um desempenho até duas vezes superior à implementação utilizando um mutex global. As vantagens da utilização se concentram principalmente na facilidade de uso, garantias de coerência e por evitar alguns tipos de bugs que seriam comuns em uma implementação com mutex, como por exemplo dead-locks. Este trabalho obteve o prêmio de melhor artigo no SBAC-PAD 2008
Abstract: The adoption of multi-core processors by the industry has pushed towards the development of new techniques to simplify programming parallel software. The technique called transactional memories is one of the most promising. This technique is able to execute multiple tasks concurrently in an optimistic way to achieve a better performance. Another advantage is that the usage of this technique is simpler than the classic mutual exclusion. This work proposes the first transactional memory model for hybrid architectures, in this case the target architecture is the Cell BE processor. The Cell BE is specially complex because of the dificulties when acessing the main shared memory from one of the SPEs. The proposed model acts as a layer between the program running and the main shared memory, allowing transparent access to the data, guaranteeing coherency and automatic concurrency control. The proposed model uses a Software Cache combined with a transactional memory to facilitate the acess to the main memory from the SPEs. This model was implemented and tested using 8 benchmark applications, showing its feasability in real use cases. A detailed analysis of its internal parts has been made to show the impact of each part in the overal system performance. The model was able to achieve a performance up to two times better than a similar implementation using a global mutex. The advantages of this model rely on its usability, coherency guaranty and because it is able to avoid concurrency programming bugs such as dead-lock, which are common in a mutex implementation. This work won the best paper award at SBAC-PAD 2008
Mestrado
Arquitetura de Computadores
Mestre em Ciência da Computação
Guo, Ruirui. "High performance cache architectures for IP routing : replacement, compaction and sampling schemes." Online access for everyone, 2007. http://www.dissertations.wsu.edu/Dissertations/Summer2007/r_guo_072307.pdf.
Full textQin, Xiaohan. "On the use and performance of communication primitives in software controlled cache-coherent cluster architectures /." Thesis, Connect to this title online; UW restricted, 1997. http://hdl.handle.net/1773/6925.
Full textAlves, Marco Antonio Zanata. "Avaliação do compartilhamento das memórias cache no desempenho de arquiteturas multi-core." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2009. http://hdl.handle.net/10183/16129.
Full textIn the current context of innovations in multi-core processors, where the new integration technologies are providing an increasing number of transistors inside chip, the study of techniques for increasing data throughput has great importance for the current and future multi-core and many-core processors. With the continuous demand for performance, the cache memories have been widely adopted in various types of architectural designs of computers. Nowadays, processors on the market point out for the use of shared L2 cache memory. However, it is not clear the gains and costs of these shared cache memory models. Thus, studies that address different aspects of shared cache memory have great importance in context of multi-core processors. Therefore, this dissertation aims to evaluate different shared cache memory, modeling and applying workloads on different organizations in order to obtain significant results from the performance and the influence of the shared cache memory multi-core processors. Thus, several types of shared cache memory were evaluated using traditional techniques to increase performance, such as increasing the associativity, larger line size, larger cache memory and also the increase on the cache memory hierarchy, investigating the correlation between the cache memory architecture and the workload applications. The results show the importance of integration between cache memory architecture project and memory physical design in order to obtain the best trade-off between cache memory access time and cache misses. According to the results, within evaluations, due to physical limitations and performance, organizations 1Core/L2 and 2Cores/L2 with total cache size equal to 32MB, using banks of 2 MB, line size equal to 128 bytes, represent a good choice for physical implementation in general purpose systems, obtaining a good performance in all evaluated applications without major extra costs of area occupation and power consumption. Furthermore, as a conclusion in this dissertation is shown that, for current and future integration technologies, traditional techniques for performance gain obtained with changes in the cache memory such as, increase of the memory size, increasing the associativity, larger line sizes etc.. should not lead to real performance gains if the additional latency generated by these techniques was not treated, in order to balance between the reduction of cache miss rate and the data access time.
Zeng, Hui. "Managing datapath resources in an out-of-order processor for performance and energy efficiency." Diss., Online access via UMI:, 2009.
Find full textDavari, Mahdad. "Advances Towards Data-Race-Free Cache Coherence Through Data Classification." Doctoral thesis, Uppsala universitet, Avdelningen för datorteknik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-320595.
Full textKong, Jingfei. "ARCHITECTURAL SUPPORT FOR IMPROVING COMPUTER SECURITY." Doctoral diss., University of Central Florida, 2010. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/2610.
Full textPh.D.
School of Electrical Engineering and Computer Science
Engineering and Computer Science
Computer Science PhD
Rabbah, Rodric Michel. "Design Space Exploration and Optimization of Embedded Memory Systems." Diss., Georgia Institute of Technology, 2006. http://hdl.handle.net/1853/11605.
Full textPuche, Lara José. "Novel Cache Hierarchies with Photonic Interconnects for Chip Multiprocessors." Doctoral thesis, Universitat Politècnica de València, 2021. http://hdl.handle.net/10251/165254.
Full text[CA] Els processadors multinucli actuals compten amb recursos compartits entre els diferents nuclis. Dos d'aquests recursos compartits, la memòria d’últim nivell i l'ample de banda de memòria principal, poden convertir-se en colls d'ampolla per al rendiment. A mes, amb el creixement del nombre de nuclis que implementen els dissenys mes recents, la xarxa dins del xip també es converteix en un coll d'ampolla que pot afectar negativament el rendiment, ja que les xarxes tradicionals poden trobar limitacions a la seva escalabilitat en el futur proper. Pràcticament la totalitat dels dissenys actuals implementen jerarquies de memòria que es comuniquen mitjançant rapides xarxes d’interconnexió. Aquesta organització es eficaç ates que permet reduir el nombre d'accessos que es realitzen a memòria principal i la latència mitjana d’accés a memòria. Les caches, la xarxa d’interconnexió i la memòria principal, conjuntament amb altres tècniques conegudes com la prebúsqueda, permeten reduir les enormes latències d’accés a memòria principal, limitant així l'impacte negatiu ocasionat per la diferencia de rendiment existent entre els nuclis de còmput i la memòria. No obstant això, compartir els recursos esmentats és font de diversos problemes i reptes, sent un dels principals la gestió de la interferència entre aplicacions. Fer un us eficient de la jerarquia de memòria i les caches, així com comptar amb una xarxa d’interconnexió apropiada, es necessari per sostenir el creixement del rendiment en els dissenys tant actuals com futurs. Aquesta tesi analitza i estudia els principals problemes i inconvenients observats en aquests dos recursos: la memòria cache d’últim nivell i la xarxa dins del xip. En primer lloc, s'estudia l'escalabilitat de les xarxes tradicionals dins del xip amb topologia de malla, així com aquesta es pot veure compromesa en propers dissenys que compten amb major nombre de nuclis. Els resultats d'aquest estudi mostren que, a major nombre de nuclis, l'impacte negatiu de la distància entre nuclis en la latència pot afectar seriosament al rendiment del processador. Com a solució' a aquest problema, en aquesta tesi proposem una xarxa d’interconnexió' òptica modelada en un entorn de simulació detallat, que suposa una solució viable als problemes d'escalabilitat observats en els dissenys tradicionals. A continuació, aquesta tesi dedica un esforç important a identificar i proposar solucions als principals problemes de disseny de les jerarquies de memòria actuals com son, per exemple, el sobredimensionat de l'espai de memòria cache privat, l’existència de repliques de dades i la rigidesa i incapacitat d’adaptació' de les estructures de memòria cache. Encara que ben coneguts, aquests problemes i els seus efectes adversos en el rendiment poden ser evitats en processadors d'alt rendiment gracies a l'enorme capacitat de la memòria cache d’últim nivell que aquest tipus de processadors típicament implementen. No obstant això, en processadors de baix consum, no hi ha la possibilitat de comptar amb aquestes capacitats, i fer un us eficient de l'espai disponible es torna crític per mantenir el rendiment. Com a solució a aquests problemes en processadors de baix consum, proposem una nova organització de jerarquia de dos nivells de memòria cache que utilitza una xarxa d’interconnexió òptica. Els resultats obtinguts mostren que, comparat amb dissenys convencionals, el consum d'energia estàtica en l'arquitectura proposada és un 60% menor, malgrat que els resultats de rendiment presenten valors similars. Per últim, hem estes l'arquitectura proposada per donar suport tant a aplicacions paral·leles com seqüencials. Els resultats obtinguts amb aquesta nova arquitectura mostren un estalvi de fins al 78 % d'energia estàtica en l’execució d'aplicacions paral·leles.
[EN] Current multicores face the challenge of sharing resources among the different processor cores. Two main shared resources act as major performance bottlenecks in current designs: the off-chip main memory bandwidth and the last level cache. Additionally, as the core count grows, the network on-chip is also becoming a potential performance bottleneck, since traditional designs may find scalability issues in the near future. Memory hierarchies communicated through fast interconnects are implemented in almost every current design as they reduce the number of off-chip accesses and the overall latency, respectively. Main memory, caches, and interconnection resources, together with other widely-used techniques like prefetching, help alleviate the huge memory access latencies and limit the impact of the core-memory speed gap. However, sharing these resources brings several concerns, being one of the most challenging the management of the inter-application interference. Since almost every running application needs to access to main memory, all of them are exposed to interference from other co-runners in their way to the memory controller. For this reason, making an efficient use of the available cache space, together with achieving fast and scalable interconnects, is critical to sustain the performance in current and future designs. This dissertation analyzes and addresses the most important shortcomings of two major shared resources: the Last Level Cache (LLC) and the Network on Chip (NoC). First, we study the scalability of both electrical and optical NoCs for future multicoresand many-cores. To perform this study, we model optical interconnects in a cycle-accurate multicore simulation framework. A proper model is required; otherwise, important performance deviations may be observed otherwise in the evaluation results. The study reveals that, as the core count grows, the effect of distance on the end-to-end latency can negatively impact on the processor performance. In contrast, the study also shows that silicon nanophotonics are a viable solution to solve the mentioned latency problems. This dissertation is also motivated by important design concerns related to current memory hierarchies, like the oversizing of private cache space, data replication overheads, and lack of flexibility regarding sharing of cache structures. These issues, which can be overcome in high performance processors by virtue of huge LLCs, can compromise performance in low power processors. To address these issues we propose a more efficient cache hierarchy organization that leverages optical interconnects. The proposed architecture is conceived as an optically interconnected two-level cache hierarchy composed of multiple cache modules that can be dynamically turned on and off independently. Experimental results show that, compared to conventional designs, static energy consumption is improved by up to 60% while achieving similar performance results. Finally, we extend the proposal to support both sequential and parallel applications. This extension is required since the proposal adapts to the dynamic cache space needs of the running applications, and multithreaded applications's behaviors widely differ from those of single threaded programs. In addition, coherence management is also addressed, which is challenging since each cache module can be assigned to any core at a given time in the proposed approach. For parallel applications, the evaluation shows that the proposal achieves up to 78% static energy savings. In summary, this thesis tackles major challenges originated by the sharing of on-chip caches and communication resources in current multicores, and proposes new cache hierarchy organizations leveraging optical interconnects to address them. The proposed organizations reduce both static and dynamic energy consumption compared to conventional approaches while achieving similar performance; which results in better energy efficiency.
Puche Lara, J. (2021). Novel Cache Hierarchies with Photonic Interconnects for Chip Multiprocessors [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/165254
TESIS
Giordano, Omar. "Design and Implementation of an Architecture-aware In-memory Key- Value Store." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-291213.
Full textKey-Value Stores (KVSs) är en typ av icke-relationsdatabaser vars data representeras som ett nyckel-värdepar och används ofta för att representera lagring av cache och session. Bland dem är Memcached en av de mest populära, eftersom den används ofta i olika internettjänster som sociala nätverk och strömmande plattformar. Med tanke på den kontinuerliga och allt snabbare tillväxten av nätverksenheter som använder dessa tjänster måste den råvaruhårdvara som databaserna bygger på bearbeta paket snabbare för att möta marknadens behov. Under de senaste åren har dock prestandaförbättringarna som kännetecknar den nya hårdvaran blivit tunnare och tunnare. Härifrån, eftersom inköp av nya produkter inte längre är synonymt med betydande prestandaförbättringar, måste företagen utnyttja den fulla potentialen för hårdvaran som redan finns i deras besittning, vilket skjuter upp köpet av nyare hårdvara. En av de senaste idéerna för att öka prestanda för råvaruhårdvara är användningen av skivmedveten minneshantering. Denna teknik utnyttjar den Sista Nivån av Cache (SNC) genom att se till att de enskilda kärnorna tar data från minnesplatser som är mappade till deras respektive cachepartier (dvs. SNCskivor). Denna avhandling fokuserar på förverkligandet av en KVS-prototyp— baserad på Intel Haswell mikroarkitektur—byggd ovanpå Data Plane Development Kit (DPDK), och på vilken principerna för skivmedveten minneshantering tillämpas. För att testa dess prestanda, med tanke på att det inte finns en DPDK-baserad trafikgenerator som stöder Memcachedprotokollet, har en ytterligare prototyp av en trafikgenerator som stöder dessa funktioner också utvecklats. Föreställningarna mättes med två olika maskiner: en för trafikgeneratorn och en för KVS. Först testades den “vanliga” KVSprototypen, för att se de faktiska fördelarna, den skivmedvetna. Båda KVSprototyperna utsattes för två typer av trafik: (i) enhetlig trafik där nycklarna alltid skiljer sig från varandra och (ii) sned trafik, där nycklar upprepas och vissa nycklar är mer benägna att upprepas än andra. Experimenten visar att i verkliga scenarier (dvs. kännetecknas av snedställda nyckelfördelningar) kan användningen av en skivmedveten minneshanteringsteknik i en KVS förbättra förbättringen från slut till slut (dvs. ~2%). Dessutom påverkar sådan teknik i hög grad uppslagstiden som krävs av CPU: n för att hitta nyckeln och motsvarande värde i databasen, vilket minskar medeltiden med ~22, 5% och förbättrar 99th percentilen med ~62, 7%.
Zeffer, Håkan. "Towards Low-Complexity Scalable Shared-Memory Architectures." Doctoral thesis, Uppsala University, Department of Information Technology, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-7135.
Full textPlentiful research has addressed low-complexity software-based shared-memory systems since the idea was first introduced more than two decades ago. However, software-coherent systems have not been very successful in the commercial marketplace. We believe there are two main reasons for this: lack of performance and/or lack of binary compatibility.
This thesis studies multiple aspects of how to design future binary-compatible high-performance scalable shared-memory servers while keeping the hardware complexity at a minimum. It starts with a software-based distributed shared-memory system relying on no specific hardware support and gradually moves towards architectures with simple hardware support.
The evaluation is made in a modern chip-multiprocessor environment with both high-performance compute workloads and commercial applications. It shows that implementing the coherence-violation detection in hardware while solving the interchip coherence in software allows for high-performing binary-compatible systems with very low hardware complexity. Our second-generation hardware-software hybrid performs on par with, and often better than, traditional hardware-only designs.
Based on our results, we conclude that it is not only possible to design simple systems while maintaining performance and the binary-compatibility envelope, it is often possible to get better performance than in traditional and more complex designs.
We also explore two new techniques for evaluating a new shared-memory design throughout this work: adjustable simulation fidelity and statistical multiprocessor cache modeling.
Tang, Ho-Shing. "POPCA : optimizing segment caching for peer-to-peer on-demand streaming /." View abstract or full-text, 2008. http://library.ust.hk/cgi/db/thesis.pl?CSED%202008%20TANG.
Full textEberhart, Paul S. "A Compiler Target Model for Line Associative Registers." UKnowledge, 2019. https://uknowledge.uky.edu/ece_etds/138.
Full textRadovic, Zoran. "Software Techniques for Distributed Shared Memory." Doctoral thesis, Uppsala University, Department of Information Technology, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-6058.
Full textIn large multiprocessors, the access to shared memory is often nonuniform, and may vary as much as ten times for some distributed shared-memory architectures (DSMs). This dissertation identifies another important nonuniform property of DSM systems: nonuniform communication architecture, NUCA. High-end hardware-coherent machines built from large nodes, or from chip multiprocessors, are typical NUCA systems, since they have a lower penalty for reading recently written data from a neighbor's cache than from a remote cache. This dissertation identifies node affinity as an important property for scalable general-purpose locks. Several software-based hierarchical lock implementations exploiting NUCAs are presented and evaluated. NUCA-aware locks are shown to be almost twice as efficient for contended critical sections compared to traditional lock implementations.
The shared-memory “illusion”' provided by some large DSM systems may be implemented using either hardware, software or a combination thereof. A software-based implementation can enable cheap cluster hardware to be used, but typically suffers from poor and unpredictable performance characteristics.
This dissertation advocates a new software-hardware trade-off design point based on a new combination of techniques. The two low-level techniques, fine-grain deterministic coherence and synchronous protocol execution, as well as profile-guided protocol flexibility, are evaluated in isolation as well as in a combined setting using all-software implementations. Finally, a minimum of hardware trap support is suggested to further improve the performance of coherence protocols across cluster nodes. It is shown that all these techniques combined could result in a fairly stable performance on par with hardware-based coherence.
Chen, Weisong. "A pervasive information framework based on semantic routing and cooperative caching." Click to view the E-thesis via HKUTO, 2004. http://sunzi.lib.hku.hk/hkuto/record/B31065326.
Full textChen, Weisong, and 陳偉松. "A pervasive information framework based on semantic routing and cooperative caching." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2004. http://hub.hku.hk/bib/B31065326.
Full textBatistella, Rafael Fernandes. "PBIW : um esquema de codificação baseado em padrões de instrução." [s.n.], 2008. http://repositorio.unicamp.br/jspui/handle/REPOSIP/276093.
Full textDissertação (mestrado) - Universidade Estadual de Campinas, Instituto de Computação
Made available in DSpace on 2018-08-11T00:49:37Z (GMT). No. of bitstreams: 1 Batistella_RafaelFernandes_M.pdf: 3411156 bytes, checksum: 7e6b46824189243405a180e949db65c6 (MD5) Previous issue date: 2008
Resumo: Trabalhos não muito recentes já mostravam que o aumento de velocidade nas memórias DRAM não acompanha o aumento de velocidade dos processadores. Mesmo assim, pesquisadores na área de arquitetura de computadores continuam buscando novas abordagens para aumentar o desempenho dos processadores. Dentro do objetivo de minimizar essa diferença de velocidade entre memória e processador, este trabalho apresenta um novo esquema de codificação baseado em instruções codificadas e padrões de instruções ¿ PBIW (Pattern Based Instruction Word). Uma instrução codificada não contém redundância de dados e é armazenada em uma I-cache. Os padrões de instrução, de forma diferente, são armazenados em uma nova cache, chamada Pattern cache (P-cache) e são utilizados pelo circuito decodificador na preparação da instrução que será repassada aos estágios de execução. Esta técnica se mostrou uma boa alternativa para estilos arquiteturais conhecidos como arquiteturas VLIW e EPIC. Foi realizado um estudo de caso da técnica PBIW sobre uma arquitetura de alto desempenho chamada de 2D-VLIW. O desempenho da técnica de codificação foi avaliado através de experimentos com programas dos benchmarks MediaBench, SPECint e SPECfp. Os experimentos estáticos avaliaram a eficiência da codificação PBIW no aspecto de redução de código. Nestes experimentos foram alcançadas reduções no tamanho dos programas de até 81% sobre programas codificados com a estratégia de codifica¸c¿ao 2D-VLIW e reduções de até 46% quando comparados á programas utilizando o modelo de codificação EPIC. Experimentos dinâmicos mostraram que a codificação PBIW também é capaz que gerar ganhos com relação ao tempo de execução dos programas. Quando comparada à codificação 2D-VLIW, o speedup alcançado foi de at'e 96% e quando comparada à EPIC, foi de até 69%
Abstract: Past works has shown that the increase of DRAM memory speed is not the same of processor speed. Even though, computer architecture researchers keep searching for new approaches to enhance the processor performance. In order to minimize this difference between the processor and memory speed, this work presents a new encoding technique based on encoded instructions and instruction patterns - PBIW (Pattern Based Instruction Word). An encoded instruction contains no redundancy of data and it is stored into an I-cache. The instruction patterns, on the other hand, are stored into a new cache, named Pattern cache (P-cache) and are used by the decoder circuit to build the instruction to be executed in the execution stages. This technique has shown a suitable alternative to well-known architectural styles such as VLIW and EPIC architectures. A case study of this technique was carried out in a high performance architecture called 2D-VLIW. The performance of the encoding technique has been evaluated through trace-driven experiments with MediaBench, SPECint and SPECfp programs. The static experiments have evaluated the PBIW code reduction efficiency. In these experiments, PBIW encoding has achieved up to 81% code reduction over 2D-VLIW encoded programs and up to 46% code reduction over EPIC encoded programs. Dynamic experiments have shown that PBIW encoding can also improve the processor performance. When compared to 2D-VLIW encoding, the speedup was up to 96% while compared to EPIC, the speedup was up to 69%
Mestrado
Arquitetura de Computadores
Mestre em Ciência da Computação
Zlatohlávková, Lucie. "Návrh a implementace prostředků pro zvýšení výkonu procesoru." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2007. http://www.nusl.cz/ntk/nusl-412764.
Full textCruz, Eduardo Henrique Molina da. "Dynamic detection of the communication pattern in shared memory environments for thread mapping." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2012. http://hdl.handle.net/10183/49758.
Full textThe threads of parallel applications cooperate in order to fulfill their tasks, thereby communication is performed among themselves. The communication latency between the cores in a multiprocessor architecture differs depending on the memory hierarchy and the interconnections. With the increase in the number of cores per chip and the number of threads per core, this difference between the communication latencies is increasing. Therefore, it is important to map the threads of parallel applications taking into account the communication between them. In parallel applications based on the shared memory paradigm, the communication is implicit and occurs through accesses to shared variables, which makes difficult to detect the communication pattern between the threads. Traditional approaches use simulation to monitor the memory accesses performed by the application, requiring modifications to the source code and drastically increasing the overhead. In this master thesis, we introduce two novel light-weight mechanisms to find the communication pattern of threads. The first mechanism makes use of the information about shared cache lines provided by cache coherence protocols. The second mechanism makes use of the Translation Lookaside Buffer (TLB) to detect which memory pages each core is accessing. Both our mechanisms rely entirely on hardware features, which makes the thread mapping transparent to the programmer and allows it to be performed dynamically by the operating system. Moreover, no time consuming task, such as simulation, is required. We evaluated our mechanisms with the NAS Parallel Benchmarks (NPB) and obtained accurate representations of the communication patterns. We generated thread mappings from the detected communication patterns using a mapping algorithm. Mapping is a NP-Hard problem. Therefore, in order to achieve a polynomial complexity, we designed a heuristic method based on the Edmonds graph matching algorithm. Running the applications with these mappings resulted in performance improvements of up to 15.3% compared to the original scheduler of the operating system. The number of cache misses, cache line invalidations and snoop transactions were reduced by up to 31.9%, 41% and 65.4%, respectively.
Pan, Xiang. "Designing Future Low-Power and Secure Processors with Non-Volatile Memory." The Ohio State University, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=osu1492631536670669.
Full textNordén, Markus. "Parallel PDE solvers on cc-NUMA systems /." Uppsala : Univ. : Dept. of Information Technology, Univ, 2004.
Find full textTiosso, Fernando. "Utilização de objetos de aprendizagem para melhoria da qualidade do ensino de hierarquia de memória." Universidade de São Paulo, 2015. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-13082015-142049/.
Full textThe teaching and learning of memory hierarchy are not simple tasks, because many subjects that are covered in theory may demotivate learning because of its complexity. This Master\'s thesis presents the process of transformation of the cache memory module of Amnesia tool in a learning object, aiming to facilitate the construction of knowledge by simulating the structure and functionality of memory hierarchy of von Neumann architecture in a more practice and didactic way. This process allowed existing features in the tool to be adequate and new features developed. In addition, lesson plans and questionnaires of assessment and usability have also been designed, validated and implemented and a tutorial to describe the operation of the new object was developed. Experimental studies have examined two aspects: the first, if the learning object improved, in fact, the students\' learning in the subject cache memory; the second, students\' opinions regarding the use of the object. After the analysis and evaluation of the results obtained in the experiments, was possible show an evolution in learning when it made the use of the object, and also to perceive that students\' motivation to use other learning objects increased.
Sandberg, Andreas. "Understanding Multicore Performance : Efficient Memory System Modeling and Simulation." Doctoral thesis, Uppsala universitet, Avdelningen för datorteknik, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-220652.
Full textCoDeR-MP
UPMARC
Lesage, Benjamin. "Architecture multi-coeurs et temps d'exécution au pire cas." Phd thesis, Université Rennes 1, 2013. http://tel.archives-ouvertes.fr/tel-00870971.
Full textGuo, Lei. "Insights into access patterns of Internet media systems measurements, analysis, and system design /." Columbus, Ohio : Ohio State University, 2008. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1198696679.
Full textBrewer, Jeffery Ramon. "Reconfigurable Cache Memory." OpenSIUC, 2009. https://opensiuc.lib.siu.edu/theses/48.
Full textVan, Vleet Taylor. "Dynamic cache-line sizes /." Thesis, Connect to this title online; UW restricted, 2000. http://hdl.handle.net/1773/6899.
Full textSehat, Kamiar. "Evaluation of caches and cache coherency." Thesis, University of Cambridge, 1992. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.335240.
Full textHe, Bingsheng. "Cache-oblivious query processing /." View abstract or full-text, 2008. http://library.ust.hk/cgi/db/thesis.pl?CSED%202008%20HE.
Full textRomer, Theodore H. "Using virtual memory to improve cache and TLB performance /." Thesis, Connect to this title online; UW restricted, 1998. http://hdl.handle.net/1773/6913.
Full textPattabiraman, Aishwariya. "Heterogeneous Cache Architecture in Network-on-Chips." University of Cincinnati / OhioLINK, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1321371508.
Full textAmmari, Rami J. "A study for reducing conflict misses in data cache." Master's thesis, Mississippi State : Mississippi State University, 2004. http://library.msstate.edu/etd/show.asp?etd=etd-04032004-211908.
Full text