
Dissertations / Theses on the topic 'Set-Associative'

Consult the top 15 dissertations / theses for your research on the topic 'Set-Associative.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse dissertations / theses from a wide variety of disciplines and organise your bibliography correctly.

1

Simons, Brad. "Set-Associative History-Aided Adaptive Replacement for On-Chip Caches." Thesis, The University of Arizona, 2016. http://hdl.handle.net/10150/621128.

Abstract:
Last Level Caches (LLCs) are critical to reducing processor stalls on off-chip memory and to improving processing throughput, and the replacement policy plays an important role in LLC performance. Many replacement algorithms are designed to be thrash-resistant, protecting the working set in the cache from scans, but a fundamental challenge is balancing thrash-resistance against changes to the working set over time as an application executes. In this thesis, a novel Set-Associative History-Aided Adaptive Replacement Cache (SHARC) LLC replacement algorithm is proposed, which adjusts scan-resistance at run time based on the current memory access properties of the application. The policy segregates the cache to protect the working set from scans and uses history information from recently evicted cache lines to increase or decrease the amount of cache reserved for the working set. On average, SHARC improves IPC by approximately 11% over the LRU replacement policy while requiring only a 14% increase in overhead. The SHARC-NRU replacement policy is also proposed to reduce this overhead; it achieves approximately a 10% performance improvement and requires 11% less overhead than LRU.
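As a rough illustration of the mechanism this abstract describes (not the thesis's actual implementation), the following C sketch keeps a small per-set history of recently evicted tags and uses it to grow or shrink the portion of each set protected from scans; all sizes, field names, and the segmentation scheme are illustrative assumptions.

```c
#include <stdint.h>

#define WAYS 16
#define HIST 8   /* evicted-tag history per set (illustrative size) */

struct cache_set {
    uint64_t tag[WAYS];
    uint8_t  lru[WAYS];      /* age counter per way; larger = older */
    uint64_t evicted[HIST];  /* tags of recently evicted lines */
    int      hist_head;
    int      protect;        /* ways reserved for the working set */
};

/* On a miss, consult the eviction history: a hit there means a working-set
 * line was evicted too early, so grow the protected segment; otherwise the
 * miss is likely scan traffic, so shrink it. */
static void adapt_on_miss(struct cache_set *s, uint64_t tag)
{
    for (int i = 0; i < HIST; i++) {
        if (s->evicted[i] == tag) {
            if (s->protect < WAYS - 1)
                s->protect++;
            return;
        }
    }
    if (s->protect > 1)
        s->protect--;
}

/* Choose a victim only among the unprotected ways and log its tag. */
static int pick_victim(struct cache_set *s)
{
    int victim = s->protect, oldest = -1;
    for (int w = s->protect; w < WAYS; w++)
        if (s->lru[w] > oldest) { oldest = s->lru[w]; victim = w; }
    s->evicted[s->hist_head] = s->tag[victim];
    s->hist_head = (s->hist_head + 1) % HIST;
    return victim;
}
```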
2

Rivera, Roberto Rafael. "On properties of completely flexible loops." Diss., Georgia Institute of Technology, 2000. http://hdl.handle.net/1853/28841.

3

Gupta, Gaurav. "Design and Analysis of a Low Power Set-Associative Cache Using Partial Tag Comparison." University of Cincinnati / OhioLINK, 2006. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1140818310.

4

Nagpal, Radhika. "Store Buffers: implementing single cycle store instructions in write-through, write-back and set associative caches." Thesis, Massachusetts Institute of Technology, 1994. http://hdl.handle.net/1721.1/36678.

Abstract:
Thesis (B.S. and M.S.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1994.
Includes bibliographical references (p. 87).
This thesis proposes a new mechanism, called Store Buffers, for implementing single cycle store instructions in a pipelined processor. Single cycle store instructions are difficult to implement because in most cases the tag check must be performed before the data can be written into the data cache. Store buffers allow a store instruction to read the cache tag as it passes through the pipe while keeping the store data buffered in a backup register until the data cache is free. This strategy guarantees single cycle store execution without increasing the hit access time or degrading the performance of the data cache, for simple direct-mapped caches as well as for more complex set associative and write-back caches. As larger caches are incorporated on-chip, the speed of store instructions becomes an increasingly important part of overall performance. The first part of the thesis describes the design and implementation of store buffers in write-through, write-back, direct-mapped, and set associative caches. The second part describes the implementation and simulation of store buffers in a 6-stage pipeline with a direct-mapped, write-through, pipelined cache. The performance of this method is compared to other cache write techniques. Preliminary results show that store buffers perform better than other store strategies under high I/O latencies and cache thrashing. With as few as three buffers, they significantly reduce the number of cycles per instruction.
by Radhika Nagpal.
B.S. and M.S.
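A minimal C sketch of the store-buffer mechanism described above, written as software rather than hardware; the structure fields, the three-buffer size, and the forwarding search order are assumptions for illustration, not Nagpal's actual design.

```c
#include <stdbool.h>
#include <stdint.h>

#define NBUF 3   /* the thesis reports good results with as few as three */

struct store_buf {
    bool     valid;
    bool     hit;       /* result of the tag check done in the pipe */
    uint32_t addr;
    uint32_t data;
};

static struct store_buf buf[NBUF];

/* Issue stage: the store checks the tag now but buffers its data, so it
 * completes in one cycle; the cache array is written later, when idle. */
bool issue_store(uint32_t addr, uint32_t data, bool tag_hit)
{
    for (int i = 0; i < NBUF; i++) {
        if (!buf[i].valid) {
            buf[i] = (struct store_buf){ true, tag_hit, addr, data };
            return true;            /* store retires in a single cycle */
        }
    }
    return false;                   /* buffers full: stall the store */
}

/* Loads must see buffered stores to the same address (forwarding). */
bool load_forward(uint32_t addr, uint32_t *out)
{
    for (int i = NBUF - 1; i >= 0; i--) {   /* newest-first, illustrative */
        if (buf[i].valid && buf[i].addr == addr) {
            *out = buf[i].data;
            return true;
        }
    }
    return false;                   /* fall through to the data cache */
}
```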
5

Eng, Stefan. "Heuristisk profilbaserad optimering av instruktionscache i en online Just-In-Time kompilator [Heuristic profile-based optimisation of the instruction cache in an online Just-In-Time compiler]." Thesis, Linköping University, Department of Computer and Information Science, 2004. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-2452.

Abstract:

This master's thesis examines the possibility of heuristically optimising instruction cache performance in a Just-In-Time (JIT) compiler.

Programs that do not fit inside the cache all at once may suffer from cache misses when frequently executed code segments compete for the same cache lines. A new heuristic algorithm, LHCPA, was created to place frequently executed code segments so as to avoid cache conflicts between them, reducing the overall cache misses and the resulting performance bottlenecks. Set-associative caches are taken into consideration, not only direct-mapped caches.

In Ahead-Of-Time (AOT) compilers, the problem of frequent cache misses is often avoided by using call graphs derived from profiling and more or less complex algorithms to estimate the performance of different placement approaches. This often results in heavy computation during compilation, which is not acceptable in a JIT compiler.

A case study is presented on an Alpha processor and a JIT compiler developed at Ericsson. The results of the case study show that cache performance can be improved using this technique, but also that many other factors influence cache performance: for example, whether the cache is set-associative or not, and especially the size of the cache.
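The LHCPA algorithm itself is not reproduced in the abstract, so the C sketch below only illustrates the underlying arithmetic: which cache sets a code segment occupies, and how a placement heuristic might score candidate load addresses for a hot segment against code already placed. All parameters and function names are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_SIZE (64 * 1024)
#define LINE_SIZE  64
#define WAYS       2
#define NUM_SETS   (CACHE_SIZE / (LINE_SIZE * WAYS))

/* Count, per cache set, how many already-placed hot lines map there;
 * call this once per placed hot segment to build the `hot` histogram. */
static void mark_sets(uint8_t sets[NUM_SETS], uintptr_t addr, size_t len)
{
    for (uintptr_t a = addr; a < addr + len; a += LINE_SIZE)
        sets[(a / LINE_SIZE) % NUM_SETS]++;
}

/* Score candidate addresses for a new hot segment: a set already holding
 * WAYS hot lines cannot absorb another without a conflict, so mapping
 * more code there is penalized. Returns the cheapest candidate
 * (assumes ncand >= 1). */
uintptr_t place(const uint8_t hot[NUM_SETS],
                const uintptr_t cand[], int ncand, size_t len)
{
    uintptr_t best = cand[0];
    unsigned best_cost = ~0u;
    for (int i = 0; i < ncand; i++) {
        unsigned cost = 0;
        for (uintptr_t a = cand[i]; a < cand[i] + len; a += LINE_SIZE)
            if (hot[(a / LINE_SIZE) % NUM_SETS] >= WAYS)
                cost++;
        if (cost < best_cost) { best_cost = cost; best = cand[i]; }
    }
    return best;
}
```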

6

Ponnala, Kalyan. "Design and Implementation of the Instruction Set Architecture for Data LARs." UKnowledge, 2010. http://uknowledge.uky.edu/gradschool_theses/58.

Abstract:
The ideal memory system assumed by most programmers is one which has high capacity, yet allows any word to be accessed instantaneously. To make the hardware approximate this performance, an increasingly complex memory hierarchy, using caches and techniques like automatic prefetch, has evolved. However, as the gap between processor and memory speeds continues to widen, these programmer-visible mechanisms are becoming inadequate. Part of the recent increase in processor performance has been due to the introduction of programmer/compiler-visible SWAR (SIMD Within A Register) parallel processing on increasingly wide DATA LARs (Line Associative Registers) as a way to both improve data access speed and increase efficiency of SWAR processing. Although the base concept of DATA LARs predates this thesis, this thesis presents the first instruction set architecture specification complete enough to allow construction of a detailed prototype hardware design. This design was implemented and tested using a hardware simulator.
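DATA LARs themselves are the thesis's contribution, but the SWAR execution style they host is a standard technique that is easy to demonstrate. The sketch below is generic SWAR, not the thesis's ISA: it performs four independent 16-bit additions inside one ordinary 64-bit register by masking off the bits where a carry could cross a lane boundary.

```c
#include <stdint.h>
#include <stdio.h>

/* Classic SWAR: add four packed 16-bit lanes held in one 64-bit register,
 * masking the top bit of each lane so carries cannot cross lane borders. */
#define LO15 0x7FFF7FFF7FFF7FFFULL   /* low 15 bits of every lane */
#define HI1  0x8000800080008000ULL   /* top bit of every lane */

static uint64_t swar_add16x4(uint64_t a, uint64_t b)
{
    uint64_t sum_lo = (a & LO15) + (b & LO15);   /* carries stay in lanes */
    return sum_lo ^ ((a ^ b) & HI1);             /* restore the top bits */
}

int main(void)
{
    uint64_t a = 0x0001000200030004ULL;
    uint64_t b = 0x0010002000300040ULL;
    printf("%016llx\n", (unsigned long long)swar_add16x4(a, b));
    /* prints 0011002200330044: four independent 16-bit additions */
    return 0;
}
```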
7

Li, Sy-Yuan (李斯元). "Set-Associative Load-Store Caches." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/01136642695130656933.

Abstract:
Master's thesis (ROC academic year 95), National Taiwan Ocean University, Department of Computer Science and Engineering.
The conventional load/store queue (LSQ) is a CAM structure in which a dynamically scheduled processor stores all in-flight memory instructions and conducts fully associative, age-prioritized searches to maintain dependencies and perform forwarding. The LSQ is neither efficient, since previous studies have shown that dependency violations are infrequent, nor scalable, due to the complexity of the CAM. This thesis presents an efficient and scalable alternative to the LSQ, called the set-associative load/store cache (LSC), which replaces the CAM with a set-associative tag array. The change is analogous to substituting a set-associative cache for a fully associative one, since the tag bit cell of a fully associative array is a CAM. As set-associative caches have been observed to reduce tag comparisons significantly while approximating the miss rates of fully associative caches, the LSC can substantially lessen the search bandwidth demand without noticeable performance degradation from stalls caused by set conflicts. Experimental results on the SPECint2000 benchmarks show that both a 32-entry and a 128-entry 4-way set-associative LSC significantly reduce the search bandwidth demand with no visible performance penalty, while a 128-entry L0 LSC improves average execution time by 3%.
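A minimal C sketch of the lookup the abstract contrasts with a CAM: a load hashes to one set and compares only WAYS tags, forwarding from the youngest older matching store. Field names, sizes, and the indexing function are illustrative assumptions, not the thesis's design.

```c
#include <stdbool.h>
#include <stdint.h>

#define SETS 8
#define WAYS 4   /* a 32-entry, 4-way configuration, as in the thesis */

struct lsc_entry {
    bool     valid;
    bool     is_store;
    uint32_t addr;
    uint32_t age;    /* program order, for age-prioritized matching */
    uint32_t data;
};

static struct lsc_entry lsc[SETS][WAYS];

static unsigned set_of(uint32_t addr) { return (addr >> 2) % SETS; }

/* A load probes only one set (WAYS comparators) instead of searching the
 * whole structure with a CAM, and forwards from the youngest older store. */
bool lsc_forward(uint32_t addr, uint32_t load_age, uint32_t *out)
{
    struct lsc_entry *best = 0;
    struct lsc_entry *set = lsc[set_of(addr)];
    for (int w = 0; w < WAYS; w++) {
        struct lsc_entry *e = &set[w];
        if (e->valid && e->is_store && e->addr == addr &&
            e->age < load_age && (!best || e->age > best->age))
            best = e;
    }
    if (!best)
        return false;        /* no matching older store in this set */
    *out = best->data;
    return true;
}
```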
8

Wu, Dong-Hua (吳東樺). "SALSC: Set-Associative Load/Store Caches." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/72199186710102839427.

Abstract:
Master's thesis (ROC academic year 98), National Taiwan University of Science and Technology, Department of Computer Science and Information Engineering.
The conventional load/store queue (LSQ) is a CAM structure in which a dynamically scheduled processor stores all in-flight memory instructions and conducts fully associative, ordering-logic searches to maintain dependencies and perform forwarding. The LSQ is neither efficient, since previous studies have shown that dependency violations are infrequent, nor scalable, due to the complexity of the CAM. This thesis presents an efficient and scalable alternative to the LSQ, called the set-associative load/store cache (SALSC), which replaces the CAM with a set-associative tag array. The change is analogous to substituting a set-associative cache for a fully associative one, since the tag bit cell of a fully associative array is a CAM. As set-associative caches have been observed to reduce tag comparisons significantly while approximating the miss rates of fully associative caches, the SALSC can substantially lessen the search bandwidth demand without noticeable performance degradation from stalls caused by set conflicts. Furthermore, an SALSC can be viewed as a set-associative cache integrated with age logic, and hence it is a natural and straightforward extension to treat an SALSC as an L0 cache by buffering the data of memory references in its entries. Experimental results on the SPECint2000 benchmarks show that both a 32-entry and a 128-entry 4-way SALSC significantly reduce the search bandwidth demand with no visible performance penalty, while a 128-entry L0 SALSC improves average execution time by 0.22%.
9

張延任. "Efficient Simulation Algorithm for Set-Associative Victim Cache Memory." Thesis, 1997. http://ndltd.ncl.edu.tw/handle/81639848375032088617.

Abstract:
Master's thesis (ROC academic year 85), Chung Yuan Christian University, Department of Information and Computer Engineering.
Trace-driven simulation is the most commonly used technique for evaluating the behavior of a cache memory system. Prior to this investigation, all simulation algorithms were aimed at conventional cache architectures without any extra device. This thesis presents 1) a more efficient and easier one-pass algorithm than earlier ones for simulating alternative all-associativity (i.e. direct-mapped and set-associative) caches, 2) a new and powerful one-pass algorithm for simulating alternative all-associativity caches with victim caches of different entry counts, and 3) simulation results comparing set-associative caches against direct-mapped caches with a small victim cache from various aspects. First, we propose a more efficient algorithm, called hash-like RM simulation, for simulating, in a single pass through an address trace, alternative caches that share the same block size and use the LRU replacement policy. This algorithm speeds up the simulation of alternative caches by reducing the average search depth in the full stack. We then develop a powerful algorithm, victim one-pass simulation, for simulating in one pass alternative caches paired with victim caches (buffers) of different entry counts. Since the behavior of a victim cache complicates one-pass simulation, this algorithm is more involved than those for simulating memory systems without a victim cache. Finally, our experimental data provide evidence that adding a victim cache is worthless for direct-mapped instruction caches larger than 32K, but that 64K direct-mapped data caches with a 4-6 entry victim cache can compete in miss ratio with 64K 2-way set-associative caches while achieving superior average memory access time.
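The hash-like RM and victim one-pass algorithms are the thesis's contributions and are not reproduced here; the C sketch below shows only the classic stack-processing idea such one-pass simulators build on, in which a single pass over the trace records each reference's LRU stack distance, from which hit ratios for all fully associative LRU sizes follow at once. Sizes and names are illustrative.

```c
#include <stdint.h>
#include <string.h>

#define MAXDEPTH 4096

static uint64_t stack[MAXDEPTH];   /* LRU stack: stack[0] = most recent */
static int      depth;
static uint64_t hits[MAXDEPTH];    /* hits[d]: hits at stack distance d */

/* Mattson-style stack processing: a hit at distance d is a hit in every
 * fully associative LRU cache holding at least d+1 blocks, so one pass
 * yields hit counts for all sizes simultaneously. */
void reference(uint64_t block)
{
    int d;
    for (d = 0; d < depth; d++)
        if (stack[d] == block)
            break;
    if (d < depth) {
        hits[d]++;                      /* hit at stack distance d */
    } else if (depth < MAXDEPTH) {
        depth++;                        /* cold miss: grow the stack */
    } else {
        d = MAXDEPTH - 1;               /* stack full: drop the LRU block */
    }
    /* move the block to the top, shifting the others down one slot */
    memmove(&stack[1], &stack[0], (size_t)d * sizeof stack[0]);
    stack[0] = block;
}
```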
10

Ting, Chih-Hui (丁之暉). "Sequential Way-Access Set-Associative Cache Architecture for Low Power." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/96132699768891504787.

11

Lin, Jin-Yi (林金義). "Impact of Set-Associative and Sector on Second-Level Cache." Thesis, 1996. http://ndltd.ncl.edu.tw/handle/27008270233024517376.

Abstract:
Master's thesis (ROC academic year 84), Tatung Institute of Technology, Department of Computer Science and Engineering.
Cache memories have become common across a wide range of computer implementations. Two-level configurations are becoming even more important in systems. The choice of associativity or sectoring has a significant impact on cache performance and cost. We use trace-driven simulation to examine the effect of a two-level cache hierarchy in uniprocessors. There are a large number of design parameters for any cache, most of which must be considered in any analysis of the design. In this thesis we focus on set-associativity and sectors, evaluating the miss ratio and bus traffic ratio for different cache sizes, associativities (degree of associativity, set size), and sectors (sub-blocks).
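To make the two design axes concrete, here is a small C sketch, with illustrative parameters, of how an address decomposes under a sectored cache: a hit requires both a tag match and a valid bit for the referenced sub-block, which is what lets a miss fetch one sub-block instead of a whole line and thereby reduce bus traffic.

```c
#include <stdint.h>

#define OFFSET_BITS 6    /* 64-byte line (illustrative) */
#define SUB_BITS    2    /* 4 sub-blocks per line */
#define INDEX_BITS  7    /* 128 sets */

struct line {
    uint32_t tag;
    uint8_t  sub_valid;  /* one valid bit per sub-block */
};

static inline uint32_t offset(uint32_t a) { return a & ((1u << OFFSET_BITS) - 1); }
static inline uint32_t index_(uint32_t a) { return (a >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1); }
static inline uint32_t tag_(uint32_t a)   { return a >> (OFFSET_BITS + INDEX_BITS); }
static inline uint32_t sub_(uint32_t a)   { return offset(a) >> (OFFSET_BITS - SUB_BITS); }

/* Sector hit: the tag must match AND the referenced sub-block be valid. */
int sector_hit(const struct line *l, uint32_t a)
{
    return l->tag == tag_(a) && (l->sub_valid & (1u << sub_(a)));
}
```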
12

Li, Jia-Jhe (李嘉哲). "Novel Set-Associative Cache to Reduce Leakage Power While Improving Performance." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/70703465694769712561.

Abstract:
Master's thesis (ROC academic year 94), National Taiwan Ocean University, Department of Computer Science and Engineering.
As transistors keep shrinking and on-chip caches keep growing, static power dissipation due to cache leakage takes an increasing fraction of total processor power. Several techniques have been proposed to reduce leakage power by turning off unused cache lines, but they all pay a price in performance degradation. This thesis presents a novel cache architecture, the Snug Set-Associative (SSA) cache, that cuts most of the static power dissipation of caches while improving performance. The proposed SSA cache reduces leakage power through a minimum set-associative scheme, which activates only the minimal number of ways in each cache set, while the performance losses caused by this scheme are compensated for by base-offset load/store queues. The rationale for combining these two techniques is locality: as the contents of the cache blocks in the current working set are accessed repeatedly, the same addresses are computed again and again. The SSA cache architecture can be applied to data caches and instruction caches to reduce leakage power without incurring performance penalties. Experimental results show that SSA cuts the static power consumption of the L1 data cache by 93% on average for the SPECint2000 benchmarks while reducing execution times by 5%. Similarly, SSA cuts leakage dissipation of the L1 instruction cache by 92% on average and improves performance by over 3%. Furthermore, when SSA is adopted for both the L1 data and instruction caches, their normalized leakage is lowered to 8% on average while still accomplishing a 2% reduction in execution times.
13

Chou, Tzu-Min (周資敏). "WP-TLB: Way Prediction for Set-Associative L2 Cache to Save Dynamic Read Energy." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/92698639476397818459.

Abstract:
Master's thesis (ROC academic year 97), National Chiao Tung University, Institute of Computer Science and Engineering.
An L2 cache is usually implemented as a set-associative cache, and its associativity is usually high. A set-associative cache clearly consumes more energy and incurs more access latency than a direct-mapped cache of the same size. If the way holding the required data is known in advance, then by activating only the corresponding way, the energy consumption and access latency come close to those of a direct-mapped cache the size of a single way of the L2 cache. In this paper, we propose a design for way prediction of the L2 cache. By storing way indices in an extended TLB (which we call the WP-TLB), we achieve the main goal of saving dynamic read energy and the secondary goal of reducing access latency, without losing any performance. Most importantly, energy and access latency are saved whether the way prediction is correct or not, because we can guarantee that even when a way misprediction occurs, the other ways need not be probed to find the required data. We use CACTI 4.2 to estimate the energy consumption and access latency of memory components, and run the SPEC2000 benchmarks in a modified SimpleScalar 3.0 simulator. According to the simulation results, in the best case dynamic power is reduced by about 65% and the average access latency of the L2 cache by 17%, while static power increases by only about 0.6%. No overall performance is lost under our design.
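A minimal sketch of the idea as the abstract presents it; the entry layout, the blocks-per-page figure, and the probe callback are assumptions. The point is that the TLB-resident way index lets an access either read exactly one way or declare a miss without touching the data array at all.

```c
#include <stdbool.h>
#include <stdint.h>

#define L2_WAYS         8
#define BLOCKS_PER_PAGE 64   /* e.g. 4 KB page / 64-byte line (assumed) */

struct wp_tlb_entry {
    uint64_t vpn, ppn;
    bool     valid;
    int8_t   l2_way[BLOCKS_PER_PAGE];   /* -1 means not resident in L2 */
};

/* L2 lookup guided by the TLB: activate one way, or none at all. */
int l2_lookup(const struct wp_tlb_entry *te, unsigned block_in_page,
              uint64_t addr, bool (*probe_way)(int way, uint64_t addr))
{
    int8_t way = te->l2_way[block_in_page];
    if (way < 0)
        return -1;                  /* guaranteed miss: no way activated */
    /* one way read instead of L2_WAYS; the other ways never need probing,
     * because residency is tracked per block in the TLB entry */
    return probe_way(way, addr) ? (int)way : -1;
}
```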
14

Chen, Hsin-Chu (陳信助). "The Design of Way-Prediction Scheme in Set-Associative Cache for Energy Efficient Embedded System." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/81901837676372067659.

Abstract:
Master's thesis (ROC academic year 95), Tatung University, Department of Computer Science and Engineering.
Embedded systems are developing rapidly, their functions are becoming more complicated, and multimedia applications are growing daily and consuming more electrical power. Therefore, how to improve stand-by time becomes a very important issue. Related research indicates that the processor cache accounts for a large proportion of power consumption. Way-prediction and LRU (Least Recently Used) algorithms improve the hit rate and help reduce the number of tag comparisons, thereby saving energy. In this thesis, we use an MRU (Most Recently Used) table to record the most recently used block for each index, and a Modified Pseudo LRU (MPLRU) replacement algorithm to reduce hardware complexity and cache miss rate. Experiments show that our prediction hit rate reaches 90.15%, saving 64.12% of energy. The experimental results were obtained with the Wattch cache simulator on the SPEC95 benchmarks.
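A minimal C sketch of MRU-table way prediction as described above (the MPLRU replacement policy is omitted, and the sizes are illustrative): a correct prediction costs a single tag comparison, and only a first-probe miss pays for the remaining ways.

```c
#include <stdint.h>

#define SETS 256
#define WAYS 4

static uint8_t  mru[SETS];            /* most recently used way per index */
static uint32_t tags[SETS][WAYS];

/* Probe the MRU way first; only on a first-probe miss are the remaining
 * tags compared. Returns the hit way, or -1 on a cache miss, and reports
 * how many tag comparisons the access cost. */
int lookup(uint32_t set, uint32_t tag, unsigned *comparisons)
{
    unsigned w = mru[set];
    *comparisons = 1;
    if (tags[set][w] == tag)
        return (int)w;                /* predicted hit: one tag compare */
    for (unsigned i = 0; i < WAYS; i++) {
        if (i == w) continue;
        (*comparisons)++;
        if (tags[set][i] == tag) {
            mru[set] = (uint8_t)i;    /* update the prediction */
            return (int)i;
        }
    }
    return -1;                        /* cache miss */
}
```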
15

Κεραμίδας, Γεώργιος. "Αρχιτεκτονικές επεξεργαστών και μνημών ειδικού σκοπού για την υποστήριξη φερέγγυων (ασφαλών) δικτυακών υπηρεσιών [Special-purpose processor and memory architectures to support trusted (secure) network services]." Thesis, 2008. http://nemertes.lis.upatras.gr/jspui/handle/10889/1037.

Abstract:
Data security concerns have recently become very important, and it can be expected that security will join performance, power, and cost as a key distinguishing factor in computer systems. Trusted platforms have been proposed as a promising approach to enhance the security of the modern computer system and prevent unauthorized accesses and modifications of the sensitive information stored in it. Unfortunately, previous approaches only provide a level of security against software-based attacks and leave the system wide open to hardware attacks. This dissertation proposes six design methodologies to shield a uniprocessor or multiprocessor system against a range of Denial of Service (DoS) attacks at the architectural and operating-system level. Specific focus is given to the memory subsystem (i.e. the cache memories). The cache memories account for a large portion of the silicon area, are greedy power consumers, and largely determine system performance due to the ever-growing gap between processor speed and main-memory access latency. As a result, this thesis proposes methodologies to optimize the functionality and lower the power consumption of the cache memories. The goal in all cases is to increase system performance and achieved packet throughput, and to enhance protection against a range of passive and Denial of Service attacks.