Journal articles on the topic 'Cache hierarchy'

Consult the top 50 journal articles for your research on the topic 'Cache hierarchy.'

1

Yavits, Leonid, Amir Morad, and Ran Ginosar. "Cache Hierarchy Optimization." IEEE Computer Architecture Letters 13, no. 2 (July 29, 2014): 69–72. http://dx.doi.org/10.1109/l-ca.2013.18.

2

Zhao, Huatao, Xiao Luo, Chen Zhu, Takahiro Watanabe, and Tianbo Zhu. "Behavior-aware cache hierarchy optimization for low-power multi-core embedded systems." Modern Physics Letters B 31, no. 19-21 (July 27, 2017): 1740067. http://dx.doi.org/10.1142/s021798491740067x.

Abstract:
In modern embedded systems, the increasing number of cores requires efficient cache hierarchies to ensure data throughput, but such hierarchies are constrained by their bloated size and by interfering accesses, which lead to both performance degradation and wasted energy. In this paper, we first propose a behavior-aware cache hierarchy (BACH) that optimally allocates multi-level cache resources to many cores, greatly improving the efficiency of the cache hierarchy and resulting in low energy consumption. BACH uses the explored application behaviors and runtime cache resource demands as the basis for cache allocation, so that the cache hierarchy can be optimally configured to meet runtime demand. BACH was implemented on the GEM5 simulator. The experimental results show that the energy consumption of a three-level cache hierarchy can be reduced by 5.29% to 27.94% compared with other key approaches, while the performance of the multi-core system even improves slightly once hardware overhead is accounted for.
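The allocation idea this abstract describes, handing multi-level cache resources to cores according to their measured runtime demand, can be pictured as a greedy way-partitioning loop. This is a minimal sketch, not the authors' BACH implementation; the function name and the marginal-gain inputs are invented for the example.

```python
# Greedy way-partitioning sketch: each core reports the extra hits it
# would gain from one more cache way, and each way goes to whichever
# core currently has the highest marginal gain. Names are hypothetical.

def allocate_ways(marginal_gain, total_ways):
    """marginal_gain[core][w] = extra hits from the (w+1)-th way."""
    owned = [0] * len(marginal_gain)
    for _ in range(total_ways):
        best = max(range(len(owned)),
                   key=lambda c: (marginal_gain[c][owned[c]]
                                  if owned[c] < len(marginal_gain[c]) else -1))
        owned[best] += 1
    return owned

# two cores: core 0 has steep demand, core 1 is nearly flat
gains = [[50, 30, 10, 2], [8, 6, 4, 2]]
print(allocate_ways(gains, 4))  # core 0 receives three of the four ways
```

Replacing the marginal-gain table with live counters sampled from the simulator would turn this into a runtime policy.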
3

Tabak, Daniel. "Cache and Memory Hierarchy Design." ACM SIGARCH Computer Architecture News 23, no. 3 (June 1995): 28. http://dx.doi.org/10.1145/203618.564957.

4

Franaszek, P. A., L. A. Lastras-Montano, S. R. Kunkel, and A. C. Sawdey. "Victim management in a cache hierarchy." IBM Journal of Research and Development 50, no. 4.5 (July 2006): 507–23. http://dx.doi.org/10.1147/rd.504.0507.

5

Garashchenko, A. V., and L. G. Gagarina. "An Approach to the Formation of Test Sequences Based on the Graph Model of the Cache Memory Hierarchy." Proceedings of Universities. ELECTRONICS 25, no. 6 (December 2020): 548–57. http://dx.doi.org/10.24151/1561-5405-2020-25-6-548-557.

Abstract:
Verifying the cache memory hierarchy in a modern SoC requires a huge number of complex tests because of the large state space; this is the main problem for functional verification. To cover the entire state space, a graph model of the cache memory hierarchy, together with methods for generating test sequences from this model, has been proposed. The vertices of the graph model are the set of states (tags, values, etc.) of each hierarchy level, and the edges are the set of transitions between states (read and write instructions). A graph model describing all states of the cache memory hierarchy has been developed; each edge in the graph is a separate check sequence. Non-deterministic situations, such as the choice of a channel (port) in a multichannel cache memory, cannot be resolved at the level of the graph model, since the choice of channel depends on many factors not considered within the model's framework; it has therefore been proposed to create a separate subgraph instance for each channel. In verifying the multiport cache memory hierarchy of a newly developed core with a new vector VLIW DSP architecture, the described approach revealed several architectural and functional errors. The approach can be used to test other processor cores and their blocks.
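The model's central convention, that every edge of the state graph is a separate check sequence, can be illustrated with a toy graph. The states and instruction labels below are invented stand-ins, not the authors' actual model.

```python
# Toy state graph for one cache line: vertices are states, edges are
# labelled with the instruction that causes the transition. Covering
# every edge once yields one check sequence per modelled transition.

graph = {
    "Invalid":  [("read miss", "Shared")],
    "Shared":   [("write", "Modified"), ("evict", "Invalid")],
    "Modified": [("writeback", "Invalid")],
}

def edge_cover_tests(graph):
    """Enumerate each (state, instruction, next state) edge once."""
    return [(src, label, dst)
            for src, edges in graph.items()
            for label, dst in edges]

tests = edge_cover_tests(graph)
print(len(tests))  # 4 transitions -> 4 check sequences
```

A real model would enumerate tags, values and hierarchy levels as state components, and would clone this subgraph once per channel to handle the non-deterministic port choice.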
6

Ding, Wei, Yuanrui Zhang, Mahmut Kandemir, and Seung Woo Son. "Compiler-Directed File Layout Optimization for Hierarchical Storage Systems." Scientific Programming 21, no. 3-4 (2013): 65–78. http://dx.doi.org/10.1155/2013/167581.

Abstract:
The file layout of array data is a critical factor that affects the behavior of storage caches, yet it has so far received little attention in the context of hierarchical storage systems. The main contribution of this paper is a compiler-driven file layout optimization scheme for hierarchical storage caches. The approach, fully automated within an optimizing compiler, analyzes a multi-threaded application code and determines a file layout for each disk-resident array referenced by the code, such that the performance of the target storage cache hierarchy is maximized. We tested our approach using 16 I/O-intensive application programs and compared its performance against two previously proposed approaches under different cache space management schemes. Our experimental results show that the proposed approach improves the execution time of these parallel applications by 23.7% on average.
7

CARAZO, PABLO, RUBÉN APOLLONI, FERNANDO CASTRO, DANIEL CHAVER, LUIS PINUEL, and FRANCISCO TIRADO. "REDUCING CACHE HIERARCHY ENERGY CONSUMPTION BY PREDICTING FORWARDING AND DISABLING ASSOCIATIVE SETS." Journal of Circuits, Systems and Computers 21, no. 07 (November 2012): 1250057. http://dx.doi.org/10.1142/s0218126612500570.

Abstract:
The first-level data cache in modern processors has become a major consumer of energy due to its increasing size and high-frequency access rate. To reduce this energy consumption, we propose a straightforward filtering technique based on a highly accurate forwarding predictor: a simple structure predicts whether a load instruction will obtain its data via forwarding from the load-store structure (thus avoiding the data cache access) or whether it will be provided by the data cache. This mechanism reduces data cache energy consumption by an average of 21.5% with a negligible performance penalty of less than 0.1%. Furthermore, we also address static energy consumption by disabling a portion of the sets of the L2 associative cache. Overall, when both proposals are merged, the combined L1 and L2 total energy consumption is reduced by an average of 29.2% with a performance penalty of just 0.25%.
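Such a forwarding predictor can be pictured as a small table of saturating counters indexed by load PC. This is a hedged sketch of the general idea only; the table size, counter width and indexing below are invented, not the paper's design.

```python
# Per-PC 2-bit saturating counters: a confident "will be forwarded"
# prediction lets the core skip the L1 data-cache lookup for that load.

class ForwardingPredictor:
    def __init__(self, entries=1024):
        self.ctr = [0] * entries              # 2-bit counters, start at 0

    def predict_forwarded(self, pc):
        return self.ctr[pc % len(self.ctr)] >= 2

    def update(self, pc, was_forwarded):
        i = pc % len(self.ctr)
        self.ctr[i] = (min(3, self.ctr[i] + 1) if was_forwarded
                       else max(0, self.ctr[i] - 1))

p = ForwardingPredictor()
for _ in range(3):
    p.update(0x40, True)                      # this load keeps forwarding
print(p.predict_forwarded(0x40))              # now predicted as forwarded
```

Mispredictions in either direction only cost energy or a small delay, which is why the paper can report a sub-0.1% performance penalty.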
8

Feliu, Josue, Salvador Petit, Julio Sahuquillo, and Jose Duato. "Cache-Hierarchy Contention-Aware Scheduling in CMPs." IEEE Transactions on Parallel and Distributed Systems 25, no. 3 (March 2014): 581–90. http://dx.doi.org/10.1109/tpds.2013.61.

9

Zahran, Mohamed M. "On cache memory hierarchy for Chip-Multiprocessor." ACM SIGARCH Computer Architecture News 31, no. 1 (March 2003): 39–48. http://dx.doi.org/10.1145/773365.773370.

10

Yan, Mengjia, Bhargava Gopireddy, Thomas Shull, and Josep Torrellas. "Secure Hierarchy-Aware Cache Replacement Policy (SHARP)." ACM SIGARCH Computer Architecture News 45, no. 2 (September 14, 2017): 347–60. http://dx.doi.org/10.1145/3140659.3080222.

11

Qian, Cheng, Libo Huang, Qi Yu, and Zhiying Wang. "CHAM: Improving Prefetch Efficiency Using a Composite Hierarchy-Aware Method." Journal of Circuits, Systems and Computers 27, no. 07 (March 26, 2018): 1850114. http://dx.doi.org/10.1142/s0218126618501141.

Abstract:
Hardware prefetching has always been a crucial mechanism for improving processor performance. However, an efficient prefetch operation requires high prefetch accuracy; otherwise it may degrade system performance. Prior studies propose an adaptive priority-controlling method to make better use of prefetch accesses, which improves performance in two-level cache systems, but this method does not perform well in a more complex memory hierarchy such as a three-level cache system. It is therefore still necessary to explore prefetch efficiency, particularly in complex hierarchical memory systems. In this paper, we propose a composite hierarchy-aware method called CHAM, which works at the middle-level cache (MLC). Using prefetch accuracy as an evaluation criterion, CHAM improves the efficiency of prefetch accesses based on (1) a dynamic adaptive prefetch control mechanism that schedules the priority and data transfer of prefetch accesses across the cache hierarchy levels at runtime and (2) a prefetch-efficiency-oriented hybrid cache replacement policy that selects the most suitable policy. To demonstrate its effectiveness, we performed extensive experiments on 28 benchmarks from SPEC CPU2006 and two benchmarks from BioBench. Compared with a similar adaptive method, CHAM improves the MLC demand hit rate by 9.2% and system performance by 1.4% on average in a single-core system. On a 4-core system, CHAM improves the demand hit rate by 33.06% and system performance by 10.1% on average.
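The accuracy-driven part of such a scheme can be sketched with a counter pair and thresholds. The thresholds and priority levels below are invented for illustration and are not CHAM's actual parameters.

```python
# Count useful vs. issued prefetches over a window; the measured
# accuracy picks the priority given to prefetch requests at the MLC.

class PrefetchThrottle:
    def __init__(self):
        self.useful = 0
        self.issued = 0

    def record(self, was_useful):
        self.issued += 1
        self.useful += was_useful

    def priority(self):
        if self.issued == 0:
            return "low"                      # no evidence yet: stay cautious
        acc = self.useful / self.issued
        return "high" if acc >= 0.75 else ("low" if acc < 0.40 else "mid")

t = PrefetchThrottle()
for hit in [1, 1, 1, 0]:                      # 75% of prefetches were used
    t.record(hit)
print(t.priority())
```

In a real controller the counters would be reset each epoch so the priority tracks phase changes in the workload.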
12

López, Sonia, Óscar Garnica, David H. Albonesi, Steven Dropsho, Juan Lanchares, and José I. Hidalgo. "A phase adaptive cache hierarchy for SMT processors." Microprocessors and Microsystems 35, no. 8 (November 2011): 683–94. http://dx.doi.org/10.1016/j.micpro.2011.08.008.

13

Sun, Zhenyu, Xiuyuan Bi, Hai Li, Weng-Fai Wong, and Xiaochun Zhu. "STT-RAM Cache Hierarchy With Multiretention MTJ Designs." IEEE Transactions on Very Large Scale Integration (VLSI) Systems 22, no. 6 (June 2014): 1281–93. http://dx.doi.org/10.1109/tvlsi.2013.2267754.

14

Chang-sheng, Xie, Liu Rui-fang, and Tan Zhi-hu. "Design and implementation of hierarchy cache using pagefile." Wuhan University Journal of Natural Sciences 9, no. 6 (November 2004): 890–94. http://dx.doi.org/10.1007/bf02850793.

15

Zhao, Jia, and Watanabe. "Router-integrated Cache Hierarchy Design for Highly Parallel Computing in Efficient CMP Systems." Electronics 8, no. 11 (November 17, 2019): 1363. http://dx.doi.org/10.3390/electronics8111363.

Abstract:
In current Chip Multi-Processor (CMP) systems, data sharing in the cache hierarchy is a critical issue that costs many clock cycles to maintain data coherence. As the number of integrated cores increases, the single shared cache serves too many processing threads to maintain shared data efficiently. In this work, an enhanced router network is integrated within the private cache level to rapidly interconnect shared-data accesses from different threads. By experimental pattern analysis, all shared data in the private cache level can be classified into seven access types; both shared accesses and thread-crossing accesses can then be rapidly detected and handled in the proposed router network. As a result, the access latency of the private cache is decreased and the conventional coherence-traffic problem is alleviated. The proposed path works in three steps: first, target accesses are detected by exploration in the router network; second, the proposed replacement logic handles those accesses to maintain data coherence; finally, the accesses are delivered by the proposed data deliverer. Harmful data-sharing accesses are thus resolved within the first chip layer of the 3D-IC structure. The proposed system was implemented in a cycle-precise simulation platform, and experimental results show that our model can improve the on-chip Instructions Per Cycle (IPC) by up to 31.85 percent while saving about 17.61 percent of energy consumption compared to the base system.
16

Srikanth, Sriseshan, Anirudh Jain, Thomas M. Conte, Erik P. Debenedictis, and Jeanine Cook. "SortCache." ACM Transactions on Architecture and Code Optimization 18, no. 4 (December 31, 2021): 1–24. http://dx.doi.org/10.1145/3473332.

Abstract:
Sparse data applications have irregular access patterns that stymie modern memory architectures. Although hyper-sparse workloads have received considerable attention in the past, moderately-sparse workloads prevalent in machine learning applications, graph processing and HPC have not. Where the former can bypass the cache hierarchy, the latter fit in the cache. This article makes the observation that intelligent, near-processor cache management can improve bandwidth utilization for data-irregular accesses, thereby accelerating moderately-sparse workloads. We propose SortCache, a processor-centric approach to accelerating sparse workloads by introducing accelerators that leverage the on-chip cache subsystem, with minimal programmer intervention.
17

Way, Jonathan G., and Rebecca D. Cabral. "Effects of Hierarchy Rank on Caching Frequency in a Captive Coywolf (Eastern Coyote) Canis latrans × lycaon, Pack." Canadian Field-Naturalist 123, no. 2 (April 1, 2009): 173. http://dx.doi.org/10.22621/cfn.v123i2.699.

Abstract:
Caching is useful because it ensures a consistent supply of food for animals. However, there is a relative paucity of data concerning which members of canid social units make the most caches. We provide data indicating that dominant members of a captive Coywolf ("Eastern Coyote", Canis latrans × lycaon) pack did the majority (78%, n = 46 of 59) of the caching. Caching is a common activity stereotypically performed by canids, and dominant members of a social unit tend to cache more often.
18

Johnson, Teresa L., and Wen-mei W. Hwu. "Run-time adaptive cache hierarchy management via reference analysis." ACM SIGARCH Computer Architecture News 25, no. 2 (May 1997): 315–26. http://dx.doi.org/10.1145/384286.264213.

19

Lai, Bo-Cheng Charles, Hsien-Kai Kuo, and Jing-Yang Jou. "A Cache Hierarchy Aware Thread Mapping Methodology for GPGPUs." IEEE Transactions on Computers 64, no. 4 (April 2015): 884–98. http://dx.doi.org/10.1109/tc.2014.2308179.

20

Ozturk, Ozcan, Umut Orhan, Wei Ding, Praveen Yedlapalli, and Mahmut Taylan Kandemir. "Cache Hierarchy-Aware Query Mapping on Emerging Multicore Architectures." IEEE Transactions on Computers 66, no. 3 (March 1, 2017): 403–15. http://dx.doi.org/10.1109/tc.2016.2605682.

21

Ma, Cong, William Tuohy, and David J. Lilja. "Impact of spintronic memory on multicore cache hierarchy design." IET Computers & Digital Techniques 11, no. 2 (January 25, 2017): 51–59. http://dx.doi.org/10.1049/iet-cdt.2015.0190.

22

Monteiro, Eduarda, Mateus Grellert, Bruno Zatt, and Sergio Bampi. "Energy-aware cache hierarchy assessment targeting HEVC encoder execution." Journal of Real-Time Image Processing 16, no. 5 (March 9, 2017): 1695–715. http://dx.doi.org/10.1007/s11554-017-0680-9.

23

ZARANDI, HAMID R., and SEYED GHASSEM MIREMADI. "HIERARCHICAL SET-ASSOCIATE CACHE FOR HIGH-PERFORMANCE AND LOW-ENERGY ARCHITECTURE." Journal of Circuits, Systems and Computers 15, no. 06 (December 2006): 861–80. http://dx.doi.org/10.1142/s0218126606003404.

Abstract:
This paper presents a new cache scheme based on hierarchically varying the size of sets in a set-associative cache. In this scheme, all sets at a given hierarchical level have the same size, which is k times the size of the sets at the next level of the hierarchy, where k is called the division factor. The size of the tag field associated with each set is therefore variable and depends on the hierarchy level the set is in. The scheme is proposed to achieve higher hit ratios than the two conventional schemes, set-associative and direct mapping. It has been simulated with several standard SPEC 2000 trace files, and statistics were gathered and analyzed for different cache configurations. The results reveal that the proposed scheme exhibits a higher hit ratio than both well-known mapping schemes, while its area and power consumption are less than those of the fully associative scheme.
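The size relation between levels is simple to state in code: with division factor k, the sets at one level are k times the size of the sets at the next. A minimal sketch (the function name is invented):

```python
# With division factor k, each level's sets are k times larger than the
# next level's, so tag width grows as the sets shrink down the hierarchy.

def set_sizes(levels, top_size, k):
    return [top_size // k**i for i in range(levels)]

print(set_sizes(3, 8, 2))  # three levels with division factor 2
```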
24

Dash, Banchhanidhi, Debabala Swain, and Bijay K. Paikaray. "Adaptive weight-based: an exclusive bypass algorithm for L3 cache in a three level cache hierarchy." International Journal of Computational Systems Engineering 3, no. 1/2 (2017): 74. http://dx.doi.org/10.1504/ijcsyse.2017.083157.

25

Swain, Debabala, Bijay K. Paikaray, and Banchhanidhi Dash. "Adaptive weight-based: an exclusive bypass algorithm for L3 cache in a three level cache hierarchy." International Journal of Computational Systems Engineering 3, no. 1/2 (2017): 74. http://dx.doi.org/10.1504/ijcsyse.2017.10004031.

26

Holmes, G., B. Pfahringer, and R. Kirkby. "CACHE HIERARCHY INSPIRED COMPRESSION: A NOVEL ARCHITECTURE FOR DATA STREAMS." Journal of IT in Asia 2, no. 1 (April 26, 2016): 39–52. http://dx.doi.org/10.33736/jita.54.2007.

Abstract:
We present an architecture for data streams based on structures typically found in web cache hierarchies. The main idea is to build a meta-level analyser from a number of levels constructed over time from a data stream. We present the general architecture for such a system and an application to classification. This architecture is an instance of the general wrapper idea, allowing us to reuse standard batch learning algorithms in an inherently incremental learning environment. By artificially generating data sources, we demonstrate that a hierarchy containing a mixture of models is able to adapt over time to the source of the data. In these experiments the hierarchies use an elementary performance-based replacement policy and unweighted voting for making classification decisions.
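The final decision step, unweighted voting across the models held in the hierarchy, can be sketched in a few lines. The stand-in models below are placeholders, not the batch learners the paper wraps.

```python
from collections import Counter

# Unweighted majority vote over whatever models the hierarchy holds.
def classify(models, x):
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

# three stand-in "models" that just return fixed labels
models = [lambda x: "a", lambda x: "b", lambda x: "a"]
print(classify(models, None))  # the majority label wins
```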
27

Soundararajan, Gokul, Jin Chen, Mohamed A. Sharaf, and Cristiana Amza. "Dynamic partitioning of the cache hierarchy in shared data centers." Proceedings of the VLDB Endowment 1, no. 1 (August 2008): 635–46. http://dx.doi.org/10.14778/1453856.1453926.

28

Basak, Abanti, Xing Hu, Shuangchen Li, Sang Min Oh, and Yuan Xie. "Exploring Core and Cache Hierarchy Bottlenecks in Graph Processing Workloads." IEEE Computer Architecture Letters 17, no. 2 (July 1, 2018): 197–200. http://dx.doi.org/10.1109/lca.2018.2864964.

29

Vivekanandarajah, K., T. Srikanthan, and S. Bhattacharyya. "Energy-delay efficient filter cache hierarchy using pattern prediction scheme." IEE Proceedings - Computers and Digital Techniques 151, no. 2 (2004): 141. http://dx.doi.org/10.1049/ip-cdt:20040032.

30

Conway, Pat, Nathan Kalyanasundharam, Gregg Donley, Kevin Lepak, and Bill Hughes. "Cache Hierarchy and Memory Subsystem of the AMD Opteron Processor." IEEE Micro 30, no. 2 (March 2010): 16–29. http://dx.doi.org/10.1109/mm.2010.31.

31

Liu, Hongyu, and Rui Han. "A Hierarchical Cache Size Allocation Scheme Based on Content Dissemination in Information-Centric Networks." Future Internet 13, no. 5 (May 15, 2021): 131. http://dx.doi.org/10.3390/fi13050131.

Abstract:
With the rapid growth of mass content retrieval on the Internet, the Information-Centric Network (ICN) has become one of the hotspots in the field of future network architectures. The in-network cache is an important feature of ICN. For better network performance, the cache size on each ICN node should be allocated in proportion to its importance. However, in some current studies, the importance of cache nodes is determined solely by their location in the network topology, ignoring their roles in the actual content transmission process. In this paper, we focus on allocating cache size to each node within a given total cache space budget. We explore the impact of heterogeneous cache allocation on content dissemination under the same ICN infrastructure, and we quantify the importance of nodes from both content dissemination and network topology. To this end, we implement a hierarchy partitioning method based on content dissemination, formulate weight calculation methods for these hierarchies, and derive a per-node allocation that distributes the total cache space budget across the network. The performance of the scheme is evaluated on the Garr topology; comparisons of average hit ratio, latency, and load show that the proposed scheme outperforms other schemes in these respects.
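The last step, turning per-node weights into a division of the total budget, reduces to a proportional split. The node names and weights here are invented; the real scheme derives its weights from the hierarchy partitioning the abstract describes.

```python
# Split a total in-network cache budget across nodes in proportion to
# a per-node importance weight.

def allocate_budget(weights, total):
    s = sum(weights.values())
    return {node: total * w / s for node, w in weights.items()}

weights = {"edge-1": 1.0, "edge-2": 1.0, "core": 2.0}
alloc = allocate_budget(weights, 400)      # e.g. a 400 MB total budget
print(alloc["core"])                       # the heavier node gets more
```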
32

Li, Pengcheng, Hao Luo, and Chen Ding. "Rethinking a heap hierarchy as a cache hierarchy: a higher-order theory of memory demand (HOTM)." ACM SIGPLAN Notices 51, no. 11 (July 19, 2018): 111–21. http://dx.doi.org/10.1145/3241624.2926708.

33

Botincan, Matko, and Davor Runje. "An Enhancement of Futures Runtime in Presence of Cache Memory Hierarchy." Journal of Computing and Information Technology 16, no. 4 (2008): 339. http://dx.doi.org/10.2498/cit.1001403.

34

Wang, W. H., J. L. Baer, and H. M. Levy. "Organization and performance of a two-level virtual-real cache hierarchy." ACM SIGARCH Computer Architecture News 17, no. 3 (June 1989): 140–48. http://dx.doi.org/10.1145/74926.74942.

35

Ramos, Luis M., José Luis Briz, Pablo E. Ibáñez, and Victor Viñals. "Data prefetching in a cache hierarchy with high bandwidth and capacity." ACM SIGARCH Computer Architecture News 35, no. 4 (September 2007): 37–44. http://dx.doi.org/10.1145/1327312.1327319.

36

Zhao, Jishen, Cong Xu, Tao Zhang, and Yuan Xie. "BACH: A Bandwidth-Aware Hybrid Cache Hierarchy Design with Nonvolatile Memories." Journal of Computer Science and Technology 31, no. 1 (January 2016): 20–35. http://dx.doi.org/10.1007/s11390-016-1609-7.

37

Vishnekov, A. V., and E. M. Ivanova. "DYNAMIC CONTROL METHODS OF CACHE LINES REPLACEMENT POLICY." Vestnik komp'iuternykh i informatsionnykh tekhnologii, no. 191 (May 2020): 49–56. http://dx.doi.org/10.14489/vkit.2020.05.pp.049-056.

Abstract:
The paper investigates increasing the performance of computing systems by improving cache memory efficiency, and analyzes the efficiency indicators of replacement algorithms. We show the need for automated or automatic means of tuning the cache memory to the current conditions of program-code execution, namely dynamic control of cache replacement algorithms that swaps the current algorithm for one more effective under the current computation conditions. We develop methods for caching-policy control based on the program type: cyclic, sequential, locally-point, or mixed. We propose a procedure for selecting an effective replacement algorithm using decision-support methods based on current caching statistics. The paper analyzes existing cache replacement algorithms and proposes a decision-making procedure for selecting an effective one based on methods of ranking alternatives, preferences, and hierarchy analysis. The critical number of cache hits, the average time of data-query execution, and the average cache latency serve as triggers for swapping out the current replacement algorithm. The main advantage of the proposed approach is its universality: it assumes an adaptive decision-making procedure in which the criteria for evaluating replacement algorithms, their efficiency, and their preference for different types of program code can vary. Dynamically swapping the replacement algorithm for a more efficient one during program execution improves the performance of the computer system.
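The swapping procedure can be caricatured as a threshold test over recent hit-rate statistics. The candidate set, threshold and selection rule below are illustrative only, not the paper's multi-criteria decision procedure.

```python
# Watch the hit rate of the current replacement policy over a window;
# if it drops below a threshold, swap in the best-performing candidate.

def choose_policy(stats, current, threshold=0.80):
    """stats maps policy name -> observed hit rate on recent accesses."""
    if stats[current] >= threshold:
        return current                     # good enough: keep it
    return max(stats, key=stats.get)       # swap in the best candidate

stats = {"LRU": 0.62, "LFU": 0.71, "FIFO": 0.55}
print(choose_policy(stats, "LRU"))         # LRU underperforms, so LFU wins
```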
39

Singh, Inderjit, Balwinder Raj, Mamta Khosla, and Brajesh Kumar Kaushik. "Comparative Analysis of Spintronic Memories for Low Power on-chip Caches." SPIN 10, no. 04 (November 16, 2020): 2050027. http://dx.doi.org/10.1142/s2010324720500277.

Abstract:
Continuous downscaling of CMOS devices has increased leakage power and limited performance to a few GHz. The research goal has shifted from operating at high frequencies to delivering higher performance at lower power. CMOS-based on-chip memories consume a significant fraction of the power in modern processors. This paper explores the suitability of beyond-CMOS, emerging magnetic memories for use in the memory hierarchy, owing to their remarkable features such as nonvolatility, high density, ultra-low leakage and scalability. NVSim, a circuit-level tool, is used to explore different design layouts and memory organizations and then estimate energy, area and latency. A detailed system-level performance analysis of STT-MRAM and SOT-MRAM technologies and a comparison with 22[Formula: see text]nm SRAM technology are presented. The analysis infers that, compared to the existing 22[Formula: see text]nm SRAM technology, SOT-MRAM is more efficient in area for memory sizes [Formula: see text][Formula: see text]KB, and in speed and energy consumption for cache sizes [Formula: see text][Formula: see text]KB. A typical 256[Formula: see text]KB SOT-MRAM cache design is 27.74% more area efficient, 2.97 times faster and consumes 76.05% less leakage than its SRAM counterpart, and these numbers improve for larger cache sizes. The article deduces that SOT-MRAM technology has promising potential to replace SRAM in the lower levels of the computer memory hierarchy.
40

Zhou, Min, Onkar Sahni, Mark S. Shephard, Christopher D. Carothers, and Kenneth E. Jansen. "Adjacency-Based Data Reordering Algorithm for Acceleration of Finite Element Computations." Scientific Programming 18, no. 2 (2010): 107–23. http://dx.doi.org/10.1155/2010/273921.

Abstract:
Effective use of the processor memory hierarchy is an important issue in high-performance computing. In this work, a part-level mesh topological traversal algorithm defines a reordering of both mesh vertices and regions that increases the spatial locality of data and improves overall cache utilization during on-processor finite element calculations. Examples based on adaptively created unstructured meshes demonstrate the effectiveness of the procedure in cases where the load per processing core is varied but balanced (e.g., elements are equally distributed across cores for a given partition). In one example, the effect of the current adjacency-based data reordering is studied for different phases of an implicit analysis, including element-data blocking, element-level computations, sparse-matrix filling and equation solution. These results are compared to a case where reordering is applied to mesh vertices only. The computations are performed on various supercomputers including IBM Blue Gene (BG/L and BG/P), Cray XT (XT3 and XT5) and the Sun Constellation Cluster. Reordering is observed to improve per-core performance by up to 24% on Blue Gene/L and up to 40% on Cray XT5. The CrayPat hardware performance tool is used to measure the number of cache misses at each level of the memory hierarchy. The measured decrease in L1, L2 and L3 cache misses when data reordering is used closely accounts for the observed decrease in overall execution time.
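The locality benefit of such a reordering comes from renumbering neighbouring mesh entities consecutively. A breadth-first renumbering over an invented toy adjacency graph gives the flavour; the paper's part-level topological traversal is more elaborate than this sketch.

```python
from collections import deque

# Renumber mesh vertices by a breadth-first traversal of the adjacency
# graph so that neighbours end up close together in memory, improving
# cache-line reuse during element loops.

def bfs_reorder(adjacency, start=0):
    order, seen, queue = [], {start}, deque([start])
    while queue:
        v = queue.popleft()
        order.append(v)
        for u in adjacency[v]:
            if u not in seen:
                seen.add(u)
                queue.append(u)
    return order                            # new index = position in list

# a 4-vertex strip 0-1-2-3 plus a diagonal edge 0-2
adj = {0: [2, 1], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(bfs_reorder(adj))
```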
41

Wang, Weixun, and Prabhat Mishra. "Dynamic Reconfiguration of Two-Level Cache Hierarchy in Real-Time Embedded Systems." Journal of Low Power Electronics 7, no. 1 (February 1, 2011): 17–28. http://dx.doi.org/10.1166/jolpe.2011.1113.

42

Bellens, Pieter, Josep M. Perez, Felipe Cabarcas, Alex Ramirez, Rosa M. Badia, and Jesus Labarta. "CellSs: Scheduling Techniques to Better Exploit Memory Hierarchy." Scientific Programming 17, no. 1-2 (2009): 77–95. http://dx.doi.org/10.1155/2009/561672.

Full text
Abstract:
Cell Superscalar's (CellSs) main goal is to provide a simple, flexible and easy programming approach for the Cell Broadband Engine (Cell/B.E.) that automatically exploits the inherent concurrency of applications at the task level. The CellSs environment is based on a source-to-source compiler that translates annotated C or Fortran code and a runtime library tailored for the Cell/B.E. that takes care of the concurrent execution of the application. The first task-scheduling efforts in CellSs were derived from very simple heuristics. This paper presents new scheduling techniques developed for CellSs to improve application performance. Additionally, the design of a new scheduling algorithm is detailed and the algorithm is evaluated. The CellSs scheduler takes into account an extension of the memory hierarchy for the Cell/B.E., with a cache memory shared between the SPEs. All new scheduling techniques have been evaluated and show improved behavior of the system.
APA, Harvard, Vancouver, ISO, and other styles
43

Gan, Xin Biao, Li Shen, Quan Yuan Tan, Cong Liu, and Zhi Ying Wang. "Performance Evaluation and Optimization on GPU." Advanced Materials Research 219-220 (March 2011): 1445–49. http://dx.doi.org/10.4028/www.scientific.net/amr.219-220.1445.

Full text
Abstract:
With hundreds of cores, GPUs provide higher peak performance than their CPU counterparts. However, it is a big challenge to take full advantage of their computing power. In order to understand the performance bottlenecks of applications on many-core GPUs and then optimize parallel programs on GPU architectures, we propose a performance evaluation model based on the memory wall and then classify applications into AbM (Application bound-in Memory) and AbC (Application bound-in Computing). Furthermore, we optimize kernels characterized by low memory bandwidth, including matrix multiplication and FFT (Fast Fourier Transform), by employing the texture cache on an NVIDIA GTX280 using CUDA (Compute Unified Device Architecture). Experimental results show that the texture cache is helpful for AbM applications with better data locality, so it is critical to utilize the GPU memory hierarchy efficiently for performance improvement.
APA, Harvard, Vancouver, ISO, and other styles
44

Venkatesan, Rangharajan, Mrigank Sharad, Kaushik Roy, and Anand Raghunathan. "Energy-Efficient All-Spin Cache Hierarchy Using Shift-Based Writes and Multilevel Storage." ACM Journal on Emerging Technologies in Computing Systems 12, no. 1 (August 3, 2015): 1–27. http://dx.doi.org/10.1145/2723165.

Full text
APA, Harvard, Vancouver, ISO, and other styles
45

Oboril, Fabian, Rajendra Bishnoi, Mojtaba Ebrahimi, and Mehdi B. Tahoori. "Evaluation of Hybrid Memory Technologies Using SOT-MRAM for On-Chip Cache Hierarchy." IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 34, no. 3 (March 2015): 367–80. http://dx.doi.org/10.1109/tcad.2015.2391254.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

Banu, J. Saira, and M. Rajasekhara Babu. "Exploring Vectorization and Prefetching Techniques on Scientific Kernels and Inferring the Cache Performance Metrics." International Journal of Grid and High Performance Computing 7, no. 2 (April 2015): 18–36. http://dx.doi.org/10.4018/ijghpc.2015040102.

Full text
Abstract:
Performance improvement in modern processors is stalling due to the power wall and memory wall problems. In general, the power wall problem is addressed by various vectorization design techniques, while the memory wall problem is mitigated through prefetching. In this paper, vectorization is achieved through the Single Instruction Multiple Data (SIMD) registers of current processors. These provide architectural optimization by reducing the number of instructions in the pipeline and by minimizing the utilization of the multi-level memory hierarchy. The registers offer an economical computing platform, compared to a Graphics Processing Unit (GPU), for compute-intensive applications. This paper explores software prefetching via Streaming SIMD Extensions (SSE) instructions to mitigate the memory wall problem. This work quantifies the effect of vectorization and prefetching on Matrix Vector Multiplication (MVM) kernels with dense and sparse structure. Both prefetching and vectorization reduce data and instruction cache pressure, thereby improving cache performance. To show the cache performance improvements in the kernels, the Intel VTune Amplifier is used. Finally, experimental results demonstrate promising performance of the matrix kernels on Intel's Haswell processor. However, effective utilization of SIMD registers remains a programming challenge for developers.
APA, Harvard, Vancouver, ISO, and other styles
47

Savage, John E., and Mohammad Zubair. "Evaluating Multicore Algorithms on the Unified Memory Model." Scientific Programming 17, no. 4 (2009): 295–308. http://dx.doi.org/10.1155/2009/681708.

Full text
Abstract:
One of the challenges to achieving good performance on multicore architectures is the effective utilization of the underlying memory hierarchy. While this is an issue for single-core architectures, it is a critical problem for multicore chips. In this paper, we formulate the unified multicore model (UMM) to help understand the fundamental limits on cache performance on these architectures. The UMM seamlessly handles different types of multiple-core processors with varying degrees of cache sharing at different levels. We demonstrate that our model can be used to study a variety of multicore architectures on a variety of applications. In particular, we use it to analyze an option pricing problem using the trinomial model and develop an algorithm for it that has near-optimal memory traffic between cache levels. We have implemented the algorithm on two quad-core Intel Xeon 5310 1.6 GHz processors (8 cores). It achieves a peak performance of 19.5 GFLOPs, which is 38% of the theoretical peak of the multicore system. We demonstrate that our algorithm outperforms compiler-optimized and auto-parallelized code by a factor of up to 7.5.
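The near-optimal memory traffic described above rests on cache-aware blocking. As a generic, hedged illustration (loop tiling applied to a matrix transpose, not the paper's trinomial pricing algorithm; the function and tile size are assumptions), working tile by tile keeps each tile's rows and columns resident in cache instead of streaming the whole matrix per row:

```python
def transpose_blocked(a, n, tile=4):
    """Transpose an n x n matrix (flat list, row-major) in tiles,
    so each tile of source rows and destination columns stays
    cache-resident while it is being processed."""
    out = [0] * (n * n)
    for ii in range(0, n, tile):
        for jj in range(0, n, tile):
            for i in range(ii, min(ii + tile, n)):
                for j in range(jj, min(jj + tile, n)):
                    out[j * n + i] = a[i * n + j]
    return out

n = 8
a = list(range(n * n))
t = transpose_blocked(a, n)
assert all(t[j * n + i] == a[i * n + j] for i in range(n) for j in range(n))
```

Models such as the UMM let one choose the tile size per cache level (and per degree of sharing between cores) rather than by trial and error.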
APA, Harvard, Vancouver, ISO, and other styles
48

Tomei, Matthew, Shomit Das, Mohammad Seyedzadeh, Philip Bedoukian, Bradford Beckmann, Rakesh Kumar, and David Wood. "Byte-Select Compression." ACM Transactions on Architecture and Code Optimization 18, no. 4 (December 31, 2021): 1–27. http://dx.doi.org/10.1145/3462209.

Full text
Abstract:
Cache-block compression is a highly effective technique for both reducing accesses to lower levels in the memory hierarchy (cache compression) and minimizing data transfers (link compression). While many effective cache-block compression algorithms have been proposed, the design of these algorithms is largely ad hoc and manual, relying on human recognition of patterns. In this article, we take an entirely different approach. We introduce a class of “byte-select” compression algorithms, as well as an automated methodology for generating compression algorithms in this class. We argue that, based on upper bounds within the class, the study of this class of byte-select algorithms has the potential to yield algorithms with better performance than existing cache-block compression algorithms. The upper bound we establish on the compression ratio is 2X that of any existing algorithm. We then offer a generalized representation of a subset of byte-select compression algorithms and search through the resulting space guided by a set of training data traces. Using this automated process, we find efficient and effective algorithms for various hardware applications. We find that the resulting algorithms exploit novel patterns that can inform future algorithm designs. The generated byte-select algorithms are evaluated against a separate set of traces, and these evaluations show that Byte-Select has a 23% higher compression ratio on average. While no previous algorithm performs best for all our data sets, which include CPU and GPU applications, our generated algorithms do. Using an automated hardware generator for these algorithms, we show that their decompression and compression latencies are one and two cycles respectively, much lower than for any existing algorithm with a competitive compression ratio.
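The byte-select idea can be sketched as follows (a deliberately simplified toy, not one of the article's generated algorithms; the dictionary size, function names and encoding are illustrative assumptions): each position of the compressed block selects a byte from a small dictionary of the block's distinct byte values, so blocks with few distinct bytes compress to the dictionary plus compact per-byte selectors.

```python
def byte_select_compress(block, max_dict=4):
    """Toy byte-select compressor: a dictionary of distinct bytes
    plus one small selector index per position. Returns None when
    the block has too many distinct bytes to fit the dictionary."""
    dictionary, selectors = [], []
    for b in block:
        if b not in dictionary:
            if len(dictionary) == max_dict:
                return None  # incompressible under this scheme
            dictionary.append(b)
        selectors.append(dictionary.index(b))
    return dictionary, selectors

def byte_select_decompress(dictionary, selectors):
    return bytes(dictionary[s] for s in selectors)

block = bytes([0x00, 0xFF, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x00])
d, sel = byte_select_compress(block)
assert byte_select_decompress(d, sel) == block
# 8 bytes -> 2 dictionary bytes + 8 x 2-bit selectors (2 bytes) = 4 bytes.
```

The article's contribution is searching the space of such selector schemes automatically, guided by training traces, rather than hand-picking one as done here.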
APA, Harvard, Vancouver, ISO, and other styles
49

Al-Kharusi, Ibrahim, and David W. Walker. "Locality properties of 3D data orderings with application to parallel molecular dynamics simulations." International Journal of High Performance Computing Applications 33, no. 5 (May 19, 2019): 998–1018. http://dx.doi.org/10.1177/1094342019846282.

Full text
Abstract:
Application performance on graphical processing units (GPUs), in terms of execution speed and memory usage, depends on the efficient use of hierarchical memory. It is expected that enhancing data locality in molecular dynamics simulations will lower the cost of data movement across the GPU memory hierarchy. The work presented in this article analyses the spatial data locality and data reuse characteristics of row-major, Hilbert and Morton orderings and the impact these have on the performance of molecular dynamics simulations. A simple cache model is presented, and this is found to give results that are consistent with the timing results for the particle force computation obtained on NVidia GeForce GTX960 and Tesla P100 GPUs. Further analysis of the observed memory use, in terms of cache hits and the number of memory transactions, provides a more detailed explanation of execution behaviour for the different orderings. To the best of our knowledge, this is the first study to investigate memory analysis and data locality issues for molecular dynamics simulations of Lennard-Jones fluids on NVidia’s Maxwell and Tesla architectures.
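The Morton ordering studied above can be sketched with a minimal bit-interleaving implementation (an illustrative sketch, not the authors' code; the function name and bit width are assumptions): interleaving the coordinate bits yields a Z-order index, and sorting particles by that index groups spatial neighbours in memory.

```python
def morton3d(x, y, z, bits=10):
    """Interleave the bits of (x, y, z) into one Z-order index,
    so points close in 3D space tend to get close indices."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code

# Sorting particles by Morton code groups spatial neighbours in memory.
points = [(1, 1, 1), (0, 0, 0), (1, 0, 0), (0, 1, 0)]
points.sort(key=lambda p: morton3d(*p))
print(points)  # [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 1)]
```

A Hilbert ordering improves on this further (adjacent indices are always adjacent cells, which Z-order does not guarantee), at the cost of a more involved encoding.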
APA, Harvard, Vancouver, ISO, and other styles
50

Robertson, George, Kim Cameron, Mary Czerwinski, and Daniel Robbins. "Animated Visualization of Multiple Intersecting Hierarchies." Information Visualization 1, no. 1 (March 2002): 50–65. http://dx.doi.org/10.1057/palgrave.ivs.9500002.

Full text
Abstract:
We describe a new information structure composed of multiple intersecting hierarchies, which we call a Polyarchy. Visualizing polyarchies enables novel views for the discovery of relationships that are very difficult to find using existing hierarchy visualization tools. This paper describes the visualization design and system architecture challenges as well as our current solutions. Visual Pivot is a novel web-based polyarchy visualization technique, supported by a ‘polyarchy server’ implemented with a Mid-Tier Cache architecture. A series of five user studies guided the iterative design of Visual Pivot. Finally, the effectiveness of animation in Visual Pivot is discussed.
APA, Harvard, Vancouver, ISO, and other styles
