Journal articles on the topic 'Cache memory – Design'

Consult the top 50 journal articles for your research on the topic 'Cache memory – Design.'

1. DRACH, N., A. GEFFLAUT, P. JOUBERT, and A. SEZNEC. "ABOUT CACHE ASSOCIATIVITY IN LOW-COST SHARED MEMORY MULTI-MICROPROCESSORS." Parallel Processing Letters 05, no. 03 (September 1995): 475–87. http://dx.doi.org/10.1142/s0129626495000436.

Abstract:
Sizes of on-chip caches on current commercial microprocessors range from 16 Kbytes to 36 Kbytes. These microprocessors can be used directly in the design of low-cost single-bus shared-memory multiprocessors without any second-level cache. In this paper, we explore the viability of such a multi-microprocessor. Simulation results clearly establish that the performance of such a system will be quite poor if the on-chip caches are direct-mapped. On the other hand, when the on-chip caches are partially associative, the achieved level of performance is quite promising. In particular, two recently proposed innovative cache structures, the skewed-associative cache organization and the semi-unified cache organization, are shown to perform well.
 
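For readers unfamiliar with the skewed-associative organization mentioned above, the sketch below illustrates the core idea: each way indexes the cache with a different hash of the block address, so two blocks that conflict in one way usually map to different sets in the other. The XOR-based functions and sizes are illustrative assumptions, not the skewing functions from the paper.

```python
NUM_SETS = 256  # sets per way (illustrative size)

def skew_way0(block_addr: int) -> int:
    # Way 0: plain modulo indexing on the block address.
    return block_addr % NUM_SETS

def skew_way1(block_addr: int) -> int:
    # Way 1: fold higher address bits in, so blocks that collide in
    # way 0 are usually dispersed across different sets here.
    return ((block_addr >> 8) ^ block_addr) % NUM_SETS

def lookup(cache, block_addr):
    """cache: list of two dicts, one per way, mapping set index -> block."""
    for way, skew in enumerate((skew_way0, skew_way1)):
        if cache[way].get(skew(block_addr)) == block_addr:
            return True   # hit in this way
    return False          # miss; a replacement policy would pick a victim

for block in (0, 256, 512):        # all three collide in way 0 (set 0) ...
    print(block, skew_way0(block), skew_way1(block))
# ... but skew_way1 spreads them across sets 0, 1, and 2.
```
 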
2. Jalil, Luma Fayeq, Maha Abdul kareem H. Al-Rawi, and Abeer Diaa Al-Nakshabandi. "Cache coherence protocol design using VMSI (Valid Modified Shared Invalid) states." Journal of University of Human Development 3, no. 1 (March 31, 2017): 274. http://dx.doi.org/10.21928/juhd.v3n1y2017.pp274-281.

Abstract:
In this research we propose the design of a new cache coherence protocol named VMSI, intended to solve the coherence problem: the inconsistency of data between caches that appears in recent multiprocessor systems through read and write operations. The main purpose of this protocol is to increase processor efficiency by reducing traffic between processor and memory. This is achieved by removing the write-back to main memory on reads and writes to shared caches, since the protocol relies on a directory inside the cache that tracks all of its data, which represents a subset of main memory.
 
3. Baghdad Science Journal. "Cache Coherence Protocol Design and Simulation Using IES (Invalid Exclusive read/write Shared) State." Baghdad Science Journal 14, no. 1 (March 5, 2017): 219–30. http://dx.doi.org/10.21123/bsj.14.1.219-230.

Abstract:
To improve the efficiency of processors in recent multiprocessor systems, cache memories are used to access data instead of main memory, which reduces access latency. In such systems, when different caches are installed in different processors in a shared-memory architecture, difficulties appear in maintaining consistency between the cache memories of the different processors, so a cache coherence protocol is very important in these kinds of systems. MSI, MESI, MOSI, and MOESI are among the well-known protocols for solving the cache coherence problem. In this research we propose integrating two states of the MESI cache coherence protocol, Exclusive and Modified, into a single state that responds to read and write requests and is exclusive for those requests. The proposed protocol also removes the write-back to main memory from another processor holding a Modified copy when that copy is invalidated by a write to the same address, because in all cases the most recently written value is the one that matters. Where write-back normally protects data from loss, the IES protocol instead uses preprocessing steps to maintain and save data in main memory when a block is evicted from the cache. All of this increases processor efficiency by reducing accesses to main memory.
 
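Since both of the protocols above are variants of MESI, a minimal sketch of the baseline MESI transitions they modify may help. The IES merging of Exclusive and Modified described in the abstract is not reproduced here, and the event names are assumptions for illustration.

```python
# (state, event) -> next state for one cache line, observing its own
# processor's reads/writes and bus requests from other caches.
MESI = {
    ("I", "read_miss_shared"): "S",     # another cache holds a copy
    ("I", "read_miss_exclusive"): "E",  # no other copy exists
    ("I", "write_miss"): "M",
    ("E", "write_hit"): "M",            # silent upgrade, no bus traffic
    ("E", "bus_read"): "S",
    ("S", "write_hit"): "M",            # invalidates the other copies
    ("S", "bus_write"): "I",
    ("M", "bus_read"): "S",             # supplies data, writes back
    ("M", "bus_write"): "I",
}

def next_state(state: str, event: str) -> str:
    # Unlisted (state, event) pairs leave the line unchanged.
    return MESI.get((state, event), state)

print(next_state("I", "read_miss_exclusive"))  # -> E
print(next_state("E", "write_hit"))            # -> M
```
 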
4. Wyland, David C. "Cache tag RAM chips simplify cache memory design." Microprocessors and Microsystems 14, no. 1 (January 1990): 47–57. http://dx.doi.org/10.1016/0141-9331(90)90013-l.

5. Tabak, Daniel. "Cache and Memory Hierarchy Design." ACM SIGARCH Computer Architecture News 23, no. 3 (June 1995): 28. http://dx.doi.org/10.1145/203618.564957.

6. EL-MOURSY, ALI A., and FADI N. SIBAI. "V-SET CACHE: AN EFFICIENT ADAPTIVE SHARED CACHE FOR MULTI-CORE PROCESSORS." Journal of Circuits, Systems and Computers 23, no. 07 (June 2, 2014): 1450095. http://dx.doi.org/10.1142/s0218126614500959.

Abstract:
Development in VLSI design allows multi- to many-cores to be integrated on a single microprocessor chip. This increase in the core count per chip makes it more critical to design an efficient memory subsystem, especially the shared last-level cache (LLC). Efficient utilization of the LLC is a dominant factor in achieving the best microprocessor throughput. Conventional set-associative caches cannot cope with the new access patterns of cache blocks in multi-core processors. In this paper, the authors propose a new design for the LLC in multi-core processors. The proposed v-set cache design allows adaptive and dynamic utilization of the cache blocks. Unlike recently proposed designs such as the v-way cache, the v-set cache design limits serial access of cache blocks. We thoroughly study the proposed design, including area and power consumption as well as performance and throughput. On an eight-core microprocessor, the proposed v-set cache design achieves maximum speedups of 25% and 12% and average speedups of 16% and 6% compared to conventional n-way and v-way cache designs, respectively. The area overhead of v-set does not exceed 7% compared to an n-way cache.
 
7. Mittal, Shaily, and Nitin. "Memory Map: A Multiprocessor Cache Simulator." Journal of Electrical and Computer Engineering 2012 (2012): 1–12. http://dx.doi.org/10.1155/2012/365091.

Abstract:
Nowadays, manufacturers mainly focus on Multiprocessor System-on-Chip (MPSoC) architectures to provide increased concurrency, instead of increased clock speed, for embedded systems. However, managing concurrency is a tough task, and one major issue is synchronizing concurrent accesses to shared memory. An important characteristic of any system design process is memory configuration and data flow management. Although it is very important to select a correct memory configuration, it can be equally imperative to choreograph the data flow between various levels of memory in an optimal manner. Memory Map is a multiprocessor simulator for choreographing data flow in the individual caches of multiple processors and shared memory systems. The simulator allows the user to specify cache reconfigurations and the number of processors within the application program, and it evaluates the cache miss and hit rate for each configuration phase, taking reconfiguration costs into account. The code is open source and written in Java.
 
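Memory Map itself is a Java simulator; the Python sketch below only illustrates the basic bookkeeping such simulators perform: replaying an address trace against a cache configuration and reporting the hit rate. The direct-mapped geometry and the trace are invented for illustration.

```python
def simulate(trace, num_sets=64, block_size=32):
    """Replay an address trace against a direct-mapped cache."""
    cache = {}            # set index -> tag
    hits = misses = 0
    for addr in trace:
        block = addr // block_size
        index, tag = block % num_sets, block // num_sets
        if cache.get(index) == tag:
            hits += 1
        else:
            misses += 1
            cache[index] = tag   # fill (and implicitly evict)
    return hits / (hits + misses)

print(simulate([0, 32, 0, 4096, 0]))  # hypothetical 5-access trace -> 0.2
```
 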
8. Vergos, H. T., and D. Nikolos. "Efficient fault tolerant cache memory design." Microprocessing and Microprogramming 41, no. 2 (May 1995): 153–69. http://dx.doi.org/10.1016/0165-6074(95)00004-8.

9. Venkatesan, Rangharajan, Vivek J. Kozhikkottu, Mrigank Sharad, Charles Augustine, Arijit Raychowdhury, Kaushik Roy, and Anand Raghunathan. "Cache Design with Domain Wall Memory." IEEE Transactions on Computers 65, no. 4 (April 1, 2016): 1010–24. http://dx.doi.org/10.1109/tc.2015.2506581.

10. Chiueh, Tzi-cker, and P. Pradhan. "Cache memory design for Internet processors." IEEE Micro 20, no. 1 (2000): 28–33. http://dx.doi.org/10.1109/40.820050.

11. Carter, John B., Wilson C. Hsieh, Leigh B. Stoller, Mark Swanson, Lixin Zhang, and Sally A. McKee. "Impulse: Memory System Support for Scientific Applications." Scientific Programming 7, no. 3-4 (1999): 195–209. http://dx.doi.org/10.1155/1999/209416.

Abstract:
Impulse is a new memory system architecture that adds two important features to a traditional memory controller. First, Impulse supports application‐specific optimizations through configurable physical address remapping. By remapping physical addresses, applications control how their data is accessed and cached, improving their cache and bus utilization. Second, Impulse supports prefetching at the memory controller, which can hide much of the latency of DRAM accesses. Because it requires no modification to processor, cache, or bus designs, Impulse can be adopted in conventional systems. In this paper we describe the design of the Impulse architecture, and show how an Impulse memory system can improve the performance of memory‐bound scientific applications. For instance, Impulse decreases the running time of the NAS conjugate gradient benchmark by 67%. We expect that Impulse will also benefit regularly strided, memory‐bound applications of commercial importance, such as database and multimedia programs.
 
12. Dalui, Mamata, and Biplab K. Sikdar. "A Cache System Design for CMPs with Built-In Coherence Verification." VLSI Design 2016 (October 30, 2016): 1–16. http://dx.doi.org/10.1155/2016/8093614.

Abstract:
This work reports an effective design of a cache system for Chip Multiprocessors (CMPs). It introduces built-in logic for verification of cache coherence in CMPs realizing a directory-based protocol. It is developed around the cellular automata (CA) machine, invented by John von Neumann in the 1950s. A special class of CA, referred to as single length cycle 2-attractor cellular automata (TACA), is employed to detect inconsistencies in the cache line states of processors’ private caches. The TACA module captures the coherence status of the CMPs’ cache system and memorizes any inconsistent recording of cache line states during the processors’ references to a memory block. Theory is developed to enable a TACA to analyse the cache state updates and then settle to an attractor state, giving a quick decision on a faulty recording of cache line status. The introduction of segmentation of the CMPs’ processor pool ensures better efficiency in determining the inconsistencies by reducing the number of computation steps in the verification logic. The hardware requirement for the verification logic shows that the overhead of the proposed coherence verification module is much less than that of conventional verification units and is insignificant with respect to the cost of the CMPs’ cache system.
 
13. Liu, Tian, Wei Zhang, Tao Xu, and Guan Wang. "Research and Analysis of Design and Optimization of Magnetic Memory Material Cache Based on STT-MRAM." Key Engineering Materials 815 (August 2019): 28–34. http://dx.doi.org/10.4028/www.scientific.net/kem.815.28.

Abstract:
This paper proposes a cache replacement algorithm for STT-MRAM magnetic memory, which aims to make better use of material systems based on STT-MRAM. The algorithm replaces data blocks in the cache by considering the position of the STT-MRAM magnetic memory head and the hardware characteristics of the STT-MRAM device. This differs from traditional cache replacement algorithms for magnetic memory, which are generally designed to improve the cache through the algorithm alone while ignoring the hardware characteristics of the storage device. The proposed method improves cache lifetime and efficiency by exploiting the material characteristics of the STT-MRAM magnetic memory.
 
14. ANAMIKA, UPADHYAY, SAHU VINAY, KUMAR ROY SUMIT, and SINGH DHARMENDRA. "DESIGN AND IMPLEMENTATION OF CACHE MEMORY WITH FIFO CACHE-CONTROL." i-manager's Journal on Communication Engineering and Systems 7, no. 1 (2018): 16. http://dx.doi.org/10.26634/jcs.7.1.13959.

15. Wang, Shuai, Tao Jin, Chuanlei Zheng, and Guangshan Duan. "Low Power Aging-Aware On-Chip Memory Structure Design by Duty Cycle Balancing." Journal of Circuits, Systems and Computers 25, no. 09 (June 21, 2016): 1650115. http://dx.doi.org/10.1142/s0218126616501152.

Abstract:
The degradation of CMOS devices over their lifetime poses a severe threat to system performance and reliability at deep submicron semiconductor technologies. Negative bias temperature instability (NBTI) is among the most important aging mechanisms. Applying the traditional guardbanding technique to address the decreased speed of devices is too costly. On-chip memory structures, such as register files and on-chip caches, suffer very high NBTI stress. In this paper, we propose an aging-aware design to combat NBTI-induced aging in the integer register files, data caches, and instruction caches of high-performance microprocessors. The proposed aging-aware design mitigates the negative aging effects by balancing the duty cycle ratio of the internal bits in on-chip memory structures. Besides aging, power consumption is also one of the most prominent issues in microprocessor design. Therefore, we further propose applying low-power schemes to different memory structures under the aging-aware design. The proposed low-power aging-aware design also achieves a significant power reduction, which further reduces the temperature and NBTI degradation of the on-chip memory structures. Our experimental results show that our aging-aware design can effectively reduce the NBTI stress with 30.8%, 64.5%, and 72.0% power savings for the integer register file, data cache, and instruction cache, respectively.
 
16. MITTAL, SPARSH, and ZHAO ZHANG. "EnCache: A DYNAMIC PROFILING-BASED RECONFIGURATION TECHNIQUE FOR IMPROVING CACHE ENERGY EFFICIENCY." Journal of Circuits, Systems and Computers 23, no. 10 (October 14, 2014): 1450147. http://dx.doi.org/10.1142/s0218126614501473.

Abstract:
With each CMOS technology generation, leakage energy consumption has increased dramatically, and hence managing the leakage power of large last-level caches (LLCs) has become a critical issue in modern processor design. In this paper, we present EnCache, a novel software-based technique that uses dynamic profiling-based cache reconfiguration to save cache leakage energy. EnCache uses a simple hardware component called a profiling cache, which dynamically predicts the energy efficiency of an application for 32 possible cache configurations. Using these estimates, system software reconfigures the cache to the most energy-efficient configuration. Because EnCache uses dynamic cache reconfiguration, it does not require offline profiling or per-application parameter tuning. Furthermore, EnCache optimizes directly for the energy efficiency of the overall memory subsystem (LLC and main memory) instead of the LLC alone. Experiments performed with an x86-64 simulator and workloads from the SPEC2006 suite confirm that EnCache provides larger energy savings than a conventional energy saving scheme. For single-core and dual-core system configurations, the average savings in memory subsystem energy over a shared baseline configuration are 30.0% and 27.3%, respectively.
 
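The reconfiguration decision EnCache makes is, at its core, an argmin over predicted energies. A toy sketch, with invented numbers standing in for the profiling cache's per-configuration estimates:

```python
# Invented per-configuration energy estimates; EnCache's profiling
# cache produces such predictions online for 32 configurations.
estimates = {  # config -> predicted memory-subsystem energy (joules)
    "2MB_8way": 1.00,
    "1MB_8way": 0.84,
    "1MB_4way": 0.91,
    "512KB_4way": 1.12,  # too small: extra misses burn DRAM energy
}

best = min(estimates, key=estimates.get)
print(best)  # -> "1MB_8way"; system software reconfigures the LLC to it
```
 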
17. Chakraborty, Bidesh, Mamata Dalui, and Biplab K. Sikdar. "Cellular Automata Based Test Design for Coherence Verification in 3D Caches." Journal of Circuits, Systems and Computers 28, no. 09 (August 2019): 1950148. http://dx.doi.org/10.1142/s0218126619501482.

Abstract:
To provide high vertical interconnection density between device tiers, Through-Silicon Vias (TSVs) offer a promising solution in 3D caches, reducing the length of global interconnects and ensuring high-speed cache memory access. Maintaining coherence of shared data in such caches is, however, crucial, and therefore demands that the reliability and accuracy of the TSVs as well as of the cache coherence controller (CC) be ensured. In the current work, we propose an elegant test solution for at-speed detection of stuck-at faults in TSVs (offline test) as well as verification of the functioning of the CC (online test). The proposed test structure is designed around the modular and cascadable structure of Cellular Automata (CA) to achieve a cost-effective realization of test and coherence verification in 3D caches with a high degree of scalability. It ensures correct decisions in more than 71% of cases even if the test hardware is itself subjected to a single stuck-at fault.
 
18. Wang, Baokang. "Design and Implementation of Cache Memory with Dual Unit Tile/Line Accessibility." Mathematical Problems in Engineering 2019 (April 1, 2019): 1–12. http://dx.doi.org/10.1155/2019/9601961.

Abstract:
In recent years, the increasing disparity between the data access speed of caches and the processing speed of processors has become a major bottleneck in achieving high-performance 2-dimensional (2D) data processing, such as that in scientific computing and image processing. To solve this problem, this paper proposes a new dual unit tile/line access cache memory based on a hierarchical hybrid Z-ordering data layout and a multibank cache organization supporting skewed storage schemes. The proposed layout improves 2D data locality, efficiently reducing L1 cache misses and Translation Lookaside Buffer (TLB) misses, and is produced from the conventional raster layout by a simple hardware-based address translation unit. In addition, we propose an aligned tile set replacement algorithm (ATSRA) to reduce the hardware overhead in the tag memory of the proposed cache. Simulation results using Matrix Multiplication (MM) illustrate that the proposed cache with parallel unit tile/line accessibility can considerably reduce both L1 cache and TLB misses as compared with the conventional raster layout and the Z-Morton order layout. The number of parallel load instructions for parallel unit tile/line access was reduced to only about one-fourth of that of conventional load instructions, and the execution time for parallel load instructions was reduced to about one-third of that required for conventional load instructions. Using 40 nm Complementary Metal-Oxide-Semiconductor (CMOS) technology, we combined the proposed cache with a SIMD-based data path and designed a 5 × 5 mm² Large-Scale Integration (LSI) chip. The entire hardware overhead of the proposed ATSRA cache was held to only 105% of that required for a conventional cache by using the ATSRA method.
 
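The hierarchical hybrid Z-ordering layout above builds on Morton order, which interleaves the bits of row and column indices so that nearby 2D tiles become contiguous in memory. A small software version of that address transformation (the paper performs it in a hardware translation unit):

```python
def morton_index(x: int, y: int, bits: int = 16) -> int:
    """Interleave bits of x and y: x bit i -> position 2i, y bit i -> 2i+1."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z

# Elements (0,0), (1,0), (0,1), (1,1) land at offsets 0, 1, 2, 3: one
# 2x2 tile becomes contiguous, improving 2D locality in cache lines.
print([morton_index(x, y) for y in (0, 1) for x in (0, 1)])  # [0, 1, 2, 3]
```
 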
19. Xu, Thomas Can Hao, Pasi Liljeberg, and Hannu Tenhunen. "Exploring DRAM Last Level Cache for 3D Network-on-Chip Architecture." Advanced Materials Research 403-408 (November 2011): 4009–18. http://dx.doi.org/10.4028/www.scientific.net/amr.403-408.4009.

Abstract:
In this paper, we implement and analyze different Network-on-Chip (NoC) designs with Static Random Access Memory (SRAM) and Dynamic Random Access Memory (DRAM) Last Level Caches (LLCs). Different 2D/3D NoCs with SRAM/DRAM are modeled based on state-of-the-art chips. The impact of integrating a DRAM cache into a NoC platform is discussed. We explore the advantages and disadvantages of DRAM cache for NoC in terms of access latency, cache size, area, and power consumption. We present benchmark results using a cycle-accurate full-system simulator based on realistic workloads. Experiments show that, under different workloads, the average cache hit latencies in the two DRAM-based designs increase by 12.53% (2D) and decrease by 27.97% (3D), respectively, compared with SRAM. It is also shown that power consumption is a tradeoff consideration in improving the cache hit latency of a DRAM LLC. Overall, the power consumption of the 3D NoC design with DRAM LLC is reduced by 25.78% compared with the SRAM design. Our analysis and experimental results provide a guideline for designing efficient 3D NoCs with DRAM LLCs.
 
20. Afek, Yehuda, Dave Dice, and Adam Morrison. "Cache index-aware memory allocation." ACM SIGPLAN Notices 46, no. 11 (November 18, 2011): 55–64. http://dx.doi.org/10.1145/2076022.1993486.

21. Joo, Yongsoo, Myeung-Heo Kim, In-Kyu Han, and Sung-Soo Lim. "Cache Simulator Design for Optimizing Write Operations of Nonvolatile Memory Based Caches." IEMEK Journal of Embedded Systems and Applications 11, no. 2 (April 30, 2016): 87–95. http://dx.doi.org/10.14372/iemek.2016.11.2.87.

22. LEE, JE-HOON, and HYUN GUG CHO. "ASYNCHRONOUS INSTRUCTION CACHE MEMORY FOR AVERAGE-CASE PERFORMANCE." Journal of Circuits, Systems and Computers 23, no. 05 (May 8, 2014): 1450063. http://dx.doi.org/10.1142/s0218126614500637.

Abstract:
This paper presents an asynchronous instruction cache memory designed for average-case, rather than worst-case, performance. Even though the proposed instruction cache design is based on a fixed delay model, it achieves high throughput by employing a new memory segmentation technique that divides the cache memory cell arrays into multiple segments. Conventional bit-line memory segmentation divides a whole memory system into multiple segments of the same size. In contrast, we propose a new bit-line segmentation technique in which the cache memory consists of multiple segments that all have the same delay bound. We use resistor-capacitor (R-C) modeling of the bit-line delay for the content addressable memory–random access memory (CAM–RAM) structure in a cache to estimate the total bit-line delay, and then choose the number of segments to trade off throughput against the complexity of the cache system. We synthesized a 128 KB cache memory consisting of 1 to 16 segments using a Hynix 0.35-μm CMOS process. The simulation results show that our implementations with dividing factors of 4 and 16 reduce the average cache access time to 28% and 35%, respectively, of that of the non-segmented counterpart system. They also show that our implementation reduces the average cache access time by 11% and 17% compared to a bit-line segmented cache consisting of the same number of equally sized segments.
 
23. Cha, Sanghoon, Bokyeong Kim, Chang Hyun Park, and Jaehyuk Huh. "Morphable DRAM Cache Design for Hybrid Memory Systems." ACM Transactions on Architecture and Code Optimization 16, no. 3 (August 20, 2019): 1–24. http://dx.doi.org/10.1145/3338505.

24. Wang, Guanda, Yue Zhang, Beibei Zhang, Bi Wu, Jiang Nan, Xueying Zhang, Zhizhong Zhang, et al. "Ultra-Dense Ring-Shaped Racetrack Memory Cache Design." IEEE Transactions on Circuits and Systems I: Regular Papers 66, no. 1 (January 2019): 215–25. http://dx.doi.org/10.1109/tcsi.2018.2866932.

25. German, Steven M. "Formal Design of Cache Memory Protocols in IBM." Formal Methods in System Design 22, no. 2 (March 2003): 133–41. http://dx.doi.org/10.1023/a:1022921522163.

26. Hać, Anna. "Design algorithms for asynchronous operations in cache memory." ACM SIGMETRICS Performance Evaluation Review 16, no. 2-4 (February 1989): 21. http://dx.doi.org/10.1145/1041911.1041914.

27. Sirin, Utku, Pınar Tözün, Danica Porobic, Ahmad Yasin, and Anastasia Ailamaki. "Micro-architectural analysis of in-memory OLTP: Revisited." VLDB Journal 30, no. 4 (March 31, 2021): 641–65. http://dx.doi.org/10.1007/s00778-021-00663-8.

Abstract:
Micro-architectural behavior of traditional disk-based online transaction processing (OLTP) systems has been investigated extensively over the past couple of decades. Results show that traditional OLTP systems mostly under-utilize the available micro-architectural resources. In-memory OLTP systems, on the other hand, process all the data in main-memory and, therefore, can omit the buffer pool. Furthermore, they usually adopt more lightweight concurrency control mechanisms, cache-conscious data structures, and cleaner codebases since they are usually designed from scratch. Hence, we expect significant differences in micro-architectural behavior when running OLTP on platforms optimized for in-memory processing as opposed to disk-based database systems. In particular, we expect that in-memory systems exploit micro-architectural features such as instruction and data caches significantly better than disk-based systems. This paper sheds light on the micro-architectural behavior of in-memory database systems by analyzing and contrasting it to the behavior of disk-based systems when running OLTP workloads. The results show that, despite all the design changes, in-memory OLTP exhibits very similar micro-architectural behavior to disk-based OLTP: more than half of the execution time goes to memory stalls where instruction cache misses or the long-latency data misses from the last-level cache (LLC) are the dominant factors in the overall execution time. Even though ground-up designed in-memory systems can eliminate the instruction cache misses, the reduction in instruction stalls amplifies the impact of LLC data misses. As a result, only 30% of the CPU cycles are used to retire instructions, and 70% of the CPU cycles are wasted to stalls for both traditional disk-based and new generation in-memory OLTP.
 
28. Tripathi, Tripti, D. S. Chauhan, and S. K. Singh. "Trade-off for Leakage Power Reduction in Deep Sub Micron SRAM Design." International Journal of Electrical and Electronics Research 4, no. 4 (December 30, 2016): 110–17. http://dx.doi.org/10.37391/ijeer.090401.

Abstract:
The present-day electronics industry faces the major problem of standby leakage current: as processor speeds increase, there is a requirement for high-speed cache memory. Since SRAM is mainly used for cache memory design, several low-power techniques are applied to SRAM cell design. The full CMOS 6T SRAM cell is the most preferred choice for digital circuits. This paper reviews various leakage power reduction techniques used in the 6T SRAM cell and presents a comparative study of them.
 
29. Bu, Kai, Hai Jun Liu, Hui Xu, and Zhao Lin Sun. "Large Capacity Cache Design Based on Emerging Non-Volatile Memory." Applied Mechanics and Materials 513-517 (February 2014): 918–21. http://dx.doi.org/10.4028/www.scientific.net/amm.513-517.918.

Abstract:
A triple-level-cell (TLC) STT-RAM architecture is proposed based on parallel and serial MLC MTJs. A TLC STT-RAM cell can store three bits, offering higher capacity density than SLC STT-RAM. The write process is also analyzed; it contains three types of basic state transitions. By mapping the soft, medium, and hard domains to three individual cache lines, accesses to the soft lines can perform like accesses to an SLC STT-RAM-based cache. The number of three-step operations is also much reduced.
 
30. Zhang, Tiefei, Jixiang Zhu, Jun Fu, and Tianzhou Chen. "CWC: A Companion Write Cache for Energy-Aware Multi-Level Spin-Transfer Torque RAM Cache Design." Journal of Circuits, Systems and Computers 24, no. 06 (May 26, 2015): 1550079. http://dx.doi.org/10.1142/s0218126615500796.

Abstract:
Due to its large leakage power and low density, conventional SRAM has become less appealing for implementing large on-chip caches because of energy issues. Emerging non-volatile memory technologies, such as phase change memory (PCM) and spin-transfer torque RAM (STT-RAM), have the advantages of low leakage power and high density, which makes them good candidates for on-chip caches. In particular, STT-RAM has longer endurance and shorter access latency than PCM. There are two kinds of STT-RAM so far: single-level cell (SLC) STT-RAM and multi-level cell (MLC) STT-RAM. Compared to SLC STT-RAM, MLC STT-RAM has higher density and lower leakage power, which makes it an even more promising candidate for future on-chip caches. However, MLC STT-RAM improves density at the cost of almost doubled write latency and energy compared to SLC STT-RAM. These drawbacks degrade system performance and diminish the energy benefits. To alleviate these problems, we propose a novel cache organization, the companion write cache (CWC), a small fully associative SRAM cache that works with the main MLC STT-RAM cache in a master-and-servant way. The key function of the CWC is to absorb the energy-consuming write updates from the MLC STT-RAM cache. The experimental results are promising: the CWC greatly reduces the write energy and dynamic energy and improves the performance and endurance of the MLC STT-RAM cache compared to a baseline.
 
31. P, Pratheeksha, and Revathi S. A. "Machine Learning-Based Cache Replacement Policies: A Survey." International Journal of Engineering and Advanced Technology 10, no. 6 (August 30, 2021): 19–22. http://dx.doi.org/10.35940/ijeat.f2907.0810621.

Abstract:
Despite extensive developments in improving cache hit rates, designing an optimal cache replacement policy that mimics Belady’s algorithm remains a challenging task. Existing standard static replacement policies do not adapt to the dynamic nature of memory access patterns, and the diversity of computer programs only exacerbates the problem. Several factors affect the design of a replacement policy, such as hardware upgrades, memory overheads, memory access patterns, and model latency. The amalgamation of a fundamental concept like cache replacement with advanced machine learning algorithms provides surprising results and drives development towards cost-effective solutions. In this paper, we review some of the machine-learning-based cache replacement policies that have outperformed the static heuristics.
 
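Belady's algorithm, the yardstick for the surveyed policies, evicts the resident block whose next use lies farthest in the future; it needs the whole future trace, so it is an offline upper bound rather than an implementable policy. A compact sketch:

```python
def belady_misses(trace, capacity):
    """Count misses under Belady's optimal (MIN) replacement."""
    cache, misses = set(), 0
    for i, block in enumerate(trace):
        if block in cache:
            continue
        misses += 1
        if len(cache) >= capacity:
            # Victim: the cached block reused farthest away (or never).
            def next_use(b):
                try:
                    return trace.index(b, i + 1)
                except ValueError:
                    return float("inf")
            cache.remove(max(cache, key=next_use))
        cache.add(block)
    return misses

print(belady_misses([1, 2, 3, 1, 2, 4, 1], capacity=2))  # -> 5
```
 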
32. Faeq, Mays K., and Safaa S. Omran. "Cache coherency controller for MESI protocol based on FPGA." International Journal of Electrical and Computer Engineering (IJECE) 11, no. 2 (April 1, 2021): 1043. http://dx.doi.org/10.11591/ijece.v11i2.pp1043-1052.

Abstract:
In modern processor manufacturing, more than one processor is placed in the integrated circuit (chip), and each processor is called a core; the new chips are called multi-core processors. This design allows the cores to work simultaneously on more than one job, or all cores to work in parallel on the same job. All cores are similar in design, and each core has its own cache memory, while all cores share the same main memory. So if one core requests a block of data from main memory into its cache, there should be a protocol to declare the status of this block in main memory and in the other cores; this is called cache coherency (or cache consistency) of a multi-core processor. In this paper a special circuit is designed using very high speed integrated circuit hardware description language (VHDL) coding and implemented using the ISE Xilinx software. The protocol used in this design is the modified, exclusive, shared, and invalid (MESI) protocol. Test results, obtained using a test bench, showed that all states of the protocol work correctly.
 
33. Chen, Gang, Kai Huang, Long Cheng, Biao Hu, and Alois Knoll. "Dynamic Partitioned Cache Memory for Real-Time MPSoCs with Mixed Criticality." Journal of Circuits, Systems and Computers 25, no. 06 (March 31, 2016): 1650062. http://dx.doi.org/10.1142/s0218126616500626.

Abstract:
Shared cache interference in multi-core architectures has been recognized as one of the major factors that degrade the predictability of a mixed-critical real-time system. Due to unpredictable cache interference, the behavior of a shared cache is hard to predict and analyze statically in multi-core architectures executing mixed-critical tasks, which not only makes it difficult to estimate the worst-case execution time (WCET) but also introduces significant worst-case timing penalties for critical tasks. Therefore, cache management in mixed-critical multi-core systems has become a challenging task. In this paper, we present a dynamic partitioned cache memory for mixed-critical real-time multi-core systems. In this architecture, critical tasks can dynamically allocate and release cache resources during their execution intervals according to the real-time workload. This dynamic partitioned cache can, on the one hand, provide predictable cache performance for critical tasks. On the other hand, the released cache can be dynamically used by non-critical tasks to improve their average performance. We demonstrate and prototype our system design on an embedded FPGA platform. Measurements from the prototype clearly demonstrate the benefits of the dynamic partitioned cache for mixed-critical real-time multi-core systems.
 
34. Grunwald, Dirk, Benjamin Zorn, and Robert Henderson. "Improving the cache locality of memory allocation." ACM SIGPLAN Notices 28, no. 6 (June 1993): 177–86. http://dx.doi.org/10.1145/173262.155107.

35. Jo, Ok-Rae, and Jung-Hoon Lee. "Design of Cache Memory System for Next Generation CPU." IEMEK Journal of Embedded Systems and Applications 11, no. 6 (December 31, 2016): 353–59. http://dx.doi.org/10.14372/iemek.2016.11.6.353.

36. Ma, Cong, William Tuohy, and David J. Lilja. "Impact of spintronic memory on multicore cache hierarchy design." IET Computers & Digital Techniques 11, no. 2 (January 25, 2017): 51–59. http://dx.doi.org/10.1049/iet-cdt.2015.0190.

37. Luo, Xiao, and Paul Gillard. "A VLSI design for an efficient multiprocessor cache memory." Computers & Electrical Engineering 16, no. 1 (January 1990): 3–20. http://dx.doi.org/10.1016/0045-7906(90)90003-x.

38. Li, Xiaochang, and Zhengjun Zhai. "UHNVM: A Universal Heterogeneous Cache Design with Non-Volatile Memory." Electronics 10, no. 15 (July 22, 2021): 1760. http://dx.doi.org/10.3390/electronics10151760.

Abstract:
During recent decades, non-volatile memory (NVM) has been anticipated to scale up main memory size, improve application performance, and reduce the speed gap between main memory and storage devices, while supporting persistent storage to cope with power outages. However, to fit NVM, all existing DRAM-based applications have to be rewritten by developers. The developer must therefore understand the targeted application code well enough to manually distinguish and store the data fit for NVM. In order to intelligently facilitate NVM deployment for existing legacy applications, we propose a universal heterogeneous cache hierarchy which is able to automatically select and store the appropriate application data in non-volatile memory (UHNVM), without compulsory code understanding. In this article, a program context (PC) technique is proposed in the user space to help UHNVM classify data. Compared to the conventional hot or cold file categories, the PC technique can categorize application data in a fine-grained manner, enabling us to store them either in NVM or on SSDs efficiently for better performance. Our experimental results using a real Optane dual-inline-memory-module (DIMM) card show that our new heterogeneous architecture reduces elapsed times by about 11% compared to the conventional kernel memory configuration without NVM.
 
39. Walden, Candace, Devesh Singh, Meenatchi Jagasivamani, Shang Li, Luyi Kang, Mehdi Asnaashari, Sylvain Dubois, Bruce Jacob, and Donald Yeung. "Monolithically Integrating Non-Volatile Main Memory over the Last-Level Cache." ACM Transactions on Architecture and Code Optimization 18, no. 4 (December 31, 2021): 1–26. http://dx.doi.org/10.1145/3462632.

Abstract:
Many emerging non-volatile memories are compatible with CMOS logic, potentially enabling their integration into a CPU’s die. This article investigates such monolithically integrated CPU–main memory chips. We exploit non-volatile memories employing 3D crosspoint subarrays, such as resistive RAM (ReRAM), and integrate them over the CPU’s last-level cache (LLC). The regular structure of cache arrays enables co-design of the LLC and ReRAM main memory for area efficiency. We also develop a streamlined LLC/main memory interface that employs a single shared internal interconnect for both the cache and main memory arrays, and uses a unified controller to service both LLC and main memory requests. We apply our monolithic design ideas to a many-core CPU by integrating 3D ReRAM over each core’s LLC slice. We find that co-design of the LLC and ReRAM saves 27% of the total LLC–main memory area at the expense of slight increases in delay and energy. The streamlined LLC/main memory interface saves an additional 12% in area. Our simulation results show monolithic integration of CPU and main memory improves performance by 5.3× and 1.7× over HBM2 DRAM for several graph and streaming kernels, respectively. It also reduces the memory system’s energy by 6.0× and 1.7×, respectively. Moreover, we show that the area savings of co-design permits the CPU to have 23% more cores and main memory, and that streamlining the LLC/main memory interface incurs a small 4% performance penalty.
 
40. Pan, Cheng, Xiaolin Wang, Yingwei Luo, and Zhenlin Wang. "Penalty- and Locality-aware Memory Allocation in Redis Using Enhanced AET." ACM Transactions on Storage 17, no. 2 (May 28, 2021): 1–45. http://dx.doi.org/10.1145/3447573.

Abstract:
Due to large data volume and low latency requirements of modern web services, the use of an in-memory key-value (KV) cache often becomes an inevitable choice (e.g., Redis and Memcached). The in-memory cache holds hot data, reduces request latency, and alleviates the load on background databases. Inheriting from the traditional hardware cache design, many existing KV cache systems still use recency-based cache replacement algorithms, e.g., least recently used or its approximations. However, the diversity of miss penalty distinguishes a KV cache from a hardware cache. Inadequate consideration of penalty can substantially compromise space utilization and request service time. KV accesses also demonstrate locality, which needs to be coordinated with miss penalty to guide cache management. In this article, we first discuss how to enhance the existing cache model, the Average Eviction Time model, so that it can adapt to modeling a KV cache. After that, we apply the model to Redis and propose pRedis, Penalty- and Locality-aware Memory Allocation in Redis, which synthesizes data locality and miss penalty, in a quantitative manner, to guide memory allocation and replacement in Redis. At the same time, we also explore the diurnal behavior of a KV store and exploit long-term reuse. We replace the original passive eviction mechanism with an automatic dump/load mechanism, to smooth the transition between access peaks and valleys. Our evaluation shows that pRedis effectively reduces the average and tail access latency with minimal time and space overhead. For both real-world and synthetic workloads, our approach delivers an average of 14.0%∼52.3% latency reduction over a state-of-the-art penalty-aware cache management scheme, Hyperbolic Caching (HC), and shows more quantitative predictability of performance. Moreover, we can obtain even lower average latency (1.1%∼5.5%) when dynamically switching policies between pRedis and HC.
 
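A toy illustration of the penalty-aware idea (not the paper's enhanced AET model): rather than evicting by recency alone, each candidate is scored by miss penalty per unit of idle time, so a long-idle, cheap-to-refetch key is evicted first. All numbers are invented.

```python
import time

def pick_victim(candidates):
    """candidates: (key, last_access_timestamp, miss_penalty_seconds)."""
    now = time.time()
    def score(entry):
        _, last_access, penalty = entry
        idle = max(now - last_access, 1e-9)
        return penalty / idle   # low score: long idle, cheap to refetch
    return min(candidates, key=score)[0]

print(pick_victim([("a", time.time() - 60, 0.5),
                   ("b", time.time() - 5, 0.5)]))  # -> "a": same refetch
# cost, but "a" has been idle far longer, so it is the better victim.
```
 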
41. Mohammad, Khader, Ahsan Kabeer, and Tarek Taha. "On-Chip Power Minimization Using Serialization-Widening with Frequent Value Encoding." VLSI Design 2014 (May 6, 2014): 1–14. http://dx.doi.org/10.1155/2014/801241.

Abstract:
In chip-multiprocessor (CMP) architectures, the L2 cache is shared by the L1 caches of the processor cores, resulting in a high volume of diverse data transfer through the L1-L2 cache bus. High-performance CMP and SoC systems also have a significant amount of data transfer between the on-chip L2 cache and the L3 cache of off-chip memory through the power-expensive off-chip memory bus. This paper addresses the problem of the high power consumption of on-chip data buses, exploring a framework for minimizing memory data bus power consumption. A comprehensive analysis of existing bus power minimization approaches is provided, considering performance, power, and area overhead. A novel approach for reducing the power consumption of the on-chip bus is introduced. In particular, serialization-widening (SW) of the data bus with frequent value encoding (FVE), called the SWE approach, is proposed as the best power-saving approach for the on-chip cache data bus. The experimental results show that the SWE approach with FVE can achieve approximately 54% power savings over a conventional bus for multicore applications using a 64-bit wide data bus in 45 nm technology.
 
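Frequent value encoding, one half of the SWE scheme above, replaces values that recur often on the bus with short dictionary indices so that most transfers move far fewer bits. A minimal software model of the encode/decode bookkeeping, with an assumed three-entry dictionary:

```python
# Assumed dictionary of frequent 64-bit values (all-zeros, all-ones, one).
FREQUENT = [0x0000000000000000, 0xFFFFFFFFFFFFFFFF, 0x0000000000000001]

def encode(value):
    """Return (is_code, payload): a small dictionary index for frequent
    values, or the raw 64-bit value otherwise."""
    if value in FREQUENT:
        return True, FREQUENT.index(value)
    return False, value

def decode(is_code, payload):
    return FREQUENT[payload] if is_code else payload

assert decode(*encode(0)) == 0            # frequent: sent as a 2-bit index
assert decode(*encode(0x1234)) == 0x1234  # infrequent: sent in full
```
 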
42. Eswer, Varuna, and Sanket S. Naik Dessai. "Processor performance metrics analysis and implementation for MIPS using an open source OS." International Journal of Reconfigurable and Embedded Systems (IJRES) 10, no. 2 (July 1, 2021): 137. http://dx.doi.org/10.11591/ijres.v10.i2.pp137-148.

Abstract:
Processor efficiency is important in embedded systems, and the efficiency of the processor depends on the L1 cache and the translation lookaside buffer (TLB). It is necessary to understand L1 cache and TLB performance under varied load during execution on the processor; hence this paper studies performance under varying load and its interaction with the caches on MIPS with an operating system (OS). The proposed implementation counts instruction executions for the respective cache and TLB management events, measuring the events with dedicated software counters, which are used because of the limitations of the hardware counters in the MIPS32. Twenty-seven metrics are considered for analysis, properly identified, and implemented for the performance measurement of the L1 cache and TLB on the MIPS32 processor. The generated data supports future research in compiler tuning, memory management design for OSs, analysis of architectural issues, system benchmarking, scalability, address space analysis, studies of bus communication among processors, workload sharing characterization, and kernel profiling.
 
43. Prete, C. A. "RST cache memory design for a highly coupled multiprocessor system." IEEE Micro 11, no. 2 (April 1991): 16–19. http://dx.doi.org/10.1109/40.76618.

44. Chen, Mei-Chin, Ashish Ranjan, Anand Raghunathan, and Kaushik Roy. "Cache Memory Design With Magnetic Skyrmions in a Long Nanotrack." IEEE Transactions on Magnetics 55, no. 8 (August 2019): 1–9. http://dx.doi.org/10.1109/tmag.2019.2909188.

45. Matsumoto, Akira, Takayuki Nakagawa, Masatoshi Sato, Yasunori Kimura, Kenji Nishida, and Atsuhiro Goto. "Locally parallel cache design based on KL1 memory access characteristics." New Generation Computing 9, no. 2 (June 1991): 149–69. http://dx.doi.org/10.1007/bf03037641.

46. Hsieh, Tong-Yu, Chih-Hao Wang, Tsung-Liang Chih, and Ya-Hsiu Chi. "A Performance Degradation Tolerable Cache Design by Exploiting Memory Hierarchies." IEEE Transactions on Very Large Scale Integration (VLSI) Systems 24, no. 2 (February 2016): 784–88. http://dx.doi.org/10.1109/tvlsi.2015.2410218.

47. Khatwal, Ravi, and Manoj Kumar Jain. "An Integrated Architectural Clock Implemented Memory Design for Embedded System." International Journal of Reconfigurable and Embedded Systems (IJRES) 4, no. 2 (July 1, 2015): 129. http://dx.doi.org/10.11591/ijres.v4.i2.pp129-141.

Abstract:
Low-power custom memory design has recently become a major issue for embedded designers. The Microwind and Xilinx simulators perform efficient cache simulation and achieve high performance with low power consumption. SRAM efficiency is analyzed with a 6-T architecture design, and the simulation performance is analyzed for a specific application. We have implemented a clock-based memory architecture design and analyzed the internal clock efficiency for SRAM. The architectural clock-implemented memory design reduces access time and propagation delay for embedded devices. Internal semiconductor material improvements increase simulation performance, and these designs are implemented for application-specific architectures.
 
48. KIM, YOONJIN. "POWER-EFFICIENT CONFIGURATION CACHE STRUCTURE FOR COARSE-GRAINED RECONFIGURABLE ARCHITECTURE." Journal of Circuits, Systems and Computers 22, no. 03 (March 2013): 1350001. http://dx.doi.org/10.1142/s0218126613500011.

Abstract:
Coarse-grained reconfigurable architectures (CGRAs) require many processing elements (PEs) and a configuration memory unit (configuration cache) for reconfiguration of the PE array. Although this structure is meant for high performance and flexibility, it consumes significant power. In particular, the power consumed by the configuration cache is an explicit overhead compared to other types of IP cores. Reducing configuration cache power is crucial for CGRAs to be more competitive and reliable processing cores in embedded systems. In this paper, I propose a power-efficient configuration cache structure based on two design schemes: a reusable context pipelining (RCP) architecture to reduce the power overhead caused by reconfiguration, and a dynamic context management strategy for power saving in the configuration cache. This power-efficient approach works without degrading the performance and flexibility of the CGRA. Experimental results show that the proposed approach saves 56.50%/86.84% of the average power in write/read operations of the configuration cache compared to the previous design.
 
49. Wei, Xingda, Rong Chen, Haibo Chen, and Binyu Zang. "XStore: Fast RDMA-Based Ordered Key-Value Store Using Remote Learned Cache." ACM Transactions on Storage 17, no. 3 (August 31, 2021): 1–32. http://dx.doi.org/10.1145/3468520.

Abstract:
RDMA (Remote Direct Memory Access) has gained considerable interest in network-attached in-memory key-value stores. However, traversing the remote tree-based index in ordered key-value stores with RDMA becomes a critical obstacle, causing an order-of-magnitude slowdown and limited scalability due to multiple round trips. Using an index cache with conventional wisdom (caching partial data and traversing it locally) usually has limited effect because of unavoidable capacity misses, massive random accesses, and costly cache invalidations. We argue that a machine learning (ML) model is a perfect cache structure for the tree-based index, termed a learned cache. Based on it, we design and implement XStore, an RDMA-based ordered key-value store with a new hybrid architecture that retains a tree-based index at the server to perform dynamic workloads (e.g., inserts) and leverages a learned cache at the client to perform static workloads (e.g., gets and scans). The key idea is to decouple ML model retraining from index updating by maintaining a layer of indirection from logical to actual positions of key-value pairs. This allows a stale learned cache to continue predicting a correct position for a lookup key. XStore ensures correctness using a validation mechanism with a fallback path and further uses speculative execution to minimize the cost of cache misses. Evaluations with YCSB benchmarks and production workloads show that a single XStore server can achieve over 80 million read-only requests per second. This number outperforms state-of-the-art RDMA-based ordered key-value stores (namely, DrTM-Tree, Cell, and eRPC+Masstree) by up to 5.9× (from 3.7×). For workloads with inserts, XStore still provides up to 3.5× (from 2.7×) throughput speedup, achieving 53M reqs/s. The learned cache also reduces client-side memory usage and provides an efficient memory-performance tradeoff, e.g., saving 99% of memory at the cost of 20% of peak throughput.
 
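The learned-cache idea can be sketched in a few lines: a model predicts a key's position in the server's sorted leaf array within a known error bound, the client fetches just that window (one bounded RDMA read in XStore), and a miss falls back to the server-side tree. The linear model and in-process arrays below are illustrative assumptions, not XStore's actual models or RDMA path.

```python
import bisect

class LearnedCache:
    def __init__(self, keys):
        self.keys = sorted(keys)            # stand-in for server leaves
        n, span = len(self.keys), self.keys[-1] - self.keys[0]
        self.slope = (n - 1) / span         # fit: position ~ linear(key)
        self.base = self.keys[0]
        self.err = max(abs(i - self._predict(k))
                       for i, k in enumerate(self.keys))

    def _predict(self, key):
        return round((key - self.base) * self.slope)

    def lookup(self, key):
        p = self._predict(key)
        lo = max(0, p - self.err)
        hi = min(len(self.keys), p + self.err + 1)
        window = self.keys[lo:hi]           # one bounded "RDMA read"
        j = bisect.bisect_left(window, key)
        if j < len(window) and window[j] == key:
            return lo + j
        return None  # fallback path: ask the server-side tree

cache = LearnedCache([10, 20, 30, 50, 90])
print(cache.lookup(50))  # -> 3
```
 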
50. Harty, Kieran, and David R. Cheriton. "Application-controlled physical memory using external page-cache management." ACM SIGPLAN Notices 27, no. 9 (September 1992): 187–97. http://dx.doi.org/10.1145/143371.143511.
