Academic literature on the topic 'Computer architecture. Cache memory'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Computer architecture. Cache memory.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Computer architecture. Cache memory"

1

Drach, N., A. Gefflaut, P. Joubert, and A. Seznec. "About Cache Associativity in Low-Cost Shared Memory Multi-Microprocessors." Parallel Processing Letters 5, no. 3 (September 1995): 475–87. http://dx.doi.org/10.1142/s0129626495000436.

Abstract:
Sizes of on-chip caches on current commercial microprocessors range from 16 Kbytes to 36 Kbytes. These microprocessors can be used directly in the design of a low-cost single-bus shared-memory multiprocessor without any second-level cache. In this paper, we explore the viability of such a multi-microprocessor. Simulation results clearly establish that the performance of such a system will be quite poor if the on-chip caches are direct-mapped. On the other hand, when the on-chip caches are partially associative, the achieved level of performance is quite promising. In particular, two recently proposed innovative cache structures, the skewed-associative cache organization and the semi-unified cache organization, are shown to work well.
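To make the associativity trade-off in this abstract concrete, here is a minimal C sketch contrasting a direct-mapped set index with a skewed-associative one. The cache geometry and the XOR-based skewing function are illustrative assumptions, not the functions from the paper.

#include <stdint.h>

#define NSETS     256   /* sets per way (assumed, power of two) */
#define LINE_BITS 5     /* 32-byte cache lines (assumed)        */

/* Direct-mapped: an address maps to exactly one set, so two hot
 * addresses sharing the same index bits evict each other forever. */
static uint32_t dm_index(uint32_t addr)
{
    return (addr >> LINE_BITS) & (NSETS - 1);
}

/* Skewed-associative: each way hashes the address differently, so
 * addresses that conflict in one way usually do not conflict in
 * another. This XOR mix is a stand-in for the paper's functions. */
static uint32_t skew_index(uint32_t addr, unsigned way)
{
    uint32_t set = (addr >> LINE_BITS) & (NSETS - 1);
    uint32_t tag = addr >> (LINE_BITS + 8);   /* 8 = log2(NSETS) */
    return (set ^ (tag >> (3 * way)) ^ (tag << way)) & (NSETS - 1);
}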
2

Alves, Marco A. Z., Henrique C. Freitas, and Philippe O. A. Navaux. "High Latency and Contention on Shared L2-Cache for Many-Core Architectures." Parallel Processing Letters 21, no. 1 (March 2011): 85–106. http://dx.doi.org/10.1142/s0129626411000096.

Abstract:
Several studies point out the benefits of a shared L2 cache, but other properties of shared caches must also be considered to reach a thorough understanding of all chip multiprocessor (CMP) bottlenecks. Our paper evaluates and explains shared cache bottlenecks, which are very important considering the rise of many-core processors. The results of our simulations with 32 cores show low performance when the L2 cache is shared between 2 or 4 cores. In these two cases, the increases in L2 cache latency and contention are the main causes of the longer execution time.
3

Struharik, Rastislav, and Vuk Vranjković. "Striping input feature map cache for reducing off-chip memory traffic in CNN accelerators." Telfor Journal 12, no. 2 (2020): 116–21. http://dx.doi.org/10.5937/telfor2002116s.

Abstract:
Data movement between Convolutional Neural Network (CNN) accelerators and off-chip memory is a critical contributor to overall power consumption, and minimizing power consumption is particularly important for low-power embedded applications. Specific CNN compute patterns offer a possibility of significant data reuse, leading to the idea of specialized on-chip cache memories that enable a significant improvement in power consumption. However, due to the unique caching pattern present within CNNs, standard cache memories would not be efficient. In this paper, a novel on-chip cache memory architecture based on the idea of input feature map striping is proposed, which requires significantly less on-chip memory than previously proposed solutions. Experimental results show that the proposed cache architecture can reduce on-chip memory size by a factor of 16 or more, while increasing power consumption by no more than 15%, compared to some of the previously proposed solutions.
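As a rough illustration of why striping the input feature map (IFM) saves on-chip memory, note that a convolution consuming the IFM row by row only needs about a kernel-height's worth of rows resident at once. The C sketch below compares the two buffer sizes; the dimensions in the closing comment are assumptions for illustration, not figures from the paper.

#include <stddef.h>

/* On-chip bytes needed to cache the whole input feature map. */
static size_t full_ifm_bytes(size_t width, size_t height,
                             size_t channels, size_t elem_bytes)
{
    return width * height * channels * elem_bytes;
}

/* On-chip bytes for a stripe of k rows (k = kernel height),
 * refilled as the convolution window slides down the map. */
static size_t stripe_bytes(size_t width, size_t k,
                           size_t channels, size_t elem_bytes)
{
    return width * k * channels * elem_bytes;
}

/* Example (assumed sizes): a 224x224x64 fp16 IFM needs ~6.4 MB
 * when cached whole, but only ~86 KB as a 3-row stripe. */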
4

Charrier, Dominic E., Benjamin Hazelwood, Ekaterina Tutlyaeva, Michael Bader, Michael Dumbser, Andrey Kudryavtsev, Alexander Moskovsky, and Tobias Weinzierl. "Studies on the energy and deep memory behaviour of a cache-oblivious, task-based hyperbolic PDE solver." International Journal of High Performance Computing Applications 33, no. 5 (April 15, 2019): 973–86. http://dx.doi.org/10.1177/1094342019842645.

Abstract:
We study the performance behaviour of a seismic simulation using the ExaHyPE engine, with a specific focus on memory characteristics and energy needs. ExaHyPE combines dynamically adaptive mesh refinement (AMR) with ADER-DG. It is parallelized using tasks, and it is cache efficient. AMR plus ADER-DG yields a task graph which is highly dynamic in nature and comprises both arithmetically expensive tasks and tasks which challenge the memory's latency. The expensive tasks, and thus the whole code, benefit from AVX vectorization, although we suffer from memory access bursts. A frequency reduction of the chip improves the code's energy-to-solution, yet it does not mitigate burst effects. The bursts' latency penalty becomes worse once we add Intel Optane technology, increase the core count significantly, or make individual, computationally heavy tasks fall out of close caches. Thread overbooking to hide these latency penalties becomes counterproductive with noninclusive caches, as it destroys the cache and vectorization character. In cases where memory-intense and computationally expensive tasks overlap, ExaHyPE's cache-oblivious implementation can nevertheless exploit deep, noninclusive, heterogeneous memory effectively, as main memory misses arise infrequently and slow down only a few cores. We thus propose that upcoming supercomputing simulation codes with dynamic, inhomogeneous task graphs be actively supported by thread runtimes in intermixing tasks of different compute character, and that future hardware actively allow codes to downclock the cores running particular task types.
5

Kaplow, Wesley K., and Boleslaw K. Szymanski. "Compile-Time Cache Performance Prediction and Its Application to Tiling." Parallel Processing Letters 07, no. 04 (December 1997): 393–407. http://dx.doi.org/10.1142/s0129626497000395.

Abstract:
Tiling has been used by parallelizing compilers to define fine-grain parallel tasks and to optimize cache performance. In this paper we present a novel compile-time technique, called miss-driven cache simulation, for determining loop tile sizes that achieve the highest cache hit-rate. The widening disparity between a processor's peak instruction rate and main memory access time in modern computer systems makes this kind of optimization increasingly important for overall program efficiency. Our simulation technique generates only those references of a loop nest that may generate a cache memory miss and processes them on an architecturally accurate cache model at compile-time. Processing only a small portion of the memory reference trace of a program yields simulation speeds in the millions of memory references per second on workstations, with statistics of misses per reference and inter-reference interference counts gathered at the same time. These simulation speeds and statistics allow for the accurate analysis of the impact of cache optimizations at compile-time. We discuss the results of applying this method to guide loop tiling for such commonly used computational kernels as matrix multiplication and Jacobi iteration for various cache parameters.
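The loop tiling this abstract refers to can be made concrete with a short C sketch. The matrix and tile sizes below are assumptions for illustration; picking the tile size B so that the working set fits in cache is exactly the decision a compile-time technique such as miss-driven cache simulation is meant to automate.

#include <stddef.h>

#define N 512
#define B 32   /* tile size: three 32x32 double tiles, ~24 KB */

/* Tiled matrix multiplication, C += A * Bm. Each (ii,kk,jj) step
 * works on BxB tiles that stay resident in cache, so each element
 * is reused many times before being evicted. C must be zeroed
 * by the caller. */
void matmul_tiled(double A[N][N], double Bm[N][N], double C[N][N])
{
    for (size_t ii = 0; ii < N; ii += B)
        for (size_t kk = 0; kk < N; kk += B)
            for (size_t jj = 0; jj < N; jj += B)
                for (size_t i = ii; i < ii + B; i++)
                    for (size_t k = kk; k < kk + B; k++)
                        for (size_t j = jj; j < jj + B; j++)
                            C[i][j] += A[i][k] * Bm[k][j];
}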
6

Wyland, David C. "Cache tag RAM chips simplify cache memory design." Microprocessors and Microsystems 14, no. 1 (January 1990): 47–57. http://dx.doi.org/10.1016/0141-9331(90)90013-l.

7

Gan, Xin Biao, Li Shen, Quan Yuan Tan, Cong Liu, and Zhi Ying Wang. "Performance Evaluation and Optimization on GPU." Advanced Materials Research 219-220 (March 2011): 1445–49. http://dx.doi.org/10.4028/www.scientific.net/amr.219-220.1445.

Abstract:
With hundreds of cores, GPUs provide higher peak performance than their CPU counterparts. However, it is a big challenge to take full advantage of their computing power. In order to understand the performance bottlenecks of applications on many-core GPUs and then optimize parallel programs on GPU architectures, we propose a performance evaluation model based on the memory wall and then classify applications into AbM (Application bound-in Memory) and AbC (Application bound-in Computing). Furthermore, we optimize kernels characterized by low memory bandwidth, including matrix multiplication and FFT (Fast Fourier Transform), by employing the texture cache on an NVIDIA GTX280 using CUDA (Compute Unified Device Architecture). Experimental results show that the texture cache is helpful for AbM with better data locality, so it is critical to utilize the GPU memory hierarchy efficiently for performance improvement.
8

Dalui, Mamata, and Biplab K. Sikdar. "A Cache System Design for CMPs with Built-In Coherence Verification." VLSI Design 2016 (October 30, 2016): 1–16. http://dx.doi.org/10.1155/2016/8093614.

Abstract:
This work reports an effective design of a cache system for Chip Multiprocessors (CMPs). It introduces built-in logic for verification of cache coherence in CMPs realizing a directory-based protocol. The design is developed around the cellular automata (CA) machine invented by John von Neumann in the 1950s. A special class of CA, referred to as single-length-cycle 2-attractor cellular automata (TACA), is employed to detect inconsistencies in the cache line states of the processors' private caches. The TACA module captures the coherence status of the CMP's cache system and memorizes any inconsistent recording of cache line states during the processors' references to a memory block. Theory has been developed to empower a TACA to analyse the cache state updates and then settle to an attractor state, indicating a quick decision on a faulty recording of cache line status. Segmenting the CMP's processor pool ensures better efficiency in determining the inconsistencies by reducing the number of computation steps in the verification logic. The hardware requirement for the verification logic shows that the overhead of the proposed coherence verification module is much less than that of conventional verification units and is insignificant with respect to the cost of the CMP's cache system.
9

Mohammad, Khader, Ahsan Kabeer, and Tarek Taha. "On-Chip Power Minimization Using Serialization-Widening with Frequent Value Encoding." VLSI Design 2014 (May 6, 2014): 1–14. http://dx.doi.org/10.1155/2014/801241.

Abstract:
In a chip-multiprocessor (CMP) architecture, the L2 cache is shared by the L1 caches of the processor cores, resulting in a high volume of diverse data transfer through the L1-L2 cache bus. High-performance CMP and SoC systems also transfer a significant amount of data between the on-chip L2 cache and the L3 cache of off-chip memory through the power-expensive off-chip memory bus. This paper addresses the problem of the high power consumption of on-chip data buses, exploring a framework for minimizing memory data bus power consumption. A comprehensive analysis of existing bus power minimization approaches is provided, based on performance, power, and area overhead considerations. A novel approach for reducing the power consumption of the on-chip bus is introduced. In particular, serialization-widening (SW) of the data bus with frequent value encoding (FVE), called the SWE approach, is proposed as the best power-saving approach for the on-chip cache data bus. The experimental results show that the SWE approach with FVE can achieve approximately 54% power savings over a conventional bus for multicore applications using a 64-bit wide data bus in 45 nm technology.
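Frequent value encoding can be sketched in a few lines of C: values that appear often on the bus are transmitted as a short one-hot code instead of the full word, so far fewer bus lines toggle. The table contents, table size, and struct layout below are illustrative assumptions, not the paper's design.

#include <stdbool.h>
#include <stdint.h>

#define FV_ENTRIES 8   /* assumed table size */

/* A small table of values assumed to appear frequently on the bus. */
static const uint64_t fv_table[FV_ENTRIES] = {
    0x0000000000000000ull, 0x0000000000000001ull,
    0xFFFFFFFFFFFFFFFFull, 0x0000000000000004ull,
    0x0000000000000008ull, 0x0000000000000010ull,
    0x00000000FFFFFFFFull, 0xFFFFFFFF00000000ull,
};

typedef struct {
    bool     encoded;  /* extra control line: 1 = FVE hit          */
    uint64_t payload;  /* one-hot index on a hit, raw word on miss */
} bus_word;

/* Encode one word for transmission over the data bus. On a hit,
 * only one payload line is driven high; on a miss the raw word
 * is sent unchanged. */
static bus_word fve_encode(uint64_t word)
{
    for (int i = 0; i < FV_ENTRIES; i++)
        if (fv_table[i] == word)
            return (bus_word){ .encoded = true, .payload = 1ull << i };
    return (bus_word){ .encoded = false, .payload = word };
}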
10

Chong, Frederic T., and Anant Agarwal. "Shared Memory versus Message Passing for Iterative Solution of Sparse, Irregular Problems." Parallel Processing Letters 9, no. 1 (March 1999): 159–70. http://dx.doi.org/10.1142/s0129626499000177.

Abstract:
The benefits of hardware support for shared memory versus those for message passing are difficult to evaluate without an in-depth study of real applications on a common platform. We evaluate the communication mechanisms of the MIT Alewife machine, a multiprocessor which provides integrated cache-coherent shared memory, message passing, and DMA. We perform this evaluation with "best-effort" implementations which solve several sparse, irregular benchmark problems with a preconditioned conjugate gradient sparse matrix solver (ICCG). We find that machines with fast global memory operations do not need message passing or bulk transfer to support our irregular problems. This is primarily due to three reasons. First, a 5-to-1 ratio between global and local cache misses makes memory copies in bulk communication expensive relative to communication via shared memory. Second, although message passing has synchronization semantics superior to shared memory for data-driven computation, efficient shared memory can overcome this handicap by using global read-modify-writes to change from the traditional owner-computes model to a producer-computes model. Third, bulk transfers can result in high processor idle times in irregular applications.

Dissertations / Theses on the topic "Computer architecture. Cache memory"

1

Gieske, Edmund Joseph. "Critical Words Cache Memory." University of Cincinnati / OhioLINK, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1208368190.

2

Sorenson, Elizabeth S. "Cache characterization and performance studies using locality surfaces." Diss., Brigham Young University, 2005. http://contentdm.lib.byu.edu/ETD/image/etd950.pdf.

3

Kim, Donglok. "Extended data cache prefetching using a reference prediction table." Thesis, University of Washington, 1997. http://hdl.handle.net/1773/6127.

4

Bani, Ruchi Rastogi. "A new N-way reconfigurable data cache architecture for embedded systems." Denton, Tex.: University of North Texas, 2009. http://digital.library.unt.edu/ark:/67531/metadc12079.

5

Bani, Ruchi Rastogi. "A New N-way Reconfigurable Data Cache Architecture for Embedded Systems." Thesis, University of North Texas, 2009. https://digital.library.unt.edu/ark:/67531/metadc12079/.

Abstract:
Performance and power consumption are among the most important issues in designing embedded systems. Several studies have shown that cache memory consumes about 50% of the total power in these systems; thus, the architecture of the cache governs both the performance and the power usage of embedded systems. A new N-way reconfigurable data cache is proposed especially for embedded systems. This thesis explores the issues and design considerations involved in designing a reconfigurable cache. The proposed reconfigurable data cache architecture can be configured as direct-mapped, two-way, or four-way set associative using a mode selector. The module has been designed and simulated in Xilinx ISE 9.1i and ModelSim SE 6.3e using the Verilog hardware description language.
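The thesis implements its design in Verilog; purely as a behavioral illustration, a C sketch of how a mode selector could change the set index and tag computation might look like the following. The cache geometry is an assumption for illustration, not the thesis's configuration.

#include <stdint.h>

#define TOTAL_LINES 1024   /* assumed line count (power of two) */
#define LINE_BITS   5      /* 32-byte lines (assumed)           */

typedef enum { MODE_DM = 1, MODE_2WAY = 2, MODE_4WAY = 4 } cache_mode;

/* Fewer ways mean more sets, so the index field widens and the
 * tag field shrinks; the mode selector picks the split at runtime. */
static uint32_t set_index(uint32_t addr, cache_mode ways)
{
    uint32_t sets = TOTAL_LINES / ways;   /* 1024, 512, or 256 */
    return (addr >> LINE_BITS) & (sets - 1);
}

static uint32_t tag_of(uint32_t addr, cache_mode ways)
{
    uint32_t sets = TOTAL_LINES / ways;
    uint32_t index_bits = 0;
    while ((1u << index_bits) < sets)     /* log2(sets) */
        index_bits++;
    return addr >> (LINE_BITS + index_bits);
}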
6

Beg, Azam Muhammad. "Improving instruction fetch rate with code pattern cache for superscalar architecture." Diss., Mississippi State : Mississippi State University, 2005. http://library.msstate.edu/etd/show.asp?etd=etd-06202005-103032.

7

Sohoni, Sohum. "Improving L2 Cache Performance through Stream-Directed Optimizations." University of Cincinnati / OhioLINK, 2004. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1092932892.

8

Janapsatya, Andhi. "Optimization of instruction memory for embedded systems." Awarded by: University of New South Wales, School of Computer Science and Engineering, 2005. http://handle.unsw.edu.au/1959.4/24210.

Abstract:
This thesis presents methodologies for improving system performance and energy consumption by optimizing memory hierarchy performance. The processor-memory performance gap is a well-known problem that is predicted to get worse as the gap between processor and memory speeds widens. The author describes a method to estimate the best L1 cache configuration for a given application. In addition, three methods are presented to improve performance and reduce energy in embedded systems by optimizing the instruction memory. Performance estimation is an important procedure for assessing the performance of a system and the effectiveness of any applied optimizations. A cache memory performance estimation methodology is presented in this thesis, designed to quickly and accurately estimate the performance of multiple cache memory configurations. Experimental results showed that the methodology is on average 45 times faster than a widely used tool (Dinero IV). The first optimization method, a software-only technique called code placement, was implemented to improve the performance of the instruction cache. It involves careful placement of code within memory to ensure a high cache hit rate when code is brought into the cache. Experimental results show that applying code placement reduces the cache miss rate by up to 71% and energy consumption by up to 63% compared to the application without code placement. The second method involves a novel architecture for utilizing scratchpad memory, designed as a replacement for the instruction cache. A hardware modification allows data to be written into the scratchpad memory during program execution, giving dynamic control of the scratchpad memory content. Scratchpad memory has a faster access time and a lower energy consumption per access than cache memory, so its use aims to improve performance and lower the energy consumption of the system. Experimental results show an average energy reduction of 26.59% and an average performance improvement of 25.63% compared to a system with cache memory. The third method is an application profiling technique that uses statistical information to identify an application's hot-spots. Application profiling is important for identifying sections of the application where performance degradation might occur and/or where the maximum performance gain can be obtained through optimization. The method was applied and tested on the scratchpad-based system described in this thesis. Experimental results show the effectiveness of the analysis method in reducing energy and improving performance when compared to the previous method of utilizing the scratchpad-memory-based system (average performance improvement of 23.6% and average energy reduction of 27.1%).
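A trace-driven estimator like the one the abstract benchmarks against (Dinero IV) can be reduced to a very small core. The C sketch below counts hits and misses for a single direct-mapped configuration under assumed geometry; the thesis's contribution is estimating many configurations much faster than this brute-force approach.

#include <stdint.h>

#define SETS      512   /* assumed: 512 sets            */
#define LINE_BITS 5     /* assumed: 32-byte cache lines */

static uint32_t tags[SETS];
static uint8_t  valid[SETS];
static unsigned long hits, misses;

/* Feed one memory reference from the trace through a direct-mapped
 * cache model, updating the hit/miss counters. */
static void cache_access(uint32_t addr)
{
    uint32_t set = (addr >> LINE_BITS) & (SETS - 1);
    uint32_t tag = addr >> LINE_BITS;

    if (valid[set] && tags[set] == tag) {
        hits++;
    } else {                 /* miss: fill the line */
        misses++;
        tags[set]  = tag;
        valid[set] = 1;
    }
}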
9

Zhang, Xiushan. "L2 cache replacement based on inter-access time per access count prediction." Diss., 2009. Online access via UMI.

10

Elver, Marco Iskender. "Memory consistency directed cache coherence protocols for scalable multiprocessors." Thesis, University of Edinburgh, 2016. http://hdl.handle.net/1842/22073.

Abstract:
The memory consistency model, which formally specifies the behavior of the memory system, is used by programmers to reason about parallel programs. From a hardware design perspective, weaker consistency models permit various optimizations in a multiprocessor system: this thesis focuses on designing and optimizing the cache coherence protocol for a given target memory consistency model. Traditional directory coherence protocols are designed to be compatible with the strictest memory consistency model, sequential consistency (SC). When they are used for chip multiprocessors (CMPs) that provide more relaxed memory consistency models, such protocols turn out to be unnecessarily strict. Usually, this comes at the cost of scalability, in terms of per-core storage due to sharer tracking, which poses a problem with increasing number of cores in today’s CMPs, most of which no longer are sequentially consistent. The recent convergence towards programming language based relaxed memory consistency models has sparked renewed interest in lazy cache coherence protocols. These protocols exploit synchronization information by enforcing coherence only at synchronization boundaries via self-invalidation. As a result, such protocols do not require sharer tracking which benefits scalability. On the downside, such protocols are only readily applicable to a restricted set of consistency models, such as Release Consistency (RC), which expose synchronization information explicitly. In particular, existing architectures with stricter consistency models (such as x86) cannot readily make use of lazy coherence protocols without either: adapting the protocol to satisfy the stricter consistency model; or changing the architecture’s consistency model to (a variant of) RC, typically at the expense of backward compatibility. The first part of this thesis explores both these options, with a focus on a practical approach satisfying backward compatibility. Because of the wide adoption of Total Store Order (TSO) and its variants in x86 and SPARC processors, and existing parallel programs written for these architectures, we first propose TSO-CC, a lazy cache coherence protocol for the TSO memory consistency model. TSO-CC does not track sharers and instead relies on self-invalidation and detection of potential acquires (in the absence of explicit synchronization) using per cache line timestamps to efficiently and lazily satisfy the TSO memory consistency model. Our results show that TSO-CC achieves, on average, performance comparable to a MESI directory protocol, while TSO-CC’s storage overhead per cache line scales logarithmically with increasing core count. Next, we propose an approach for the x86-64 architecture, which is a compromise between retaining the original consistency model and using a more storage efficient lazy coherence protocol. First, we propose a mechanism to convey synchronization information via a simple ISA extension, while retaining backward compatibility with legacy codes and older microarchitectures. Second, we propose RC3 (based on TSO-CC), a scalable cache coherence protocol for RCtso, the resulting memory consistency model. RC3 does not track sharers and relies on self-invalidation on acquires. To satisfy RCtso efficiently, the protocol reduces self-invalidations transitively using per-L1 timestamps only. RC3 outperforms a conventional lazy RC protocol by 12%, achieving performance comparable to a MESI directory protocol for RC optimized programs.
RC3’s storage overhead per cache line scales logarithmically with increasing core count and reduces on-chip coherence storage overheads by 45% compared to TSO-CC. Finally, it is imperative that hardware adheres to the promised memory consistency model. Indeed, consistency directed coherence protocols cannot use conventional coherence definitions (e.g. SWMR) to be verified against, and few existing verification methodologies apply. Furthermore, as the full consistency model is used as a specification, their interaction with other components (e.g. pipeline) of a system must not be neglected in the verification process. Therefore, verifying a system with such protocols in the context of interacting components is even more important than before. One common way to do this is via executing tests, where specific threads of instruction sequences are generated and their executions are checked for adherence to the consistency model. It would be extremely beneficial to execute such tests under simulation, i.e. when the functional design implementation of the hardware is being prototyped. Most prior verification methodologies, however, target post-silicon environments, which when used for simulation-based memory consistency verification would be too slow. We propose McVerSi, a test generation framework for fast memory consistency verification of a full-system design implementation under simulation. Our primary contribution is a Genetic Programming (GP) based approach to memory consistency test generation, which relies on a novel crossover function that prioritizes memory operations contributing to non-determinism, thereby increasing the probability of uncovering memory consistency bugs. To guide tests towards exercising as much logic as possible, the simulator’s reported coverage is used as the fitness function. Furthermore, we increase test throughput by making the test workload simulation-aware. We evaluate our proposed framework using the Gem5 cycle accurate simulator in full-system mode with Ruby (with configurations that use Gem5’s MESI protocol, and our proposed TSO-CC together with an out-of-order pipeline). We discover 2 new bugs in the MESI protocol due to the faulty interaction of the pipeline and the cache coherence protocol, highlighting that even conventional protocols should be verified rigorously in the context of a full-system. Crucially, these bugs would not have been discovered through individual verification of the pipeline or the coherence protocol. We study 11 bugs in total. Our GP-based test generation approach finds all bugs consistently, therefore providing much higher guarantees compared to alternative approaches (pseudo-random test generation and litmus tests).
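A heavily simplified picture of the timestamp-based self-invalidation that TSO-CC relies on: instead of a directory invalidating sharers, each cache invalidates its own potentially stale lines when it detects a potential acquire. The data structures and the single synchronization timestamp below are illustrative assumptions; the real protocol is far more refined.

#include <stdbool.h>
#include <stdint.h>

#define L1_LINES 1024   /* assumed L1 capacity in lines */

typedef struct {
    bool     valid;
    uint32_t fill_ts;   /* timestamp when the line was filled */
} line_meta;

static line_meta l1[L1_LINES];
static uint32_t  last_sync_ts;   /* most recent synchronization seen */

/* On a potential acquire that observes a newer timestamp, lines
 * filled before that point may be stale and are self-invalidated,
 * so no per-line sharer tracking is needed. */
static void on_potential_acquire(uint32_t observed_ts)
{
    if (observed_ts <= last_sync_ts)
        return;                         /* nothing newer observed */
    last_sync_ts = observed_ts;
    for (int i = 0; i < L1_LINES; i++)
        if (l1[i].valid && l1[i].fill_ts < observed_ts)
            l1[i].valid = false;        /* re-fetched on next use */
}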

Books on the topic "Computer architecture. Cache memory"

1

Balasubramonian, Rajeev. Multi-core cache hierarchies. San Rafael, CA: Morgan & Claypool, 2011.

2

Agarwal, Anant. Analysis of cache performance for operating systems and multiprogramming. Boston: Kluwer Academic Publishers, 1989.

3

Association for Computing Machinery and IEEE Computer Society, eds. ASPLOS-VII proceedings: Seventh International Conference on Architectural Support for Programming Languages and Operating Systems, Cambridge, Massachusetts, October 1-5, 1996. New York: Association for Computing Machinery, 1996.

4

Stallings, William. Computer organization and architecture: Designing for performance. 7th ed. Upper Saddle River, NJ: Pearson Prentice Hall, 2006.

5

Stallings, William. Computer organization and architecture: Designing for performance. 6th ed. Upper Saddle River, NJ: Pearson Education, 2003.

6

Stallings, William. Computer organization and architecture: Designing for performance. 4th ed. London: Prentice-Hall International (UK), 1996.

7

Stallings, William. Computer organization and architecture: Designing for performance. 6th ed. Upper Saddle River, NJ: Prentice Hall Pearson Education International, 2003.

8

Stallings, William. Computer organization and architecture: Designing for performance. 5th ed. Upper Saddle River, NJ: Prentice Hall, 2000.

9

Stallings, William. Computer organization and architecture: Designing for performance. 4th ed. Upper Saddle River, NJ: Prentice Hall, 1996.

10

Stallings, William. Computer organization and architecture: Principles of structure and function. New York: Macmillan, 1987.


Book chapters on the topic "Computer architecture. Cache memory"

1

Hou, Rui, Fuxin Zhang, and Weiwu Hu. "A Memory Bandwidth Effective Cache Store Miss Policy." In Advances in Computer Systems Architecture, 750–60. Berlin, Heidelberg: Springer Berlin Heidelberg, 2005. http://dx.doi.org/10.1007/11572961_61.

2

Machanick, Philip, and Zunaid Patel. "L1 Cache and TLB Enhancements to the RAMpage Memory Hierarchy." In Advances in Computer Systems Architecture, 305–19. Berlin, Heidelberg: Springer Berlin Heidelberg, 2003. http://dx.doi.org/10.1007/978-3-540-39864-6_25.

3

Kong, Jinseok, and Gyungho Lee. "Relaxing the inclusion property in cache only memory architecture." In Lecture Notes in Computer Science, 435–44. Berlin, Heidelberg: Springer Berlin Heidelberg, 1996. http://dx.doi.org/10.1007/bfb0024733.

4

Wu, Jun-Min, Xiao-Dong Zhu, Xiu-Feng Sui, Ying-Qi Jin, and Xiao-Yu Zhao. "Dynamic Partitioning of Scalable Cache Memory for SMT Architectures." In Communications in Computer and Information Science, 12–25. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013. http://dx.doi.org/10.1007/978-3-642-41591-3_2.

5

Tradowsky, Carsten, Enrique Cordero, Christoph Orsinger, Malte Vesper, and Jürgen Becker. "A Dynamic Cache Architecture for Efficient Memory Resource Allocation in Many-Core Systems." In Lecture Notes in Computer Science, 343–51. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-30481-6_29.

6

Oh, Chansoo, Dong Hyun Kang, Minho Lee, and Young Ik Eom. "A Buffer Cache Algorithm for Hybrid Memory Architecture in Mobile Devices." In Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, 293–300. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-38904-2_30.

7

Deakin, Tom, Wayne Gaudin, and Simon McIntosh-Smith. "On the Mitigation of Cache Hostile Memory Access Patterns on Many-Core CPU Architectures." In Lecture Notes in Computer Science, 348–62. Cham: Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-67630-2_26.

8

Alam, Irina, Lara Dolecek, and Puneet Gupta. "Lightweight Software-Defined Error Correction for Memories." In Dependable Embedded Systems, 207–32. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-52017-5_9.

Abstract:
Reliability of the memory subsystem is a growing concern in computer architecture and system design. From on-chip embedded memories in Internet-of-Things (IoT) devices and on-chip caches to off-chip main memories, the memory subsystems have become the limiting factor in the overall reliability of computing systems. This is because they are primarily designed to maximize bit storage density; this makes memories particularly sensitive to manufacturing process variation, environmental operating conditions, and aging-induced wearout. This chapter of the book focuses on software-managed techniques and novel error correction codes to opportunistically cope with memory errors whenever they occur, for improved reliability at minimal cost.
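As a taste of the machinery that error correction codes build on, here is a C sketch of the classic Hamming(7,4) single-error-correcting code. It is a textbook illustration under simple assumptions, not the chapter's software-defined scheme.

#include <stdint.h>

/* Encode 4 data bits into a 7-bit codeword.
 * Bit layout (positions 1..7): p1 p2 d0 p4 d1 d2 d3, where each
 * parity bit covers the positions whose 1-based index has the
 * corresponding binary digit set. */
static uint8_t hamming74_encode(uint8_t d)
{
    uint8_t d0 = d & 1, d1 = (d >> 1) & 1,
            d2 = (d >> 2) & 1, d3 = (d >> 3) & 1;
    uint8_t p1 = d0 ^ d1 ^ d3;   /* covers positions 1,3,5,7 */
    uint8_t p2 = d0 ^ d2 ^ d3;   /* covers positions 2,3,6,7 */
    uint8_t p4 = d1 ^ d2 ^ d3;   /* covers positions 4,5,6,7 */
    return p1 | (p2 << 1) | (d0 << 2) | (p4 << 3)
              | (d1 << 4) | (d2 << 5) | (d3 << 6);
}

/* Recompute the parities; a nonzero syndrome is the 1-based
 * position of a single flipped bit, which is corrected in place. */
static uint8_t hamming74_correct(uint8_t cw)
{
    uint8_t s1 = (cw ^ (cw >> 2) ^ (cw >> 4) ^ (cw >> 6)) & 1;
    uint8_t s2 = ((cw >> 1) ^ (cw >> 2) ^ (cw >> 5) ^ (cw >> 6)) & 1;
    uint8_t s4 = ((cw >> 3) ^ (cw >> 4) ^ (cw >> 5) ^ (cw >> 6)) & 1;
    uint8_t syndrome = s1 | (s2 << 1) | (s4 << 2);
    return syndrome ? cw ^ (uint8_t)(1u << (syndrome - 1)) : cw;
}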
9

Steele, Guy L., Xiaowei Shen, Josep Torrellas, Mark Tuckerman, Eric J. Bohm, Laxmikant V. Kalé, Glenn Martyna, et al. "COMA (Cache-Only Memory Architecture)." In Encyclopedia of Parallel Computing, 334. Boston, MA: Springer US, 2011. http://dx.doi.org/10.1007/978-0-387-09766-4_2231.

10

Steele, Guy L., Xiaowei Shen, Josep Torrellas, Mark Tuckerman, Eric J. Bohm, Laxmikant V. Kalé, Glenn Martyna, et al. "Cache-Only Memory Architecture (COMA)." In Encyclopedia of Parallel Computing, 216–20. Boston, MA: Springer US, 2011. http://dx.doi.org/10.1007/978-0-387-09766-4_166.


Conference papers on the topic "Computer architecture. Cache memory"

1

Koh, Cheng-Kok, Weng-Fai Wong, Yiran Chen, and Hai Li. "The salvage cache: A fault-tolerant cache architecture for next-generation memory technologies." In 2009 IEEE International Conference on Computer Design (ICCD 2009). IEEE, 2009. http://dx.doi.org/10.1109/iccd.2009.5413145.

2

Ryoo, Jee Ho, Mitesh R. Meswani, Andreas Prodromou, and Lizy K. John. "SILC-FM: Subblocked InterLeaved Cache-Like Flat Memory Organization." In 2017 IEEE International Symposium on High-Performance Computer Architecture (HPCA). IEEE, 2017. http://dx.doi.org/10.1109/hpca.2017.20.

3

Hashemi, Milad, Khubaib, Eiman Ebrahimi, Onur Mutlu, and Yale N. Patt. "Accelerating Dependent Cache Misses with an Enhanced Memory Controller." In 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA). IEEE, 2016. http://dx.doi.org/10.1109/isca.2016.46.

4

Iyer, R., and L. N. Bhuyan. "Switch cache: a framework for improving the remote memory access latency of CC-NUMA multiprocessors." In Proceedings Fifth International Symposium on High-Performance Computer Architecture. IEEE, 1999. http://dx.doi.org/10.1109/hpca.1999.744357.

5

Inoue, K., K. Kai, and K. Murakami. "Dynamically variable line-size cache exploiting high on-chip memory bandwidth of merged DRAM/logic LSIs." In Proceedings Fifth International Symposium on High-Performance Computer Architecture. IEEE, 1999. http://dx.doi.org/10.1109/hpca.1999.744366.

6

Korgaonkar, Kunal, Ishwar Bhati, Huichu Liu, Jayesh Gaur, Sasikanth Manipatruni, Sreenivas Subramoney, Tanay Karnik, Steven Swanson, Ian Young, and Hong Wang. "Density Tradeoffs of Non-Volatile Memory as a Replacement for SRAM Based Last Level Cache." In 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA). IEEE, 2018. http://dx.doi.org/10.1109/isca.2018.00035.

7

Nimako, Gideon, E. J. Otoo, and Daniel Ohene-Kwofie. "Cache-sensitive MapReduce DGEMM algorithms for shared memory architectures." In the South African Institute for Computer Scientists and Information Technologists Conference. New York, New York, USA: ACM Press, 2012. http://dx.doi.org/10.1145/2389836.2389849.

8

Liang, Li-Zheng, Ming-Chang Yang, Yuan-Hao Chang, Tseng-Yi Chen, Shuo-Han Chen, Hsin-Wen Wei, and Wei-Kuan Shih. "xB+-Tree: Access-Pattern-Aware Cache-Line-Based Tree for Non-volatile Main Memory Architecture." In 2017 IEEE 41st Annual Computer Software and Applications Conference (COMPSAC). IEEE, 2017. http://dx.doi.org/10.1109/compsac.2017.267.

9

Lee, Hyung Gyu, Seungcheol Baek, Chrysostomos Nicopoulos, and Jongman Kim. "An energy- and performance-aware DRAM cache architecture for hybrid DRAM/PCM main memory systems." In 2011 IEEE 29th International Conference on Computer Design (ICCD 2011). IEEE, 2011. http://dx.doi.org/10.1109/iccd.2011.6081427.

10

Lin, Yun-Te, Yi-Hao Hsiao, Fang-Pang Lin, and Chung-Ming Wang. "A hybrid cache architecture of shared memory and meta-table used in big multimedia query." In 2016 IEEE/ACIS 15th International Conference on Computer and Information Science (ICIS). IEEE, 2016. http://dx.doi.org/10.1109/icis.2016.7550809.


Reports on the topic "Computer architecture. Cache memory"

1

Chiarulli, Donald M., and Steven P. Levitan. Optoelectronic Cache Memory System Architecture. Fort Belvoir, VA: Defense Technical Information Center, December 1999. http://dx.doi.org/10.21236/ada371774.

2

Cheriton, David R., Hendrik A. Goosen, and Patrick D. Boyle. ParaDiGM: A Highly Scalable Shared-Memory Multi-Computer Architecture. Fort Belvoir, VA: Defense Technical Information Center, November 1990. http://dx.doi.org/10.21236/ada325912.
