To see the other types of publications on this topic, follow the link: Resource contention.

Dissertations / Theses on the topic 'Resource contention'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 20 dissertations / theses for your research on the topic 'Resource contention.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Smart, Robert John. "Resource contention in real-time systems." Thesis, Open University, 2003. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.288571.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Sabesan, Subramaniam. "Contention and resource control in ATM and IP networks." Thesis, University of Cambridge, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.621637.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Rouxel, Benjamin. "Minimising shared resource contention when scheduling real-time applications on multi-core architectures." Thesis, Rennes 1, 2018. http://www.theses.fr/2018REN1S092/document.

Full text
Abstract:
Multi-core architectures using scratchpad memories are attractive for executing embedded time-critical applications because they offer large computational power. However, ensuring that timing constraints are met on such platforms is challenging, because some hardware resources are physically shared between cores. In particular, worst-case scenarios for sharing the bus that connects the cores to external memory are too pessimistic. This thesis proposes strategies to reduce this pessimism when scheduling applications on multi-core architectures. First, the accuracy of worst-case communication costs is improved by exploiting the information available about the application and the current scheduling state. Then, the parallel capabilities of the hardware are exploited to overlap computations with communications, and the overlapping opportunities are further increased by fragmenting those communications.
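As a rough illustration of the overlap idea described in this abstract, the C sketch below uses double buffering over fragmented transfers. It is a minimal sketch under simplifying assumptions: the "communication" is a plain memcpy standing in for an asynchronous DMA transfer into scratchpad memory, so the overlap here is only structural, not the thesis's actual mechanism.

```c
#include <stdio.h>
#include <string.h>

#define CHUNK  256
#define CHUNKS 8

/* Stands in for an asynchronous DMA transfer from external memory to SPM. */
static void fetch_chunk(double *spm_buf, const double *external_src) {
    memcpy(spm_buf, external_src, CHUNK * sizeof(double));
}

static double compute_chunk(const double *buf) {
    double acc = 0.0;
    for (int i = 0; i < CHUNK; i++)
        acc += buf[i] * buf[i];
    return acc;
}

int main(void) {
    static double external[CHUNKS][CHUNK];   /* data living in external memory */
    double spm[2][CHUNK];                    /* two scratchpad buffers */
    double total = 0.0;

    for (int c = 0; c < CHUNKS; c++)
        for (int i = 0; i < CHUNK; i++)
            external[c][i] = c + i * 1e-3;

    fetch_chunk(spm[0], external[0]);        /* prefetch the first fragment */
    for (int c = 0; c < CHUNKS; c++) {
        if (c + 1 < CHUNKS)                  /* issue the next fragment early... */
            fetch_chunk(spm[(c + 1) & 1], external[c + 1]);
        total += compute_chunk(spm[c & 1]);  /* ...then compute on the current one */
    }
    printf("checksum = %f\n", total);
    return 0;
}
```

With a real asynchronous transfer engine, the fetch of fragment c+1 would proceed in the background while fragment c is being processed, which is exactly why smaller fragments create more chances to hide communication latency.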
APA, Harvard, Vancouver, ISO, and other styles
4

Ramírez, García Tanausu. "Runahead threads." Doctoral thesis, Universitat Politècnica de Catalunya, 2010. http://hdl.handle.net/10803/6019.

Full text
Abstract:
Research on multithreading topics has gained a lot of interest in the computer architecture community due to new commercial multithreaded and multicore processors. Simultaneous Multithreading (SMT) is one of these relatively new paradigms; it combines the multiple-instruction-issue features of superscalar processors with the ability of multithreaded architectures to exploit thread-level parallelism (TLP). The main feature of SMT processors is to execute multiple threads simultaneously, increasing pipeline utilization by sharing many more resources than other types of processors do.

Shared resources are the key to simultaneous multithreading; they are what makes the technique worthwhile. This feature also entails important challenges, because threads compete for resources in the processor core. On the one hand, although certain types and mixes of applications truly benefit from SMT, the differing characteristics of threads can unbalance the resource allocation among them, diminishing the benefit of multithreaded execution. On the other hand, the memory-wall problem is still present in these processors. SMT processors alleviate some of the latency problems that arise from main memory's slowness relative to the CPU. Nevertheless, threads with high cache miss rates and large working sets are one of the major pitfalls of SMT processors. These memory-intensive threads tend to use processor and memory resources poorly, creating the worst resource-contention problems: they can clog up shared resources with long-latency memory operations without making progress, hindering not only their own execution but also that of the other threads, and thereby degrading overall system performance.

The main goal of this thesis is to alleviate these shortcomings in SMT scenarios. The key contribution is the application of the runahead execution paradigm to the design of multithreaded processors through Runahead Threads (RaT). RaT is a promising alternative to prior SMT resource-management mechanisms, which usually restrict memory-bound threads in order to obtain higher throughput. The idea of RaT is to transform a memory-intensive thread into a light resource consumer by allowing it to progress speculatively: as soon as a thread suffers a long-latency load, RaT turns it into a runahead thread while that miss is outstanding. The benefits of this simple action are twofold. While in runahead mode, the thread uses the shared resources without monopolizing or limiting the resources available to the other threads. At the same time, this fast speculative thread issues prefetches that overlap other memory accesses with the main miss, thereby exploiting memory-level parallelism (MLP) and improving performance.

Regarding implementation, RaT adds very little extra hardware cost and complexity to an existing SMT processor: through a simple checkpoint mechanism and a little additional control logic, the hardware contexts can be equipped with the runahead capability. By means of runahead threads, we simultaneously alleviate the two shortcomings above. First, RaT mitigates the long-latency load problem on SMT processors by exposing MLP: a thread prefetches data in parallel (when MLP is available), improving its individual performance rather than stalling on an L2 miss. Second, RaT prevents threads from clogging resources on long-latency loads, because the speculative nature of runahead execution makes the L2-missing thread recycle the shared resources it uses faster. This keeps memory-intensive threads from tying up important processor resources.

The main limitation of RaT is that runahead threads can execute useless instructions and unnecessarily consume execution resources when there is no prefetching to exploit. This drawback results in inefficient runahead threads that do not contribute to the performance gain and that increase dynamic energy consumption due to the extra speculatively executed instructions. We therefore also study different solutions aimed at this disadvantage, yielding a set of complementary techniques that enhance RaT in terms of power consumption and energy efficiency. On the one hand, code-semantic-aware runahead threads improve RaT's efficiency through coarse-grained code-semantic analysis at runtime: the proposed techniques monitor the usefulness of loops and subroutines during runahead execution, depending on the prefetching opportunities they expose, and decide either to stall or to skip those structures in order to reduce the number of useless runahead instructions. Some of these techniques reduce the speculative instruction count and the wasted energy while achieving performance similar to the original RaT. On the other hand, the efficient runahead threads proposal is a finer-grained, generic technique that covers all runahead executions independently of the characteristics of the executed program: the key idea is to find out when, and for how long, a thread should run in runahead mode by predicting the useful runahead distance. The results show that the best of these distance-prediction approaches significantly reduces the number of extra speculative instructions as well as the power consumption, while maintaining the performance benefits of runahead threads, thereby improving the energy efficiency of SMT processors using the RaT mechanism.

The evolution of Runahead Threads developed throughout this research provides not only high performance but also an efficient way of using shared resources in SMT processors in the presence of long-latency memory operations. As designers of future SMT systems will increasingly be required to optimize for a combination of single-thread performance, total throughput, and energy consumption, RaT-based mechanisms are promising options that provide a better performance and energy balance than previous proposals in the field.
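The runahead mechanism summarized above can be pictured with a small toy simulation. The sketch below is only a loose illustration under assumed parameters (miss latency, instruction mix) with no pipeline or cache model; it is not the thesis's processor model. On a long-latency miss the context is checkpointed and keeps advancing speculatively, turning later misses into prefetches; when the blocking miss resolves, execution restarts from the checkpoint.

```c
#include <stdio.h>

#define MISS_LATENCY 20                       /* assumed L2 miss latency (cycles) */

enum op { COMPUTE, LOAD_HIT, LOAD_MISS };

int main(void) {
    enum op prog[] = { COMPUTE, LOAD_MISS, COMPUTE, LOAD_HIT, LOAD_MISS,
                       COMPUTE, COMPUTE, LOAD_HIT, COMPUTE, COMPUTE };
    int n = (int)(sizeof prog / sizeof prog[0]);
    int resolved[16] = { 0 };                 /* misses already served or prefetched */
    int pc = 0, checkpoint = -1, miss_timer = 0, prefetches = 0, cycle = 0;

    while (pc < n || checkpoint >= 0) {
        cycle++;
        if (checkpoint >= 0) {                /* runahead mode */
            if (--miss_timer == 0) {          /* blocking miss returned */
                resolved[checkpoint] = 1;
                pc = checkpoint;              /* restore the checkpoint */
                checkpoint = -1;
                printf("cycle %3d: miss resolved, resume at pc=%d\n", cycle, pc);
                continue;
            }
            if (pc < n) {
                if (prog[pc] == LOAD_MISS && !resolved[pc]) {
                    prefetches++;             /* speculative miss becomes a prefetch */
                    resolved[pc] = 1;         /* so it will hit after the restart */
                }
                pc++;
            }
            continue;
        }
        if (prog[pc] == LOAD_MISS && !resolved[pc]) {
            checkpoint = pc;                  /* enter runahead on the L2 miss */
            miss_timer = MISS_LATENCY;
            printf("cycle %3d: L2 miss at pc=%d, entering runahead\n", cycle, pc);
        }
        pc++;                                 /* normal mode: one op per cycle */
    }
    printf("committed %d ops in %d cycles with %d speculative prefetches\n",
           n, cycle, prefetches);
    return 0;
}
```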
APA, Harvard, Vancouver, ISO, and other styles
5

Nasim, Robayet. "Cost- and Performance-Aware Resource Management in Cloud Infrastructures." Doctoral thesis, Karlstads universitet, Institutionen för matematik och datavetenskap, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kau:diva-48482.

Full text
Abstract:
High availability, cost effectiveness, and ease of application deployment have accelerated the adoption rate of cloud computing. This fast proliferation of cloud computing promotes the rapid development of large-scale infrastructures. However, large cloud datacenters (DCs) demand careful attention to infrastructure design, deployment, scalability, and reliability, and they need better management techniques to achieve sustainable design benefits. Resources inside cloud infrastructures often operate at low utilization, rarely exceeding 20-30%, which increases the operational cost significantly, especially due to energy consumption. To reduce operational cost without affecting quality of service (QoS) requirements, cloud applications should be allocated just enough resources to minimize their completion time or to maximize utilization.

The focus of this thesis is to enable resource-efficient and performance-aware cloud infrastructures by addressing the cost- and performance-related challenges mentioned above. In particular, we propose algorithms, techniques, and deployment strategies for improving the dynamic allocation of virtual machines (VMs) onto physical machines (PMs). For minimizing the operational cost, we mainly focus on optimizing the energy consumption of PMs by applying dynamic VM consolidation methods. To make VM consolidation more efficient, we propose utilizing multiple paths to spread traffic and deploying recent queue-management schemes, which can maximize network resource utilization and reduce both downtime and migration time for live migration. In addition, a dynamic resource allocation scheme is presented to distribute workloads among geographically dispersed DCs, taking into account their location-dependent, time-varying costs due to, e.g., carbon emissions or bandwidth provisioning. For optimizing performance-level objectives, we focus on interference among applications contending for shared resources and propose a novel VM consolidation scheme that considers the sensitivity of the VMs to their demanded resources. Further, to investigate the impact of uncertain parameters, such as unpredictable variations in demand, on cloud resource allocation and applications' QoS, we develop an optimization model based on the theory of robust optimization. Finally, to handle the scalability issues that arise in large-scale infrastructures, a robust and fast Tabu Search algorithm is designed and evaluated.
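For readers unfamiliar with VM consolidation, the C sketch below shows a generic first-fit-decreasing packing heuristic, not the thesis's algorithm: VMs are sorted by demand and packed onto as few physical machines as possible so idle PMs can be powered down. Capacities and demands are made-up illustrative numbers.

```c
#include <stdio.h>
#include <stdlib.h>

#define PM_CAPACITY 100                        /* normalized CPU capacity per PM */
#define MAX_PMS 16

static int cmp_desc(const void *a, const void *b) {
    return *(const int *)b - *(const int *)a;  /* sort demands largest first */
}

int main(void) {
    int vm_demand[] = { 62, 45, 38, 30, 25, 20, 12, 8 };   /* normalized CPU */
    int n = (int)(sizeof vm_demand / sizeof vm_demand[0]);
    int pm_load[MAX_PMS] = { 0 };
    int pms_used = 0;

    qsort(vm_demand, n, sizeof vm_demand[0], cmp_desc);
    for (int v = 0; v < n; v++) {
        int placed = 0;
        for (int p = 0; p < pms_used && !placed; p++) {
            if (pm_load[p] + vm_demand[v] <= PM_CAPACITY) { /* first fit */
                pm_load[p] += vm_demand[v];
                placed = 1;
            }
        }
        if (!placed) pm_load[pms_used++] = vm_demand[v];    /* open a new PM */
    }
    for (int p = 0; p < pms_used; p++)
        printf("PM %d load: %d%%\n", p, pm_load[p]);
    printf("PMs in use: %d of %d\n", pms_used, MAX_PMS);
    return 0;
}
```

The thesis goes further than this baseline by also weighing interference sensitivity, migration cost, and location-dependent energy prices when deciding placements.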
APA, Harvard, Vancouver, ISO, and other styles
6

Turner, Andrew J. "Input Shaping to Achieve Service Level Objectives in Cloud Computing Environments." Research Showcase @ CMU, 2013. http://repository.cmu.edu/dissertations/289.

Full text
Abstract:
In this thesis we propose a cloud Input Shaper and Dynamic Resource Controller to provide application-level quality of service guarantees in cloud computing environments. The Input Shaper splits the cloud into two areas: one for shaped traffic that achieves quality of service targets, and one for overflow traffic that may not achieve the targets. The Dynamic Resource Controller profiles customers' applications, then calculates and allocates the resources required by the applications to achieve given quality of service targets. The Input Shaper then shapes the rate of incoming requests to ensure that the applications achieve their quality of service targets based on the amount of allocated resources. To evaluate our system we create a new benchmark application that is suitable for use in cloud computing environments. It is designed to reflect the current design of cloud-based applications and can dynamically scale each application tier to handle large and varying workload levels. In addition, the client emulator that drives the benchmark mimics realistic user behaviors such as browsing from multiple tabs and using JavaScript, and it has variable thinking and typing speeds. We show that a cloud management system evaluated using previous benchmarks could violate its estimated quality of service achievement rate by over 20%. The Input Shaper and Dynamic Resource Controller system consists of an application performance modeler, a resource allocator, a decision engine, and an Apache HTTP server module to reshape the rate of incoming web requests. By dynamically allocating resources to applications, we show that their response times can be improved by as much as 30%. Also, the amount of resources required to host applications can be decreased by 20% while achieving quality of service objectives. The Input Shaper can reduce VMs' resource utilization variances by 88% and reduce the number of servers by 45%.
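A common way to shape an incoming request stream is a token bucket; the C sketch below is a hedged illustration of that general idea (the rates, burst size, and the split into "shaped" and "overflow" pools are assumptions for the example, not the thesis's Apache module).

```c
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    double tokens;      /* current budget */
    double rate;        /* tokens added per second */
    double burst;       /* bucket capacity */
    double last_time;   /* time of last update, seconds */
} shaper_t;

/* Returns true if the request fits the shaped (QoS-provisioned) pool. */
static bool admit(shaper_t *s, double now) {
    s->tokens += (now - s->last_time) * s->rate;
    if (s->tokens > s->burst) s->tokens = s->burst;
    s->last_time = now;
    if (s->tokens >= 1.0) { s->tokens -= 1.0; return true; }
    return false;       /* otherwise the request goes to the overflow area */
}

int main(void) {
    shaper_t s = { .tokens = 5, .rate = 100, .burst = 5, .last_time = 0 };
    int shaped = 0, overflow = 0;

    for (int i = 0; i < 1000; i++) {
        double now = i * 0.005;              /* 200 requests/s offered load */
        if (admit(&s, now)) shaped++; else overflow++;
    }
    printf("shaped=%d overflow=%d\n", shaped, overflow);
    return 0;
}
```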
APA, Harvard, Vancouver, ISO, and other styles
7

Wadhwa, Bharti. "Scalable Data Management for Object-based Storage Systems." Diss., Virginia Tech, 2020. http://hdl.handle.net/10919/99791.

Full text
Abstract:
Parallel I/O performance is crucial to sustain scientific applications on large-scale High-Performance Computing (HPC) systems. Large-scale distributed storage systems, in particular object-based storage systems, face severe challenges in managing data efficiently. Inefficient data management leads to poor I/O and storage performance in HPC applications and scientific workflows. Some of the main challenges for efficient data management arise from poor resource allocation, load imbalance across object storage targets, and inflexible data sharing between applications in a workflow. In addition, parallel I/O makes it challenging to shoehorn in new interfaces, such as taking advantage of multiple layers of storage and supporting analysis in the data path. Solving these challenges to improve the performance and efficiency of object-based storage systems is crucial, especially for the upcoming era of exascale systems. This dissertation is focused on solving these major challenges in object-based storage systems by providing scalable data management strategies. In the first part of the dissertation (Chapter 3), we present a resource-contention-aware load balancing tool (iez) for large-scale distributed object-based storage systems. In Chapter 4, we extend iez to support Progressive File Layout for the Lustre object-based storage system. In the second part (Chapter 5), we present a technique to facilitate data sharing in scientific workflows using object-based storage, with our proposed tool Workflow Data Communicator. In the last part of this dissertation, we present a solution for transparent data management in the multi-layer storage hierarchy of present and next-generation HPC systems. This dissertation shows that by intelligently employing scalable data management techniques, scientific applications' and workflows' flexibility and performance in object-based storage systems can be enhanced manyfold. Our proposed data management strategies can guide next-generation HPC storage systems' software design to efficiently support data for scientific applications and workflows.

Doctor of Philosophy

Large-scale object-based storage systems face severe challenges in managing data efficiently for HPC applications and workflows. These storage systems often manage and share data inflexibly, without considering the load imbalance and resource contention in the underlying multi-layer storage hierarchy. This dissertation first studies how resource contention and inflexible data sharing mechanisms impact HPC applications' storage and I/O performance, and then presents a series of efficient techniques, tools, and algorithms to provide efficient and scalable data management for current and next-generation HPC storage systems.
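As a loose illustration of contention-aware placement (not the iez implementation; the load metric and target count are assumptions), the sketch below directs each new file stripe to the object storage target (OST) with the lowest current load rather than placing stripes round-robin, so already-contended targets are not loaded further.

```c
#include <stdio.h>

#define NUM_OSTS 4

int main(void) {
    double ost_load_mb[NUM_OSTS] = { 120.0, 40.0, 80.0, 10.0 };  /* pending I/O */
    double stripe_mb[] = { 64, 64, 128, 32, 64, 256 };
    int n = (int)(sizeof stripe_mb / sizeof stripe_mb[0]);

    for (int s = 0; s < n; s++) {
        int best = 0;
        for (int t = 1; t < NUM_OSTS; t++)         /* pick the least-loaded OST */
            if (ost_load_mb[t] < ost_load_mb[best]) best = t;
        ost_load_mb[best] += stripe_mb[s];
        printf("stripe %d (%.0f MB) -> OST %d (pending now %.0f MB)\n",
               s, stripe_mb[s], best, ost_load_mb[best]);
    }
    return 0;
}
```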
APA, Harvard, Vancouver, ISO, and other styles
8

Pumma, Sarunya. "Scalability Analysis and Optimization for Large-Scale Deep Learning." Diss., Virginia Tech, 2020. http://hdl.handle.net/10919/104417.

Full text
Abstract:
Despite its growing importance, scalable deep learning (DL) remains a difficult challenge. Scalability of large-scale DL is constrained by many factors, including those deriving from data movement and data processing. DL frameworks rely on large volumes of data being fed to the computation engines for processing. However, current hardware trends show that data movement is already one of the slowest components in modern high performance computing systems, and this gap is only going to increase in the future. This includes data movement needed from the filesystem, within the network subsystem, and even within the node itself, all of which limit the scalability of DL frameworks on large systems. Even after data is moved to the computational units, managing this data is not easy. Modern DL frameworks use multiple components---such as graph scheduling, neural network training, gradient synchronization, and input pipeline processing---to process this data in an asynchronous, uncoordinated manner, which results in straggler processes and consequently computational imbalance, further limiting scalability. This thesis studies a subset of the large body of data movement and data processing challenges that exist in modern DL frameworks. For the first study, we investigate file I/O constraints that limit the scalability of large-scale DL. We first analyze the Caffe DL framework with Lightning Memory-Mapped Database (LMDB), one of the most widely used file I/O subsystems in DL frameworks, to understand the causes of file I/O inefficiencies. Based on our analysis, we propose LMDBIO---an optimized I/O plugin for scalable DL that addresses the various shortcomings in existing file I/O for DL. Our experimental results show that LMDBIO significantly outperforms LMDB in all cases and improves overall application performance by up to 65-fold on 9,216 CPUs of the Blues and Bebop supercomputers at Argonne National Laboratory. Our second study deals with the computational imbalance problem in data processing. For most DL systems, the simultaneous and asynchronous execution of multiple data-processing components on shared hardware resources causes these components to contend with one another, leading to severe computational imbalance and degraded scalability. We propose various novel optimizations that minimize resource contention and improve performance by up to 35% for training various neural networks on 24,576 GPUs of the Summit supercomputer at Oak Ridge National Laboratory---the world's largest supercomputer at the time of writing of this thesis.

Doctor of Philosophy

Deep learning is a method for computers to automatically extract complex patterns and trends from large volumes of data. It is a popular methodology that we use every day when we talk to Apple Siri or Google Assistant, when we use self-driving cars, or even when we witnessed IBM Watson being crowned as the champion of Jeopardy! While deep learning is integrated into our everyday life, it is a complex problem that has gotten the attention of many researchers. Executing deep learning is a highly computationally intensive problem. On traditional computers, such as a generic laptop or desktop machine, the computation for large deep learning problems can take years or decades to complete. Consequently, supercomputers, which are machines with massive computational capability, are leveraged for deep learning workloads.
The world's fastest supercomputer today, for example, is capable of performing almost 200 quadrillion floating point operations every second. While that is impressive, for large problems, unfortunately, even the fastest supercomputers today are not fast enough. The problem is not that they do not have enough computational capability, but that deep learning problems inherently rely on a lot of data---the entire concept of deep learning centers around the fact that the computer would study a huge volume of data and draw trends from it. Moving and processing this data, unfortunately, is much slower than the computation itself and with the current hardware trends it is not expected to get much faster in the future. This thesis aims at making deep learning executions on large supercomputers faster. Specifically, it looks at two pieces associated with managing data: (1) data reading---how to quickly read large amounts of data from storage, and (2) computational imbalance---how to ensure that the different processors on the supercomputer are not waiting for each other and thus wasting time. We first analyze each performance problem to identify the root cause of it. Then, based on the analysis, we propose several novel techniques to solve the problem. With our optimizations, we are able to significantly improve the performance of deep learning execution on a number of supercomputers, including Blues and Bebop at Argonne National Laboratory, and Summit---the world's fastest supercomputer---at Oak Ridge National Laboratory.
APA, Harvard, Vancouver, ISO, and other styles
9

Wen, Hao. "IMPROVING PERFORMANCE AND ENERGY EFFICIENCY FOR THE INTEGRATED CPU-GPU HETEROGENEOUS SYSTEMS." VCU Scholars Compass, 2018. https://scholarscompass.vcu.edu/etd/5664.

Full text
Abstract:
Current heterogeneous CPU-GPU architectures integrate general-purpose CPUs and highly thread-level-parallel GPUs (Graphics Processing Units) in the same die. This dissertation focuses on improving the energy efficiency and performance of such heterogeneous CPU-GPU systems. Leakage energy has become an increasingly large fraction of total energy consumption, making it important to reduce leakage energy to improve overall energy efficiency. Caches occupy a large on-chip area, which makes them good targets for leakage energy reduction. For the CPU cache, we study how to reduce cache leakage energy efficiently in a hybrid SPM (Scratch-Pad Memory) and cache architecture. For the GPU cache, the access pattern differs from the CPU's: it usually has little locality and a high miss rate. In addition, the GPU can hide memory latency more effectively due to multi-threading. For these reasons, we find it possible to place the cache lines of the GPU data caches into the low-power mode more aggressively than traditional leakage management does for CPU caches, which reduces more leakage energy without significant performance degradation. Contention for resources shared between the CPU and GPU, such as the last-level cache (LLC), the interconnection network, and DRAM, may degrade both CPU and GPU performance. We propose a simple yet effective probability-based method to control the LLC replacement policy, reducing the CPU's inter-core conflict misses caused by the GPU without significantly impacting GPU performance. In addition, we develop two strategies that combine the probability-based method for the LLC with an existing technique called virtual channel partitioning (VCP) for the interconnection network to further improve CPU performance. We also study breadth-first search (BFS), a basis for graph search and a core building block for many higher-level graph analysis applications, which is a typical example of parallel computation that is inefficient on GPU architectures. In a graph, a small portion of nodes may have a large number of neighbors, which leads to irregular tasks on GPUs. These irregularities limit the parallelism of BFS executing on GPUs. Unlike previous works, which focus on fine-grained task management to address the irregularity, we propose Virtual-BFS (VBFS), which virtually changes the graph itself. By adding virtual vertices, the high-degree nodes in the graph are divided into groups that have an equal number of neighbors, which increases the parallelism such that more GPU threads can work concurrently. This approach ensures correctness and can significantly improve both performance and energy efficiency on GPUs.
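The C sketch below is a hedged illustration of probability-controlled insertion for one shared LLC set; the probability value, set size, and access mix are illustrative assumptions, not the dissertation's parameters. GPU fills are allowed to allocate a line only with probability P_GPU and otherwise bypass, which protects the CPU's reused lines from streaming GPU traffic.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#define WAYS  16
#define P_GPU 0.25                     /* assumed GPU allocation probability */

typedef struct { long tag; bool valid; bool cpu; unsigned age; } line_t;

static void promote(line_t set[WAYS], int way) {
    for (int i = 0; i < WAYS; i++)
        if (set[i].valid) set[i].age++;
    set[way].age = 0;                  /* most recently used */
}

static bool lookup(line_t set[WAYS], long tag) {
    for (int i = 0; i < WAYS; i++)
        if (set[i].valid && set[i].tag == tag) { promote(set, i); return true; }
    return false;
}

static void fill(line_t set[WAYS], long tag, bool from_cpu) {
    if (!from_cpu && (double)rand() / RAND_MAX > P_GPU)
        return;                        /* GPU fill bypasses the shared cache */
    int victim = 0;
    for (int i = 0; i < WAYS; i++) {
        if (!set[i].valid) { victim = i; break; }       /* free way first */
        if (set[i].age > set[victim].age) victim = i;   /* else LRU victim */
    }
    set[victim] = (line_t){ .tag = tag, .valid = true, .cpu = from_cpu, .age = 0 };
    promote(set, victim);
}

int main(void) {
    line_t set[WAYS] = { { 0 } };
    srand(1);
    for (long i = 0; i < 10000; i++) {
        long cpu_tag = 0x1000 + i % 8;          /* small, reused CPU working set */
        long gpu_tag = 0x9000 + i;              /* streaming GPU accesses */
        if (!lookup(set, cpu_tag)) fill(set, cpu_tag, true);
        if (!lookup(set, gpu_tag)) fill(set, gpu_tag, false);
    }
    int cpu_lines = 0;
    for (int i = 0; i < WAYS; i++)
        cpu_lines += set[i].valid && set[i].cpu;
    printf("CPU lines resident at the end: %d of %d ways\n", cpu_lines, WAYS);
    return 0;
}
```

Lowering P_GPU trades GPU cache hits for CPU line retention, which is the knob such a policy turns when GPU performance is less sensitive to LLC misses.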
APA, Harvard, Vancouver, ISO, and other styles
10

Brown, Mona-Lee C. "Merging forces : issues for contention in the merging of traditional media forms." Thesis, Georgia Institute of Technology, 1998. http://hdl.handle.net/1853/17694.

Full text
APA, Harvard, Vancouver, ISO, and other styles
11

Buday, Amanda T. "Fracturing Illinois: Fields of Political Contention in Hydraulic Fracturing Regulatory Policy." OpenSIUC, 2016. https://opensiuc.lib.siu.edu/dissertations/1267.

Full text
Abstract:
This dissertation examines the interactions between social movement organizations and a variety of state and municipal targets of movement activity during the construction of the Illinois Hydraulic Fracturing Regulatory Act (HFRA). Hydraulic fracturing is a controversial method of oil and gas extraction which created an unusual amount of public interest and participation in policy construction. This dissertation provides an overview of the political environment in Illinois during the legislative negotiations for the HFRA, outlining the playing field of political negotiations, and the relative positioning of social movement actors competing for influence in that field. Additionally, I examine the causes and consequences of conflict between coalition partners opposed to fracking, focusing on the impact of differential resources, expertise, and institutional legitimacy. Using data from interviews with organization leaders from industry and environmental coalitions, key informants from government bureaus, and participant observation at public meetings, my research contributes to the political process literature by elaborating the heterogeneity of the state’s interests in political challenges and revealing cleavages within social movement coalitions.
APA, Harvard, Vancouver, ISO, and other styles
12

Lindberg, Emil. "Measuring the effect of memory bandwidth contention in applications on multi-core processors." Thesis, Linköpings universitet, Programvara och system, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-114136.

Full text
Abstract:
In this thesis we design and implement a tool for benchmarking applications' sensitivity to main-memory bandwidth contention in a multi-core environment on an ARM Cortex-A15 CPU. The tool is designed to minimize its usage of shared resources other than main-memory bandwidth, allowing it to isolate the effects of bandwidth contention alone. The difficulty lies in choosing a suitable memory access pattern: which addresses to access, in which order, and at what rate, so as to minimize cache usage while generating high and controllable main-memory bandwidth usage. We manage to implement a tool with low cache memory usage that is still able to saturate the main-memory bandwidth. The tool uses a proportional-integral controller to control the amount of bandwidth it uses. We then use the tool to investigate the memory behaviour of the platform and of some applications while the tool consumes a variable amount of bandwidth. However, we have some difficulties in analyzing the results due to the lack of support for hardware performance counters in the operating system we use, forcing us to rely on hardware timers for data gathering. Another difficulty is the platform's limited L2 cache bandwidth, which means the tool heavily impacts L2 cache read latency. Despite this, we are able to draw some conclusions on the bandwidth usage of other applications in optimal cases with the help of the tool.
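A minimal sketch of the proportional-integral control loop follows, under stated assumptions: the gains and the "plant" (how issued requests translate into achieved bandwidth) are made up for illustration, whereas the real tool measures achieved bandwidth with hardware timers instead of computing it from a model.

```c
#include <stdio.h>

int main(void) {
    const double target_mb_s = 800.0;          /* desired bandwidth */
    const double kp = 0.4, ki = 0.05;          /* assumed PI gains */
    double issued = 100.0;                     /* cache lines issued per interval */
    double integral = 0.0;

    for (int step = 0; step < 20; step++) {
        /* toy plant: achieved bandwidth saturates as the issue rate grows */
        double measured = 1200.0 * issued / (issued + 200.0);
        double error = target_mb_s - measured;
        integral += error;
        issued += kp * error + ki * integral;  /* PI update of the issue rate */
        if (issued < 1.0) issued = 1.0;
        printf("step %2d: issue rate %7.1f lines, bandwidth %6.1f MB/s\n",
               step, issued, measured);
    }
    return 0;
}
```

The integral term is what lets the controller hold the bandwidth at the target despite a nonlinear response, at the cost of some initial overshoot depending on the gains.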
APA, Harvard, Vancouver, ISO, and other styles
13

Tollefson, Jonathan. "Land Use, Power, and Knowledge at the Northern Resource Frontier: Mining, Public Engagement, and Contentious Land Imaginaries in Bristol Bay and the Yukon-Kuskokwim Delta." ScholarWorks @ UVM, 2018. https://scholarworks.uvm.edu/graddis/977.

Full text
Abstract:
The Donlin and Pebble mines are two of the eight industrial-scale hard rock mines currently under review by Alaska's Large Mine Permitting program. Both projects promise to deliver profit and employment to their respective regions: Pebble to Bristol Bay in the southwest, and Donlin to the Yukon-Kuskokwim Delta, just north of Pebble. Both projects would also produce exceptional quantities of waste and would require almost-unprecedented infrastructure development, potentially threatening the lives and subsistence livelihoods of the Alaska Native peoples in their respective regions. The Pebble project inspired international protest and led to the emergence of a powerful resistance coalition of commercial, recreational, and subsistence fishers; activists and expert-consultants were thus able to build a powerful movement outside of and prior to the state permitting and impact assessment process. The coalitions that arose to oppose the Donlin project, in contrast, channeled their work through the state's official public engagement processes – in part due to strategic limitations stemming from the complexities of land use, sovereignty, and development politics specific to the Yukon-Kuskokwim region. The coalitional resistance to Pebble and the creative use of Donlin's public participation process are key sites in which Western science and knowledge systems, as well as land use ideologies centered on extraction and profit, meet with Native Alaskan traditional knowledge and subsistence approaches to land use. I draw upon a history of Alaskan land use policy alongside extensive interviews with community organizers, state and federal officials, mining industry officials, and consultants in order to describe and understand the result: a set of creative resistance strategies that forefront hybrid approaches to knowledge and multiple, overlapping understandings of the land. Unfortunately, Alaska's large mine permitting and environmental assessment processes are often structurally and epistemologically unable to consider these divergent discourses and the public imaginations of alternative futures they support and constitute.
APA, Harvard, Vancouver, ISO, and other styles
14

Yang, Yaling. "Distributed resource allocation in contention-based wireless networks /." 2006. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:3243034.

Full text
Abstract:
Thesis (Ph.D.)--University of Illinois at Urbana-Champaign, 2006. Source: Dissertation Abstracts International, Volume: 67-11, Section: B, page: 6539. Adviser: Robin Kravets. Includes bibliographical references (leaves 206-213). Available on microfilm from ProQuest Information and Learning.
APA, Harvard, Vancouver, ISO, and other styles
15

Chen, Pei-Chi, and 陳培基. "HCOREMU: Accelerating Multicore System Emulation and Reducing Hardware Shared Resource Contention." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/70014482754661071419.

Full text
Abstract:
Master's thesis, National Taiwan University, Graduate Institute of Computer Science and Information Engineering, academic year 102.
We present HCOREMU, a high-performance parallel system-mode emulator. Existing parallel system-mode emulators focus on the correctness and synchronization mechanisms of emulation. However, two important factors usually impede performance: (1) the quality of the emulation code and (2) thread contention on shared hardware resources. In this thesis, we take advantage of ubiquitous multi-core platforms to improve our emulation code quality, and we propose two designs to accelerate multi-core system-mode emulation based on the trace-based multi-threaded optimization in HQEMU. We reduce shared-resource contention in three ways. First, we reduce the interconnect traffic and access latency our threads suffer due to the inconsistency between the default Linux scheduler and the memory allocator on NUMA platforms. Second, we reduce the contention between optimization threads and emulation threads. Third, we find that some workloads have a hotspot when accessing memory; we use hardware performance counters to detect this situation and reduce the interconnect traffic and access latency of emulation threads in workloads with this characteristic. HCOREMU improves the performance of COREMU by a factor of 1.8x in uni-processor emulation and 1.3x in multi-core emulation. Thread contention on shared resources is reduced by our scheduling, which outperforms the default Linux scheduling by a factor of 1.1x.
APA, Harvard, Vancouver, ISO, and other styles
16

Wei-Tso Chen and 陳威佐. "Reduce Resource Contention by Task Scheduling on Hybrid SPM and Cache MPSoC." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/43048123756529725333.

Full text
Abstract:
Master's thesis, National Cheng Kung University, Department of Computer Science and Information Engineering, academic year 100.
Modern MPSoCs adopt scratchpad memory (SPM) in conjunction with cache because SPM is efficient in area and power, so such MPSoCs are widely used in smart phones and tablet PCs. To obtain more computing power, the number of cores in a single chip increases, which inevitably results in resource contention. Once resource contention becomes severe, the performance gained from the additional cores may suffer. In this thesis, we propose a contention-reduction scheduling (CRS) for hybrid SPM/cache MPSoCs. Tasks that use SPM as their local storage are profiled in advance; for non-profiled tasks, the cache is used. In addition, a bus monitor (BM) is designed to provide bus activity information. At runtime, our contention-reduction scheduler (CRSer) uses the profiling information as well as the bus information to relieve resource contention. Simulation results show that CRS performs better than existing scheduling algorithms that do not consider contention: CRS reduces waiting time by up to 23.3% and average execution time by up to 13.5%.
APA, Harvard, Vancouver, ISO, and other styles
17

Chen, Yan-Wei, and 陳彥瑋. "Energy-Efficient Scheduling Based on Reducing Resource Contention for Multi-Core Processor.pdf." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/38022145509944527764.

Full text
Abstract:
Master's thesis, National Chi Nan University, Department of Information Management, academic year 100.
Energy conservation is an important issue. In a multi-core and multi-processor system, some system resources, such as memory and the processors' caches, are shared among processors. When processors attempt to access shared resources at the same time and cannot obtain them immediately, the result is performance degradation and increased power consumption. In this thesis, we propose a mechanism named energy-efficient scheduling, which arranges tasks to run on processor cores in an appropriate way to avoid contention for shared resources and to save energy without sacrificing too much system performance. The proposed energy-efficient scheduling is implemented in Linux kernel version 2.6.33. The implementation work includes modifications to the Linux kernel scheduler, the setting of processor frequencies, and a system call for setting the parameters of the system resources. In this study, we treat memory as the most important resource and classify each task as either memory-bound or computing-bound. Through modification of the load-balance function of the Linux kernel scheduler, memory-bound tasks and computing-bound tasks are dispatched separately to run on the appropriate processors. We also modify the tasks' execution order to reduce contention for memory and processor caches. In addition, memory-bound and computing-bound tasks are executed at suitable processor frequencies to save energy. The mechanism is evaluated by running the SPEC CPU2006 benchmark with different proportions of memory-bound and computing-bound tasks on an ASUS TS500 machine with two quad-core Hyper-Threading processors. Experimental results demonstrate that our energy-efficient scheduling can effectively save energy by avoiding resource contention among processor cores.
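The classification-and-dispatch idea can be sketched as below; the threshold, the task miss rates, and the core split are all assumptions made for the example, not measurements from the thesis. Tasks whose last-level-cache miss rate exceeds a cut-off are treated as memory-bound and placed on one group of cores at a low clock, while compute-bound tasks go to the other group at a high clock.

```c
#include <stdio.h>

#define MEM_BOUND_THRESHOLD 10.0     /* assumed cut-off: LLC misses per 1K inst */

typedef struct { const char *name; double llc_mpki; } task_t;

int main(void) {
    task_t tasks[] = {               /* made-up miss rates for illustration */
        { "taskA", 35.2 }, { "taskB", 28.7 }, { "taskC", 0.4 },
        { "taskD",  1.1 }, { "taskE", 22.9 }, { "taskF", 2.3 },
    };
    int n = (int)(sizeof tasks / sizeof tasks[0]);
    int next_mem = 0, next_cpu = 0;  /* cores 0-3: memory-bound, 4-7: compute */

    for (int i = 0; i < n; i++) {
        int mem_bound = tasks[i].llc_mpki > MEM_BOUND_THRESHOLD;
        int core = mem_bound ? (next_mem++ % 4) : 4 + (next_cpu++ % 4);
        printf("%s -> core %d (%s, %s frequency)\n", tasks[i].name, core,
               mem_bound ? "memory-bound" : "compute-bound",
               mem_bound ? "low" : "high");
    }
    return 0;
}
```

Separating the two classes keeps memory-bound tasks from stacking up on the same memory controller, and running them at a lower clock saves energy with little performance loss because they spend most of their time waiting on memory anyway.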
APA, Harvard, Vancouver, ISO, and other styles
18

Yang, Chieh-Jui, and 楊杰叡. "Dynamic Task-Aware Scheduling for Reducing Resource Contention on NUMA Multi-Core Systems." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/19540666785959365490.

Full text
Abstract:
Master's thesis, National Chi Nan University, Department of Information Management, academic year 101.
In a multi-core and multi-processor system, some system resources, such as the bus, memory, and processor caches, are shared among processors. When processors attempt to access shared resources at the same time and cannot obtain them immediately, resource contention will lead to performance degradation. Recently, much research has focused on the design of task-aware scheduling for solving the resource contention problem, and most of it targets Symmetric Multi-Processor (SMP) systems. A task-aware scheduler designed for SMP works poorly on multiprocessor systems with a Non-Uniform Memory Access (NUMA) architecture and causes performance degradation, because the scheduler does not consider the latency of accessing memory on a remote node while performing load balancing. In this thesis, we have proposed a dynamic task-aware scheduling mechanism for reducing resource contention on NUMA multi-core systems and implemented it in Linux kernel 2.6.33. The main work of our implementation includes modifying the Linux kernel scheduler, setting task types and processor types dynamically, and setting the processor frequencies. We classify each task as a computing-bound task or a memory-bound task according to the task's usage of cache memory, and we also classify each processor as a computing-bound processor or a memory-bound processor for running the different types of tasks. Through our modification of the Linux kernel, the scheduler dispatches tasks to the corresponding processors according to task type. To avoid load imbalance among processors caused by differing numbers of computing-bound and memory-bound tasks, a processor's type is dynamically adjusted while handling tasks in the system. In addition, to reduce power consumption, a processor's frequency is switched dynamically according to its type: computing-bound processors are set to the highest frequency to run computing-bound tasks efficiently, whereas memory-bound processors are set to the lowest frequency, since memory-bound tasks do not need the highest processing capability. Experimental results demonstrate that our dynamic task-aware scheduling mechanism can effectively improve system performance and reduce power consumption by reducing resource contention among processor cores.
APA, Harvard, Vancouver, ISO, and other styles
19

劉美伶. "Random Contention Based Resource Allocation for QoS Concerned Mobile Wireless Body Area Network." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/95744056193227263694.

Full text
Abstract:
Master's thesis, National Chiao Tung University, Institute of Electronics, academic year 98.
The mobile wireless body area network, mainly applied in remote health-care systems, provides ubiquitous medical services for patients' health monitoring beyond the confines of hospitals and clinics. Because the monitored vital signals are sensitive and important, quality of service (QoS) should be considered to support critical levels of operation. In this thesis, we propose a resource allocation scheme, the random contention based resource allocation (RACOON) algorithm, for QoS-concerned wireless body area networks (WBANs). Besides quality-of-service control within a single WBAN, the proposed algorithm also considers the priority differences between WBANs: it gives high-risk WBAN users more transmission opportunities than low-risk users, to guarantee real-time transmission of critical packets when the bandwidth resource is not sufficient. A MATLAB simulation platform is built to evaluate the resource allocation performance of RACOON. Simulation results show that the RACOON algorithm can effectively allocate data slots to each WBAN to meet both its risk level and its bandwidth requirements.
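A hedged sketch of risk-weighted slot allocation follows; the weights, frame size, and requests are illustrative assumptions, not RACOON's actual parameters. When the requested slots exceed the superframe capacity, each WBAN receives a share proportional to its risk level, so high-risk users keep enough slots for timely delivery of critical vital-sign packets.

```c
#include <stdio.h>

typedef struct { const char *id; int risk; int requested; int granted; } wban_t;

int main(void) {
    wban_t nets[] = { { "patient-A", 3, 20, 0 },    /* high risk   */
                      { "patient-B", 2, 20, 0 },    /* medium risk */
                      { "patient-C", 1, 20, 0 } };  /* low risk    */
    const int n = (int)(sizeof nets / sizeof nets[0]);
    int capacity = 30, total_weight = 0;            /* slots per superframe */

    for (int i = 0; i < n; i++) total_weight += nets[i].risk;
    for (int i = 0; i < n; i++) {
        int share = capacity * nets[i].risk / total_weight;   /* weighted share */
        nets[i].granted = share < nets[i].requested ? share : nets[i].requested;
        printf("%s: risk %d, requested %d slots, granted %d slots\n",
               nets[i].id, nets[i].risk, nets[i].requested, nets[i].granted);
    }
    return 0;
}
```

This simplified version does not redistribute leftover slots; a contention-based scheme like RACOON additionally resolves collisions among WBANs competing for the same slots.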
APA, Harvard, Vancouver, ISO, and other styles
20

"Intelligent Scheduling and Memory Management Techniques for Modern GPU Architectures." Doctoral diss., 2017. http://hdl.handle.net/2286/R.I.45547.

Full text
Abstract:
With the massive multithreading execution feature, graphics processing units (GPUs) have been widely deployed to accelerate general-purpose parallel workloads (GPGPUs). However, using GPUs to accelerate computation does not always yield good performance improvement. This is mainly due to three inefficiencies in modern GPU and system architectures. First, not all parallel threads have a uniform amount of workload to fully utilize the GPU's computation ability, leading to a sub-optimal performance problem called warp criticality. To mitigate the degree of warp criticality, I propose a Criticality-Aware Warp Acceleration mechanism, called CAWA. CAWA predicts and accelerates the critical warp execution by allocating larger execution time slices and additional cache resources to the critical warp. The evaluation result shows that with CAWA, GPUs can achieve an average of 1.23x speedup. Second, the shared cache storage in GPUs is often insufficient to accommodate the demands of the large number of concurrent threads. As a result, cache thrashing is commonly experienced in GPUs' cache memories, particularly in the L1 data caches. To alleviate the cache contention and thrashing problem, I develop an instruction-aware Control Loop Based Adaptive Bypassing algorithm, called Ctrl-C. Ctrl-C learns the cache reuse behavior and bypasses a portion of memory requests with the help of feedback control loops. The evaluation result shows that Ctrl-C can effectively improve cache utilization in GPUs and achieve an average of 1.42x speedup for cache-sensitive GPGPU workloads. Finally, GPU workloads and the co-located processes running on the host chip multiprocessor (CMP) in a heterogeneous system setup can contend for memory resources at multiple levels, resulting in significant performance degradation. To maximize the system throughput and balance the performance degradation of all co-located applications, I design a scalable performance degradation predictor specifically for heterogeneous systems, called HeteroPDP. HeteroPDP predicts the application execution time and schedules OpenCL workloads to run on different devices based on the optimization goal. The evaluation result shows HeteroPDP can improve the system fairness from 24% to 65% when an OpenCL application is co-located with other processes, and gain an additional 50% speedup compared with always offloading the OpenCL workload to GPUs. In summary, this dissertation aims to provide insights for future microarchitecture and system architecture designs by identifying, analyzing, and addressing three critical performance problems in modern GPUs.

Dissertation/Thesis. Doctoral Dissertation, Computer Engineering, 2017.
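The scheduling decision at the heart of a degradation-aware predictor can be illustrated as below; the runtimes and slowdown factors are assumed numbers, not HeteroPDP's actual model. Given predicted standalone runtimes and a predicted slowdown from co-located processes on each device, the workload is placed where the degraded runtime is smallest.

```c
#include <stdio.h>

typedef struct { const char *name; double runtime_s; double slowdown; } device_t;

int main(void) {
    device_t dev[] = {
        { "CPU", 12.0, 1.9 },   /* heavy contention from co-located processes */
        { "GPU",  4.0, 1.2 },   /* mild contention on the memory subsystem    */
    };
    int n = (int)(sizeof dev / sizeof dev[0]), best = 0;

    for (int i = 0; i < n; i++) {
        double t = dev[i].runtime_s * dev[i].slowdown;  /* predicted degraded time */
        printf("%s: predicted %.1f s under contention\n", dev[i].name, t);
        if (t < dev[best].runtime_s * dev[best].slowdown) best = i;
    }
    printf("schedule the OpenCL workload on the %s\n", dev[best].name);
    return 0;
}
```

The same comparison can be driven by a fairness goal instead of raw runtime, which is how a predictor of this kind balances degradation across all co-located applications rather than always offloading to the GPU.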
APA, Harvard, Vancouver, ISO, and other styles