Dissertations / Theses on the topic 'Computer architecture; branch prediction'


1

GAO, HONGLIANG. "IMPROVING BRANCH PREDICTION ACCURACY VIA EFFECTIVE SOURCE INFORMATION AND PREDICTION ALGORITHMS." Doctoral diss., University of Central Florida, 2008. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/3286.

Full text
Abstract:
Modern superscalar processors rely on branch predictors to sustain a high instruction fetch throughput. Given the trend toward deep pipelines and large instruction windows, a branch misprediction incurs a large performance penalty and wastes a significant amount of energy on wrong-path instructions. Because of their critical role in high-performance processors, branch predictors have been the subject of extensive research aimed at improving prediction accuracy. Conceptually, a dynamic branch prediction scheme includes three major components: a source, an information processor, and a predictor. Traditional work focuses mainly on the predictor's algorithm. In this dissertation, besides novel prediction algorithms, we investigate the other components and develop untraditional ways to improve prediction accuracy. First, we propose an adaptive information processing method to dynamically extract the most effective inputs, maximizing the correlation to be exploited by the predictor. Second, we propose a new prediction algorithm that improves on the Prediction by Partial Matching (PPM) algorithm by selectively combining multiple partial matches. The PPM algorithm was previously considered optimal and has been used to derive the upper limit of branch prediction accuracy; our proposed algorithm achieves higher prediction accuracy than PPM and can be implemented within a realistic hardware budget. Third, we identify a new locality between the addresses of producer loads and the outcomes of their consumer branches. We study this address-branch correlation in detail and propose a branch predictor that exploits it for long-latency, hard-to-predict branches that existing branch predictors fail to predict accurately.
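The PPM scheme discussed above can be illustrated with a minimal sketch (an assumption-laden simplification, not the dissertation's actual predictor): tables of 2-bit saturating counters are indexed by branch histories of decreasing length, and the longest matching context supplies the prediction.

```python
class PPMPredictor:
    """Sketch of a Prediction-by-Partial-Matching branch predictor."""

    def __init__(self, max_hist=4):
        self.max_hist = max_hist
        # one table per history length (0 .. max_hist); each maps a
        # (pc, history-tuple) key to a 2-bit saturating counter (0..3)
        self.tables = [dict() for _ in range(max_hist + 1)]
        self.history = []

    def _key(self, pc, length):
        return (pc, tuple(self.history[-length:] if length else ()))

    def predict(self, pc):
        # search from the longest history down to length 0 for a match
        for length in range(min(self.max_hist, len(self.history)), -1, -1):
            ctr = self.tables[length].get(self._key(pc, length))
            if ctr is not None:
                return ctr >= 2          # taken if counter in upper half
        return True                      # no context seen yet: guess taken

    def update(self, pc, taken):
        # train every history length, then shift the global history
        for length in range(min(self.max_hist, len(self.history)) + 1):
            key = self._key(pc, length)
            ctr = self.tables[length].get(key, 2)
            ctr = min(3, ctr + 1) if taken else max(0, ctr - 1)
            self.tables[length][key] = ctr
        self.history = (self.history + [taken])[-self.max_hist:]
```

On a strictly alternating branch, the length-2 and length-4 contexts quickly become perfectly predictive, which is the kind of pattern a history-less counter can never capture.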
Ph.D.
School of Electrical Engineering and Computer Science
Engineering and Computer Science
Computer Science PhD
APA, Harvard, Vancouver, ISO, and other styles
2

Lind, Tobias. "Evaluation of Instruction Prefetch Methods for Coresonic DSP Processor." Thesis, Linköpings universitet, Datorteknik, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-129128.

Full text
Abstract:
With increasing demands on mobile communication transfer rates, the circuits in mobile phones must be designed for higher performance while maintaining low power consumption for increased battery life. One possible way to improve an existing architecture is to implement instruction prefetching. By predicting ahead of time which instructions will be executed, instructions can be prefetched from memory to increase performance; instructions that will soon be executed again can also be stored temporarily to avoid fetching them from memory multiple times. A trace-driven simulator allows the existing hardware to be simulated while running a realistic scenario, and different instruction prefetch methods can be implemented in this simulator to measure how they perform. It is shown that a simple loop buffer and return stack reduce execution time by up to 5 percent and the number of memory accesses by up to 25 percent. Execution time can be reduced even further with more complex methods such as branch target prediction and branch condition prediction.
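The return stack mentioned above can be sketched in a few lines (a generic illustration, not Coresonic's actual hardware): calls push their fall-through address, returns pop it, so the fetch unit knows the return target ahead of time.

```python
class ReturnAddressStack:
    """Sketch of a fixed-depth return-address stack (RAS)."""

    def __init__(self, depth=8):
        self.depth = depth
        self.stack = []

    def on_call(self, return_addr):
        # a full stack silently discards its oldest entry, as real
        # fixed-size hardware stacks do
        if len(self.stack) == self.depth:
            self.stack.pop(0)
        self.stack.append(return_addr)

    def on_return(self):
        # predicted target of the return instruction (None on underflow,
        # e.g. after deep recursion overflowed the stack)
        return self.stack.pop() if self.stack else None
```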
3

Zlatohlávková, Lucie. "Návrh a implementace prostředků pro zvýšení výkonu procesoru." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2007. http://www.nusl.cz/ntk/nusl-412764.

Full text
Abstract:
This master's thesis focuses on processor architecture. Its core is the design of a simple processor enriched with modern architectural components such as pipelining, cache memory, and branch prediction. The processor was written in the VHDL hardware description language and simulated in the ModelSim simulation tool.
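As a rough illustration of the kind of branch prediction such a simple processor might add (sketched in Python rather than the thesis's VHDL), the classic bimodal scheme keeps a table of 2-bit saturating counters indexed by the low bits of the branch address:

```python
class BimodalPredictor:
    """Sketch of a bimodal predictor: per-branch 2-bit counters."""

    def __init__(self, bits=10):
        self.mask = (1 << bits) - 1
        self.table = [2] * (1 << bits)   # counters start weakly taken

    def predict(self, pc):
        # counter in the upper half (2 or 3) means "predict taken"
        return self.table[pc & self.mask] >= 2

    def update(self, pc, taken):
        i = pc & self.mask
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)
```

The 2-bit hysteresis means a single atypical outcome (e.g. a loop exit) does not immediately flip a well-established prediction.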
4

Egan, Colin. "Dynamic branch prediction in high performance superscalar processors." Thesis, University of Hertfordshire, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.340035.

Full text
5

Alovisi, Pietro. "Static Branch Prediction through Representation Learning." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-277923.

Full text
Abstract:
In the context of compilers, branch probability prediction deals with estimating the probability that a branch in a program will be taken. In the absence of profiling information, compilers rely on statically estimated branch probabilities, and state-of-the-art branch probability predictors are based on heuristics. Recent machine learning approaches learn directly from source code using natural language processing algorithms. A representation learning word embedding algorithm is built and evaluated to predict branch probabilities on LLVM's intermediate representation (IR) language. The predictor is trained and tested on SPEC's CPU 2006 benchmark and compared to state-of-the-art branch probability heuristics. The predictor obtains a better miss rate and accuracy in branch prediction than all the evaluated heuristics, but on average yields no performance speedup over LLVM's branch predictor on the benchmark. This investigation shows that it is possible to predict branch probabilities using representation learning, but more effort must be put into obtaining a predictor with practical advantages over the heuristics.
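For readers unfamiliar with the heuristics the predictor is compared against, a sketch in the spirit of classic Ball-Larus-style rules follows; the rule set and the 0.88/0.12 probabilities are illustrative assumptions, not LLVM's actual values.

```python
def static_branch_probability(branch):
    """Estimate P(taken) for a branch from static features alone.

    `branch` is a dict of boolean features describing the branch in the
    IR; the feature names here are hypothetical illustrations.
    """
    # loop back-edges are almost always taken
    if branch.get("is_loop_back_edge"):
        return 0.88
    # the path guarding a null-pointer check is rarely taken
    if branch.get("compares_pointer_to_null"):
        return 0.12
    # branches guarding calls to error/noreturn functions are rarely taken
    if branch.get("guards_error_call"):
        return 0.12
    return 0.5          # no heuristic applies: assume even odds
```

A learned predictor replaces this fixed rule list with probabilities inferred from the IR itself, which is exactly the trade-off the thesis evaluates.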
6

Jiménez, Daniel Angel. "Delay-sensitive branch predictors for future technologies." Full text (PDF) from UMI/Dissertation Abstracts International, 2002. http://wwwlib.umi.com/cr/utexas/fullcit?p3081043.

Full text
7

Carver, Jason W. "Architecture of a prediction economy." Thesis, Massachusetts Institute of Technology, 2008. http://hdl.handle.net/1721.1/45807.

Full text
Abstract:
Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008.
Includes bibliographical references.
A design and implementation of a Prediction Economy is presented and compared to alternative designs. A Prediction Economy is composed of prediction markets, market managers, information brokers and automated trading agents. Two important goals of a Prediction Economy are to improve liquidity and information dispersal. Market managers automatically open and close appropriate markets, quickly giving traders access to the latest claims. Information brokers deliver parsed data to the trading agents. The agents execute trades on markets that might not otherwise have much trading action. Some preliminary results from a running Prediction Economy are presented, with binary markets based on football plays during a college football game. The most accurate agent chose to enter 8 of 32 markets, and was able to predict 7 of the 8 football play attempts correctly. Source code for the newly implemented tools is available, as are references to the existing open source tools used.
by Jason W. Carver.
M.Eng.
8

Tarlescu, Maria-Dana. "The Elastic History Buffer, a multi-hybrid branch prediction scheme using static classification." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1999. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape7/PQDD_0025/MQ50893.pdf.

Full text
9

Thankappan, Achary Retnamma Renjith. "Broadcast Mechanism for improving Conditional Branch Prediction in Speculative Multithreaded Processors." PDXScholar, 2010. https://pdxscholar.library.pdx.edu/open_access_etds/368.

Full text
Abstract:
Many aspects of speculative multithreading have been under constant and crucial research in recent times, given the increased importance of exploiting parallelism in single-thread applications. One important architectural optimization that is very pertinent in this scenario is branch prediction. Branch prediction assumes increased importance for systems that execute threads speculatively, since a wrong prediction can be much costlier here, measured in threads, than the few instructions that occupy the pipeline in a uniprocessor. Conventional branch prediction techniques have provided increasingly better prediction accuracies for uni-core processing, but branch prediction takes on a whole new dimension when applied to multi-core architectures based on speculative multithreading. Dependence on global branch history has helped branch predictors achieve high prediction accuracy in single-thread applications; the discontinuity of global history created at thread boundaries cripples the performance of branch predictors in a multi-threaded environment. Many studies in the past have tried to address the branch history problem to improve prediction accuracy. Most have been found either to be architecture-specific or complex in terms of the hardware needed to recreate or approximate the right history to give to threads when they start executing out of order. This hardware overhead increases as the number and size of threads increase, thereby limiting the scalability of the algorithms proposed so far. The current thesis takes a different direction and proposes a simple and scalable solution to effectively reduce misprediction rates in speculative multithreaded systems. This is accomplished by making use of a synergistic interaction between threads to boost the inherently biased nature of branches, and by using less complex hardware to reduce aliasing between branches in the threads.
The study proposes a new scheme called the Global Broadcast Buffer scheme to effectively reduce branch mispredictions in Speculative Multithreaded architectures.
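The history-discontinuity problem described above arises because global-history predictors such as gshare fold a history register into the table index; a minimal gshare sketch (generic gshare, not the thesis's Global Broadcast Buffer) shows why a thread that starts with a wrong or empty history lands on different, untrained table entries.

```python
class GsharePredictor:
    """Sketch of gshare: table index = PC XOR global history register."""

    def __init__(self, bits=12):
        self.mask = (1 << bits) - 1
        self.table = [2] * (1 << bits)   # 2-bit counters, weakly taken
        self.ghr = 0                     # global history register

    def _index(self, pc):
        # the same branch maps to different entries under different
        # histories -- which is also why a thread spawned with the wrong
        # history cannot reuse what was learned under the right one
        return (pc ^ self.ghr) & self.mask

    def predict(self, pc):
        return self.table[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        self.table[i] = min(3, self.table[i] + 1) if taken \
            else max(0, self.table[i] - 1)
        self.ghr = ((self.ghr << 1) | int(taken)) & self.mask
```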
10

Jothi, Komal. "Dynamic Task Prediction for an SpMT Architecture Based on Control Independence." PDXScholar, 2009. https://pdxscholar.library.pdx.edu/open_access_etds/1707.

Full text
Abstract:
Exploiting better performance from computer programs translates to finding more instructions to execute in parallel. Since most general-purpose programs are written in an imperatively sequential manner, closely lying instructions are often data dependent, making the designer look far ahead into the program for parallelism. This necessitates wider superscalar processors with larger instruction windows. But superscalars suffer from three key limitations: their inability to scale, the sequential fetch bottleneck, and the high branch misprediction penalty. Recent studies indicate that current superscalars have reached the end of the road and designers will have to look for newer ideas to build computer processors. Speculative Multithreading (SpMT) is one of the most recent techniques to exploit parallelism from applications. Most SpMT architectures partition a sequential program into multiple threads (or tasks) that can be concurrently executed on multiple processing units. It is desirable that these tasks are sufficiently distant from each other so as to facilitate parallelism, and that they are control independent of each other so that execution of a future task is guaranteed in case of local control flow misspeculations. Some task prediction mechanisms rely on the compiler, requiring recompilation of programs. Current dynamic mechanisms either rely on program constructs like loop iterations and function and loop boundaries, resulting in unbalanced loads, or predict tasks that are too short to be of use in an SpMT architecture. This thesis is the first proposal of a predictor that dynamically predicts control independent tasks that are consistently wide apart, and executes them on a novel SpMT architecture.
11

Kim, Donglok. "Extended data cache prefetching using a reference prediction table /." Thesis, Connect to this title online; UW restricted, 1997. http://hdl.handle.net/1773/6127.

Full text
12

Santos, Rafael Ramos dos. "DCE: the dynamic conditional execution in a multipath control independent architecture." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2003. http://hdl.handle.net/10183/5596.

Full text
Abstract:
This thesis presents DCE, or Dynamic Conditional Execution, as an alternative to reduce the cost of mispredicted branches. The basic idea is to fetch all paths produced by a branch that obey certain restrictions regarding complexity and size. As a result, fewer predictions are performed and, therefore, fewer branches are mispredicted. DCE fetches through selected branches, avoiding disruptions in the fetch flow when these branches are fetched. Both paths of selected branches are executed, but only the correct path commits. In this thesis we propose an architecture to execute multiple paths of selected branches. Branches are selected based on size and other conditions. Simple and complex branches can be dynamically predicated without requiring a special instruction set or special compiler optimizations. Furthermore, a technique is proposed to reduce part of the overhead generated by the execution of multiple paths. The speedup reaches up to 12% when a Local predictor used in DCE is compared against a Global predictor used in the reference machine; when both machines use a Local predictor, the average speedup is 3-3.5%.
13

Tsardakas, Renhuldt Nikos. "Protein contact prediction based on the Tiramisu deep learning architecture." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-231494.

Full text
Abstract:
Experimentally determining protein structure is a hard problem, with applications in both medicine and industry. Predicting protein structure is also difficult. Predicted contacts between residues within a protein are helpful during protein structure prediction. Recent state-of-the-art models have used deep learning to improve protein contact prediction. This thesis presents a new deep learning model for protein contact prediction, TiramiProt. It is based on the Tiramisu deep learning architecture, and trained and evaluated on the same data as the PconsC4 protein contact prediction model. 228 models using different combinations of hyperparameters were trained until convergence. The final TiramiProt model performs on par with two current state-of-the-art protein contact prediction models, PconsC4 and RaptorX-Contact, across a range of different metrics. A Python package and a Singularity container for running TiramiProt are available at https://gitlab.com/nikos.t.renhuldt/TiramiProt.
14

Zhang, Xiushan. "L2 cache replacement based on inter-access time per access count prediction." Diss., Online access via UMI, 2009.

Find full text
15

John, Tobias. "Instruction Timing Analysis for Linux/x86-based Embedded and Desktop Systems." Master's thesis, Universitätsbibliothek Chemnitz, 2005. http://nbn-resolving.de/urn:nbn:de:swb:ch1-200501401.

Full text
Abstract:
Real-time aspects are becoming more important in standard desktop PC environments, and x86-based processors are being used in embedded systems more often. While these processors were not created for use in hard real-time systems, they are fast and inexpensive and can be used if it is possible to determine the worst-case execution time. Information on CPU caches (L1, L2) and the branch prediction architecture is necessary to simulate best and worst cases in execution timing, but is often not detailed enough and sometimes not published at all. This document describes how the underlying hardware can be analysed to obtain this information.
16

Khan, Salman. "Putting checkpoints to work in thread level speculative execution." Thesis, University of Edinburgh, 2010. http://hdl.handle.net/1842/4676.

Full text
Abstract:
With the advent of Chip Multi Processors (CMPs), improving performance relies on the programmers/compilers to expose thread level parallelism to the underlying hardware. Unfortunately, this is a difficult and error-prone process for the programmers, while state of the art compiler techniques are unable to provide significant benefits for many classes of applications. An interesting alternative is offered by systems that support Thread Level Speculation (TLS), which relieve the programmer and compiler from checking for thread dependencies and instead use the hardware to enforce them. Unfortunately, data misspeculation results in a high cost since all the intermediate results have to be discarded and threads have to roll back to the beginning of the speculative task. For this reason intermediate checkpointing of the state of the TLS threads has been proposed. When the violation does occur, we now have to roll back to a checkpoint before the violating instruction and not to the start of the task. However, previous work omits study of the microarchitectural details and implementation issues that are essential for effective checkpointing. Further, checkpoints have only been proposed and evaluated for a narrow class of benchmarks. This thesis studies checkpoints on a state of the art TLS system running a variety of benchmarks. The mechanisms required for checkpointing and the costs associated are described. Hardware modifications required for making checkpointed execution efficient in time and power are proposed and evaluated. Further, the need for accurately identifying suitable points for placing checkpoints is established. Various techniques for identifying these points are analysed in terms of both effectiveness and viability. This includes an extensive evaluation of data dependence prediction techniques. The results show that checkpointing thread level speculative execution results in consistent power savings, and for many benchmarks leads to speedups as well.
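The intermediate-checkpointing idea above can be sketched abstractly (an illustration of the rollback policy only, not the thesis's microarchitecture): a speculative task snapshots its state at chosen points, and a dependence violation rolls back to the nearest checkpoint before the violating instruction instead of the task start.

```python
class CheckpointedTask:
    """Sketch of a speculative task with intermediate checkpoints."""

    def __init__(self, state):
        self.state = dict(state)
        # the task start counts as checkpoint 0, so rollback always
        # has somewhere to land
        self.checkpoints = [(0, dict(state))]

    def checkpoint(self, position):
        # snapshot the architectural state at this instruction position
        self.checkpoints.append((position, dict(self.state)))

    def rollback_to(self, violating_position):
        # discard checkpoints at/after the violating instruction and
        # restore the most recent surviving snapshot
        while len(self.checkpoints) > 1 and \
                self.checkpoints[-1][0] >= violating_position:
            self.checkpoints.pop()
        pos, snap = self.checkpoints[-1]
        self.state = dict(snap)
        return pos                       # re-execute from here
```

The win the thesis quantifies is exactly this: work before the surviving checkpoint is kept, whereas a checkpoint-less task would squash everything back to position 0.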
17

Prémillieu, Nathanaël. "Améliorer la performance séquentielle à l’ère des processeurs massivement multicœurs." Thesis, Rennes 1, 2013. http://www.theses.fr/2013REN1S071/document.

Full text
Abstract:
Computers are everywhere, and the need for ever more computing power has pushed processor architects to find new ways to increase performance. The current tendency is to replicate execution cores on the same die to parallelize execution. If it continues, processors will become manycores featuring hundreds to a thousand cores. However, Amdahl's law reminds us that increasing sequential performance will always be vital to increase global performance. One essential way to increase sequential performance is to improve how branches are executed, because they limit instruction-level parallelism. Branch prediction is the most studied solution, its interest greatly depending on its accuracy. In recent years, this accuracy has been continuously improved up to a limit that now seems hard to exceed. Another solution is to suppress branches by replacing them with a construct based on predicated instructions. However, the execution of predicated instructions on out-of-order processors raises several problems, such as the multiple-definition problem. This study investigates these two aspects of branch treatment. The first part is about branch prediction. A way to improve it without increasing accuracy is to reduce the cost of a branch misprediction. This is possible by exploiting control flow reconvergence and control independence: the work done on the wrong path on instructions common to the two paths is saved to be reused on the correct path. The second part is about predicated instructions. We propose a solution to the multiple-definition problem through selective prediction of predicate values. A selective replay mechanism is used to reduce the cost of a predicate misprediction.
18

Li, Ying. "Interest management scheme and prediction model in intelligent transportation systems." Diss., Georgia Institute of Technology, 2012. http://hdl.handle.net/1853/45856.

Full text
Abstract:
This thesis focuses on two important problems related to DDDAS: interest management (data distribution) and prediction models. In order to reduce communication overhead, we propose a new interest management mechanism for mobile peer-to-peer systems. This approach involves dividing the entire space into cells and using an efficient sorting algorithm to sort the regions in each cell. A mobile landmarking scheme is introduced to implement this sort-based scheme in mobile peer-to-peer systems. The design does not require a centralized server, but rather, every peer can become a mobile landmark node to take a server-like role to sort and match the regions. Experimental results show that the scheme has better computational efficiency for both static and dynamic matching. In order to improve communication efficiency, we present a travel time prediction model based on boosting, an important machine learning technique, and combine boosting and neural network models to increase prediction accuracy. We also explore the relationship between the accuracy of travel time prediction and the frequency of traffic data collection with the long term goal of minimizing bandwidth consumption. Several different sets of experiments are used to evaluate the effectiveness of this model. The results show that the boosting neural network model outperforms other predictors.
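The boosting idea used above for travel time prediction can be sketched with a toy gradient-boosting loop over one-split regression stumps; the data and parameters here are illustrative, not the thesis's traffic dataset or its boosted neural network model.

```python
def boost(xs, ys, rounds=20, lr=0.5):
    """Toy gradient boosting on 1-D data.

    Each round fits a one-split regression stump to the residual error
    of the ensemble so far and adds it with learning rate `lr`.
    """
    pred = [0.0] * len(xs)
    stumps = []
    for _ in range(rounds):
        resid = [y - p for y, p in zip(ys, pred)]
        best = None
        for t in xs:                              # candidate thresholds
            left = [r for x, r in zip(xs, resid) if x <= t]
            right = [r for x, r in zip(xs, resid) if x > t]
            if not left or not right:
                continue
            lmean = sum(left) / len(left)
            rmean = sum(right) / len(right)
            err = sum((r - (lmean if x <= t else rmean)) ** 2
                      for x, r in zip(xs, resid))
            if best is None or err < best[0]:
                best = (err, t, lmean, rmean)
        _, t, lmean, rmean = best
        stumps.append((t, lmean, rmean))
        pred = [p + lr * (lmean if x <= t else rmean)
                for x, p in zip(xs, pred)]

    def predict(x):
        # the ensemble prediction is the sum of all scaled stump outputs
        return sum(lr * (lm if x <= t else rm) for t, lm, rm in stumps)
    return predict
```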
19

Harris, Erick Michael. "Amplifying the Prediction of Team Performance through Swarm Intelligence and Machine Learning." DigitalCommons@CalPoly, 2018. https://digitalcommons.calpoly.edu/theses/1964.

Full text
Abstract:
Modern companies are increasingly relying on groups of individuals to reach organizational goals and objectives; however, many organizations struggle to cultivate optimal teams that can maximize performance. Fortunately, existing research has established that group personality composition (GPC), across five dimensions of personality, is a promising indicator of team effectiveness. Additionally, recent advances in technology have enabled groups of humans to form real-time, closed-loop systems that are modeled after natural swarms, like flocks of birds and colonies of bees. These Artificial Swarm Intelligences (ASI) have been shown to amplify performance in a wide range of tasks, from forecasting financial markets to prioritizing conflicting objectives. The present research examines the effects of group personality composition on team performance and investigates the impact of measuring GPC through ASI systems. 541 participants, across 111 groups, were administered a set of well-accepted and vetted psychometric assessments to capture the personality configurations and social sensitivities of teams. While group-level personality averages explained 10% of the variance in team performance, group personality composition measured through human swarms explained 29% of the variance, a 19-percentage-point amplification in predictive capacity. Finally, a series of machine learning models were trained to predict group effectiveness. Multivariate Linear Regression and Logistic Regression achieved the highest performance, exhibiting 0.19 mean squared error and 81.8% classification accuracy.
20

White, Cory B. "A Neural Network Approach to Border Gateway Protocol Peer Failure Detection and Prediction." DigitalCommons@CalPoly, 2009. https://digitalcommons.calpoly.edu/theses/215.

Full text
Abstract:
The size and speed of computer networks continue to expand at a rapid pace, as do the corresponding errors, failures, and faults inherent within such extensive networks. This thesis introduces a novel approach that interfaces Border Gateway Protocol (BGP) computer networks with neural networks to learn the precursor connectivity patterns that emerge prior to a node failure. Details are presented of the design and construction of a framework that uses neural networks to learn and monitor BGP connection states as a means of detecting and predicting BGP peer node failure. Moreover, this framework is used to monitor a BGP network, and a suite of tests is conducted to establish this neural network approach as a viable strategy for predicting BGP peer node failure. In all performed experiments, both of the proposed neural network architectures succeed in memorizing and utilizing the network connectivity patterns. Lastly, a discussion of the framework's generic design acknowledges how other types of networks and alternate machine learning techniques can be accommodated with relative ease.
APA, Harvard, Vancouver, ISO, and other styles
21

Kalaitzidis, Kleovoulos. "Advanced speculation to increase the performance of superscalar processors." Thesis, Rennes 1, 2020. http://www.theses.fr/2020REN1S007.

Full text
Abstract:
Même à l’ère des multicœurs, il est primordial d’améliorer la performance en contexte monocœur, étant donné l’existence de programmes qui exposent des parties séquentielles non négligeables. Les performances séquentielles se sont essentiellement améliorées avec le passage à l’échelle des structures de processeurs qui permettent le parallélisme d’instructions (ILP). Cependant, les chaînes de dépendances séquentielles limitent considérablement la performance. La prédiction de valeurs (VP) et la prédiction d’adresse des lectures mémoire (LAP) sont deux techniques en développement qui permettent de surmonter cet obstacle en permettant l’exécution d’instructions en spéculant sur les données. Cette thèse propose des mécanismes basés sur VP et LAP qui conduisent à des améliorations de performances sensiblement plus élevées. D’abord, VP est examiné au niveau de l’ISA, ce qui fait apparaître l’impact de certaines particularités de l’ISA sur les performances. Ensuite, un nouveau prédicteur binaire (VSEP), qui permet d’exploiter certains motifs de valeurs, qui bien qu’ils soient fréquemment rencontrés, ne sont pas capturés par les modèles précédents, est introduit. VSEP améliore le speedup obtenu de 19% et, grâce à sa structure, il atténue le coût de la prédiction de valeurs supérieures à 64 bits. Adapter cette approche pour effectuer LAP permet de prédire les adresses de 48% des lectures mémoire. Finalement, une microarchitecture qui exploite soigneusement ce mécanisme de LAP peut exécuter 32% des lectures mémoire en avance.
Even in the multicore era, making single cores faster is paramount to achieving high-performance computing, given the existence of programs that are either inherently sequential or expose non-negligible sequential parts. Sequential performance has essentially been improving with the scaling of the processor structures that enable instruction-level parallelism (ILP). However, as modern microarchitectures continue to extract more ILP by employing larger instruction windows, true data dependencies remain a major performance bottleneck. Value Prediction (VP) and Load-Address Prediction (LAP) are two developing techniques that make it possible to overcome this obstacle and harvest more ILP by enabling the execution of instructions in a data-wise speculative manner. This thesis proposes mechanisms related to VP and LAP that lead to effectively higher performance improvements. First, VP is examined in an ISA-aware manner, which discloses the impact of certain ISA particularities on the anticipated speedup. Second, a novel binary-based VP model, namely VSEP, is introduced, which allows the exploitation of certain value patterns that, although frequently encountered, are not captured by previous works. VSEP improves the obtained speedup by 19% and, by virtue of its structure, mitigates the cost of predicting values wider than 64 bits. Adapting this approach to perform LAP makes it possible to predict the memory addresses of 48% of the committed loads. Eventually, a microarchitecture that carefully leverages this LAP mechanism can execute 32% of the committed loads early.
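For readers unfamiliar with value prediction, the simplest classical baseline (a last-value predictor, not the VSEP scheme this thesis proposes) can be sketched in a few lines: predict that an instruction will produce the same value it produced on its previous execution, and train on committed results.

```python
class LastValuePredictor:
    """Minimal last-value predictor, a classic VP baseline (illustrative
    only; VSEP's binary-based scheme is considerably more elaborate)."""
    def __init__(self):
        self.table = {}            # pc -> last committed value

    def predict(self, pc):
        return self.table.get(pc)  # None means no prediction available

    def update(self, pc, value):
        self.table[pc] = value     # train on the committed result
```

A real design adds confidence counters so that only high-confidence predictions are used for speculation, since a value misprediction forces a pipeline squash.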
APA, Harvard, Vancouver, ISO, and other styles
22

Luo, Meiling. "Indoor radio propagation modeling for system performance prediction." Phd thesis, INSA de Lyon, 2013. http://tel.archives-ouvertes.fr/tel-00961244.

Full text
Abstract:
This thesis aims at proposing all possible enhancements for the Multi-Resolution Frequency-Domain ParFlow (MR-FDPF) model. As a deterministic radio propagation model, the MR-FDPF model possesses a high level of accuracy, but it also suffers from some common limitations of deterministic models. For instance, realistic radio channels are not deterministic but a kind of random process due to, e.g., moving people or moving objects, and thus cannot be completely described by a purely deterministic model. In this thesis, a semi-deterministic model is proposed based on the deterministic MR-FDPF model, which introduces a stochastic part to take into account the randomness of realistic radio channels. The deterministic part of the semi-deterministic model is the mean path loss, and the stochastic part comes from the shadow fading and the small scale fading. Moreover, many radio propagation simulators provide only mean power predictions. However, mean power alone is not enough to fully describe the behavior of radio channels. It has been shown that fading also has an important impact on radio system performance. Thus, a good radio propagation simulator should also be able to provide the fading information, so that an accurate Bit Error Rate (BER) prediction can be achieved. In this thesis, the fading information is extracted based on the MR-FDPF model and then a realistic BER is predicted. Finally, the realistic prediction of the BER allows the implementation of the adaptive modulation scheme. This has been done in the thesis for three systems: Single-Input Single-Output (SISO) systems, Maximum Ratio Combining (MRC) diversity systems, and wideband Orthogonal Frequency-Division Multiplexing (OFDM) systems.
APA, Harvard, Vancouver, ISO, and other styles
23

Khan, Taj Muhammad. "Processor design-space exploration through fast simulation." Phd thesis, Université Paris Sud - Paris XI, 2011. http://tel.archives-ouvertes.fr/tel-00691175.

Full text
Abstract:
Simulation is a vital tool used by architects to develop new architectures. However, because of the complexity of modern architectures and the length of recent benchmarks, detailed simulation of programs can take extremely long. This impedes the exploration of the processor design space, which architects need in order to find the optimal configuration of processor parameters. Sampling is one technique that reduces simulation time without adversely affecting the accuracy of the results. Yet most sampling techniques either ignore the warm-up issue or require significant development effort on the part of the user. In this thesis we tackle the problem of reconciling state-of-the-art warm-up techniques and the latest sampling mechanisms, with the triple objective of keeping the user effort minimal, achieving good accuracy, and being agnostic to software and hardware changes. We show that both representative and statistical sampling techniques can be adapted to use warm-up mechanisms that accommodate the underlying architecture's warm-up requirements on the fly. We present experimental results that show accuracy and speed comparable to the latest research. We also leverage statistical calculations to provide an estimate of the robustness of the final results.
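The "estimate of robustness" that statistical sampling can attach to a result is essentially a confidence interval over the sampled simulation units. A minimal sketch (illustrative, assuming a normal approximation; the thesis's actual statistical machinery is not reproduced here):

```python
import math
import statistics

def sampled_estimate(cpi_samples, z=1.96):
    """Mean CPI over detailed-simulation sample units plus a ~95% confidence
    half-width: the wider the half-width, the less robust the estimate."""
    n = len(cpi_samples)
    mean = statistics.fmean(cpi_samples)
    half_width = z * statistics.stdev(cpi_samples) / math.sqrt(n)
    return mean, half_width
```

Given `(mean, half_width)`, an architect can report "CPI = mean ± half_width" and decide whether more sample units are needed before trusting a design-space comparison.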
APA, Harvard, Vancouver, ISO, and other styles
24

Hassan, Ahmed. "Mining Software Repositories to Assist Developers and Support Managers." Thesis, University of Waterloo, 2004. http://hdl.handle.net/10012/1017.

Full text
Abstract:
This thesis explores mining the evolutionary history of a software system to support software developers and managers in their endeavors to build and maintain complex software systems. We introduce the idea of evolutionary extractors, which are specialized extractors that can recover the history of software projects from software repositories, such as source control systems. The challenges faced in building C-REX, an evolutionary extractor for the C programming language, are discussed. We examine the use of source control systems in industry and the quality of the recovered C-REX data through a survey of several software practitioners. Using the data recovered by C-REX, we develop several approaches and techniques to assist developers and managers in their activities. We propose Source Sticky Notes to assist developers in understanding legacy software systems by attaching historical information to the dependency graph. We present the Development Replay approach to estimate the benefits of adopting new software maintenance tools by reenacting the development history. We propose the Top Ten List, which assists managers in allocating testing resources to the subsystems that are most susceptible to faults. To assist managers in improving the quality of their projects, we present a complexity metric which quantifies the complexity of the changes to the code instead of quantifying the complexity of the source code itself. All presented approaches are validated empirically using data from several large open source systems. The presented work highlights the benefits of transforming software repositories from static record keeping repositories to active repositories used by researchers to gain an empirically based understanding of software development, and by software practitioners to predict, plan and understand various aspects of their project.
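One plausible way to quantify the complexity of changes (rather than of the code itself) is the Shannon entropy of how a period's changes are spread across files: focused changes score near 0, scattered changes near 1. The sketch below is an illustrative form of such a metric, not necessarily the exact formulation used in the thesis.

```python
import math

def change_complexity(changes_per_file):
    """Normalized Shannon entropy of one period's change distribution over
    files: 0.0 when all changes hit one file, 1.0 when they are spread
    evenly across all files (illustrative change-complexity metric)."""
    total = sum(changes_per_file)
    h = -sum((c / total) * math.log2(c / total)
             for c in changes_per_file if c)
    n = len(changes_per_file)
    return h / math.log2(n) if n > 1 else 0.0
```

A manager could track this value per release: a rising score means modifications are becoming more scattered, which change-based metrics of this kind treat as a risk signal.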
APA, Harvard, Vancouver, ISO, and other styles
25

Wang, Yaou. "Failure mechanism and reliability prediction for bonded layered structure due to cracks initiating at the interface." Columbus, Ohio : Ohio State University, 2009. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1236645979.

Full text
APA, Harvard, Vancouver, ISO, and other styles
26

Vaswani, Kapil. "An Adaptive Recompilation Framework For Rotor And Architectural Support For Online Program Instrumentation." Thesis, Indian Institute of Science, 2003. http://hdl.handle.net/2005/174.

Full text
Abstract:
Microsoft Research
Although runtime systems and the dynamic compilation model have revolutionized the process of application development and deployment, the associated performance overheads continue to be a cause for concern and much research. In the first part of this thesis, we describe the design and implementation of an adaptive recompilation framework for Rotor, a shared source implementation of the Common Language Infrastructure (CLI) that can increase program performance through intelligent recompilation decisions and optimizations based on the program's past behavior. Our extensions to Rotor include a low-overhead runtime-stack-based sampling profiler that identifies program hotspots. A recompilation controller oversees the recompilation process and generates recompilation requests. At the first level of a multi-level optimizing compiler, code in the intermediate language is converted to an internal intermediate representation and optimized using a set of simple transformations. The compiler uses a fast yet effective linear scan algorithm for register allocation. Hot methods can be instrumented in order to collect basic-block, edge and call-graph profile information. Profile-guided optimizations driven by online profile information are used to further optimize heavily executed methods at the second level of recompilation. An evaluation of the framework using a set of test programs shows that performance can improve by a maximum of 42.3% and by 9% on average. Our results also show that the overheads of collecting accurate profile information through instrumentation can, to an extent, outweigh the benefits of profile-guided optimizations in our implementation, suggesting the need for techniques that can reduce such overheads. A flexible and extensible framework design implies that additional profiling and optimization techniques can be easily incorporated to further improve performance.
As previously stated, fine-grained and accurate profile information must be available at low cost for advanced profile-guided optimizations to be effective in online environments. In the second part of this thesis, we propose a generic framework that makes it possible for instrumentation-based profilers to collect profile data efficiently, a task that has traditionally been associated with high overheads. The essence of the scheme is to make the underlying hardware aware of instrumentation using a special set of profile instructions and a tuned microarchitecture. This not only allows the hardware to provide the runtime with mechanisms to control the profiling activity, but also makes it possible for the hardware itself to optimize the process of profiling in a manner transparent to the runtime. We propose selective instruction dispatch as one possible controlling mechanism that can be used by the runtime to manage the execution of profile instructions and keep profiling overheads in check. We propose profile flag prediction, a hardware optimization that complements the selective dispatch mechanism by not fetching profile instructions when the runtime has turned profiling off. The framework is lightweight and flexible. It eliminates the need for expensive book-keeping, recompilation or code duplication. Our simulations with benchmarks from the SPEC CPU2000 suite show that overheads for call-graph and basic block profiling can be reduced by 72.7% and 52.4% respectively, with a negligible loss in accuracy.
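The control idea can be illustrated with a toy software model (an assumption for exposition, not the thesis's hardware): profile "instructions" update counters only while the runtime leaves profiling enabled, loosely mirroring what selective dispatch achieves by dropping profile instructions in the pipeline.

```python
class ProfiledRuntime:
    """Toy model of runtime-controllable instrumentation: block counters
    advance only while profiling is enabled."""
    def __init__(self):
        self.profiling_on = True
        self.block_counts = {}

    def profile_block(self, block_id):
        # In the proposed hardware, this profile instruction would simply
        # not be dispatched (or even fetched) while profiling is off, so
        # it would cost essentially nothing.
        if self.profiling_on:
            self.block_counts[block_id] = self.block_counts.get(block_id, 0) + 1
```

The point of doing this in hardware rather than in software, as the abstract notes, is that no recompilation or code duplication is needed to toggle the instrumentation on and off.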
APA, Harvard, Vancouver, ISO, and other styles
27

Valiukas, Tadas. "Kompiliatorių optimizavimas IA-64 architektūroje." Master's thesis, Lithuanian Academic Libraries Network (LABT), 2014. http://vddb.library.lt/obj/LT-eLABa-0001:E.02~2009~D_20140701_180746-19336.

Full text
Abstract:
Tradicinės x86 architektūros spartinimui artėjant prie galimybių ribos, kompanija Intel pradėjo kurti naują IA-64 architektūrą, paremtą EPIC – išreikštinai lygiagrečiai vykdomomis instrukcijomis vieno takto metu. Ši pagrindinė savybė leidžia vykdyti iki šešių instrukcijų per vieną taktą. Taipogi architektūra pasižymi tokiomis savybėmis, kurios leido efektyviai spręsti su kodo optimizavimu susijusias problemas tradicinėse architektūrose. Tačiau kompiliatorių optimizavimo algoritmai ilgą laiką buvo tobulinami tradicinėse architektūrose, todėl norint išnaudoti naująją architektūrą, reikia ieškoti būdų tobulinti esamus kompiliatorius. Vienas iš būdų – kompiliatoriaus vidinių parametrų atsakingų už optimizacijas reikšmių pritaikymas IA-64. Būtent toks yra šio darbo tikslas, kuriam pasiekti reikia išnagrinėti IA-64 savybes, jas vėliau eksperimentiškai taikyti realaus kodo pavyzdžiuose bei įvertinti jų įtaką kodo vykdymo spartai. Pagal gautus rezultatus nagrinėjami kompiliatoriaus vidiniai parametrai ir su specialia kompiliatorių testavimo programa randamas geriausias reikšmių rinkinys šiai architektūrai. Vėliau šis rinkinys išbandomas su taikomosiomis programomis. Gauto parametrų rinkinio reikšmės turėtų leisti generuoti efektyvesnį kodą IA-64 architektūrai.
As performance optimization of the traditional x86 architecture began to reach its limits, Intel started to develop a new architecture based on EPIC – Explicitly Parallel Instruction Computing. This main feature allows up to six instructions to be executed in a single CPU cycle. The architecture also includes further features that allow efficient solutions to the code optimization problems of traditional architectures. However, for a long time code optimization algorithms were improved for traditional architectures only; as a result, those algorithms should be adapted to the new architecture. One way to do that is to explore the compiler's internal parameters, which are responsible for code optimizations. That is the primary target of this work, and to reach it the features of the IA-64 architecture and their impact on execution performance must be explored using real-life code examples. Test results may then be used for internal parameter selection and further exploration of these parameters' values using special compiler performance benchmarks. The resulting set of values could then be tested with real-life applications in order to demonstrate the efficiency of IA-64 architecture features.
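The parameter-search loop described here can be sketched generically (an illustrative exhaustive search over a tiny flag set; real tuning explores far larger spaces heuristically, and `benchmark` stands in for an actual timed compile-and-run):

```python
import itertools

def best_flag_set(benchmark, flags):
    """Try every on/off combination of a small list of compiler flags and
    keep the combination with the lowest benchmark time.  `benchmark` is a
    caller-supplied function mapping a flag list to a run time in seconds."""
    best_flags, best_time = None, float("inf")
    for combo in itertools.product([False, True], repeat=len(flags)):
        chosen = [f for f, on in zip(flags, combo) if on]
        t = benchmark(chosen)
        if t < best_time:
            best_flags, best_time = chosen, t
    return best_flags, best_time
```

With n flags this tests 2^n configurations, which is why practical compiler-tuning frameworks replace the exhaustive loop with sampling or genetic search while keeping the same evaluate-and-keep-best skeleton.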
APA, Harvard, Vancouver, ISO, and other styles
28

Li, Chong. "Un modèle de transition logico-matérielle pour la simplification de la programmation parallèle." Phd thesis, Université Paris-Est, 2013. http://tel.archives-ouvertes.fr/tel-00952082.

Full text
Abstract:
Parallel programming and data-parallel algorithms have for several decades been the main techniques supporting high-performance computing. Like all non-functional properties of software, converting computing resources into scalable and predictable performance involves a delicate balance between abstraction and automation with semantic precision. Over the last decade, more and more professions have required very high computing power, but migrating existing programs to a new hardware configuration and developing new special-purpose algorithms in a parallel environment is never easy work, neither for software developers nor for domain specialists. In this thesis, we describe work that aims to simplify the development of parallel programs, while also improving the portability of parallel program code and the accuracy of performance prediction of parallel algorithms for heterogeneous environments. With these goals in mind, we proposed a transition model named SGL for modeling heterogeneous parallel architectures and parallel algorithms, and an implementation of parallel skeletons based on the SGL model for high-performance computing. SGL simplifies parallel programming both for classic parallel machines and for new hierarchical machines. It generalizes the primitives of BSML programming. SGL could later use Model-Driven techniques for automatic code generation from a specification sheet without complex coding, for example for Big-Data processing on a massively parallel heterogeneous system. The SGL cost model improves the clarity of algorithm performance analysis, and makes it possible to evaluate the performance of a machine and the quality of an algorithm.
APA, Harvard, Vancouver, ISO, and other styles
29

Chu, Yul. "Cache and branch prediction improvements for advanced computer architecture." Thesis, 2001. http://hdl.handle.net/2429/13689.

Full text
Abstract:
As the gap between memory and processor performance continues to grow, more and more programs will be limited in performance by the memory latency of the system and by branch instructions (the control flow of the programs). Meanwhile, due to the increase in complexity of application programs over the last decade, object-oriented languages are replacing traditional languages because of their convenient code reusability and maintainability. However, it has also been observed that the run-time performance of object-oriented programs can be improved by reducing the impact of memory latency, branch misprediction, and several other factors. In this thesis, two new schemes are introduced for reducing memory latency and branch mispredictions for High Performance Computing (HPC). For the first scheme, in order to reduce memory latency, this thesis presents a new cache scheme called TAC (Thrashing-Avoidance Cache), which can effectively reduce instruction cache misses caused by procedure call/returns. The TAC scheme employs N-way banks and XOR mapping functions. The main function of the TAC is to place a group of instructions separated by a call instruction into a bank according to the initial and final bank selection mechanisms. After the initial bank selection mechanism selects a bank on an instruction cache miss, the final bank selection mechanism determines the final bank for updating a cache line, as a correction mechanism. These two mechanisms guarantee that recent groups of instructions exist safely in each bank. A simulation program, TACSim, has been developed using Shade and Spixtools, provided by SUN Microsystems, on an Ultra SPARC/10 processor. Our experimental results show that TAC schemes reduce conflict misses more effectively than skewed-associative caches in both C (9.29% improvement) and C++ (44.44% improvement) programs on L1 caches. In addition, TAC schemes also allow for a significant miss reduction on Branch Target Buffers (BTB).
For the second scheme, to reduce branch mispredictions, this thesis also presents a new hybrid branch predictor called GoStay2, which can effectively reduce misprediction rates for indirect branches. The GoStay2 has two different mechanisms compared to other 2-stage hybrid predictors that use a Branch Target Buffer (BTB) as the first-stage predictor: First, to reduce conflict misses in the first stage, an effective 2-way cache scheme is used instead of a 4-way set-associative scheme. Second, to reduce mispredictions caused by an inefficient predict-and-update rule, a new selection mechanism and update rule are proposed. A simulation program, GoS-Sim, has been developed using Shade and Spixtools, provided by SUN Microsystems, on an Ultra SPARC/10 processor. Our results show significant improvement with these mechanisms compared to other hybrid predictors. For example, the GoStay2 improves indirect misprediction rates of a 64-entry to 4K-entry BTB (with a 512- or 1K-entry PHT) by 14.9% to 21.53% compared to the Cascaded predictor (with leaky filter).
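The flavor of TAC's XOR mapping can be sketched as follows. This is a loose illustration, not the thesis's design: the field widths, the use of a call-sequence counter, and the hash itself are all assumptions; the point is only that XORing the fetch address with call-context information steers instruction groups separated by call/returns into different banks.

```python
def initial_bank(fetch_addr, call_count, n_banks=4):
    """Illustrative XOR bank-selection hash in the spirit of TAC's initial
    bank selection: the same address maps to different banks under
    different call contexts, avoiding thrashing on call/return paths."""
    line = fetch_addr >> 5                 # assume 32-byte cache lines
    return (line ^ call_count) % n_banks
```

In the real scheme a final bank-selection mechanism then corrects this initial choice when updating a line, so that recent instruction groups remain resident in their banks.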
APA, Harvard, Vancouver, ISO, and other styles
30

"Extending branch prediction information to effective caching." Chinese University of Hong Kong, 1996. http://library.cuhk.edu.hk/record=b5888775.

Full text
Abstract:
by Chung-Leung, Chiu.
Thesis (M.Phil.)--Chinese University of Hong Kong, 1996.
Includes bibliographical references (leaves 110-113).
Abstract --- p.i
Acknowledgement --- p.iii
Chapter 1 --- Introduction --- p.1
Chapter 1.1 --- Partial Basic Block Storing Mechanism --- p.1
Chapter 1.2 --- Data-Tagged Mechanism in Branch Target Buffer --- p.4
Chapter 1.3 --- Organization of the dissertation --- p.5
Chapter 2 --- Related Research --- p.7
Chapter 2.1 --- Branch Prediction --- p.7
Chapter 2.2 --- Branch History Table --- p.8
Chapter 2.2.1 --- Performance of Branch History Table in reducing the Branch Penalty --- p.10
Chapter 2.3 --- Branch Target Cache --- p.10
Chapter 2.4 --- Early Resolution of Branch --- p.11
Chapter 2.5 --- Software Inter-block Reorganization --- p.12
Chapter 2.6 --- Branch Target Buffer --- p.13
Chapter 2.7 --- Data Prefetching --- p.16
Chapter 2.7.1 --- Software-Directed Prefetching --- p.16
Chapter 2.7.2 --- Hardware-based prefetching --- p.17
Chapter 3 --- New Branch Target Buffer Design --- p.19
Chapter 3.1 --- Alternate Line Storing --- p.22
Chapter 3.2 --- Storing More Than One Line On Entering The Dynamic Basic Block --- p.27
Chapter 4 --- Simulation Environment for New Branch Target Buffer Design --- p.30
Chapter 4.1 --- Architectural Models and Assumptions --- p.30
Chapter 4.2 --- Memory Models --- p.33
Chapter 4.3 --- Evaluation Methodology and Measurement Criteria --- p.34
Chapter 4.4 --- Description of the Traces --- p.35
Chapter 4.5 --- Effect of the limitation of ATOM on the statistics of SPEC92 Benchmarks --- p.35
Chapter 4.6 --- Environments for collecting relevant statistics of SPEC92 Benchmarks --- p.36
Chapter 5 --- Results for New Branch Target Buffer Design --- p.38
Chapter 5.1 --- Statistical Results and Analysis for SPEC92 Benchmarks --- p.38
Chapter 5.2 --- Overall Performance --- p.39
Chapter 5.3 --- Bus Latency Effect --- p.42
Chapter 5.4 --- Effect of Cache Size --- p.45
Chapter 5.5 --- Effect of Line Size --- p.47
Chapter 5.6 --- Cache Set Associativity --- p.50
Chapter 5.7 --- Partial Hits --- p.50
Chapter 5.8 --- Prefetch Accuracy --- p.53
Chapter 5.9 --- Effect of Prefetch Buffer Size --- p.54
Chapter 5.10 --- Effect of Storing More Than One Line on Entry of New Dynamic Basic Block --- p.56
Chapter 6 --- Data References Tagged into Branch Target Buffer --- p.60
Chapter 6.1 --- Branch History Table Tagged Mechanism --- p.60
Chapter 6.2 --- Lookahead Technique --- p.65
Chapter 6.3 --- Default Prefetches Vs Data-tagged Prefetches --- p.71
Chapter 6.4 --- New Priority Scheme --- p.73
Chapter 7 --- Architectural Model for Data-Tagged References in Branch Target Buffer --- p.74
Chapter 7.1 --- Architectural Models and Assumptions --- p.76
Chapter 7.2 --- Memory Models --- p.79
Chapter 7.3 --- Evaluation Methodology and Measurement Criteria --- p.79
Chapter 7.4 --- Description of the Traces --- p.80
Chapter 7.5 --- Environments for collecting relevant statistics of SPEC92 Benchmarks --- p.80
Chapter 8 --- Results for Data References Tagged into Branch Target Buffer --- p.82
Chapter 8.1 --- Statistical Results and Analysis --- p.82
Chapter 8.2 --- Overall Performance --- p.83
Chapter 8.3 --- Effect of Branch Prediction --- p.85
Chapter 8.4 --- Effect of Number of Tagged Registers --- p.87
Chapter 8.5 --- Effect of Different Tagged Positions in Basic Block --- p.90
Chapter 8.6 --- Effect of Lookahead Size --- p.91
Chapter 8.7 --- Prefetch Accuracy --- p.93
Chapter 8.8 --- Cache Size --- p.95
Chapter 8.9 --- Line Size --- p.96
Chapter 8.10 --- Set Associativity --- p.97
Chapter 8.11 --- Size of Branch History Table --- p.99
Chapter 8.12 --- Set Associativity of Branch History Table --- p.99
Chapter 8.13 --- New Priority Scheme Vs Default Priority Scheme --- p.102
Chapter 8.14 --- Effect of Prefetch-On-Miss --- p.103
Chapter 8.15 --- Memory Latency --- p.104
Chapter 9 --- Conclusions and Future Research --- p.106
Chapter 9.1 --- Conclusions --- p.106
Chapter 9.2 --- Future Research --- p.108
Bibliography --- p.110
Appendix --- p.114
Chapter A --- Statistical Results - SPEC92 Benchmarks --- p.114
Chapter A.1 --- Definition of Abbreviations and Terms --- p.114
APA, Harvard, Vancouver, ISO, and other styles
31

Sadooghi-Alvandi, Maryam. "Exploring Virtualization Techniques for Branch Outcome Prediction." Thesis, 2011. http://hdl.handle.net/1807/31424.

Full text
Abstract:
Modern processors use branch prediction to predict branch outcomes in order to fetch ahead in the instruction stream, increasing concurrency and performance. Larger predictor tables can improve prediction accuracy, but come at the cost of larger area and longer access delay. This work introduces a new branch predictor design that increases the perceived predictor capacity without increasing its delay, by using a large virtual second-level table allocated in the second-level caches. Virtualization is applied to a state-of-the-art multi-table branch predictor. We evaluate the design using instruction count as a proxy for timing on a set of commercial workloads. For a predictor whose size is determined by access delay constraints rather than area, accuracy can be improved by 8.7%. Alternatively, the design can be used to achieve the same accuracy as a non-virtualized design while using 25% less dedicated storage.
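The two-level idea can be modeled abstractly (a sketch under assumptions: table sizes, the spill policy, and the single-bit state are all illustrative, not the thesis's design): a small dedicated first-level table answers fast, and misses fall back to a large second-level table that stands in for entries virtualized into the L2 cache.

```python
class VirtualizedPredictor:
    """Sketch of predictor virtualization: small dedicated first level,
    large 'virtual' second level (cache-resident in the real design)."""
    def __init__(self, l1_capacity=4):
        self.l1_capacity = l1_capacity
        self.l1 = {}   # fast dedicated storage, strictly size-limited
        self.l2 = {}   # large backing table modeling the in-cache level

    def lookup(self, pc):
        if pc in self.l1:
            return self.l1[pc]        # fast path, meets access-delay budget
        return self.l2.get(pc, 0)     # slower second-level hit; default not-taken

    def update(self, pc, taken):
        if pc in self.l1 or len(self.l1) < self.l1_capacity:
            self.l1[pc] = int(taken)
        else:
            self.l2[pc] = int(taken)  # spill beyond dedicated capacity
```

The trade-off the abstract quantifies is exactly this split: the first level keeps the delay of a small table, while the second level supplies the accuracy of a much larger one.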
APA, Harvard, Vancouver, ISO, and other styles
32

Dropsho, Steven George. "Enhancing branch prediction via on-line statistical analysis." 2002. https://scholarworks.umass.edu/dissertations/AAI3039351.

Full text
Abstract:
To attain peak efficiency, high performance processors must anticipate changes in the flow of control before they actually occur. Branch prediction is the method of determining the most likely path to be taken at branch decision points in the program. Many branch prediction mechanisms have been proposed. The most effective of these use a single tabular data structure in hardware to hold historical information regarding the behavior of branches. The table is a limited resource, and is managed in a manner that can result in multiple branches attempting to share locations in the table in a conflicting manner. These conflicts are a major factor in the degradation of branch prediction accuracy. Such conflicts are called aliasing, and much work has been done to reduce their occurrence. In this work, we present a novel method of analyzing the effectiveness of the various prediction methods relative to aliasing. The technique is based on a concept we introduce, branch entropy. With this concept, we develop an algorithm that efficiently determines combinations of predictor components that improve overall accuracy by adjusting the aggressiveness of the predictor to manage the degree of aliasing within the table. One result of this work is that identifying branches that are highly biased towards being taken (or not taken) and separately predicting them with a simple designation of their preferred direction, so that they are prevented from accessing the predictor table, is the single most effective method for reducing aliasing and improving accuracy. From this result, we develop a design that automatically controls this process via a novel on-line profiling technique. Based on a robust statistical framework, our on-line profiling technique performs as well as off-line profiling that uses perfect feedback information.
We show that this statistics-based method can also be applied to other, similar tasks by performing on-line profiling of objects in Java garbage collection for pretenuring into a mature object space.
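The notion of branch entropy can be made concrete as the Shannon entropy of a branch's outcome counts (the exact definition in the dissertation may differ; this is the standard two-outcome form): near 0 for a heavily biased branch, which is precisely the kind of branch worth filtering out of the shared predictor table with a static direction designation.

```python
import math

def branch_entropy(taken, not_taken):
    """Shannon entropy (in bits) of a branch's outcome counts: 0.0 for a
    fully biased branch, 1.0 for a 50/50 branch."""
    total = taken + not_taken
    h = 0.0
    for count in (taken, not_taken):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h
```

An on-line profiler can maintain these counts per branch and divert any branch whose entropy stays below a threshold away from the table, freeing its slots for hard-to-predict branches.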
APA, Harvard, Vancouver, ISO, and other styles
33

Lin, Chih-Ho, and 林志和. "A Study of Branch Prediction and Fetch Policy on Simultaneous Multithreading Architecture." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/52310660014822550172.

Full text
Abstract:
Master's thesis
Tatung University
Department of Computer Science and Engineering
92
In present computer architectures, speculative execution is the general and effective way to handle branches: a branch prediction mechanism predicts the outcome of branch instructions. The performance improvement from speculative execution relies on prediction accuracy. However, prediction behavior may differ in a simultaneous multithreading (SMT) architecture. SMT is a computer architecture that combines the hardware features of wide-issue superscalar and multithreaded architectures; an SMT processor can thus issue instructions from multiple threads each cycle. Both instruction-level and thread-level parallelism are exploited by dynamically sharing the hardware resources in this architecture. The features of the SMT architecture and branch prediction on that architecture are the primary focus of this study. In this thesis, we propose a branch prediction mechanism with a biased-branch filter and a confidence estimator that reduces the competition for the branch predictor between threads and classifies conditional branches as biased or confident. The fetch unit, which plays an important role in the SMT architecture, then decides which threads to fetch instructions from each cycle according to the information from our proposed branch prediction mechanism. Simulation shows that our proposed scheme reduces fetched wrong-path instructions by up to about 51% and raises the average prediction accuracy above 91%.
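A common way to build the confidence-estimator half of such a mechanism is a resetting counter per branch: a run of consecutive correct predictions marks the branch confident, and one misprediction resets it. The sketch below is an illustrative assumption (threshold and counter style are not taken from the thesis).

```python
class ResettingConfidenceEstimator:
    """Toy resetting-counter confidence estimator: a branch becomes
    'confident' after a streak of correct predictions, and a single
    misprediction clears the streak."""
    def __init__(self, threshold=4):
        self.threshold = threshold
        self.streaks = {}   # pc -> consecutive correct predictions

    def update(self, pc, prediction_correct):
        self.streaks[pc] = self.streaks.get(pc, 0) + 1 if prediction_correct else 0

    def confident(self, pc):
        return self.streaks.get(pc, 0) >= self.threshold
```

In an SMT fetch policy, a thread whose pending branches are low-confidence can be deprioritized, which is how such an estimator reduces wrong-path fetches across threads.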
APA, Harvard, Vancouver, ISO, and other styles
34

Jiménez, Daniel Angel 1969. "Delay-sensitive branch predictors for future technologies." 2002. http://hdl.handle.net/2152/11063.

Full text
APA, Harvard, Vancouver, ISO, and other styles
35

Sharma, Saurabh. "Spectral prediction: a signals approach to computer architecture prefetching /." 2006. http://www.lib.ncsu.edu/theses/available/etd-08092006-112725/unrestricted/etd.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
36

Homayoun, Houman. "Using lazy instruction prediction to reduce processor wakeup power dissipation." 2005. http://hdl.handle.net/1828/581.

Full text
APA, Harvard, Vancouver, ISO, and other styles
37

"The design of PABX with LAN architecture." Chinese University of Hong Kong, 1992. http://library.cuhk.edu.hk/record=b5886985.

Full text
Abstract:
Ko Wing Hoi.
Duplicate numbering of leave 67.
Thesis (M.Sc.)--Chinese University of Hong Kong, 1992.
Includes bibliographical references (leaves 71-72).
Chapter 1. --- INTRODUCTION --- p.1
Chapter 2. --- COMPARISONS OF LAN AND PABX --- p.3
Chapter 2.1 --- Typical LAN system --- p.3
Chapter 2.1.1 --- Characteristics of a LAN [1] --- p.3
Chapter 2.1.2 --- Transmission medium of LAN --- p.5
Chapter 2.1.3 --- LAN access control methods --- p.6
Chapter 2.1.4 --- Interfacing to the LAN --- p.8
Chapter 2.1.5 --- LAN topology --- p.8
Chapter 2.1.6 --- Switching techniques --- p.9
Chapter 2.2 --- Applications of LAN --- p.11
Chapter 2.2.1 --- Small filestore LAN's --- p.12
Chapter 2.2.2 --- Wiring replacement LAN's --- p.12
Chapter 2.2.3 --- Personal computer networks --- p.13
Chapter 2.2.4 --- General purpose LAN's --- p.13
Chapter 2.3 --- Typical PABX system --- p.14
Chapter 2.3.1 --- PABX topology --- p.15
Chapter 2.3.2 --- Circuit switching --- p.15
Chapter 2.3.3 --- Telephony signalling --- p.16
Chapter 2.3.3.1 --- Pulsing --- p.16
Chapter 2.3.3.2 --- Subscriber loop signaling [2] --- p.17
Chapter 2.3.4 --- ISDN (Integrated Services Digital Network) --- p.19
Chapter 2.4 --- Applications of PABX --- p.21
Chapter 2.5 --- Comparisons of LAN and PABX --- p.22
Chapter 3. --- INTEGRATION OF PABX WITH LAN --- p.25
Chapter 3.1 --- Advantages of integration of PABX with LAN --- p.25
Chapter 3.1.1. --- LAN-PABX Gateway --- p.28
Chapter 3.1.2. --- Problems in interconnecting PABX and LAN [6] --- p.29
Chapter 3.1.3. --- ISDN-PABX [7] --- p.30
Chapter 3.2 --- Architecture of Integrated LAN and PABX --- p.31
Chapter 3.3 --- Typical applications --- p.32
Chapter 4. --- CALL PROCESSING --- p.35
Chapter 4.1 --- Finite State Diagrams for voice calls --- p.37
Chapter 4.2 --- SDL representations of voice calls --- p.39
Chapter 4.3 --- Software implementations of SDL diagrams --- p.40
Chapter 4.3.1 --- PABX operating system --- p.40
Chapter 4.3.2 --- Trunk operating system --- p.43
Chapter 4.3.3 --- Message format --- p.43
Chapter 4.4 --- Pseudo codes for PABX --- p.45
Chapter 4.5 --- Pseudo codes for trunks --- p.52
Chapter 5. --- HARDWARE IMPLEMENTATION --- p.57
Chapter 5.1 --- TRUNK INTERFACE --- p.58
Chapter 5.1.1 --- PABX to CO call --- p.58
Chapter 5.1.2 --- CO to PABX call --- p.59
Chapter 5.2 --- Subscriber Interface Circuit --- p.59
Chapter 5.4 --- PSTN Trunk Interface --- p.60
Chapter 6. --- CONCLUSIONS --- p.62
Acknowledgements --- p.64
APPENDIX A --- p.65
CCITT SPECIFICATION AND DESCRIPTION LANGUAGE [15] --- p.65
APPENDIX B --- p.68
"SIGNALLING FOR SWITCHING SYSTEMS IN HK [16],[17]" --- p.68
Chapter B.1 --- Tone plan --- p.68
Chapter B.2 --- Tone levels --- p.68
Chapter B.3 --- Ringing frequency and voltage --- p.68
Chapter B.4 --- Dial pulse --- p.68
Chapter B.5 --- DTMF (Dual-tone multi-frequency) --- p.69
Chapter B.6 --- PCM coding --- p.69
REFERENCES --- p.71
APA, Harvard, Vancouver, ISO, and other styles
38

"Computation of daylighting for architecture: the impact of computer-based design tools for daylighting simulation and prediction for a built environment." 2000. http://library.cuhk.edu.hk/record=b5890323.

Full text
Abstract:
by Chow Ka-Ming.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2000.
Includes bibliographical references (leaves 70-73).
Abstracts in English and Chinese.
Abstract --- p.i
Acknowledgement --- p.iii
Contents --- p.iv
Chapter 1 --- INTRODUCTION --- p.1
Chapter 1.1 --- DAYLIGHT --- p.1
Chapter 1.2 --- DAYLIGHTING DESIGN --- p.1
Chapter 1.3 --- COMPUTER SIMULATION AND RENDERING --- p.2
Chapter 1.4 --- COMPUTER-BASED DAYLIGHTING DESIGN --- p.3
Chapter 1.5 --- SCOPE --- p.3
Chapter 1.6 --- SIGNIFICANCE --- p.4
Chapter 2 --- LITERATURE REVIEW --- p.5
Chapter 2.1 --- COMPUTER-BASED DAYLIGHTING DESIGN TOOLS --- p.6
Chapter 2.1.1 --- Graphic User Interface and Pre-defined Scenarios --- p.6
Chapter 2.1.2 --- Performance-based Daylighting Simulation --- p.7
Chapter 2.2 --- RADIANCE VALIDATION AND COMPARISON WITH OTHER SYSTEMS --- p.8
Chapter 2.2.1 --- Validation and Accuracy of Radiance --- p.8
Chapter 2.2.2 --- Comparison of Radiance With Other Simulation Systems --- p.9
Chapter 2.2.3 --- Limitation on Geometry Input --- p.11
Chapter 2.2.4 --- Correctness of Scene Description --- p.11
Chapter 2.3 --- RADIANCE MODEL --- p.11
Chapter 3 --- METHODOLOGY --- p.15
Chapter 4 --- CLIMATIC AND URBAN CHARACTERISTIC OF HONG KONG --- p.18
Chapter 4.1 --- HONG KONG SKY CONDITION --- p.19
Chapter 4.2 --- HONG KONG URBAN CONTEXT --- p.22
Chapter 5 --- DAYLIGHTING SIMULATION FOR ARCHITECTURAL DESIGN --- p.25
Chapter 5.1 --- DAYLIGHTING DESIGN APPROACH --- p.26
Chapter 5.1.1 --- PHYSICAL MODEL --- p.26
Chapter 5.1.2 --- GRAPHIC TECHNIQUES --- p.27
Chapter 5.1.3 --- COMPUTATIONAL APPROACH --- p.28
Chapter 6 --- CASE STUDY I: ATRIUM DAYLIGHTING ARCHITECTURE - A FUTURE WORKPLACE --- p.30
Chapter 6.1 --- PROJECT INTRODUCTION --- p.30
Chapter 6.2 --- PROJECT APPROACH --- p.31
Chapter 6.2.1 --- GEOMETRY --- p.32
Chapter 6.2.2 --- IN-HOUSE SOFTWARE TOOL FOR MODELING --- p.32
Chapter 6.2.3 --- SKY CONDITION --- p.33
Chapter 6.2.4 --- MATERIALS --- p.33
Chapter 6.2.5 --- REFERENCE VIEWPOINT --- p.34
Chapter 6.2.6 --- RENDERING --- p.35
Chapter 6.3 --- PROJECT EXPERIMENT --- p.35
Chapter 6.3.1 --- ILLUMINANCE LEVEL --- p.36
Chapter 6.3.2 --- GLARE VISUAL COMFORT --- p.38
Chapter 6.3.3 --- IN-HOUSE SOFTWARE TOOL FOR ANIMATION --- p.39
Chapter 6.3.4 --- DAYLIGHTING DESIGN EVALUATION --- p.41
Chapter 7 --- CASE STUDY II: BUILDING DESIGN EVALUATION AND PREDICTION - SENIOR CITIZENS HOUSING FACILITY --- p.42
Chapter 7.1 --- BUILDING DESIGN DATA AND FIELD MEASUREMENTS --- p.43
Chapter 7.1.1 --- SITE CONTEXT --- p.43
Chapter 7.1.2 --- BUILDING MATERIALS AND FINISHES FOR THE ATRIUM --- p.46
Chapter 7.1.3 --- OBSERVATIONS OF PATTERNS OF USE --- p.47
Chapter 7.1.4 --- LUMINANCE MEASUREMENTS --- p.48
Chapter 7.2 --- COMPUTATIONAL ANALYSIS --- p.50
Chapter 7.2.1 --- GEOMETRIC MODELING --- p.50
Chapter 7.2.2 --- COMPARISON OF FIELD MEASUREMENT AND COMPUTED LUMINANCE --- p.51
Chapter 7.2.3 --- VARIATION OF DESIGN PARAMETERS --- p.53
Chapter 7.2.4 --- VARIATION OF BEAM DEPTH --- p.53
Chapter 7.2.5 --- ADDITION OF LOUVERS --- p.56
Chapter 7.2.6 --- EFFECT OF INTER-BLOCK OBSTRUCTIONS --- p.58
Chapter 7.2.7 --- DAYLIGHTING DESIGN ALTERNATION --- p.60
Chapter 8 --- FINDINGS --- p.61
Chapter 8.1 --- CASE STUDY I: A FUTURE WORKPLACE --- p.61
Chapter 8.2 --- CASE STUDY II: SENIOR CITIZENS HOUSING --- p.63
Chapter 8.3 --- FUTURE WORKS --- p.64
Chapter 9 --- CONCLUSION --- p.66
REFERENCES --- p.67
BIBLIOGRAPHY --- p.70
APPENDIX A - Case study I (Atrium Daylighting Architecture - A Future Workplace) --- p.74
APPENDIX B - Case study II (senior citizens housing facility) --- p.115
APA, Harvard, Vancouver, ISO, and other styles
39

Περγαντής, Μηνάς. "Μελέτη της διαχείρισης της κρυφής μνήμης σε πραγματικό περιβάλλον." Thesis, 2009. http://nemertes.lis.upatras.gr/jspui/handle/10889/2573.

Full text
Abstract:
In contemporary systems the performance gap between the CPU and main memory keeps growing, so it is important to investigate new ways to compensate for the inability of main memory to keep pace with the CPU. Cache memory has always been a useful tool towards this goal, but it now needs to move beyond simplistic implementations and algorithms such as LRU. This thesis studies the cache in a real environment and analyses the feasibility and usefulness of predicting the memory-access behaviour of a modern program. It focuses on the use of dynamic-instrumentation techniques to implement a mechanism that predicts the reuse distance of a memory location by analysing the behaviour of the instruction that accesses it. The operation of such a mechanism is analysed in detail, and statistical measurements are provided that confirm the usefulness and accuracy of such a prediction.
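The reuse distance the abstract refers to is, for each memory access, the number of distinct addresses touched since the previous access to the same address (infinite on first use). A minimal sketch of that definition, with illustrative names not taken from the thesis:

```python
def reuse_distances(trace):
    """Return the LRU-stack reuse distance of every access in the trace.

    The distance is the number of distinct addresses accessed between
    two consecutive accesses to the same address; None marks a first
    access (infinite distance)."""
    stack = []                      # LRU stack: most recent at the end
    out = []
    for addr in trace:
        if addr in stack:
            pos = stack.index(addr)
            out.append(len(stack) - 1 - pos)   # distinct addrs in between
            stack.pop(pos)
        else:
            out.append(None)        # first access: infinite distance
        stack.append(addr)
    return out
```

For example, in the trace a, b, c, a the second access to a has reuse distance 2, because b and c were touched in between; a fully associative LRU cache of more than two lines would therefore hit on it.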
APA, Harvard, Vancouver, ISO, and other styles
40

Πετούμενος, Παύλος. "Διαχείριση κοινόχρηστων πόρων σε πολυεπεξεργαστικά συστήματα ενός ολοκληρωμένου." Thesis, 2011. http://nemertes.lis.upatras.gr/jspui/handle/10889/4712.

Full text
Abstract:
This dissertation proposes methodologies for managing shared resources in chip multiprocessors (CMPs). Until recently, a computing system was designed to satisfy the requirements of only one application at a time; now the designer must also balance the possibly conflicting demands of multiple applications competing for the same resources. In many cases even this is not enough: even under an ideal sharing policy, a shared resource cannot serve the increased load efficiently unless the way each processor uses it is also optimised. To address the negative effects of resource sharing, this dissertation proposes three management mechanisms. The first introduces a novel theoretical model of cache sharing that can be applied at run time, alongside the execution of the programs that share the cache; the methodology then uses this model to control sharing and achieve fairness in how cache space is distributed among the processors. The second presents a new technique for predicting the locality of cache accesses. Since locality almost entirely determines the usefulness of cached data, this prediction technique can drive any management mechanism that seeks to improve cache utilisation; as part of the methodology, one such mechanism is presented, a new cache replacement policy that minimises cache misses through near-optimal replacement decisions. The last methodology targets the energy consumption of the Issue Queue, one of the most energy-demanding structures in the processor. It shows that the key to reducing Issue Queue energy without disproportionate performance degradation lies in its interaction with the memory subsystem: as long as Issue Queue management does not reduce the utilisation of the memory subsystem, its effect on processor performance is minimal.
Based on this conclusion, a new mechanism for dynamically resizing the Issue Queue is introduced, which combines aggressive reduction of the processor's energy consumption with preservation of its high performance.
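The locality-driven replacement policy this abstract outlines can be sketched in a hedged, illustrative form: given a predicted reuse distance for each line in a cache set, evict the line expected to be reused furthest in the future, a practical approximation of Belady's optimal policy. The function name and data layout below are assumptions for illustration, not the dissertation's actual design.

```python
def choose_victim(lines, predicted_reuse):
    """Pick the replacement victim from a cache set.

    lines: the tags currently in the set.
    predicted_reuse: mapping from tag to predicted reuse distance;
    a line with no prediction is treated as never reused again
    (infinite distance) and becomes the preferred victim."""
    return max(lines, key=lambda line: predicted_reuse.get(line, float("inf")))
```

Under this rule a line predicted to be reused soon (small distance) is retained, while a line with a large or unknown predicted distance is evicted first, which is exactly how locality prediction can reduce misses relative to plain LRU.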
APA, Harvard, Vancouver, ISO, and other styles