Dissertations / Theses on the topic 'Performance Optimization in Software and Hardware'
Schöne, Robert, Thomas Ilsche, Mario Bielert, Daniel Molka, and Daniel Hackenberg. "Software Controlled Clock Modulation for Energy Efficiency Optimization on Intel Processors." Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2017. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-224966.
Vujic, Nikola. "Software caching techniques and hardware optimizations for on-chip local memories." Doctoral thesis, Universitat Politècnica de Catalunya, 2012. http://hdl.handle.net/10803/83598.
Although caches are still the basic component in memory-subsystem design, local memories have become an alternative thanks to their characteristics in terms of area occupancy, energy consumption, and performance, with fast and constant access times. These characteristics are of special interest as upcoming multi-core architectures become limited by power consumption and memory-subsystem latency. Despite the advantages mentioned above, local memories suffer from limitations regarding their programming complexity, which hinders their adoption in multi-core architectures. This thesis presents a series of solutions based on software and on hardware specifically designed to overcome these limitations. The software optimizations are based on caching techniques supported by specific libraries. Software caching is a solid method for giving the user a transparent view of the architecture, but this approach can suffer from poor performance. This thesis proposes a hierarchical and hybrid structure, and then develops optimizations to accelerate the execution of the software that supports the cache design. As a result of these optimizations, our hybrid design performs 4 to 10 times faster than a traditional software-cache implementation over a set of reference applications such as the NAS parallel benchmarks. The thesis also covers other aspects of architectures with local memories, such as the quality of the generated code and its correspondence with the quality of buffer management in the local memories, in order to improve the performance of these architectures. The thesis then develops proposals based strictly on the design of new hardware to improve the performance of local memories when no further software optimization is possible. In particular, it presents two hardware proposals: one relaxes the restrictions imposed by local memories with respect to data alignment, while the other introduces specific hardware to accelerate the most common operations on local memories.
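The hybrid software cache developed in this thesis is more elaborate than any short excerpt can show, but the mechanism it optimizes is a lookup of the following general shape. This is a minimal, hypothetical sketch of a direct-mapped software cache for an explicitly managed local memory, not the thesis's design; dma_get() stands in for a platform DMA primitive and is simulated here with memcpy, and write-back handling is omitted.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t LINE_BYTES = 128;
constexpr std::size_t NUM_LINES  = 64;

struct CacheLine {
    std::uintptr_t tag;              // global address of the cached line
    bool           valid;
    unsigned char  data[LINE_BYTES];
};

static CacheLine lines[NUM_LINES];   // would reside in the fast local memory

// Placeholder for the platform's DMA primitive; simulated with memcpy here.
static void dma_get(void* dst, std::uintptr_t src, std::size_t n) {
    std::memcpy(dst, reinterpret_cast<const void*>(src), n);
}

// Map a global address to a pointer into local memory, refilling on a miss.
unsigned char* cache_ref(std::uintptr_t global_addr) {
    std::uintptr_t line_addr =
        global_addr & ~static_cast<std::uintptr_t>(LINE_BYTES - 1);
    std::size_t index = (line_addr / LINE_BYTES) % NUM_LINES;
    CacheLine& line = lines[index];
    if (!line.valid || line.tag != line_addr) {   // miss: fetch the line
        dma_get(line.data, line_addr, LINE_BYTES);
        line.tag   = line_addr;
        line.valid = true;
    }
    return &line.data[global_addr - line_addr];   // hit path stays short
}
```

Keeping the hit path this short is exactly where software caches win or lose; the thesis's optimizations target the overhead of this check.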
Serpa, Matheus da Silva. "Source code optimizations to reduce multi core and many core performance bottlenecks." Biblioteca Digital de Teses e Dissertações da UFRGS, 2018. http://hdl.handle.net/10183/183139.
Nowadays, there are several different architectures available not only for the industry but also for final consumers. Traditional multi-core processors, GPUs, accelerators such as the Xeon Phi, or even energy efficiency-driven processors such as the ARM family, present very different architectural characteristics. This wide range of characteristics presents a challenge for the developers of applications, who must deal with different instruction sets, memory hierarchies, or even different programming paradigms when programming for these architectures. To optimize an application, it is important to have a deep understanding of how it behaves on different architectures. Related work offers a wide variety of solutions: most focus on improving only memory performance, while others address load balancing, vectorization, and thread and data mapping, but perform them separately, losing optimization opportunities. In this master thesis, we propose several optimization techniques to improve the performance of a real-world seismic exploration application provided by Petrobras, a multinational corporation in the petroleum industry. In our experiments, we show that loop interchange is a useful technique to improve the performance of different cache memory levels, improving performance by up to 5.3× and 3.9× on the Intel Broadwell and Intel Knights Landing architectures, respectively. By changing the code to enable vectorization, performance was increased by up to 1.4× and 6.5×. Load balancing improved the performance by up to 1.1× on Knights Landing. Thread and data mapping techniques were also evaluated, with a performance improvement of up to 1.6× and 4.4×. We also compared the best version on each architecture and showed that we were able to improve the performance of Broadwell by 22.7× and Knights Landing by 56.7× compared to a naive version, but, in the end, Broadwell was 1.2× faster than Knights Landing.
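For readers unfamiliar with the loop-interchange transformation the abstract credits with its largest cache gains, a generic illustration follows; this is not code from the thesis, and the array name and size are invented.

```cpp
#include <cstddef>

constexpr std::size_t N = 1024;

// Strided traversal: the inner loop walks down a column of a row-major
// array, touching a new cache line on almost every iteration.
void scale_poor(float (&a)[N][N], float s) {
    for (std::size_t j = 0; j < N; ++j)
        for (std::size_t i = 0; i < N; ++i)
            a[i][j] *= s;
}

// After loop interchange the inner loop is unit-stride, so successive
// iterations reuse the same cache lines and the compiler can vectorize.
void scale_interchanged(float (&a)[N][N], float s) {
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = 0; j < N; ++j)
            a[i][j] *= s;
}
```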
Shee, Seng Lin (Computer Science & Engineering, Faculty of Engineering, UNSW). "ADAPT: architectural and design exploration for application specific instruction-set processor technologies." Awarded by: University of New South Wales, 2007. http://handle.unsw.edu.au/1959.4/35404.
Full textSid, Lakhdar Riyane Yacine. "Méthodologie pour l'optimisation logicielle de structures de données pour les architectures hautes performances à mémoires complexes." Thesis, Université Grenoble Alpes, 2020. http://www.theses.fr/2020GRALM058.
With the rising impact of the memory wall, selecting the adequate data-structure implementation for a given kernel has become a performance-critical issue. The complexity of solving this Data-Layout-Decision (DLD) problem efficiently is dramatically increased by the emergence of complex, heterogeneous and application-specific hardware memories. Slightly modifying an optimized application, or porting it to a new hardware architecture, requires significant time and engineering effort, as well as deep knowledge of the host hardware platform. In this thesis, we take a first step toward automatic software adaptation to hardware. We present an iterative, data-mining-based software-optimization approach built on the detection and exploration of the most influential parameters linked to the hardware, the operating system and the software. We also propose a custom data-cache-miss modeling algorithm designed to serve as a fully parameterized performance evaluation. The proposed approach is designed to be embedded within a general-purpose compiler. In order to explore the parameters related to the data-layout implementation, we propose HARDSI, a custom patented method to solve the DLD problem, and we propose to apply our method using a custom domain-specific language and computation framework. The HARDSI method allows choosing, from a custom base of knowledge, a data-layout implementation optimized with regard to the memory pattern followed to access the considered data structure; the generated solutions are also specifically adapted to the properties of the host hardware memory. Meanwhile, we consider the singular resolution of the DLD problem on memories that are explicitly addressed by the programmer (such as embedded scratchpad memories or GPUs). The problem we address is to find an optimized memory placement that maximizes the amount of frequently accessed data stored within this fast yet narrow memory. In this context, we propose DDLGS, a custom patented method designed to generate a dynamic data layout with regard to the followed memory-access pattern. The generated implementations encompass the specific load and store routines as well as the granularity attributed to each datum transferred; they are also able to adapt, at run time, to the input of the considered source code. Aiming to evaluate our implementations on different hardware environments, we have considered two different processor and memory architectures: (i) an x86 processor implementing an Intel Xeon with three levels of data caches using a least-recently-used replacement policy; and (ii) a Massively Parallel Processor Array implementing a Kalray Coolidge-80-30 with a 16 KB on-chip scratchpad memory. Experiments on linear algebra, artificial intelligence and image processing benchmarks show that our method accurately determines an optimized data-structure implementation. These implementations allow reaching an execution-time speed-up of up to 48.9x on the Xeon processor and 54.2x on the Coolidge processor.
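The Data-Layout-Decision problem the thesis formalizes can be pictured with the classic array-of-structures versus structure-of-arrays choice. The sketch below is a generic illustration with invented field names, not the HARDSI method itself: the best layout depends entirely on the memory-access pattern of the kernel.

```cpp
#include <cstddef>
#include <vector>

// Two layouts for the same logical data; which one wins depends on the
// access pattern, which is the core of the DLD problem.
struct ParticleAoS { float x, y, z, mass; };   // array of structures

struct ParticlesSoA {                          // structure of arrays
    std::vector<float> x, y, z, mass;
};

// A kernel that reads only 'mass' wastes 3/4 of each cache line under the
// AoS layout, but streams fully contiguous data under the SoA layout.
float total_mass(const ParticlesSoA& p) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < p.mass.size(); ++i)
        sum += p.mass[i];
    return sum;
}
```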
Pinto, Christian <1986>. "Many-Core Architectures: Hardware-Software Optimization and Modeling Techniques." Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2015. http://amsdottorato.unibo.it/6824/.
Muffang, Louis. "SLAM Hardware & Software optimization for mobile platform integration." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-294332.
This report describes the optimization of a non-stereo Visual-Inertial Odometry (VIO) algorithm for resource-constrained real-time applications on embedded systems. We use a multi-processor device equipped with a Digital Signal Processor (DSP) to increase performance and offload the main processor (CPU), so that it can be used for other tasks in parallel. The goal is to reduce resource consumption without degrading the speed or accuracy of the algorithm. We identify OpenVINS as a suitable VIO algorithm to optimize. As a result of the study, we reduce the CPU memory footprint by a factor of 2 and the energy consumption by a factor of 1.5. These results can benefit any system that runs a VIO algorithm in parallel with other computationally demanding tasks.
Motiwala, Quaeed. "Optimizations for acyclic dataflow graphs for hardware-software codesign." Thesis, This resource online, 1994. http://scholar.lib.vt.edu/theses/available/etd-06302009-040504/.
Shen, Chung-Ching. "Energy-driven optimization of hardware and software for distributed embedded systems." College Park, Md.: University of Maryland, 2008. http://hdl.handle.net/1903/8901.
Thesis research directed by: Dept. of Electrical and Computer Engineering. Title from t.p. of PDF. Includes bibliographical references. Published by UMI Dissertation Services, Ann Arbor, Mich. Also available in paper.
Brankovic, Aleksandar. "Performance simulation methodologies for hardware/software co-designed processors." Doctoral thesis, Universitat Politècnica de Catalunya, 2015. http://hdl.handle.net/10803/287978.
HW/SW co-designed processors have been proposed by academia and industry as potential solutions for building less complex processors that consume less energy. Unlike other alternatives, this type of processor reduces complexity and energy consumption by applying dynamic binary translation and optimization from an external instruction set architecture to an adapted internal one. This thesis addresses the challenges related to the simulation of this type of architecture. Simulation is a common process in processor design and development, since it allows exploring several alternatives without having to fabricate the hardware for each of them. Simulating a HW/SW co-designed processor is more complex than simulating a traditional, hardware-only processor. For example, no simulation tools are available to the community, so researchers tend to assume that the software layer, which is in charge of translating and optimizing the applications, carries little weight and therefore has a low, or at best constant, computational cost. In this thesis we show that these premises are incorrect and that the results obtained with them are usually very inaccurate. A first conclusion of this thesis is therefore that simulating the software layer is absolutely necessary. Moreover, because simulation is slow, simulation techniques have been proposed that try to obtain accurate results in the shortest possible time. In the design of conventional, hardware-only processors, a common practice is to simulate only parts of the applications, called samples, which correspond to different phases of the applications and usually comprise a few million instructions. To obtain an accurate microarchitectural state for each sample, the microarchitectural structures of the simulator are exercised before results are collected, a process called warm-up. Unfortunately, this methodology cannot be applied to HW/SW co-designed processors: warming up the simulator's internal structures for such processors takes 3-4 orders of magnitude longer than the same warm-up process in conventional processor simulation, because the structures and state of the software layer must be warmed up as well. In this thesis we propose warm-up-based simulation techniques that reduce simulation time by 65X with an average error of 0.75%. These results extrapolate to different configurations of the hardware and of the software layer. Finally, conventional techniques for selecting which application samples to simulate are not applicable to the simulation of HW/SW co-designed processors either, because the samples behave very differently once the software layer is taken into account. We propose a new algorithm that reduces by 3X the number of samples to simulate, compared with traditional algorithms for conventional processors, while obtaining a similar error. These results also extrapolate to different hardware and software configurations.

In conclusion, this thesis answers the challenge of how to simulate HW/SW co-designed processors, which are an alternative to traditional processor design. We have shown that the software layer must be simulated, and we have proposed new, efficient warm-up and sample-selection techniques and algorithms that are tolerant to different configurations.
Oh, Jungju. "Efficient hardware and software assist for many-core performance." Diss., Georgia Institute of Technology, 2013. http://hdl.handle.net/1853/50219.
Bekker, Dmitriy L. "Hardware and software optimization of Fourier transform infrared spectrometry on hybrid-FPGAs." Online version of thesis, 2007. http://hdl.handle.net/1850/4805.
Figueiredo Boneti, Carlos Santieri de. "Exploring coordinated software and hardware support for hardware resource allocation." Doctoral thesis, Universitat Politècnica de Catalunya, 2009. http://hdl.handle.net/10803/6018.
Full textThis thesis targets to narrow the gap between the software and the hardware, with respect to the hardware resource allocation, by proposing a new explicit resource allocation hardware mechanism and novel schedulers that use the currently available hardware resource allocation mechanisms.
It approaches the problem in two different types of computing systems. In the high-performance computing domain, we characterize the first processor to present a mechanism that allows the software to bias the allocation of hardware resources, the IBM POWER5. In addition, we propose the use of hardware resource allocation as a way to balance high-performance computing applications. Finally, we propose two new scheduling mechanisms that are able to transparently and successfully balance applications in real systems using hardware resource allocation. In the soft real-time domain, we propose a hardware extension to the existing explicit resource-allocation hardware and, in addition, two software schedulers that use the explicit allocation hardware to improve the schedulability of tasks in a soft real-time system.
In this thesis, we demonstrate that system performance improves by making the software aware of the mechanisms that control the amount of resources given to each running thread. In particular, for the high-performance computing domain, we show that it is possible to decrease the execution time of MPI applications by biasing the hardware resource assignment between threads. In addition, we show that it is possible to decrease the number of missed deadlines when scheduling tasks in a soft real-time SMT system.
Ramírez, Bellido Alejandro. "High performance instruction fetch using software and hardware co-design." Doctoral thesis, Universitat Politècnica de Catalunya, 2002. http://hdl.handle.net/10803/5969.
This thesis explores the challenges presented by the design of the fetch unit from two points of view: the design of software better suited to existing fetch architectures, and the design of hardware adapted to the special characteristics of the new software we have generated.
Our approach to the design of new software is a new code-reordering algorithm that aims not only to improve the performance of the instruction cache, but at the same time to increase the effective width of the fetch unit. Using profile data about program behavior, we chain the program's basic blocks so that conditional branches tend to be not taken, which favors sequential execution of the code. Once we have organized the basic blocks into these traces, we map the traces in memory so as to minimize both the space required for the genuinely useful code and the memory conflicts of this code. Besides describing the algorithm, we carry out a detailed analysis of the impact of these optimizations on the different aspects of fetch-unit performance: memory latency, the effective width of the fetch unit, and the accuracy of the branch predictor.
Based on the analysis of the behavior of the optimized codes, we also propose a modification of the trace cache mechanism that aims to make more effective use of the scarce available storage space. This mechanism uses the trace cache only to store those traces that could not be provided by the instruction cache in a single cycle.
Also building on the knowledge acquired about the behavior of the optimized codes, we propose a new branch predictor that makes extensive use of the same information used to reorder the code, in this case to improve the accuracy of branch prediction.
Finally, we propose a new architecture for the processor's fetch unit based on exploiting the special characteristics of the optimized codes. Our architecture has a very low level of complexity, similar to that of an architecture capable of fetching a single basic block per cycle, yet it offers much higher performance, comparable to that of a trace cache, which is far more costly and complex.
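The reordering itself is performed offline on profile data, but a rough source-level analogue of its not-taken layout goal can be expressed with compiler branch hints. The sketch below uses the GCC/Clang __builtin_expect builtin; handle_error() is a hypothetical cold-path function, and this is an illustration of the layout idea rather than the thesis's algorithm.

```cpp
// Telling the compiler which side of a branch is hot lets it place the
// frequent path fall-through, so the conditional branch is usually not
// taken and execution stays sequential, as the reordering above intends.
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

static void handle_error() { /* rare path, laid out away from hot code */ }

int process(int value) {
    if (UNLIKELY(value < 0)) {   // cold path: branch taken only rarely
        handle_error();
        return -1;
    }
    return value * 2;            // hot path: falls through, branch not taken
}
```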
Schultz, Eric A. "Empirical Performance Comparison of Hardware and Software Task Context Switching." Thesis, Monterey, California. Naval Postgraduate School, 2009. http://hdl.handle.net/10945/46226.
There are many divergent opinions regarding possible differences between the performance of hardware and software context-switching implementations, but there are no concrete empirical measures of their true differences. Using an empirical testing methodology, this research performed seven experiments, collecting quantitative performance results on hardware- and software-based context-switch implementations supporting two and four hardware privilege levels. The implementations measured are the hardware-based Intel IA-32 context switch, the software-based MINIX 3 context switch, a software-based simulation of a MINIX 3 context switch with support for four hardware privilege levels, and a software-based simulation of an Intel IA-32 hardware context switch. Experiments were executed using the Trusted Computing Exemplar Least Privilege Separation Kernel and the Linux 2.6 Kernel. The results include the number of cycles and the time required to complete each implementation's processing. This study concludes that the hardware-based context-switching mechanism is significantly slower than the software implementations, even those that simulate the elaborate checks of the hardware implementation. A possible reason for this is posited.
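As a point of reference for this kind of study, cycle-level measurements on x86 are typically taken with the time-stamp counter. The sketch below shows the general pattern using the GCC/Clang intrinsics; do_context_switch() is a hypothetical stand-in for the mechanism under test, and a real experiment would additionally serialize the pipeline and pin the thread to one core.

```cpp
#include <cstdint>
#include <x86intrin.h>   // __rdtsc / __rdtscp (GCC/Clang, x86 only)

// Hypothetical placeholder for the context-switch path being measured.
static void do_context_switch() {}

std::uint64_t measure_cycles() {
    unsigned aux = 0;
    std::uint64_t start = __rdtsc();      // timestamp before the operation
    do_context_switch();
    std::uint64_t end = __rdtscp(&aux);   // waits for prior instructions
    return end - start;                   // elapsed cycles, roughly
}
```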
Chaudhuri, Matthew Alan. "Optimization of a hardware/software coprocessing platform for EEG eyeblink detection and removal." Online version of thesis, 2008. http://hdl.handle.net/1850/8967.
Full textPowell, Richard, and Jeff Kuhn. "HARDWARE- VS. SOFTWARE-DRIVEN REAL-TIME DATA ACQUISITION." International Foundation for Telemetering, 2000. http://hdl.handle.net/10150/608291.
There are two basic approaches to developing data acquisition systems. The first is to buy or develop acquisition hardware and to then write software to input, identify, and distribute the data for processing, display, storage, and output to a network. The second is to design a system that handles some or all of these tasks in hardware instead of software. This paper describes the differences between software-driven and hardware-driven system architectures as applied to real-time data acquisition systems. In explaining the characteristics of a hardware-driven system, a high-performance real-time bus system architecture developed by L-3 will be used as an example. This architecture removes the bottlenecks and unpredictability that can plague software-driven systems when applied to complex real-time data acquisition applications. It does this by handling the input, identification, routing, and distribution of acquired data without software intervention.
Sredojević, Ranko Radovin. "Template-based hardware-software codesign for high-performance embedded numerical accelerators." Thesis, Massachusetts Institute of Technology, 2013. http://hdl.handle.net/1721.1/84895.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 129-132).
Sophisticated algorithms for control, state estimation and equalization have tremendous potential to improve performance and create new capabilities in embedded and mobile systems. Traditional implementation approaches are not well suited for porting these algorithmic solutions into practical implementations within embedded system constraints. Most of the technical challenges arise from a design approach that manipulates only one level in the design stack, thus being forced to conform to constraints imposed by other levels without question. In tightly constrained environments, like embedded and mobile systems, such approaches have a hard time efficiently delivering and delivering efficiency. In this work we offer a solution that cuts through all the design stack layers. We build flexible structures at the hardware, software and algorithm level, and approach the solution through design space exploration. To do this efficiently we use a template-based hardware-software development flow. The main incentive for template use is, as in software development, to relax the generality vs. efficiency/performance type tradeoffs that appear in solutions striving to achieve run-time flexibility. As a form of static polymorphism, templates typically incur very little performance overhead once the design is instantiated, thus offering the possibility to defer many design decisions until later stages when more is known about the overall system design. However, simply including templates in the design flow is not sufficient to result in benefits greater than some level of code reuse. In our work we propose using templates as flexible interfaces between various levels in the design stack. As such, template parameters become the common language that designers at different levels of the design hierarchy can use to succinctly express their assumptions and ideas. Thus, it is of great benefit if template parameters map directly and intuitively into models at every level. To showcase the approach we implement a numerical accelerator for an embedded Model Predictive Control (MPC) algorithm. While most of this work and design flow are quite general, their full power is realized in the search for good solutions to a specific problem. This is best understood in direct comparison with recent works on embedded and high-speed MPC implementations. The controllers we generate outperform published works by a handsome margin in both speed and power consumption, while taking very little time to generate.
by Ranko Radovin Sredojević.
Ph.D.
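A minimal illustration of the static-polymorphism idea behind this template-based flow (not the dissertation's actual MPC code): problem dimensions and the number format are template parameters fixed only at instantiation time, so the generated loops have compile-time bounds that can be unrolled or mapped onto a hardware datapath. The gradient-step solver is a hypothetical stand-in for the MPC numerics.

```cpp
#include <array>
#include <cstddef>

template <typename Scalar, std::size_t States, std::size_t Horizon>
struct MpcSolver {
    static constexpr std::size_t kVars = States * Horizon;
    std::array<Scalar, kVars> u{};     // decision variables

    // One fixed-size gradient step; all sizes are known at compile time,
    // so this loop can be fully unrolled by the tool flow.
    void step(const std::array<Scalar, kVars>& grad, Scalar rate) {
        for (std::size_t i = 0; i < kVars; ++i)
            u[i] -= rate * grad[i];
    }
};

// Instantiating the template fixes the design point, e.g. a 4-state plant
// with a 16-step horizon in single precision:
MpcSolver<float, 4, 16> controller;
```

The template parameters (Scalar, States, Horizon) play the role the abstract describes: a compact vocabulary shared between the algorithm, software and hardware levels.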
Marques, Vítor Manuel dos Santos. "Performance of hardware and software sorting algorithms implemented in a SOC." Master's thesis, Universidade de Aveiro, 2017. http://hdl.handle.net/10773/23467.
Field Programmable Gate Arrays (FPGAs) were invented by Xilinx in 1985. Their reconfigurable nature allows them to be used in multiple areas of Information Technologies. This project aims to study this technology as an alternative to traditional data-processing methods, namely sorting. The proposed solution is based on the principle of reusing resources to counter this technology's known resource limitations.
O'Connor, R. Brendan. "Dataflow Analysis and Optimization of High Level Language Code for Hardware-Software Co-Design." Thesis, Virginia Tech, 1996. http://hdl.handle.net/10919/36653.
Master of Science
Igual Pérez, Román José. "Platform Hardware/Software for the energy optimization in a node of wireless sensor networks." Thesis, Lille 1, 2016. http://www.theses.fr/2016LIL10041/document.
The dramatic increase in the number of connected objects in the Internet of Things will entail various problems, among them energy efficiency. The present work deals with energy efficiency and, more precisely, with modeling the energy consumption of the node. We have designed a platform to instrument a node of a wireless sensor network in its real environment. The hardware and software platform is made of: a hardware energy-measurement platform; software allowing the automatic generation of an energy-consumption model; and a node-lifetime estimation algorithm. The energy-measurement platform captures the current values directly from the node under evaluation, component per component in the electronic circuit and function per function of the embedded software. This hardware/software analysis of the energy consumption offers important information about the behavior of each electronic component in the node. An algorithm carries out a statistical analysis of the energy measurements and automatically creates an energy-consumption model based on a Markov chain. The platform thus creates a stochastic model of the energy behavior of a real node, in a real network and under real channel conditions, in contrast to the deterministic energy models found in the literature, whose energy behavior is extracted from the datasheets of the components. Finally, we estimate the node lifetime based on battery models, and we show through examples how simply parameters of the model can be changed in order to improve energy efficiency.
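A sketch of the modeling idea, with a deliberately simplified three-state chain and invented numbers: measured per-state power draws are weighted by the chain's long-run state occupancy to obtain average power, and lifetime follows from battery capacity. This is an assumption-laden illustration, not the thesis's generated model.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t S = 3;                 // e.g. sleep, sense, transmit
using Vec = std::array<double, S>;
using Mat = std::array<Vec, S>;              // row-stochastic transitions

// Long-run state occupancy via power iteration on the transition matrix.
Vec steady_state(const Mat& P) {
    Vec pi{1.0 / S, 1.0 / S, 1.0 / S};       // start from a uniform guess
    for (int it = 0; it < 1000; ++it) {
        Vec next{};
        for (std::size_t i = 0; i < S; ++i)
            for (std::size_t j = 0; j < S; ++j)
                next[j] += pi[i] * P[i][j];
        pi = next;
    }
    return pi;
}

// Lifetime = battery energy / occupancy-weighted average power.
double lifetime_hours(const Mat& P, const Vec& power_mw, double battery_mwh) {
    Vec pi = steady_state(P);
    double avg_mw = 0.0;
    for (std::size_t i = 0; i < S; ++i)
        avg_mw += pi[i] * power_mw[i];
    return battery_mwh / avg_mw;
}
```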
Suuronen, Janne. "Towards Defining Models of Hardware Capacity and Software Performance for Telecommunication Applications." Thesis, Mälardalens högskola, Inbyggda system, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-48773.
Rosqvist Åkerblom, Linn. "JavaScript Performance and Optimization: Removing bottlenecks." Thesis, Mittuniversitetet, Avdelningen för data- och systemvetenskap, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-25474.
Patki, Tapasya. "The Case For Hardware Overprovisioned Supercomputers." Diss., The University of Arizona, 2015. http://hdl.handle.net/10150/577307.
Ewing, John M. "Autonomic Performance Optimization with Application to Self-Architecting Software Systems." Thesis, George Mason University, 2015. http://pqdtopen.proquest.com/#viewpdf?dispub=3706982.
Service-Oriented Architecture (SOA) is an emerging software engineering discipline that builds software systems and applications by connecting and integrating well-defined, distributed, reusable software service instances. SOA can speed development time and reduce costs by encouraging reuse, but this new service paradigm presents significant challenges. Many SOA applications are dependent upon service instances maintained by vendors and/or separate organizations. Applications and composed services using disparate providers typically demonstrate limited autonomy with contemporary SOA approaches. Availability may also suffer with the proliferation of possible points of failure, as restoration of functionality often depends upon intervention by human administrators.
Autonomic computing is a set of technologies that enables self-management of computer systems. When applied to SOA systems, autonomic computing can provide automatic detection of faults and take restorative action. Additionally, autonomic computing techniques possess optimization capabilities that can leverage the features of SOA (e.g., loose coupling) to enable peak performance in the SOA system's operation. This dissertation demonstrates that autonomic computing techniques can help SOA systems maintain high levels of usefulness and usability.
This dissertation presents a centralized autonomic controller framework to manage SOA systems in dynamic service environments. The centralized autonomic controller framework can be enhanced through a second meta-optimization framework that automates the selection of optimization algorithms used in the autonomic controller. A third framework for autonomic meta-controllers can study, learn, adjust, and improve the optimization procedures of the autonomic controller at run-time. Within this framework, two different types of meta-controllers were developed. The Overall Best meta-controller tracks overall performance of different optimization procedures. Context Best meta-controllers attempt to determine the best optimization procedure for the current optimization problem. Three separate Context Best meta-controllers were implemented using different machine learning techniques: 1) K-Nearest Neighbor (KNN MC), 2) Support Vector Machines (SVM) trained offline (Offline SVM), and 3) SVM trained online (Online SVM).
A detailed set of experiments demonstrated the effectiveness and scalability of the approaches. Autonomic controllers of SOA systems successfully maintained performance on systems with 15, 25, 40, and 65 components. The Overall Best meta-controller successfully identified the best optimization technique and provided excellent performance at all levels of scale. Among the Context Best meta-controllers, the Online SVM meta-controller was tested on the 40 component system and performed better than the Overall Best meta-controller at a 95% confidence level. Evidence indicates that the Online SVM was successfully learning which optimization procedures were best applied to encountered optimization problems. The KNN MC and Offline SVM were less successful. The KNN MC struggled because the KNN algorithm does not account for the asymmetric cost of prediction errors. The Offline SVM was unable to predict the correct optimization procedure with sufficient accuracy—this was likely due to the challenge of building a relevant offline training set. The meta-optimization framework, which was tested on the 65 component system, successfully improved the optimization techniques used by the autonomic controller.
The meta-optimization and meta-controller frameworks described in this dissertation have broad applicability in autonomic computing and related fields. This dissertation also details a technique for measuring the overlap of two populations of points, establishes an approach for using penalty weights to address one-sided overfitting by SVM on asymmetric data sets, and develops a set of high performance data structure and heuristic search templates for C++.
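A minimal sketch of the "Overall Best" selection policy described in this entry, simplified to a running mean per optimization procedure; the dissertation's actual bookkeeping and selection criteria may differ.

```cpp
#include <limits>
#include <string>
#include <unordered_map>

class OverallBest {
    struct Stats { double total = 0.0; int runs = 0; };
    std::unordered_map<std::string, Stats> history;
public:
    // Record one observed performance score for an optimization procedure.
    void record(const std::string& optimizer, double performance) {
        auto& s = history[optimizer];
        s.total += performance;
        s.runs  += 1;
    }
    // Pick the procedure with the best mean performance seen so far.
    std::string select() const {
        std::string best;
        double bestMean = -std::numeric_limits<double>::infinity();
        for (const auto& [name, s] : history) {
            double mean = s.total / s.runs;
            if (mean > bestMean) { bestMean = mean; best = name; }
        }
        return best;
    }
};
```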
Dudebout, Nicolas. "Multigigabit multimedia processor for 60GHz WPAN: a hardware software codesign implementation." Thesis, Atlanta, Ga.: Georgia Institute of Technology, 2008. http://hdl.handle.net/1853/26677.
Committee Member: Chang, Gee-Kung; Committee Member: Hasler, Paul; Committee Member: Laskar, Joy. Part of the SMARTech Electronic Thesis and Dissertation Collection.
Subramanian, Sriram. "Software Performance Estimation Techniques in a Co-Design Environment." University of Cincinnati / OhioLINK, 2003. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1061553201.
Ganpaa, Gayatri. "An R*-Tree Based Semi-Dynamic Clustering Method for the Efficient Processing of Spatial Join in a Shared-Nothing Parallel Database System." ScholarWorks@UNO, 2006. http://scholarworks.uno.edu/td/298.
Linford, John Christian. "Accelerating Atmospheric Modeling Through Emerging Multi-core Technologies." Diss., Virginia Tech, 2010. http://hdl.handle.net/10919/27599.
Ph. D.
Aryan, Omid. "A hardware-defined approach to software-defined radios: improving performance without trading in flexibility." Thesis, Massachusetts Institute of Technology, 2013. http://hdl.handle.net/1721.1/85402.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 93-94).
The thesis presents an implementation of a general DSP framework on the Texas Instruments OMAP-L138 processor. Today's software-defined radios suffer from fundamental drawbacks that inhibit their use in practical settings, including their large sizes, their dependence on a PC for digital signal processing operations, and their inability to process signals in real time. Furthermore, FPGA-based implementations that achieve higher performance lack the flexibility that software implementations provide. The present implementation endeavors to overcome these issues by utilizing a processor that is low-power, small in size, and that provides a library of assembly-level optimized functions in order to achieve much faster performance with a software implementation. The evaluations show substantial improvements in performance when the DSP framework is implemented with the OMAP-L138 processor compared to other software-implemented radios.
by Omid Aryan.
M. Eng.
Paolillo, Antonio. "Optimisation of Performance Metrics of Embedded Hard Real-Time Systems using Software/Hardware Parallelism." Doctoral thesis, Universite Libre de Bruxelles, 2018. http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/277427.
Optimization of Performance Metrics of Hard Real-Time Embedded Systems using Software and Hardware Parallelism. Nowadays, embedded systems are an integral part of our daily lives. Some of these systems, called critical systems, are subject to strong reliability and robustness constraints. Moreover, constraints of cost, autonomy and performance add to reliability, and these systems must very often meet strict deadlines in a predictable way. When these different constraints are combined in a product's requirements, the classical design techniques consisting of using a single processor core are no longer sufficient. Academic research in the field of real-time embedded systems has produced numerous techniques for exploiting modern platforms. These techniques are often based on exploiting the parallelism inherent in the hardware to improve system performance and reduce the power dissipated by the platform. However, few systems on the market exploit these techniques from the literature, and few of these techniques have been validated in practical experiments. In this thesis, we study operating-system-level techniques that exploit hardware parallelism through the implementation of parallel software, in order to maximize performance and reduce the impact on energy consumption while satisfying the strict timing constraints of the application requirements. We detail the theoretical foundations of the ideas applied in the dissertation and validate them through experimental work. To this end, we use the new kernel of an operating system written in the context of the creation of a spin-off of the Université libre de Bruxelles. Our experiments, based on running applications on this operating system executing on a real embedded platform, show that the use of scheduling techniques exploiting hardware and software parallelism enables large savings in the energy consumed when running embedded applications. Ongoing future work is presented, exploiting innovative platforms that combine multi-core processors and reconfigurable hardware to push performance improvements and energy gains even further.
Doctorate in Sciences
Kong, Martin Richard. "Enabling Task Parallelism on Hardware/Software Layers using the Polyhedral Model." The Ohio State University, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=osu1452252422.
Cornevaux-Juignet, Franck. "Hardware and software co-design toward flexible terabits per second traffic processing." Thesis, Ecole nationale supérieure Mines-Télécom Atlantique Bretagne Pays de la Loire, 2018. http://www.theses.fr/2018IMTA0081/document.
The reliability and security of communication networks require efficient components to finely analyze data traffic. Service diversification and throughput increases force network operators to constantly improve analysis systems in order to handle throughputs of hundreds, even thousands, of gigabits per second. Commonly used solutions are software oriented; they offer the flexibility and accessibility welcomed by network operators, but they can no longer meet these strong constraints in many critical cases. This thesis studies architectural solutions based on programmable chips like Field-Programmable Gate Arrays (FPGAs), which combine computation power and processing flexibility. Boards equipped with such chips are integrated into a common software/hardware processing flow in order to balance the shortcomings of each element. Network components developed with this innovative approach ensure exhaustive processing of packets transmitted on physical links while keeping the flexibility of usual software solutions, something never achieved in the previous state of the art. This approach is validated by the design and implementation of a flexible packet-processing architecture on FPGA. It can process any packet type at the cost of a slight over-consumption of resources, and it is fully customizable from the software side. With the proposed solution, network engineers can transparently use the processing power of a hardware accelerator without needing prior knowledge of digital circuit design.
Quintal, Luis Fernando Curi. "The applicability of hardware design strategies to improve software application performance in multi-core architectures." Thesis, University of Reading, 2014. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.628531.
Suljevic, Benjamin. "Mapping HW resource usage towards SW performance." Thesis, Mälardalens högskola, Akademin för innovation, design och teknik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-44176.
Hill, Terrance, Mark Geoghegan, and Kevin Hutzel. "IMPLEMENTATION AND PERFORMANCE OF A HIGHSPEED, VHDL-BASED, MULTI-MODE ARTM DEMODULATOR." International Foundation for Telemetering, 2002. http://hdl.handle.net/10150/606325.
Legacy telemetry systems, although widely deployed, are being severely taxed to support the high data-rate requirements of advanced aircraft and missile platforms. Increasing data rates, in conjunction with the loss of spectrum, have created a need to use the available spectrum more efficiently. In response to this, new modulation techniques have been developed which offer more data capacity in the same operating bandwidth. Demodulation of these new waveforms is a computationally challenging task, especially at high data rates. This paper describes the design, implementation and performance of a high-speed, multi-mode demodulator for the Advanced Range Telemetry (ARTM) program which meets these challenges.
Damasceno Costa, Diego Elias [Verfasser], and Artur [Akademischer Betreuer] Andrzejak. "Benchmark-driven Software Performance Optimization / Diego Elias Damasceno Costa ; Betreuer: Artur Andrzejak." Heidelberg : Universitätsbibliothek Heidelberg, 2019. http://d-nb.info/1192373170/34.
Full textDamasceno, Costa Diego Elias Verfasser], and Artur [Akademischer Betreuer] [Andrzejak. "Benchmark-driven Software Performance Optimization / Diego Elias Damasceno Costa ; Betreuer: Artur Andrzejak." Heidelberg : Universitätsbibliothek Heidelberg, 2019. http://nbn-resolving.de/urn:nbn:de:bsz:16-heidok-269197.
Full textFong, Fredric, and Mustafa Raed. "Performance comparison of GraalVM, Oracle JDK andOpenJDK for optimization of test suite execution time." Thesis, Mittuniversitetet, Institutionen för data- och systemvetenskap, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-43169.
Hou, Wei. "Integrated Reliability and Availability Analysis of Networks With Software Failures and Hardware Failures." Scholar Commons, 2003. https://scholarcommons.usf.edu/etd/1393.
Hou, Wei. "Integrated reliability and availability analysis of networks with software failures and hardware failures." [Tampa, Fla.] : University of South Florida, 2003. http://purl.fcla.edu/fcla/etd/SFE0000173.
Weng, Lichen. "A Hardware and Software Integrated Approach for Adaptive Thread Management in Multicore Multithreaded Microprocessors." FIU Digital Commons, 2012. http://digitalcommons.fiu.edu/etd/653.
Lewerentz, Andreaz, and Jonathan Lindvall. "Performance and Energy Optimization for the Android Platform." Thesis, Blekinge Tekniska Högskola, Sektionen för datavetenskap och kommunikation, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-4490.
Chen, Kuan-Hsun [Verfasser], Jian-Jia [Akademischer Betreuer] Chen, and Rolf [Gutachter] Ernst. "Optimization and analysis for dependable application software on unreliable hardware platforms / Kuan-Hsun Chen ; Gutachter: Rolf Ernst ; Betreuer: Jian-Jia Chen." Dortmund : Universitätsbibliothek Dortmund, 2019. http://d-nb.info/1189420333/34.
Ledinov, Dmytro. "UpTime 4 - Health Monitoring Component." Thesis, Linnéuniversitetet, Institutionen för datavetenskap (DV), 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-31343.
Sargur, Sudarshan Lakshminarasimhan. "An Efficient Architecture for Dynamic Profiling of Multicore Systems." Thesis, The University of Arizona, 2015. http://hdl.handle.net/10150/595814.
Belsick, Charlotte Ann. "Space Vehicle Testing." DigitalCommons@CalPoly, 2012. https://digitalcommons.calpoly.edu/theses/888.
Beaugnon, Ulysse. "Efficient code generation for hardware accelerators by refining partially specified implementation." Thesis, Paris Sciences et Lettres (ComUE), 2019. http://www.theses.fr/2019PSLEE050.
Compilers looking for an efficient implementation of a function must find which optimizations are the most beneficial. This is a complex problem, especially in the early steps of the compilation process: each decision may impact the transformations available in subsequent steps. We propose to represent the compilation process as the progressive refinement of a partially specified implementation. All potential decisions are exposed upfront and commute. This allows for making the most discriminative decisions first and for building a performance model aware of which optimizations may be applied in subsequent steps. We apply this approach to the generation of efficient GPU code for linear algebra and yield performance competitive with hand-tuned libraries.
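The refinement process can be pictured as a search that fixes one open decision at a time and prunes refinements whose performance-model lower bound cannot beat the best complete implementation found so far. The sketch below is schematic: lower_bound() and measure() are stand-in stubs, not the thesis's performance model or evaluator.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Stand-in performance model: an always-optimistic lower bound.
double lower_bound(const std::vector<int>&) { return 0.0; }

// Stand-in evaluator for a fully specified candidate implementation.
double measure(const std::vector<int>& cand) {
    double t = 1.0;
    for (int d : cand) t += d;   // arbitrary cost, for illustration only
    return t;
}

// A candidate is a vector of decisions, refined one position at a time.
void refine(std::vector<int>& cand, std::size_t next, int choices,
            double& best) {
    if (next == cand.size()) {            // fully specified: evaluate it
        best = std::min(best, measure(cand));
        return;
    }
    for (int c = 0; c < choices; ++c) {   // decisions are exposed upfront
        cand[next] = c;
        if (lower_bound(cand) < best)     // prune hopeless refinements
            refine(cand, next + 1, choices, best);
    }
}
```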
Hayes, Brian C. "Performance oriented scheduling with power constraints." [Tampa, Fla.] : University of South Florida, 2005. http://purl.fcla.edu/fcla/etd/SFE0001073.
Varma, Krishnaraj M. "Fast Split Arithmetic Encoder Architectures and Perceptual Coding Methods for Enhanced JPEG2000 Performance." Diss., Virginia Tech, 2006. http://hdl.handle.net/10919/26519.
Ph. D.