

Academic literature on the topic 'Performance Optimization in Software and Hardware'

Author: Grafiati

Published: 4 June 2021

Last updated: 1 February 2022


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Performance Optimization in Software and Hardware.'


Journal articles on the topic "Performance Optimization in Software and Hardware"

Zhang, Tao, Changfu Yang, and Xin Zhao. "Using Improved Brainstorm Optimization Algorithm for Hardware/Software Partitioning." Applied Sciences 9, no. 5 (February 28, 2019): 866. http://dx.doi.org/10.3390/app9050866.


Abstract:

Today, more and more complex tasks are emerging. To finish these tasks within a reasonable time, it is necessary to use complex embedded systems that have multiple processing units. Hardware/software partitioning is one of the key technologies in designing complex embedded systems; it is usually treated as an optimization problem and solved with different optimization methods. Among these methods, swarm intelligence (SI) algorithms are easy to apply and have the advantages of strong robustness and excellent global search ability. Because hardware/software partitioning problems are highly complex, SI algorithms are well suited to solving them. In this paper, a new SI algorithm, called brainstorm optimization (BSO), is applied to hardware/software partitioning. To improve the performance of BSO, we analyzed its optimization process when solving the hardware/software partitioning problem and found disadvantages in the clustering method and the updating strategy. We then proposed the improved brainstorm optimization (IBSO), which improves the original clustering method by setting the cluster points and improves the updating strategy by decreasing the number of updated individuals in each iteration. Based on the simulation methods usually used to evaluate hardware/software partitioning algorithms, we generated eight benchmarks representing tasks of different scales to test the performance of IBSO, BSO, four original heuristic algorithms, and two improved BSO variants. Simulation results show that the IBSO algorithm achieves the highest-quality solutions within the shortest running time among these algorithms.
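
To make the problem statement in this abstract concrete, hardware/software partitioning can be written as a binary assignment: each task runs either in software or in hardware, and the goal is to minimize total execution time without exceeding a hardware area budget. The sketch below is only an illustration under stated assumptions. The task costs, the area budget, and the function names are invented, and exhaustive enumeration stands in for the BSO/IBSO search, which the abstract does not specify in enough detail to reproduce.

```python
from itertools import product

# Hypothetical tasks: (software_time, hardware_time, hardware_area).
TASKS = [
    (10, 3, 4),
    (8, 2, 5),
    (6, 5, 1),
    (12, 4, 6),
]
AREA_BUDGET = 10  # assumed hardware area limit


def evaluate(assignment, tasks):
    """Return (total_time, total_area) for a 0/1 assignment (1 = hardware)."""
    total_time = 0
    total_area = 0
    for in_hw, (sw_time, hw_time, hw_area) in zip(assignment, tasks):
        if in_hw:
            total_time += hw_time
            total_area += hw_area
        else:
            total_time += sw_time
    return total_time, total_area


def best_partition(tasks, area_budget):
    """Exhaustively search all 2**n assignments; feasible only for small n."""
    best = None
    for assignment in product((0, 1), repeat=len(tasks)):
        time, area = evaluate(assignment, tasks)
        if area > area_budget:
            continue  # violates the hardware area constraint
        if best is None or time < best[1]:
            best = (assignment, time, area)
    return best
```

The search space doubles with every added task, which is why heuristic methods such as the swarm intelligence algorithms discussed above are used for realistic task counts.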


Yang, Fu, Liu Xin, and Pei Yuan Guo. "A Multi-Objective Optimization Genetic Algorithm for SOPC Hardware-Software Partitioning." Advanced Materials Research 457-458 (January 2012): 1142–48. http://dx.doi.org/10.4028/www.scientific.net/amr.457-458.1142.


Abstract:

Hardware-software partitioning is the key technology in hardware-software co-design; its results directly determine the design of the system. The genetic algorithm is a classical search algorithm for solving such combinatorial optimization problems. A multi-objective genetic algorithm for hardware-software partitioning is presented in this paper. This method takes into account both system performance and indicators such as time, power, area, and cost, and achieves multi-objective optimization in a system on programmable chip (SOPC). Simulation results show that the method can solve the SOPC hardware-software partitioning problem effectively.


Mhadhbi, Imene, Slim Ben Othman, and Slim Ben Saoud. "An Efficient Technique for Hardware/Software Partitioning Process in Codesign." Scientific Programming 2016 (2016): 1–11. http://dx.doi.org/10.1155/2016/6382765.


Abstract:

Codesign methodology deals with the problem of designing complex embedded systems, where automatic hardware/software partitioning is one key issue. Research efforts on this issue focus on exploring new automatic partitioning methods that consider only binary or extended partitioning problems. The main contribution of this paper is a hybrid FCMPSO partitioning technique, based on the Fuzzy C-Means (FCM) and Particle Swarm Optimization (PSO) algorithms, suitable for mapping embedded applications to both binary and multicore target architectures. Our FCMPSO optimization technique has been compared using different graphical models with a large number of instances. Performance analysis reveals that FCMPSO outperforms the PSO algorithm as well as the Genetic Algorithm (GA), Simulated Annealing (SA), Ant Colony Optimization (ACO), and FCM standard metaheuristic-based techniques, and also hybrid solutions including PSO then GA, GA then SA, GA then ACO, ACO then SA, FCM then GA, FCM then SA, and finally ACO followed by FCM.


Umesh, I. M., and G. N. Srinivasan. "Optimum Software Aging Prediction and Rejuvenation Model for Virtualized Environment." Indonesian Journal of Electrical Engineering and Computer Science 3, no. 3 (September 1, 2016): 572. http://dx.doi.org/10.11591/ijeecs.v3.i3.pp572-578.


Abstract:

Advances in electronics and hardware have resulted in multiple software systems running on the same hardware. The result is multiuser, multitasking, and virtualized environments. However, the reliability of such high-performance computing systems depends on both hardware and software. For hardware, aging can be dealt with by replacement. Software aging, however, must be dealt with in software. For aging detection, a new approach using a machine learning framework is proposed in this paper. For rejuvenation, an Adaptive Genetic Algorithm (A-GA) has been developed to perform live migration to avoid downtime and SLA violation. The proposed A-GA based rejuvenation controller (A-GARC) has outperformed other heuristic techniques such as Ant Colony Optimization (ACO) and best fit decreasing (BFD) for migration. Results reveal that the proposed aging forecasting method and A-GA based rejuvenation outperform other approaches in ensuring optimal system availability and minimum task migration, performance degradation, and SLA violation.


Tomecek, Jozef. "Hardware optimizations of stream cipher rabbit." Tatra Mountains Mathematical Publications 50, no. 1 (December 1, 2011): 87–101. http://dx.doi.org/10.2478/v10127-011-0039-8.


Abstract:

Stream ciphers form part of the cryptographic primitives focused on privacy. The synchronous, symmetric, software-oriented stream cipher Rabbit is a member of the final portfolio of the European Union's eStream project. Although it was designed to perform well in software, the operations it employs also appear to compute efficiently in hardware. Rabbit's designers claim 128-bit security with no known security weaknesses. Since the hardware performance of Rabbit was only estimated in the algorithm proposal, a comparison of direct and optimized FPGA implementations of the Rabbit stream cipher is presented, identifying algorithm bottlenecks and discussing the optimization techniques applied to the algorithm's computations, along with key area/time trade-offs.


Bezemer, Cor-Paul, and Andy Zaidman. "Performance optimization of deployed software-as-a-service applications." Journal of Systems and Software 87 (January 2014): 87–103. http://dx.doi.org/10.1016/j.jss.2013.09.013.


Algarni, Sultan Abdullah, Mohammad Rafi Ikbal, Roobaea Alroobaea, Ahmed S. Ghiduk, and Farrukh Nadeem. "Performance Evaluation of Xen, KVM, and Proxmox Hypervisors." International Journal of Open Source Software and Processes 9, no. 2 (April 2018): 39–54. http://dx.doi.org/10.4018/ijossp.2018040103.


Abstract:

Hardware virtualization plays a major role in IT infrastructure optimization in private data centers and public cloud platforms. Although there have been many recent advances in CPU architecture and hypervisors, overhead still exists because there is a virtualization layer between the guest operating system and the physical hardware. This is particularly true when multiple virtual guests are competing for resources on the same physical hardware. Understanding the performance of a virtualization layer is crucial, as it has a major impact on the entire IT infrastructure. This article presents an extensive study comparing the performance of three hypervisors: KVM, Xen, and Proxmox VE. The experiments showed that KVM delivers the best performance on most of the selected parameters. Xen excels in file system performance and application performance. Proxmox delivered the best performance only in the sub-category of CPU throughput. This article suggests best-suited hypervisors for targeted applications.


Wang, Xin. "Research on Software Optimization Solutions of E-Commerce Site." Applied Mechanics and Materials 198-199 (September 2012): 626–30. http://dx.doi.org/10.4028/www.scientific.net/amm.198-199.626.


Abstract:

There are generally two types of optimization for E-commerce platforms: hardware optimization and software optimization. This paper first analyzes the system-level techniques of software optimization, including dynamic load optimization and cluster technology. It then studies database performance optimization methods covering tables, connection pooling, queries, and several other aspects. Finally, it examines the caching technology used to optimize E-commerce platforms. The paper proposes generally applicable software optimization solutions for E-commerce platforms; these studies provide a reference for E-commerce website designers and maintainers, and a strategy for E-commerce enterprises to optimize their platform environments.


Koltakov, S. A., and A. A. Cherepnev. "HARDWARE-SOFTWARE COMPLEX FOR DIGITAL PROCESSING OF HYDROACOUSTIC SIGNALS." Issues of radio electronics, no. 5 (June 8, 2019): 60–63. http://dx.doi.org/10.21778/2218-5453-2019-5-60-63.


Abstract:

The article describes a hardware-software complex (HSC) based on a debugging stand, together with its composition, modules, and operation. A method for synthesizing the output signal is described, and a formula and a table of parameters for its calculation are given. Signals and spectra at the input and output of the developed HSC are shown. Performance parameters are obtained for several HSC variants: one based on a signal processor with a general-purpose processor, and two based on general-purpose processors. The proposed version of the HSC is 2–3 times faster than the HSC based on an Intel general-purpose processor. This is achieved through the use of modern programming methods and tools, digital signal processing modules, and optimization of the executable code. Recommendations are given for further improvement of the proposed complex, which is possible through the use of modern FPGAs and a high-speed interface.


Rahim, N. H. A., A. M. Kassim, M. F. Miskon, A. H. Azahar, and H. Sakidin. "Optimization of One Legged Hopping Robot Hardware Parameters via Solidworks." Applied Mechanics and Materials 393 (September 2013): 544–49. http://dx.doi.org/10.4028/www.scientific.net/amm.393.544.


Abstract:

This paper discusses the simulation of a one-legged hopping robot in Solidworks in order to determine the optimum hardware parameters of the robot. Simulations were run for different values of three variables set up beforehand: crank bar length, spring length, and spring coefficient. The best parameters were chosen for higher and more stable hopping performance. In addition, an experiment was carried out to validate the parameters from the simulation. Average hopping height is discussed, and the overall stability of the hopping height is shown by a normal distribution graph. As a result, the optimum hardware parameter values for the one-legged hopping robot are validated.


Dissertations / Theses on the topic "Performance Optimization in Software and Hardware"

Schöne, Robert, Thomas Ilsche, Mario Bielert, Daniel Molka, and Daniel Hackenberg. "Software Controlled Clock Modulation for Energy Efficiency Optimization on Intel Processors." Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2017. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-224966.


Abstract:

Current Intel processors implement a variety of power saving features like frequency scaling and idle states. These mechanisms limit the power draw and thereby decrease the thermal dissipation of the processors. However, they also have an impact on the achievable performance. The various mechanisms significantly differ regarding the amount of power savings, the latency of mode changes, and the associated overhead. In this paper, we describe and closely examine the so-called software controlled clock modulation mechanism for different processor generations. We present results that imply that the available documentation is not always correct and describe when this feature can be used to improve energy efficiency. We additionally compare it against the more popular feature of dynamic voltage and frequency scaling and develop a model to decide which feature should be used to optimize inter-process synchronizations on Intel Haswell-EP processors.


Vujic, Nikola. "Software caching techniques and hardware optimizations for on-chip local memories." Doctoral thesis, Universitat Politècnica de Catalunya, 2012. http://hdl.handle.net/10803/83598.


Abstract:

Despite the fact that the most viable L1 memories in processors are caches, on-chip local memories have received considerable attention lately. Local memories are an interesting design option due to their many benefits: less area occupancy, reduced energy consumption, and fast, constant access time. These benefits are especially interesting for the design of modern multicore processors, since power and latency are important concerns in computer architecture today. Also, local memories do not generate coherency traffic, which is important for the scalability of multicore systems. Unfortunately, local memories have not yet been well accepted in modern processors, mainly due to their poor programmability. Systems with on-chip local memories do not have hardware support for transparent data transfers between local and global memories, and thus ease of programming is one of the main impediments to the broad acceptance of those systems. This thesis addresses software and hardware optimizations regarding the programmability and the usage of on-chip local memories in the context of both single-core and multicore systems. The software optimizations are related to software caching techniques. A software cache is a robust approach to providing the user with a transparent view of the memory architecture, but this software approach can suffer from poor performance. In this thesis, we start by optimizing the traditional software cache, proposing a hierarchical, hybrid software-cache architecture. Afterwards, we develop a few optimizations to speed up our hybrid software cache as much as possible. As a result of the software optimizations, our hybrid software cache runs 4 to 10 times faster than a traditional software cache on a set of NAS parallel benchmarks. We do not stop with software caching. We cover other aspects of architectures with on-chip local memories, such as the quality of the generated code and its correspondence with the quality of the buffer management in local memories, in order to improve the performance of these architectures. We pursue software optimizations until we reach their limit and then propose optimizations at the hardware level. Two hardware proposals are presented in this thesis. One relaxes the alignment constraints imposed in architectures with on-chip local memories; the other accelerates the management of local memories by providing hardware support for the majority of actions performed in our software cache.
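
As a rough illustration of what a software cache does, the sketch below keeps copies of fixed-size blocks from a slow "global memory" in a small local store and tracks hits and misses. It is a plain direct-mapped, read-only cache written for this listing, not the hierarchical hybrid design proposed in the thesis. The class name, the default sizes, and the use of a Python list as global memory are all assumptions.

```python
class SoftwareCache:
    """Direct-mapped, read-only software cache over a backing list."""

    def __init__(self, global_memory, num_lines=4, line_size=4):
        self.global_memory = global_memory  # stands in for slow global memory
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines      # which block each line holds
        self.lines = [None] * num_lines     # local copies of the blocks
        self.hits = 0
        self.misses = 0

    def read(self, address):
        block = address // self.line_size
        index = block % self.num_lines
        if self.tags[index] != block:
            # Miss: copy the whole block from global to local memory.
            start = block * self.line_size
            self.lines[index] = self.global_memory[start:start + self.line_size]
            self.tags[index] = block
            self.misses += 1
        else:
            self.hits += 1
        return self.lines[index][address % self.line_size]
```

Every access pays for the tag check in software, which is one reason a software cache can perform poorly and why the thesis looks at both software optimizations and hardware support.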

Serpa, Matheus da Silva. "Source code optimizations to reduce multi core and many core performance bottlenecks." Biblioteca Digital de Teses e Dissertações da UFRGS, 2018. http://hdl.handle.net/10183/183139.


Abstract:

Nowadays, a variety of architectures is available not only to industry but also to end consumers. Traditional multi-core processors, GPUs, accelerators such as the Xeon Phi, and even energy-efficiency-driven processors such as the ARM family present very different architectural characteristics. This wide range of characteristics presents a challenge for application developers. Developers must deal with different instruction sets, memory hierarchies, or even different programming paradigms when programming for these architectures. To optimize an application, it is important to have a deep understanding of how it behaves on different architectures. Related work has proposed a wide variety of solutions. Most of them focus on improving only memory performance. Others focus on load balancing, vectorization, and thread and data mapping, but perform them separately, losing optimization opportunities. In this master's thesis, we propose several optimization techniques to improve the performance of a real-world seismic exploration application provided by Petrobras, a multinational corporation in the petroleum industry. In our experiments, we show that loop interchange is a useful technique for improving performance at different cache memory levels, improving performance by up to 5.3× and 3.9× on the Intel Broadwell and Intel Knights Landing architectures, respectively. By changing the code to enable vectorization, performance was increased by up to 1.4× and 6.5×. Load balancing improved performance by up to 1.1× on Knights Landing. Thread and data mapping techniques were also evaluated, with a performance improvement of up to 1.6× and 4.4×. We also compared the best version for each architecture and showed that we were able to improve the performance of Broadwell by 22.7× and of Knights Landing by 56.7× compared to a naive version, but, in the end, Broadwell was 1.2× faster than Knights Landing.
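
Loop interchange, named in this abstract, swaps the order of two nested loops so that the innermost loop touches memory with a small stride. The sketch below shows the transformation on a matrix stored row by row in a flat list. The function names and the row-major layout are assumptions made for this illustration, and the code is unrelated to the Petrobras application. Both versions compute the same sum; only the traversal order differs.

```python
def sum_column_major_order(data, rows, cols):
    """Original loop nest: the inner loop walks down a column (stride = cols)."""
    total = 0
    for j in range(cols):
        for i in range(rows):
            total += data[i * cols + j]
    return total


def sum_row_major_order(data, rows, cols):
    """Interchanged loop nest: the inner loop walks along a row (stride = 1)."""
    total = 0
    for i in range(rows):
        for j in range(cols):
            total += data[i * cols + j]
    return total
```

In a compiled language with contiguous arrays, the stride-1 version typically makes better use of each cache line it loads. Pure Python mostly hides that effect, so the sketch shows the reordering itself rather than the speedups reported in the thesis.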


Shee, Seng Lin, Computer Science & Engineering, Faculty of Engineering, UNSW. "ADAPT: architectural and design exploration for application specific instruction-set processor technologies." Awarded by: University of New South Wales, 2007. http://handle.unsw.edu.au/1959.4/35404.


Abstract:

This thesis presents design automation methodologies for extensible processor platforms in application-specific domains. The work first presents a single-processor approach to customization: a methodology that can rapidly create different processor configurations by removing unused instruction sets from the architecture. A profile-directed approach is used to identify frequently used instructions and to eliminate unused opcodes from the available instruction pool. A coprocessor approach is next explored to create an SoC (System-on-Chip) that speeds up the application while reducing energy consumption. Loops in applications are identified and accelerated by tightly coupling a coprocessor to an ASIP (Application Specific Instruction-set Processor). Latency hiding is used to exploit the parallelism provided by this architecture. A case study has been performed on a JPEG encoding algorithm, comparing two different coprocessor approaches: a high-level synthesis approach and our custom coprocessor approach. The thesis concludes by introducing a heterogeneous multi-processor system using ASIPs as processing entities in a pipeline configuration. The problem of mapping each algorithmic stage in the system to an ASIP configuration is formulated. We propose an estimation technique to calculate runtimes of the configured multiprocessor system without running cycle-accurate simulations, which could take a significant amount of time. We present two heuristics to efficiently search the design space of a pipeline-based multi-ASIP system and compare the results against an exhaustive approach. In our first approach, we show that, on average, processor size can be reduced by 30% and energy consumption by 24%, while performance is improved by 24%. 
In the coprocessor approach, compared with the use of a main processor alone, a loop performance improvement of 2.57x is achieved using the custom coprocessor approach, as against 1.58x for the high level synthesis method, and 1.33x for the customized instruction approach. Energy savings are 57%, 28% and 19%, respectively. Our multiprocessor design provides a performance improvement of at least 4.03x for JPEG and 3.31x for MP3, for a single processor design system. The minimum cost obtained using our heuristic was within 0.43% and 0.29% of the optimum values for the JPEG and MP3 benchmarks respectively.


Sid, Lakhdar Riyane Yacine. "Méthodologie pour l'optimisation logicielle de structures de données pour les architectures hautes performances à mémoires complexes." Thesis, Université Grenoble Alpes, 2020. http://www.theses.fr/2020GRALM058.


Abstract:

With the rising impact of the memory wall, selecting an adequate data-structure implementation for a given kernel has become a performance-critical issue. The complexity of efficiently solving this Data-Layout-Decision (DLD) problem is dramatically increased by the coexistence of complex, heterogeneous, and application-specific hardware memories. Slightly modifying an optimized application, or porting it to a new hardware architecture, requires significant time and engineering effort. It also requires deep knowledge of the host hardware platform. In this thesis, we take a first step toward automatic software adaptation to hardware. We present an iterative, data-mining-related software-optimization approach based on detecting and exploring the most influential parameters linked to the hardware, operating system, and software. We also propose a custom data-cache-miss modeling algorithm designed to be used as a fully parameterized performance evaluation. The proposed approach is designed to be embedded within a general-purpose compiler. To explore the parameters related to the data-layout implementation, we propose HARDSI, a custom patented method for solving the DLD problem. We also propose applying our method through a custom domain-specific language and computation framework. The HARDSI method chooses, from a custom knowledge base, an optimized data-layout implementation with regard to the memory pattern followed to access the considered data structure. The generated solutions are also specifically adapted to the properties of the host hardware memory. Meanwhile, we consider the particular case of solving the DLD problem on memories that are explicitly addressed by the programmer (such as embedded scratchpad memories or GPUs). The problem that we address is to find an optimized memory placement that maximizes the amount of frequently accessed data stored within this fast yet small memory.
In this context, we propose DDLGS, a custom patented method designed to generate a dynamic data layout with regard to the memory-access pattern followed. The generated implementations encompass the specific load and store routines as well as the granularity attributed to each data transfer. These implementations are also able to adapt, at run time, to the input of the considered source code. To evaluate our implementations on different hardware environments, we considered two different processor and memory architectures: (i) an x86 processor, an Intel Xeon with three levels of data cache using the least-recently-used replacement policy, and (ii) a Massively Parallel Processor Array, a Kalray Coolidge-80-30 with a 16 KB on-chip scratchpad memory. Experiments on linear algebra, artificial intelligence, and image processing benchmarks show that our method accurately determines an optimized data-structure implementation. These implementations achieve execution-time speed-ups of up to 48.9x on the Xeon processor and 54.2x on the Coolidge processor.
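
As a rough illustration of why the access pattern matters for the least-recently-used data caches mentioned in this abstract, the following Python sketch counts cache misses for two traversals of the same row-major array. It is not the thesis's cache-miss model, nor the HARDSI or DDLGS methods; the names `count_misses` and `trace`, the fully associative cache, and all sizes are assumptions chosen for the example.

```python
from collections import OrderedDict

def count_misses(addresses, num_lines, line_size):
    """Count misses of a fully associative LRU cache over a trace of byte addresses."""
    cache = OrderedDict()  # keys are cache-line tags, ordered from least to most recently used
    misses = 0
    for addr in addresses:
        tag = addr // line_size
        if tag in cache:
            cache.move_to_end(tag)         # hit: mark the line as most recently used
        else:
            misses += 1
            cache[tag] = True
            if len(cache) > num_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return misses

def trace(rows, cols, elem_size, row_wise):
    """Byte addresses touched when walking a row-major rows x cols array."""
    if row_wise:
        return [(r * cols + c) * elem_size for r in range(rows) for c in range(cols)]
    return [(r * cols + c) * elem_size for c in range(cols) for r in range(rows)]

# Hypothetical configuration: 64 x 64 array of 8-byte elements, 16 lines of 64 bytes.
row_misses = count_misses(trace(64, 64, 8, True), num_lines=16, line_size=64)
col_misses = count_misses(trace(64, 64, 8, False), num_lines=16, line_size=64)
print(row_misses, col_misses)
```

In this toy configuration the row-wise walk misses once per cache line, while the column-wise walk misses on every access because each column touches more distinct lines than the cache can hold. A layout decision method would aim to match the layout to the traversal so that the first case applies.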

Pinto, Christian <1986>. "Many-Core Architectures: Hardware-Software Optimization and Modeling Techniques." Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2015. http://amsdottorato.unibo.it/6824/.

Abstract:

During the last few decades, unprecedented technological growth has been at the center of the embedded systems design landscape, with Moore’s Law as the leading factor of this trend. Today an ever-increasing number of cores can be integrated on the same die, marking the transition from state-of-the-art multi-core chips to the new many-core design paradigm. Despite their extraordinarily high computing power, the complexity of many-core chips opens the door to several challenges. As a result of the increased silicon density of modern Systems-on-a-Chip (SoC), the design space that must be explored to find the best design has exploded, and hardware designers face a huge design space. Virtual Platforms have always been used to enable hardware-software co-design, but today they must cope with the huge complexity of both hardware and software systems. In this thesis, two different research works on Virtual Platforms are presented: the first is intended for the hardware developer, to easily allow complex cycle-accurate simulations of many-core SoCs. The second exploits the parallel computing power of off-the-shelf General Purpose Graphics Processing Units (GPGPUs), with the goal of increasing simulation speed. The term virtualization can be used in the context of many-core systems not only to refer to the aforementioned hardware emulation tools (Virtual Platforms), but also for two other main purposes: 1) to help the programmer achieve the maximum possible performance of an application by hiding the complexity of the underlying hardware; 2) to efficiently exploit the highly parallel hardware of many-core chips in environments with multiple active Virtual Machines. This thesis focuses on virtualization techniques with the goal of mitigating, and overcoming when possible, some of the challenges introduced by the many-core design paradigm.

Muffang, Louis. "SLAM Hardware & Software optimization for mobile platform integration." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-294332.

Abstract:

This thesis focuses on the optimization of a state-of-the-art monocular Visual-Inertial Odometry (VIO) algorithm for real-time applications with limited resources on an embedded system. We use a multi-processor unit equipped with a Digital Signal Processor (DSP) to accelerate tasks and offload them from the CPU. The goal is to reduce resource consumption without degrading the algorithm's speed and accuracy. To this end, we first identify OpenVINS [1] as a suitable algorithm for this work and find the functions to optimize. Comparing the DSP-optimized version of the algorithm with its original version, we achieve similar accuracy with CPU power consumption reduced by a factor of more than 1.5 and memory usage reduced by a factor of more than 2. This work is relevant to every embedded system that requires a vision-based localization system running alongside other CPU-heavy tasks.
This report describes the optimization of an algorithm for monocular Visual-Inertial Odometry (VIO) for real-time applications with limited resources on embedded systems. We use a multi-processor unit equipped with a Digital Signal Processor (DSP) to increase performance and offload the main processor (CPU), so that it can be used for other tasks in parallel. The goal is to reduce resource consumption without degrading the speed or accuracy of the algorithm. We identify OpenVINS as a suitable VIO algorithm to optimize. The result of the study is that we manage to reduce the CPU's memory usage by a factor of 2 and its energy consumption by a factor of 1.5. These results can be useful in all systems that use a VIO algorithm in parallel with other computationally demanding tasks.

Motiwala, Quaeed. "Optimizations for acyclic dataflow graphs for hardware-software codesign." Thesis, This resource online, 1994. http://scholar.lib.vt.edu/theses/available/etd-06302009-040504/.

Shen, Chung-Ching. "Energy-driven optimization of hardware and software for distributed embedded systems." College Park, Md.: University of Maryland, 2008. http://hdl.handle.net/1903/8901.

Abstract:

Thesis (Ph. D.) -- University of Maryland, College Park, 2008.
Thesis research directed by: Dept. of Electrical and Computer Engineering. Title from t.p. of PDF. Includes bibliographical references. Published by UMI Dissertation Services, Ann Arbor, Mich. Also available in paper.

Brankovic, Aleksandar. "Performance simulation methodologies for hardware/software co-designed processors." Doctoral thesis, Universitat Politècnica de Catalunya, 2015. http://hdl.handle.net/10803/287978.

Abstract:

Recently the community started looking into Hardware/Software (HW/SW) co-designed processors as potential solutions for moving toward less power-hungry and less complex designs. Unlike other solutions, they reduce power and complexity by performing so-called dynamic binary translation and optimization from a guest ISA to an internal, custom host ISA. This thesis tries to answer the question of how to simulate this kind of architecture. For any kind of processor architecture, simulation is common practice, because it is impossible to build several versions of the hardware in order to try all alternatives. The simulation of HW/SW co-designed processors poses a major problem compared with the simulation of traditional HW-only architectures. First of all, open-source tools do not exist. Researchers therefore often assume that the overhead of the software layer, which is in charge of dynamic binary translation and optimization, is constant, or they ignore it. In this thesis we show that such an assumption is not valid and can lead to very inaccurate results. Including the software layer in the simulation is therefore a must. On the other hand, simulation is very slow compared with native execution, so the community has spent considerable effort on delivering accurate results in a reasonable amount of time. It is therefore common practice for HW-only processors to simulate only parts of the application stream, called samples. Samples usually correspond to different phases in the application stream and are usually no longer than a few million instructions. In order to achieve an accurate starting state for each sample, microarchitectural structures are warmed up for a few million instructions prior to the sample's instructions. Unfortunately, such a methodology cannot be directly applied to HW/SW co-designed processors.
The warm-up for HW/SW co-designed processors needs to be 3-4 orders of magnitude longer than the warm-up needed for a traditional HW-only processor, because the warm-up of the software layer needs to be longer than the warm-up of the hardware structures. To overcome this problem, in this thesis we propose a novel warm-up technique specialized for HW/SW co-designed processors. Our solution reduces the simulation time by at least 65X with an average error of just 0.75%. This trend is visible across different software and hardware configurations. The process used to determine simulation samples cannot be applied to HW/SW co-designed processors either because, due to the software layer, samples show more dissimilarities than in the case of HW-only processors. We therefore propose a novel algorithm that needs 3X fewer samples to achieve an error similar to that of state-of-the-art algorithms. Again, this trend is visible across different software and hardware configurations.
Hardware/software co-designed processors (HW/SW co-designed processors) have been proposed by academia and industry as potential solutions for building processors that are less complex and consume less energy. Unlike other alternatives, this type of processor reduces complexity and energy consumption by applying dynamic binary translation and optimization from an external instruction set architecture to an adapted internal instruction set. This thesis tries to address the challenges related to simulating this type of architecture. Simulation is a common process in processor design and development, since it makes it possible to explore various alternatives without having to fabricate hardware for each of them. Simulating HW/SW co-designed processors is more complex than simulating traditional, purely hardware processors. For example, no simulation tools are available to the community. Researchers therefore tend to assume that the software layer, which handles the translation and optimization of applications, carries no particular weight and thus has low, or at best constant, computational costs. In this thesis we show that these premises are incorrect and that results based on them tend to be very inaccurate. A first conclusion of this thesis is therefore that simulating the software layer is absolutely necessary. Moreover, because simulation is slow, simulation techniques have been proposed that try to obtain accurate results in the shortest possible time. A common practice in the design of conventional, purely hardware processors is to simulate only parts of the applications, called samples. These samples correspond to different phases of the applications and are usually a few million instructions long.
To obtain an accurate microarchitectural state for each sample, the simulator's microarchitectural structures are usually exercised before results are collected, a process called warm-up. Unfortunately, this methodology cannot be applied to HW/SW co-designed processors. Warming up the simulator's internal structures for HW/SW co-designed processors takes 3-4 orders of magnitude longer than the same warm-up process in simulations of conventional processors, because in the former the structures and state of the software layer must also be warmed up. In this thesis we propose simulation techniques based on warming up the structures that reduce simulation time by 65X with an average error of 0.75%. These results can be extrapolated to different hardware and software-layer configurations. Finally, conventional techniques for selecting the application samples to simulate are not applicable to the simulation of HW/SW co-designed processors either, because the samples behave very differently when the software layer is taken into account. In this thesis we propose a new algorithm that reduces the number of samples to simulate by 3X compared with traditional algorithms for conventional processors, for a similar error. These results can also be extrapolated to different hardware and software configurations. In conclusion, this thesis answers the challenge of how to simulate HW/SW co-designed processors, which are an alternative to traditional processor design. We have shown that the software layer must be simulated, and we have proposed new, efficient warm-up and sample-selection techniques and algorithms that are robust across different configurations.
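
The role of warm-up in sampled simulation, as described in this abstract, can be sketched with a toy cache model in Python. This is only an illustration of the general idea for a hardware structure; it is not the warm-up technique proposed in the thesis and it does not model a software layer. The function `simulate`, the synthetic trace, and every size below are assumptions made for the example.

```python
import random
from collections import OrderedDict

def simulate(trace, capacity, cache=None):
    """Run an LRU cache over `trace`; return (misses, final cache state)."""
    cache = OrderedDict() if cache is None else cache
    misses = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)       # refresh recency on a hit
        else:
            misses += 1
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used block
    return misses, cache

random.seed(0)
full_trace = [random.randrange(256) for _ in range(20_000)]  # synthetic block IDs
start, length, warmup, capacity = 15_000, 1_000, 2_000, 128
sample = full_trace[start:start + length]

# Reference: simulate everything before the sample, then measure the sample.
_, state = simulate(full_trace[:start], capacity)
reference_misses, _ = simulate(sample, capacity, state)

# Cold start: measure the sample with an empty cache.
cold_misses, _ = simulate(sample, capacity)

# Warm-up: replay only the last `warmup` accesses before the sample.
_, warm_state = simulate(full_trace[start - warmup:start], capacity)
warm_misses, _ = simulate(sample, capacity, warm_state)

print(reference_misses, warm_misses, cold_misses)
```

For an LRU structure, a longer history can only add hits, so the cold-start count is never below the warmed-up count, which is never below the reference. The gap between the cold and reference counts is the bias that warm-up is meant to remove.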

Books on the topic "Performance Optimization in Software and Hardware"

Kastner, Ryan. Arithmetic optimization techniques for hardware and software design. New York: Cambridge University Press, 2010.

Crawford, Isom L., ed. Software optimization for high-performance computing. Upper Saddle River, N.J: Prentice Hall PTR, 2000.

Zhan, Jianfeng, Rui Han, and Roberto V. Zicari, eds. Big Data Benchmarks, Performance Optimization, and Emerging Hardware. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-29006-5.

Zhan, Jianfeng, Rui Han, and Chuliang Weng, eds. Big Data Benchmarks, Performance Optimization, and Emerging Hardware. Cham: Springer International Publishing, 2014. http://dx.doi.org/10.1007/978-3-319-13021-7.

SAP performance optimization guide. 6th ed. Bonn: Galileo Press, 2011.

Pro Android apps performance optimization. New York, NY: Apress, 2012.

Di Pillo, Gianni, and Almerico Murli, eds. High Performance Algorithms and Software for Nonlinear Optimization. Boston, MA: Springer US, 2003. http://dx.doi.org/10.1007/978-1-4613-0241-4.

De Leone, Renato, Almerico Murli, Panos M. Pardalos, and Gerardo Toraldo, eds. High Performance Algorithms and Software in Nonlinear Optimization. Boston, MA: Springer US, 1998. http://dx.doi.org/10.1007/978-1-4613-3279-4.

Yang, Laurence Tianruo. High Performance Scientific and Engineering Computing: Hardware/Software Support. Boston, MA: Springer US, 2004.

Pro iOS apps performance optimization. [New York, N.Y.]: Apress, 2011.

Book chapters on the topic "Performance Optimization in Software and Hardware"

Salehi, Mohammad, Florian Kriebel, Semeen Rehman, and Muhammad Shafique. "Power-Aware Fault-Tolerance for Embedded Systems." In Dependable Embedded Systems, 565–88. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-52017-5_24.

Abstract:

Power-constrained fault-tolerance has emerged as a key challenge in deep sub-micron technology. Multi-/many-core chips can support different hardening modes considering variants of redundant multithreading (RMT). In dark silicon chips, the maximum number of cores that can simultaneously be powered-on (at the full performance level) is constrained by the thermal design power (TDP). The rest of the cores have to be power-gated (i.e., stay “dark”), or the cores have to operate at a lower performance level. It has been predicted that about 25–50% of a many-core chip can potentially be “dark.” In this chapter, a system-level power–reliability management technique is presented. The technique jointly considers multiple hardening modes at the software and hardware levels, each offering distinct power, reliability, and performance properties. Also, a framework for the system-level optimization is introduced which considers different power–reliability–performance management problems for many-core processors depending upon the target system and user constraints.

Malik, Sharad, Wayne Wolf, Andrew Wolfe, Yau-Tsun Steven, and Ti-Yen Yen. "Performance Analysis of Embedded Systems." In Hardware/Software Co-Design, 45–71. Dordrecht: Springer Netherlands, 1996. http://dx.doi.org/10.1007/978-94-009-0187-2_3.

Hofmann, Robin, Leonie Ahrendts, and Rolf Ernst. "CPA: Compositional Performance Analysis." In Handbook of Hardware/Software Codesign, 721–51. Dordrecht: Springer Netherlands, 2017. http://dx.doi.org/10.1007/978-94-017-7267-9_24.

Hofmann, Robin, Leonie Ahrendts, and Rolf Ernst. "CPA – Compositional Performance Analysis." In Handbook of Hardware/Software Codesign, 1–31. Dordrecht: Springer Netherlands, 2016. http://dx.doi.org/10.1007/978-94-017-7358-4_24-2.

Walster, G. William. "Stimulating Hardware and Software Support for Interval Arithmetic." In Applied Optimization, 405–16. Boston, MA: Springer US, 1996. http://dx.doi.org/10.1007/978-1-4613-3440-8_15.

Panerati, Jacopo, Donatella Sciuto, and Giovanni Beltrame. "Optimization Strategies in Design Space Exploration." In Handbook of Hardware/Software Codesign, 189–216. Dordrecht: Springer Netherlands, 2017. http://dx.doi.org/10.1007/978-94-017-7267-9_7.

Panerati, Jacopo, Donatella Sciuto, and Giovanni Beltrame. "Optimization Strategies in Design Space Exploration." In Handbook of Hardware/Software Codesign, 1–29. Dordrecht: Springer Netherlands, 2016. http://dx.doi.org/10.1007/978-94-017-7358-4_7-1.

Liao, Stan, Srinivas Devadas, Kurt Keutzer, Steve Tjiang, Albert Wang, Guido Araujo, Ashok Sudarsanam, Sharad Malik, Vojin Živojnović, and Heinrich Meyr. "Code Generation and Optimization Techniques for Embedded Digital Signal Processors." In Hardware/Software Co-Design, 165–86. Dordrecht: Springer Netherlands, 1996. http://dx.doi.org/10.1007/978-94-009-0187-2_7.

Rehman, Semeen, Muhammad Shafique, and Jörg Henkel. "Cross-Layer Reliability Analysis, Modeling, and Optimization." In Reliable Software for Unreliable Hardware, 51–80. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-25772-3_3.

Schulte, Michael J., and Earl E. Swartzlander. "Software and Hardware Techniques for Accurate, Self-Validating Arithmetic." In Applied Optimization, 381–404. Boston, MA: Springer US, 1996. http://dx.doi.org/10.1007/978-1-4613-3440-8_14.

Conference papers on the topic "Performance Optimization in Software and Hardware"

Murari, Rafael, João Paulo Carvalho, Guido Araujo, and Alexandro Baldassin. "Performance Optimization of Persistent Memory Systems Through Phase-Based Transactional Memory." In Escola Regional de Alto Desempenho de São Paulo. Sociedade Brasileira de Computação, 2019. http://dx.doi.org/10.5753/eradsp.2019.13590.

Abstract:

Emerging persistent memory (PM) technologies aim to eliminate the gap between main memory and storage. Nevertheless, their adoption requires measures to guarantee consistency, since crash failures might leave the program in an unrecoverable state. In this context, durable transactions are one of the main approaches investigated to ease the adoption of PM. However, today's implementations are based exclusively on either software (SW) or hardware (HW), which might degrade system performance. This paper presents NV-PhTM, a transactional system for PM that delivers the best of both HW and SW transactions by dynamically changing the execution mode according to the application's characteristics.

Lotz, R. D. "Aerodynamic Optimization Process for Turbocharger Compressor Impellers." In ASME Turbo Expo 2017: Turbomachinery Technical Conference and Exposition. American Society of Mechanical Engineers, 2017. http://dx.doi.org/10.1115/gt2017-64365.

Abstract:

This paper presents the progression at BorgWarner Turbo Systems of the aerodynamic optimization process for radial turbocharger compressor impellers used in commercial vehicle applications. The design process was refined over several years, starting from relatively simple, single-objective optimizations and moving to increasingly higher complexity with multiple operating points and objectives. CFD and numerical optimization techniques are used extensively with the aim of reducing costly gas stand testing with prototype hardware. Commercial software packages are used throughout for geometry definition, flow field evaluation, and optimization scheduling, the latter providing genetic and gradient-based algorithms. Design outcomes of the various developments were prototyped and tested at the BorgWarner Technical Center in Arden, NC. CFD predictions are compared with test data and discrepancies quantified. The resulting impeller designs show steady improvements with each design and methodology iteration, to the point that significant improvements in performance over conventional designs can be achieved consistently.

Xia, Yanjun, George Maddox, Sam Lowry, and Hui Ding. "Design and Optimization of a Vertical Turbine Pump." In ASME/JSME/KSME 2015 Joint Fluids Engineering Conference. American Society of Mechanical Engineers, 2015. http://dx.doi.org/10.1115/ajkfluids2015-33233.

Abstract:

An example of the complete process used by PumpWorks 610, LLC in designing and optimizing a vertical turbine pump is presented, starting with the creation of an initial prototype using the design software CFturbo®, and the subsequent virtual testing and analysis with the CFD tool PumpLinx®. The results of the CFD analysis are used to identify the sources of loss of efficiency, with specific examples of how those losses are identified. The geometry of the pump is then modified to improve performance and re-tested in the virtual environment before any hardware is manufactured. Once built, the predicted performance of the optimized pump is verified by physical testing. Comparisons between the CFD predictions and the empirical data are presented. Once demonstrated to perform as intended, the final design is delivered to the customer.

Cheng, Pengxin, Cheng Ren, Yongyong Wu, and Rui Li. "Design and Optimization of Temperature Acquisition System for Determination of Effective Thermal Conductivity of Pebble Bed." In 2017 25th International Conference on Nuclear Engineering. American Society of Mechanical Engineers, 2017. http://dx.doi.org/10.1115/icone25-66148.

Abstract:

A full-scale heat transfer test facility has been designed and built for the determination of effective thermal conductivity of pebble bed, which is a macroscopic parameter to characterize the heat transfer capacity of the core in the High Temperature Gas-Cooled Reactor. The data acquisition system is developed to collect, display and record the temperature data in monitoring points. Two alternative software systems are designed to obtain better performance. To enhance precision of the measurement system, several aspects are analyzed and optimized in the implementation of LabVIEW. The error of the hardware system is analyzed, which is within the acceptable range. The data acquisition system can meet the practical demands of temperature acquisition in the range of thermal analysis.

Zhao, G. "Micro-pilot-induced Ignition Diesel/ Natural Gas Engine Control System Development and Engine Performance /Emission Optimization." In International Ship Control Systems Symposium. IMarEST, 2018. http://dx.doi.org/10.24868/issn.2631-8741.2018.010.

Abstract:

Diesel/natural gas dual-fuel engines are attracting more and more attention due to their potential to reduce NOX and soot emissions simultaneously. Micro-pilot-induced diesel ignition of natural gas is a popular way to further improve the emission-reduction capability of dual-fuel engines. A six-cylinder, four-stroke, common-rail diesel engine is converted into a dual-fuel engine. Natural gas is injected into the intake manifold after the throttle. Five gas injection valves are used to control the natural gas flow rate. Based on the established fuel supply system, a dual-fuel control system is developed using the MS9S12XEP100 MCU. A voltage boosting circuit, fuel injector driving circuit, gas injection valve driving circuit, and MeUn driving circuit are integrated on the MCU hardware platform. Two ECUs are connected to each other by a CAN bus and several I/O ports to fulfil the fuel injection functional requirements. The software framework covers gas injection timing synchronization, fuel mode management, and multiple injections. A MAP-based model of fresh air mass flow rate and intake charge efficiency is integrated in the MCU to calculate the fresh air quality in the cylinder. The last part is performance optimization research at low load. The ignition diesel injection is divided into two stages, and the first injection timing, first injection ratio, and injection pressure are used as controllable parameters to reduce NOX and HC emissions. Experimental results reveal that by dividing the ignition injection into two stages and advancing the first injection to 60°CA BTDC, CH4 emission can be reduced by 77% while NOX remains unchanged. Increasing the first injection ratio and injection pressure can also reduce THC emission. If the injection pressure is higher than 75 MPa, the HC reduction effect is less pronounced. Experimental results show that the developed control system can fulfil the functional requirements of dual-fuel engine management.
Emission test results demonstrate that IMO Tier II can be satisfied in diesel mode, and DF-mode emission performance can meet the requirements of IMO Tier III. Furthermore, as the first domestically produced dedicated dual-fuel control system to have passed CCS certification in China, it allows the engine emission level to meet China’s current and upcoming emission standards for non-road engines while maintaining engine power and fuel economy.

Burk, Reinhard, Frederic Jacquelin, and Russell Wakeman. "Using Co-Simulation Methods to Establish Variable Valve Actuation Hardware Specifications and Control Strategies." In ASME 2001 Internal Combustion Engine Division Fall Technical Conference. American Society of Mechanical Engineers, 2001. http://dx.doi.org/10.1115/2001-ice-427.

Abstract:

With the increasing recognition that variable valve actuation (VVA) in its various forms is a powerful tool for optimizing the performance of internal combustion engines, more and more production systems are being designed and implemented throughout the industry. However, as these control systems become more capable of altering lift, timing, duration, and even the number of valve events, the complexity of designing algorithms and calibrating them becomes enormous. In addition, without prior knowledge of an engine’s response to these algorithms, designing a cost-effective mechanism which provides adequate but not over-reaching capability is difficult. Ricardo has developed a methodology for timestep-coupled simulations which enables the use of one-dimensional (1-D) gas dynamics simulation of engine performance (WAVE™) coupled to a simulation of the valve actuation mechanism constructed in MATLAB® and AMESim®. This arrangement allows valve motion input to the 1-D code to be controlled either manually or by a VVA controller simulation, allowing such engine parameters as torque, fuel consumption, NVH, and EGR rates to be monitored as a function of valve timing strategy. This method allows such engine development concerns as tolerances, valve velocities and accelerations, and interactions with other engine controls to be studied without the costs, lead times, or hardware reliability problems that are associated with prototyping a VVA system. In addition, the interfacing of the valve control/engine performance simulation combination with the Design of Experiments optimization software iSIGHT allows the control system space to be explored automatically, without the brute force numerical search required to examine all permutations of the control strategies. The output of this procedure is an array of requirements which can be quickly translated into a specification document which will guide hardware and controls design efforts.

Edwards, M. D. "Hardware/software partitioning for performance enhancement." In IEE Colloquium on Partitioning in Hardware-Software Codesigns. IEE, 1995. http://dx.doi.org/10.1049/ic:19950168.

Dassatti, Alberto, and Roberto Rigamonti. "Heterogeneous Hardware from Homogeneous Software." In 2017 International Conference on High Performance Computing & Simulation (HPCS). IEEE, 2017. http://dx.doi.org/10.1109/hpcs.2017.153.

Suzuki, Kei, and Alberto Sangiovanni-Vincentelli. "Efficient software performance estimation methods for hardware/software codesign." In the 33rd annual conference. New York, New York, USA: ACM Press, 1996. http://dx.doi.org/10.1145/240518.240633.

Zucker, R. N., and J. L. Baer. "Software versus hardware coherence: performance versus cost." In Proceedings of the Twenty-Seventh Annual Hawaii International Conference on System Sciences. IEEE Comput. Soc. Press, 1994. http://dx.doi.org/10.1109/hicss.1994.323175.

Reports on the topic "Performance Optimization in Software and Hardware"

Henry, Wendell A. High Performance Hardware and Software for Pattern Recognition and Image Processing. Fort Belvoir, VA: Defense Technical Information Center, December 1994. http://dx.doi.org/10.21236/ada289153.

Henry, Wendell A. High Performance Hardware and Software for Pattern Recognition and Image Processing. Fort Belvoir, VA: Defense Technical Information Center, June 1995. http://dx.doi.org/10.21236/ada295580.

Henry, Wendell A. High Performance Hardware and Software for Pattern Recognition and Image Processing. Fort Belvoir, VA: Defense Technical Information Center, September 1996. http://dx.doi.org/10.21236/ada315017.

Henry, Wendell A. High Performance Hardware and Software for Pattern Recognition and Image Processing. Fort Belvoir, VA: Defense Technical Information Center, June 1996. http://dx.doi.org/10.21236/ada310034.

Henry, Wendell A. High Performance Hardware and Software for Pattern Recognition and Image Processing. Fort Belvoir, VA: Defense Technical Information Center, March 1996. http://dx.doi.org/10.21236/ada305420.

Henry, Wendell A. High Performance Hardware and Software for Pattern Recognition and Image Processing. Fort Belvoir, VA: Defense Technical Information Center, February 1994. http://dx.doi.org/10.21236/ada276405.

Feller, D. F. The MSRC Ab Initio Methods Benchmark Suite: A measurement of hardware and software performance in the area of electronic structure methods. Office of Scientific and Technical Information (OSTI), July 1993. http://dx.doi.org/10.2172/10121145.

Allen, Luke, Joon Lim, Robert Haehnel, and Ian Detwiller. Rotor blade design framework for airfoil shape optimization with performance considerations. Engineer Research and Development Center (U.S.), June 2021. http://dx.doi.org/10.21079/11681/41037.

Abstract:

A framework for optimizing rotor blade airfoil shape is presented. The framework uses two digital workflows created within the Galaxy Simulation Builder (GSB) software package. The first workflow automates the creation of a surrogate model for predicting airfoil performance coefficients. An accurate surrogate model for the rapid generation of airfoil coefficient tables, based on the C81Gen and ARC2D CFD codes, has been developed using linear interpolation techniques. The second workflow defines the rotor blade optimization problem using GSB and the Dakota numerical optimization library. The presented example uses a quasi-Newton optimization algorithm to optimize the tip region of the UH-60A main rotor blade with respect to vehicle performance. This is accomplished by morphing the blade tip airfoil shape for optimum power, subject to a constraint on the maximum pitch link load.
