Dissertations / Theses on the topic 'Parallel and dynamic reconfigurable computing'
Consult the top 50 dissertations / theses for your research on the topic 'Parallel and dynamic reconfigurable computing.'
Viswanathan, Venkatasubramanian. "Une architecture évolutive flexible et reconfigurable dynamiquement pour les systèmes embarqués haute performance." Thesis, Valenciennes, 2015. http://www.theses.fr/2015VALE0029.
In this thesis, we propose a scalable and customizable reconfigurable computing platform, with a parallel full-duplex switched communication network, and a software execution model to redefine the computation, communication and reconfiguration paradigms in High Performance Embedded Systems. High Performance Embedded Computing (HPEC) applications are becoming highly sophisticated and resource-consuming for three reasons. First, they should capture and process real-time data from several I/O sources in parallel. Second, they should adapt their functionalities according to application or environment variations within given Size, Weight and Power (SWaP) constraints. Third, since they process several parallel I/O sources, applications are often distributed over multiple computing nodes, making them highly parallel. Owing to the hardware parallelism and I/O bandwidth offered by Field Programmable Gate Arrays (FPGAs), an application can be duplicated several times to process parallel I/Os, making Single Program Multiple Data (SPMD) the favorite execution model for designers implementing parallel architectures on FPGAs. Furthermore, the Dynamic Partial Reconfiguration (DPR) feature allows efficient reuse of limited hardware resources, making FPGAs a highly attractive solution for such applications. The problem with current HPEC systems is that they are usually built to meet the needs of a specific application, i.e., they lack the flexibility to upgrade the system or reuse existing hardware resources. On the other hand, applications that run on such hardware architectures are constantly being upgraded. Thus there is a real need for flexible and scalable hardware architectures and parallel execution models in order to easily upgrade the system and reuse hardware resources within acceptable time bounds.
Thus these applications face challenges such as obsolescence, hardware redesign cost, sequential and slow reconfiguration, and wastage of computing power. Addressing the challenges described above, we propose an architecture that allows the customization of computing nodes (FPGAs), the broadcast of data (I/O, bitstreams) and the reconfiguration of several, or a subset of, computing nodes in parallel. The software environment leverages the potential of the hardware switch to provide support for the SPMD execution model. Finally, in order to demonstrate the benefits of our architecture, we have implemented a scalable distributed secure H.264 encoding application along with several avionic communication protocols for data and control transfers between the nodes. We have used an FMC-based high-speed serial Front Panel Data Port (sFPDP) data acquisition protocol to capture, encode and encrypt RAW video streams. The system has been implemented on 3 different FPGAs, respecting the SPMD execution model. In addition, we have also implemented modular I/Os by swapping I/O protocols dynamically when required by the system. We have thus demonstrated a scalable and flexible architecture and a parallel runtime reconfiguration model in order to manage several parallel input video sources. These results represent a conceptual proof of massively parallel, dynamically reconfigurable next-generation embedded computers.
SURENDIRANATH, SUDHA. "ACCELERATING DNA SEQUENTIAL ANALYSIS EXPLOITING PARALLEL HARDWARE AND RECONFIGURABLE COMPUTING." University of Cincinnati / OhioLINK, 2005. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1131856327.
Jacob, Aju. "Distributed configuration management for reconfigurable cluster computing." [Gainesville, Fla.] : University of Florida, 2004. http://purl.fcla.edu/fcla/etd/UFE0007181.
Huang, Jian. "RECONFIGURABLE COMPUTING FOR VIDEO CODING." Doctoral diss., University of Central Florida, 2010. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/4301.
Ph.D.
School of Electrical Engineering and Computer Science
Engineering and Computer Science
Electrical Engineering PhD
Varvarigos, Emmanouel A. "Static and dynamic communication in parallel computing." Thesis, Massachusetts Institute of Technology, 1992. http://hdl.handle.net/1721.1/12868.
Includes bibliographical references (p. 186-191).
by Emmanouel A. Varvarigos.
Ph.D.
Phan, Cong-Vinh. "Formal aspects of dynamic reconfigurability in reconfigurable computing systems." Thesis, London South Bank University, 2006. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.435200.
PANDEY, ANKUR. "A MULTITHREADED RUNTIME SUPPORT ENVIRONMENT FOR DYNAMIC RECONFIGURABLE COMPUTING." University of Cincinnati / OhioLINK, 2002. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1026133065.
Surendiranath, Sudha. "Accelerating DNA sequential analysis through exploiting parallel hardware and reconfigurable computing." Cincinnati, Ohio : University of Cincinnati, 2005. http://www.ohiolink.edu/etd/view.cgi?acc%5Fnum=ucin1131856327.
Thorndike, David Andrew. "A Multicore Computing Platform for Benchmarking Dynamic Partial Reconfiguration Based Designs." Case Western Reserve University School of Graduate Studies / OhioLINK, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=case1338933284.
Craven, Stephen Douglas. "Structured Approach to Dynamic Computing Application Development." Diss., Virginia Tech, 2008. http://hdl.handle.net/10919/27730.
Ph. D.
LEE, TAI-CHUN. "AN EVENT-BASED APPROACH TO DEMAND-DRIVEN DYNAMIC RECONFIGURABLE COMPUTING." University of Cincinnati / OhioLINK, 2001. http://rave.ohiolink.edu/etdc/view?acc_num=ucin990821256.
Murphy, Ciaron William. "Run time reconfigurable DSP parallel processing system using dynamic FPGAs." Thesis, Liverpool John Moores University, 2002. http://researchonline.ljmu.ac.uk/4924/.
Lanore, Vincent. "On Scalable Reconfigurable Component Models for High-Performance Computing." Thesis, Lyon, École normale supérieure, 2015. http://www.theses.fr/2015ENSL1051/document.
Component-based programming is a programming paradigm which eases code reuse and separation of concerns. Some component models, which are said to be "reconfigurable", allow the modification at runtime of an application's structure. However, these models are not suited to High-Performance Computing (HPC) as they rely on non-scalable mechanisms. The goal of this thesis is to provide models, algorithms and tools to ease the development of component-based reconfigurable HPC applications. The main contribution of the thesis is the DirectMOD component model, which eases development and reuse of distributed transformations. In order to improve on this core model in other directions, we have also proposed the SpecMOD formal component model, which allows automatic specialization of hierarchical component assemblies and provides high-level software engineering features, and mechanisms for efficient fine-grain reconfiguration for AMR applications, an important application class in HPC. An implementation of DirectMOD, called DirectL2C, has been developed so as to implement a series of benchmarks to evaluate our approach. Experiments on HPC architectures show that our approach scales. Moreover, a quantitative analysis of the benchmark codes shows that our approach is compact and eases reuse.
Miller, Simon. "Parallel computing and the molecular dynamic simulation of ionic materials." Thesis, Keele University, 1994. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.260050.
Kerr, Andrew. "A model of dynamic compilation for heterogeneous compute platforms." Diss., Georgia Institute of Technology, 2012. http://hdl.handle.net/1853/47719.
Lloyd, G. Scott. "Accelerated Large-Scale Multiple Sequence Alignment with Reconfigurable Computing." BYU ScholarsArchive, 2011. https://scholarsarchive.byu.edu/etd/2729.
Krishnan, Manoj Kumar. "ProLAS a novel dynamic load balancing library for advanced scientific computing /." Master's thesis, Mississippi State : Mississippi State University, 2003. http://library.msstate.edu/etd/show.asp?etd=etd-11102003-184622.
Bhardwaj, Prabhaav. "Framework for Hardware Agility on FPGAs." Thesis, Virginia Tech, 2010. http://hdl.handle.net/10919/36347.
Master of Science
Hernandez, Jesus Israel. "Reactive scheduling of DAG applications on heterogeneous and dynamic distributed computing systems." Thesis, University of Edinburgh, 2008. http://hdl.handle.net/1842/2336.
Li, Shen Carmen C. Duren Russell Walker. "Evaluating Impulse C and multiple parallelism partitions for a low-cost reconfigurable computing system." Waco, Tex. : Baylor University, 2008. http://hdl.handle.net/2104/5280.
Zhang, Fanjiong. "Design and Verification of SOPC FDP2009 and Research of Reconfigurable Applications." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-66724.
Banihashemi, Seyed Parsa. "Parallel explicit FEM algorithms using GPU's." Diss., Georgia Institute of Technology, 2015. http://hdl.handle.net/1853/54391.
Gil, Otero Rafael. "Characterisation of a reconfigurable free space optical interconnect system for parallel computing applications and experimental validation using rapid prototyping technology." Thesis, Heriot-Watt University, 2008. http://hdl.handle.net/10399/2141.
Iturbe, Xabier. "Design and implementation of a reliable reconfigurable real-time operating system (R3TOS)." Thesis, University of Edinburgh, 2013. http://hdl.handle.net/1842/9413.
Lima, João Vicente Ferreira. "Controle de granularidade com threads em programas MPI dinâmicos." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2009. http://hdl.handle.net/10183/16132.
In recent years, the demand for high performance has driven the emergence of more efficient computing platforms and algorithms. The growth of distributed computing platforms raises new challenges for parallel algorithm development, such as communication, heterogeneity, and resource management. These factors can result in applications whose workload is unknown until runtime. Irregular behavior of the algorithm or the data can also affect the workload. A parallel application can address these issues through a programming technique that predicts the workload of a task and offers resources on demand. Granularity, which is the ratio of computation to communication, takes more practical issues into account and is an important factor in the performance of dynamic algorithms. However, this control is difficult to design, and the support of a programming tool is needed. Yet existing programming tools have extensive and complicated interfaces, which hinders their use in HPC. This work implements a library (libSpawn) which adds granularity control to dynamic MPI programs. The library controls the granularity by mapping tasks to processes or threads according to three parameters: the cores of the architecture, the load, and the resources of the operating system. Results comparing process-based execution with libSpawn show significant gains over other programming tools on synthetic benchmarks.
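The abstract defines granularity as the ratio of computation to communication. A minimal Python sketch of that ratio follows. The function names, the `threshold` parameter, and the rule that maps fine-grained tasks to threads and coarse-grained tasks to processes are illustrative assumptions; they are not taken from libSpawn, whose actual decision uses the core count, the load, and operating system resources.

```python
def granularity(compute_time: float, comm_time: float) -> float:
    """Ratio of computation time to communication time for one task."""
    if comm_time <= 0:
        # No measurable communication: treat the task as infinitely coarse.
        return float("inf")
    return compute_time / comm_time


def choose_mapping(compute_time: float, comm_time: float,
                   threshold: float = 1.0) -> str:
    """Hypothetical rule (an assumption, not libSpawn's policy):
    coarse-grained tasks run as processes, fine-grained tasks as threads."""
    if granularity(compute_time, comm_time) >= threshold:
        return "process"
    return "thread"
```

Under these assumptions, a task that computes for 8 time units and communicates for 2 has granularity 4 and would be mapped to a process, while a task that computes for 1 unit and communicates for 4 would be mapped to a thread.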
Chinnusamy, Malarvizhi. "Data and Processor Mapping Strategies for Dynamically Resizable Parallel Applications." Thesis, Virginia Tech, 2004. http://hdl.handle.net/10919/33868.
Due to the unpredictability in job arrival times in clusters and widely varying resource requirements, dynamic scheduling of parallel computing resources is necessary to increase system throughput. Dynamically resizable applications provide the flexibility needed for dynamic scheduling. These applications can expand to take advantage of additional free processors, or to meet a Quality of Service (QoS) deadline, or can shrink to accommodate a high priority application, without getting suspended.
This thesis is part of a larger effort to define a framework for dynamically resizable parallel applications. This framework includes a scheduler that supports resizing applications, an API to enable applications to interact with the scheduler, and libraries that make resizing viable. This thesis focuses on libraries for efficient resizing of parallel applications (efficient in terms of minimizing the cost of data redistribution, choosing and allocating the right set of additional processors, and focusing on the performance of the application after resizing). We explore the tradeoffs between these goals on both homogeneous and heterogeneous clusters. We focus on structured applications that have 2D data arrays distributed across a 2D processor grid.
Our library includes algorithms for processor selection and processor mapping. For homogeneous clusters, processor selection involves selecting the number of processors to add, and processor mapping decides the placement of the new processors within the given topology so as to minimize the amount of data to be redistributed. For heterogeneous clusters, since the processing power of the processors varies, there is the additional problem of choosing the right set of processors to add. We also present results that demonstrate the effectiveness of our approach.
Master of Science
Siverskog, Jacob. "Evaluation of partial reconfiguration for FPGA debugging." Thesis, Linköping University, Computer Engineering, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-57714.
Reconfigurable computing is an old concept that has become increasingly popular during the past couple of decades. The concept combines the flexibility of software with the performance of hardware. One important contributing factor to the rise in popularity is the presence of FPGAs (field-programmable gate arrays), which realize the concept by allowing the hardware to be reconfigured dynamically. The current state of reconfigurable computing is discussed further in the thesis.
Debugging is a vital part in the development of a hardware design. It can be done in several ways depending on the situation. The most common way is to perform simulations but in some cases the fault-finding has to be done when the design is implemented in hardware.
In this thesis a framework concept is designed that utilizes and evaluates some of the reconfigurable computing ideas. The framework provides debugging possibilities for FPGA designs in a novel way, with a modular system where each module provides means to aid in finding a specific fault. The framework is added to an existing design, and offers the user a glimpse into the design behavior and the hardware it runs on.
One of the debug modules will be released separately under a free license. It allows the developer to see the contents of the memories in a design without requiring special debugging equipment.
Helal, Manal. "Indexing and partitioning schemes for distributed tensor computing with application to multiple sequence alignment." University of New South Wales, Computer Science & Engineering, 2009. http://handle.unsw.edu.au/1959.4/44781.
Tesser, Rafael Keller. "A simulation workflow to evaluate the performance of dynamic load balancing with over decomposition for iterative parallel applications." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2018. http://hdl.handle.net/10183/180129.
In this thesis we present a novel simulation workflow to evaluate the performance of dynamic load balancing with over-decomposition applied to iterative parallel applications. Its goals are to perform such evaluation with minimal application modification and at a low cost in terms of time and resource requirements. Many parallel applications suffer from dynamic (temporal) load imbalance that cannot be treated at the application level. It may be caused by intrinsic characteristics of the application or by external software and hardware factors. As demonstrated in this thesis, such dynamic imbalance can be found even in applications whose codes do not hint at any dynamism. Therefore, we need to rely on runtime dynamic load balancing mechanisms, such as dynamic load balancing based on over-decomposition. The problem is that evaluating and tuning the performance of such a technique can be costly. This usually entails modifications to the application and a large number of executions to get statistically sound performance measurements with different load balancing parameter combinations. Moreover, useful and accurate measurements often require big resource allocations on a production cluster. Our simulation workflow, dubbed Simulated Adaptive MPI (SAMPI), employs a combined sequential emulation and trace-replay simulation approach to reduce the cost of such an evaluation. Both sequential emulation and trace-replay require a single computer node. Additionally, the trace-replay simulation lasts a small fraction of the real-life parallel execution time of the application. Besides the basic SAMPI simulation, we developed spatial aggregation and application-level rescaling techniques to speed up the emulation process. To demonstrate the real-life performance benefits of dynamic load balancing with over-decomposition, we evaluated the performance gains obtained by employing this technique on an iterative parallel geophysics application called Ondes3D.
Dynamic load balancing support was provided by Adaptive MPI (AMPI). This resulted in up to 36.58% performance improvement on 288 cores of a cluster. This real-life evaluation also illustrates the difficulties found in this process, thus justifying the use of simulation. To implement the SAMPI workflow, we relied on SimGrid’s Simulated MPI (SMPI) interface in both emulation and trace-replay modes. To validate our simulator, we compared simulated (SAMPI) and real-life (AMPI) executions of Ondes3D. The simulations presented a load balance evolution very similar to real life and were also successful in choosing the best load balancing heuristic for each scenario. Besides the validation, we demonstrate the use of SAMPI for load balancing parameter exploration and for computational capacity planning. As for the performance of the simulation itself, we roughly estimate that our full workflow can simulate the execution of Ondes3D with 24 different load balancing parameter combinations in 5 hours for our heavier earthquake scenario and in 3 hours for the lighter one.
Balasubramaniam, Mahadevan. "Performance analysis and evaluation of dynamic loop scheduling techniques in a competitive runtime environment for distributed memory architectures." Master's thesis, Mississippi State : Mississippi State University, 2003. http://library.msstate.edu/etd/show.asp?etd=etd-04022003-154254.
Templin, Joshua R. "Design of an Adaptable Run-Time Reconfigurable Software-Defined Radio Processing Architecture." DigitalCommons@USU, 2010. https://digitalcommons.usu.edu/etd/810.
Afonso, Fernando Abrahão. "MPI2.NET : criação dinâmica de tarefas com orientação a objetos." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2010. http://hdl.handle.net/10183/26952.
Message Passing Interface (MPI) is the de facto standard for the development of high performance applications executing on clusters. The standard defines APIs for the programming languages Fortran, C and C++. On the other hand, object-oriented programming has become the dominant programming paradigm, and programming languages such as Java and C# are becoming very popular. This can be explained by the abstractions these programming languages offer, which allow a more efficient programming/maintenance cycle. Because of this, several MPI libraries have emerged for these programming languages. Among them, we can highlight the MPI.NET library for the C# programming language, which offers the best trade-off between abstraction and performance. In parallel computing, the model used for the development of applications is very important, and the Divide and Conquer model is efficiently scalable, applicable to several problems and allows efficient execution of applications whose workload is unknown or irregular. To program using this model, the execution environment must provide dynamism, which the MPI.NET library does not provide. From this scenario emerges the main goal of this work, which is to explore dynamic task creation in the MPI.NET library. In the end we were able to obtain a library with competitive performance against MPI C++ libraries.
De, Grande Robson E. "Dynamic Load Balancing Schemes for Large-scale HLA-based Simulations." Thèse, Université d'Ottawa / University of Ottawa, 2012. http://hdl.handle.net/10393/23110.
Edirisinghe, Pathirannehelage Neranjan S. "Charge Transfer in Deoxyribonucleic Acid (DNA): Static Disorder, Dynamic Fluctuations and Complex Kinetic." Digital Archive @ GSU, 2011. http://digitalarchive.gsu.edu/phy_astr_diss/45.
Bahcecioglu, Tunc. "Parallel Solution Of Soil-structure Interaction Problems On Pc Clusters." Master's thesis, METU, 2011. http://etd.lib.metu.edu.tr/upload/12612954/index.pdf.
Subbiah, Arun. "Design and evaluation of a distributed diagnosis algorithm for arbitrary network topologies in dynamic fault environments." Thesis, Georgia Institute of Technology, 2001. http://hdl.handle.net/1853/13273.
Bokhari, Saniyah S. "Parallel Solution of the Subset-sum Problem: An Empirical Study." The Ohio State University, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=osu1305898281.
Green, Oded. "High performance computing for irregular algorithms and applications with an emphasis on big data analytics." Diss., Georgia Institute of Technology, 2014. http://hdl.handle.net/1853/51860.
Bauer, Heiner. "Dynamic instruction set extension of microprocessors with embedded FPGAs." Master's thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2017. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-222858.
Increasingly complex applications and the peculiarities of modern semiconductor technologies have created strong demand for microprocessors that are both powerful and highly energy-efficient. Conventional architectures try to raise instruction throughput through parallelization and provide application-specific instruction sets or hardware accelerators to improve energy efficiency. Reconfigurable processors enable similar performance gains and have the considerable advantage that specialization for a particular application can take place after manufacturing. This diploma thesis investigated a reconfigurable microprocessor with a tightly coupled FPGA. In contrast to earlier research approaches, an extensive design space exploration of the FPGA architecture was carried out in the context of a commercial 22nm manufacturing process. Until now, most research projects either used commercial architectures that are not necessarily tailored to this use case, or the proposed FPGA components were insufficiently investigated and characterized. Yet it is precisely this building block that determines the performance of the entire system. For this reason, more than 200 different logical FPGA architectures were examined in this work. Concrete circuit topologies and a layout area estimation model tailored to the manufacturing process were used for modeling. In general, the same trends were observed as in earlier, similarly extensive studies. Here too, the results were largely determined by the size of the LUTs (lookup tables) and the structure of the routing network. At the same time, a much broader range of architectures with nearly equal efficiency was identified.
For further evaluation, an FPGA architecture with 5-LUTs and 8 logic elements was selected. The performance of the selected microprocessor, which builds on a proven instruction set architecture, was estimated using results from a 28nm test chip. A modified collection of academic software tools was used to map custom instructions onto the modeled FPGA architecture and to generate a netlist for subsequent simulation and verification. For a range of different application benchmarks, a relative performance increase of between 3 and 15 over the original processor was determined. Although the proposed FPGA architecture is comparatively primitive and has no arithmetic extensions, no disproportionate increase in chip area had to be accepted, with one exception. The insights gained into the dependencies between the architecture parameters, the exploration flow developed, and the concrete cost model are essential for further improvements of the FPGA architecture. This work has thus successfully demonstrated the advantage of the investigated system architecture and paved the way for possible extensions and hardware implementations. In addition, a number of architecture optimizations and further potential research directions were identified.
Hardyniec, Andrew B. "An Investigation of the Behavior of Structural Systems with Modeling Uncertainties." Diss., Virginia Tech, 2014. http://hdl.handle.net/10919/56635.
Ph. D.
Sreeram, Rohan. "Improved Framework for Fast and Efficient Memory-based Frame Data Reconfiguration for Multi-row Spanning Designs on Field Programmable Gate Arrays." DigitalCommons@USU, 2010. https://digitalcommons.usu.edu/etd/682.
Quadri, Imran Rafiq. "MARTE based model driven design methodology for targeting dynamically reconfigurable FPGA based SoCs." Phd thesis, Université des Sciences et Technologie de Lille - Lille I, 2010. http://tel.archives-ouvertes.fr/tel-00486483.
Sinha, Udayan Prabir. "Memory Management Error Detection in Parallel Software using a Simulated Hardware Platform." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-219606.
Memory management errors in parallel software executing on multicore architectures can be difficult to detect and costly to fix. Examples of such errors include use of uninitialized memory, memory leaks, and data being overwritten by a process that does not own the data. If memory management errors can be detected at an early stage, for example by using a simulator that runs before the software has been delivered and integrated into a product, significant cost savings could be achieved. This thesis investigates and develops methods for detecting uninitialized memory in software running on a virtual platform. The virtual platform contains models of parts of the digital hardware, for baseband and radio, found in an Ericsson radio base station. The models are bit-exact representations of the corresponding hardware blocks and include processors and peripheral units. Ericsson uses the virtual platform for software development and integration. Tools exist, for example Memcheck (Valgrind) and MemorySanitizer and AddressSanitizer (Clang), that can be used to detect memory management errors. The properties of such tools have been studied, and algorithms for detecting memory management errors have been developed for a specific processor and its instructions. The algorithms have been implemented in a virtual platform, and requirements and design considerations reflecting the application-specific instruction set of the chosen processor have been addressed. A prototype implementation for presenting memory management errors, which points out the source code lines and the call stack for the locations where errors were found, has been developed using a debugger.
An experiment using a purpose-built program was used to evaluate the error detection capability of the algorithms implemented in the virtual platform and to compare it with that of Memcheck. For the program used, the algorithms implemented in the virtual platform detect all known errors except one. The algorithms also report false positives. These reports are mainly a result of the current implementation having limited knowledge of the operating system used on the simulated processor.
Sanguinet, William Charles. "Various extensions in the theory of dynamic materials with a specific focus on the checkerboard geometry." Digital WPI, 2017. https://digitalcommons.wpi.edu/etd-dissertations/243.
Silva, Hamilton Soares da. "Estudo para otimização do algoritmo Non-local means visando aplicações em tempo real." Universidade Federal da Paraíba, 2014. http://tede.biblioteca.ufpb.br:8080/handle/tede/5383.
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
The aim of this work is to study the Non-Local Means (NLM) algorithm and to propose techniques to optimize and implement it for real-time applications. Two implementation alternatives are suggested. The first is an accelerator card for computers with a PCI bus, containing specialized hardware that implements the NLM filter. The second uses the densely multiprocessed GPU environment found in video controllers. Both proposals significantly accelerate the NLM algorithm while maintaining the same visual quality as traditional software implementations, making real-time use possible. Noise filtering is an important area of digital image processing, and it is increasingly used because improvements in new acquisition equipment, and the consequent increase in image resolution, favor the occurrence of such perturbations. It is widely studied in the fields of image processing, computer vision and predictive maintenance of electrical substations, motors, tires, building facilities, pipes and fittings, with a focus on reducing noise without removing details of the original image. Several approaches have been proposed for noise filtering. One of them is the non-local method called Non-Local Means (NLM), which uses the entire image rather than only local information and stands out as the state of the art. However, this method has a high computational complexity, which makes it practically infeasible for real-time applications, even for small images.
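To illustrate why the abstract calls Non-Local Means computationally expensive, here is a minimal, unoptimized Python sketch of the textbook filter on a small grayscale image stored as a list of lists. It is not the thesis's hardware or GPU implementation. The function name, the parameter names (`patch_radius`, `search_radius`, `h`), the border clamping, and the use of a mean squared patch difference are assumptions made for this example.

```python
import math


def nlm_denoise(image, patch_radius=1, search_radius=None, h=0.5):
    """Textbook Non-Local Means on a 2D list of floats (grayscale).

    search_radius=None compares each pixel with every pixel of the image;
    an integer restricts the comparison to a window (a common shortcut).
    """
    rows, cols = len(image), len(image[0])

    def pixel(r, c):
        # Clamp coordinates so patches near the border reuse edge pixels.
        r = min(max(r, 0), rows - 1)
        c = min(max(c, 0), cols - 1)
        return image[r][c]

    def patch_distance(r1, c1, r2, c2):
        # Mean squared difference between the patches around two pixels.
        total, count = 0.0, 0
        for dr in range(-patch_radius, patch_radius + 1):
            for dc in range(-patch_radius, patch_radius + 1):
                diff = pixel(r1 + dr, c1 + dc) - pixel(r2 + dr, c2 + dc)
                total += diff * diff
                count += 1
        return total / count

    result = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if search_radius is None:
                r_lo, r_hi, c_lo, c_hi = 0, rows, 0, cols
            else:
                r_lo, r_hi = max(0, r - search_radius), min(rows, r + search_radius + 1)
                c_lo, c_hi = max(0, c - search_radius), min(cols, c + search_radius + 1)
            weight_sum, value_sum = 0.0, 0.0
            for rr in range(r_lo, r_hi):
                for cc in range(c_lo, c_hi):
                    # Similar patches get weights near 1, dissimilar ones near 0.
                    w = math.exp(-patch_distance(r, c, rr, cc) / (h * h))
                    weight_sum += w
                    value_sum += w * image[rr][cc]
            # The pixel's own patch has weight 1, so weight_sum is never zero.
            result[r][c] = value_sum / weight_sum
    return result
```

With the whole image as the search region, every pixel is compared with every other pixel, and each comparison visits a full patch. The cost therefore grows with the square of the pixel count, which is the kind of load that motivates hardware or GPU acceleration.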
Fanfakh, Ahmed Badri Muslim. "Energy consumption optimization of parallel applications with Iterations using CPU frequency scaling." Thesis, Besançon, 2016. http://www.theses.fr/2016BESA2021/document.
In recent years, green computing has become an important topic in the supercomputing research domain. However, computing platforms are still consuming more and more energy due to the increase in the number of nodes composing them. Many techniques have been used to minimize the operating costs of these platforms, and dynamic voltage and frequency scaling (DVFS) is one of them. It can be used to reduce the power consumption of the CPU while computing, by lowering its frequency. However, lowering the frequency of a CPU may increase the execution time of the application running on that processor. Therefore, the frequency that gives the best trade-off between the energy consumption and the performance of an application must be selected. This thesis presents the algorithms developed to optimize the energy consumption and the performance of synchronous and asynchronous message passing applications with iterations running over clusters or grids. The energy consumption and performance models for each type of parallel application predict its execution time and energy consumption for any selected frequency according to the characteristics of both the application and the architecture executing this application. The contribution of this thesis can be divided into three parts. Firstly, optimizing the trade-off between the energy consumption and the performance of message passing applications with synchronous iterations running over homogeneous clusters. Secondly, adapting the energy and performance models to heterogeneous platforms where each node can have different specifications such as computing power, energy consumption, available frequency gears or network latency and bandwidth. The frequency scaling algorithm was also modified to suit the heterogeneity of the platform. Thirdly, the models and the frequency scaling algorithm were completely rethought to take into account the asynchronism in the communication and computation. All these models and algorithms were applied to message passing applications with iterations and evaluated on either the SimGrid simulator or the Grid’5000 platform. The experiments showed that the proposed algorithms are efficient and outperform existing methods such as the energy and delay product. They also introduce a small runtime overhead and work online without any training or profiling.
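The trade-off described in this abstract can be illustrated with a small sketch. It uses a common textbook model, not the thesis's own: computation time is assumed to scale inversely with frequency, communication time is assumed to be unaffected, dynamic power is assumed to scale with the cube of the frequency, and static power is assumed constant. The function names, the parameters and the "lowest energy within a slowdown bound" selection rule are all assumptions made for the example.

```python
def predict(freq, f_max, t_comp_max, t_comm, p_dyn_max, p_static):
    """Predict (time, energy) of one run at frequency `freq`.

    Assumed model (hypothetical, for illustration only):
      t_comp_max -- computation time at f_max, stretched by f_max / freq
      t_comm     -- communication time, unaffected by CPU frequency
      p_dyn_max  -- dynamic power at f_max, scaled by (freq / f_max) ** 3
      p_static   -- static power, drawn for the whole run
    """
    t_comp = t_comp_max * (f_max / freq)
    time = t_comp + t_comm
    p_dyn = p_dyn_max * (freq / f_max) ** 3
    energy = p_dyn * t_comp + p_static * time
    return time, energy


def select_frequency(freqs, t_comp_max, t_comm, p_dyn_max, p_static,
                     max_slowdown=1.2):
    """Pick the frequency with the lowest predicted energy among those whose
    predicted time stays within `max_slowdown` times the time at the highest
    frequency. This selection rule is an assumption, not the thesis's method."""
    f_max = max(freqs)
    t_ref, _ = predict(f_max, f_max, t_comp_max, t_comm, p_dyn_max, p_static)
    best_freq, best_energy = f_max, None
    for f in freqs:
        time, energy = predict(f, f_max, t_comp_max, t_comm, p_dyn_max, p_static)
        if time > max_slowdown * t_ref:
            continue  # too slow, violates the performance bound
        if best_energy is None or energy < best_energy:
            best_freq, best_energy = f, energy
    return best_freq


# Example: four available frequency gears in GHz (made-up numbers).
gears = [1.0, 1.5, 2.0, 2.5]
chosen = select_frequency(gears, t_comp_max=10.0, t_comm=5.0,
                          p_dyn_max=20.0, p_static=4.0, max_slowdown=1.2)
```

Under this model, lowering the frequency cuts dynamic energy but lengthens the run, so static energy grows. Tightening or relaxing `max_slowdown` shifts the chosen gear, which is the kind of trade-off the abstract describes.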
Astolfi, Vitor Fiorotto. "ChipCflow - em hardware dinamicamente reconfigurável." Universidade de São Paulo, 2009. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-05032010-203142/.
In recent years, reconfigurable computing has become increasingly advanced, especially in hardware that uses Field-Programmable Gate Arrays (FPGAs). However, the growth in FPGA performance has widened the gap between the capacity available for designs and the technology for developing them. Imperative high-level programming languages such as C are more appropriate for the development of complex algorithms than hardware description languages (HDLs). For this reason, many ANSI C-like programming tools for hardware development have emerged. The ChipCflow project, of which this project is part, is one of these tools. The execution of algorithms through this tool is driven entirely by data flow, according to the dynamic model found in dataflow architectures, taking advantage of its naturally high levels of parallelism and of the characteristics of partially reconfigurable hardware. The objective of this project is a proof of concept for creating instances, in the form of operators, of a ChipCflow algorithm on partially reconfigurable hardware, taking the Xilinx Virtex boards as reference.
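To make the idea of execution driven by data flow more concrete, the sketch below simulates a tagged-token firing rule in software, assuming that is what "dynamic model" means here: an operator fires as soon as tokens carrying the same tag have arrived on all of its inputs. It is not ChipCflow code. The `Operator` class, the `evaluate` function and the example graph `(a + b) * c` are hypothetical names chosen for illustration.

```python
import operator
from collections import defaultdict


class Operator:
    """A dataflow operator that fires once all inputs for a tag are present."""

    def __init__(self, func, arity):
        self.func = func
        self.arity = arity
        # tag -> {input port: value}; tokens wait here until matched.
        self.waiting = defaultdict(dict)

    def receive(self, tag, port, value):
        """Store a token; return the result if the operator fires, else None."""
        slots = self.waiting[tag]
        slots[port] = value
        if len(slots) < self.arity:
            return None
        del self.waiting[tag]
        return self.func(*(slots[p] for p in range(self.arity)))


def evaluate(tokens):
    """Evaluate (a + b) * c for every tag, whatever order the tokens arrive in.

    `tokens` is a sequence of (tag, input_name, value) triples.
    """
    add = Operator(operator.add, 2)
    mul = Operator(operator.mul, 2)
    results = {}

    def feed_mul(tag, port, value):
        product = mul.receive(tag, port, value)
        if product is not None:
            results[tag] = product

    for tag, name, value in tokens:
        if name == "c":
            feed_mul(tag, 1, value)
        else:
            total = add.receive(tag, 0 if name == "a" else 1, value)
            if total is not None:
                # The adder fired, so its output token flows to the multiplier.
                feed_mul(tag, 0, total)
    return results


# Tokens for two independent instances (tags 0 and 1), deliberately interleaved.
stream = [(0, "a", 1), (1, "c", 10), (1, "b", 3),
          (0, "c", 2), (0, "b", 4), (1, "a", 2)]
results = evaluate(stream)
```

No program counter orders the operations. Each result appears as soon as its operands have met, so the two tagged instances proceed independently, which is the source of the parallelism the abstract mentions.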
Sun, Yi. "High Performance Simulation of DEVS Based Large Scale Cellular Space Models." Digital Archive @ GSU, 2009. http://digitalarchive.gsu.edu/cs_diss/40.
FERNANDEZ, BARRERO DIEGO. "Dynamic Soil-Structure Interaction of Soil-Steel Composite Bridges : A Frequency Domain Approach Using PML Elements and Model Updating." Thesis, KTH, Bro- och stålbyggnad, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-256033.
Rehme, Koy D. "An Internal Representation for Adaptive Online Parallelization." Diss., 2009. http://contentdm.lib.byu.edu/ETD/image/etd2939.pdf.