To see the other types of publications on this topic, follow the link: High performance Computation.

Dissertations / Theses on the topic 'High performance Computation'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the top 50 dissertations / theses for your research on the topic 'High performance Computation.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.

1

Reis, Ruy Freitas. "Simulações numéricas 3D em ambiente paralelo de hipertermia com nanopartículas magnéticas." Universidade Federal de Juiz de Fora (UFJF), 2014. https://repositorio.ufjf.br/jspui/handle/ufjf/3499.

Full text
Abstract:
CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
This work deals with the numerical modeling of solid tumor treatment with hyperthermia using magnetic nanoparticles, considering the 3D bioheat transfer model proposed by Pennes (1948). Two different treatments of blood perfusion were compared: the first assumes a constant value, the second a temperature-dependent function. The living tissue was modeled with skin, fat and muscle layers, in addition to the tumor. The model solution was approximated with the finite difference method (FDM) in a heterogeneous medium. Due to the different blood perfusion parameters, a system of linear equations (constant perfusion) and a system of nonlinear equations (temperature-dependent perfusion) were obtained. To discretize the time domain, two explicit numerical strategies were used: the first was the classical Euler method, and the second a predictor-corrector algorithm derived from the generalized trapezoidal alpha-family of time integration methods. Since the computational time required to solve a three-dimensional model is large, two different parallel strategies were applied to the numerical method. The first uses the OpenMP parallel programming API, and the second the CUDA platform. The experimental results showed that the OpenMP parallelization improves performance up to 39 times over the sequential execution time, and the CUDA version was also efficient, yielding gains of up to 242 times over the sequential execution time. Thus, the simulation runs about twice as fast as the biological phenomenon itself.
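The abstract above rests on the Pennes bioheat equation discretized with finite differences and advanced with an explicit Euler scheme. The sketch below is a minimal one-dimensional illustration of that update with constant coefficients; all parameter values, the grid and the heat source are hypothetical, and the thesis itself solves the full 3D heterogeneous problem with both constant and temperature-dependent perfusion.

```python
import numpy as np

# Pennes bioheat equation (1D, constant coefficients):
#   rho_c * dT/dt = k * d2T/dx2 + w_b * rho_b_c_b * (T_a - T) + Q
rho_c = 3.6e6       # tissue volumetric heat capacity [J/(m^3 K)] (assumed)
k = 0.5             # thermal conductivity [W/(m K)] (assumed)
w_b = 5e-4          # blood perfusion rate [1/s] (assumed)
rho_b_c_b = 3.8e6   # blood volumetric heat capacity [J/(m^3 K)] (assumed)
T_a = 37.0          # arterial blood temperature [C]
Q = 5e4             # metabolic + nanoparticle heat source [W/m^3] (assumed)

dx, dt, nx, steps = 1e-3, 0.01, 101, 60_000   # ten simulated minutes
T = np.full(nx, 37.0)                         # initial tissue temperature [C]

for _ in range(steps):
    lap = (np.roll(T, 1) - 2.0 * T + np.roll(T, -1)) / dx**2
    T = T + dt / rho_c * (k * lap + w_b * rho_b_c_b * (T_a - T) + Q)
    T[0] = T[-1] = 37.0                       # Dirichlet boundaries at body temperature

print(f"peak temperature after {steps * dt:.0f} s: {T.max():.2f} C")
```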
APA, Harvard, Vancouver, ISO, and other styles
2

Campos, Joventino de Oliveira. "Método de lattice Boltzmann para simulação da eletrofisiologia cardíaca em paralelo usando GPU." Universidade Federal de Juiz de Fora (UFJF), 2015. https://repositorio.ufjf.br/jspui/handle/ufjf/3555.

Full text
Abstract:
CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
This work presents the lattice Boltzmann method (LBM) for computational simulations of cardiac electrical activity using the monodomain model. An optimized implementation of the lattice Boltzmann method is presented which uses a collision model with multiple relaxation parameters, known as multiple relaxation time (MRT), in order to account for the anisotropy of cardiac tissue. With a focus on fast simulations of cardiac dynamics, and owing to the high level of parallelism present in the LBM, a GPU parallelization was performed and its performance was studied on regular and irregular three-dimensional domains. The results of the optimized LBM GPU implementation for cardiac simulations showed acceleration factors as high as 500x for the overall simulation, and for the LBM a performance of 419 mega lattice updates per second (MLUPS) was achieved. With near-real-time simulations on a single computer equipped with a modern GPU, these results show that the proposed framework is a promising approach for application in a clinical workflow.
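For readers unfamiliar with the MLUPS figure quoted above, it is simply the number of lattice-node updates performed per second. The sketch below shows the computation of that metric; the domain size, step count and timing are hypothetical, chosen only so the example lands near the 419 MLUPS reported.

```python
def mlups(lattice_nodes: int, timesteps: int, elapsed_seconds: float) -> float:
    """Mega Lattice Updates Per Second: every active node is updated
    (collide + stream) once per time step, so throughput is total node
    updates divided by wall-clock time."""
    return lattice_nodes * timesteps / (elapsed_seconds * 1e6)

# Hypothetical run: a 256^3 regular domain advanced 1000 steps in 40 s
print(f"{mlups(256**3, 1000, 40.0):.0f} MLUPS")   # ~419
```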
APA, Harvard, Vancouver, ISO, and other styles
3

Isa, Mohammad Nazrin. "High performance reconfigurable architectures for biological sequence alignment." Thesis, University of Edinburgh, 2013. http://hdl.handle.net/1842/7721.

Full text
Abstract:
Bioinformatics and computational biology (BCB) is a rapidly developing multidisciplinary field which encompasses a wide range of domains, including genomic sequence alignment. Sequence alignment is a fundamental tool in molecular biology for searching for homology between sequences. Sequence alignments are currently receiving close attention due to their great impact on quality of life, for example by facilitating early disease diagnosis, identifying the characteristics of newly discovered sequences, and supporting drug engineering. With the vast growth of genomic data, searching for sequence homology over huge databases (often measured in gigabytes) cannot produce results within a realistic time, hence the need for acceleration. Since the exponential increase of biological databases as a result of the Human Genome Project (HGP), supercomputers and other parallel architectures such as special-purpose Very Large Scale Integration (VLSI) chips, Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs) have become popular acceleration platforms. Nevertheless, there are always trade-offs between area, speed, power, cost, development time and reusability when selecting an acceleration platform. FPGAs generally offer more flexibility, higher performance and lower overheads. However, they suffer from a relatively low-level programming model compared with off-the-shelf processors such as standard microprocessors and GPUs. Due to these limitations, the need has arisen for optimized FPGA core implementations, which are crucial for this technology to become viable in high performance computing (HPC). This research proposes the use of state-of-the-art reprogrammable system-on-chip technology on FPGAs to accelerate three widely used sequence alignment algorithms: the Smith-Waterman algorithm with affine gap penalty, the profile hidden Markov model (HMM) algorithm and the Basic Local Alignment Search Tool (BLAST) algorithm. The three novel aspects of this research are, firstly, that the algorithms are designed and implemented in hardware, with each core achieving the highest performance compared to the state of the art. Secondly, an efficient scheduling strategy based on the double buffering technique is adopted in the hardware architectures. Here, when the alignment matrix computation task is overlapped with the PE configuration in a folded systolic array, the overall throughput of the core is significantly increased. This is due to the bounded PE configuration time and the parallel PE configuration approach, irrespective of the number of PEs in a systolic array. In addition, the use of only two configuration elements in the PE optimizes hardware resources and enables the scalability of PE systolic arrays without relying on restricted onboard memory resources. Finally, a new performance metric is devised which facilitates the effective comparison of design performance between different FPGA devices and families. The normalized performance indicator (speed-up per area per process technology) factors out the area and lithography advantages of any FPGA, resulting in fairer comparisons. The cores have been designed using Verilog HDL and prototyped on the Alpha Data ADM-XRC-5LX card with the Virtex-5 XC5VLX110-3FF1153 FPGA.
The implementation results show that the proposed architectures achieved giga cell updates per second (GCUPS) performances of 26.8, 29.5 and 24.2, respectively, for the acceleration of the Smith-Waterman algorithm with affine gap penalty, the profile HMM algorithm and the BLAST algorithm. In terms of speed-up, the designed cores were compared against their corresponding software and against reported FPGA implementations. In the case of comparison with equivalent software execution, acceleration of the optimal alignment algorithm in hardware yielded an average speed-up of 269x compared to the SSEARCH 35 software. For profile HMM-based sequence alignment, the designed core achieved speed-ups of 103x and 8.3x against HMMER 2.0 and the latest version of HMMER (version 3.0), respectively. The implementation of gapped BLAST with the two-hit method in hardware achieved a greater than tenfold speed-up compared to the latest NCBI BLAST software. For comparison against other reported FPGA implementations, the proposed normalized performance indicator was used to evaluate the designed architectures fairly. The results showed that the first architecture achieved more than 50 percent improvement, while acceleration of profile HMM sequence alignment in hardware gained a normalized speed-up of 1.34. In the case of gapped BLAST with the two-hit method, the designed core achieved an 11x speed-up after factoring out the advantages of the Virtex-5 FPGA. In addition, further analysis was conducted in terms of cost and power; the core achieved 0.46 MCUPS per dollar spent and 958.1 MCUPS per watt. This shows that FPGAs can be an attractive platform for high performance computation, with the advantage of a smaller area footprint, and can represent an economical 'green' solution compared to other acceleration platforms. Higher throughput can be achieved by redeploying the cores on newer, bigger and faster FPGAs with minimal design effort.
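The throughput figures above use the cell-updates-per-second metric that is standard for alignment accelerators. The normalized indicator is described only as speed-up per area per process technology, so the normalization shown below is an assumed illustrative form, not the thesis's exact formula; all example numbers are hypothetical.

```python
def gcups(query_len: int, db_len: int, elapsed_seconds: float) -> float:
    """Giga Cell Updates Per Second: alignment-matrix cells filled per second."""
    return query_len * db_len / (elapsed_seconds * 1e9)

def normalized_speedup(speedup: float, area_slices: int, process_nm: int) -> float:
    """Illustrative normalization only: divide the raw speed-up by occupied area
    and by process technology so that bigger or newer FPGAs do not get an
    automatic advantage in cross-device comparisons."""
    return speedup / (area_slices * process_nm)

# Hypothetical run: a 1,000-residue query against a 5-gigacell database in 187 s
print(f"{gcups(1_000, 5_000_000_000, 187.0):.1f} GCUPS")   # ~26.7
```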
APA, Harvard, Vancouver, ISO, and other styles
4

Nasar-Ullah, Q. A. "High performance parallel financial derivatives computation." Thesis, University College London (University of London), 2014. http://discovery.ucl.ac.uk/1431080/.

Full text
Abstract:
Computing the price and risk of financial derivatives is a necessary activity for many financial market participants and is often undertaken by large and costly computing farms. This thesis seeks to explore the use of parallel computing, with a particular focus on graphics processing units (GPUs), to improve the speed-per-cost ratio of such computation. This thesis addresses three distinct layers of high performance parallel financial derivatives computation: the first layer is related to the formulation of parallel algorithms that are generally used in the context of derivatives. The second layer is related to the optimal computation of pricing models, which consist of a series of computational steps or algorithms, where such pricing models are used to calculate the price and risk of individual derivatives. The third and final layer is related to deploying several pricing models within large-scale infrastructures, with a particular focus on optimal scheduling approaches. Several contributions are made within this thesis: (i) With regard to the formulation of parallel algorithms, we introduce novel approaches for evaluating the normal cumulative distribution function (CDF), calculating option implied volatility, calibrating SABR (stochastic-αβρ) volatility models and generating CDF lookup tables. (ii) With regard to pricing models, we explore the computation of two dominant fixed income pricing models, namely non-callable bullet options and callable bond options. (iii) With regard to the computation of many such pricing models within large-scale infrastructures, we devise and verify novel scheduling approaches that are able to optimally allocate tasks between a heterogeneous mix of CPU and GPU processors.
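Two of the kernels named above, normal CDF evaluation and implied volatility, have simple sequential baselines; the sketch below shows only those baselines (a bisection root-find against Black-Scholes with an erf-based normal CDF), not the novel GPU formulations the thesis develops. The option quote in the example is hypothetical.

```python
from math import erf, exp, log, sqrt

def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(s, k, t, r, sigma):
    """Black-Scholes price of a European call option."""
    d1 = (log(s / k) + (r + 0.5 * sigma * sigma) * t) / (sigma * sqrt(t))
    d2 = d1 - sigma * sqrt(t)
    return s * norm_cdf(d1) - k * exp(-r * t) * norm_cdf(d2)

def implied_vol(price, s, k, t, r, lo=1e-6, hi=5.0, tol=1e-8):
    """Bisection on sigma; the call price is monotone increasing in volatility."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bs_call(s, k, t, r, mid) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical quote: spot 100, strike 105, one year to expiry, 2% rate, price 8.0
print(f"implied volatility ~ {implied_vol(8.0, 100.0, 105.0, 1.0, 0.02):.4f}")
```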
APA, Harvard, Vancouver, ISO, and other styles
5

Ahrens, James P. "Scientific experiment management with high-performance distributed computation /." Thesis, Connect to this title online; UW restricted, 1996. http://hdl.handle.net/1773/6974.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Pandya, Ajay Kirit. "Performance of multithreaded computations on high-speed networks." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1998. http://www.collectionscanada.ca/obj/s4/f2/dsk2/ftp01/MQ32212.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Pilkey, Deborah F. "Computation of a Damping Matrix for Finite Element Model Updating." Diss., Virginia Tech, 1998. http://hdl.handle.net/10919/30453.

Full text
Abstract:
The characterization of damping is important in making accurate predictions of both the true response and the frequency response of any device or structure dominated by energy dissipation. Modeling damping matrices and experimentally verifying those models is challenging because damping cannot be determined via static tests as mass and stiffness can. Furthermore, damping is more difficult to determine from dynamic measurements than natural frequency. However, damping is extremely important in formulating predictive models of structures. In addition, damping matrix identification may be useful in diagnostics or health monitoring of structures. The objective of this work is to find a robust, practical procedure to identify damping matrices. All aspects of the damping identification procedure are investigated. The procedures for damping identification presented herein are based on prior knowledge of the finite element or analytical mass matrices and measured eigendata. Alternatively, a procedure is based on knowledge of the mass and stiffness matrices and the eigendata. With this in mind, an exploration into model reduction and updating is needed to make the problem more complete for practical applications. Additionally, high performance computing is used as a tool to deal with large problems; High Performance Fortran is exploited for this purpose. Finally, several examples, including one experimental example, are used to illustrate the use of these new damping matrix identification algorithms and to explore their robustness.
Ph. D.
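To make the identification idea above concrete, the sketch below assembles a damping matrix from a known mass matrix and eigendata using the textbook proportional-damping identity C = M Phi diag(2 zeta_i omega_i) Phi.T M for mass-normalized mode shapes. This is one simple identity under a proportional-damping assumption, not the more general procedures developed in the dissertation, and the 2-DOF system, frequencies and damping ratios are hypothetical.

```python
import numpy as np

def modal_damping_matrix(M, Phi, omegas, zetas):
    """C = M @ Phi @ diag(2*zeta_i*omega_i) @ Phi.T @ M, valid when the mode
    shapes Phi are mass-normalized (Phi.T @ M @ Phi = I) and damping is
    proportional; omegas and zetas would come from measured eigendata."""
    D = np.diag(2.0 * np.asarray(zetas) * np.asarray(omegas))
    return M @ Phi @ D @ Phi.T @ M

# Hypothetical 2-DOF system: diagonal mass matrix, simple stiffness matrix
M = np.diag([2.0, 1.0])
K = np.array([[600.0, -200.0],
              [-200.0,  200.0]])
Mih = np.diag(1.0 / np.sqrt(np.diag(M)))      # M^(-1/2) for a diagonal M
w2, V = np.linalg.eigh(Mih @ K @ Mih)         # symmetric form of the eigenproblem
Phi = Mih @ V                                 # mass-normalized mode shapes
C = modal_damping_matrix(M, Phi, np.sqrt(w2), zetas=[0.02, 0.05])
print(np.round(C, 3))
```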
APA, Harvard, Vancouver, ISO, and other styles
8

Steen, Adrianus Jan van der. "Benchmarking of high performance computers for scientific and technical computation." [S.l.] : Utrecht : [s.n.] ; Universiteitsbibliotheek Utrecht [Host], 1997. http://www.ubu.ruu.nl/cgi-bin/grsn2url?01761909.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Zhao, Yu. "High performance Monte Carlo computation for finance risk data analysis." Thesis, Brunel University, 2013. http://bura.brunel.ac.uk/handle/2438/8206.

Full text
Abstract:
Finance risk management has been playing an increasingly important role in the finance sector, to analyse finance data and to prevent any potential crisis. It has been widely recognised that Value at Risk (VaR) is an effective method for finance risk management and evaluation. This thesis conducts a comprehensive review of a number of VaR methods and discusses in depth their strengths and limitations. Among these VaR methods, Monte Carlo simulation and analysis has proven to be the most accurate VaR method in finance risk evaluation due to its strong modelling capabilities. However, one major challenge in Monte Carlo analysis is its high computing complexity of O(n²). To speed up the computation in Monte Carlo analysis, this thesis parallelises Monte Carlo using the MapReduce model, which has become a major software programming model in support of data intensive applications. MapReduce consists of two functions: Map and Reduce. The Map function segments a large data set into small data chunks and distributes these data chunks among a number of computers for processing in parallel, with a Mapper processing a data chunk on a computing node. The Reduce function collects the results generated by these Map nodes (Mappers) and generates an output. The parallel Monte Carlo is evaluated initially in a small-scale MapReduce experimental environment, and subsequently evaluated in a large-scale simulation environment. Both experimental and simulation results show that the MapReduce-based parallel Monte Carlo is considerably faster than the sequential Monte Carlo in computation, while the accuracy level is maintained. In data intensive applications, moving huge volumes of data among the computing nodes can incur high communication overhead. To address this issue, this thesis further considers data locality in the MapReduce-based parallel Monte Carlo, and evaluates the impact of data locality on computational performance.
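A minimal sketch of the Map/Reduce split described above, using Python's process pool in place of a real MapReduce/Hadoop cluster: each map task simulates an independent chunk of scenarios, and the reduce step merges them and reads VaR off a loss quantile. The one-factor normal P&L model, chunk sizes and confidence level are hypothetical stand-ins for the thesis's actual workload.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def map_chunk(args):
    """Map step: simulate one independent chunk of portfolio P&L scenarios."""
    n_scenarios, mu, sigma, seed = args
    rng = np.random.default_rng(seed)
    return rng.normal(mu, sigma, n_scenarios)      # hypothetical 1-day P&L model

def reduce_var(chunks, confidence=0.99):
    """Reduce step: merge chunk results and read VaR from the loss quantile."""
    losses = -np.concatenate(chunks)
    return np.quantile(losses, confidence)

if __name__ == "__main__":
    tasks = [(250_000, 0.0, 0.01, seed) for seed in range(4)]   # 1M scenarios total
    with ProcessPoolExecutor() as pool:
        chunks = list(pool.map(map_chunk, tasks))
    print(f"99% one-day VaR: {reduce_var(chunks):.4%} of portfolio value")
```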
APA, Harvard, Vancouver, ISO, and other styles
10

Vetter, Jeffrey Scott. "Techniques and optimizations for high performance computational steering." Diss., Georgia Institute of Technology, 1998. http://hdl.handle.net/1853/9242.

Full text
APA, Harvard, Vancouver, ISO, and other styles
11

Chow, Yi-Mei Maria 1974. "Computational fluid dynamics for high performance structural facilities." Thesis, Massachusetts Institute of Technology, 1998. http://hdl.handle.net/1721.1/50366.

Full text
Abstract:
Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Civil and Environmental Engineering, 1998.
Includes bibliographical references (leaves 104-106).
by Yi-Mei Maria Chow.
M.Eng.
APA, Harvard, Vancouver, ISO, and other styles
12

Skjerven, Brian M. "A parallel implementation of an agent-based brain tumor model." Link to electronic thesis, 2007. http://www.wpi.edu/Pubs/ETD/Available/etd-060507-172337/.

Full text
Abstract:
Thesis (M.S.) -- Worcester Polytechnic Institute.
Keywords: Visualization; Numerical analysis; Computational biology; Scientific computation; High-performance computing. Includes bibliographical references (p.19).
APA, Harvard, Vancouver, ISO, and other styles
13

Henning, Peter Allen. "Computational Parameter Selection and Simulation of Complex Sphingolipid Pathway Metabolism." Thesis, Georgia Institute of Technology, 2006. http://hdl.handle.net/1853/16202.

Full text
Abstract:
Systems biology is an emerging field of study that seeks to provide systems-level understanding of biological systems through the integration of high-throughput biological data into predictive computational models. The integrative nature of this field is in sharp contrast to the reductionist methods that have been employed since the advent of molecular biology. Systems biology investigates not only the individual components of the biological system, such as metabolic pathways, organelles, and signaling cascades, but also considers the relationships and interactions between the components, in the hope that an understandable model of the entire system can eventually be developed. This field of study is being hailed by experts as a potentially vital technology in revolutionizing the pharmaceutical development process in the post-genomic era. This work provides not only a systems biology investigation into the principles governing de novo sphingolipid metabolism but also an account of the various computational obstacles that arise in converting high-throughput data into an insightful model.
APA, Harvard, Vancouver, ISO, and other styles
14

Arora, Nitin. "High performance algorithms to improve the runtime computation of spacecraft trajectories." Diss., Georgia Institute of Technology, 2013. http://hdl.handle.net/1853/49076.

Full text
Abstract:
Challenging science requirements and complex space missions are driving the need for fast and robust space trajectory design and simulation tools. The main aim of this thesis is to develop new and improved high performance algorithms and solution techniques for commonly encountered problems in astrodynamics. Five major problems are considered and their state-of-the-art algorithms are systematically improved. Theoretical and methodological improvements are combined with modern computational techniques, resulting in increased algorithm robustness and faster runtime performance. The five selected problems are: 1) the multiple-revolution Lambert problem, 2) high-fidelity geopotential (gravity field) computation, 3) ephemeris computation, 4) fast and accurate sensitivity computation, and 5) high-fidelity multiple-spacecraft simulation. The work presented has applications in a variety of fields, such as preliminary mission design, high-fidelity trajectory simulation, orbit estimation and numerical optimization. Other fields, from space and environmental science to chemical and electrical engineering, also stand to benefit.
APA, Harvard, Vancouver, ISO, and other styles
15

Cyrus, Sam. "Fast Computation on Processing Data Warehousing Queries on GPU Devices." Scholar Commons, 2016. http://scholarcommons.usf.edu/etd/6214.

Full text
Abstract:
Current database management systems use Graphics Processing Units (GPUs) as dedicated accelerators to process each individual query, which results in underutilization of the GPU. When a single-query data warehousing workload was run on an open source GPU query engine, the utilization of the main GPU resources was found to be less than 25%. This low utilization in turn leads to low system throughput. To resolve this problem, this paper suggests transferring all of the desired data into the global memory of the GPU and keeping it there until all queries are executed as one batch. The PCIe transfer time from CPU to GPU is minimized, which results in better performance and lower overall query processing time. The execution time was improved by up to 40% when running multiple queries, compared to dedicated processing.
APA, Harvard, Vancouver, ISO, and other styles
16

Axner, Lilit. "High performance computational hemodynamics with the Lattice Boltzmann method." [S.l. : Amsterdam : s.n.] ; Universiteit van Amsterdam [Host], 2007. http://dare.uva.nl/document/54726.

Full text
APA, Harvard, Vancouver, ISO, and other styles
17

Kulkarni, Amol S. "Application of computational intelligence to high performance electric drives /." Thesis, Connect to this title online; UW restricted, 1999. http://hdl.handle.net/1773/5897.

Full text
APA, Harvard, Vancouver, ISO, and other styles
18

Kuhlman, Christopher J. "High Performance Computational Social Science Modeling of Networked Populations." Diss., Virginia Tech, 2013. http://hdl.handle.net/10919/51175.

Full text
Abstract:
Dynamics of social processes in populations, such as the spread of emotions, influence, opinions, and mass movements (often referred to individually and collectively as contagions), are increasingly studied because of their economic, social, and political impacts. Moreover, multiple contagions may interact and hence studying their simultaneous evolution is important. Within the context of social media, large datasets involving many tens of millions of people are leading to new insights into human behavior, and these datasets continue to grow in size. Through social media, contagions can readily cross national boundaries, as evidenced by the 2011 Arab Spring. These and other observations guide our work. Our goal is to study contagion processes at scale with an approach that permits intricate descriptions of interactions among members of a population. Our contributions are a modeling environment to perform these computations and a set of approaches to predict contagion spread size and to block the spread of contagions. Since we represent populations as networks, we also provide insights into network structure effects, and present and analyze a new model of contagion dynamics that represents a person's behavior in repeatedly joining and withdrawing from collective action. We study variants of problems for different classes of social contagions, including those known as simple and complex contagions.
Ph. D.
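For readers new to the simple/complex contagion distinction mentioned above, the sketch below runs a generic deterministic threshold model on a toy network: with threshold 1 it behaves like a simple contagion, with higher thresholds like a complex one. The network, seeds and threshold are hypothetical, and this is a textbook model rather than the join/withdraw dynamics introduced in the dissertation.

```python
def spread(adjacency, seeds, threshold, max_steps=50):
    """Deterministic threshold contagion: a node activates once at least
    `threshold` of its neighbours are active; threshold=1 is a simple contagion."""
    active = set(seeds)
    for _ in range(max_steps):
        newly = {v for v, nbrs in adjacency.items()
                 if v not in active and sum(n in active for n in nbrs) >= threshold}
        if not newly:
            break
        active |= newly
    return active

# Hypothetical toy network: a 10-node ring with chords to nodes three hops away
adj = {i: {(i - 1) % 10, (i + 1) % 10, (i + 3) % 10, (i - 3) % 10} for i in range(10)}
print(sorted(spread(adj, seeds={0, 1, 2}, threshold=2)))   # the whole ring activates
```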
APA, Harvard, Vancouver, ISO, and other styles
19

Pugaonkar, Aniket Narayan. "A High Performance C++ Generic Benchmark for Computational Epidemiology." Thesis, Virginia Tech, 2015. http://hdl.handle.net/10919/51243.

Full text
Abstract:
An effective tool used by planners and policy makers in public health, such as the Centers for Disease Control (CDC), to curtail the spread of infectious diseases over a given population is contagion diffusion simulation. These simulations model the relevant characteristics of the population (age, gender, income, etc.) and the disease (attack rate, etc.) and compute the spread under various configurations and plausible intervention strategies (such as vaccinations, school closures, etc.). Hence, the model and the computation form a complex agent-based system and are highly compute- and resource-intensive. In this work, we design a benchmark consisting of several kernels which capture the essential compute, communication, and data access patterns of such applications. For each kernel, the benchmark provides different evaluation strategies. The goal is to (a) derive alternative implementations for computing the contagion by combining different implementations of the kernels, and (b) evaluate which combination of implementation, runtime, and hardware is most effective in running large-scale contagion diffusion simulations. Our proposed benchmark is designed using C++ generic programming primitives and by lifting sequential strategies to parallel computations. Together, these lead to a succinct description of the benchmark and significant code reuse when deriving strategies for new hardware. For the benchmark to be effective, this aspect is crucial, because the potential combinations of hardware and runtime are growing rapidly, thereby making it infeasible to write an optimized strategy for the complete contagion diffusion from the ground up for each compute system.
Master of Science
APA, Harvard, Vancouver, ISO, and other styles
20

McFarlane, Ross. "High-performance computing for computational biology of the heart." Thesis, University of Liverpool, 2010. http://livrepository.liverpool.ac.uk/3173/.

Full text
Abstract:
This thesis describes the development of Beatbox, a simulation environment for computational biology of the heart. Beatbox aims to provide an adaptable, approachable simulation tool and an extensible framework with which High Performance Computing may be harnessed by researchers. Beatbox is built upon the QUI software package, which is studied in Chapter 2. The chapter discusses QUI's functionality and common patterns of use, and describes its underlying software architecture, in particular its extensibility through the addition of new software modules called 'devices'. The chapter summarises good practice for device developers in the Laws of Devices. Chapter 3 discusses the parallel architecture of Beatbox and its implementation for distributed memory clusters. The chapter discusses strategies for domain decomposition and halo swapping, and introduces an efficient method for exchanging data with diagonal neighbours called Magic Corners. The development of Beatbox's parallel Input/Output facilities is detailed, and its impact on scaling performance discussed. The chapter discusses the way in which parallelism can be hidden from the user, even while permitting the runtime execution of user-defined functions. The chapter goes on to show how QUI's extensibility can be continued in a parallel environment by providing implicit parallelism for devices and defining Laws of Parallel Devices to guide third-party developers. Beatbox's parallel performance is evaluated and discussed. Chapter 4 describes the extension of Beatbox to simulate anatomically realistic tissue geometry. The representation of irregular geometries is described, along with associated user controls. A technique to compute no-flux boundary conditions on irregular boundaries is introduced. The Laws of Devices are further developed to include irregular geometries. Finally, parallel performance on anatomically realistic meshes is evaluated.
APA, Harvard, Vancouver, ISO, and other styles
21

Ragan-Kelley, Jonathan Millard. "Decoupling algorithms from the organization of computation for high performance image processing." Thesis, Massachusetts Institute of Technology, 2014. http://hdl.handle.net/1721.1/89996.

Full text
Abstract:
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2014.
Cataloged from PDF version of thesis. "June 2014."
Includes bibliographical references (pages 127-133).
Future graphics and imaging applications, from self-driving cars, to 4D light field cameras, to pervasive sensing, demand orders of magnitude more computation than we currently have. This thesis argues that the efficiency and performance of an application are determined not only by the algorithm and the hardware architecture on which it runs, but critically also by the organization of computations and data on that architecture. Real graphics and imaging applications appear embarrassingly parallel, but have complex dependencies, and are limited by locality (the distance over which data has to move, e.g., from nearby caches or far away main memory) and synchronization. Increasingly, the cost of communication, both within a chip and over a network, dominates computation and power consumption, and limits the gains realized from shrinking transistors. Driven by these trends, writing high-performance processing code is challenging because it requires global reorganization of computations and data, not simply local optimization of an inner loop. Existing programming languages make it difficult for clear and composable code to express optimized organizations because they conflate the intrinsic algorithms being defined with their organization. To address the challenge of productively building efficient, high-performance programs, this thesis presents the Halide language and compiler for image processing. Halide explicitly separates what computations define an algorithm from the choices of execution structure which determine parallelism, locality, memory footprint, and synchronization. For image processing algorithms with the same complexity, even the exact same set of arithmetic operations and data, executing on the same hardware, the order and granularity of execution and placement of data can easily change performance by an order of magnitude because of locality and parallelism. I will show that, for data-parallel pipelines common in graphics, imaging, and other data-intensive applications, the organization of computations and data for a given algorithm is constrained by a fundamental tension between parallelism, locality, and redundant computation of shared values. I will present a systematic model of "schedules" which explicitly trade off these pressures by globally reorganizing the computations and data for an entire pipeline, and an optimizing compiler that synthesizes high performance implementations from a Halide algorithm and a schedule. The end result is much simpler programs, delivering performance often many times faster than the best prior hand-tuned C, assembly, and CUDA implementations, while scaling across radically different architectures, from ARM mobile processors to massively parallel GPUs.
by Jonathan Ragan-Kelley.
Ph. D.
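The central idea above, that the same arithmetic can be organized in very different ways, can be illustrated without Halide itself. The NumPy sketch below (not Halide code; the image size and tile size are arbitrary) computes the same two-stage 3x3 box blur under two organizations: breadth-first, which has no redundant work but poor producer-consumer locality, versus tiled, which recomputes a few boundary rows per tile in exchange for locality, i.e. the parallelism/locality/redundancy trade-off that Halide schedules expose.

```python
import numpy as np

def blur_breadth_first(img):
    """Organization 1: run the whole horizontal pass, then the whole vertical
    pass. No redundant work, but the intermediate is produced long before it
    is consumed (poor locality)."""
    bx = (img[:, :-2] + img[:, 1:-1] + img[:, 2:]) / 3.0
    return (bx[:-2, :] + bx[1:-1, :] + bx[2:, :]) / 3.0

def blur_tiled(img, tile=64):
    """Organization 2: for each strip of output rows, recompute just the slice
    of the horizontal pass it needs. Better locality, slightly redundant work."""
    h, w = img.shape[0] - 2, img.shape[1] - 2
    out = np.empty((h, w))
    for y0 in range(0, h, tile):
        y1 = min(y0 + tile, h)
        bx = (img[y0:y1 + 2, :-2] + img[y0:y1 + 2, 1:-1] + img[y0:y1 + 2, 2:]) / 3.0
        out[y0:y1, :] = (bx[:-2, :] + bx[1:-1, :] + bx[2:, :]) / 3.0
    return out

img = np.random.rand(512, 512)
assert np.allclose(blur_breadth_first(img), blur_tiled(img))  # same algorithm, two schedules
```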
APA, Harvard, Vancouver, ISO, and other styles
22

Livesey, Daria. "High performance computations with Hecke algebras : bilinear forms and Jantzen filtrations." Thesis, University of Aberdeen, 2014. http://digitool.abdn.ac.uk:80/webclient/DeliveryManager?pid=214835.

Full text
APA, Harvard, Vancouver, ISO, and other styles
23

Aldred, Peter L. "Diffraction studies and computational modelling of high-performance aromatic polymers." Thesis, University of Reading, 2003. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.413675.

Full text
APA, Harvard, Vancouver, ISO, and other styles
24

Kasap, Server. "High performance reconfigurable architectures for bioinformatics and computational biology applications." Thesis, University of Edinburgh, 2010. http://hdl.handle.net/1842/24757.

Full text
Abstract:
The field of Bioinformatics and Computational Biology (BCB), a relatively new discipline which spans the boundaries of Biology, Computer Science and Engineering, aims to develop systems that help organise, store, retrieve and analyse genomic and other biological information in a convenient and speedy way. This new discipline emerged mainly as a result of the Human Genome Project, which succeeded in transcribing the complete DNA sequence of the human genome, hence making it possible to address many problems which were impossible to even contemplate before, with a plethora of applications including disease diagnosis, drug engineering, bio-material engineering and genetic engineering of plants and animals, all with a real impact on the quality of life of ordinary individuals. Due to the sheer immensity of the data sets involved in BCB algorithms (often measured in tens or hundreds of gigabytes) as well as their computational demands (often measured in Tera-Ops), high performance supercomputers and computer clusters have been used as implementation platforms for high performance BCB computing. However, the high cost as well as the lack of suitable programming interfaces for these platforms still impedes wider adoption of this technology in the BCB community. Moreover, with increased heat dissipation, supercomputers are now often augmented with special-purpose hardware (ASICs) in order to speed up their operations while reducing their power dissipation. However, since ASICs are fully customised to implement particular tasks/algorithms, they suffer from increased development times, higher Non-Recurring Engineering (NRE) costs, and inflexibility, as they cannot be reused to implement tasks/algorithms other than those they have been designed to perform. On the other hand, Field Programmable Gate Arrays (FPGAs) have recently been proposed as a viable alternative implementation platform for BCB applications due to their flexible computing and memory architecture, which gives them ASIC-like performance with the added benefit of programmability. In order to counter the aforementioned limitations of both supercomputers and ASICs, this research proposes the use of state-of-the-art reprogrammable system-on-chip technology, in the form of platform FPGAs, as a relatively low cost, high performance and reprogrammable implementation platform for BCB applications. This research project aims to develop a sophisticated library of FPGA architectures for bio-sequence analysis, phylogenetic analysis, and molecular dynamics simulation.
APA, Harvard, Vancouver, ISO, and other styles
25

Chugunov, Svyatoslav. "High-Performance Simulations for Atmospheric Pressure Plasma Reactor." Diss., North Dakota State University, 2012. https://hdl.handle.net/10365/26626.

Full text
Abstract:
Plasma-assisted processing and deposition of materials is an important component of modern industrial applications, with plasma reactors accounting for 30% to 40% of manufacturing steps in microelectronics production [1]. The development of new flexible electronics increases the demand for efficient high-throughput deposition methods and roll-to-roll processing of materials. The current work represents an attempt at the practical design and numerical modeling of a plasma enhanced chemical vapor deposition system. The system utilizes plasma at standard pressure and temperature to activate a chemical precursor for protective coatings. A specially designed linear plasma head, which consists of two parallel plates with electrodes placed in a parallel arrangement, is used to resolve the clogging issues of currently available commercial plasma heads, as well as to increase the flow rate of the processed chemicals and to enhance the uniformity of the deposition. A test system is built and discussed in this work. In order to improve the operating conditions of the setup and the quality of the deposited material, we perform numerical modeling of the plasma system. The theoretical and numerical models presented in this work comprehensively describe plasma generation, recombination, and advection in a channel of arbitrary geometry. Number densities of plasma species, their energy content, the electric field, and rate parameters are accurately calculated and analyzed in this work. Some interesting engineering outcomes are discussed in connection with the proposed setup. The numerical model is implemented with high-performance parallel techniques and evaluated on a cluster for parallel calculations. The typical performance increase, calculation speed-up, parallel fraction of the code and overall efficiency of the parallel implementation are discussed in detail.
APA, Harvard, Vancouver, ISO, and other styles
26

Palm, Johan. "High Performance FPGA-Based Computation and Simulation for MIMO Measurement and Control Systems." Thesis, Mälardalen University, School of Innovation, Design and Engineering, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-7477.

Full text
Abstract:

The Stressometer system is a measurement and control system used in cold rolling to improve the flatness of a metal strip. In order to achieve this goal the system employs a multiple input multiple output (MIMO) control system that has a considerable number of sensors and actuators. As a consequence, the computational load on the Stressometer control system becomes very high if overly advanced functions are used. Simultaneously, advances in rolling mill mechanical design make it necessary to implement more complex functions in order for the Stressometer system to stay competitive. Most industrial players in this market consider improved computational power, for measurement, control and modeling applications, to be a key competitive factor. Accordingly, there is a need to improve the computational power of the Stressometer system. Several different approaches towards this objective have been identified, e.g. exploiting hardware parallelism in modern general purpose and graphics processors.

Another approach is to implement different applications in FPGA-based hardware, either tailored to a specific problem or as part of hardware/software co-design. Through the use of a hardware/software co-design approach, the efficiency of the Stressometer system can be increased, lowering the overall demand for processing power since the available resources can be exploited more fully. Hardware-accelerated platforms can be used to increase the computational power of the Stressometer control system without the need for major changes in the existing hardware. Thus hardware upgrades can be as simple as connecting a cable to an accelerator platform, while hardware/software co-design is used to find a suitable hardware/software partition, moving applications between software and hardware.

In order to determine whether this hardware/software co-design approach is realistic or not, the feasibility of implementing simulator, computational and control applications in FPGA-based hardware needs to be determined. This is accomplished by selecting two specific applications for closer study, determining the feasibility of implementing a Stressometer measuring roll simulator and a parallel Cholesky algorithm in FPGA-based hardware.

Based on these studies, this work has determined that FPGA device technology is well suited to implementing both simulator and computational applications. The Stressometer measuring roll simulator was able to approximate the force and pulse signals of the Stressometer measuring roll at a relatively modest resource consumption, using only 1747 slices and eight DSP slices. Meanwhile, the parallel FPGA-based Cholesky component is able to provide performance in the range of GFLOP/s, exceeding the performance of the personal computer used for comparison in several simulations, although at a very high resource consumption. The result of this thesis, based on the two feasibility studies, indicates that it is possible to increase the processing power of the Stressometer control system using FPGA device technology.
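For context on the second feasibility study, the sketch below is a plain serial Cholesky-Banachiewicz factorization. It shows only the arithmetic being accelerated; the column-wise divisions and updates it performs are what a hardware design can pipeline and parallelise. The test matrix is hypothetical, and this is not the thesis's FPGA design.

```python
import numpy as np

def cholesky_lower(A):
    """Textbook Cholesky-Banachiewicz factorization A = L @ L.T for a symmetric
    positive-definite A. Within each column, the divisions by L[j, j] are
    independent of one another, which is the parallelism a hardware
    implementation exploits; this sketch is purely serial."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    L = np.zeros_like(A)
    for j in range(n):
        L[j, j] = np.sqrt(A[j, j] - np.dot(L[j, :j], L[j, :j]))
        for i in range(j + 1, n):
            L[i, j] = (A[i, j] - np.dot(L[i, :j], L[j, :j])) / L[j, j]
    return L

A = np.array([[4.0, 2.0, 2.0],
              [2.0, 3.0, 1.0],
              [2.0, 1.0, 3.0]])       # hypothetical SPD test matrix
L = cholesky_lower(A)
print(np.allclose(L @ L.T, A))        # True
```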

APA, Harvard, Vancouver, ISO, and other styles
27

Lee, Hua, Stephanie Lockwood, James Tandon, and Andrew Brown. "BACKWARD PROPAGATION BASED ALGORITHMS FOR HIGH-PERFORMANCE IMAGE FORMATION." International Foundation for Telemetering, 2000. http://hdl.handle.net/10150/608300.

Full text
Abstract:
International Telemetering Conference Proceedings / October 23-26, 2000 / Town & Country Hotel and Conference Center, San Diego, California
In this paper, we present the recent results of theoretical development and software implementation of a complete collection of high-performance image reconstruction algorithms designed for high-resolution imaging for various data acquisition configurations.
APA, Harvard, Vancouver, ISO, and other styles
28

Sanghvi, Niraj D. "Parallel Computation of the Meddis MATLAB Auditory Periphery Model." The Ohio State University, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=osu1339092782.

Full text
APA, Harvard, Vancouver, ISO, and other styles
29

Salavert, Torres José. "Inexact Mapping of Short Biological Sequences in High Performance Computational Environments." Doctoral thesis, Universitat Politècnica de València, 2014. http://hdl.handle.net/10251/43721.

Full text
Abstract:
Bioinformatics is the application of computational science to the management and analysis of biological data. From 2005, with the appearance of new-generation DNA sequencers, what is known as Next Generation Sequencing (NGS) emerged. A single biological experiment run on an NGS sequencing machine can easily produce hundreds of gigabytes or even terabytes of data. Depending on the chosen technique, this process can be completed in a few hours or days. The availability of affordable local resources, such as multi-core processors or the new graphics cards prepared for general-purpose computation (GPGPU, General Purpose Graphic Processing Unit), constitutes a great opportunity to tackle these problems. A topic frequently addressed nowadays is the alignment of DNA sequences. In bioinformatics, alignment allows two or more sequences of DNA, RNA or primary protein structures to be compared, highlighting their regions of similarity. Such similarities may indicate functional or evolutionary relationships between the genes or proteins queried. Moreover, the existence of similarities between the sequences of a patient and those of another individual with a detected genetic disease could be used effectively in the field of diagnostic medicine. The problem around which this doctoral thesis revolves is the location of short sequence fragments within the DNA, known as sequence mapping. This mapping must allow errors, so that sequences can be mapped even in the presence of genetic variability or read errors. Several techniques exist to approach mapping, but since the appearance of NGS, search over prefixes indexed and grouped by means of the Burrows-Wheeler transform [28] (BWT hereafter) stands out. This transform was originally used in data compression techniques, as in the bzip2 algorithm. Its use as a tool for indexing and subsequently searching information is more recent [22]. Its advantage is that its computational complexity depends only on the length of the sequence to be mapped. On the other hand, a large number of alignment techniques are based on dynamic programming algorithms, whether Smith-Waterman or hidden Markov models. These provide greater sensitivity, allowing a larger number of errors, but their computational cost is higher and depends on the size of the sequence multiplied by that of the reference string. Many tools combine a first BWT-based search phase for regions that are candidates for alignment with a second local alignment phase in which strings are mapped with Smith-Waterman or HMMs. When mapping with only a few errors allowed, a second phase with a dynamic programming algorithm is too costly, so an inexact search based on the BWT can be more efficient. The main motivation of this doctoral thesis is the implementation of an inexact search algorithm based solely on the BWT, adapting it to modern parallel architectures, both on CPUs and on GPGPUs. The algorithm constitutes a new branch-and-bound method adapted to genomic information.
During the research stay, hidden Markov models will be studied and an implementation will be carried out on GTA (Generate, Test and Aggregate) functional computation models, together with the shared- and distributed-memory parallelization of that functional programming platform.
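The BWT-indexed search that the abstract builds on is usually realised as FM-index backward search, whose cost depends only on the pattern length. The sketch below shows a plain exact backward search over a toy text (the text and pattern are hypothetical); the thesis's contribution is an inexact, branch-and-bound extension of this kind of search adapted to CPUs and GPGPUs, which is not reproduced here.

```python
from collections import Counter

def bwt_index(text):
    """Build the BWT of text + '$' plus the C[] and Occ tables used by
    FM-index backward search (naive construction, fine for a toy example)."""
    text += "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    bwt = "".join(rot[-1] for rot in rotations)
    counts, C, total = Counter(bwt), {}, 0
    for ch in sorted(counts):            # C[ch] = number of symbols smaller than ch
        C[ch], total = total, total + counts[ch]
    occ = [{}]                           # occ[i][ch] = occurrences of ch in bwt[:i]
    for ch in bwt:
        row = dict(occ[-1])
        row[ch] = row.get(ch, 0) + 1
        occ.append(row)
    return C, occ

def backward_search(pattern, C, occ):
    """Count exact occurrences of pattern; one table-lookup pair per character."""
    lo, hi = 0, len(occ) - 1             # current suffix-array interval [lo, hi)
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + occ[lo].get(ch, 0)
        hi = C[ch] + occ[hi].get(ch, 0)
        if lo >= hi:
            return 0
    return hi - lo

C, occ = bwt_index("ACGTACGATACG")
print(backward_search("ACG", C, occ))    # 3
```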
Salavert Torres, J. (2014). Inexact Mapping of Short Biological Sequences in High Performance Computational Environments [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/43721
APA, Harvard, Vancouver, ISO, and other styles
30

Schneck, Phyllis Adele. "Dynamic management of computation and communication resources to enable secure high-performance applications." Diss., Georgia Institute of Technology, 1999. http://hdl.handle.net/1853/8264.

Full text
APA, Harvard, Vancouver, ISO, and other styles
31

Bas, Erdeniz Ozgun. "Load-Balancing Spatially Located Computations using Rectangular Partitions." The Ohio State University, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=osu1306909831.

Full text
APA, Harvard, Vancouver, ISO, and other styles
32

Middleton, Anthony M. "High-Performance Knowledge-Based Entity Extraction." NSUWorks, 2009. http://nsuworks.nova.edu/gscis_etd/246.

Full text
Abstract:
Human language records most of the information and knowledge produced by organizations and individuals. The machine-based process of analyzing information in natural language form is called natural language processing (NLP). Information extraction (IE) is the process of analyzing machine-readable text and identifying and collecting information about specified types of entities, events, and relationships. Named entity extraction is an area of IE concerned specifically with recognizing and classifying proper names for persons, organizations, and locations from natural language. Extant approaches to the design and implementation of named entity extraction systems include: (a) knowledge-engineering approaches, which utilize domain experts to hand-craft NLP rules to recognize and classify named entities; (b) supervised machine-learning approaches, in which a previously tagged corpus of named entities is used to train algorithms which incorporate statistical and probabilistic methods for NLP; or (c) hybrid approaches, which incorporate aspects of both methods described in (a) and (b). Performance of IE systems is evaluated using the metrics of precision and recall, which measure the accuracy and completeness of the IE task. Previous research has shown that utilizing a large knowledge base of known entities has the potential to improve overall entity extraction precision and recall performance. Although existing methods typically incorporate dictionary-based features, these dictionaries have been limited in size and scope. The problem addressed by this research was the design, implementation, and evaluation of a new high-performance knowledge-based hybrid processing approach and associated algorithms for named entity extraction, combining rule-based natural language parsing and memory-based machine learning classification facilitated by an extensive knowledge base of existing named entities. The hybrid approach implemented by this research resulted in improved precision and recall performance approaching human-level capability compared to existing methods, measured using a standard test corpus. The system design incorporated a parallel processing architecture with capabilities for managing a large knowledge base and providing high throughput potential for processing large collections of natural language text documents.
APA, Harvard, Vancouver, ISO, and other styles
33

Godwin, Jeswin Samuel. "High-Performance Sparse Matrix-Vector Multiplication on GPUs for Structured Grid Computations." The Ohio State University, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=osu1357280824.

Full text
APA, Harvard, Vancouver, ISO, and other styles
34

Ruddy, John. "Computational Acceleration for Next Generation Chemical Standoff Sensors Using FPGAs." Master's thesis, Temple University Libraries, 2012. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/175971.

Full text
Abstract:
Electrical Engineering
M.S.E.E.
This research provides the real-time computational resource for three-dimensional tomographic chemical threat mapping using mobile hyperspectral sensors from sparse input data. The crucial calculation limiting real-time execution of the algorithm is the determination of the projection matrix using the algebraic reconstruction technique (ART). The computation exploits the inherently parallel nature of ART with an implementation of the algorithm on a field programmable gate array. The MATLAB Fixed-Point Toolbox is used to determine the optimal fixed-point data types in the conversion from the original floating-point algorithm. The computation is then implemented using the Xilinx System Generator, which generates a hardware description language representation from a block diagram design.
Temple University--Theses
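The ART step named above is, at its core, the Kaczmarz row-action update: each ray-sum row of the system matrix nudges the current image estimate toward its measurement. The sketch below shows that update in floating point on a hypothetical toy system (the matrix, measurements and relaxation factor are made up); the thesis's contribution is the fixed-point FPGA realization, which is not reproduced here.

```python
import numpy as np

def art_reconstruct(A, b, sweeps=200, relax=0.5):
    """Algebraic Reconstruction Technique (Kaczmarz): sweep over the rows a_i of
    the projection matrix, correcting x so that a_i . x moves toward b_i."""
    x = np.zeros(A.shape[1])
    row_norms = (A * A).sum(axis=1)
    for _ in range(sweeps):
        for i in range(A.shape[0]):
            if row_norms[i] > 0.0:
                x += relax * (b[i] - A[i] @ x) / row_norms[i] * A[i]
    return x

# Hypothetical toy problem: a 4-pixel "image" observed through 3 ray sums
A = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0, 0.0]])
b = A @ np.array([1.0, 2.0, 3.0, 4.0])
x = art_reconstruct(A, b)
print(f"measurement residual: {np.linalg.norm(A @ x - b):.2e}")
```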
APA, Harvard, Vancouver, ISO, and other styles
35

Herrero, Zaragoza Jose Ramón. "A framework for efficient execution of matrix computations." Doctoral thesis, Universitat Politècnica de Catalunya, 2006. http://hdl.handle.net/10803/5991.

Full text
Abstract:
Matrix computations lie at the heart of most scientific computational tasks. The solution of linear systems of equations is a very frequent operation in many fields in science, engineering, surveying, physics and others. Other matrix operations occur frequently in many other fields such as pattern recognition and classification, or multimedia applications. Therefore, it is important to perform matrix operations efficiently. The work in this thesis focuses on the efficient execution on commodity processors of matrix operations which arise frequently in different fields.

We study some important operations which appear in the solution of real world problems: some sparse and dense linear algebra codes and a classification algorithm. In particular, we focus our attention on the efficient execution of the following operations: sparse Cholesky factorization; dense matrix multiplication; dense Cholesky factorization; and Nearest Neighbor Classification.

A lot of research has been conducted on the efficient parallelization of numerical algorithms. However, the efficiency of a parallel algorithm depends ultimately on the performance obtained from the computations performed on each node. The work presented in this thesis focuses on the sequential execution on a single processor.


There exists a number of data structures for sparse computations which can be used in order to avoid the storage of and computation on zero elements. We work with a hierarchical data structure known as a hypermatrix. A matrix is subdivided recursively an arbitrary number of times. Several pointer matrices are used to store the location of submatrices at each level. The last level consists of data submatrices which are dealt with as dense submatrices. When the block size of these dense submatrices is small, the number of zeros can be greatly reduced. However, the performance obtained from BLAS3 routines drops heavily. Consequently, there is a trade-off in the size of data submatrices used for a sparse Cholesky factorization with the hypermatrix scheme. Our goal is to reduce the overhead introduced by unnecessary operations on zeros when a hypermatrix data structure is used to produce a sparse Cholesky factorization. In this work we study several techniques for reducing such overhead in order to obtain high performance.
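As a concrete picture of the hypermatrix idea described above, the sketch below keeps a single pointer level whose entries are either None (an all-zero block that is never stored or touched) or a dense data block handled with dense kernels. It is a deliberately minimal illustration: the thesis uses several recursive pointer levels, sparse Cholesky rather than a matrix-vector product, and tuned inner kernels instead of NumPy; the example matrix and block size are hypothetical, and the dimension is assumed to be a multiple of the block size.

```python
import numpy as np

class HyperMatrix:
    """One pointer level over dense data blocks; zero blocks are not stored."""

    def __init__(self, dense, block):
        self.block, self.nb = block, dense.shape[0] // block
        self.ptr = [[None] * self.nb for _ in range(self.nb)]
        for i in range(self.nb):
            for j in range(self.nb):
                sub = dense[i * block:(i + 1) * block, j * block:(j + 1) * block]
                if np.any(sub):                    # keep only nonzero blocks
                    self.ptr[i][j] = sub.copy()

    def matvec(self, x):
        b, y = self.block, np.zeros(self.nb * self.block)
        for i in range(self.nb):
            for j in range(self.nb):
                if self.ptr[i][j] is not None:     # zero blocks cost nothing
                    y[i * b:(i + 1) * b] += self.ptr[i][j] @ x[j * b:(j + 1) * b]
        return y

# Hypothetical sparse matrix: only two nonzero 4x4 blocks out of sixteen
A = np.zeros((16, 16))
A[0:4, 0:4] = np.eye(4)
A[8:12, 12:16] = 2.0
H = HyperMatrix(A, block=4)
x = np.arange(16.0)
print(np.allclose(H.matvec(x), A @ x))             # True
```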

One of our goals is the creation of codes which work efficiently on different platforms when operating on dense matrices. To obtain high performance, the resources offered by the CPU must be properly utilized. At the same time, the memory hierarchy must be exploited to tolerate increasing memory latencies. To achieve the former, we produce inner kernels which use the CPU very efficiently. To achieve the latter, we investigate nonlinear data layouts. Such data formats can contribute to the effective use of the memory system.

The use of highly optimized inner kernels is of paramount importance for obtaining efficient numerical algorithms. Often, such kernels are created by hand. However, we want to create efficient inner kernels for a variety of processors using a general approach and avoiding hand-written assembly code. In this work, we present an alternative way to produce efficient kernels automatically, based on a set of simple codes written in a high level language, which can be parameterized at compilation time. The advantage of our method lies in the ability to generate very efficient inner kernels by means of a good compiler. Working on regular codes for small matrices, most of the compilers we used on different platforms created very efficient inner kernels for matrix multiplication. Using the resulting kernels we have been able to produce high performance sparse and dense linear algebra codes on a variety of platforms.

In this work we also show that techniques used in linear algebra codes can be useful in other fields. We present the work we have done on the optimization of Nearest Neighbor classification, focusing on the speed of the classification process.
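
One generic way linear algebra techniques carry over to Nearest Neighbor classification is to express all query-to-reference squared distances as a single matrix product; the sketch below illustrates that idea and is not the specific optimization developed in the thesis:

```python
import numpy as np

def nearest_neighbor_labels(queries, refs, ref_labels):
    """Classify each query point with the label of its nearest reference.
    Uses ||q - r||^2 = ||q||^2 + ||r||^2 - 2 q.r, so the dominant cost is one
    dense matrix multiplication, which can reuse a tuned matmul kernel."""
    q_norms = np.sum(queries**2, axis=1)[:, None]     # shape (nq, 1)
    r_norms = np.sum(refs**2, axis=1)[None, :]        # shape (1, nr)
    d2 = q_norms + r_norms - 2.0 * queries @ refs.T   # (nq, nr) squared distances
    return ref_labels[np.argmin(d2, axis=1)]

rng = np.random.default_rng(0)
refs = rng.normal(size=(100, 8))
labels = rng.integers(0, 3, size=100)
queries = rng.normal(size=(5, 8))
print(nearest_neighbor_labels(queries, refs, labels))
```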

Tuning several codes for different problems and machines can become a heavy, unmanageable task. For this reason, we have developed an environment for the development and automatic benchmarking of codes, which is presented in this thesis.

As a practical result of this work, we have been able to create efficient codes for several matrix operations on a variety of platforms. Our codes are highly competitive with other state-of-the-art codes for some problems.
APA, Harvard, Vancouver, ISO, and other styles
36

Gruener, Charles J. "Design and implementation of a computational cluster for high performance design and modeling of integrated circuits /." Online version of thesis, 2009. http://hdl.handle.net/1850/11204.

Full text
APA, Harvard, Vancouver, ISO, and other styles
37

Jiménez, García Brian. "Development and optimization of high-performance computational tools for protein-protein docking." Doctoral thesis, Universitat de Barcelona, 2016. http://hdl.handle.net/10803/398790.

Full text
Abstract:
Computing has pushed a paradigm shift in many disciplines, including structural biology and chemistry. This change has been driven mainly by the increase in the performance of computers, the capacity to deal with huge amounts of experimental and analysis data, and the development of new algorithms. Thanks to these advances, our understanding of the chemistry that supports life has increased, and that chemistry is even more sophisticated than we had ever imagined. Proteins play a major role in nature and are often described as the factories of the cell, as they are involved in virtually all important functions in living organisms. Unfortunately, our understanding of the function of many proteins is still very poor due to the current limitations of experimental techniques, which at the moment cannot provide crystal structures for many protein complexes. The development of computational tools such as protein-protein docking methods could help to fill this gap. In this thesis, we present a new protein-protein docking method, LightDock, which supports the use of different custom scoring functions and includes anisotropic normal mode analysis to model backbone flexibility upon binding. Second, several web-based tools of interest to the scientific community have been developed, including a web server for protein-protein docking, a web tool for the characterization of protein-protein interfaces, and a web server for including SAXS experimental data for a better prediction of protein complexes. Moreover, the optimizations made in the pyDock protocol and the resulting increase in performance helped our group to rank in 5th position among more than 60 participants in the past two CAPRI editions. Finally, we have designed and compiled the Protein-Protein (version 5.0) and Protein-RNA (version 1.0) docking benchmarks, which are important resources for the community to test and develop new methods against a reference set of curated cases.
Gràcies als recents avenços en computació, el nostre coneixement de la química que suporta la vida ha incrementat enormement i ens ha conduït a comprendre que la química de la vida és més sofisticada del que mai haguéssim pensat. Les proteïnes juguen un paper fonamental en aquesta química i són descrites habitualment com a les fàbriques de les cèl·lules. A més a més, les proteïnes estan involucrades en gairebé tots els processos fonamentals en els éssers vius. Malauradament, el nostre coneixement de la funció de moltes proteïnes és encara escaig degut a les limitacions actuals de molts mètodes experimentals, que encara no són capaços de proporcionar-nos estructures de cristall per a molts complexes proteïna-proteïna. El desenvolupament de tècniques i eines informàtiques d’acoblament proteïna-proteïna pot ésser crucial per a ajudar-nos a reduir aquest forat. En aquesta tesis, hem presentat un nou mètode computacional de predicció d’acoblament proteïna-proteïna, LightDock, que és capaç de fer servir diverses funcions energètiques definides per l’usuari i incloure un model de flexibilitat de la cadena principal mitjançant la anàlisis de modes normals. Segon, diverses eines d’interès per a la comunitat científica i basades en tecnologia web han sigut desenvolupades: un servidor web de predicció d’acoblament proteïna-proteïna, una eina online per a caracteritzar les interfícies d’acoblament proteïna-proteïna i una eina web per a incloure dades experimentals de tipus SAXS. A més a més, les optimitzacions fetes al protocol pyDock i la conseqüent millora en rendiment han propiciat que el nostre grup de recerca obtingués la cinquena posició entre més de 60 grups en les dues darreres avaluacions de l’experiment internacional CAPRI. Finalment, hem dissenyat i compilat els banc de proves d’acoblament proteïna-proteïna (versió 5) i proteïna-ARN (versió 1), molt importants per a la comunitat ja que permeten provar i desenvolupar nous mètodes i analitzar-ne el rendiment en aquest marc de referència comú.
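
Anisotropic normal mode analysis of the kind mentioned in the abstract above is commonly built from an elastic network model of the C-alpha trace; the following is a generic, minimal sketch of that construction (the cutoff, spring constant, array shapes and example coordinates are assumptions for illustration, not LightDock's implementation):

```python
import numpy as np

def anm_modes(ca_coords, cutoff=15.0, gamma=1.0, n_modes=10):
    """Build the Anisotropic Network Model Hessian from C-alpha coordinates
    (array of shape (n, 3)) and return the lowest-frequency non-trivial modes,
    usable as collective backbone deformation directions."""
    n = len(ca_coords)
    hessian = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = ca_coords[j] - ca_coords[i]
            r2 = d @ d
            if r2 > cutoff ** 2:
                continue
            block = -gamma * np.outer(d, d) / r2         # 3x3 super-element
            hessian[3*i:3*i+3, 3*j:3*j+3] = block
            hessian[3*j:3*j+3, 3*i:3*i+3] = block
            hessian[3*i:3*i+3, 3*i:3*i+3] -= block       # diagonal blocks balance the rows
            hessian[3*j:3*j+3, 3*j:3*j+3] -= block
    eigvals, eigvecs = np.linalg.eigh(hessian)
    # Skip the six zero modes (rigid-body translations and rotations).
    return eigvals[6:6 + n_modes], eigvecs[:, 6:6 + n_modes]

coords = np.cumsum(np.random.default_rng(0).normal(size=(50, 3)), axis=0)  # fake C-alpha trace
eigvals, modes = anm_modes(coords)
print(eigvals)
```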
APA, Harvard, Vancouver, ISO, and other styles
38

Ling, Cheng. "High performance bioinformatics and computational biology on general-purpose graphics processing units." Thesis, University of Edinburgh, 2012. http://hdl.handle.net/1842/6260.

Full text
Abstract:
Bioinformatics and Computational Biology (BCB) is a relatively new multidisciplinary field which brings together many aspects of biology, computer science, statistics, and engineering. Bioinformatics extracts useful information from biological data and makes it more intuitive and understandable by applying principles of information sciences, while computational biology harnesses computational approaches and technologies to answer biological questions conveniently. Recent years have seen an explosion in the size of biological data at a rate which outpaces the growth in the computational power of mainstream computer technologies, namely general-purpose processors (GPPs). The aim of this thesis is to explore the use of off-the-shelf Graphics Processing Unit (GPU) technology in the high-performance and efficient implementation of BCB applications, in order to meet the demands of growing biological data at affordable cost. The thesis presents detailed designs and implementations of GPU solutions for a number of BCB algorithms in two widely used BCB applications, namely biological sequence alignment and phylogenetic analysis. Biological sequence alignment can be used to infer information about a newly discovered biological sequence from other well-known sequences through similarity comparison. Phylogenetic analysis, on the other hand, is concerned with the investigation of the evolution of and relationships among organisms, and has many uses in the fields of systems biology and comparative genomics. In molecular-based phylogenetic analysis, the relationship between species is estimated by inferring the common history of their genes, and phylogenetic trees are then constructed to illustrate evolutionary relationships among genes and organisms. However, both biological sequence alignment and phylogenetic analysis are computationally expensive applications, as their computing and memory requirements grow polynomially or even worse with the size of sequence databases. The thesis first presents a multi-threaded parallel design of the Smith-Waterman (SW) algorithm alongside an implementation on NVIDIA GPUs. A novel technique is put forward to solve the restriction on the length of the query sequence in previous GPU-based implementations of the SW algorithm. Based on this implementation, the difference between two main task parallelization approaches (inter-task and intra-task parallelization) is presented. The resulting GPU implementation matches the speed of existing GPU implementations while providing more flexibility, i.e. flexible lengths of sequences in real-world applications. It also outperforms an equivalent GPP-based implementation by 15x-20x. After this, the thesis presents the first reported multi-threaded design and GPU implementation of the Gapped BLAST with Two-Hit method algorithm, which is widely used for aligning biological sequences heuristically. This achieved up to 3x speed-up compared to the most optimised GPP implementations. The thesis then presents a multi-threaded design and GPU implementation of a Neighbor-Joining (NJ)-based method for phylogenetic tree construction and multiple sequence alignment (MSA). This achieves 8x-20x speed-up compared to an equivalent GPP implementation based on the widely used ClustalW software. The NJ method, however, only gives one possible tree, which strongly depends on the evolutionary model used.
A more advanced method uses maximum likelihood (ML) for scoring phylogenies with Markov Chain Monte Carlo (MCMC)-based Bayesian inference. The latter was the subject of another multi-threaded design and GPU implementation presented in this thesis, which achieved 4x-8x speed-up compared to an equivalent GPP implementation based on the widely used MrBayes software. Finally, the thesis presents a general evaluation of the designs and implementations achieved in this work as a step towards the evaluation of GPU technology in BCB computing, in the context of other computer technologies including GPPs and Field Programmable Gate Array (FPGA) technology.
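
For reference, the Smith-Waterman recurrence that the GPU designs above parallelize can be written as a simple dynamic program; this is a plain, single-threaded sketch with a linear gap penalty (the match/mismatch/gap scores are illustrative), not the thesis' GPU kernel:

```python
def smith_waterman(seq1, seq2, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between seq1 and seq2.
    H[i][j] is the best score of an alignment ending at seq1[i-1], seq2[j-1];
    the maximum over the whole matrix is the local alignment score."""
    rows, cols = len(seq1) + 1, len(seq2) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i-1][j-1] + (match if seq1[i-1] == seq2[j-1] else mismatch)
            H[i][j] = max(0, diag, H[i-1][j] + gap, H[i][j-1] + gap)
            best = max(best, H[i][j])
    return best

print(smith_waterman("ACACACTA", "AGCACACA"))
```

In broad terms, inter-task parallelization assigns one such matrix per GPU thread (one pairwise comparison each), whereas intra-task parallelization lets many threads cooperate on the anti-diagonals of a single matrix.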
APA, Harvard, Vancouver, ISO, and other styles
39

Kissami, Imad. "High Performance Computational Fluid Dynamics on Clusters and Clouds : the ADAPT Experience." Thesis, Sorbonne Paris Cité, 2017. http://www.theses.fr/2017USPCD019/document.

Full text
Abstract:
Dans cette thèse, nous présentons notre travail de recherche dans le domaine du calcul haute performance en mécanique des fluides (CFD) pour architectures de type cluster et cloud. De manière générale, nous nous proposons de développer un solveur efficace, appelé ADAPT, pour la résolution de problèmes de CFD selon une vue classique correspondant à des développements en MPI et selon une vue qui nous amène à représenter ADAPT comme un graphe de tâches destinées à être ordonnancées sur une plateforme de type cloud computing. Comme première contribution, nous proposons une parallélisation de l’équation de diffusion-convection couplée à un système linéaire en 2D et en 3D à l’aide de MPI. Une parallélisation à deux niveaux est utilisée dans notre implémentation pour exploiter au mieux les capacités des machines multi-coeurs. Nous obtenons une distribution équilibrée de la charge de calcul en utilisant la décomposition du domaine à l’aide de METIS, ainsi qu’une résolution pertinente de notre système linéaire creux de très grande taille en utilisant le solveur parallèle MUMPS (Solveur MUltifrontal Massivement Parallèle). Notre deuxième contribution illustre comment imaginer la plateforme ADAPT, telle que représentée dans la première contribution, comme un service. Nous transformons le framework ADAPT (en fait, une partie du framework) en DAG (Direct Acyclic Graph) pour le voir comme un workflow scientifique. Ensuite, nous introduisons de nouvelles politiques à l’intérieur du moteur de workflow RedisDG, afin de planifier les tâches du DAG, de manière opportuniste. Nous introduisons dans RedisDG la possibilité de travailler avec des machines dynamiques (elles peuvent quitter ou entrer dans le système de calcul comme elles veulent) et une approche multi-critères pour décider de la “meilleure” machine à choisir afin d’exécuter une tâche. Des expériences sont menées sur le workflow ADAPT pour illustrer l’efficacité de l’ordonnancement et des décisions d’ordonnancement dans le nouveau RedisDG.
In this thesis, we present our research work in the field of high-performance computing for computational fluid dynamics (CFD) on cluster and cloud architectures. In general, we propose to develop an efficient solver, called ADAPT, for solving CFD problems, both in a classic view corresponding to developments in MPI and in a view that leads us to represent ADAPT as a graph of tasks intended to be scheduled on a cloud computing platform. As a first contribution, we propose a parallelization of the diffusion-convection equation coupled to a linear system in 2D and 3D using MPI. A two-level parallelization is used in our implementation to take advantage of current distributed multicore machines. A balanced distribution of the computational load is obtained by decomposing the domain with METIS, and our very large sparse linear system is solved efficiently with the parallel solver MUMPS (MUltifrontal Massively Parallel Solver). Our second contribution illustrates how to view the ADAPT framework, as depicted in the first contribution, as a service. We transform the framework (in fact, a part of the framework) into a DAG (Directed Acyclic Graph) in order to see it as a scientific workflow. We then introduce new policies inside the RedisDG workflow engine in order to schedule the tasks of the DAG in an opportunistic manner. We introduce into RedisDG the possibility of working with dynamic workers (they can leave or enter the computing system as they wish) and a multi-criteria approach to decide on the “best” worker to execute a task. Experiments are conducted on the ADAPT workflow to illustrate the quality of the scheduling and of the scheduling decisions in the new RedisDG.
APA, Harvard, Vancouver, ISO, and other styles
40

Jiang, Wei. "A Map-Reduce-Like System for Programming and Optimizing Data-Intensive Computations on Emerging Parallel Architectures." The Ohio State University, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=osu1343677821.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Miotti, Bettanini Alvise. "Welding of high performance metal matrix composite materials: the ICME approach." Thesis, KTH, Metallografi, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-154021.

Full text
Abstract:
The materials development cycle is becoming too slow compared with other technology sectors such as IT and electronics. The materials science community needs to bring materials science back to the core of human development. ICME (Integrated Computational Materials Engineering) is a new discipline that uses advanced computational tools to simulate material microstructures and processes and their links with the final properties. There is a need for a new way to design tailor-made materials with a faster and cheaper development cycle, while creating products that meet "real-world" functionalities rather than a vague set of specifications. Using the ICME approach, cutting-edge computational thermodynamics models were employed to assist the microstructure characterization and refinement during the TIG welding of a functionally graded composite material with outstanding wear and corrosion resistance. The DICTRA diffusion model accurately predicted the carbon diffusion during sintering, Thermo-Calc and TC-PRISMA models described the thermodynamics and kinetics of harmful carbide precipitation, and COMSOL Multiphysics provided the temperature distribution profile at every timestep during TIG welding of the material. The bainite transformation and the influence of chromium and molybdenum were studied and modelled with the MAP_STEEL software. The simulations were then compared with experimental observations, and very good agreement between computations and experiments was found for both the thermodynamic and the kinetic predictions. The use of this new system proved to be a robust complement to the classic development method, and the material microstructures and processes were carefully adjusted in order to increase corrosion resistance and weldability. This new approach to material development can radically change the way we think about and make materials. The results suggest that the use of computational tools is a reality that can dramatically increase the efficiency of materials development.
APA, Harvard, Vancouver, ISO, and other styles
42

Pulla, Gautam. "High Performance Computing Issues in Large-Scale Molecular Statics Simulations." Thesis, Virginia Tech, 1999. http://hdl.handle.net/10919/33206.

Full text
Abstract:
Successful application of parallel high performance computing to practical problems requires overcoming several challenges. These range from the need to make sequential and parallel improvements in programs to the implementation of software tools which create an environment that aids sharing of high performance hardware resources and limits losses caused by hardware and software failures. In this thesis we describe our approach to meeting these challenges in the context of a Molecular Statics code. We describe sequential and parallel optimizations made to the code and also a suite of tools constructed to facilitate the execution of the Molecular Statics program on a network of parallel machines with the aim of increasing resource sharing, fault tolerance and availability.
Master of Science
APA, Harvard, Vancouver, ISO, and other styles
43

Hospital Gasch, Adam. "High Throughput Computational Studies of Macromolecular Structure Flexibility." Doctoral thesis, Universitat de Barcelona, 2014. http://hdl.handle.net/10803/284440.

Full text
Abstract:
Macromolecular structure, and specifically its dynamics and flexibility, plays a crucial role in the final biological function. Intense efforts are being made to obtain experimental information about macromolecular flexibility. However, despite encouraging advances, we are far from achieving a complete description of the flexibility of a molecular system. Theoretical approaches are convenient alternatives. One of the most widely used theoretical techniques to obtain dynamic information about structures is Molecular Dynamics (MD). Unfortunately, the practical use of MD has been severely limited by its computational cost and by the problems found in the automatic setup of simulations. An alternative to these methods is Coarse-Grained (CG) dynamics, where, in order to increase computational efficiency, a certain loss of accuracy is accepted, with a significant reduction in structural resolution. Using CG algorithms, larger macromolecules and longer timescales can be simulated, reaching the mesoscopic scale. Nowadays, with the development of new and more efficient simulation engines and the availability of supercomputers and grid platforms (High Performance Computing, HPC), these methods are becoming more and more popular. However, their use in large computational High Throughput (HT) studies requires a complete automation of all the necessary steps in the generation of the final trajectory and its subsequent analysis, as well as the building of an efficient storage system, given the huge amount of data generated by MD/CG methods. In this thesis, we have designed and implemented a set of bioinformatics tools to port Molecular and Coarse-Grained Dynamics to the HT regime. We have obtained a library of 1,595 protein MD simulations (MoDEL), containing a picture of macromolecular structure flexibility. This large library allowed us to perform HT studies such as the analysis of protein-solvent dynamics, with more than 16 million water molecules available. Finally, all the bioinformatics tools developed in this thesis were included in a set of graphical interfaces exposed as web servers, to ease their use by non-expert users. The addition of pre-configured workflows, the integration of macromolecular flexibility analyses and the visualization possibilities enhance the value of the final project. The set of web server applications designed and implemented in this thesis is publicly accessible to the scientific community, forming an integrated macromolecular flexibility portal, which can be reached directly at http://mmb.irbbarcelona.org/FlexPortal or through the Spanish National Institute of Bioinformatics (INB) portal at http://www.inab.org .
Las estructuras tridimensionales de las macromoléculas, y en particular, su dinámica y flexibilidad, están íntimamente relacionadas con su función biológica. Debido a la tremenda dificultad del estudio experimental de las propiedades dinámicas de las macromoléculas, se han popularizado un conjunto de técnicas teóricas con las que obtener simulaciones de su movimiento. En los últimos años, los grandes y rápidos avances tanto en la computación como en los estudios teóricos de flexibilidad de macromoléculas han abierto la posibilidad de llevar a cabo estudios masivos de alto rendimiento (High throughput). Sin embargo, para lograr realizar este tipo de estudios, no solo se requieren algoritmos potentes y poder computacional, sino también una automatización de los distintos pasos necesarios en el proceso de cálculo de trayectorias así como de su posterior análisis. Casi tan importante como los cálculos, es necesario un sistema de almacenamiento que permita tanto guardar como consultar de manera eficiente la cantidad enorme de datos generados por el estudio masivo. En esta tesis, se han estudiado, diseñado e implementado diferentes sistemas de automatización high throughput de cálculos de dinámica molecular, tanto atomística como de baja resolución, así como herramientas para su posterior análisis. Así mismo, y para acercar estas metodologías complejas a usuarios no expertos, hemos implementado un conjunto de entornos gráficos a partir de servidores web, que directamente, o vía el portal del Instituto Nacional de Bioinformática (INB), permiten su uso por una amplia comunidad científica.
APA, Harvard, Vancouver, ISO, and other styles
44

Ozog, David. "High Performance Computational Chemistry: Bridging Quantum Mechanics, Molecular Dynamics, and Coarse-Grained Models." Thesis, University of Oregon, 2017. http://hdl.handle.net/1794/22778.

Full text
Abstract:
The past several decades have witnessed tremendous strides in the capabilities of computational chemistry simulations, driven in large part by the extensive parallelism offered by powerful computer clusters and scalable programming methods in high performance computing (HPC). However, such massively parallel simulations increasingly require more complicated software to achieve good performance across the vastly diverse ecosystem of modern heterogeneous computer systems. Furthermore, advanced “multi-resolution” methods for modeling atoms and molecules continue to evolve, and scientific software developers struggle to keep up with the hardships involved with building, scaling, and maintaining these coupled code systems. This dissertation describes these challenges facing the computational chemistry community in detail, along with recent solutions and techniques that circumvent some primary obstacles. In particular, I describe several projects and classify them by the 3 primary models used to simulate atoms and molecules: quantum mechanics (QM), molecular mechanics (MM), and coarse-grained (CG) models. Initially, the projects investigate methods for scaling simulations to larger and more relevant chemical applications within the same resolution model of either QM, MM, or CG. However, the grand challenge lies in effectively bridging these scales, both spatially and temporally, to study richer chemical models that go beyond single-scale physics and toward hybrid QM/MM/CG models. This dissertation concludes with an analysis of the state of the art in multiscale computational chemistry, with an eye toward improving developer productivity on upcoming computer architectures, in which we require productive software environments, enhanced support for coupled scientific workflows, useful abstractions to aid with data transfer, adaptive runtime systems, and extreme scalability. This dissertation includes previously published and co-authored material, as well as unpublished co-authored material.
APA, Harvard, Vancouver, ISO, and other styles
45

Stefanek, Anton. "A high-level framework for efficient computation of performance-energy trade-offs in Markov population models." Thesis, Imperial College London, 2013. http://hdl.handle.net/10044/1/23931.

Full text
Abstract:
Internet scale applications such as search engines and social networks run their services on large-scale data centres consisting of tens of thousands of servers. These systems have to cope with explosive and highly variable user demand and maintain a high level of performance. At the same time, the energy consumption of a data centre is one of the major contributors to its operational cost. This embodies the performance-energy trade-off problem. We need to find configurations which minimise the energy consumed in running important applications in complex environments, but which also allow those applications to run reliably and fast. In this thesis, we develop a general performance-energy analysis framework that can be used to express complex behaviour in communicating systems and provide a rapid analysis of performance and energy goals. It is intended that this framework can be used both at design time, to predict long-run performance and energy consumption of an application in a large execution environment, and at run time, to make short-term predictions given current conditions of the environment. In both cases the rapid model analysis permits detailed what-if scenarios to be tested without the need for expensive experiments or time-consuming simulations. The major contributions of this thesis are: (i) development of the Population Continuous-Time Markov Chain (PCTMC) representation as a low-level abstraction for very large performance models, (ii) development of rapid ODE analysis techniques to compute performance-based Service Level Agreements (SLA) and reward-based energy metrics in PCTMCs, (iii) a hybrid extension of PCTMCs that allows models to incorporate continuous variables such as temperature and that permits the specification of systems with time-varying workloads, (iv) an extension of the GPEPA process algebra that can support session-based synchronisation between agents and that can be mapped to PCTMCs, thus giving access to the rapid ODE analysis. We support the framework with a software tool, GPA, which implements all the described formalisms and analysis techniques.
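
As a rough illustration of the kind of rapid ODE analysis such a framework builds on, a population CTMC can be approximated by mean-field equations dx/dt = sum over transitions of rate(x) times change vector. The toy client-server populations, rates and change vectors below are invented for illustration and are unrelated to the GPA tool:

```python
import numpy as np
from scipy.integrate import odeint

# Toy PCTMC: clients cycle thinking -> waiting -> thinking, servers cycle idle -> busy -> idle.
# State x = [clients_thinking, clients_waiting, servers_idle, servers_busy].
transitions = [
    # (rate function, population change vector)
    (lambda x: 0.5 * x[0],            np.array([-1, +1, 0, 0])),    # client issues a request
    (lambda x: 2.0 * min(x[1], x[2]), np.array([+1, -1, -1, +1])),  # an idle server picks up a request
    (lambda x: 1.0 * x[3],            np.array([0, 0, +1, -1])),    # a busy server finishes
]

def mean_field(x, t):
    """Right-hand side of the fluid (mean-field) ODE for the toy PCTMC."""
    return sum(rate(x) * change for rate, change in transitions)

x0 = np.array([100.0, 0.0, 10.0, 0.0])        # 100 clients, 10 servers
t = np.linspace(0.0, 20.0, 201)
trajectory = odeint(mean_field, x0, t)
print(trajectory[-1])                          # approximate long-run populations
```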
APA, Harvard, Vancouver, ISO, and other styles
46

Green, Robert C. II. "Novel Computational Methods for the Reliability Evaluation of Composite Power Systems using Computational Intelligence and High Performance Computing Techniques." University of Toledo / OhioLINK, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=toledo1338894641.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Elango, Venmugil. "Techniques for Characterizing the Data Movement Complexity of Computations." The Ohio State University, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=osu1452242436.

Full text
APA, Harvard, Vancouver, ISO, and other styles
48

Makgata, Katlego Webster. "Computational analysis and optimisation of the inlet system of a high-performance rally engine." Diss., Pretoria : [s.n.], 2005. http://upetd.up.ac.za/thesis/available/etd-01242006-123639.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

Setta, Mario. "Multiscale numerical approximation of morphology formation in ternary mixtures with evaporation : Discrete and continuum models for high-performance computing." Thesis, Karlstads universitet, Institutionen för matematik och datavetenskap (from 2013), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kau:diva-85036.

Full text
Abstract:
We propose three models to study morphology formation in interacting ternary mixtures with the evaporation of one component. Our models involve three distinct length scales: microscopic, mesoscopic and macroscopic. The real-world application we have in mind concerns charge transport through the heterogeneous structures arising in the fabrication of organic solar cells. As a first model, we propose a microscopic 3-spin lattice dynamics with short-range interactions between the considered species. This microscopic model is approximated numerically via a Metropolis-based Monte Carlo algorithm. We explore the effect of the model parameters (volatility of the solvent, system temperature, and interaction strengths) on the structure of the formed morphologies. Our second model is built upon the first one by introducing a new mesoscale corresponding to the size of block spins. The link between these two models, as well as between the effects of the model parameters and the formed morphologies, is studied in detail. These two models offer insight into cross-sections of the modeling box. Our third model encodes a macroscopic view of the evaporating mixture. We investigate its capability to lead to internal coherent structures. We propose a macroscopic system of nonlinearly coupled Cahn-Hilliard equations to capture numerical results for a top view of the modeling box. The effects of effective evaporation rates, effective interaction energy parameters, and the degree of polymerization on the desired morphology formation are explored via the computational platform FEniCS using a FEM approximation of a suitably linearized system. High-performance computing resources and Python-based parallel implementations have been used to facilitate the numerical approximation of the three models.
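
A minimal single-site Metropolis update for a toy three-species lattice might look as follows; the 2D lattice, the interaction matrix J, the inverse temperature and the species labels are illustrative assumptions only, not the thesis' actual 3D model with solvent evaporation:

```python
import numpy as np

rng = np.random.default_rng(1)
L = 64
J = np.array([[0.0, 1.0, 0.5],    # symmetric interaction energies between species
              [1.0, 0.0, 1.0],    # 0, 1 (two polymers) and 2 (solvent);
              [0.5, 1.0, 0.0]])   # the numbers are made up
beta = 2.0                         # inverse temperature
lattice = rng.integers(0, 3, size=(L, L))

def local_energy(lat, i, j, s):
    """Interaction energy of species s at site (i, j) with its 4 neighbours (periodic)."""
    return sum(J[s, lat[(i + di) % L, (j + dj) % L]]
               for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)))

def metropolis_sweep(lat):
    for _ in range(L * L):
        i, j = rng.integers(0, L, size=2)
        old, new = lat[i, j], rng.integers(0, 3)
        dE = local_energy(lat, i, j, new) - local_energy(lat, i, j, old)
        if dE <= 0 or rng.random() < np.exp(-beta * dE):   # Metropolis acceptance rule
            lat[i, j] = new

for _ in range(10):
    metropolis_sweep(lattice)
```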
APA, Harvard, Vancouver, ISO, and other styles
50

Tintó, Prims Oriol. "NEMO: computational challenges in ocean simulation." Doctoral thesis, Universitat Autònoma de Barcelona, 2019. http://hdl.handle.net/10803/669877.

Full text
Abstract:
Els oceans juguen un paper molt important modulant la temperatura de la Terra absorbint, emmagatzemant i transportant l'energia que ens arriba del sol. Entendre millor la dinàmica dels oceans pot ajudar a millorar les prediccions meteorològiques i a comprendre millor el clima, qüestions d'especial rellevància per la societat. Utilitzant ordinadors ha sigut possible resoldre numèricament les equacions que descriuen la dinàmica dels oceans, i millorant com els models oceànics exploten els recursos computacionals, podem reduir el cost de les simulacions alhora que fem possibles nous desenvolupaments que milloraràn la qualitat científica dels models. Enfrontant els reptes computacionals de la simulació oceànica podem contribuir en camps que tenen un impacte directe en la societat mentre reduïm el cost dels experiments. Per ser un dels principals models oceànics, la tesis s'ha centrat en el model NEMO. Per tal de millorar el rendiment dels models oceànics, un dels objectius inicials va ser entendre millor el seu comportament computacional. Per aconseguir-ho, es va proposar una metodologia d'anàlisis, posant especial atenció en les comunicacions entre processos. Utilitzada amb NEMO, va ajudar a resaltar diverses ineficiències en la implementació que, un cop sol·lucionades, van portar a una millora del 46-49% en la velocitat màxima del model, tot millorant la seva escalabilitat. Aquest resultat ilustra que aquest tipus d'anàlisis poden ajudar als desenvolupadors dels models a adaptar-los tot mostrant l'origen dels problemes que pateixen. Un altre dels problemes detectats va ser que l'impacte d'escollir una descomposició de domini concreta estava molt subestimat, ja que en certes circumstàncies el model triava una descomposició sub-optima. Tenint en compte els factors que fan que una descomposició concreta afecti el rendiment del model, es va proposar un mètode per fer una selecció òptima. Els resultats mostren que parant atenció a la descomposició no només es poden estalviar recursos sinó que la velocitat màxima del model també se'n beneficia, arribant al 41% de millora en alguns casos. Després dels èxits aconseguits en la primera part de la tesis, arribant a doblar la velocitat màxima del model, l'atenció es va posar sobre els algoritmes de precisió mixta. Idealment, un ús adequat de la precisió numèrica ha de permetre millorar el rendiment d'un model sense perjudicar-ne els resultats. Per tal d'aconseguir-ho en models oceànics, es va desenvolupar un mètode que permet determinar quina és la precisió necessària en cada una de las variables d'un codi informàtic. Utilitzat amb NEMO i ROMS va resultar que en ambdós models la major part de les variables pot utilitzar sense problema menys precisió que els 64-bits habituals, mostrant que potencialment els models oceànics es poden beneficiar molt d'una reducció de la precisió numèrica. Finalment, durant el desenvolupament de la tesi es va observar que degut a la no-linealitat dels models oceànics, determinar si un canvi en el codi informàtic perjudica la qualitat dels resultats esdevé molt complicat. Per solucionar-ho, es va presentar un mètode per verificar els resultats de models no-lineals. Encara que les contribucions que donen forma a aquesta tesis han sigut diverses, conjuntament han ajudat a identificar i combatre els reptes computacionals que afecten els models oceànics. Aquestes contribucions no només han resultat en quatre publicacions sinó que també han resultat en la contribució al codi informàtic de NEMO i del consorci EC-Earth. 
Per tant, els resultat de la recerca realitzada ja estan tenint un impacte positiu en la comunitat, ajudant als usuaris dels models a estalviar recursos i temps. A més a més, aquestes contribucions no només han ajudat a millorar significativament el rendiment computacional de NEMO sinó que han sobrepassat l'objectiu inicial de la tesis i poden ser transferibles a altres models computacionals.
The ocean plays a very important role in modulating the temperature of the Earth by absorbing, storing and transporting the energy that arrives from the sun. Better understanding the dynamics of the ocean can help us to better predict the weather and to better comprehend the climate, two topics of special relevance for society. Ocean models have become extremely useful tools, as they provide a framework upon which it is possible to build knowledge. Using computers, it became possible to numerically solve the fluid equations of the ocean, and by improving how ocean models exploit computational resources, we can reduce the cost of simulation whilst enabling new developments that will increase their skill. By facing the computational challenges of ocean simulation we can contribute to topics that have a direct impact on society whilst helping to reduce the cost of our experiments. This thesis has focused on NEMO (Nucleus for European Modelling of the Ocean), the major European ocean model and one of the main state-of-the-art ocean models worldwide. To find a way to improve the computational performance of ocean models, one of the initial goals was to better understand their computational behaviour. To do so, an analysis methodology was proposed, paying special attention to inter-process communication. Used with NEMO, the methodology helped to highlight several implementation inefficiencies, whose optimization led to a 46-49% gain in the maximum model throughput, increasing the scalability of the model. This result illustrated that this kind of analysis can significantly help model developers to adapt their code by highlighting where the problems really are. Another of the issues detected was that the impact of the domain decomposition was alarmingly underestimated, since in certain circumstances the model's algorithm was selecting a sub-optimal decomposition. Taking into account the factors that make a specific decomposition impact the performance, a method to select an optimal decomposition was proposed. The results showed that, by a wise selection of the domain decomposition, it was possible not only to save resources but also to increase the maximum model throughput, by 41% in some cases. After the successes achieved during the first part of the thesis, which allowed the maximum throughput of the model to be increased by a factor of more than two, the attention turned to mixed-precision algorithms. Ideally, a proper use of numerical precision would make it possible to improve the computational performance without sacrificing accuracy. In order to achieve that in ocean models, a method to find out the precision required by each of the real variables in a code was presented. The method was used with NEMO and with the Regional Ocean Modelling System (ROMS), showing that in both models most of the variables could use less than the standard 64-bit precision without problems. Last but not least, it was found that, ocean models being nonlinear, it was not straightforward to determine whether a change made to the code was deteriorating the accuracy of the model. In order to solve this problem, a method to verify the accuracy of a non-linear model was presented. Although the different contributions that gave form to this thesis have been diverse, they helped to identify and tackle computational challenges that affect computational ocean models. These contributions resulted in four peer-reviewed publications and many outreach activities.
Moreover, the research outcomes have reached the NEMO and EC-Earth consortium codes, having already helped model users to save resources and time. These contributions have not only significantly improved the computational performance of the NEMO model but have also surpassed the original scope of the thesis and would be easily transferable to other computational models.
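
A simplified sketch of the kind of reasoning behind choosing a domain decomposition: among all factorizations of the processor count, pick the one that minimizes the halo (communication) perimeter of each subdomain. This is a generic illustration with made-up grid sizes, not the selection method developed for NEMO in the thesis (which also accounts for other factors, such as land-only subdomains):

```python
def best_decomposition(nprocs, nx, ny):
    """Return the (pi, pj) processor grid that minimizes the halo perimeter
    of each subdomain for an nx x ny horizontal grid split over nprocs processes."""
    best, best_cost = None, float("inf")
    for pi in range(1, nprocs + 1):
        if nprocs % pi:
            continue
        pj = nprocs // pi
        sub_x = -(-nx // pi)          # ceiling division: subdomain width
        sub_y = -(-ny // pj)          # ceiling division: subdomain height
        cost = 2 * (sub_x + sub_y)    # halo cells exchanged per subdomain per step
        if cost < best_cost:
            best, best_cost = (pi, pj), cost
    return best

# Example with invented grid dimensions and 48 MPI processes.
print(best_decomposition(48, 362, 292))
```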
APA, Harvard, Vancouver, ISO, and other styles