Theses on the topic "MPI"
Create an accurate citation in APA, MLA, Chicago, Harvard, and other styles
Consult the 50 best theses for your research on the topic "MPI".
Next to each source in the reference list there is an "Add to bibliography" button. Click this button and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Vancouver, Chicago, etc.
You can also download the full text of the academic publication in PDF format and read its abstract online whenever it is available in the metadata.
Browse theses on a wide variety of disciplines and organize your bibliography correctly.
Kamal, Humaira. "FG-MPI : Fine-Grain MPI". Thesis, University of British Columbia, 2013. http://hdl.handle.net/2429/44668.
Ramesh, Srinivasan. "MPI Performance Engineering with the MPI Tools Information Interface". Thesis, University of Oregon, 2018. http://hdl.handle.net/1794/23779.
Massetto, Francisco Isidro. "Hybrid MPI - uma implementação MPI para ambientes distribuídos híbridos". Universidade de São Paulo, 2007. http://www.teses.usp.br/teses/disponiveis/3/3141/tde-08012008-100937/.
The increasing development of high performance applications is a reality nowadays. However, the diversity of computer architectures, including mono and multiprocessor machines, clusters with or without a front-end node, and the variety of operating systems and MPI implementations has grown steadily. Focused on this scenario, programming libraries that allow the integration of several MPI implementations, operating systems and computer architectures are needed. This thesis introduces HyMPI, an MPI implementation aimed at integrating, in a distributed high performance system, nodes with different architectures, clusters with or without a front-end machine, operating systems and MPI implementations. HyMPI offers a set of primitives based on the MPI specification, including point-to-point communication, collective operations, startup and finalization, and some other utility functions.
Subotic, Vladimir. "Evaluating techniques for parallelization tuning in MPI, OmpSs and MPI/OmpSs". Doctoral thesis, Universitat Politècnica de Catalunya, 2013. http://hdl.handle.net/10803/129573.
Parallel programming consists of dividing a computational problem among multiple processing units and defining how they interact (communication and synchronization) to guarantee a correct result. The performance of a parallel program is normally far from optimal: computational load imbalance and excessive interaction between processing units often cause lost cycles, reducing the efficiency of the parallel computation. In this thesis we propose techniques aimed at better exploiting the parallelism in parallel applications, with emphasis on techniques that increase asynchrony. In theory, these techniques promise multiple benefits. First, they should mitigate communication and synchronization delays and therefore increase overall performance. In addition, parallelization tuning should expose additional parallelism, increasing the scalability of the execution. Finally, an increase in asynchrony would provide greater tolerance to slow communication networks and external noise. In the first part of the thesis, we study the potential for parallelism tuning through MPI. Specifically, we explore automatic techniques for overlapping communication with computation. We propose a speculative messaging technique that increases overlap and requires no changes to the original MPI application. Our technique automatically identifies the application's MPI activity and reinterprets it using optimally placed non-blocking MPI requests. We show that this technique maximizes overlap and, consequently, speeds up execution and allows greater tolerance to bandwidth reductions. Even so, for realistic scientific workloads, we show that the overlap potential is significantly limited by the pattern in which each MPI process operates locally on message passing. In the second part of this thesis, we explore the potential for tuning hybrid MPI/OmpSs parallelism. We try to gain a better understanding of the parallelism of hybrid MPI/OmpSs applications in order to evaluate how they would run on future machines. We explore how MPI/OmpSs applications could scale on a parallel machine with hundreds of cores per node. In addition, we investigate how this per-node parallelism would be reflected in the constraints of the communication network. In particular, we focus on identifying critical code sections in MPI/OmpSs. We have devised a technique that quickly evaluates, for a given MPI/OmpSs application and a selected target machine, which code section should be optimized to obtain the greatest performance gain. We also study techniques for quickly exploring the potential OmpSs parallelism inherent in applications. We provide mechanisms to easily evaluate the potential parallelism of any task decomposition, and we describe an iterative approach to search for a task decomposition that exposes sufficient parallelism on a given target machine. Finally, we explore the potential for automating the iterative approach. In the work presented in this thesis we have designed tools that may be useful to other researchers in this field. The most advanced is Tareador, a tool to help migrate applications to the MPI/OmpSs programming model.
Tareador provides a simple interface for proposing a decomposition of the code into OmpSs tasks. Tareador also dynamically computes the data dependencies among the annotated tasks and automatically estimates the potential of OmpSs parallelization. Finally, Tareador gives additional hints on how to complete the migration process to OmpSs. Tareador has already proven useful through its inclusion in programming courses at UPC.
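For readers unfamiliar with the mechanism this abstract refers to, the sketch below illustrates the generic non-blocking MPI pattern that communication/computation overlap builds on. It is not the thesis's speculative messaging tool; the buffer size and ring-neighbour ranks are illustrative assumptions.

```c
/* Minimal sketch of communication/computation overlap with non-blocking MPI. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int N = 1 << 20;
    double *sendbuf = calloc(N, sizeof(double));
    double *recvbuf = calloc(N, sizeof(double));
    int right = (rank + 1) % size, left = (rank - 1 + size) % size;

    MPI_Request reqs[2];
    /* Post communication early ... */
    MPI_Irecv(recvbuf, N, MPI_DOUBLE, left, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(sendbuf, N, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

    /* ... and perform independent computation while the transfer is in flight. */
    double work = 0.0;
    for (int i = 0; i < N; i++)
        work += 0.5 * i;

    /* Block only when the exchanged data is actually needed. */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    if (rank == 0) printf("overlapped work result: %g\n", work);
    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}
```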
Träff, Jesper. "Aspects of the efficient implementation of the message passing interface (MPI)". Aachen Shaker, 2009. http://d-nb.info/994501803/04.
Young, Bobby Dalton. "MPI WITHIN A GPU". UKnowledge, 2009. http://uknowledge.uky.edu/gradschool_theses/614.
Angadi, Raghavendra. "Best effort MPI/RT as an alternative to MPI design and performance comparison /". Master's thesis, Mississippi State : Mississippi State University, 2002. http://library.msstate.edu/etd/show.asp?etd=etd-12032002-162333.
Sankarapandian, Dayala Ganesh R. Kamal Raj. "Profiling MPI Primitives in Real-time Using OSU INAM". The Ohio State University, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=osu1587336162238284.
Hoefler, Torsten. "Communication/Computation Overlap in MPI". Universitätsbibliothek Chemnitz, 2006. http://nbn-resolving.de/urn:nbn:de:swb:ch1-200600021.
Chung, Ryan Ki Sing. "CMCMPI : Compose-Map-Configure MPI". Thesis, University of British Columbia, 2014. http://hdl.handle.net/2429/51185.
Texto completoScience, Faculty of
Computer Science, Department of
Graduate
Mir, Taheri Seyed M. "Scalability of communicators in MPI". Thesis, University of British Columbia, 2011. http://hdl.handle.net/2429/33128.
Silva, Rafael Ennes. "Escalonamento estático de programas-MPI". Biblioteca Digital de Teses e Dissertações da UFRGS, 2006. http://hdl.handle.net/10183/11472.
A good performance of a parallel application is obtained according to the way the parallelization techniques are applied. To make use of these techniques, it is necessary to find an appropriate way to extract the parallelism. This extraction can be done through a representative graph of the application. In this work, graph partitioning methods are applied to optimize the communication between processes that belong to a parallel computation. In this context, the process allocation aims to minimize the amount of communication between processors. This technique is frequently adopted in High Performance Computing - HPC. However, the graph is generally built inside the program, which has private data structures employed in building it. The proposal is to utilize tools directly in MPI programs, employing only standard resources of the MPI 1.2 standard. The goal is to provide a portable library (b-MPI) to statically schedule MPI programs. The static scheduling realized by the library is done through the mapping of processes. This mapping seeks to cluster the processes that exchange a lot of information on the same machine, which decreases the data volume passed through the network. The mapping is done statically after a previous execution of the MPI program. The target applications for b-MPI are those that keep the same communication pattern across successive executions. The library is validated with the applications available in the FFTW package, the solution of a heat transfer problem through the Additive Schwarz Method and Multigrid, and the LU factorization implemented in the HPL benchmark. The results show that b-MPI can be utilized to distribute the processes efficiently, minimizing the volume of messages exchanged through the network.
Marjanović, Vladimir. "The MPI/OmpSs parallel programming model". Doctoral thesis, Universitat Politècnica de Catalunya, 2016. http://hdl.handle.net/10803/398135.
Supercomputers consist of a growing number of cores, currently on the order of millions, that communicate through a complex interconnection network. To obtain the highest possible performance, it is necessary to reduce the communication time between processes. MPI (Message Passing Interface), the most widely used programming model for large distributed-memory systems, allows asynchronous communication calls to overlap communication and computation. However, such calls are difficult to use and increase code complexity, requiring greater implementation effort and producing programs that are harder to read. This thesis presents a new programming model that allows the programmer to easily introduce the asynchrony needed to overlap communication and computation. The proposed programming model is based on MPI and on OmpSs, a task-based, shared-memory infrastructure. The thesis describes in depth the implementation details required for efficient interoperability between OmpSs and MPI. The hybrid use of MPI/OmpSs is demonstrated with several applications, of which the HPL benchmark is the most important. The hybrid MPI/OmpSs version significantly improves application performance with respect to the original MPI versions. In the case of HPL, it approaches asymptotic performance for relatively small problems and obtains significant improvements for large ones. In addition, the hybrid MPI/OmpSs version substantially reduces code complexity and is less affected by network bandwidth and operating system noise than the pure MPI version. This thesis also analyzes and compares other current methods for overlapping computation and collective communication, such as using point-to-point communication with additional communication threads. The thesis highlights the importance of understanding the characteristics of the computation that runs concurrently with the communication. The experimental results were obtained using the synthetic CCUBE (Communication Computation Concurrent) benchmark, developed in this thesis, in addition to HPL.
Tsai, Mike Yao Chen. "Hybrid design of MPI over SCTP". Thesis, University of British Columbia, 2007. http://hdl.handle.net/2429/32492.
Zhang, Wenbin. "Libra: Detecting Unbalance MPI Collective Calls". The Ohio State University, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=osu1313160584.
Cheng, Chih-Kai. "Java simulation of MPI collective communications". Leeds, 2001. http://www.leeds.ac.uk/library/counter2/compstmsc/20002001/cheng.pdf.
Florez-Larrahondo, German. "A trusted environment for MPI programs". Master's thesis, Mississippi State : Mississippi State University, 2002. http://library.msstate.edu/etd/show.asp?etd=etd-10172002-103135.
Mohror, Kathryn Marie. "Infrastructure For Performance Tuning MPI Applications". PDXScholar, 2004. https://pdxscholar.library.pdx.edu/open_access_etds/2660.
Ford, Corey. "Lazy Fault Detection for Redundant MPI". DigitalCommons@CalPoly, 2016. https://digitalcommons.calpoly.edu/theses/1561.
Gabriel, Edgar. "Erweiterung einer MPI-Umgebung zur Interoperabilität verteilter MPP-Systeme". [S.l.] : Universität Stuttgart, Zentrale Universitätseinrichtung (RUS, UB etc.), 1996. http://www.bsz-bw.de/cgi-bin/xvms.cgi?SWB6783410.
Cooper, Ian Michael. "MPI-style Web services : an investigation into the potential of using Web services for MPI-style applications". Thesis, Cardiff University, 2009. http://orca.cf.ac.uk/54979/.
Hoefler, Torsten, Mirko Reinhardt, Frank Mietke, Torsten Mehlan and Wolfgang Rehm. "Low Overhead Ethernet Communication for Open MPI on Linux Clusters". Universitätsbibliothek Chemnitz, 2006. http://nbn-resolving.de/urn:nbn:de:swb:ch1-200601112.
Nagel, Wolfgang E., Alfred Arnold, Michael Weber, Hans-Christian Hoppe and Karl Solchenbach. "VAMPIR: Visualization and Analysis of MPI Resources". Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2010. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-26639.
Kubiš, Milan. "Optimalizace sběrného výfukového potrubí Škoda 1,2 MPI". Master's thesis, Vysoké učení technické v Brně. Fakulta strojního inženýrství, 2015. http://www.nusl.cz/ntk/nusl-230446.
Grabowsky, L., Th. Ermer and J. Werner. "Nutzung von MPI für parallele FEM-Systeme". Universitätsbibliothek Chemnitz, 1998. http://nbn-resolving.de/urn:nbn:de:bsz:ch1-199801365.
Nakashima, Raul Junji. "Paralelização de programas sisal para sistemas MPI". Universidade de São Paulo, 1996. http://www.teses.usp.br/teses/disponiveis/76/76132/tde-06052008-105502/.
This work describes a method for the partial parallelization of SISAL programs into programs with calls to MPI routines. We focused on the parallelization of the forall loop (through slicing of the index range). The generated code is a master/slave SPMD program. The work was validated through the compilation of some simple SISAL programs and comparison of the results with an unmodified version.
Ignatenko, S. N. and S. A. Petrov. "Application of mpi technology for allocated calculations". Thesis, Вид-во СумДУ, 2009. http://essuir.sumdu.edu.ua/handle/123456789/17001.
Kühnemann, Matthias, Thomas Rauber and Gudula Rünger. "Optimizing MPI Collective Communication by Orthogonal Structures". Universitätsbibliothek Chemnitz, 2007. http://nbn-resolving.de/urn:nbn:de:swb:ch1-200701061.
Kazilas, Panagiotis. "Augmenting MPI Programming Process with Cognitive Computing". Thesis, Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-88913.
Wang, Liqiang. "An Efficient Platform for Large-Scale MapReduce Processing". ScholarWorks@UNO, 2009. http://scholarworks.uno.edu/td/963.
Almeida, Alexandre Vinicius. "Uso de auto-tuning para otimização de decomposição de domínios paralela". Biblioteca Digital de Teses e Dissertações da UFRGS, 2011. http://hdl.handle.net/10183/39121.
Achieving the peak performance level of a particular platform requires technical knowledge of the hardware environment involved, since the software must exploit specific details inherent to the hardware. Once the software is optimized for a target platform, if the hardware evolves or is changed, the software would probably not be as efficient in the new environment. This performance portability problem is addressed by software auto-tuning, which emerged in the past decade as an automated technique to adapt a particular piece of software to an underlying hardware. The adaptation is performed by an auto-tuner, an entity that empirically adjusts specific application parameters in order to improve overall application performance, or even generates source code optimized for the target platform. This dissertation proposes an auto-tuner to optimize the domain decomposition of a parallel application that performs stencil computations. The proposed auto-tuner works in a parameterized adaptation fashion, varying the dimensions of a 2D domain, the number of parallel processes and the extension of the overlapping zones between subdomains. For each combination of parameter values, the auto-tuner probes the application on the parallel architecture in order to find the best combination of values. To make auto-tuning possible, a C++ class called Mesh is proposed, based on the Message Passing Interface (MPI) standard. The role of this class is to abstract the domain decomposition from the application using the object orientation facilities provided by C++, and also to enable the extension of the overlapping zones between subdomains. The experimental results showed that the performance gains were mainly due to the variation of the number of processes, which was one of the application factors handled by the auto-tuner. The parallel architecture used in the experiments proved not to be adequate for optimizing the domain decomposition by increasing the overlapping zone extension.
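As a rough illustration of the empirical probing such a parameterized auto-tuner performs, the hedged sketch below times a stand-in kernel for several candidate overlap widths and keeps the fastest; run_stencil() and the candidate values are hypothetical placeholders, not the dissertation's Mesh class.

```c
/* Sketch of an empirical auto-tuning loop: probe each candidate configuration,
 * time it, keep the fastest. */
#include <mpi.h>
#include <stdio.h>

/* Hypothetical stand-in for the real stencil kernel exercised by the auto-tuner. */
static double run_stencil(int overlap_width) {
    double x = 0.0;
    for (int i = 0; i < 1000000; i++)
        x += i % (overlap_width + 1);
    return x;
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int candidates[] = {1, 2, 4, 8};
    int best = candidates[0];
    double best_time = 1e30;

    for (int i = 0; i < 4; i++) {
        MPI_Barrier(MPI_COMM_WORLD);               /* start all ranks together */
        double t0 = MPI_Wtime();
        run_stencil(candidates[i]);
        double local = MPI_Wtime() - t0, elapsed;
        /* The slowest rank determines the configuration's runtime. */
        MPI_Allreduce(&local, &elapsed, 1, MPI_DOUBLE, MPI_MAX, MPI_COMM_WORLD);
        if (elapsed < best_time) { best_time = elapsed; best = candidates[i]; }
    }
    if (rank == 0) printf("chosen overlap width: %d\n", best);
    MPI_Finalize();
    return 0;
}
```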
Dickov, Branimir. "MPI layer techniques to improve network energy efficiency". Doctoral thesis, Universitat Politècnica de Catalunya, 2015. http://hdl.handle.net/10803/334181.
In recent years, the energy consumption of the interconnection network has come to be regarded as one of the factors that may constrain the race towards Exascale systems. Within the interconnection network, most of this energy consumption is due to the network links, whose consumption remains constant regardless of whether data are actively being exchanged, since both ends must remain active to maintain synchronization. This thesis complements the research efforts currently being carried out internationally with the goal of reducing power and achieving energy consumption proportional to the bandwidth required by communication. Two complementary directions towards these goals are considered: on the one hand, using only the bandwidth needed during communication phases; on the other, using low-power modes during computation phases in which the interconnection network is not required. To address the first, we investigate the potential benefits of compressing the data transferred in MPI messages. When this is possible, communication can be carried out with lower link bandwidth requirements without necessarily penalizing application performance. Several compression techniques have been proposed in the literature with the goal of reducing communication time and improving the scalability of parallel applications. Although these techniques have shown significant potential in certain computational kernels, they have not been adopted in real systems. This thesis shows how data compression in MPI messages can reduce energy consumption by reducing the number of active links required for communication, in proportion to the reduction in the number of bytes that must be transferred. In general, application developers regard time spent in communication as unnecessary overhead and therefore strive to keep it to a minimum. This leads to a demand for bandwidth that can handle traffic peaks and for sensitivity to latency, but with low average utilization, which offers significant opportunities for energy savings. It is therefore possible to save energy by relying on low-power modes, provided that link reactivation latencies do not cause a loss of performance. This thesis proposes a mechanism that accurately predicts link idle periods, allowing links to be switched to the most energy-efficient mode the network infrastructure provides. The proposal operates at runtime and is called the Pattern Prediction System (PPS). PPS accurately predicts not only when a link becomes unused, but also when it will be needed again, allowing links to enter a low-power mode during idle periods and to become active again in time, avoiding significant performance degradation. Many HPC (High-Performance Computing) applications can benefit from this prediction, since they have repetitive computation and communication phases.
By implementing the energy-saving mechanisms inside the MPI library, existing MPI programs require no modification. The thesis also develops a more advanced version of the prediction system, called the Pattern Prediction System with Automatic Adjustments (SPPA), which additionally self-tunes one of the important PPS parameters, the one that determines the degree of message aggregation in the prediction algorithm.
Hagen, Knut Imar. "Fault-tolerance for MPI Codes on Computational Clusters". Thesis, Norwegian University of Science and Technology, Department of Computer and Information Science, 2007. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-8728.
This thesis focuses on fault-tolerance for MPI codes on computational clusters. When an application runs on a very large cluster with thousands of processors, it is likely that a process crashes due to a hardware or software failure. Fault-tolerance is the ability of a system to respond gracefully to an unexpected hardware or software failure. A test application which is meant to run for several weeks on several nodes is used in this thesis. The application is a seismic MPI application, written in Fortran90, provided by Statoil, who wanted a fault-tolerant implementation. The original test application had no degree of fault-tolerance: if one process or one node crashed, the entire application also crashed. In this thesis, a collection of fault-tolerance techniques is analysed, including checkpointing, MPI error handlers, extending MPI, replication, fault detection, atomic clocks and multiple simultaneous failures. Several MPI implementations are described, such as MPICH1, MPICH2, LAM/MPI and Open MPI. Next, some fault-tolerant products developed at other universities are described, such as FT-MPI, FEMPI, MPICH-V including its five protocols, the fault-tolerant functionality of Open MPI, and MPI error handlers. A fault-tolerant simulator which simulates the application's behaviour is developed. The simulator uses two fault-tolerance methods: FT-MPI and MPI error handlers. Next, the test application is similarly made fault-tolerant with FT-MPI using three proposed approaches: MPI_Reduce(), MPI_Barrier(), and the final and current implementation, MPI Loop. Tests of the MPI Loop implementation are run on a small and a large cluster to verify the fault-tolerant behaviour. The seismic application survives a crash of n-2 nodes/processes: process number 0 must stay alive since it acts as an I/O server, and there must be at least one process left to compute data. Processes can also be restarted rather than left out, but the test application needs to be modified to support this.
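One of the building blocks named above, MPI error handlers, can be illustrated with a minimal hedged sketch: switching a communicator from the default MPI_ERRORS_ARE_FATAL to MPI_ERRORS_RETURN lets a failing call report an error instead of aborting the job. The deliberately invalid destination rank is only a stand-in for a real failure.

```c
/* Minimal sketch of the MPI error-handler mechanism. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Sending to rank 'size' is out of range, so the call reports an error. */
    int err = MPI_Send(&rank, 1, MPI_INT, size, 0, MPI_COMM_WORLD);
    if (err != MPI_SUCCESS) {
        char msg[MPI_MAX_ERROR_STRING];
        int len;
        MPI_Error_string(err, msg, &len);
        fprintf(stderr, "rank %d handled error: %s\n", rank, msg);
        /* A real fault-tolerant code would retry, exclude the peer or roll back. */
    }
    MPI_Finalize();
    return 0;
}
```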
Karlbom, David. "A Performance Evaluation of MPI Shared Memory Programming". Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-188676.
In this thesis we investigate the Message Passing Interface (MPI) support for shared memory programming on modern hardware architectures with multiple Non-Uniform Memory Access (NUMA) domains. We investigate performance through two case studies: matrix-matrix multiplication and Conway's game of life. We compare the performance of MPI shared memory, in terms of execution time and memory consumption, against OpenMP and MPI point-to-point communication, also known as MPI two-sided. We perform strong scaling tests for both case studies. We observe that MPI two-sided is 21% faster than MPI shared and 18% faster than OpenMP for matrix-matrix multiplication when 32 processors are used. For the same test data, MPI shared has 45% lower memory consumption than MPI two-sided. For Conway's game of life, MPI two-sided is 10% faster than MPI shared and 82% faster than the OpenMP implementation when 32 processors are used. We could also discern that if the virtual memory is not mapped to a specific NUMA domain, the execution time increases by up to 64% when 32 processors are used. We conclude that MPI shared memory is useful for intra-node communication on modern hardware architectures with multiple NUMA domains.
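For context, the "MPI shared" model evaluated here refers to MPI-3 shared-memory windows. A minimal sketch of the mechanism, with an arbitrary one-double-per-rank window as an illustrative assumption, might look as follows.

```c
/* Minimal sketch of MPI-3 shared memory: split a per-node communicator,
 * allocate a shared window, and read a neighbour's slot directly from memory. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    MPI_Comm node;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node);
    int nrank;
    MPI_Comm_rank(node, &nrank);

    MPI_Win win;
    double *mine;                         /* each rank contributes one double */
    MPI_Win_allocate_shared(sizeof(double), sizeof(double), MPI_INFO_NULL,
                            node, &mine, &win);

    MPI_Win_lock_all(0, win);
    mine[0] = 100.0 + nrank;              /* write into my own slot            */
    MPI_Win_sync(win);
    MPI_Barrier(node);                    /* make the writes visible node-wide */

    if (nrank != 0) {                     /* read rank 0's slot without messages */
        MPI_Aint sz; int disp; double *peer;
        MPI_Win_shared_query(win, 0, &sz, &disp, &peer);
        printf("node rank %d reads %.1f from rank 0\n", nrank, peer[0]);
    }
    MPI_Win_unlock_all(win);

    MPI_Win_free(&win);
    MPI_Comm_free(&node);
    MPI_Finalize();
    return 0;
}
```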
Sihota, Amit Kaur. "Conjugate gradient methods using MPI for distributed systems". Thesis, McGill University, 2004. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=81569.
Aguilar, Xavier. "Towards Scalable Performance Analysis of MPI Parallel Applications". Licentiate thesis, KTH, High Performance Computing and Visualization (HPCViz), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-165043.
Saifi, Mohamad Maamoun El. "PMPI: uma implementação MPI multi-plataforma, multi-linguagem". Universidade de São Paulo, 2006. http://www.teses.usp.br/teses/disponiveis/3/3141/tde-08122006-154811/.
This dissertation describes PMPI, an implementation of the MPI standard on a heterogeneous platform. Unlike other MPI implementations, PMPI allows an MPI computation to run on a multi-platform system. In addition, PMPI permits programs executing on different nodes to be written in different programming languages. PMPI is built on top of the .NET framework. With PMPI, nodes call MPI functions that are transparently executed on the participating nodes across the network. PMPI can span multiple administrative domains distributed geographically; to programmers, the grid looks like a local MPI computation, and the model of computation is indistinguishable from that of a standard MPI computation. This dissertation studies the implementation of PMPI with the Microsoft .NET framework and the Mono framework to provide a common layer for a multi-language, multi-platform MPI library. Results obtained from tests running PMPI on a heterogeneous system are analyzed; they show that the PMPI implementation is feasible and has many advantages that can be explored.
Michel, Martial. "Contribution au transfert de données : application à MPI". Nancy 1, 2001. http://www.theses.fr/2001NAN10197.
MPI is a standard that defines a library based upon the message passing concept, aimed at solving parallel problems by allowing direct communication between tasks. MPI has many basic data-types available, but the creation of data-types composed of other basic data-types is a long and repetitive process, eased by AutoMap. MPI has no mechanism for automatic transfer of pointer-linked data-types: AutoLink is a library developed to answer the adaptation and transfer needs of such a mechanism. Those tools compose the MPI Data-Types Tools, and are the work discussed in this PhD thesis. We present the evolution of the inner mechanisms as well as the algorithms, describe the way AutoLink produces serialized data to be transferred using buffers to enhance communication, present experimental performance studies, explain uses of the MPI Data-Types Tools, compare them with other similar tools, and finally introduce possible extensions to the current algorithms.
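The kind of repetitive derived-datatype construction that a generator such as AutoMap automates can be sketched as follows; the Particle struct is an illustrative assumption, not a type from the thesis.

```c
/* Sketch of manual MPI derived-datatype construction for a C struct. */
#include <mpi.h>
#include <stddef.h>

typedef struct {
    int    id;
    double pos[3];
} Particle;

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    /* One block per struct member: block lengths, byte offsets and base types. */
    int          lengths[2] = {1, 3};
    MPI_Aint     displs[2]  = {offsetof(Particle, id), offsetof(Particle, pos)};
    MPI_Datatype types[2]   = {MPI_INT, MPI_DOUBLE};

    MPI_Datatype particle_type;
    MPI_Type_create_struct(2, lengths, displs, types, &particle_type);
    MPI_Type_commit(&particle_type);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    Particle p = {rank, {0.0, 1.0, 2.0}};
    /* The committed type is usable wherever MPI expects a datatype,
       e.g. broadcasting one Particle from rank 0: */
    MPI_Bcast(&p, 1, particle_type, 0, MPI_COMM_WORLD);

    MPI_Type_free(&particle_type);
    MPI_Finalize();
    return 0;
}
```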
Liu, Jiuxing. "Designing high performance and scalable MPI over InfiniBand". The Ohio State University, 2004. http://rave.ohiolink.edu/etdc/view?acc_num=osu1095296555.
Varia, Siddharth. "REGULARIZED MARKOV CLUSTERING IN MPI AND MAP REDUCE". The Ohio State University, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=osu1374153215.
Mosch, Marek Höfler Torsten. "Entwicklung einer optimierten kollektiven Komponente für Open MPI". [S.l. : s.n.], 2007.
Ribeiro, Hethini do Nascimento. "Paralelização do algoritmo DIANA com OpenMP e MPI". Universidade Estadual Paulista (UNESP), 2018. http://hdl.handle.net/11449/157280.
Texto completoRejected by Elza Mitiko Sato null (elzasato@ibilce.unesp.br), reason: Solicitamos que realize correções na submissão seguindo as orientações abaixo: Problema 01) A FICHA CATALOGRÁFICA (Obrigatório pela ABNT NBR14724) está desconfigurada e falta número do CDU. Problema 02) Falta citação nos agradecimentos, segundo a Portaria nº 206, de 4 de setembro de 2018, todos os trabalhos que tiveram financiamento CAPES deve constar nos agradecimentos a expressão: "O presente trabalho foi realizado com apoio da Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Código de Financiamento 001 Problema 03) Falta o ABSTRACT (resumo em língua estrangeira), você colocou apenas o resumo em português. Problema 04) Na lista de tabelas, a página referente a Tabela 9 está desconfigurada. Problema 05) A cidade na folha de aprovação deve ser Bauru, cidade onde foi feita a defesa. Bauru 31 de agosto de 2018 Problema 06) A paginação deve ser sequencial, iniciando a contagem na folha de rosto e mostrando o número a partir da introdução, a ficha catalográfica ficará após a folha de rosto e não deverá ser contada. OBS:-Estou encaminhando via e-mail o template/modelo das páginas pré-textuais para que você possa fazer as correções da paginação, sugerimos que siga este modelo pois ele contempla as normas da ABNT Lembramos que o arquivo depositado no repositório deve ser igual ao impresso, o rigor com o padrão da Universidade se deve ao fato de que o seu trabalho passará a ser visível mundialmente. Agradecemos a compreensão on 2018-10-09T14:18:32Z (GMT)
Submitted by HETHINI DO NASCIMENTO RIBEIRO (hethini.ribeiro@outlook.com) on 2018-10-10T00:30:40Z No. of bitstreams: 1 Dissertação_hethini_corrigido.pdf: 1570340 bytes, checksum: a42848ab9f1c4352dcef8839391827a7 (MD5)
Approved for entry into archive by Elza Mitiko Sato null (elzasato@ibilce.unesp.br) on 2018-10-10T14:37:37Z (GMT) No. of bitstreams: 1 ribeiro_hn_me_sjrp.pdf: 1566499 bytes, checksum: 640247f599771152e290426a2174d30f (MD5)
Made available in DSpace on 2018-10-10T14:37:37Z (GMT). No. of bitstreams: 1 ribeiro_hn_me_sjrp.pdf: 1566499 bytes, checksum: 640247f599771152e290426a2174d30f (MD5) Previous issue date: 2018-08-31
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Earlier in this decade there were about 5 billion phones in use generating data. This global production increased by approximately 40% per year at the beginning of the last decade. These large datasets that can be captured, communicated, aggregated, stored and analyzed, also called Big Data, are posing inevitable challenges in many areas, and in particular in the Machine Learning field. Machine Learning algorithms are able to extract useful information from these large data repositories, and for this reason their study is becoming increasingly important. The programs that can perform this task are known as classification and clustering algorithms. These applications are computationally expensive. To cite some examples of this cost, the Quality Threshold Clustering algorithm has, in the worst case, complexity O(n⁵). The hierarchical algorithms AGNES and DIANA, in turn, have O(n²) and O(2ⁿ) respectively. Thus, there is a great challenge, which is to process large amounts of data in a realistic period of time, encouraging the development of parallel algorithms that fit the volume of data. The objective of this work is to present the parallelization of the DIANA divisive hierarchical algorithm. The algorithm was implemented with MPI and OpenMP, running up to three times faster than the single-processor version, showing that although distributed-memory environments require synchronization and message exchange, for a sufficient degree of parallelism this type of optimization is advantageous for this algorithm.
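A generic skeleton of the hybrid MPI+OpenMP structure described above, with MPI distributing blocks across processes and OpenMP parallelizing the loop inside each process, might look like the sketch below; the arithmetic is a stand-in for the real per-point work, not the DIANA algorithm itself.

```c
/* Generic hybrid MPI+OpenMP skeleton: one MPI process per block, OpenMP threads
 * inside each process. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int N = 1000000;
    int chunk = N / size;            /* each process owns one block (remainder ignored) */
    double local_sum = 0.0;

    #pragma omp parallel for reduction(+:local_sum)
    for (int i = rank * chunk; i < (rank + 1) * chunk; i++)
        local_sum += (double)i * i;  /* stand-in for per-point distance work */

    double global_sum;
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("global sum = %.0f (threads per process: %d)\n",
               global_sum, omp_get_max_threads());

    MPI_Finalize();
    return 0;
}
```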
Hoefler, Torsten. "Fast Barrier Synchronization for InfiniBand". Universitätsbibliothek Chemnitz, 2006. http://nbn-resolving.de/urn:nbn:de:swb:ch1-200600019.
Šeinauskas, Vytenis. "Lygiagrečių programų efektyvumo tyrimas". Master's thesis, Lithuanian Academic Libraries Network (LABT), 2008. http://vddb.library.lt/obj/LT-eLABa-0001:E.02~2008~D_20080811_151827-94348.
Parallel program execution is often used to overcome the constraints of processing speed and memory size when executing complex and time-consuming algorithms. The downside to this approach is the increased overall complexity of programs and their implementations. Parallel execution introduces a new class of software bugs and performance shortcomings that are usually difficult to trace using traditional methods and tools. Hence, new tools and methods need to be introduced which deal specifically with problems encountered in parallel programs. The goal of this project is the development of an MPI-based parallel program performance monitoring tool and research into the ways this tool can be used for measuring, comparing and improving the performance of target programs.
Sehrish, Saba. "IMPROVING PERFORMANCE AND PROGRAMMER PRODUCTIVITY FOR I/O-INTENSIVE HIGH PERFORMANCE COMPUTING APPLICATIONS". Doctoral diss., University of Central Florida, 2010. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/3300.
Cera, Marcia Cristina. "Providing adaptability to MPI applications on current parallel architectures". Biblioteca Digital de Teses e Dissertações da UFRGS, 2012. http://hdl.handle.net/10183/55464.
Currently, adaptability is a desired feature in parallel applications. For instance, the increasing number of users competing for resources of parallel architectures causes dynamic changes in the set of available processors. Adaptive applications are able to execute using a set of volatile processors, providing better resource utilization. This adaptive behavior is known as malleability. Another example comes from the constant evolution of multi-core architectures, which increases the number of cores with each new generation of chips. Adaptability is the key to allowing parallel program portability from one multi-core machine to another, so that parallel programs can adapt the unfolding of their parallelism to the specific degree of parallelism of the target architecture. This adaptive behavior can be seen as a particular case of evolutivity. In this sense, this thesis is focused on: (i) malleability, to adapt the execution of parallel applications to changes in processor availability; and (ii) evolutivity, to adapt the unfolding of the parallelism at runtime to the architecture and input data properties. Thus, the open issue is "How to provide and support adaptive applications?". This thesis aims to answer this question taking into account MPI (Message-Passing Interface), which is the standard parallel API for HPC in distributed-memory environments. Our work is based on the MPI-2 features that allow spawning processes at runtime, adding some flexibility to MPI applications. Malleable MPI applications use dynamic process creation to expand themselves in growth actions (to use further processors). The shrinkage actions (to release processors) end the execution of the MPI processes on the processors to be released in such a way that the application's data are preserved. Notice that malleable applications require runtime environment support to execute, since they must be notified about processor availability. Evolving MPI applications follow the explicit task parallelism paradigm to allow their runtime adaptation. Thus, dynamic process creation is used to unfold the parallelism, i.e., to create new MPI tasks on demand. To provide these applications we defined abstract MPI tasks, implemented the synchronization among these tasks through message exchanges, and proposed an approach to adjust MPI task granularity aiming at efficiency in distributed-memory environments. Experimental results validated our hypothesis that adaptive applications can be provided using the MPI-2 features. Additionally, this thesis identifies the requirements to support these applications in cluster environments. Malleable MPI applications were able to improve cluster utilization, and the explicit-task ones were able to adapt the unfolding of their parallelism to the target architecture, showing that this programming paradigm can also be efficient in distributed-memory contexts.
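The MPI-2 feature this abstract relies on is dynamic process creation via MPI_Comm_spawn. A minimal hedged sketch is shown below; the "./worker" executable name and the fixed count of four spawned processes are illustrative assumptions, and a real malleable code would derive them from a resource-availability notification.

```c
/* Minimal sketch of MPI-2 dynamic process creation (growth action). */
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Comm children;
    int errcodes[4];
    /* Grow the application by four worker processes at runtime. */
    MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 4, MPI_INFO_NULL, 0,
                   MPI_COMM_WORLD, &children, errcodes);

    /* Parents and children now share an intercommunicator; rank 0 acts as the
       root of a broadcast towards the spawned group. */
    int task = 42;
    MPI_Bcast(&task, 1, MPI_INT, rank == 0 ? MPI_ROOT : MPI_PROC_NULL, children);

    MPI_Comm_disconnect(&children);
    MPI_Finalize();
    return 0;
}
```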
Attari, Sanya. "An Investigation of I/O Strategies for MPI Workloads". Thesis, Virginia Tech, 2010. http://hdl.handle.net/10919/36212.
Grass, Max. "Parallelisierung einer hybriden Partikel/Finite-Volumen Simulationsplattform mittels MPI". Zürich : ETH, Eidgenössische Technische Hochschule Zürich, Institut für Fluiddynamik, 2006. http://e-collection.ethbib.ethz.ch/show?type=dipl&nr=240.
Castellanos, Carrazana Abel. "Performance model for hybrid MPI+OpenMP master/worker applications". Doctoral thesis, Universitat Autònoma de Barcelona, 2014. http://hdl.handle.net/10803/283403.
In the current environment, various branches of science need auxiliary high-performance computing to obtain results in relatively short times. This is mainly due to the high volume of information that needs to be processed and the computational cost demanded by these calculations. The benefit of performing this processing using distributed and parallel programming mechanisms is that it achieves shorter waiting times in obtaining the results. To support this, there are basically two widespread programming models: the message passing model based on the standard MPI libraries, and the shared memory model using OpenMP. Hybrid applications are those that combine both models in order to exploit the specific parallelism potential of each one in each case. Unfortunately, experience has shown that using this combination of models does not necessarily guarantee an improvement in the behavior of applications. There are several parameters that must be considered to determine the configuration of the application that provides the best execution time. The number of processes to use, the number of threads on each node, the data distribution among processes and threads, and so on, are parameters that seriously affect the performance of the application. On the one hand, the appropriate value of such parameters depends on the architectural features of the system (communication latency, communication bandwidth, cache memory size and architecture, computing capabilities, etc.), and, on the other hand, on the features of the application. The main contribution of this thesis is a novel technique for predicting the performance and efficiency of hybrid parallel Master/Worker applications. This technique is known in the field of machine learning as model-based regression trees. The experimental results obtained allow us to be optimistic about the use of this algorithm for predicting both metrics and selecting the best application execution parameters.
Yu, Weikuan. "Enhancing MPI with modern networking mechanisms in cluster interconnects". Columbus, Ohio : Ohio State University, 2006. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1150470374.