Dissertations / Theses on the topic 'Shared-memory parallel programming'
Consult the top 16 dissertations / theses for your research on the topic 'Shared-memory parallel programming.'
Ravela, Srikar Chowdary. "Comparison of Shared memory based parallel programming models." Thesis, Blekinge Tekniska Högskola, Sektionen för datavetenskap och kommunikation, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-3384.
From this study it is clear that the threading model, Pthreads, is the dominant programming model, supporting high speedups for two of the three dwarfs. The tasking models, on the other hand, dominate in development time and in reducing the number of errors, showing high growth in speedup for applications without communication and lower growth in self-relative speedup for applications that involve communication. The tasking models' performance degrades on communication-based problems because task-based models are designed to execute tasks in parallel without interruption or preemption during their computation; introducing communication violates this design and thereby reduces performance. The directive model, OpenMP, is moderate in both aspects and stands between these models. In general, the directive and tasking models offer better speedup than the other models for task-based problems built on the divide-and-conquer strategy. For data parallelism, the speedup growth achieved is low (i.e. they are less scalable for data-parallel applications), but their execution times remain comparable to the threading models. Development times are also considerably lower for data-parallel applications, because these models ease development by requiring fewer functional routines to parallelize an application. This thesis is concerned with comparing shared memory based parallel programming models in terms of speedup, and can serve as a hands-on guide for programmers developing applications under these models. We suggest that this work can be extended in two ways: from the developer's perspective, and as a cross-referential study of the parallel programming models. The former can be done by having a different programmer repeat a similar study and comparing it with this one; the latter by including multiple data points in the same programming model or by using a different set of parallel programming models.
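To make the compared model families concrete, here is a minimal sketch of our own (not code from the thesis; the reduction kernel and the grain size are assumptions): the same sum written in the directive style (an OpenMP parallel for) and in the divide-and-conquer tasking style that the study found strongest for communication-free problems.

```c
/* Minimal sketch, not thesis code: one reduction in the directive model
 * and in the tasking model. Compile with: gcc -fopenmp sum.c */
#include <stdio.h>

#define N 1000000

/* Directive model: a single pragma parallelizes the loop. */
static double sum_directive(const double *a, int n) {
    double s = 0.0;
    #pragma omp parallel for reduction(+:s)
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Tasking model: divide and conquer, recursing into tasks.
 * The 10000-element grain size is an assumption. */
static double sum_tasks(const double *a, int lo, int hi) {
    if (hi - lo < 10000) {
        double s = 0.0;
        for (int i = lo; i < hi; i++) s += a[i];
        return s;
    }
    int mid = lo + (hi - lo) / 2;
    double left, right;
    #pragma omp task shared(left)
    left = sum_tasks(a, lo, mid);
    right = sum_tasks(a, mid, hi);
    #pragma omp taskwait          /* join the spawned half */
    return left + right;
}

int main(void) {
    static double a[N];
    for (int i = 0; i < N; i++) a[i] = 1.0;
    printf("directive: %f\n", sum_directive(a, N));
    double s;
    #pragma omp parallel
    #pragma omp single            /* one thread seeds the task tree */
    s = sum_tasks(a, 0, N);
    printf("tasks:     %f\n", s);
    return 0;
}
```

The tasking version mirrors the divide-and-conquer structure discussed above; note that nothing in it communicates between tasks, which is exactly the case where the study saw tasking models do best.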
Schneider, Scott. "Shared Memory Abstractions for Heterogeneous Multicore Processors." Diss., Virginia Tech, 2010. http://hdl.handle.net/10919/30240.
Ph. D.
Stoker, Michael Allan. "The exploitation of parallelism on shared memory multiprocessors." Thesis, University of Newcastle Upon Tyne, 1990. http://hdl.handle.net/10443/2000.
Karlbom, David. "A Performance Evaluation of MPI Shared Memory Programming." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-188676.
In this thesis we investigate the Message Passing Interface (MPI) support for shared memory programming on modern hardware architectures with multiple Non-Uniform Memory Access (NUMA) domains. We examine performance through two case studies: matrix-matrix multiplication and Conway's game of life. We compare the performance of MPI shared, in terms of execution time and memory consumption, against OpenMP and MPI point-to-point communication, also known as MPI two-sided. We perform strong scaling tests for both case studies. We observe that MPI two-sided is 21% faster than MPI shared and 18% faster than OpenMP for matrix-matrix multiplication on 32 processors. For the same test data, MPI shared has 45% lower memory consumption than MPI two-sided. For Conway's game of life, MPI two-sided is 10% faster than MPI shared and 82% faster than the OpenMP implementation on 32 processors. We also found that not mapping the virtual memory to a specific NUMA domain can increase execution time by up to 64% on 32 processors. We conclude that MPI shared is useful for intra-node communication on modern hardware architectures with multiple NUMA domains.
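For readers unfamiliar with the term "MPI shared", the sketch below shows the MPI-3 shared memory mechanism the thesis evaluates (our illustration with an assumed ring-read pattern, not code from the thesis): ranks on the same node allocate a common window and read each other's data directly instead of exchanging messages.

```c
/* Minimal MPI-3 shared memory sketch (our illustration). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    /* Group the ranks that can physically share memory (one node). */
    MPI_Comm node;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node);
    int rank, size;
    MPI_Comm_rank(node, &rank);
    MPI_Comm_size(node, &size);

    /* Each rank contributes one slot to a node-wide shared window. */
    double *mine;
    MPI_Win win;
    MPI_Win_allocate_shared(sizeof(double), sizeof(double),
                            MPI_INFO_NULL, node, &mine, &win);

    MPI_Win_lock_all(MPI_MODE_NOCHECK, win);
    *mine = (double)rank;    /* plain store, no message */
    MPI_Win_sync(win);       /* make the store visible */
    MPI_Barrier(node);       /* wait until everyone wrote */

    /* Read a neighbour's slot through a queried direct pointer. */
    MPI_Aint qsize; int qdisp; double *theirs;
    MPI_Win_shared_query(win, (rank + 1) % size, &qsize, &qdisp, &theirs);
    printf("rank %d reads %g from rank %d\n",
           rank, *theirs, (rank + 1) % size);
    MPI_Win_unlock_all(win);

    MPI_Win_free(&win);
    MPI_Comm_free(&node);
    MPI_Finalize();
    return 0;
}
```

The placement of each slot relative to the rank that reads it is exactly the NUMA mapping issue the thesis measures, the one reported above as costing up to 64% in execution time.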
Atukorala, G. S. "Porting a distributed operating system to a shared memory parallel computer." Thesis, University of Bath, 1990. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.256756.
Full textAlmas, Luís Pedro Parreira Galito Pimenta. "DSM-PM2 adequacy for distributed constraint programming." Master's thesis, Universidade de Évora, 2007. http://hdl.handle.net/10174/16454.
Full textCordeiro, Silvio Ricardo. "Code profiling and optimization in transactional memory systems." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2014. http://hdl.handle.net/10183/97866.
Transactional Memory has shown itself to be a promising paradigm for the implementation of shared-memory concurrent applications that eschew a lock-based model of data synchronization. Rather than conditioning exclusive access on the value of a lock that is shared across concurrent threads, Transactional Memory attempts to execute critical sections optimistically, rolling back the modifications in the event of a data access conflict. However, while the lock-based approach has acquired a significant body of debugging, profiling and automated optimization tools (as one of the oldest and most researched synchronization techniques), the field of Transactional Memory is still comparatively recent, and programmers are usually tasked with unguided manual tuning of their transactional applications when facing efficiency problems. We propose a system in which code profiling in a simulated hardware implementation of Transactional Memory is used to characterize a transactional application, forming the basis for automated tuning of the underlying speculative system for the efficient execution of that particular application. We also propose a profile-guided approach to the scheduling of threads in a software-based implementation of Transactional Memory, using collected data to predict the likelihood of conflicts and to decide which thread to schedule based on this prediction. We present the results achieved under both designs.
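As a point of reference for the programming model being profiled, here is a minimal transactional critical section. It uses GCC's -fgnu-tm language extension purely as an illustration; the thesis itself profiles a simulated hardware TM and a software TM, not this extension.

```c
/* Illustration only (compile with: gcc -fgnu-tm tm.c). */
#include <stdio.h>

static long counter = 0;

void deposit(long amount) {
    /* Runs optimistically with no lock; on a data access conflict
     * the TM runtime rolls the modifications back and retries. */
    __transaction_atomic {
        counter += amount;
    }
}

int main(void) {
    deposit(42);
    printf("counter = %ld\n", counter);
    return 0;
}
```

The tuning problem the thesis addresses starts here: how often such transactions abort, and what the speculative system should do about it, is invisible in the source code, which is why profile-guided characterization is needed.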
Farooq, Mohammad Habibur Rahman & Qaisar. "Performance Prediction of Parallel Programs in a Linux Environment." Thesis, Blekinge Tekniska Högskola, Sektionen för datavetenskap och kommunikation, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-1143.
Tillenius, Martin. "Scientific Computing on Multicore Architectures." Doctoral thesis, Uppsala universitet, Avdelningen för beräkningsvetenskap, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-221241.
Bokhari, Saniyah S. "Parallel Solution of the Subset-sum Problem: An Empirical Study." The Ohio State University, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=osu1305898281.
Full textAlnervik, Erik. "Evaluation of the Configurable Architecture REPLICA with Emulated Shared Memory." Thesis, Linköpings universitet, Programvara och system, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-104313.
REPLICA is a family of configurable multiprocessors that realize the PRAM model by means of an emulated shared memory. The purpose of this thesis is to evaluate, by benchmarking different computational problems on REPLICA, on similar parallel architectures (SB-PRAM and XMT) and on less similar ones (Xeon X5660 and Tesla M2050), how REPLICA compares with existing architectures, both in performance and in how easy the architecture is to program efficiently, and also to find out whether REPLICA is especially suited to particular types of computational problems. By using well-known Berkeley dwarf applications and unbiased input data from, among others, The University of Florida Sparse Matrix Collection and the Rodinia benchmark suite, we ensure that relevant computational problems are executed and measured. We show that today's parallel architectures have performance problems with applications that exhibit irregular memory access patterns, to which the REPLICA architecture can be a solution. For example, REPLICA needs to be clocked at only a few MHz to match both Xeon X5660 and Tesla M2050 on the breadth first search algorithm, which suffers from exactly this kind of irregular memory access. By comparing the efficiency of REPLICA with a CPU (Xeon X5660), we show that it is easier to program REPLICA efficiently than today's multiprocessors.
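Breadth first search is the canonical irregular-access example mentioned above: which addresses are read depends on the data itself, so caches and prefetchers help little. A small level-synchronous BFS sketch of our own (not REPLICA code; the toy graph is invented) makes the pattern visible.

```c
/* Level-synchronous BFS sketch (ours, not REPLICA code). */
#include <stdio.h>

#define V 6

/* CSR adjacency: neighbours of v are adj[xadj[v] .. xadj[v+1]-1]. */
static const int xadj[V + 1] = {0, 2, 4, 5, 7, 8, 8};
static const int adj[8]      = {1, 2, 3, 4, 5, 5, 0, 2};

int main(void) {
    int level[V];
    for (int v = 0; v < V; v++) level[v] = -1;
    level[0] = 0;

    int frontier[V], next[V], fn = 1, nn;
    frontier[0] = 0;

    for (int d = 1; fn > 0; d++, fn = nn) {
        nn = 0;
        /* The loop over the frontier is what gets parallelized. */
        for (int i = 0; i < fn; i++)
            for (int e = xadj[frontier[i]]; e < xadj[frontier[i] + 1]; e++) {
                int w = adj[e];   /* data-dependent, irregular access */
                if (level[w] < 0) { level[w] = d; next[nn++] = w; }
            }
        for (int i = 0; i < nn; i++) frontier[i] = next[i];
    }
    for (int v = 0; v < V; v++) printf("level[%d] = %d\n", v, level[v]);
    return 0;
}
```

On a machine with uniform emulated shared memory such as REPLICA, the scattered reads of adj[e] and level[w] cost no more than sequential ones, which is the scenario behind the few-MHz comparison above.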
Ideguchi, Antonio Diogo Hidee. "LX-MCAPI : biblioteca de comunicação para suporte a programação paralela em sistemas multi-core." Universidade Federal de São Carlos, 2016. https://repositorio.ufscar.br/handle/ufscar/8420.
Full textApproved for entry into archive by Marina Freitas (marinapf@ufscar.br) on 2017-01-16T18:00:17Z (GMT) No. of bitstreams: 1 DissADHI.pdf: 1668973 bytes, checksum: 66675509e8ba3ae17c94da9b605df4d4 (MD5)
Approved for entry into archive by Marina Freitas (marinapf@ufscar.br) on 2017-01-16T18:00:38Z (GMT) No. of bitstreams: 1 DissADHI.pdf: 1668973 bytes, checksum: 66675509e8ba3ae17c94da9b605df4d4 (MD5)
Made available in DSpace on 2017-01-16T18:00:48Z (GMT). No. of bitstreams: 1 DissADHI.pdf: 1668973 bytes, checksum: 66675509e8ba3ae17c94da9b605df4d4 (MD5) Previous issue date: 2016-05-12
Não recebi financiamento
Multi-core processors were the industry's response to the physical barriers encountered in the development of processors over the last decades, and they brought new advances in computing system performance. Complex single-core superscalar processors with high clock frequencies gave way to processing units with two or more cores in a single package, generally at lower clock frequencies, allowing one or more execution threads per core. In this context, the existing programming models based on the sequential and concurrent paradigms could not exploit the real potential of the newly incorporated hardware, creating a need for new programming methodologies that can exploit the parallelism offered by multi-core processors. This work presents LX-MCAPI, a library based on modern IPC (Inter-Process Communication) and memory sharing mechanisms, developed under the hypothesis that message passing is a viable, flexible and scalable abstraction compared to conventional shared-memory programming methods on multi-core systems. LX-MCAPI offers message passing, a zero-copy memory sharing mechanism between processes, and ready-to-use scalability patterns that ease the abstraction and construction of applications. It performed well in terms of transmission latency and transfer rate in x86-64 and ARM environments.
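LX-MCAPI's own API is not reproduced in this listing, so the sketch below shows only the underlying POSIX mechanism (shm_open and mmap) on which such zero-copy sharing between processes can be built; the region name is our invention.

```c
/* POSIX shared memory sketch (ours; on Linux link with -lrt if needed). */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    const char *name = "/lxmcapi_demo";   /* hypothetical name */
    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    if (fd < 0) { perror("shm_open"); return 1; }
    if (ftruncate(fd, 4096) < 0) { perror("ftruncate"); return 1; }

    char *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }

    /* The payload is written once, in place; any process that maps
     * the same name reads it with no intermediate copy. */
    strcpy(buf, "hello via shared memory");
    printf("%s\n", buf);

    munmap(buf, 4096);
    close(fd);
    shm_unlink(name);
    return 0;
}
```

A message-passing layer like the one described above adds, on top of such a region, the queueing and notification needed so that "sending" a message amounts to publishing its location rather than copying its bytes.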
Rafael, João Pedro Maia. "A programming language for parallel event-driven development." Master's thesis, 2013. http://hdl.handle.net/10316/35550.
Recently, event-oriented programming frameworks have surfaced as a solution for highly scalable network applications. This model has been adopted in many languages, resulting in frameworks such as Node.js, Gevent and EventMachine. These frameworks can handle many concurrent requests by using asynchronous IO. However, in order to use all available cores, parallelism is exploited by creating multiple instances of the same application. Under this solution, instances do not share memory, so explicit synchronization mechanisms are required. The same problem applies when using the actor model for concurrency. The EVE framework provides support for event-oriented programming under a shared-memory model. It encompasses the EVE language definition, its compiler, and a runtime system capable of executing the resulting applications. Using our model, the programmer divides the application logic into tasks, and each task indicates which variables it can access. The runtime schedules compatible tasks onto multiple cores, using a work-stealing algorithm for load balancing. In this work, we present a formal description of the language and its runtime, including their operational semantics. Our benchmarks indicate that our solution delivers the best performance on IO-heavy problems when compared to existing off-the-shelf solutions, and performance comparable to state-of-the-art architectures for CPU-bound applications.
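EVE's syntax is not shown in this listing, so the toy sketch below (entirely ours, not EVE code) illustrates only the scheduling idea described above: tasks declare what they read and write, and two tasks may run concurrently only if neither writes data the other touches.

```c
/* Toy model of declared-access task compatibility (ours, not EVE). */
#include <stdio.h>

typedef struct { const char *name; unsigned reads, writes; } task;

/* Compatible = no write/write and no read/write overlap. */
static int compatible(task a, task b) {
    return !(a.writes & (b.reads | b.writes)) &&
           !(b.writes & (a.reads | a.writes));
}

int main(void) {
    /* Assumed encoding: bit 0 = variable x, bit 1 = variable y. */
    task t1 = {"t1", 0x1, 0x2};   /* reads x, writes y */
    task t2 = {"t2", 0x1, 0x0};   /* reads x           */
    task t3 = {"t3", 0x0, 0x1};   /* writes x          */

    printf("t1 || t2: %s\n", compatible(t1, t2) ? "yes" : "no"); /* yes */
    printf("t1 || t3: %s\n", compatible(t1, t3) ? "yes" : "no"); /* no  */
    return 0;
}
```

A work-stealing runtime such as the one described would then let idle cores steal any queued task that is compatible, by this test, with everything currently running.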
CHEN, MING-REN, and 陳銘仁. "The design of an object-oriented parallel programming system supporting recoverable distributed shared memory." Thesis, 1992. http://ndltd.ncl.edu.tw/handle/25113388933012107674.
Full textHONG, QI-FU, and 洪啟富. "The design of a parallel programming environment for a distributed shared memory system on microcomputer networks." Thesis, 1992. http://ndltd.ncl.edu.tw/handle/3vn27x.
Full textHe, Yuxiong, and Junqing Wang. "On-the-fly Race Detection for Programs with Recursive Spawn-Sync Parallelism." 2003. http://hdl.handle.net/1721.1/3868.
Singapore-MIT Alliance (SMA)