
Dissertations / Theses on the topic 'MAPREDUCE FRAMEWORKS'

Consult the top 45 dissertations / theses for your research on the topic 'MAPREDUCE FRAMEWORKS.'

1

de Souza Ferreira, Tharso. "Improving Memory Hierarchy Performance on MapReduce Frameworks for Multi-Core Architectures." Doctoral thesis, Universitat Autònoma de Barcelona, 2013. http://hdl.handle.net/10803/129468.

Abstract:
The need to analyze large data sets from different types of applications has popularized the use of simplified programming models such as MapReduce. Its current popularity is justified by its being a useful abstraction for expressing data-parallel processing while effectively hiding data synchronization, fault tolerance, and load-balancing management from the application developer. MapReduce frameworks have also been adapted to multi-core and shared-memory systems. These frameworks propose that each core of a CPU execute a Map task
2

Kumaraswamy, Ravindranathan Krishnaraj. "Exploiting Heterogeneity in Distributed Software Frameworks." Diss., Virginia Tech, 2016. http://hdl.handle.net/10919/64423.

Abstract:
The objective of this thesis is to address the challenges faced in sustaining efficient, high-performance, and scalable Distributed Software Frameworks (DSFs), such as MapReduce, Hadoop, Dryad, and Pregel, for supporting data-intensive scientific and enterprise applications on emerging heterogeneous compute, storage, and network infrastructure. Large DSF deployments in the cloud continue to grow both in size and number, given that DSFs are cost-effective and easy to deploy. DSFs are becoming heterogeneous with the use of advanced hardware technologies and due to regular upgrades to the system. For in
3

Venumuddala, Ramu Reddy. "Distributed Frameworks Towards Building an Open Data Architecture." Thesis, University of North Texas, 2015. https://digital.library.unt.edu/ark:/67531/metadc801911/.

Abstract:
Data is everywhere. Current technological advancements in digital and social media, and the ease with which different application services interact with a variety of systems, are generating tremendous volumes of data. Owing to such varied services, data formats are no longer restricted to structured types such as text; unstructured content such as social media data, videos, and images is also generated. The generated data is of no use unless it is stored and analyzed to derive some value. Traditional database systems come with limitations on the type of data format schema, a
4

Peddi, Sri Vijay Bharat. "Cloud Computing Frameworks for Food Recognition from Images." Thesis, Université d'Ottawa / University of Ottawa, 2015. http://hdl.handle.net/10393/32450.

Abstract:
Distributed cloud computing, when integrated with smartphone capabilities, contributes to building efficient multimedia e-health applications for mobile devices. Unfortunately, mobile devices alone do not possess the ability to run complex machine learning algorithms, which require large amounts of graphics processing and computational power. Therefore, offloading the computationally intensive part to the cloud reduces the overhead on the mobile device. In this thesis, we introduce two such distributed cloud computing models, which implement machine learning algorithms in the cloud in paralle
5

Elteir, Marwa Khamis. "A MapReduce Framework for Heterogeneous Computing Architectures." Diss., Virginia Tech, 2012. http://hdl.handle.net/10919/28786.

Abstract:
Nowadays, an increasing number of computational systems are equipped with heterogeneous compute resources, i.e., resources following different architectures. This applies at the level of a single chip, a single node, and even supercomputers and large-scale clusters. With their impressive price-to-performance ratio as well as power efficiency compared to traditional multicore processors, graphics processing units (GPUs) have become an integral part of these systems. GPUs deliver high peak performance; however, efficiently exploiting their computational power requires the exploration of a multi-dimensional s
6

Alkan, Sertan. "A Distributed Graph Mining Framework Based On Mapreduce." Master's thesis, METU, 2010. http://etd.lib.metu.edu.tr/upload/12611588/index.pdf.

Abstract:
The frequent patterns hidden in a graph can reveal crucial information about the network the graph represents. Existing techniques for mining frequent subgraphs in a graph database generally rely on the premise that the data can fit into the main memory of the device on which the computation takes place. Even though some algorithms are designed using highly optimized methods to some extent, many lack a solution to the problem of scalability. In this thesis work, our aim is to find and enumerate the subgraphs that are at least as frequent as the designated threshold in a given grap
7

Wang, Yongzhi. "Constructing Secure MapReduce Framework in Cloud-based Environment." FIU Digital Commons, 2015. http://digitalcommons.fiu.edu/etd/2238.

Abstract:
MapReduce, a parallel computing paradigm, has been gaining popularity in recent years as cloud vendors offer MapReduce computation services on their public clouds. However, companies are still reluctant to move their computations to the public cloud due to the following reason: In the current business model, the entire MapReduce cluster is deployed on the public cloud. If the public cloud is not properly protected, the integrity and the confidentiality of MapReduce applications can be compromised by attacks inside or outside of the public cloud. From the result integrity’s perspective, if any
8

Zhang, Yue. "A Workload Balanced MapReduce Framework on GPU Platforms." Wright State University / OhioLINK, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=wright1450180042.

9

Raja, Anitha. "A Coordination Framework for Deploying Hadoop MapReduce Jobs on Hadoop Cluster." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-196951.

Abstract:
Apache Hadoop is an open source framework that delivers reliable, scalable, and distributed computing. Hadoop services are provided for distributed data storage, data processing, data access, and security. MapReduce is the heart of the Hadoop framework and was designed to process vast amounts of data distributed over a large number of nodes. MapReduce has been used extensively to process structured and unstructured data in diverse fields such as e-commerce, web search, social networks, and scientific computation. Understanding the characteristics of Hadoop MapReduce workloads is the key to ach
10

Lakkimsetti, Praveen Kumar. "A framework for automatic optimization of MapReduce programs based on job parameter configurations." Kansas State University, 2011. http://hdl.handle.net/2097/12011.

Abstract:
Master of Science, Department of Computing and Information Sciences, Mitchell L. Neilsen. Recently, cost-effective and timely processing of large datasets has been playing an important role in the success of many enterprises and the scientific computing community. Two promising trends ensure that applications will be able to deal with ever-increasing data volumes: first, the emergence of cloud computing, which provides transparent access to a large number of processing, storage, and networking resources; and second, the development of the MapReduce programming model, which provides a high
11

Li, Min. "A resource management framework for cloud computing." Diss., Virginia Tech, 2014. http://hdl.handle.net/10919/47804.

Abstract:
The cloud computing paradigm is realized through large-scale distributed resource management and computation platforms such as MapReduce, Hadoop, Dryad, and Pregel. These platforms enable quick and efficient development of a large range of applications that can be sustained at scale in a fault-tolerant fashion. Two key technologies, namely resource virtualization and feature-rich enterprise storage, are further driving the widespread adoption of virtualized cloud environments. Many challenges arise when designing resource management techniques for both native and virtualized data centers. Fi
12

Rahman, Md Wasi-ur. "Designing and Modeling High-Performance MapReduce and DAG Execution Framework on Modern HPC Systems." The Ohio State University, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=osu1480475635778714.

13

Donepudi, Harinivesh. "An Apache Hadoop Framework for Large-Scale Peptide Identification." TopSCHOLAR®, 2015. http://digitalcommons.wku.edu/theses/1527.

Abstract:
Peptide identification is an essential step in protein identification, and the Peptide Spectrum Match (PSM) data set is huge, making it time-consuming to process on a single machine. In a typical run of a peptide identification method, PSMs are scored by a cross-correlation, a statistical score, or a likelihood that the match between the trial and the hypothetical is correct and unique. This process takes a long time to execute, and there is a demand for increased performance to handle large peptide data sets. Development of distributed frameworks is needed to reduce the processing t
14

Huang, Xin. "Querying big RDF data : semantic heterogeneity and rule-based inconsistency." Thesis, Sorbonne Paris Cité, 2016. http://www.theses.fr/2016USPCB124/document.

Abstract:
The Semantic Web is the vision of the next generation of the Web proposed by Tim Berners-Lee in 2001. With the rapid development of Semantic Web technologies, large quantities of RDF data already exist as linked open data and continue to grow very rapidly. Traditional tools for querying and reasoning over Semantic Web data are designed to work in a centralized environment. As such, traditional computation algorithms will inevitably run into performance problems and memory limitations
15

Huang, Xin. "Querying big RDF data : semantic heterogeneity and rule-based inconsistency." Electronic Thesis or Diss., Sorbonne Paris Cité, 2016. http://www.theses.fr/2016USPCB124.

Abstract:
The Semantic Web is the vision of the next generation of the Web proposed by Tim Berners-Lee in 2001. With the rapid development of Semantic Web technologies, large quantities of RDF data already exist as linked open data and continue to grow very rapidly. Traditional tools for querying and reasoning over Semantic Web data are designed to work in a centralized environment. As such, traditional computation algorithms will inevitably run into performance problems and memory limitations
16

RANJAN, RAVI. "PERFORMANCE ANALYSIS OF APRIORI AND FP GROWTH ON DIFFERENT MAPREDUCE FRAMEWORKS." Thesis, 2017. http://dspace.dtu.ac.in:8080/jspui/handle/repository/15814.

Abstract:
Association rule mining remains a very popular and effective method to extract meaningful information from large datasets. It tries to find possible associations between items in large transaction based datasets. In order to create these associations, frequent patterns have to be generated. Apriori and FP Growth are the two most popular algorithms for frequent itemset mining. To enhance the efficiency and scalability of Apriori and FP Growth, a number of algorithms have been proposed addressing the design of efficient data structures, minimizing database scan and parallel and distributed
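Several of the theses in this list hinge on the level-wise candidate generation that makes Apriori expensive on large data. As a rough illustration only (not the implementation of any thesis listed here; the transactions are made up), a minimal single-machine Apriori pass in Python:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Minimal Apriori sketch: level-wise generation of frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}
    current = [frozenset([i]) for i in sorted(items)]  # level-1 candidates
    frequent = {}
    while current:
        # one database scan counts the support of every candidate
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        survivors = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(survivors)
        # join surviving k-itemsets into (k+1)-itemset candidates
        keys = list(survivors)
        size = len(keys[0]) + 1 if keys else 0
        current = list({a | b for a, b in combinations(keys, 2)
                        if len(a | b) == size})
    return frequent

txns = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
freq = apriori(txns, min_support=2)
# every single item and every pair is frequent here; the triple is not
```

The repeated full scans per level are exactly the cost that the MapReduce-based variants in these theses distribute across nodes.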
17

Huang, Ruei-Jyun, and 黃瑞竣. "A MapReduce Framework for Heterogeneous Mobile Devices." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/91818518409630056856.

Abstract:
Master's thesis, National Taiwan University of Science and Technology, Department of Electronic Engineering, academic year 101. With the advance of science and technology, mobile devices continue to introduce new models, so that users are willing to buy them to experience improved hardware and software performance. Over the years, users can accumulate mobile devices with different computing capabilities. In this thesis, we use heterogeneous mobile devices and a wireless router to build a MapReduce framework. Through the MapReduce framework, we can not only control each mobile device but also execute different applications on a single mobile device or on multiple mobile devices. The MapReduce frame
18

"Thermal Aware Scheduling in Hadoop MapReduce Framework." Master's thesis, 2013. http://hdl.handle.net/2286/R.I.20932.

Abstract:
The energy consumption of data centers is increasing steadily along with the associated power density. Approximately half of this energy consumption is attributed to cooling, as a result of which reducing cooling energy along with server energy consumption in data centers is becoming imperative so as to achieve greener data centers. This thesis deals with cooling energy management in data centers running data-processing frameworks. In particular, we propose thermal-aware scheduling for the MapReduce framework and its Hadoop implementation to reduce coolin
19

Li, Jia-Hong, and 李家宏. "Using MapReduce Framework for Mining Association Rules." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/gbw4n8.

Abstract:
Master's thesis, National Taichung University of Science and Technology, Master's Program in Computer Science and Information Engineering, academic year 101. With the rapid development of computer hardware and network technologies, demand for related applications has grown, and cloud computing has recently become a very popular research area. Association rules in data mining play an important role in cloud computing technology. An association rule is useful for discovering relationships among different products and further provides beneficial decisions for market policy. In association rule mining, the computational load of discovering all frequent itemsets from a transaction database is considerably high. Some
20

Kao, Yu-Chon, and 高玉璁. "Data-Locality-Aware MapReduce Real-Time Scheduling Framework." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/95200425495617797846.

Abstract:
Master's thesis, National Taiwan University of Science and Technology, Department of Electrical Engineering, academic year 103. MapReduce is widely used in cloud applications for large-scale data processing. The increasing number of interactive cloud applications has led to an increasing need for MapReduce real-time scheduling. Most MapReduce applications are data-oriented and nonpreemptively executed. Therefore, the problem of MapReduce real-time scheduling is complicated because of the trade-off between run-time blocking for nonpreemptive execution and data locality. This paper proposes a data-locality-aware MapReduce real-time scheduling framework for guaranteeing quality of service
21

Zeng, Wei-Chen, and 曾偉誠. "Efficient XML Data Processing based on MapReduce Framework." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/j8b55u.

Abstract:
Master's thesis, National Taichung University of Science and Technology, Master's Program in Computer Science and Information Engineering, academic year 103. As hardware and network technology have progressed, the amount of data in every kind of application is increasing rapidly, so cloud computing has become the most important research topic for processing massive data (big data). Cloud computing technology offers a new service architecture that uses computing resources and storage space more effectively, and also provides development environments and cloud services. Massive data processing is currently done mostly in the MapReduce execution environment; but unde
22

CHEN, YI-TING, and 陳奕廷. "An Improved K-means Algorithm based on MapReduce Framework." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/582un3.

Abstract:
Master's thesis, Yuan Ze University, Department of Computer Science and Engineering, academic year 104. As data is collected and accumulated, data mining is used in big data analysis to find the relevance of huge amounts of data and dig out the information behind them. Cluster analysis in data mining simplifies and analyzes data. In this paper, we study the problems of the K-means algorithm and improve it. The K-means algorithm has some disadvantages: the user needs to determine the value K for the number of clusters, and the starting points are then generated randomly. In addition, the processing speed can be slow, or some problems cannot be fixed, when dealing wit
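The two weaknesses this abstract names, a user-chosen K and randomly generated starting points, are visible in a plain single-machine K-means sketch (illustrative only, not the thesis's improved algorithm; the 1-D data and seed are made up):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain K-means sketch: the user must pick k, and the result depends
    on the randomly chosen starting centroids (the weaknesses noted above)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # random initialization
    for _ in range(iters):
        # assignment step: attach each point to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: (p - centroids[i]) ** 2)
            clusters[nearest].append(p)
        # update step: move each centroid to the mean of its cluster
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

data = [1.0, 1.1, 0.9, 10.0, 10.2, 9.8]
centers = kmeans(data, k=2)  # converges near the two cluster means, 1.0 and 10.0
```

In a MapReduce setting the assignment step parallelizes as a map over points and the update step as a reduce per cluster, which is the general shape such theses build on.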
23

Hung-YuChang and 張弘諭. "Adaptive MapReduce Framework for Multi-Application Processing on GPU." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/40788660892721645478.

Abstract:
Master's thesis, National Cheng Kung University, Department of Engineering Science, academic year 101. With improvements in electronic and computer technology, the amount of data to be processed by each enterprise is growing larger. Handling such amounts of data is no longer a big challenge with the help of the MapReduce framework. Many applications from every field can take advantage of the MapReduce framework on a large number of CPUs for efficient distributed and parallel computing. On the other hand, graphics processing unit (GPU) technology is also improving. Multi-core GPUs provide stronger computing power that is capable of handling more workloads and
24

Hua, Guanjie, and 滑冠傑. "Haplotype Block Partitioning and TagSNP Selection with MapReduce Framework." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/13072946241409858351.

Abstract:
Master's thesis, Providence University, Department of Computer Science and Information Engineering, academic year 101. SNPs play important roles in various analysis applications, including medical diagnostics and drug design. They contain the highest-resolution genetic fingerprint for identifying disease associations and human features. A haplotype, composed of SNPs, is a region of linked genetic variants that are neighboring and usually inherited together. Recently, genetics research has shown that SNPs within certain haplotype blocks induce only a few distinct common haplotypes in the majority of the population. The discussion of haplotype blocks has serious implications for methods with asso
25

Chou, Yen-Chen, and 周艷貞. "General Wrapping of Information Hiding Patterns on MapReduce Framework." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/21760802727746979445.

26

Chang, Zhi-Hong, and 張志宏. "Join Operations for Large-Scale Data on MapReduce Framework." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/t4c5e6.

Abstract:
Master's thesis, National Taichung University of Science and Technology, Master's Program in Computer Science and Information Engineering, academic year 100. With the rapid development of hardware and network technology, cloud computing has become an important research topic. It provides a solution to large-scale data processing problems. A data-parallel framework provides a platform to deal with large-scale data, especially for data mining and data warehousing. MapReduce is one of the most famous data-parallel frameworks. It consists of two stages: the Map stage and the Reduce stage. Based on the MapReduce framework, Scatter-Gather-Merge (SGM) is an efficient algorithm supporting star join queries, which is on
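The two-stage structure described here can be illustrated with the canonical word-count example, a generic single-machine sketch of the Map, shuffle, and Reduce steps (not the SGM algorithm itself):

```python
from collections import defaultdict

def map_phase(records):
    """Map stage: emit (key, value) pairs; here, (word, 1) per occurrence."""
    for line in records:
        for word in line.split():
            yield word, 1

def shuffle(pairs):
    """Group values by key, as the framework does between the two stages."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce stage: aggregate each key's values; here, sum the counts."""
    return {key: sum(values) for key, values in groups.items()}

lines = ["map reduce map", "reduce reduce"]
counts = reduce_phase(shuffle(map_phase(lines)))
# counts == {"map": 2, "reduce": 3}
```

Join algorithms such as SGM ride on the same machinery by emitting join-key/tuple pairs in the Map stage so that matching rows meet in the same Reduce group.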
27

Huang, Yuan-Shao, and 黃元劭. "An Efficient Frequent Patterns Mining Algorithm Based on MapReduce Framework." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/78k6v3.

Abstract:
Master's thesis, Chung Hua University, Master's Program in Computer Science and Information Engineering, academic year 101. Recently, data has been continuously increasing in every enterprise. Big data, cloud computing, data mining, and related topics have become hot topics today. In this thesis, we modify the traditional Apriori algorithm to improve its execution efficiency, since the Apriori algorithm faces the issue that computation time increases dramatically as data size increases. Therefore, we design and implement two efficient algorithms: the Frequent Patterns Mining Algorithm Based on MapReduce Framework (FAMR) and Optimized FAMR (OFAMR). We adop
28

You, Hsin-Han, and 尤信翰. "A Load-Aware Scheduler for MapReduce Framework in Heterogeneous Environments." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/30478140785981334531.

Abstract:
Master's thesis, National Chiao Tung University, Institute of Computer Science and Engineering, academic year 99. MapReduce is becoming a trendy programming model for large-scale data processing such as data mining, log processing, web indexing, and scientific research. The MapReduce framework is a batch distributed data processing framework that disassembles a job into smaller map tasks and reduce tasks. In the MapReduce framework, a master node distributes tasks to worker nodes to complete the whole job. Hadoop MapReduce is the most popular open-source implementation of the MapReduce framework. Hadoop MapReduce comes with a pluggable task scheduler interface and a default FIFO job s
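The default FIFO behavior this abstract contrasts with load-aware scheduling can be sketched with a toy scheduler (illustrative only, not Hadoop's actual scheduler interface; job and task names are made up):

```python
from collections import deque

class FifoScheduler:
    """Toy FIFO job scheduler in the spirit of Hadoop's default:
    jobs are queued in arrival order and tasks go to whichever worker
    asks next, with no regard to worker load or data locality."""
    def __init__(self):
        self.jobs = deque()

    def submit(self, job_id, tasks):
        self.jobs.append((job_id, deque(tasks)))

    def assign(self, worker):
        # hand out the next task of the oldest unfinished job
        while self.jobs:
            job_id, tasks = self.jobs[0]
            if tasks:
                return worker, job_id, tasks.popleft()
            self.jobs.popleft()     # job exhausted; move to the next one
        return None

sched = FifoScheduler()
sched.submit("job-1", ["map-0", "map-1"])
sched.submit("job-2", ["map-0"])
first = sched.assign("worker-A")   # → ("worker-A", "job-1", "map-0")
```

A load-aware scheduler like the one this thesis proposes would instead consult per-worker capability before choosing which task to hand out.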
29

Ho, Hung-Wei, and 何鴻緯. "Modeling and Analysis of Hadoop MapReduce Framework Using Petri Nets." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/37850810462698368045.

Abstract:
Master's thesis, National Taipei University, Department of Computer Science and Information Engineering, academic year 103. Technological advances have significantly increased the amount of corporate data available, which has created a wide range of business opportunities related to big data and cloud computing. Hadoop is a popular programming framework used for setting up cloud computing systems. The MapReduce framework forms the core of the Hadoop program for parallel computing, and its parallel framework can greatly increase the efficiency of big data analysis. This study used Petri nets to create a visual model of the MapReduce framework and verify its reachability. We present
30

Chang, Jui-Yen, and 張瑞岩. "MapReduce-Based Frequent Pattern Mining Framework with Multiple Item Support." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/bxj8r2.

Abstract:
Master's thesis, National Taipei University of Technology, Master's Program, Department of Information and Finance Management, academic year 105. The analysis of big data mining for frequent patterns has become even more problematic. It has many applications and attempts to make people's health and daily life better and easier. Association mining is the process of discovering interesting and useful association rules hidden in huge and complicated data in different databases. However, a single minimum item support value for all items is not sufficient, since it cannot reflect the characteristics of each item. When the minimum support value (MIS) is set too low, despite it wou
31

YEH, WEN-HSIN, and 葉文昕. "An Algorithm for Efficiently Mining Frequent Itemsets Based on MapReduce Framework." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/z8na4z.

Abstract:
Master's thesis, Minghsin University of Science and Technology, Master's Program, Department of Electrical Engineering, academic year 107. With the maturing of cloud technology, big data, and data mining, cloud computing has become a hot research topic. Association rule mining is one of the most important techniques in data mining. Among its algorithms, Apriori is the most representative for association rule mining, but the performance of the traditional Apriori algorithm worsens as the amount of data grows larger and the support grows smaller. Using the MapReduce distributed architecture of cloud computing technology can improve on Apriori's shortcomings. Google Cloud Dataproc is a platform that
32

Wei, Xiu-Hui, and 魏秀蕙. "Performance Comparison of Sequential Pattern Mining Algorithms Based on Mapreduce Framework." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/fz7kg8.

Abstract:
Master's thesis, National Taichung University of Science and Technology, Master's Program in Computer Science and Information Engineering, academic year 102. Because of the popularity of cloud technology and the accumulation of large amounts of data, reducing the time needed to process large amounts of data efficiently is a very important research direction. In addition, many kinds of data mining techniques are used in analyzing huge amounts of data, including association rule mining algorithms and sequential pattern mining algorithms. In this study, two sequential pattern mining algorithms, the GSP algorithm and the AprioriAll algorithm, are parallelized through the MapReduce framework. Also, we
33

Chen, Bo-Ting, and 陳柏廷. "Improving the techniques of Mining Association Rule based on MapReduce framework." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/b6g4eb.

Abstract:
Master's thesis, National Taichung University of Science and Technology, Master's Program in Computer Science and Information Engineering, academic year 103. We can get useful and valuable information from seemingly insignificant data through data mining and gain huge benefits from professional analysis. However, it is important to improve the performance of data mining for big data processing. The purpose of this study is to improve the performance of the parallel association-rule mining algorithm PIETM (Principle of Inclusion-Exclusion and Transaction Mapping) under the MapReduce framework. PIETM arranges the transaction data in a database into a tree structure called the Transaction tree (T-tree), and then transform
34

Chang, Chih-Wei, and 張智崴. "An Adaptive Real-Time MapReduce Framework Based on Locutus and Borg-Tree." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/32093140696316430010.

Abstract:
Master's thesis, National Taipei University of Education, Master's Program, Department of Computer Science, academic year 101. Google released the design of MapReduce in 2004. After years of development, Apache finally launched Hadoop version 1.0 in 2011, which means the open-source MapReduce ecosystem is mature enough to support business applications. But some features are still unsatisfactory for big data processing: first, support for real-time computing; second, cross-platform deployment and ease of use. In this thesis, we analyze the bottlenecks of Hadoop performance and try to solve them, hoping to develop an easy Real-Time Com
35

Chung, Wei-Chun, and 鐘緯駿. "Algorithms for Correcting Next-Generation Sequencing Errors Based on MapReduce Big Data Framework." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/k9jnna.

Abstract:
Doctoral dissertation, National Taiwan University, Graduate Institute of Computer Science and Information Engineering, academic year 105. The rapid advancement of next-generation sequencing (NGS) technology has generated an explosive growth of ultra-large-scale data and computational problems, particularly in de novo genome assembly. Greater sequencing depths and increasingly longer reads have introduced numerous errors, which increase the probability of misassembly. The huge amounts of data cause severely high disk I/O overhead and lead to an unexpectedly long execution time. To speed up the time-consuming assembly processes without affecting its quality and to address problems pertaining to e
36

Chin, Bing-Da, and 秦秉達. "Design of Parallel Binary Classification Algorithm Based on Hadoop Cluster with MapReduce Framework." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/fu84aw.

Abstract:
Master's thesis, National Taichung University of Science and Technology, Master's Program in Computer Science and Information Engineering, academic year 103. With the increased amount of data today, it is hard to analyze large data efficiently in a single-computer environment; the Hadoop cluster is very important because we can store and process large data with it. Data mining plays an important role in data analysis. Because the time complexity of the binary-class classification SVM algorithm is a big issue, we design a parallel binary SVM algorithm to solve this problem and achieve the effect of classifying appropriate data. By leveraging the parallel processing property of MapReduce, we implement a multi-layer binary SVM by
37

Rosen, Andrew. "Towards a Framework for DHT Distributed Computing." 2016. http://scholarworks.gsu.edu/cs_diss/107.

Abstract:
Distributed Hash Tables (DHTs) are protocols and frameworks used by peer-to-peer (P2P) systems. They are used as the organizational backbone for many P2P file-sharing systems due to their scalability, fault-tolerance, and load-balancing properties. These same properties are highly desirable in a distributed computing environment, especially one that wants to use heterogeneous components. We show that DHTs can be used not only as the framework to build a P2P file-sharing service, but as a P2P distributed computing platform. We propose creating a P2P distributed computing framework using distrib
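The DHT property this abstract relies on, that every peer can independently compute which node owns a key, can be sketched with a toy consistent-hash ring (illustrative only; real DHTs such as Chord add finger tables, churn handling, and replication, and the peer names here are hypothetical):

```python
import hashlib
from bisect import bisect_left

def ring_position(name, space=2**16):
    """Hash a node or key name onto a fixed circular identifier space."""
    digest = hashlib.sha256(name.encode()).hexdigest()
    return int(digest, 16) % space

class HashRing:
    """Toy DHT routing table: each key belongs to the first node at or
    after its position on the ring (wrapping around at the end)."""
    def __init__(self, nodes):
        self.ring = sorted((ring_position(n), n) for n in nodes)

    def lookup(self, key):
        positions = [p for p, _ in self.ring]
        index = bisect_left(positions, ring_position(key)) % len(self.ring)
        return self.ring[index][1]

ring = HashRing(["peer-a", "peer-b", "peer-c"])
owner = ring.lookup("some-task")  # every peer computes the same owner
```

Because ownership is a pure function of the hash, work units can be routed without a master node, which is the load-balancing property the dissertation exploits for distributed computing.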
38

Tsung-ChihHuang and 黃宗智. "The Design and Implementation of the MapReduce Framework based on OpenCL in GPU Environment." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/44218540678222651423.

Abstract:
Master's thesis, National Cheng Kung University, Institute of Computer and Communication Engineering, academic year 101. With the advances and evolution of technology, general-purpose computation on the GPU (GPGPU) was put forward due to the excellent performance of GPUs in parallel computing. This thesis presents the design and implementation of a MapReduce software framework based on the Open Computing Language (OpenCL) in a GPU environment. For users who develop parallel application software using OpenCL, this framework provides an alternative that can simplify the development process and handle the complicated details of parallel computing easily. Therefore,
39

Lin, Jia-Chun, and 林佳純. "Study of Job Execution Performance, Reliability, Energy Consumption, and Fault Tolerance in the MapReduce Framework." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/48439755319437778819.

Abstract:
Doctoral dissertation, National Chiao Tung University, Institute of Computer Science and Engineering, academic year 103. Node/machine failure is the norm rather than the exception in a large-scale MapReduce cluster. To prevent jobs from being interrupted by machine/node failures, MapReduce employs several policies, such as a task re-execution policy, an intermediate-data replication policy, and a reduce-task assignment policy. However, the impacts of these policies on MapReduce jobs are not clear, especially in terms of Job Completion Reliability (JCR for short), Job Turnaround Time (JTT for short), and Job Energy Consumption (JEC for short). In this dissertation, JCR is the reliabil
40

Roy, Sukanta. "Automated methods of natural resource mapping with Remote Sensing Big data in Hadoop MapReduce framework." Thesis, 2022. https://etd.iisc.ac.in/handle/2005/5836.

Abstract:
For several decades, remote sensing (RS) tools have provided platforms for the large-scale exploration of natural resources across the planetary bodies of our solar system. In the context of Indian remote sensing, mineral resources are being explored, and mangrove resources are being monitored towards a sustainable socio-economic structure and coastal eco-system, respectively, by utilising several remote analytical techniques. However, RS technologies and the corresponding data analytics have made a vast paradigm shift, which eventually has produced “RS Big data” in our scientific world of lar
APA, Harvard, Vancouver, ISO, and other styles
41

Huu, Tinh Giang Nguyen, and 阮有淨江. "Design and Implement a MapReduce Framework for Converting Standalone Software Packages to Hadoop-based Distributed Environments." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/20649990806109007865.

Full text
Abstract:
Master's thesis, National Cheng Kung University, Institute of Manufacturing Information and Systems, 2012. Hadoop MapReduce is a programming model for designing automatically scalable distributed computing applications, and it gives developers an effective environment for achieving automatic parallelization. However, most existing manufacturing systems are arduous and restrictive to migrate to a MapReduce private cloud, owing to platform incompatibility and the tremendous complexity of system reconstruction. To increase the efficiency of manufacturing systems with minimal modification of the existing systems, we design a framework in this thesis, called MC-Framework: Mult
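The general idea of running an unmodified standalone package as a map task, familiar from Hadoop Streaming, can be sketched as follows. The `legacy_tool` stand-in and both function names are hypothetical, not the MC-Framework API:

```python
import subprocess
import sys

def run_standalone_as_mapper(command, input_lines):
    """Feed an input split to an unmodified command-line program via
    stdin and collect its stdout, mimicking how Hadoop Streaming lets a
    standalone package act as a map task without code changes."""
    result = subprocess.run(
        command,
        input="\n".join(input_lines),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()

# Stand-in "legacy package": a tiny script that upper-cases its input.
legacy_tool = [sys.executable, "-c",
               "import sys; sys.stdout.write(sys.stdin.read().upper())"]
print(run_standalone_as_mapper(legacy_tool, ["wafer-01 ok", "wafer-02 fail"]))
# ['WAFER-01 OK', 'WAFER-02 FAIL']
```

In a real deployment the framework, rather than the caller, would split the input, launch one such wrapper per split, and shuffle the emitted lines to reducers.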
APA, Harvard, Vancouver, ISO, and other styles
42

Lo, Chia-Huai, and 駱家淮. "Constructing Suffix Array and Longest-Common-Prefix Array for Next-Generation-Sequencing Data Using MapReduce Framework." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/71000795009259045140.

Full text
Abstract:
Master's thesis, National Taiwan University, Graduate Institute of Computer Science and Information Engineering, 2014. Next-generation sequencing (NGS) data is rapidly growing and represents a rich source of new scientific knowledge. State-of-the-art sequencers, such as the HiSeq 2500, can generate up to 1 trillion base pairs of sequencing data in 6 days, with good quality at low cost. In today's genome sequencing projects, the NGS data size often ranges from tens of billions to several hundreds of billions of base pairs. Processing such a large set of NGS data is time-consuming, especially for applications based on sequence alignment, e.g., de novo genome assembly
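The two data structures named in the title can be demonstrated at small scale. Below is a naive suffix-array construction paired with Kasai's linear-time LCP algorithm; the thesis targets distributed, NGS-scale construction with MapReduce, so this single-machine sketch is illustrative only:

```python
def build_suffix_array(s):
    """Naive O(n^2 log n) construction: sort all suffix start positions.
    Fine for illustration; NGS-scale data needs distributed methods."""
    return sorted(range(len(s)), key=lambda i: s[i:])

def build_lcp_array(s, sa):
    """Kasai's algorithm: lcp[r] is the length of the longest common
    prefix of the suffixes at ranks r-1 and r in the suffix array."""
    n = len(s)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lcp = [0] * n
    h = 0
    for i in range(n):          # visit suffixes in text order
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and s[i + h] == s[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h:
                h -= 1          # LCP can drop by at most 1 per step
        else:
            h = 0
    return lcp

s = "banana"
sa = build_suffix_array(s)
print(sa, build_lcp_array(s, sa))  # [5, 3, 1, 0, 4, 2] [0, 1, 3, 0, 0, 2]
```

Distributed variants typically replace the global sort with MapReduce prefix-doubling rounds, but the resulting arrays are the same.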
APA, Harvard, Vancouver, ISO, and other styles
43

Chou, Chien-Ting, and 周建廷. "Research on The Computing of Direct Geo Morphology Runoff on Hadoop Cluster by Using MapReduce Framework." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/13575176515358582342.

Full text
Abstract:
Master's thesis, National Taiwan Normal University, Institute of Computer Science and Information Engineering, 2010. Because of the weather and landform of Taiwan, heavy rain often causes sudden rises in basin runoff and can even lead to serious disasters. Flood information systems are therefore heavily relied upon in Taiwan, especially during typhoon season. Computing the runoff of a basin is the most important module of a flood information system, used to check whether the runoff exceeds the warning level. However, this module is complicated and data-intensive, and it becomes the bottleneck when real-time information is needed while a typhoon is attacking the basins. The devel
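Direct-runoff computation of the kind described here typically convolves rainfall-excess pulses with unit-hydrograph ordinates, and the independence of the per-timestep products is what makes the workload amenable to map tasks. A minimal sketch, with invented values and names (not the thesis's geomorphologic model):

```python
def direct_runoff(rainfall_excess, unit_hydrograph):
    """Discrete convolution of rainfall-excess pulses P with
    unit-hydrograph ordinates U: Q[t] = sum_k P[k] * U[t - k].
    Each (k, t) product is independent, so the terms can be computed
    by parallel map tasks and summed per timestep in a reduce step."""
    n = len(rainfall_excess) + len(unit_hydrograph) - 1
    q = [0.0] * n
    for k, p in enumerate(rainfall_excess):
        for t, u in enumerate(unit_hydrograph):
            q[k + t] += p * u
    return q

# 2 cm then 1 cm of rainfall excess; UH ordinates in m^3/s per cm.
print(direct_runoff([2.0, 1.0], [10.0, 30.0, 10.0]))
# [20.0, 70.0, 50.0, 10.0]
```

A flood-warning module would then compare each `q[t]` against the basin's warning level.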
APA, Harvard, Vancouver, ISO, and other styles
44

Chrimes, Dillon. "Towards a big data analytics platform with Hadoop/MapReduce framework using simulated patient data of a hospital system." Thesis, 2016. http://hdl.handle.net/1828/7645.

Full text
Abstract:
Background: Big data analytics (BDA) is important for reducing healthcare costs, but many challenges remain. The study objective was to establish a high-performance, interactive BDA platform for a hospital system. Methods: A Hadoop/MapReduce framework formed the BDA platform with HBase (a NoSQL database), using hospital-specific metadata and file ingestion. Query performance was tested with Apache tools in Hadoop's ecosystem. Results: At the optimized iteration, Hadoop Distributed File System (HDFS) ingestion required three seconds, but HBase required four to twelve hours to complete the Reducer
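The Reducer-side aggregation such a platform performs over simulated patient data can be sketched in miniature. The records and field names below are invented stand-ins, not the study's hospital metadata:

```python
from collections import Counter
from itertools import chain

def map_phase(record):
    """Emit (diagnosis_code, 1) for one simulated patient encounter."""
    yield (record["diagnosis"], 1)

def reduce_phase(pairs):
    """Sum the counts per key, as a Reducer would after the shuffle."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

encounters = [
    {"patient_id": 1, "diagnosis": "I10"},  # hypertension
    {"patient_id": 2, "diagnosis": "E11"},  # type 2 diabetes
    {"patient_id": 3, "diagnosis": "I10"},
]
result = reduce_phase(chain.from_iterable(map_phase(r) for r in encounters))
print(result)  # {'I10': 2, 'E11': 1}
```

On the actual platform the mapper output would be shuffled across the cluster and the totals written back to an HBase table rather than returned in memory.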
APA, Harvard, Vancouver, ISO, and other styles
45

(9530630), Akshay Jajoo. "EXPLOITING THE SPATIAL DIMENSION OF BIG DATA JOBS FOR EFFICIENT CLUSTER JOB SCHEDULING." Thesis, 2020.

Find full text
Abstract:
With the growing business impact of distributed big data analytics jobs, it has become crucial to optimize their execution and resource consumption. In most cases, such jobs consist of multiple sub-entities called tasks and are executed online in a large shared distributed computing system. The ability to accurately estimate runtime properties and to coordinate the execution of a job's sub-entities allows a scheduler to schedule jobs efficiently. This thesis presents the first study that highlights the spatial dimension, an inherent property of distributed jobs, and underscores it
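To see why accurate runtime estimates matter to a cluster scheduler, the sketch below compares FIFO against a shortest-job-first ordering on a single slot. This is the classic textbook result, shown only as background; it is not the thesis's spatial-dimension technique, and all names are invented:

```python
def average_completion_time(jobs):
    """Run jobs back-to-back on one slot and return the mean time at
    which each job finishes. `jobs` is a list of (name, runtime)."""
    elapsed, total = 0.0, 0.0
    for _, runtime in jobs:
        elapsed += runtime
        total += elapsed
    return total / len(jobs)

jobs = [("A", 10.0), ("B", 2.0), ("C", 4.0)]
fifo = average_completion_time(jobs)
sjf = average_completion_time(sorted(jobs, key=lambda j: j[1]))
print(fifo, sjf)  # shortest-first lowers the mean completion time
```

Orderings like this are only as good as the runtime estimates feeding them, which is why estimating properties of a job's tasks is central to cluster scheduling.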
APA, Harvard, Vancouver, ISO, and other styles