Academic literature on the topic 'Hadoop MapReduce'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the lists of relevant articles, books, theses, conference papers, and other scholarly sources on the topic 'Hadoop MapReduce.'

Next to every source in the list of references there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "Hadoop MapReduce"

1

Park, Jeong-Hyeok, Sang-Yeol Lee, Da Hyun Kang, and Joong-Ho Won. "Hadoop and MapReduce." Journal of the Korean Data and Information Science Society 24, no. 5 (2013): 1013–27. http://dx.doi.org/10.7465/jkdi.2013.24.5.1013.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Tahsir Ahmed Munna, Md, Shaikh Muhammad Allayear, Mirza Mohtashim Alam, Sheikh Shah Mohammad Motiur Rahman, Md Samadur Rahman, and M. Mesbahuddin Sarker. "Simplified Mapreduce Mechanism for Large Scale Data Processing." International Journal of Engineering & Technology 7, no. 3.8 (2018): 16. http://dx.doi.org/10.14419/ijet.v7i3.8.15211.

Full text
Abstract:
MapReduce has become a popular programming model for processing large-scale data sets in a parallel, distributed manner on a cluster. Hadoop MapReduce is needed especially for large-scale data processing such as big data. In this paper, we modify the Hadoop MapReduce algorithm and implement it to reduce processing time.
APA, Harvard, Vancouver, ISO, and other styles
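For readers unfamiliar with the programming model this paper modifies, the following is a minimal sketch of the standard (unmodified) Hadoop word count against the org.apache.hadoop.mapreduce API; it is a textbook baseline, not the authors' simplified mechanism.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Map phase: tokenize each input line and emit (word, 1) per token.
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);
        }
    }
}

// Reduce phase: sum the counts emitted for each distinct word.
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) sum += v.get();
        context.write(key, new IntWritable(sum));
    }
}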
3

Wibawa, Condro, Setia Wirawan, Metty Mustikasari, and Dessy Tri Anggraeni. "KOMPARASI KECEPATAN HADOOP MAPREDUCE DAN APACHE SPARK DALAM MENGOLAH DATA TEKS." Jurnal Ilmiah Matrik 24, no. 1 (2022): 10–20. http://dx.doi.org/10.33557/jurnalmatrik.v24i1.1649.

Full text
Abstract:
The term Big Data is no longer new. One component of Big Data is a massive volume of data, which makes the data impossible to process by traditional means. To address this problem, the MapReduce method was developed. MapReduce is a data-processing method that splits data into small parts (mapping) and then combines the results back together (reducing). The most widely used MapReduce frameworks are Hadoop MapReduce and Apache Spark. The two frameworks share the same concept but differ in how they manage data: Hadoop MapReduce uses an HDFS (disk-based) approach, while Apache Spark uses RDDs (in-memory). The use of RDDs makes Apache Spark faster than Hadoop MapReduce. This is confirmed in this study, in which, processing the same text data, Apache Spark was on average 4.99 times faster than Hadoop MapReduce.
APA, Harvard, Vancouver, ISO, and other styles
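The disk-versus-memory difference measured above is largely invisible at the API level. As a point of comparison, here is a sketch of the same word count in Spark's Java API, whose intermediate RDDs stay in memory instead of being written back to HDFS between stages; the paths are placeholders, and the job would be launched via spark-submit.

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class SparkWordCount {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("spark-word-count");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaRDD<String> lines = sc.textFile("hdfs:///input/text");   // placeholder input path
            JavaPairRDD<String, Integer> counts = lines
                    .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
                    .mapToPair(word -> new Tuple2<>(word, 1))
                    .reduceByKey(Integer::sum);   // intermediate data stays in memory where possible
            counts.saveAsTextFile("hdfs:///output/counts");              // placeholder output path
        }
    }
}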
4

Adawiyah, Robiyatul, and Sirojul Munir. "Analisis Kecepatan Algoritma MapReduce Word Count Pada Cluster Hadoop Studi Kasus Pada Global Dataset of Events, Language and Tone (GDELT)." Jurnal Informatika Terpadu 6, no. 1 (2020): 14–19. http://dx.doi.org/10.54914/jit.v6i1.214.

Full text
Abstract:
This study analyzes the speed of the MapReduce algorithm on a Hadoop cluster and measures the time needed to process GDELT data on Hadoop. The study uses a qualitative analysis method. Based on the data analysis performed, we conclude that the Word Count algorithm applied to the GDELT data set runs on the Hadoop cluster. The speed of the Word Count algorithm on MapReduce applied to the GDELT data set on Hadoop improves when nodes are added; this study used 2 physical-machine nodes. Hadoop can process large and numerous data because it processes data in a distributed fashion. Hadoop's speed can be tuned by adding nodes and through other settings such as the block size.
APA, Harvard, Vancouver, ISO, and other styles
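The block-size lever mentioned in the abstract is set per job. Below is a hedged sketch of a word-count driver that fixes the block size for files the job writes and bounds the input split size, which is what actually controls map-task parallelism; the sizes are illustrative, and the mapper and reducer are the ones sketched under entry 2 above.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Larger blocks mean fewer, longer map tasks; smaller blocks mean more parallelism.
        conf.setLong("dfs.blocksize", 256L * 1024 * 1024); // applies to files this job writes
        Job job = Job.getInstance(conf, "gdelt-word-count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Bound the input split size so roughly one split maps to one block.
        FileInputFormat.setMinInputSplitSize(job, 128L * 1024 * 1024);
        FileInputFormat.setMaxInputSplitSize(job, 256L * 1024 * 1024);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}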
5

Sontakke, Vaishali, and Dayananda R. B. "Memory aware optimized Hadoop MapReduce model in cloud computing environment." IAES International Journal of Artificial Intelligence (IJ-AI) 12, no. 3 (2023): 1270. http://dx.doi.org/10.11591/ijai.v12.i3.pp1270-1280.

Full text
Abstract:
In the last decade, data analysis has become one of the most popular tasks due to the enormous growth in data generated every minute through different applications and instruments. MapReduce is the most popular programming model for data processing. Hadoop comprises two basic components, the Hadoop Distributed File System (HDFS) and MapReduce: HDFS is used for storing huge amounts of data, whereas MapReduce is used for processing it. Hadoop MapReduce is one of the best platforms for processing huge data efficiently, for example web-log data. This research work proposes a memory-aware optimized Hadoop MapReduce model (MA-OHMR). MA-OHMR is developed with memory as the constraint and prioritizes memory allocation and revocation in mapping, shuffling, and reducing, which further enhances the map and reduce jobs. Optimal memory management and I/O operation are carried out to use resources in an efficient manner. The model uses global memory management to avoid garbage collection, and MA-OHMR is optimized to reduce the makespan. MA-OHMR is evaluated on two datasets, a simple workload of a Wikipedia dataset and a complex workload of a sensor dataset, with makespan and cost as evaluation parameters.
APA, Harvard, Vancouver, ISO, and other styles
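MA-OHMR itself is the authors' model and is not reproduced here; the baseline memory knobs it competes with are ordinary Hadoop configuration keys. A sketch with illustrative values:

import org.apache.hadoop.conf.Configuration;

public class MemoryTuning {
    // Returns a Configuration with the standard per-task memory settings (values illustrative).
    public static Configuration tuned() {
        Configuration conf = new Configuration();
        // Container memory YARN grants to each map/reduce task, in MB.
        conf.setInt("mapreduce.map.memory.mb", 2048);
        conf.setInt("mapreduce.reduce.memory.mb", 4096);
        // JVM heap inside the container; leave roughly 20% headroom below the container size.
        conf.set("mapreduce.map.java.opts", "-Xmx1638m");
        conf.set("mapreduce.reduce.java.opts", "-Xmx3276m");
        // In-memory sort buffer used on the map side before spilling to disk, in MB.
        conf.setInt("mapreduce.task.io.sort.mb", 512);
        return conf;
    }
}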
6

Sahu, Kapil, Kaveri Bhatt, Amit Saxena, and Kaptan Singh. "Implementation of Big-Data Applications Using Map Reduce Framework." International Journal of Engineering and Computer Science 9, no. 08 (2020): 25125–31. http://dx.doi.org/10.18535/ijecs/v9i08.4504.

Full text
Abstract:
As a result of the rapid development of cloud computing, it is fundamental to investigate the performance of different Hadoop MapReduce applications and to identify the performance bottlenecks in a cloud cluster that contribute to higher or lower performance. It is also important to analyze the underlying hardware in cloud cluster servers to enable the optimization of software and hardware and achieve the highest performance feasible. Hadoop is founded on MapReduce, which is among the most popular programming models for big data analysis in a parallel computing environment. In this paper, we present a detailed performance analysis, characterization, and evaluation of the Hadoop MapReduce Word Count application. The main aim of this paper is to illustrate Hadoop MapReduce programming through hands-on experience in developing Hadoop-based Word Count and Apriori applications: the word-count problem is solved using the Hadoop MapReduce framework, and the Apriori algorithm is used for finding frequent itemsets using the MapReduce framework.
APA, Harvard, Vancouver, ISO, and other styles
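The Apriori part of the paper maps naturally onto one MapReduce job per pass; the sketch below covers pass one only, counting single items and filtering by a minimum support read from the job configuration. The comma-separated transaction format and the apriori.min.support key are assumptions for illustration, and candidate generation for later passes is omitted.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Pass 1 of Apriori: each input line is one transaction with comma-separated items.
public class AprioriPassOneMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text item = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : value.toString().split(",")) {
            item.set(token.trim());
            context.write(item, ONE); // emit (item, 1) per occurrence
        }
    }
}

// Keep only items whose total count reaches the minimum support threshold.
class AprioriPassOneReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private int minSupport;

    @Override
    protected void setup(Context context) {
        minSupport = context.getConfiguration().getInt("apriori.min.support", 2);
    }

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int count = 0;
        for (IntWritable v : values) count += v.get();
        if (count >= minSupport) context.write(key, new IntWritable(count)); // frequent 1-itemset
    }
}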
7

Park, Jong-Hyuk, Hwa-Young Jeong, Young-Sik Jeong, and Min Choi. "REST-MapReduce: An Integrated Interface but Differentiated Service." Journal of Applied Mathematics 2014 (2014): 1–10. http://dx.doi.org/10.1155/2014/170723.

Full text
Abstract:
With the fast deployment of cloud computing, MapReduce architectures are becoming the major technologies for mobile cloud computing. The concept of MapReduce was first introduced as a novel programming model and implementation for a large set of computing devices. In this research, we propose the novel concept of REST-MapReduce, enabling users to use only the REST interface without using the MapReduce architecture. This approach provides a higher level of abstraction by integrating the two types of access interface, the REST API and MapReduce. The motivation for this research stems from the slower response time for accessing a simple RDBMS on Hadoop than for direct access to the RDBMS, caused by the overhead of job scheduling, initiation, starting, tracking, and management during MapReduce-based parallel execution. Our framework therefore provides good performance both for REST Open API services and for MapReduce. This is very useful for constructing REST Open API services on Hadoop hosting services, for example, Amazon AWS (Macdonald, 2005) or IBM Smart Cloud. To evaluate the performance of our REST-MapReduce framework, we conducted experiments with a Jersey REST web server and Hadoop. Experimental results show that our approach outperforms conventional approaches.
APA, Harvard, Vancouver, ISO, and other styles
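To make the integration idea concrete, here is a hypothetical JAX-RS (Jersey) resource that answers cheap lookups directly and routes heavy aggregations to a MapReduce job. The endpoint paths and the Database/JobRunner helpers are invented for illustration; this is not the paper's actual API.

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

// Hypothetical facade: simple lookups bypass MapReduce, heavy scans go through it.
@Path("/records")
public class RecordResource {

    @GET
    @Path("/{id}")
    @Produces(MediaType.APPLICATION_JSON)
    public Response getRecord(@PathParam("id") String id) {
        // Fast path: direct RDBMS access avoids MapReduce job-startup overhead.
        return Response.ok(Database.lookup(id)).build();
    }

    @GET
    @Path("/aggregate")
    @Produces(MediaType.APPLICATION_JSON)
    public Response aggregate() {
        // Slow path: a full-table aggregation is worth the MapReduce scheduling cost.
        String jobId = JobRunner.submitAggregationJob();
        return Response.accepted().entity("{\"jobId\":\"" + jobId + "\"}").build();
    }
}

// Placeholder stubs standing in for a real DAO and a real job-submission helper.
class Database {
    static String lookup(String id) { return "{\"id\":\"" + id + "\"}"; }
}

class JobRunner {
    static String submitAggregationJob() { return "job_0001"; }
}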
8

Chen, Donghua, and Runtong Zhang. "MapReduce-Based Dynamic Partition Join with Shannon Entropy for Data Skewness." Scientific Programming 2021 (November 24, 2021): 1–15. http://dx.doi.org/10.1155/2021/1602767.

Full text
Abstract:
Join operations on data sets play a crucial role in obtaining the relations of massive data in real life. Joining two data sets with MapReduce requires a proper design of the Map and Reduce stages for different scenarios. The factors affecting MapReduce join efficiency include the density of the data sets and data transmission over clusters like Hadoop. This study aims to improve the efficiency of MapReduce join algorithms on Hadoop by leveraging Shannon entropy to measure the information changes of data sets being joined in different MapReduce stages. To reduce the uncertainty of data sets in joins through the network, a novel MapReduce join algorithm with dynamic partition strategies, called dynamic partition join (DPJ), is proposed. It leverages the changes of entropy in the partitions of the data sets during the Map and Reduce stages to revise the logical partitions by changing the original input of the reduce tasks in the MapReduce jobs. Experimental results indicate that the entropy-based measures can capture the entropy changes of join operations. Moreover, the DPJ variant methods achieved lower entropy compared with the existing joins, thereby increasing the feasibility of MapReduce join operations for different scenarios on Hadoop.
APA, Harvard, Vancouver, ISO, and other styles
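The measure the authors build on is ordinary Shannon entropy over a key-frequency histogram, H = -sum_i p_i log2 p_i. A self-contained sketch of scoring a partition's key distribution; the DPJ algorithm itself is the paper's contribution and is not reproduced here.

import java.util.Map;

public class KeyEntropy {
    // Shannon entropy in bits of a key-frequency histogram; higher means more uniform keys.
    public static double entropy(Map<String, Long> keyCounts) {
        long total = keyCounts.values().stream().mapToLong(Long::longValue).sum();
        double h = 0.0;
        for (long count : keyCounts.values()) {
            if (count == 0) continue;
            double p = (double) count / total;
            h -= p * Math.log(p) / Math.log(2); // convert natural log to base 2
        }
        return h;
    }

    public static void main(String[] args) {
        // A skewed histogram scores much lower than a uniform one of the same size.
        System.out.println(entropy(Map.of("a", 97L, "b", 1L, "c", 1L, "d", 1L)));    // ~0.24 bits
        System.out.println(entropy(Map.of("a", 25L, "b", 25L, "c", 25L, "d", 25L))); // 2.0 bits
    }
}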
9

Geetha J., Uday Bhaskar N, and Chenna Reddy P. "An Analytical Approach for Optimizing the Performance of Hadoop Map Reduce Over RoCE." International Journal of Information Communication Technologies and Human Development 10, no. 2 (2018): 1–14. http://dx.doi.org/10.4018/ijicthd.2018040101.

Full text
Abstract:
Data-intensive systems aim to efficiently process "big" data. Several data processing engines have evolved over the past decade, modeled around the MapReduce paradigm. This article explores Hadoop's MapReduce engine and proposes techniques to obtain a higher level of optimization by borrowing concepts from the world of high-performance computing; consequently, power consumed and heat generated are lowered. The article designs a system with a pipelined dataflow, in contrast to the existing unregulated "bursty" flow of network traffic; the ability to carry out both Map and Reduce tasks in parallel; and modern high-performance computing concepts based on Remote Direct Memory Access (RDMA). To establish the claim of increased performance of the proposed system, the authors provide an algorithm for RoCE-enabled MapReduce and a mathematical derivation contrasting its runtime with that of vanilla Hadoop. The article proves mathematically that the proposed system functions 1.67 times faster than the vanilla version of Hadoop.
APA, Harvard, Vancouver, ISO, and other styles
10

E. Laxmi Lydia and M. Srinivasa Rao. "Applying Compression Algorithms on Hadoop Cluster Implementing through Apache Tez and Hadoop MapReduce." International Journal of Engineering & Technology 7, no. 2.26 (2018): 80. http://dx.doi.org/10.14419/ijet.v7i2.26.12539.

Full text
Abstract:
The latest and most famous subject across the cloud research area is Big Data; its main characteristics are volume, velocity, and variety. These characteristics are difficult to manage through traditional software and the various available methodologies. Data arising from the various domains of big data is handled through Hadoop, an open framework mainly developed to provide solutions to this problem. Big data analytics is done through the Hadoop MapReduce framework, the key engine of a Hadoop cluster, which is extensively used these days and relies on a batch processing system. Apache developed an engine named "Tez", which supports an interactive query system and does not write temporary data into the Hadoop Distributed File System (HDFS). The paper mainly focuses on a performance comparison of MapReduce and Tez; the performance of the two engines is examined through compression of the input files and map output files. To compare the two engines we used the bzip2 compression algorithm for the input files and Snappy for the map output files, with the Word Count and TeraSort benchmarks used in our experiments. For the Word Count benchmark, the results show that the Tez engine has better execution time than the Hadoop MapReduce engine for both compressed and non-compressed data, reducing execution time by nearly 39% compared to the Hadoop MapReduce engine. In contrast, for the TeraSort benchmark, the Tez engine has higher execution time than the Hadoop MapReduce engine.
APA, Harvard, Vancouver, ISO, and other styles
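The compression pairing benchmarked here corresponds to standard Hadoop settings: bzip2 input is decompressed (and split) automatically based on the .bz2 file extension, while Snappy compression of the map output must be switched on explicitly. A hedged driver fragment:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapreduce.Job;

public class CompressionSetup {
    // Builds a job whose intermediate map output is Snappy-compressed to cut shuffle traffic.
    public static Job compressedJob() throws Exception {
        Configuration conf = new Configuration();
        conf.setBoolean("mapreduce.map.output.compress", true);
        conf.setClass("mapreduce.map.output.compress.codec",
                SnappyCodec.class, CompressionCodec.class);
        // No input-side flag is needed: files ending in .bz2 are handled by the bzip2 codec.
        return Job.getInstance(conf, "compressed-wordcount");
    }
}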

Dissertations / Theses on the topic "Hadoop MapReduce"

1

Nilsson, Johan. "Hadoop MapReduce in Eucalyptus Private Cloud." Thesis, Umeå universitet, Institutionen för datavetenskap, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-51309.

Full text
Abstract:
This thesis investigates how to set up a private cloud using the Eucalyptus Cloud system, along with its usability, requirements, and limitations as an open-source cloud platform providing private cloud solutions. It also studies whether using the MapReduce framework, through Apache Hadoop's implementation on top of the private Eucalyptus cloud, can provide near-linear scalability in terms of time and the number of virtual machines in the cluster. Analysis has shown that Eucalyptus is lacking in a few usability areas when setting up the cloud infrastructure, in terms of private networking and DNS lookups, yet the API that Eucalyptus provides gives benefits when migrating from public clouds like Amazon. The MapReduce framework shows an initially near-linear relation, which declines as the number of virtual machines reaches the maximum of the cloud infrastructure.
APA, Harvard, Vancouver, ISO, and other styles
2

Raja, Anitha. "A Coordination Framework for Deploying Hadoop MapReduce Jobs on Hadoop Cluster." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-196951.

Full text
Abstract:
Apache Hadoop is an open source framework that delivers reliable, scalable, and distributed computing. Hadoop services are provided for distributed data storage, data processing, data access, and security. MapReduce is the heart of the Hadoop framework and was designed to process vast amounts of data distributed over a large number of nodes. MapReduce has been used extensively to process structured and unstructured data in diverse fields such as e-commerce, web search, social networks, and scientific computation. Understanding the characteristics of Hadoop MapReduce workloads is the key to achieving improved configurations and refining system throughput. Thus far, MapReduce workload characterization in a large-scale production environment has not been well studied. In this thesis project, the focus is mainly on composing a Hadoop cluster (as an execution environment for data processing) to analyze two types of Hadoop MapReduce (MR) jobs via a proposed coordination framework. This coordination framework is referred to as a workload translator. The outcome of this work includes: (1) a parametric workload model for the target MR jobs, (2) a cluster specification to develop an improved cluster deployment strategy using the model and coordination framework, and (3) better scheduling and hence better performance of jobs (i.e., shorter job completion time). We implemented a prototype of our solution using Apache Tomcat on (OpenStack) Ubuntu Trusty Tahr, which uses RESTful APIs to (1) create a Hadoop cluster version 2.7.2 and (2) scale up and scale down the number of workers in the cluster. The experimental results showed that with well-tuned parameters, MR jobs can achieve a reduction in job completion time and improved utilization of hardware resources. The target audience for this thesis is developers. As future work, we suggest adding additional parameters to develop a more refined workload model for MR and similar jobs.
APA, Harvard, Vancouver, ISO, and other styles
3

Deolikar, Piyush P. "Lecture Video Search Engine Using Hadoop MapReduce." Thesis, California State University, Long Beach, 2017. http://pqdtopen.proquest.com/#viewpdf?dispub=10638908.

Full text
Abstract:
With the advent of the Internet and the ease of uploading video content to video libraries and social networking sites, video data availability increased very rapidly during this decade. Universities are uploading video tutorials for their online courses. Companies like Udemy, Coursera, Lynda, etc. have made video tutorials available over the Internet. We propose and implement a scalable solution which helps to find relevant videos with respect to a query provided by the user. Our solution maintains an updated list of the available videos on the web and assigns a rank according to their relevance. The proposed solution consists of three main components that can mutually interact. The first component, called the crawler, continuously visits and locally stores the relevant information of all the webpages with videos available on the Internet. The crawler has several threads, concurrently parsing webpages. The second component obtains the inverted index of the web pages stored by the crawler. Given a query, the inverted index is used to obtain the videos that contain the words in the query. The third component computes the rank of the video. This rank is then used to display the results in order of relevance. We implement a scalable solution in the Apache Hadoop framework. Hadoop is a distributed operating system that provides a distributed file system able to handle large files as well as distributed computation among the participants.
APA, Harvard, Vancouver, ISO, and other styles
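The second component, the inverted index, is the textbook MapReduce fit. A minimal sketch in which the document ID is simply the input file name; the thesis' crawler and ranking components are not reproduced.

import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// Map: emit (term, documentId) for every term in the document.
public class InvertedIndexMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String docId = ((FileSplit) context.getInputSplit()).getPath().getName();
        for (String term : value.toString().toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) context.write(new Text(term), new Text(docId));
        }
    }
}

// Reduce: collapse a term's postings into a deduplicated document list.
class InvertedIndexReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text term, Iterable<Text> docs, Context context)
            throws IOException, InterruptedException {
        Set<String> postings = new HashSet<>();
        for (Text d : docs) postings.add(d.toString());
        context.write(term, new Text(String.join(",", postings)));
    }
}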
4

Темирбекова, Ж. Е., and Ж. М. Меренбаев. "Параллельное масштабирование изображений в технологии mapreduce hadoop" [Parallel image scaling with Hadoop MapReduce technology]. Thesis, Sumy State University, 2015. http://essuir.sumdu.edu.ua/handle/123456789/40775.

Full text
Abstract:
Digital image processing is widely used in practically all areas of industry. Its use often makes it possible to reach a qualitatively new technological level of production. The most difficult issues are those related to automatically extracting information from an image and interpreting it, which forms the basis for decision-making in the management of production processes.
APA, Harvard, Vancouver, ISO, and other styles
5

Wu, Yuanyuan. "HADOOP-EDF: LARGE-SCALE DISTRIBUTED PROCESSING OF ELECTROPHYSIOLOGICAL SIGNAL DATA IN HADOOP MAPREDUCE." UKnowledge, 2019. https://uknowledge.uky.edu/cs_etds/88.

Full text
Abstract:
A rapidly growing volume of electrophysiological signals is being generated for clinical research on neurological disorders. The European Data Format (EDF) is a standard format for storing electrophysiological signals. However, the bottleneck of existing signal analysis tools in handling large-scale datasets is the sequential way large EDF files are loaded before performing an analysis. To overcome this, we developed Hadoop-EDF, a distributed signal-processing tool that loads EDF data in a parallel manner using Hadoop MapReduce. Hadoop-EDF uses a robust data partition algorithm that makes EDF data processable in parallel. We evaluated Hadoop-EDF's scalability and performance by leveraging two datasets from the National Sleep Research Resource and running experiments on Amazon Web Services clusters. The performance of Hadoop-EDF on a 20-node cluster is 27 times and 47 times better than sequential processing of 200 small files and 200 large files, respectively. The results demonstrate that Hadoop-EDF is more suitable and effective for processing large EDF files.
APA, Harvard, Vancouver, ISO, and other styles
6

Čecho, Jaroslav. "Optimalizace platformy pro distribuované výpočty Hadoop" [Optimization of the Hadoop distributed computing platform]. Master's thesis, Vysoké učení technické v Brně, Fakulta informačních technologií, 2012. http://www.nusl.cz/ntk/nusl-236464.

Full text
Abstract:
This thesis focuses on possibilities for improving the Apache Hadoop framework by outsourcing some computation to a graphics card using the NVIDIA CUDA technology. The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model called MapReduce. NVIDIA CUDA is a platform which allows one to use a graphics card for general-purpose computation. This thesis contains descriptions and experimental implementations of suitable computations inside the Hadoop framework that can benefit from being executed on a graphics card.
APA, Harvard, Vancouver, ISO, and other styles
7

Yee, Adam J. "Sharing the Love: A Generic Socket API for Hadoop MapReduce." Scholarly Commons, 2011. https://scholarlycommons.pacific.edu/uop_etds/772.

Full text
Abstract:
Hadoop is a popular software framework written in Java that performs data-intensive distributed computations on a cluster. It includes Hadoop MapReduce and the Hadoop Distributed File System (HDFS). HDFS has known scalability limitations due to its single NameNode, which holds the entire file system namespace in RAM on one computer. Therefore, the NameNode can only store a limited number of file names, depending on the RAM capacity. The solution to furthering scalability is distributing the namespace, similar to how file data is divided into chunks and stored across cluster nodes. Hadoop has an abstract file system API which is extended to integrate HDFS, but it has also been extended to integrate the file systems S3, CloudStore, Ceph, and PVFS. The file systems Ceph and PVFS already distribute the namespace, while others such as Lustre are making the conversion. Google announced in 2009 that they had been implementing a distributed namespace in the Google File System to achieve greater scalability. The Generic Hadoop API is created from Hadoop's abstract file system API. It speaks a simple communication protocol that can integrate any file system which supports TCP sockets. By providing a file-system-agnostic API, future work with other file systems might provide ways of surpassing Hadoop's current scalability limitations. Furthermore, the new API eliminates the need for customizing Hadoop's Java implementation, and instead moves the implementation to the file system itself. Thus, developers wishing to integrate their new file system with Hadoop are not responsible for understanding details of Hadoop's internal operation. The API is tested on a homogeneous, four-node cluster with OrangeFS. Initial OrangeFS I/O throughputs compared to HDFS are 67% of HDFS's write throughput and 74% of HDFS's read throughput. But, compared with an alternate method of integrating with OrangeFS (a POSIX kernel interface), write and read throughput are increased by 23% and 7%, respectively.
APA, Harvard, Vancouver, ISO, and other styles
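The pluggable file-system mechanism the thesis builds on is Hadoop's fs.<scheme>.impl configuration convention. A sketch of the client side only; the socketfs scheme and example.SocketFileSystem class are hypothetical, and a real backend must subclass org.apache.hadoop.fs.FileSystem and implement its abstract operations.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CustomFsClient {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Map the (hypothetical) "socketfs" URI scheme to its FileSystem implementation.
        conf.set("fs.socketfs.impl", "example.SocketFileSystem");
        FileSystem fs = FileSystem.get(URI.create("socketfs://node0:7000/"), conf);
        // From here on, jobs reach the backend through the same abstract API as HDFS.
        for (FileStatus status : fs.listStatus(new Path("/data"))) {
            System.out.println(status.getPath() + "  " + status.getLen());
        }
    }
}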
8

Wang, Guanying. "Evaluating MapReduce System Performance: A Simulation Approach." Diss., Virginia Tech, 2012. http://hdl.handle.net/10919/28820.

Full text
Abstract:
The scale of data generated and processed is exploding in the Big Data era. The MapReduce system popularized by open-source Hadoop is a powerful tool for the exploding data problem, and is widely employed in many areas involving large scale of data. In many circumstances, hypothetical MapReduce systems must be evaluated, e.g. to provision a new MapReduce system to meet a certain performance goal, to upgrade a currently running system to meet increasing business demands, or to evaluate a novel network topology, new scheduling algorithms, or resource arrangement schemes. The traditional trial-and-error solution involves the time-consuming and costly process in which a real cluster is first built and then benchmarked. In this dissertation, we propose to simulate MapReduce systems and evaluate hypothetical MapReduce systems using simulation. This simulation approach offers significantly lower turn-around time and lower cost than experiments. Simulation cannot entirely replace experiments, but can be used as a preliminary step to reveal potential flaws and gain critical insights. We studied MapReduce systems in detail and developed a comprehensive performance model for MapReduce, including sub-task, phase-level performance models for both map and reduce tasks and a model for resource contention between multiple processes running concurrently. Based on the performance model, we developed a comprehensive simulator for MapReduce, MRPerf. MRPerf is the first full-featured MapReduce simulator. It supports both workload simulation and resource contention, and it still offers the most complete features among all MapReduce simulators to date. Using MRPerf, we conducted two case studies to evaluate scheduling algorithms in MapReduce and shared storage in MapReduce, without building real clusters. Furthermore, in order to further integrate simulation and performance prediction into MapReduce systems and leverage predictions to improve system performance, we developed an online prediction framework for MapReduce, which periodically runs simulations within a live Hadoop MapReduce system. The framework can predict task execution within a window in the near future. These predictions can be used by other components in MapReduce systems in order to improve performance. Our results show that the framework achieves high prediction accuracy and incurs negligible overhead. We present two potential use cases, prefetching and a dynamically adapting scheduler.
APA, Harvard, Vancouver, ISO, and other styles
9

Rocha, Fabiano da Guia. "Análise de escalabilidade de aplicações Hadoop/Mapreduce por meio de simulação" [Scalability analysis of Hadoop/MapReduce applications through simulation]. Universidade Federal de São Carlos, 2013. https://repositorio.ufscar.br/handle/ufscar/534.

Full text
Abstract:
During the last years we have witnessed significant growth in the amount of data processed on a daily basis by companies, universities, and other institutions. Many use cases report processing of data volumes of petabytes in thousands of cores by a single application. MapReduce is a programming model and a framework for the execution of applications which manipulate large data volumes on machines composed of thousands of processors/cores. Currently, Hadoop is the most widely adopted free implementation of MapReduce. Although there are reports in the literature about the use of MapReduce applications on platforms with more than one hundred cores, the limits of scalability have not been exhausted and much remains to be studied in this field. One of the main challenges in the scalability study of MapReduce applications is the large number of configuration parameters of the application and the Hadoop environment. There are reports in the literature that mention more than 190 configuration parameters, 25 of which are known to impact application performance in a significant way. This work studies the scalability of MapReduce applications running on Hadoop. Due to the limited number of processors/cores available, we adopted a combined approach involving both experimentation and simulation. The experimentation was carried out on a local cluster of 32 nodes (with 64 processors), and for the simulation we used MRSG (MapReduce Over SimGrid). In a first set of experiments, we identified the parameters with the greatest impact on the performance and scalability of the applications. We then present a method for calibrating the simulator against a representative application chosen as a benchmark. With the calibrated simulator, we evaluated the scalability of one well-optimized application on larger clusters, with up to 10,000 nodes.
APA, Harvard, Vancouver, ISO, and other styles
10

Liu, Xuan. "An Ensemble Method for Large Scale Machine Learning with Hadoop MapReduce." Thèse, Université d'Ottawa / University of Ottawa, 2014. http://hdl.handle.net/10393/30702.

Full text
Abstract:
We propose a new ensemble algorithm: the meta-boosting algorithm. This algorithm enables the original AdaBoost algorithm to improve the decisions made by different weak learners by utilizing the meta-learning approach. Better accuracy results are achieved since this algorithm reduces both bias and variance. However, higher accuracy also brings higher computational complexity, especially on big data. We therefore propose the parallelized meta-boosting algorithm, Parallelized-Meta-Learning (PML), using the MapReduce programming paradigm on Hadoop. The experimental results on the Amazon EC2 cloud computing infrastructure show that PML reduces the computational complexity enormously while retaining lower error rates than the results on a single computer. Since MapReduce has the inherent weakness that it cannot directly support iterations in an algorithm, our approach is a win-win method: it not only overcomes this weakness but also secures good accuracy. A comparison between this approach and the contemporary algorithm AdaBoost.PL is also performed.
APA, Harvard, Vancouver, ISO, and other styles

Books on the topic "Hadoop MapReduce"

1

Vavilapalli, Vinod Kumar, Doug Eadline, Joseph Niemiec, and Jeff Markham, eds. Apache Hadoop YARN: Moving beyond MapReduce and Batch Processing with Apache Hadoop 2. Addison-Wesley, 2014.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
2

Hadoop MapReduce Cookbook. Packt Publishing, 2013.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
3

Enterprise Hadoop and MapReduce. Pearson Education, Limited, 2025.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
4

Tannir, Khaled. Optimizing Hadoop for MapReduce. Packt Publishing, 2014.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
5

Big Data with Hadoop Mapreduce. Taylor & Francis Group, 2022.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
6

Big Data with Hadoop Mapreduce. Taylor & Francis Group, 2020.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
7

Gunarathne, Thilina. Hadoop Mapreduce V2 Cookbook Second Edition. Packt Publishing, Limited, 2015.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
8

Perera, Srinath. Instant MapReduce Patterns - Hadoop Essentials How-To. Packt Publishing, Limited, 2013.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
9

Instant MapReduce Patterns - Hadoop Essentials How-to. Packt Publishing, 2013.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
10

Mapreduce Design Patterns. O'Reilly Media, 2012.

Find full text
APA, Harvard, Vancouver, ISO, and other styles

Book chapters on the topic "Hadoop MapReduce"

1

Jeyaraj, Rathinaraja, Ganeshkumar Pugalendhi, and Anand Paul. "Hadoop Framework." In Big Data with Hadoop MapReduce. Apple Academic Press, 2020. http://dx.doi.org/10.1201/9780429321733-2.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Jeyaraj, Rathinaraja, Ganeshkumar Pugalendhi, and Anand Paul. "Hadoop Ecosystem." In Big Data with Hadoop MapReduce. Apple Academic Press, 2020. http://dx.doi.org/10.1201/9780429321733-4.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Jeyaraj, Rathinaraja, Ganeshkumar Pugalendhi, and Anand Paul. "Hadoop 2.7.0." In Big Data with Hadoop MapReduce. Apple Academic Press, 2020. http://dx.doi.org/10.1201/9780429321733-5.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Steele, Brian, John Chandler, and Swarna Reddy. "Hadoop and MapReduce." In Algorithms for Data Science. Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-45797-0_4.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Jeyaraj, Rathinaraja, Ganeshkumar Pugalendhi, and Anand Paul. "Hadoop 1.2.1 Installation." In Big Data with Hadoop MapReduce. Apple Academic Press, 2020. http://dx.doi.org/10.1201/9780429321733-3.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Jeyaraj, Rathinaraja, Ganeshkumar Pugalendhi, and Anand Paul. "Hadoop 2.7.0 Installation." In Big Data with Hadoop MapReduce. Apple Academic Press, 2020. http://dx.doi.org/10.1201/9780429321733-6.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Wadkar, Sameer, and Madhu Siddalingaiah. "Advanced MapReduce Development." In Pro Apache Hadoop. Apress, 2014. http://dx.doi.org/10.1007/978-1-4302-4864-4_6.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Vohra, Deepak. "HDFS and MapReduce." In Practical Hadoop Ecosystem. Apress, 2016. http://dx.doi.org/10.1007/978-1-4842-2199-0_2.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Wadkar, Sameer, and Madhu Siddalingaiah. "Basics of MapReduce Development." In Pro Apache Hadoop. Apress, 2014. http://dx.doi.org/10.1007/978-1-4302-4864-4_5.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Jeyaraj, Rathinaraja, Ganeshkumar Pugalendhi, and Anand Paul. "Big Data." In Big Data with Hadoop MapReduce. Apple Academic Press, 2020. http://dx.doi.org/10.1201/9780429321733-1.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Hadoop MapReduce"

1

Johannessen, Roger, Anis Yazidi, and Boning Feng. "Hadoop MapReduce scheduling paradigms." In 2017 IEEE 2nd International Conference on Cloud Computing and Big Data Analysis (ICCCBDA). IEEE, 2017. http://dx.doi.org/10.1109/icccbda.2017.7951906.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Fanibhare, Vaibhav, and Vijay Dahake. "SmartGrids: MapReduce framework using Hadoop." In 2016 3rd International Conference on Signal Processing and Integrated Networks (SPIN). IEEE, 2016. http://dx.doi.org/10.1109/spin.2016.7566727.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Honjo, Toshimori, and Kazuki Oikawa. "Hardware acceleration of Hadoop MapReduce." In 2013 IEEE International Conference on Big Data. IEEE, 2013. http://dx.doi.org/10.1109/bigdata.2013.6691562.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Bhandarkar, Milind. "MapReduce programming with apache Hadoop." In 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS). IEEE, 2010. http://dx.doi.org/10.1109/ipdps.2010.5470377.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Wagle, Neena, Shreya Jasani, Srushti Gawand, Sailee Tilekar, and Poornima Patil. "Twitter UserRank using Hadoop MapReduce." In the ACM Symposium. ACM Press, 2016. http://dx.doi.org/10.1145/2909067.2909095.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Tan, Jian, Xiaoqiao Meng, and Li Zhang. "Coupling scheduler for MapReduce/Hadoop." In the 21st international symposium. ACM Press, 2012. http://dx.doi.org/10.1145/2287076.2287097.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

George, Johnu, Chien-An Chen, Radu Stoleru, Geoffrey G. Xie, Tamim Sookoor, and David Bruno. "Hadoop MapReduce for Tactical Clouds." In 2014 IEEE 3rd International Conference on Cloud Networking (CloudNet). IEEE, 2014. http://dx.doi.org/10.1109/cloudnet.2014.6969015.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Hou, Xiaofei, T. K. Ashwin Kumar, Johnson P. Thomas, and Vijay Varadharajan. "Dynamic Workload Balancing for Hadoop MapReduce." In 2014 IEEE Fourth International Conference on Big Data and Cloud Computing (BdCloud). IEEE, 2014. http://dx.doi.org/10.1109/bdcloud.2014.103.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Merla, PrathyushaRani, and Yiheng Liang. "Data analysis using hadoop MapReduce environment." In 2017 IEEE International Conference on Big Data (Big Data). IEEE, 2017. http://dx.doi.org/10.1109/bigdata.2017.8258541.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Syue, Fu-Hong, Varsha A. Kshirsagar, and Shou-Chih Lo. "Improving MapReduce Load Balancing in Hadoop." In 2018 14th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD). IEEE, 2018. http://dx.doi.org/10.1109/fskd.2018.8687158.

Full text
APA, Harvard, Vancouver, ISO, and other styles