
Journal articles on the topic 'Mapreduce'


Consult the top 50 journal articles for your research on the topic 'Mapreduce.'


1

Garg, Uttama. "Data Analytic Models That Redress the Limitations of MapReduce." International Journal of Web-Based Learning and Teaching Technologies 16, no. 6 (November 2021): 1–15. http://dx.doi.org/10.4018/ijwltt.20211101.oa7.

Abstract:
The amount of data in today's world is increasing exponentially. Effectively analyzing Big Data is a very complex task. The MapReduce programming model created by Google in 2004 revolutionized the big-data computing market. Nowadays the model is used by many for scientific and research analysis as well as for commercial purposes. The MapReduce model, however, is quite a low-level programming model and has many limitations. Active research is being undertaken to develop models that overcome these limitations. In this paper we study some popular data analytic models that redress some of the limitations of MapReduce, namely ASTERIX and Pregel (Giraph). We discuss these models briefly and, through the discussion, highlight how they are able to overcome MapReduce's limitations.
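Pregel's vertex-centric model, one of the alternatives this paper surveys, iterates in supersteps: each vertex processes its incoming messages, updates its state, and sends new messages to its neighbors. A toy Python sketch of a single-source shortest-path computation in that style (our own illustration of the idea, not Giraph's actual API):

```python
def pregel_sssp(graph, source):
    # graph: {vertex: [(neighbor, edge_weight), ...]}
    dist = {v: float("inf") for v in graph}
    messages = {source: [0]}            # initial message delivered to the source
    while messages:                     # run supersteps until no messages remain
        next_messages = {}
        for v, incoming in messages.items():
            best = min(incoming)
            if best < dist[v]:          # the vertex improves its state...
                dist[v] = best
                for u, w in graph[v]:   # ...and messages its neighbors
                    next_messages.setdefault(u, []).append(best + w)
        messages = next_messages        # vertices with no messages stay inactive
    return dist

g = {"a": [("b", 1), ("c", 4)], "b": [("c", 2)], "c": []}
print(pregel_sssp(g, "a"))  # → {'a': 0, 'b': 1, 'c': 3}
```

Unlike plain MapReduce, state (here `dist`) persists across supersteps, which is exactly the limitation for iterative graph algorithms that Pregel addresses.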
2

Zhang, Yulun, Chenxu Zhang, Lei Yang, and Hongyang Li. "Large-scale Data Mining Method based on Clustering Algorithm Combined with MAPREDUCE." Transactions on Computer Science and Intelligent Systems Research 2 (December 21, 2023): 9–13. http://dx.doi.org/10.62051/8p9b3106.

Abstract:
With the continuing development of information technology, the diversity and amount of information in data continue to grow. Effectively mining these text data to extract valuable content has become an urgent task in the field of data research. This study combines the MapReduce distributed system with the K-means clustering algorithm to meet the challenges of large-scale data mining. At the same time, the paper uses a distributed caching mechanism to solve the problem of repeated resource requests across multiple cooperating MapReduce jobs and improve data mining efficiency. The combination of MapReduce's distributed computing and the advantages of the K-means clustering algorithm provides an efficient and scalable method for large-scale data mining. Experimental results combining internal and external indicators show that the advantage of combining K-means with MapReduce is that it fully utilizes the distributed and parallel computing characteristics of MapReduce, providing users with an efficient and scalable data mining tool. Through this research, the paper provides new methods and insights for large-scale data mining, improving the efficiency and accuracy of data mining.
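The pairing of K-means with MapReduce described above maps naturally onto the model: the map step assigns each point to its nearest centroid, the shuffle groups points by centroid, and the reduce step averages each group into a new centroid. A minimal one-dimensional Python sketch of a single iteration (function names are ours, purely illustrative, not the paper's implementation):

```python
from collections import defaultdict

def assign(point, centroids):
    # Map step: emit the index of the nearest centroid for this point.
    return min(range(len(centroids)), key=lambda i: abs(point - centroids[i]))

def kmeans_iteration(points, centroids):
    # Shuffle: group points under the centroid index the map step emitted.
    groups = defaultdict(list)
    for p in points:
        groups[assign(p, centroids)].append(p)
    # Reduce step: average each group to obtain the updated centroid;
    # a centroid with no assigned points is left unchanged.
    return [sum(groups[i]) / len(groups[i]) if groups[i] else centroids[i]
            for i in range(len(centroids))]

print(kmeans_iteration([1.0, 2.0, 10.0, 11.0], [0.0, 9.0]))  # → [1.5, 10.5]
```

In a real cluster this iteration runs as one MapReduce job per step, with the centroids broadcast to mappers (the distributed cache the abstract mentions is one way to do that broadcast).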
3

Dean, Jeffrey, and Sanjay Ghemawat. "MapReduce." Communications of the ACM 51, no. 1 (January 2008): 107–13. http://dx.doi.org/10.1145/1327452.1327492.

4

Dean, Jeffrey, and Sanjay Ghemawat. "MapReduce." Communications of the ACM 53, no. 1 (January 2010): 72–77. http://dx.doi.org/10.1145/1629175.1629198.

5

Zhang, Guigang, Chao Li, Yong Zhang, and Chunxiao Xing. "A Semantic++ MapReduce Parallel Programming Model." International Journal of Semantic Computing 08, no. 03 (September 2014): 279–99. http://dx.doi.org/10.1142/s1793351x14400091.

Abstract:
Big data is playing an increasingly important role in every area, such as medical health, internet finance, culture, and education. How to process these big data efficiently is a huge challenge. MapReduce is a good parallel programming model for processing big data, but it has many shortcomings: for example, it cannot handle complex computations and is not suited to real-time computing. In order to overcome these shortcomings of MapReduce and its variants, in this paper we propose a Semantic++ MapReduce parallel programming model. This study includes the following parts: (1) the Semantic++ MapReduce parallel programming model, comprising both its physical and its logical framework; (2) a Semantic++ extraction and management method for big data; (3) the Semantic++ MapReduce parallel computing framework, including semantic++ map, semantic++ reduce, and semantic++ shuffle; (4) Semantic++ MapReduce for multi-data centers, including its basic framework and its application framework; (5) a case study of Semantic++ MapReduce across multi-data centers.
6

Wang, Zhong, Bo Suo, and Zhuo Wang. "MRScheduling: An Effective Technique for Multi-Tenant Meeting Deadline in MapReduce." Applied Mechanics and Materials 644-650 (September 2014): 4482–86. http://dx.doi.org/10.4028/www.scientific.net/amm.644-650.4482.

Abstract:
The multi-tenant job scheduling problem in the MapReduce framework has become increasingly significant. Existing scheduling approaches and algorithms no longer fit well in scenarios where numerous jobs are submitted by multiple users at the same time. Therefore, with the aim of enlarging job throughput in MapReduce, we propose MRScheduling, which focuses on meeting each job's respective deadline. Considering the various parameters related to the execution time of a MapReduce job, we present a simple time-cost model to quantify the number of map slots and reduce slots assigned to a job. The MRScheduling algorithm is then discussed in detail. Finally, we evaluate our approach on both real and synthetic data on a real distributed cluster to verify its effectiveness and efficiency.
7

Chen, Rong, and Haibo Chen. "Tiled-MapReduce." ACM Transactions on Architecture and Code Optimization 10, no. 1 (April 2013): 1–30. http://dx.doi.org/10.1145/2445572.2445575.

8

Friedman, Eric, Peter Pawlowski, and John Cieslewicz. "SQL/MapReduce." Proceedings of the VLDB Endowment 2, no. 2 (August 2009): 1402–13. http://dx.doi.org/10.14778/1687553.1687567.

9

Garcia, Christopher. "Demystifying MapReduce." Procedia Computer Science 20 (2013): 484–89. http://dx.doi.org/10.1016/j.procs.2013.09.307.

10

Al-Badarneh, Amer, Amr Mohammad, and Salah Harb. "A Survey on MapReduce Implementations." International Journal of Cloud Applications and Computing 6, no. 1 (January 2016): 59–87. http://dx.doi.org/10.4018/ijcac.2016010104.

Abstract:
MapReduce, a distinguished and successful platform for parallel data processing, is attracting significant momentum from both academia and industry as the volume of data to capture, transform, and analyse grows rapidly. Although MapReduce is used in many applications to analyse large-scale data sets, there is still much debate among scientists and researchers on its efficiency, performance, and usability to support more classes of applications. This survey presents a comprehensive review of various implementations of the MapReduce framework. The authors first give an overview of the MapReduce programming model. They then present a broad description of various technical aspects of the most successful implementations of the MapReduce framework reported in the literature and discuss their main strengths and weaknesses. Finally, the authors conclude by introducing a comparison between MapReduce implementations and discussing open issues and challenges in enhancing MapReduce.
11

Dahiphale, Devendra, Rutvik Karve, Athanasios V. Vasilakos, Huan Liu, Zhiwei Yu, Amit Chhajer, Jianmin Wang, and Chaokun Wang. "An Advanced MapReduce: Cloud MapReduce, Enhancements and Applications." IEEE Transactions on Network and Service Management 11, no. 1 (March 2014): 101–15. http://dx.doi.org/10.1109/tnsm.2014.031714.130407.

12

Gao, Tilei, Ming Yang, Rong Jiang, Yu Li, and Yao Yao. "Research on Computing Efficiency of MapReduce in Big Data Environment." ITM Web of Conferences 26 (2019): 03002. http://dx.doi.org/10.1051/itmconf/20192603002.

Abstract:
The emergence of big data has had a great impact on the traditional computing mode; the distributed computing framework represented by MapReduce has become an important solution to this problem. This paper studies the principle and framework of MapReduce programming in depth. On that basis, the time consumption of the distributed computing framework MapReduce and of the traditional computing model is compared through concrete programming experiments. The experiments show that MapReduce has great advantages for large data volumes.
13

Park, Jong-Hyuk, Hwa-Young Jeong, Young-Sik Jeong, and Min Choi. "REST-MapReduce: An Integrated Interface but Differentiated Service." Journal of Applied Mathematics 2014 (2014): 1–10. http://dx.doi.org/10.1155/2014/170723.

Abstract:
With the fast deployment of cloud computing, MapReduce architectures are becoming major technologies for mobile cloud computing. The concept of MapReduce was first introduced as a novel programming model and implementation for a large set of computing devices. In this research, we propose a novel concept called REST-MapReduce, enabling users to use only the REST interface without using the MapReduce architecture. This approach provides a higher level of abstraction by integrating the two types of access interface, REST API and MapReduce. The motivation for this research stems from the slower response time for accessing a simple RDBMS on Hadoop compared with direct access to the RDBMS, caused by the overhead of job scheduling, initiating, starting, tracking, and management during MapReduce-based parallel execution. Therefore, we provide good performance for both REST Open API services and MapReduce. This is very useful for constructing REST Open API services on Hadoop hosting services, for example, Amazon AWS (Macdonald, 2005) or IBM Smart Cloud. To evaluate the performance of our REST-MapReduce framework, we conducted experiments with a Jersey REST web server and Hadoop. Experimental results show that our approach outperforms conventional approaches.
14

Liu, Hanpeng, Wuqi Gao, and Junmin Luo. "Research on Intelligentization of Cloud Computing Programs Based on Self-awareness." International Journal of Advanced Network, Monitoring and Controls 8, no. 2 (June 1, 2023): 89–98. http://dx.doi.org/10.2478/ijanmc-2023-0060.

Abstract:
Research on the MapReduce programming framework of cloud computing shows that current MapReduce programs only solve specific problems; there is no summary of MapReduce design experience or design features, let alone a formal description or the inheritance and application of a knowledge base. In order to make cloud computing programs intelligent, a general MapReduce program generation method is designed. This paper proposes an architecture for intelligent cloud computing by studying the AORBCO model and combining it with cloud computing technology. Following the behavior control mechanism in the AORBCO model, a MapReduce program generation method for intelligent cloud computing is proposed. This method extracts entity information from the input data set and from the knowledge base, computes their similarity, and takes the top-ranked entities as the keys of the key-value pairs for the data set. Data processing types are classified and then aligned with specific MapReduce capabilities, and the MapReduce program generation experiment is verified on the AORBCO model development platform. The experiments show that the complexity of big-data MapReduce program code is reduced and the generated code executes efficiently.
15

Jiang, Tao, Huaxi Gu, Kun Wang, Xiaoshan Yu, and Yunfeng Lu. "BHyberCube: A MapReduce aware heterogeneous architecture for data center." Computer Science and Information Systems 14, no. 3 (2017): 611–27. http://dx.doi.org/10.2298/csis170202019t.

Abstract:
Some applications, like MapReduce, call for heterogeneous networks in the data center, whereas traditional network topologies, like fat tree and BCube, are homogeneous. MapReduce is a distributed data processing application. In this paper, we propose the BHyberCube network (BHC), a new heterogeneous network for MapReduce. Heterogeneous nodes and scalability issues are addressed considering the implementation of MapReduce in existing topologies. A mathematical model is established to demonstrate the procedure of building a BHC. Comparisons of BHC with other topologies show the good properties BHC possesses for MapReduce. We also simulate BHC under multi-job injection and under different probabilities of worker servers' communications. The results and analysis show that BHC could be a viable interconnection topology for MapReduce in today's data centers.
16

Gao, Tie Liang, Jiao Li, Jun Peng Zhang, and Bing Jie Shi. "The Research of MapReduce on the Cloud Computing." Applied Mechanics and Materials 182-183 (June 2012): 2127–30. http://dx.doi.org/10.4028/www.scientific.net/amm.182-183.2127.

Abstract:
MapReduce is a programming model used for parallel computing over large-scale data sets in cloud computing [1]; it mainly consists of map and reduce phases. MapReduce is tremendously convenient for programmers who are not familiar with parallel programming; they use MapReduce to run their programs on distributed systems. This paper mainly researches the model, process, and theory of MapReduce.
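The map-and-reduce structure this entry describes can be illustrated with the canonical word-count example in plain Python. This is a didactic sketch of the model only, not Hadoop's API: the `run_mapreduce` driver and its grouping dictionary stand in for the framework's shuffle.

```python
from collections import defaultdict

def run_mapreduce(records, mapper, reducer):
    # Map phase: apply the mapper to every record, collecting (key, value) pairs.
    grouped = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            grouped[key].append(value)   # grouping by key plays the role of the shuffle
    # Reduce phase: fold each key's value list into a single result.
    return {key: reducer(key, values) for key, values in grouped.items()}

def wc_map(line):
    # Emit (word, 1) for every word in the input line.
    for word in line.lower().split():
        yield word, 1

def wc_reduce(word, counts):
    # Sum the partial counts gathered for one word.
    return sum(counts)

counts = run_mapreduce(["map then reduce", "map again"], wc_map, wc_reduce)
print(counts["map"])  # → 2
```

The same two user-supplied functions, `wc_map` and `wc_reduce`, are all a real framework asks for; distribution, fault tolerance, and the shuffle are the framework's job.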
17

Zhang, Bin, Jia Jin Le, and Mei Wang. "Effective ACPS-Based Rescheduling of Parallel Batch Processing Machines with MapReduce." Applied Mechanics and Materials 575 (June 2014): 820–24. http://dx.doi.org/10.4028/www.scientific.net/amm.575.820.

Abstract:
MapReduce is a highly efficient distributed and parallel computing framework, allowing users to readily manage large clusters for parallel computing. For the big-data search problem in distributed computing environments based on the MapReduce architecture, in this paper we propose an ant colony parallel search algorithm (ACPSMR) for big data. It takes advantage of the swarm intelligence of the ant colony algorithm, with its global parallel heuristic search and scheduling capabilities, to address the low efficiency of multi-task parallel batch scheduling in MapReduce. We also extend the HDFS design in the MapReduce architecture to achieve effective integration with MapReduce, so that the algorithm can make the best of MapReduce's scalability and high parallelism. Simulation results show that the new algorithm can take advantage of cloud computing to achieve good efficiency when mining big data.
18

Wang, Ting, Hua Liang Zhang, and Peng Zeng. "A MapReduce Iteration Framework in Local Parallel and Message Synchronization." Applied Mechanics and Materials 380-384 (August 2013): 2237–41. http://dx.doi.org/10.4028/www.scientific.net/amm.380-384.2237.

Abstract:
With the development of large-scale distributed computing, a stand-alone operating environment can no longer meet the time and space demands of massive data, and much attention has turned to designing efficient distributed algorithms for cloud computing environments. The standard MapReduce model cannot solve this issue. In this paper, we redesign the MapReduce computing model while keeping the new model compatible with existing MapReduce operations. At the same time, the framework uses a message synchronization mechanism to exchange changing state data between tasks in the parallel layer. Compared to the original MapReduce operation, this greatly reduces the processing time of iterative MapReduce algorithms.
19

Marzuni, Saeed Mirpour, Abdorreza Savadi, Adel N. Toosi, and Mahmoud Naghibzadeh. "Cross-MapReduce: Data transfer reduction in geo-distributed MapReduce." Future Generation Computer Systems 115 (February 2021): 188–200. http://dx.doi.org/10.1016/j.future.2020.09.009.

20

Chen, Donghua, and Runtong Zhang. "MapReduce-Based Dynamic Partition Join with Shannon Entropy for Data Skewness." Scientific Programming 2021 (November 24, 2021): 1–15. http://dx.doi.org/10.1155/2021/1602767.

Abstract:
Join operations of data sets play a crucial role in obtaining the relations of massive data in real life. Joining two data sets with MapReduce requires a proper design of the Map and Reduce stages for different scenarios. The factors affecting MapReduce join efficiency include the density of the data sets and data transmission over clusters like Hadoop. This study aims to improve the efficiency of MapReduce join algorithms on Hadoop by leveraging Shannon entropy to measure the information changes of data sets being joined in different MapReduce stages. To reduce the uncertainty of data sets in joins through the network, a novel MapReduce join algorithm with dynamic partition strategies called dynamic partition join (DPJ) is proposed. Leveraging the changes of entropy in the partitions of data sets during the Map and Reduce stages revises the logical partitions by changing the original input of the reduce tasks in the MapReduce jobs. Experimental results indicate that the entropy-based measures can measure entropy changes of join operations. Moreover, the DPJ variant methods achieved lower entropy compared with the existing joins, thereby increasing the feasibility of MapReduce join operations for different scenarios on Hadoop.
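The MapReduce join that work like this optimizes is usually a reduce-side (repartition) join: mappers tag each record with its source relation and emit it under the join key, the shuffle brings both relations' records for a key to the same reducer, and the reducer pairs up the two sides. A simplified in-memory Python sketch of that baseline (our own illustration, not the paper's DPJ algorithm):

```python
from collections import defaultdict

def tag(records, source):
    # Map phase: emit (join_key, (source_tag, payload)) for every record.
    for key, payload in records:
        yield key, (source, payload)

def reduce_side_join(left, right):
    # Shuffle: group the tagged records of both relations by join key.
    grouped = defaultdict(list)
    for key, tagged in list(tag(left, "L")) + list(tag(right, "R")):
        grouped[key].append(tagged)
    # Reduce phase: cross-product the two sides that met under each key.
    out = []
    for key, tagged in grouped.items():
        lefts = [v for s, v in tagged if s == "L"]
        rights = [v for s, v in tagged if s == "R"]
        out.extend((key, lv, rv) for lv in lefts for rv in rights)
    return out

print(reduce_side_join([(1, "a"), (2, "b")], [(1, "x"), (3, "y")]))  # → [(1, 'a', 'x')]
```

Skew shows up here as one key collecting a disproportionate share of `grouped[key]`; dynamic partition strategies like DPJ revise how keys are assigned to reducers to spread that load.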
21

Fu, Chun Yan, Hong Zhou, Mao Song Ge, Xiao Qu, and Yong Li Wang. "A Quasi-Real-Time MapReduce Schedule Algorithm." Advanced Materials Research 694-697 (May 2013): 2458–61. http://dx.doi.org/10.4028/www.scientific.net/amr.694-697.2458.

Abstract:
In this paper, we extend and rewrite the MapReduce dispatcher and its quasi-real-time schedule algorithm to support time-limited operation scheduling. The MapReduce dispatcher estimates task completion times based on the progress rates of the tasks at hand, and dynamically allocates resources to every running task. Experimental investigation shows that the algorithm increases the resource utilization of the MapReduce system and achieves the goal of quasi-real-time MapReduce scheduling.
22

Adornes, Daniel, Dalvan Griebler, Cleverson Ledur, and Luiz Gustavo Fernandes. "Coding Productivity in MapReduce Applications for Distributed and Shared Memory Architectures." International Journal of Software Engineering and Knowledge Engineering 25, no. 09n10 (November 2015): 1739–41. http://dx.doi.org/10.1142/s0218194015710096.

Abstract:
MapReduce was originally proposed as a suitable and efficient approach for analyzing and processing large amounts of data. Since then, many research efforts have contributed MapReduce implementations for distributed and shared memory architectures. Nevertheless, different architectural levels require different optimization strategies in order to achieve high-performance computing, and these strategies have in turn produced very different MapReduce programming interfaces. This paper presents some research notes on coding productivity when developing MapReduce applications for distributed and shared memory architectures. As a case study, we introduce our current research on a unified MapReduce domain-specific language with code generation for Hadoop and Phoenix++, which has achieved a coding productivity increase of between 41.84% and 94.71% without significant performance losses (below 3%) compared to those frameworks.
23

Mitra, Arnab, Anirban Kundu, Matangini Chattopadhyay, and Samiran Chattopadhyay. "On the Exploration of Equal Length Cellular Automata Rules Targeting a MapReduce Design in Cloud." International Journal of Cloud Applications and Computing 8, no. 2 (April 2018): 1–26. http://dx.doi.org/10.4018/ijcac.2018040101.

Abstract:
A MapReduce design with Cellular Automata (CA) is presented in this research article to facilitate load-reduced independent data processing and cost-efficient physical implementation in heterogeneous Cloud architectures. Equal Length Cellular Automata (ELCA) are considered for the design. This article explores ELCA rules and presents an ELCA-based MapReduce design in the Cloud. New algorithms are presented for (i) synthesis of ELCA rules, (ii) classification of ELCA rules, and (iii) an ELCA-based MapReduce design in the Cloud. Shuffling and efficient reduction of data volume are ensured in the proposed MapReduce design.
24

Zheng, Feifeng, Zhaojie Wang, Yinfeng Xu, and Ming Liu. "Heuristic Algorithms for MapReduce Scheduling Problem with Open-Map Task and Series-Reduce Tasks." Scientific Programming 2020 (July 15, 2020): 1–10. http://dx.doi.org/10.1155/2020/8810215.

Abstract:
Based on the classical MapReduce concept, we propose an extended MapReduce scheduling model. In the extended problem, we assume that each job contains an open-map task (the map task can be divided into multiple unparallel operations) and series-reduce tasks (each reduce task consists of only one operation). Unlike the classical MapReduce scheduling problem, we also assume that the operations cannot be processed in parallel and that the machines are unrelated. To solve the extended MapReduce scheduling problem, we establish a mixed-integer programming model with minimum makespan as the objective function. We then propose a genetic algorithm, a simulated annealing algorithm, and an L-F algorithm for this problem. Numerical experiments show that the L-F algorithm performs best on this problem.
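On unrelated machines, even a simple greedy list scheduler illustrates the makespan objective used here: assign each operation to the machine where it would finish earliest. This is a generic sketch of the problem setting under our own simplifying assumptions (independent operations, no open-map/series-reduce precedence), not the paper's genetic, simulated annealing, or L-F algorithm:

```python
def greedy_schedule(proc_time):
    # proc_time[j][m]: processing time of job j on machine m (unrelated machines:
    # times vary arbitrarily per job-machine pair).
    n_machines = len(proc_time[0])
    loads = [0.0] * n_machines
    assignment = []
    for times in proc_time:
        # Place the job on the machine where its completion time is smallest.
        m = min(range(n_machines), key=lambda i: loads[i] + times[i])
        assignment.append(m)
        loads[m] += times[m]
    return assignment, max(loads)  # per-job machine choices and the makespan

print(greedy_schedule([[2, 4], [3, 1], [2, 2]]))  # → ([0, 1, 1], 3)
```

Metaheuristics like those in the paper search over such assignments (plus operation orderings) rather than committing to one greedy pass, trading computation for schedules closer to the MIP optimum.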
25

Tahsir Ahmed Munna, Md, Shaikh Muhammad Allayear, Mirza Mohtashim Alam, Sheikh Shah Mohammad Motiur Rahman, Md Samadur Rahman, and M. Mesbahuddin Sarker. "Simplified Mapreduce Mechanism for Large Scale Data Processing." International Journal of Engineering & Technology 7, no. 3.8 (July 7, 2018): 16. http://dx.doi.org/10.14419/ijet.v7i3.8.15211.

Abstract:
MapReduce has become a popular programming model for processing and running large-scale data sets with a parallel, distributed paradigm on a cluster. Hadoop MapReduce is especially needed for large-scale data processing such as big data. In this paper, we modify the Hadoop MapReduce algorithm and implement it to reduce processing time.
26

Park, Jeong-Hyeok, Sang-Yeol Lee, Da Hyun Kang, and Joong-Ho Won. "Hadoop and MapReduce." Journal of the Korean Data and Information Science Society 24, no. 5 (September 30, 2013): 1013–27. http://dx.doi.org/10.7465/jkdi.2013.24.5.1013.

27

Smith, Calvin, and Aws Albarghouthi. "MapReduce program synthesis." ACM SIGPLAN Notices 51, no. 6 (August 2016): 326–40. http://dx.doi.org/10.1145/2980983.2908102.

28

Vineyard, Craig M., Stephen J. Verzi, Conrad D. James, James B. Aimone, and Gregory L. Heileman. "MapReduce SVM Game." Procedia Computer Science 53 (2015): 298–307. http://dx.doi.org/10.1016/j.procs.2015.07.307.

29

Pace, Matthew Felice. "BSP vs MapReduce." Procedia Computer Science 9 (2012): 246–55. http://dx.doi.org/10.1016/j.procs.2012.04.026.

30

Zhang, Yanfeng, Shimin Chen, Qiang Wang, and Ge Yu. "i²MapReduce: Incremental MapReduce for Mining Evolving Big Data." IEEE Transactions on Knowledge and Data Engineering 27, no. 7 (July 1, 2015): 1906–19. http://dx.doi.org/10.1109/tkde.2015.2397438.

31

WEI, ZuKuan, Bo HONG, and JaeHong KIM. "A New Memory MapReduce Framework for Higher Access to Resources." Indonesian Journal of Electrical Engineering and Computer Science 4, no. 3 (December 18, 2016): 629. http://dx.doi.org/10.11591/ijeecs.v4.i3.pp629-636.

Abstract:
The demand for highly parallel data processing platforms is growing due to an explosion in the number of massive-scale data applications in both academia and industry, and MapReduce is one of the most significant solutions for distributed big data computing. This paper builds on Hadoop MapReduce. During massive data computation, MapReduce generates a lot of dynamic data, but these data are discarded once the task completes; meanwhile, a large amount of dynamic data is written to HDFS during task execution, causing much unnecessary IO cost. In this paper, we analyze the existing distributed caching mechanism and propose a new in-memory MapReduce framework that responds in real time to read and write requests from task nodes and maintains information about cached data. Performance testing clearly shows that MapReduce with caching achieves significantly improved IO performance.
32

Tian, Hong Xia, Xue We Cui, Jing Wang, and Ying Jie Wang. "Design on Text Retrieval Algorithm in Cloud Computing Environment." Applied Mechanics and Materials 742 (March 2015): 726–29. http://dx.doi.org/10.4028/www.scientific.net/amm.742.726.

Abstract:
This paper presents a lightweight index update scheme that does not suspend online services, and demonstrates the performance of the update scheme through both theoretical analysis and experimental data. A new MapReduce-based method building on existing index methodology is designed and discussed further in the paper, covering the feasibility of indexing with MapReduce and Hadoop MapReduce, and the design's flaws as revealed through experimentation.
33

Prakash, Shah Pratik, and Pattabiraman V. "Using Intermediate Data of Map Reduce for Faster Execution." International Journal of Computers and Communications 16 (March 8, 2022): 20–26. http://dx.doi.org/10.46300/91013.2022.16.4.

Abstract:
Data of all kinds, whether structured, unstructured, or semi-structured, is generated in large quantities around the globe in various domains, and these data sets are stored on multiple nodes in a cluster. The MapReduce framework has emerged as the most efficient and easy-to-use technique for parallel processing of distributed data. This paper proposes a new methodology for the MapReduce workflow. The proposed methodology processes raw data in such a way that less processing time is required to generate the required result: it stores the intermediate data generated between the map and reduce phases and re-uses it as input to MapReduce. The paper thus focuses on improving the data reusability, scalability, and efficiency of the MapReduce framework for large-scale data analysis. MongoDB 2.4.2 is used in the experimental work to show how intermediate data can be stored and reused as part of MapReduce to improve the processing of large data sets.
34

Sundaraj, Kasi Perumal, Madhusudhan Rao T, and Praveen Chander P G. "Multiple MapReduce Jobs in Distributed Scheduler for Big Data Applications." International Journal of Advanced Research in Computer Science and Software Engineering 7, no. 12 (January 3, 2018): 34. http://dx.doi.org/10.23956/ijarcsse.v7i12.484.

Abstract:
The majority of large-scale data-intensive applications executed by data centers are based on MapReduce or its open-source implementation, Hadoop. Such applications run on large clusters requiring large amounts of energy, making energy costs a considerable fraction of a data center's overall costs. Therefore, minimizing the energy consumed when executing each MapReduce job is a critical concern for data centers. In this paper, we propose a framework for improving the energy efficiency of MapReduce applications while satisfying the service level agreement (SLA). We first model the problem of energy-aware scheduling of a single MapReduce job as an integer program. We then propose two heuristic algorithms, called energy-aware MapReduce scheduling algorithms (EMRSA-I and EMRSA-II), that find assignments of map and reduce tasks to machine slots in order to minimize the energy consumed when executing the application. Our algorithms are able to find near-optimal job schedules consuming approximately 40 percent less energy on average than the schedules obtained by a common-practice scheduler that minimizes the makespan.
35

Durairaj, M., and T. S. Poornappriya. "Importance of MapReduce for Big Data Applications: A Survey." Asian Journal of Computer Science and Technology 7, no. 1 (May 5, 2018): 112–18. http://dx.doi.org/10.51983/ajcst-2018.7.1.1817.

Abstract:
The MapReduce framework has attracted significant attention from a wide range of areas. It is now a practical model for data-focused applications because of its simple programming interface, high elasticity, and fault tolerance, and it is capable of processing high volumes of data in distributed computing environments (DCE). MapReduce has, on various occasions, proved applicable to a wide scope of areas. MapReduce is a parallel programming model and an associated implementation introduced by Google. In the programming model, a user specifies the computation by two functions, Map and Reduce. The underlying MapReduce library automatically parallelizes the computation and handles complicated issues like data distribution, load balancing, and fault tolerance: huge data sets spread across numerous machines need to be parallelized, and the framework moves the data and provides scheduling and fault tolerance. A literature survey of MapReduce programming in various areas is carried out in this paper, and a research direction is identified through the survey.
36

Darapaneni, Chandra Sekhar, Bobba Basaveswara Rao, Boggavarapu Bhanu Venkata Satya Vara Prasad, and Suneetha Bulla. "An Analytical Performance Evaluation of MapReduce Model Using Transient Queuing Model." Advances in Modelling and Analysis B 64, no. 1-4 (December 31, 2021): 46–53. http://dx.doi.org/10.18280/ama_b.641-407.

Abstract:
Today, MapReduce frameworks have become the standard distributed computing mechanisms to store, process, analyze, query and transform Bigdata. While processing Bigdata, evaluating the performance of the MapReduce framework is essential in order to understand the process dependencies and to tune the hyper-parameters. Unfortunately, the MapReduce framework's built-in functions can evaluate performance only to a limited extent. A reliable analytical performance model is required in this area to evaluate the performance of MapReduce frameworks. The main objective of this paper is to investigate the performance of MapReduce computing models under various configurations. To accomplish this job, we propose an analytical transient queuing model, which evaluates MapReduce performance for different job arrival rates at the mappers and various job completion times of both the mappers and the reducers. In our transient queuing model, we adopted an efficient multi-server queuing model, M/M/C, for optimal waiting-queue management. To conduct the experiments on the proposed analytical model, we selected Bigdata applications with three mappers and two reducers under various configurations. As part of the experiments, the transient differential equations, average queue lengths, mapper blocking probabilities, shuffle waiting probabilities and transient states are evaluated. MATLAB-based numerical simulations present the analytical results for various combinations of the input parameters λ, µ1 and µ2 and their effect on queue length.
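As background, the steady-state formulas of the M/M/C queue adopted in this abstract can be computed directly (Erlang C); the paper's transient analysis solves differential equations instead, and the parameter values below are arbitrary.

```python
from math import factorial

def erlang_c(c, lam, mu):
    """Probability that an arriving job must wait (all c servers busy)."""
    a = lam / mu                       # offered load in Erlangs
    rho = a / c                        # per-server utilisation, must be < 1
    assert rho < 1, "queue is unstable"
    summed = sum(a**k / factorial(k) for k in range(c))
    tail = a**c / (factorial(c) * (1 - rho))
    return tail / (summed + tail)

def mean_queue_length(c, lam, mu):
    """Mean number of waiting jobs: Lq = P_wait * rho / (1 - rho)."""
    rho = lam / (c * mu)
    return erlang_c(c, lam, mu) * rho / (1 - rho)

# Three mappers (c = 3), arrival rate λ = 2 jobs/s, service rate µ = 1 job/s.
print(round(mean_queue_length(3, 2.0, 1.0), 4))   # 0.8889
```

These closed forms give the long-run averages that the transient solution should approach as t grows.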
37

Kim, Jin-Hyun, and Kyu-Seok Shim. "Sequential Pattern Mining with Optimization Calling MapReduce Function on MapReduce Framework." KIPS Transactions:PartD 18D, no. 2 (April 30, 2011): 81–88. http://dx.doi.org/10.3745/kipstd.2011.18d.2.081.

38

Kim, Hyeon Gyu. "SQL-to-MapReduce Translation for Efficient OLAP Query Processing with MapReduce." International Journal of Database Theory and Application 10, no. 6 (June 30, 2017): 61–70. http://dx.doi.org/10.14257/ijdta.2017.10.6.05.

39

Ajibade Lukuman Saheed, Abu Bakar Kamalrulnizam, Ahmed Aliyu, and Tasneem Darwish. "Latency-aware Straggler Mitigation Strategy in Hadoop MapReduce Framework: A Review." Systematic Literature Review and Meta-Analysis Journal 2, no. 2 (October 19, 2021): 53–60. http://dx.doi.org/10.54480/slrm.v2i2.19.

Abstract:
Processing huge and complex data to obtain useful information is challenging, even though several big data processing frameworks have been proposed and further enhanced. One of the prominent big data processing frameworks is MapReduce. The main concept of the MapReduce framework relies on distributed and parallel processing. However, the MapReduce framework is facing serious performance degradation due to the slow execution of certain tasks called stragglers. Failing to handle stragglers causes delays and affects the overall job execution time. Meanwhile, several straggler mitigation techniques have been proposed to improve MapReduce performance. This study provides a comprehensive and qualitative review of the different existing straggler mitigation solutions. In addition, a taxonomy of the available straggler mitigation solutions is presented. Critical research issues and future research directions are identified and discussed to guide researchers and scholars.
40

Retnowo, Murti. "Syncronize Data Using MapReduce Model Programming." International Journal of Engineering Technology and Natural Sciences 3, no. 2 (December 31, 2021): 82–88. http://dx.doi.org/10.46923/ijets.v3i2.140.

Abstract:
Research on data processing shows that larger data sets require ever longer processing times. Processing huge amounts of data on a single computer has limitations that can be overcome by parallel processing. This study applied the MapReduce programming model to data synchronization by duplicating data from a client database to a server database. MapReduce is a programming model that was developed to speed up the processing of large data. In the experiments, the data to be synchronized were divided among a number of sub-processes (threads) that insert the data into the server database and display the results of the synchronization. The experiments were performed using 1,000, 10,000, 100,000 and 1,000,000 records, with 1, 5, 10, 15, 20 and 25 threads. The results showed that the MapReduce programming model can deliver faster times, although creating many threads itself takes longer. The MapReduce programming model can thus provide time efficiency when synchronizing data on either a single database or a distributed database.
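The thread-based synchronization scheme the abstract describes can be sketched roughly as follows, using an in-memory SQLite database as a stand-in for the server; the table name, column names, and chunking policy are invented for illustration.

```python
import sqlite3
import threading
from concurrent.futures import ThreadPoolExecutor

def sync_rows(rows, n_threads):
    """Duplicate client rows to a server database using n_threads workers."""
    server = sqlite3.connect(":memory:", check_same_thread=False)
    server.execute("CREATE TABLE t (id INTEGER, payload TEXT)")
    lock = threading.Lock()            # serialize writes to one connection

    def push(chunk):
        # Each thread inserts its chunk of rows (the "map" step).
        with lock:
            server.executemany("INSERT INTO t VALUES (?, ?)", chunk)
        return len(chunk)

    # Split the rows into roughly equal chunks, one batch per thread.
    size = max(1, len(rows) // n_threads)
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        copied = sum(pool.map(push, chunks))   # combine counts ("reduce")
    server.commit()
    return copied

rows = [(i, f"row-{i}") for i in range(1000)]
print(sync_rows(rows, 5))   # 1000 rows duplicated to the server copy
```

As the abstract notes, thread-creation overhead grows with the thread count, so more threads do not always mean faster synchronization.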
41

Wibawa, Condro, Setia Wirawan, Metty Mustikasari, and Dessy Tri Anggraeni. "KOMPARASI KECEPATAN HADOOP MAPREDUCE DAN APACHE SPARK DALAM MENGOLAH DATA TEKS." Jurnal Ilmiah Matrik 24, no. 1 (April 13, 2022): 10–20. http://dx.doi.org/10.33557/jurnalmatrik.v24i1.1649.

Abstract:
The term Big Data is no longer new today. One component of Big Data is the massive volume of data, which makes the data impossible to process in traditional ways. To solve this problem, the MapReduce method was developed. MapReduce is a data-processing method that splits data into small parts (mapping) and then combines the results back together (reducing). The most widely used MapReduce frameworks are Hadoop MapReduce and Apache Spark. The two frameworks share the same concept but differ in how they manage data sources: Hadoop MapReduce uses an HDFS (disk) approach, while Apache Spark uses RDDs (in-memory). The use of RDDs makes Apache Spark faster than Hadoop MapReduce. This is demonstrated in this study, in which, for processing the same text data, Apache Spark was on average 4.99 times faster than Hadoop MapReduce.
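The in-memory, chained style of Apache Spark contrasted in this abstract can be mimicked with a tiny RDD-like class; the method names echo the Spark API, but this is a pure-Python illustration, not PySpark.

```python
class MiniRDD:
    def __init__(self, data):
        self.data = list(data)            # kept in memory, like a cached RDD

    def flatMap(self, f):
        return MiniRDD(x for item in self.data for x in f(item))

    def mapPairs(self, f):
        return MiniRDD(f(item) for item in self.data)

    def reduceByKey(self, f):
        acc = {}
        for k, v in self.data:
            acc[k] = f(acc[k], v) if k in acc else v
        return MiniRDD(acc.items())

    def collect(self):
        return dict(self.data)

lines = MiniRDD(["spark beats disk", "spark caches in memory"])
counts = (lines.flatMap(str.split)        # intermediate results stay in RAM
               .mapPairs(lambda w: (w, 1))
               .reduceByKey(lambda a, b: a + b)
               .collect())
# counts == {"spark": 2, "beats": 1, "disk": 1, "caches": 1, "in": 1, "memory": 1}
```

Hadoop MapReduce materializes each intermediate result to HDFS between jobs, which is exactly the disk traffic this in-memory chaining avoids.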
42

Liu, Chang, Guilin Qi, and Yong Yu. "Large Scale Temporal RDFS Reasoning Using MapReduce." Proceedings of the AAAI Conference on Artificial Intelligence 26, no. 1 (September 20, 2021): 2441–42. http://dx.doi.org/10.1609/aaai.v26i1.8417.

Abstract:
In this work, we build a large scale reasoning engine under temporal RDFS semantics using MapReduce. We identify the major challenges of applying MapReduce framework to reason over temporal information, and present our solutions to tackle them.
43

ZHOU, SHUIGENG, RUIQI LIAO, and JIHONG GUAN. "WHEN CLOUD COMPUTING MEETS BIOINFORMATICS: A REVIEW." Journal of Bioinformatics and Computational Biology 11, no. 05 (October 2013): 1330002. http://dx.doi.org/10.1142/s0219720013300025.

Abstract:
In the past decades, with the rapid development of high-throughput technologies, biology research has generated an unprecedented amount of data. In order to store and process such a great amount of data, cloud computing and MapReduce were applied to many fields of bioinformatics. In this paper, we first introduce the basic concepts of cloud computing and MapReduce, and their applications in bioinformatics. We then highlight some problems challenging the applications of cloud computing and MapReduce to bioinformatics. Finally, we give a brief guideline for using cloud computing in biology research.
44

Kavitha, C., S. R. Srividhya, Wen-Cheng Lai, and Vinodhini Mani. "IMapC: Inner MAPping Combiner to Enhance the Performance of MapReduce in Hadoop." Electronics 11, no. 10 (May 17, 2022): 1599. http://dx.doi.org/10.3390/electronics11101599.

Abstract:
Hadoop is a framework for storing and processing huge amounts of data. With HDFS, large data sets can be managed on commodity hardware. MapReduce is a programming model for processing vast amounts of data in parallel. Mapping and reducing can be performed by using the MapReduce programming framework. A very large amount of data is transferred from Mapper to Reducer without any filtering or recursion, resulting in wasted bandwidth. In this paper, we introduce an algorithm called Inner MAPping Combiner (IMapC) for the map phase. This algorithm combines the values of recurring keys inside the Mapper. In order to test the efficiency of the algorithm, different approaches were tested. According to the tests, MapReduce programs implemented with the Default Combiner (DC) of IMapC are 70% more efficient than those implemented without one. To make computations significantly faster, this work can be combined with MapReduce.
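The idea of combining recurring keys inside the Mapper can be sketched as follows; this is a generic in-mapper-combining illustration, not the authors' IMapC implementation.

```python
from collections import defaultdict

def plain_mapper(words):
    # Emits one pair per word occurrence: O(n) pairs cross the network.
    return [(w, 1) for w in words]

def combining_mapper(words):
    # Aggregates recurring keys in a local table first, then emits one
    # partial sum per distinct word, shrinking shuffle traffic.
    local = defaultdict(int)
    for w in words:
        local[w] += 1
    return list(local.items())

words = ["a", "b", "a", "a", "b", "c"]
print(len(plain_mapper(words)), len(combining_mapper(words)))  # 6 3
```

The savings grow with key repetition: for heavily skewed inputs, the combining mapper can cut the number of shuffled pairs by orders of magnitude at the cost of a small in-memory table.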
45

Chunduri, Raghavendra K., and Aswani Kumar Cherukuri. "HaLoop Approach for Concept Generation in Formal Concept Analysis." Journal of Information & Knowledge Management 17, no. 03 (September 2018): 1850029. http://dx.doi.org/10.1142/s0219649218500296.

Abstract:
This paper describes an efficient algorithm for formal concept generation in large formal contexts. While many algorithms exist for concept generation, they are not suitable for generating concepts efficiently on larger contexts. We propose an algorithm, named the HaLoopUNCG algorithm, based on the MapReduce framework and a lightweight runtime environment called HaLoop. HaLoop, a modified version of Hadoop MapReduce, is better suited to iterative algorithms over large datasets. Our approach uses the features of HaLoop efficiently to generate concepts in an iterative manner. First, we describe the theoretical concepts of formal concept analysis and HaLoop. Second, we provide a detailed presentation of our work based on Lindig's fast concept analysis algorithm using HaLoop and the MapReduce framework. The experimental evaluations demonstrate that the HaLoopUNCG algorithm performs better than the Hadoop version of the upper-neighbour concept generation (MRUNCG) algorithm, the MapReduce implementation of Ganter's next-closure algorithm, and other distributed implementations of concept generation algorithms.
46

Gao, Yufei, Yanjie Zhou, Bing Zhou, Lei Shi, and Jiacai Zhang. "Handling Data Skew in MapReduce Cluster by Using Partition Tuning." Journal of Healthcare Engineering 2017 (2017): 1–12. http://dx.doi.org/10.1155/2017/1425102.

Abstract:
The healthcare industry has generated large amounts of data, and analyzing these has emerged as an important problem in recent years. The MapReduce programming model has been successfully used for big data analytics. However, data skew invariably occurs in big data analytics and seriously affects efficiency. To overcome the data skew problem in MapReduce, we have in the past proposed a data processing algorithm called Partition Tuning-based Skew Handling (PTSH). In comparison with the one-stage partitioning strategy used in the traditional MapReduce model, PTSH uses a two-stage strategy and the partition tuning method to disperse key-value pairs in virtual partitions and recombines each partition in case of data skew. The robustness and efficiency of the proposed algorithm were tested on a wide variety of simulated datasets and real healthcare datasets. The results showed that PTSH algorithm can handle data skew in MapReduce efficiently and improve the performance of MapReduce jobs in comparison with the native Hadoop, Closer, and locality-aware and fairness-aware key partitioning (LEEN). We also found that the time needed for rule extraction can be reduced significantly by adopting the PTSH algorithm, since it is more suitable for association rule mining (ARM) on healthcare data.
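The effect of skew-aware partitioning can be illustrated by contrasting naive hash partitioning with a greedy load-balancing assignment; this toy sketch is far simpler than the two-stage PTSH scheme, and the key weights below are invented.

```python
def hash_partition(pairs, n):
    # Naive: route each key by hash, ignoring how heavy it is.
    loads = [0] * n
    for key, weight in pairs:
        loads[hash(key) % n] += weight
    return loads

def balanced_partition(pairs, n):
    # Skew-aware: place the heaviest keys first on the lightest reducer.
    loads = [0] * n
    for key, weight in sorted(pairs, key=lambda kv: -kv[1]):
        loads[loads.index(min(loads))] += weight
    return loads

# (key, number of key-value pairs) with a skewed distribution.
pairs = [("hot", 50), ("warm", 40), ("a", 30), ("b", 20), ("c", 10)]
print(max(hash_partition(pairs, 3)), max(balanced_partition(pairs, 3)))
```

Because a MapReduce job finishes only when its slowest reducer finishes, lowering the maximum partition load (here to the ideal 50) directly shortens the job; hash partitioning gives no such guarantee under skew.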
47

Orynbekova, Kamila, Andrey Bogdanchikov, Selcuk Cankurt, Abzatdin Adamov, and Shirali Kadyrov. "MapReduce Solutions Classification by Their Implementation." International Journal of Engineering Pedagogy (iJEP) 13, no. 5 (July 6, 2023): 58–71. http://dx.doi.org/10.3991/ijep.v13i5.38867.

Abstract:
Distributed systems are widely used in industrial projects and scientific research. The Apache Hadoop environment, which works on the MapReduce paradigm, has lost popularity as newer, more modern tools were developed. For example, Apache Spark is preferred in some cases since it uses RAM resources to hold intermediate calculations; it therefore works faster and is easier to use. In order to take full advantage of it, users must still understand the MapReduce concept. In this paper, conventional solutions and MapReduce solutions to ten problems were compared by their pseudocodes and categorized into five groups. From these groups' descriptions and pseudocodes, readers can grasp the concept of MapReduce without taking specific courses. This paper proposes a five-category classification methodology to help distributed-systems users learn the MapReduce paradigm quickly. The proposed methodology is illustrated with ten tasks. Furthermore, statistical analysis is carried out to test whether the proposed classification methodology affects learner performance. The results of this study indicate that the proposed model outperforms the traditional approach with statistical significance, as evidenced by a p-value of less than 0.05. The policy implication is that educational institutions and organizations could adopt the proposed classification methodology to help learners and employees acquire the necessary knowledge and skills to use distributed systems effectively.
48

Sontakke, Vaishali, and Dayananda R. B. "Memory aware optimized Hadoop MapReduce model in cloud computing environment." IAES International Journal of Artificial Intelligence (IJ-AI) 12, no. 3 (September 1, 2023): 1270. http://dx.doi.org/10.11591/ijai.v12.i3.pp1270-1280.

Abstract:
In the last decade, data analysis has become one of the most popular tasks due to the enormous growth in data generated every minute by different applications and instruments. MapReduce is the most popular programming model for data processing. Hadoop comprises two basic components, i.e., the Hadoop distributed file system (HDFS) and MapReduce: HDFS is used for storing huge amounts of data, whereas MapReduce is used for data processing. Hadoop MapReduce is one of the best platforms for processing huge data in an efficient manner, such as processing web log data. However, the existing model has limitations. This research work proposes a memory-aware optimized Hadoop MapReduce (MA-OHMR). MA-OHMR is developed with memory as the constraint and prioritizes memory allocation and revocation in mapping, shuffling, and reducing, which further enhances the mapping and reducing jobs. Optimal memory management and I/O operation are carried out to use resources in an efficient manner. The model utilizes global memory management to avoid garbage collection, and MA-OHMR is optimized to reduce makespan. MA-OHMR is evaluated on two datasets, i.e., a simple workload (Wikipedia dataset) and a complex workload (sensor dataset), with makespan and cost as evaluation parameters.
49

Xia, Hui. "Research on Recommendation Algorithm of Matrix Factorization Method Based on MapReduce." Applied Mechanics and Materials 631-632 (September 2014): 138–41. http://dx.doi.org/10.4028/www.scientific.net/amm.631-632.138.

Abstract:
Matrix factorization is a collaborative filtering recommendation technique proposed in recent years. In the process of recommendation, each prediction depends on the collaboration of the whole known rating set, and the feature matrices need huge storage, so recommendation with only one node runs into time and resource bottlenecks. A MapReduce-based matrix factorization recommendation algorithm was proposed to solve this problem. The big feature matrices were shared via Hadoop's distributed cache and MapFile techniques. The MapReduce algorithm can also handle multi-λ situations. An experiment on the Netflix data set shows that the MapReduce-based algorithm has high speedup and improves the efficiency of collaborative filtering.
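The latent-factor model distributed in this abstract can be shown on a single node with a minimal stochastic-gradient-descent sketch: learn user factors P and item factors Q so that P[u]·Q[i] approximates the rating r. The hyper-parameters, names, and tiny rating set are illustrative only; the paper shares the big factor matrices across workers via Hadoop's distributed cache.

```python
import random

def factorize(ratings, n_users, n_items, k=2, steps=2000, lr=0.01, lam=0.02):
    random.seed(0)                                    # deterministic sketch
    P = [[random.random() * 0.1 for _ in range(k)] for _ in range(n_users)]
    Q = [[random.random() * 0.1 for _ in range(k)] for _ in range(n_items)]
    for _ in range(steps):
        for u, i, r in ratings:
            # Error of the current prediction P[u]·Q[i] for rating r.
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - lam * pu)  # lam = L2 penalty (λ)
                Q[i][f] += lr * (err * pu - lam * qi)
    return P, Q

# (user, item, rating) triples for a tiny 2-user, 3-item example.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0)]
P, Q = factorize(ratings, n_users=2, n_items=3)
pred = sum(P[0][f] * Q[0][f] for f in range(2))   # close to the true 5.0
```

The per-rating updates are independent given P and Q, which is what makes a map-phase parallelization over rating blocks natural once the factor matrices are broadcast to every node.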
50

B Reddy, Yenumula, and Desmond Hill. "Document Selection Using Mapreduce." International Journal of Security, Privacy and Trust Management 4, no. 3/4 (November 30, 2015): 01–10. http://dx.doi.org/10.5121/ijsptm.2015.4401.
