
Journal articles on the topic 'MAPREDUCE FRAMEWORKS'


Consult the top 50 journal articles for your research on the topic 'MAPREDUCE FRAMEWORKS.'


1

Ajibade Lukuman Saheed, Abu Bakar Kamalrulnizam, Ahmed Aliyu, and Tasneem Darwish. "Latency-aware Straggler Mitigation Strategy in Hadoop MapReduce Framework: A Review." Systematic Literature Review and Meta-Analysis Journal 2, no. 2 (2021): 53–60. http://dx.doi.org/10.54480/slrm.v2i2.19.

Abstract:
Processing huge and complex data to obtain useful information is challenging, even though several big data processing frameworks have been proposed and further enhanced. One of the prominent big data processing frameworks is MapReduce. The main concept of the MapReduce framework relies on distributed and parallel processing. However, the MapReduce framework faces serious performance degradation due to the slow execution of certain tasks, called stragglers. Failing to handle stragglers causes delays and affects the overall job execution time. Meanwhile, several straggler reduction techniques…
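Hadoop's main built-in answer to stragglers is speculative execution: the framework launches a backup copy of an unusually slow task and keeps whichever attempt finishes first. As a minimal illustration (not drawn from the paper above), the standard Hadoop 2.x property names that toggle it might appear in mapred-site.xml as follows:

    <property>
      <name>mapreduce.map.speculative</name>
      <value>true</value>  <!-- allow backup attempts for slow map tasks -->
    </property>
    <property>
      <name>mapreduce.reduce.speculative</name>
      <value>true</value>  <!-- likewise for slow reduce tasks -->
    </property>

Latency-aware mitigation strategies of the kind surveyed above refine this blunt mechanism by choosing more carefully which tasks to duplicate and where the backup copies run.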
2

Utami, Firmania Dwi, and Femi Dwi Astuti. "Comparison of Hadoop Mapreduce and Apache Spark in Big Data Processing with Hgrid247-DE." Journal of Applied Informatics and Computing 8, no. 2 (2024): 390–99. https://doi.org/10.30871/jaic.v8i2.8557.

Abstract:
In today’s rapidly evolving information technology landscape, managing and analyzing big data has become one of the most significant challenges. This paper explores the implementation of two major frameworks for big data processing: Hadoop MapReduce and Apache Spark. Both frameworks were tested in three scenarios (sorting, summarizing, and grouping) using HGrid247-DE as the primary tool for data processing. A diverse set of datasets sourced from Kaggle, ranging in size from 3 MB to 260 MB, was employed to evaluate the performance of each framework. The findings reveal that Apache Spark generally…
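To make the three benchmark scenarios concrete, the sketch below expresses grouping, summarizing, and sorting as a single PySpark pipeline. It is an illustrative stand-in rather than the paper's HGrid247-DE workflow, and the file name and column names (transactions.csv, category, amount) are invented for the example:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("grouping-demo").getOrCreate()

    # Hypothetical input; the schema is assumed for illustration only.
    df = spark.read.csv("transactions.csv", header=True, inferSchema=True)

    summary = (df.groupBy("category")                      # grouping
                 .agg(F.count(F.lit(1)).alias("rows"),     # summarizing
                      F.sum("amount").alias("total"))
                 .orderBy(F.desc("total")))                # sorting
    summary.show()

An equivalent Hadoop MapReduce job would need an explicit map phase keyed on category and a reduce phase for the aggregation, which is one reason Spark's higher-level API is often preferred for such workloads.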
3

Saraiya, Shailin. "Technical Evolution and Performance Analysis of MapReduce in Modern Distributed Systems." International Journal of Scientific Research in Computer Science, Engineering and Information Technology 11, no. 1 (2025): 29–35. https://doi.org/10.32628/cseit25111206.

Abstract:
MapReduce has emerged as a cornerstone technology in the big data ecosystem, fundamentally transforming how organizations process and analyze massive datasets. This article provides a detailed examination of MapReduce's architecture, exploring its evolution from Google's original implementation to its current role in modern distributed computing systems. The article breaks MapReduce down into its three key phases (Map, Shuffle/Sort, and Reduce), analyzing how each contributes to efficient parallel data processing. It demonstrates MapReduce's versatility and impact on real-world applications…
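The three phases named in this abstract can be mimicked in a few lines of plain Python, which is often the quickest way to see why the model parallelizes so naturally. The word-count sketch below is illustrative only; in a real cluster each stage runs distributed across many nodes:

    from collections import defaultdict
    from itertools import chain

    def map_phase(document):
        # Map: emit a (word, 1) pair for every word in one input split.
        for word in document.split():
            yield (word.lower(), 1)

    def shuffle_sort(pairs):
        # Shuffle/Sort: group all intermediate values by key.
        groups = defaultdict(list)
        for key, value in pairs:
            groups[key].append(value)
        return groups

    def reduce_phase(key, values):
        # Reduce: aggregate the grouped values for one key.
        return (key, sum(values))

    splits = ["big data needs big frameworks", "mapreduce frameworks scale out"]
    intermediate = chain.from_iterable(map_phase(s) for s in splits)
    counts = [reduce_phase(k, v) for k, v in shuffle_sort(intermediate).items()]
    print(sorted(counts))  # [('big', 2), ('data', 1), ('frameworks', 2), ...]

Because the map calls are independent and each reduce call touches a disjoint key, both phases can be fanned out across machines; only the shuffle requires moving data between them.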
4

Darapaneni, Chandra Sekhar, Bobba Basaveswara Rao, Boggavarapu Bhanu Venkata Satya Vara Prasad, and Suneetha Bulla. "An Analytical Performance Evaluation of MapReduce Model Using Transient Queuing Model." Advances in Modelling and Analysis B 64, no. 1-4 (2021): 46–53. http://dx.doi.org/10.18280/ama_b.641-407.

Abstract:
Today, MapReduce frameworks have become the standard distributed computing mechanism to store, process, analyze, query, and transform big data. While processing big data, evaluating the performance of the MapReduce framework is essential in order to understand the process dependencies and to tune the hyper-parameters. Unfortunately, the framework's built-in functions support performance evaluation only to a limited extent. A reliable analytical performance model is therefore needed to evaluate the performance of MapReduce frameworks. The main objective of this paper is…
5

Wei, Peng. "Analysis of Aliyun-based serverless on MapReduce efficiency." Applied and Computational Engineering 88, no. 1 (2024): 61–68. http://dx.doi.org/10.54254/2755-2721/88/20241499.

Abstract:
In the context of the current era of big data, traditional Hadoop and cluster-based MapReduce frameworks are unable to meet the demands of modern research. This paper presents a MapReduce framework based on the AliCloud Serverless platform, which has been developed with the objective of optimizing word frequency counting in large-scale English texts. Leveraging AliCloud's dynamic resource allocation and elastic scaling, we have created an efficient and flexible text data processing system. This paper details the design and implementation of the Map and Reduce phases and analyses the impact of…
6

Kang, Sol Ji, Sang Yeon Lee, and Keon Myung Lee. "Performance Comparison of OpenMP, MPI, and MapReduce in Practical Problems." Advances in Multimedia 2015 (2015): 1–9. http://dx.doi.org/10.1155/2015/575687.

Abstract:
With problem size and complexity increasing, several parallel and distributed programming models and frameworks have been developed to efficiently handle such problems. This paper briefly reviews the parallel computing models and describes three widely recognized parallel programming frameworks: OpenMP, MPI, and MapReduce. OpenMP is the de facto standard for parallel programming on shared memory systems. MPI is the de facto industry standard for distributed memory systems. The MapReduce framework has become the de facto standard for large scale data-intensive applications. Qualitative pros and cons…
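Python's standard library offers a compact way to feel the difference between the shared-memory style (OpenMP-like: one machine, many cores) and the distributed models the paper compares. The sketch below is a plain multiprocessing example, not taken from the paper's benchmarks:

    from multiprocessing import Pool

    def square(x):
        # CPU-bound work applied independently to each element,
        # analogous to an OpenMP parallel-for over a shared array.
        return x * x

    if __name__ == "__main__":
        data = range(1_000_000)
        with Pool(processes=4) as pool:
            results = pool.map(square, data, chunksize=10_000)
        print(results[:5])  # [0, 1, 4, 9, 16]

Here all workers live on one machine and share its memory; MPI and MapReduce instead pass messages or shuffle files between separate address spaces, which is what lets them scale beyond a single node.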
7

Srirama, Satish Narayana, Oleg Batrashev, Pelle Jakovits, and Eero Vainikko. "Scalability of Parallel Scientific Applications on the Cloud." Scientific Programming 19, no. 2-3 (2011): 91–105. http://dx.doi.org/10.1155/2011/361854.

Abstract:
Cloud computing, with its promise of virtually infinite resources, seems well suited to solving resource-greedy scientific computing problems. To study the effects of moving parallel scientific applications onto the cloud, we deployed several benchmark applications, such as matrix–vector operations, the NAS parallel benchmarks, and DOUG (Domain decomposition On Unstructured Grids), on the cloud. DOUG is an open source software package for the parallel iterative solution of very large sparse systems of linear equations. The detailed analysis of DOUG on the cloud showed that parallel applications benefit…
8

Zhang, Yuhong. "MapReduce based on serverless platforms." Applied and Computational Engineering 40, no. 1 (2024): 168–73. http://dx.doi.org/10.54254/2755-2721/40/20230645.

Abstract:
This study investigates the application of the MapReduce (MR) framework based on serverless computing in big data processing. By combining the MapReduce model with serverless computing, efficient data processing is achieved. In this framework, the phases of Map task execution, Reduce task execution, etc. are accomplished through stateless serverless functions, and data storage is realized with the help of cloud storage platforms (e.g., OSS). In this paper, the author introduces the basic theory of MR and of serverless computing, and describes the framework implementation…
9

Adornes, Daniel, Dalvan Griebler, Cleverson Ledur, and Luiz Gustavo Fernandes. "Coding Productivity in MapReduce Applications for Distributed and Shared Memory Architectures." International Journal of Software Engineering and Knowledge Engineering 25, no. 09n10 (2015): 1739–41. http://dx.doi.org/10.1142/s0218194015710096.

Abstract:
MapReduce was originally proposed as a suitable and efficient approach for analyzing and processing large amounts of data. Since then, many researchers have contributed MapReduce implementations for distributed and shared memory architectures. Nevertheless, different architectural levels require different optimization strategies in order to achieve high-performance computing. Such strategies in turn have produced very different MapReduce programming interfaces across these works. This paper presents some research notes on coding productivity when developing MapReduce applications for distributed…
10

Senthilkumar, M., and P. Ilango. "A Survey on Job Scheduling in Big Data." Cybernetics and Information Technologies 16, no. 3 (2016): 35–51. http://dx.doi.org/10.1515/cait-2016-0033.

Abstract:
Big data applications with scheduling have become an active research area in the last three years. The Hadoop framework has become very popular and is among the most used frameworks in distributed data processing. Hadoop is also open source software that allows the user to effectively utilize the hardware. Various scheduling algorithms of the MapReduce model using Hadoop vary in design and behavior, and are used for handling many issues like data locality, resource awareness, energy, and time. This paper gives an outline of job scheduling, a classification of the schedulers, and a comparison of different…
11

Song, Minjae, Hyunsuk Oh, Seungmin Seo, and Kyong-Ho Lee. "Map-Side Join Processing of SPARQL Queries Based on Abstract RDF Data Filtering." Journal of Database Management 30, no. 1 (2019): 22–40. http://dx.doi.org/10.4018/jdm.2019010102.

Abstract:
The amount of RDF data being published on the Web is increasing at a massive rate. MapReduce-based distributed frameworks have become the general trend in processing SPARQL queries against RDF data. Currently, query processing systems that use MapReduce have not been able to keep up with the increase of semantically annotated data, resulting in non-interactive SPARQL query processing. The principal reason is that intermediate query results from join operations in a MapReduce framework are so massive that they consume all available network bandwidth. In this article, the authors present an efficient…
12

He, Yuheng, Jin Qian, Juanjie Zhang, and Renzhe Zhang. "Word frequency statistics based on MapReduce on serverless platforms." Applied and Computational Engineering 68, no. 1 (2024): 356–67. http://dx.doi.org/10.54254/2755-2721/68/20241536.

Abstract:
This paper investigates the application of serverless computing in conjunction with the MapReduce framework, particularly in machine learning (ML) tasks. The MapReduce programming model has been widely used to process large-scale datasets by simplifying parallel and distributed data processing. This study explores how the combination of these two technologies can provide more efficient and cost-effective ML solutions. Through a detailed analysis of serverless environments and the MapReduce framework, this paper shows how the combination can advance the fields of cloud computing and machine learning…
13

Goncalves, Carlos, Luis Assuncao, and Jose C. Cunha. "Flexible MapReduce Workflows for Cloud Data Analytics." International Journal of Grid and High Performance Computing 5, no. 4 (2013): 48–64. http://dx.doi.org/10.4018/ijghpc.2013100104.

Abstract:
Data analytics applications handle large data sets subject to multiple processing phases, some of which can execute in parallel on clusters, grids, or clouds. Such applications can benefit from using the MapReduce model, which only requires the end-user to define the application algorithms for input data processing and the map and reduce functions, but this poses a need to install/configure specific frameworks such as Apache Hadoop or Elastic MapReduce in the Amazon Cloud. In order to provide more flexibility in defining and adjusting the application configurations, as well as in the specification of the…
14

Esposito, Christian, and Massimo Ficco. "Recent Developments on Security and Reliability in Large-Scale Data Processing with MapReduce." International Journal of Data Warehousing and Mining 12, no. 1 (2016): 49–68. http://dx.doi.org/10.4018/ijdwm.2016010104.

Abstract:
The demand to access a large volume of data, distributed across hundreds or thousands of machines, has opened new opportunities in commerce, science, and computing applications. MapReduce is a paradigm that offers a programming model and an associated implementation for processing massive datasets in a parallel fashion, using non-dedicated distributed computing hardware. It has been successfully adopted in several academic and industrial projects for Big Data Analytics. However, since such analytics is increasingly demanded within the context of mission-critical applications, security and…
15

Al-Absi, Ahmed Abdulhakim, Najeeb Abbas Al-Sammarraie, Wael Mohamed Shaher Yafooz, and Dae-Ki Kang. "Parallel MapReduce: Maximizing Cloud Resource Utilization and Performance Improvement Using Parallel Execution Strategies." BioMed Research International 2018 (October 17, 2018): 1–17. http://dx.doi.org/10.1155/2018/7501042.

Abstract:
MapReduce is the preferred cloud computing framework used in large data analysis and application processing. MapReduce frameworks currently in place suffer performance degradation due to the adoption of sequential processing approaches with little modification and thus exhibit underutilization of cloud resources. To overcome this drawback and reduce costs, we introduce a Parallel MapReduce (PMR) framework in this paper. We design a novel parallel execution strategy of Map and Reduce worker nodes. Our strategy enables further performance improvement and efficient utilization of cloud resources…
16

Thabtah, Fadi, Suhel Hammoud, and Hussein Abdel-Jaber. "Parallel Associative Classification Data Mining Frameworks Based MapReduce." Parallel Processing Letters 25, no. 02 (2015): 1550002. http://dx.doi.org/10.1142/s0129626415500024.

Abstract:
Associative classification (AC) is a research topic that integrates association rules with classification in data mining to build classifiers. After dissemination of the Classification-based Association Rule algorithm (CBA), the majority of its successors have been developed to improve either CBA's prediction accuracy or the search for frequent ruleitems in the rule discovery step. Both of these steps place high demands on processing time and memory, especially in cases of large training data sets or a low minimum support threshold value. In this paper, we overcome the problem of mining large…
17

Ibtisum, Sifat, S. M. Atikur Rahman, and S. M. Saokat Hossain. "Comparative Analysis of MapReduce and Apache Tez Performance in Multinode Clusters with Data Compression." World Journal of Advanced Research and Reviews 20, no. 3 (2023): 519–26. http://dx.doi.org/10.30574/wjarr.2023.20.3.2486.

Abstract:
This article conducts a thorough comparative analysis of Apache Tez and MapReduce in the context of big data processing. It focuses on key performance metrics, scalability, and ease of use. The analysis begins with an overview of the architectural distinctions between the two frameworks, emphasizing their fundamental design principles. A detailed performance evaluation follows, considering factors such as execution time, resource utilization, and throughput across diverse workloads. The study explores scalability by examining how Apache Tez and MapReduce respond to increasing data volumes and…
18

Ibtisum, Sifat, S. M. Atikur Rahman, and S. M. Saokat Hossain. "Comparative Analysis of MapReduce and Apache Tez Performance in Multinode Clusters with Data Compression." World Journal of Advanced Research and Reviews 20, no. 3 (2023): 519–26. https://doi.org/10.5281/zenodo.12740062.

Abstract:
This article conducts a thorough comparative analysis of Apache Tez and MapReduce in the context of big data processing. It focuses on key performance metrics, scalability, and ease of use. The analysis begins with an overview of the architectural distinctions between the two frameworks, emphasizing their fundamental design principles. A detailed performance evaluation follows, considering factors such as execution time, resource utilization, and throughput across diverse workloads. The study explores scalability by examining how Apache Tez and MapReduce respond to increasing data volumes and…
19

Jing, Weipeng, Dongxue Tian, Guangsheng Chen, and Yiyuan Li. "Research on Improved Method of Storage and Query of Large-Scale Remote Sensing Images." Journal of Database Management 29, no. 3 (2018): 1–16. http://dx.doi.org/10.4018/jdm.2018070101.

Abstract:
Traditional methods for storing massive remote sensing data suffer from low efficiency and poor scalability. This article presents a parallel processing method based on MapReduce and HBase. Filling the remote sensing images along a Hilbert curve allows the MapReduce method to construct pyramids in parallel while reducing network communication between nodes. The authors then design a massive remote sensing data storage model composed of a metadata storage model, an index structure, and a filter column family. Finally, the article uses MapReduce frameworks to realize pyramid construction, storage and…
20

Wu, Shuhang. "Analysis of Parallel Optimisation Strategies Based on MapReduce Models." International Journal of Computer Science and Information Technology 4, no. 3 (2024): 354–59. https://doi.org/10.62051/ijcsit.v4n3.39.

Abstract:
The aim of this paper is to provide an in-depth analysis of parallel optimisation strategies for MapReduce models, and to explore how to improve overall performance by optimising task allocation and scheduling, improving data locality and increasing node utilisation. The research methodology includes an analysis and overview of existing MapReduce frameworks and proposes a series of improvement strategies. These strategies improve the utilisation of computing resources by adjusting the granularity of task division, optimising data slicing and distribution, and improving task scheduling algorithms…
21

Ferreira, Tharso, Antonio Espinosa, Juan Carlos Moure, and Porfidio Hernández. "An Optimization for MapReduce Frameworks in Multi-core Architectures." Procedia Computer Science 18 (2013): 2587–90. http://dx.doi.org/10.1016/j.procs.2013.05.446.

22

Marynowski, João Eugenio, Altair Olivo Santin, and Andrey Ricardo Pimentel. "Method for testing the fault tolerance of MapReduce frameworks." Computer Networks 86 (July 2015): 1–13. http://dx.doi.org/10.1016/j.comnet.2015.04.009.

23

Diarra, Mamadou, and Telesphore B. Tiendrebeogo. "Performance Evaluation of Big Data Processing of Cloak-Reduce." International Journal of Distributed and Parallel Systems 13, no. 1 (2022): 13–22. http://dx.doi.org/10.5121/ijdps.2022.13102.

Abstract:
Big Data has introduced the challenge of storing and processing large volumes of data (text, images, and videos). Centralised exploitation of massive data on a single node is outdated, which has led to the emergence of distributed storage, parallel processing, and hybrid distributed storage and parallel processing frameworks. The main objective of this paper is to evaluate the load balancing and task allocation strategy of our hybrid distributed storage and parallel processing framework, CLOAK-Reduce. To achieve this goal, we first performed a theoretical study of the architecture and…
24

Alzyadat, Wael Jumah, Aysh AlHroob, Ikhlas Hassan Almukahel, and Rodziah Atan. "Fuzzy Map Approach for Accruing Velocity of Big Data." COMPUSOFT: An International Journal of Advanced Computer Technology 8, no. 4 (2019): 3112–16. https://doi.org/10.5281/zenodo.14823055.

Abstract:
Each characteristic of Big Data (volume, velocity, variety, and value) poses a unique challenge to Big Data analytics. The velocity characteristic in particular challenges the time complexity of processing in dissimilar frameworks, ranging from batch-oriented, MapReduce-based frameworks to real-time and stream-processing frameworks such as Spark and Storm. We propose an approach that combines a fuzzy logic controller with MapReduce frameworks to handle vehicle analysis by comparing driving data from the resulting vehicle trajectory. The proposed…
25

Kazi, Aihtesham. "Design of an Iterative Method for MapReduce Scheduling Using Deep Reinforcement Learning and Anomaly Detection." Communications on Applied Nonlinear Analysis 31, no. 4s (2024): 599–620. http://dx.doi.org/10.52783/cana.v31.953.

Abstract:
Due to the high complexity of distributed computing environments, there is a critical need for advanced scheduling frameworks capable of optimizing MapReduce systems. Current approaches use static policies that limit their ability to adapt to changing system dynamics and workload variations across different cloud scenarios. To overcome these issues, this study introduces a robust MapReduce framework empowered by intelligent scheduling algorithms, tailored to enhance system efficiency and resilience. The framework introduces three novel scheduling models: Deep Reinforcement Learning…
26

Memishi, Bunjamin, María S. Pérez, and Gabriel Antoniu. "Feedback-Based Resource Allocation in MapReduce-Based Systems." Scientific Programming 2016 (2016): 1–13. http://dx.doi.org/10.1155/2016/7241928.

Abstract:
Containers are considered an optimized fine-grain alternative to virtual machines in cloud-based systems. Among the approaches that have adopted containers are the MapReduce frameworks. This paper analyses the use of containers in MapReduce-based systems, concluding that the resource utilization of these systems in terms of containers is suboptimal. In order to solve this, the paper describes AdaptCont, a proposal for optimizing container allocation in MapReduce systems. AdaptCont is based on the foundations of feedback systems. Two different selection approaches…
27

Saundatt, Sujay I. "Databases in the 21st Century." International Journal for Research in Applied Science and Engineering Technology 10, no. 6 (2022): 1440–44. http://dx.doi.org/10.22214/ijraset.2022.43982.

Abstract:
NoSQL databases are 21st-century databases created to overcome the drawbacks of RDBMSs. The objective of NoSQL is to provide scalability and availability and to meet the various requirements of distributed computing. The main motivations for NoSQL database systems are scalability and failover needs. In the vast majority of NoSQL database systems, data is partitioned and replicated across numerous nodes. Inherently, most of them use either Google's MapReduce, the Hadoop Distributed File System, or Hadoop MapReduce for data collection. Cassandra, HBase, and…
28

Latha, Rondi Pushpa, and Persis Voola. "Satellite Image Classification using Transformers with Map Reduce-Based Pre-processing Framework." Indian Journal Of Science And Technology 18, no. 19 (2025): 1471–77. https://doi.org/10.17485/ijst/v18i19.743.

Abstract:
Objectives: This study aims to develop a deep learning framework for satellite image classification that is scalable as well as accurate, effectively handling the challenges posed by large-scale remote sensing datasets. The framework aims to improve classification precision while optimizing the use of computational resources. It supports multiple critical applications such as agricultural monitoring, land cover mapping, and disaster response through its categorization of satellite images into distinct land cover types, including urban, agricultural, forest, desert…
29

Astsatryan, Hrachya, Aram Kocharyan, Daniel Hagimont, and Arthur Lalayan. "Performance Optimization System for Hadoop and Spark Frameworks." Cybernetics and Information Technologies 20, no. 6 (2020): 5–17. http://dx.doi.org/10.2478/cait-2020-0056.

Abstract:
The optimization of large-scale data sets depends on the technologies and methods used. The MapReduce model, implemented on Apache Hadoop or Spark, allows splitting large data sets into a set of blocks distributed on several machines. Data compression reduces data size and transfer time between disks and memory but requires additional processing. Therefore, finding an optimal tradeoff is a challenge, as a high compression factor may underload Input/Output but overload the processor. The paper aims to present a system enabling the selection of the compression tools and tuning the compression…
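As context for the tradeoff described above, both frameworks expose the codec as an ordinary configuration knob. The snippet below shows one hedged example for Spark; spark.io.compression.codec is a standard property, but which codec wins depends on exactly the CPU-versus-I/O balance the paper's system tries to optimize:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("compression-tradeoff-demo")
             # lz4 is fast with a modest ratio; zstd trades CPU for smaller shuffles.
             .config("spark.io.compression.codec", "zstd")
             .getOrCreate())

In plain Hadoop MapReduce the analogous switches are mapreduce.map.output.compress and mapreduce.map.output.compress.codec.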
30

Yang, Wen Chuan, Jiang Yong Wang, and Hao Yu Zeng. "A MapReduce Telecommunication Data Center Analysis Model." Advanced Materials Research 734-737 (August 2013): 2863–66. http://dx.doi.org/10.4028/www.scientific.net/amr.734-737.2863.

Abstract:
With the wide use of smartphones in China, all input packet streams are routed to the Content Distribution Service (CDS) switching centers, each of which produces up to 1.5 terabytes of arriving data every day. Normally, the job of the switch is to transmit data. Obviously, an ordinary database cannot handle such a massive dataset and complex ad-hoc queries. In this paper, we propose DeepMR, a MapReduce deep service analysis system based on the Hive/Hadoop frameworks. The distributed file system HDFS is used in DeepMR for fast data sharing and querying. DeepMR also optimizes scheduling for switch analysis jobs and supports…
31

Tiwari, Jyotindra, Mahesh Pawar, and Anjajana Pandey. "A Survey on Accelerated Mapreduce for Hadoop." Oriental Journal of Computer Science and Technology 10, no. 3 (2017): 597–602. http://dx.doi.org/10.13005/ojcst/10.03.07.

Abstract:
Big data is defined by the 3Vs, which stand for variety, volume, and velocity: the volume of data is very large, data exists in a variety of file types, and data grows very rapidly. Big data storage and processing has always been a big issue and has become even more challenging to handle these days. To handle big data, high-performance techniques have been introduced. Several frameworks like Apache Hadoop have been introduced to process big data. Apache Hadoop provides map/reduce to process big data, but this map/reduce can be further accelerated. In this paper a survey has been performed for map/reduce…
32

Rondi, Pushpa Latha, and Voola Persis. "Satellite Image Classification using Transformers with Map Reduce-Based Pre-processing Framework." Indian Journal of Science and Technology 18, no. 19 (2025): 1471–77. https://doi.org/10.17485/IJST/v18i19.743.

Abstract:
Objectives: This study aims to develop a deep learning framework for satellite image classification that is scalable as well as accurate, effectively handling the challenges posed by large-scale remote sensing datasets. The framework aims to improve classification precision while optimizing the use of computational resources. It supports multiple critical applications such as agricultural monitoring, land cover mapping, and disaster response through its categorization of satellite images into distinct land cover types, including urban…
33

Khalid, Madiha, and Muhammad Murtaza Yousaf. "A Comparative Analysis of Big Data Frameworks: An Adoption Perspective." Applied Sciences 11, no. 22 (2021): 11033. http://dx.doi.org/10.3390/app112211033.

Abstract:
The emergence of social media, the worldwide web, electronic transactions, and next-generation sequencing not only opens new horizons of opportunities but also leads to the accumulation of a massive amount of data. The rapid growth of digital data generated from diverse sources makes it inapt to use traditional storage, processing, and analysis methods. These limitations have led to the development of new technologies to process and store very large datasets. As a result, several execution frameworks emerged for big data processing. Hadoop MapReduce, the pioneering framework, set the ground for…
34

Azhir, Elham, Mehdi Hosseinzadeh, Faheem Khan, and Amir Mosavi. "Performance Evaluation of Query Plan Recommendation with Apache Hadoop and Apache Spark." Mathematics 10, no. 19 (2022): 3517. http://dx.doi.org/10.3390/math10193517.

Abstract:
Access plan recommendation is a query optimization approach that executes new queries using previously created query execution plans (QEPs). In this method, the query optimizer divides the query space into clusters. However, traditional clustering algorithms take a significant amount of execution time for clustering such large datasets. The MapReduce distributed computing model provides efficient solutions for storing and processing vast quantities of data. Apache Spark and Apache Hadoop frameworks are used in the present investigation to cluster different sizes of query datasets in the Map…
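As a deliberately simplified stand-in for the kind of distributed clustering benchmarked here, the sketch below runs k-means from Spark's MLlib on a toy DataFrame. The feature columns f1 and f2 are invented; real query-plan clustering would first encode each QEP as a numeric feature vector:

    from pyspark.ml.clustering import KMeans
    from pyspark.ml.feature import VectorAssembler
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("qep-clustering-demo").getOrCreate()

    # Toy stand-in for numeric features extracted from query execution plans.
    df = spark.createDataFrame(
        [(0.10, 0.90), (0.80, 0.20), (0.15, 0.85), (0.75, 0.25)], ["f1", "f2"])

    features = VectorAssembler(inputCols=["f1", "f2"],
                               outputCol="features").transform(df)
    model = KMeans(k=2, seed=42, featuresCol="features").fit(features)
    model.transform(features).select("f1", "f2", "prediction").show()

Both the distance computations and the per-cluster averaging parallelize over rows, which is why the same job scales from this toy example to the large query datasets studied in the paper.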
35

Abbas, Haider Hadi, Poh Soon JosephNg, Ahmed Lateef Khalaf, Jamal Fadhil Tawfeq, and Ahmed Dheyaa Radhi. "A powerful heuristic method for generating efficient database systems." Bulletin of Electrical Engineering and Informatics 12, no. 6 (2023): 3706–16. http://dx.doi.org/10.11591/eei.v12i6.5070.

Abstract:
Heuristic functions are an integral part of MapReduce software, in both Apache Hadoop and Spark. If the heuristic function performs badly, the load in the reduce part will not be balanced and access times spike. To investigate this problem more closely, we run an optimal database program with numerous different heuristic functions on a database, leveraging the Amazon Elastic MapReduce framework. The paper investigates the general purpose, implementation, and evaluation of heuristic algorithms for generating optimal database systems, checksums, and special heuristic functions. With the analysis, we…
36

Jo, Junghee, and Kang-Woo Lee. "High-Performance Geospatial Big Data Processing System Based on MapReduce." ISPRS International Journal of Geo-Information 7, no. 10 (2018): 399. http://dx.doi.org/10.3390/ijgi7100399.

Abstract:
With the rapid development of Internet of Things (IoT) technologies, the increasing volume and diversity of sources of geospatial big data have created challenges in storing, managing, and processing data. In addition to the general characteristics of big data, the unique properties of spatial data make the handling of geospatial big data even more complicated. To facilitate users implementing geospatial big data applications in a MapReduce framework, several big data processing systems have extended the original Hadoop to support spatial properties. Most of those platforms, however, have…
37

Yang, Wen Chuan, He Chen, and Qing Yi Qu. "Research of a MapReduce Model to Process the Traffic Big Data." Applied Mechanics and Materials 548-549 (April 2014): 1853–56. http://dx.doi.org/10.4028/www.scientific.net/amm.548-549.1853.

Abstract:
Normally, the job of the Traffic Data Processing Center (TDPC) is to monitor and retain data. There is a tendency to put more capability into the TDPC, such as ad-hoc queries for speeding-car identification and feedback of abnormal traffic information. Thus we definitely need to think about what can be kept in working storage and how to analyze it. Obviously, an ordinary database cannot handle such a massive dataset and complex ad-hoc queries. MapReduce is a popular and widely used fine-grain parallel runtime, developed for high-performance processing of large-scale datasets. In this paper, we…
38

Elgalb, Ahmed, and George Samaan. "Benchmarking Apache Spark vs. Hadoop: Evaluating In-Memory and Disk-Based Processing Models for Big Data Analytics." International Journal of Research in Science and Technology 12, no. 4 (2022): 43–52. https://doi.org/10.37648/ijrst.v12i04.008.

Abstract:
Apache Spark and Hadoop MapReduce are two of the most popular data processing paradigms for large-scale computing, and each has its own model and philosophy of execution. Spark's in-memory model promises better execution for iterative, interactive, and streaming workloads, while Hadoop MapReduce's disk-based solution remains a staple of massive one-pass jobs. This paper presents an in-depth discussion of both frameworks based on studies and benchmarks published prior to 2022. By exploring their architectures, performance, fault tolerance, and compatibility with larger analytics stacks, it illustrates…
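The in-memory advantage described here comes down to one primitive: keeping a dataset resident across iterations instead of rereading it from disk on every pass, as a chain of MapReduce jobs would. A minimal, hypothetical sketch (the input path points.txt is invented):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("cache-demo").getOrCreate()
    sc = spark.sparkContext

    # Parse once, then pin the RDD in memory for the iterative phase.
    points = sc.textFile("points.txt").map(float).cache()
    n = points.count()       # the first action materializes the cache

    estimate = 0.0
    for _ in range(10):      # toy iterative workload; each pass reuses the cache
        estimate = 0.5 * (estimate + points.sum() / n)

Hadoop MapReduce has no equivalent of .cache(): each iteration is a fresh job whose input is read back from HDFS, which is precisely where the one-pass versus iterative distinction in such benchmarks comes from.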
39

Teffer, Dean, Ravi Srinivasan, and Joydeep Ghosh. "AdaHash: hashing-based scalable, adaptive hierarchical clustering of streaming data on Mapreduce frameworks." International Journal of Data Science and Analytics 8, no. 3 (2018): 257–67. http://dx.doi.org/10.1007/s41060-018-0145-7.

40

Karamolegkos, Panagiotis, Argyro Mavrogiorgou, Athanasios Kiourtis, and Dimosthenis Kyriazis. "EverAnalyzer: A Self-Adjustable Big Data Management Platform Exploiting the Hadoop Ecosystem." Information 14, no. 2 (2023): 93. http://dx.doi.org/10.3390/info14020093.

Abstract:
Big Data is a phenomenon that affects today’s world, with new data being generated every second. Today’s enterprises face major challenges from the increasingly diverse data, as well as from indexing, searching, and analyzing such enormous amounts of data. In this context, several frameworks and libraries for processing and analyzing Big Data exist. Among those frameworks Hadoop MapReduce, Mahout, Spark, and MLlib appear to be the most popular, although it is unclear which of them best suits and performs in various data processing and analysis scenarios. This paper proposes EverAnalyzer, a self-adjustable…
41

Saadoon, Muntadher, Siti Hafizah Ab Hamid, Hazrina Sofian, et al. "Experimental Analysis in Hadoop MapReduce: A Closer Look at Fault Detection and Recovery Techniques." Sensors 21, no. 11 (2021): 3799. http://dx.doi.org/10.3390/s21113799.

Abstract:
Hadoop MapReduce reactively detects and recovers from faults after they occur, based on static heartbeat detection and re-execution from scratch. However, these techniques lead to excessive response time penalties and inefficient resource consumption during detection and recovery. Existing fault-tolerance solutions intend to mitigate these limitations without considering critical conditions such as fail-slow faults, the impact of faults at various infrastructure levels, and the relationship between the detection and recovery stages. This paper analyses the response time under two main…
42

Yang, Wen Chuan, Rui Li, and Zhi Dong Shang. "A MapReduce Model to Process Massive Switching Center Data Set." Applied Mechanics and Materials 548-549 (April 2014): 1557–60. http://dx.doi.org/10.4028/www.scientific.net/amm.548-549.1557.

Abstract:
With the wide use of smartphones in China, all input packet streams are routed to the Telecommunication Content Distribution Service Switching Centers (TSC). There is a tendency to put more capability into the switch, such as retaining or querying passing data. Thus we definitely need to think about what can be kept in working storage and how to analyze it. Obviously, an ordinary database cannot handle such a massive dataset and complex ad-hoc queries. In this paper, we propose MRTSC, a MapReduce deep service analysis system based on the Hive/Hadoop frameworks. The distributed file system HDFS is…
43

Wakchaure, Sujit R., et al. "MR-AT: Map Reduce based Apriori Technique for Sequential Pattern Mining using Big Data in Hadoop." International Journal on Recent and Innovation Trends in Computing and Communication 11, no. 9 (2023): 4258–67. http://dx.doi.org/10.17762/ijritcc.v11i9.9877.

Abstract:
One of the most well-known and widely implemented data mining methods is the Apriori algorithm, which mines frequent item sets. The effectiveness of the Apriori algorithm has been improved by a number of algorithms introduced on both parallel and distributed platforms in recent years. They are distinct from one another in their methods of load balancing, memory systems, methods of data degradation, and the data layouts used in their implementation. The majority of the issues that arise with distributed frameworks are associated with operating costs…
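Apriori's counting step maps naturally onto MapReduce: each mapper emits the candidate itemsets found in its share of the transactions, and reducers sum the counts and apply the support threshold. Below is a self-contained toy sketch of that single pass; the data and threshold are invented:

    from collections import Counter
    from itertools import combinations

    transactions = [{"a", "b", "c"}, {"a", "c"}, {"a", "b"}, {"b", "c"}]
    min_support = 2
    k = 2

    # "Map": emit (candidate k-itemset, 1) for each transaction.
    emitted = [(itemset, 1)
               for t in transactions
               for itemset in combinations(sorted(t), k)]

    # "Reduce": sum the counts per candidate, then prune by support.
    counts = Counter()
    for itemset, one in emitted:
        counts[itemset] += one
    frequent = {i: c for i, c in counts.items() if c >= min_support}
    print(frequent)  # {('a', 'b'): 2, ('a', 'c'): 2, ('b', 'c'): 2}

A full MapReduce Apriori, such as the MR-AT technique above, iterates this pass for growing k, broadcasting the surviving itemsets between rounds.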
44

Anand, L., K. Senthilkumar, N. Arivazhagan, and V. Sivakumar. "Analysis for Guaranteeing Performance in Map Reduce Systems with Hadoop and R." International Journal of Engineering & Technology 7, no. 3.3 (2018): 445. http://dx.doi.org/10.14419/ijet.v7i2.33.14207.

Abstract:
Companies have fast-growing amounts of data to process and store, and a data explosion is under way. By now, one of the most common ways to deal with these huge data volumes is based on the MapReduce parallel programming paradigm. While its use is widespread in industry, guaranteeing performance constraints while at the same time minimizing costs still presents considerable challenges. We propose a coarse-grained control-theoretic approach, based on techniques that have already…
45

Penumajji, Niketa. "Transformations in Database and Data Processing Frameworks for Large-Scale Systems." International Journal of Engineering Technology and Management Sciences 8, no. 5 (2024): 92–102. https://doi.org/10.46647/ijetms.2024.v08i05.013.

Abstract:
With the exponential growth of data, traditional database management systems (DBMSs) face unprecedented challenges in scalability, performance, and flexibility. This paper surveys key developments in scalable database systems, focusing on their evolution and integration with modern distributed architectures such as MapReduce and HadoopDB, alongside systems like Presto, which emphasize high-speed analytics through distributed SQL. We examine the role of parallelism, query optimization, and extensibility in enhancing system performance and resource efficiency. By reviewing critical techniques, including…
46

Naayini, Prudhvi. "Parallel Computing Data Processing: Frameworks, Implementations, and Case Studies." International Journal of Advances in Engineering and Management 7, no. 4 (2025): 405–15. https://doi.org/10.35629/5252-0704405415.

Abstract:
Parallel computing has become fundamental in processing the massive data volumes generated in modern science and industry. This paper presents a comprehensive survey and practical review of parallel computing in data processing, examining key frameworks (MPI, OpenMP, CUDA, Hadoop MapReduce, Apache Spark, etc.), implementations, and real-world case studies. We discuss the architectures underlying shared-memory and distributed-memory systems and illustrate how parallelism on CPUs and GPUs is exploited for high-performance computing (HPC) and big data analytics. We review theoretical foundations (including Amdahl’s…
47

Astsatryan, Hrachya, Arthur Lalayan, Aram Kocharyan, and Daniel Hagimont. "Performance-efficient Recommendation and Prediction Service for Big Data frameworks focusing on Data Compression and In-memory Data Storage Indicators." Scalable Computing: Practice and Experience 22, no. 4 (2021): 401–12. http://dx.doi.org/10.12694/scpe.v22i4.1945.

Abstract:
The MapReduce framework manages Big Data sets by splitting the large datasets into a set of distributed blocks and processing them in parallel. Data compression and in-memory file systems are widely used methods in Big Data processing to reduce resource-intensive I/O operations and improve I/O rates correspondingly. The article presents a performance-efficient, modular, and configurable decision-making robust service relying on data compression and in-memory data storage indicators. The service consists of Recommendation and Prediction modules and predicts the execution time of a given job based on…
48

Yang, Wen Chuan, Guang Jie Lin, and Jiang Yong Wang. "A MapReduce Clone Car Identification Model over Traffic Data Stream." Applied Mechanics and Materials 346 (August 2013): 117–22. http://dx.doi.org/10.4028/www.scientific.net/amm.346.117.

Abstract:
With the wide use of intelligent traffic systems in China, all traffic input data streams to the Traffic Surveillance Center (TSC). Some metropolitan TSCs, such as Beijing's, produce up to 18 million records and 1 TB of image data arriving every hour. Normally, the job of the TSC is to monitor and retain data. There is a tendency to put more capability into the TSC, such as ad-hoc queries for clone-car identification and feedback of abnormal traffic information. Thus we definitely need to think about what can be kept in working storage and how to analyze it. Obviously, an ordinary database cannot handle…
49

Afanasev, A. P., and S. S. Kolmogorova. "Using System Analysis to Optimize Big Data Processing." Dynamics of Systems, Mechanisms and Machines 11, no. 1 (2023): 69–72. http://dx.doi.org/10.25206/2310-9793-2023-11-1-69-72.

Abstract:
Modern information technologies have changed the way data is collected, stored, and processed. As a result, there is an urgent need to develop and apply new methods and approaches to data analysis. For the convenience of working with big data, a large number of frameworks and distributed data warehouses have been implemented. One of the well-known methods to support the execution of large-scale distributed application programs is MapReduce. The article presents an overview of modern technologies used in the field of big data processing and market analysis, as well as a comparison of…
50

Dagli, Ates, Niall McCarroll, and Dmitry Vasilenko. "Data Partitioning for Ensemble Model Building." International Journal on Cloud Computing: Services and Architecture (IJCCSA) 7 (August 2018): 1–7. https://doi.org/10.5281/zenodo.1464728.

Abstract:
In distributed ensemble model-building algorithms, the performance and statistical validity of models depend on the sizes of the input data partitions as well as the distribution of records among the partitions. Failure to correctly select and pre-process the data often results in models that are not stable and do not perform well. This article introduces an optimized approach to building ensemble models for very large data sets in distributed map-reduce environments using the Pass-Stream-Merge (PSM) algorithm. To ensure model correctness, the input data is randomly distributed using…