Journal articles on the topic 'Hadoop'

Consult the top 50 journal articles for your research on the topic 'Hadoop.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles on a wide variety of disciplines and organise your bibliography correctly.

1

Li, Xin Liang, and Jian De Zheng. "Improvement of Hadoop Security Mechanism." Applied Mechanics and Materials 484-485 (January 2014): 912–15. http://dx.doi.org/10.4028/www.scientific.net/amm.484-485.912.

Abstract:
Hadoop, as an open-source cloud computing framework, is increasingly applied in many fields, while the weakness of its security mechanism has become one of the main problems hindering its development. This paper first analyzes the current security mechanisms of Hadoop, then, through a study of Hadoop's security mechanism and an analysis of the security risks in its current version, proposes a corresponding solution based on secure multicast to resolve those risks. This can provide technical support for enterprises applying Hadoop under new security requirements.
2

Revathy, P., and Rajeswari Mukesh. "HadoopSec 2.0: Prescriptive analytics-based multi-model sensitivity-aware constraints centric block placement strategy for Hadoop." Journal of Intelligent & Fuzzy Systems 39, no. 6 (December 4, 2020): 8477–86. http://dx.doi.org/10.3233/jifs-189165.

Abstract:
Like many open-source technologies, such as UNIX or TCP/IP, Hadoop was not created with security in mind. Hadoop nevertheless evolved from those tools over time and became widely adopted across large enterprises. Some of Hadoop's architectural features present unique security issues. Given this vulnerability and the potential invasion of confidentiality by malicious attackers or internal users, organizations face challenges in implementing a strong security framework for Hadoop. Furthermore, the method by which data is placed in a Hadoop cluster adds to the growing list of potential security vulnerabilities. Data privacy is compromised when critical, data-sensitive blocks are accessed by unauthorized users, or even misused by authorized users. In this paper, we address the strategy of data block placement across the allotted DataNodes. Prescriptive analytics algorithms are used to determine the Sensitivity Index of the data and thereby decide on data placement so as to deny access to unauthorized users. This data block placement strategy aims to adaptively distribute the data across the cluster using machine learning techniques to make the data infrastructure more secure.
3

Masadeh, M. B., M. S. Azmi, and S. S. S. Ahmad. "Available techniques in hadoop small file issue." International Journal of Electrical and Computer Engineering (IJECE) 10, no. 2 (April 1, 2020): 2097. http://dx.doi.org/10.11591/ijece.v10i2.pp2097-2101.

Abstract:
Hadoop has been an effective solution for big data processing and storage since its release in late 2006. Hadoop processes data in a master-slave manner [1], splitting a large job into several smaller tasks that are processed separately; this technique was adopted instead of pushing one large file through a costly supercomputer. Hadoop performs very well with large files, but when big data arrives as many small files it can face performance problems: slow processing, delayed data access, high latency, and even a complete cluster shutdown [2]. In this paper we highlight one of Hadoop's limitations that affects data processing performance, known as the "big data in small files" problem, which occurs when a massive number of small files is pushed into a Hadoop cluster and can drive the cluster to shut down entirely. The paper also highlights some native and proposed solutions for big data in small files, how they reduce the negative effects on a Hadoop cluster, and how they add extra performance to the storage and access mechanisms.
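
A common native mitigation for the small-file problem surveyed above is to pack many small files into one splittable container file. The sketch below uses Hadoop's SequenceFile API, keyed by the original file name; the local directory and HDFS path are illustrative, not from the paper:

    import java.io.File;
    import java.nio.file.Files;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class SmallFilePacker {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // One large, splittable container instead of thousands of tiny HDFS files
            try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                    SequenceFile.Writer.file(new Path("/data/packed.seq")),
                    SequenceFile.Writer.keyClass(Text.class),
                    SequenceFile.Writer.valueClass(BytesWritable.class))) {
                for (File f : new File("small-files-dir").listFiles()) {
                    byte[] bytes = Files.readAllBytes(f.toPath());
                    // key = original file name, value = raw file contents
                    writer.append(new Text(f.getName()), new BytesWritable(bytes));
                }
            }
        }
    }

Downstream MapReduce jobs can then read the container with SequenceFileInputFormat instead of opening thousands of tiny files.
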
4

Hanafi, Idris, and Amal Abdel-Raouf. "P-Codec: Parallel Compressed File Decompression Algorithm for Hadoop." INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY 15, no. 8 (May 24, 2016): 6991–98. http://dx.doi.org/10.24297/ijct.v15i8.1500.

Abstract:
The increasing amount and size of data being handled by data analytic applications running on Hadoop has created a need for faster data processing. One of the effective methods for handling big data sizes is compression. Data compression not only makes network I/O processing faster, but also provides better utilization of resources. However, this approach defeats one of Hadoop’s main purposes, which is the parallelism of map and reduce tasks. The number of map tasks created is determined by the size of the file, so by compressing a large file, the number of mappers is reduced which in turn decreases parallelism. Consequently, standard Hadoop takes longer times to process. In this paper, we propose the design and implementation of a Parallel Compressed File Decompressor (P-Codec) that improves the performance of Hadoop when processing compressed data. P-Codec includes two modules; the first module decompresses data upon retrieval by a data node during the phase of uploading the data to the Hadoop Distributed File System (HDFS). This process reduces the runtime of a job by removing the burden of decompression during the MapReduce phase. The second P-Codec module is a decompressed map task divider that increases parallelism by dynamically changing the map task split sizes based on the size of the final decompressed block. Our experimental results using five different MapReduce benchmarks show an average improvement of approximately 80% compared to standard Hadoop.
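
For context, standard Hadoop already exposes the two knobs whose interaction this paper targets: the output codec and the input split size. A minimal setup sketch, with mapper and reducer classes omitted (BZip2 is chosen here because it is splittable, and the 64 MB cap is illustrative, not a value from the paper):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.compress.BZip2Codec;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class CompressedJobSetup {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "compressed-io");
            FileInputFormat.addInputPath(job, new Path("/in"));
            FileOutputFormat.setOutputPath(job, new Path("/out"));
            // BZip2 is splittable, so large compressed inputs can still feed many mappers
            FileOutputFormat.setCompressOutput(job, true);
            FileOutputFormat.setOutputCompressorClass(job, BZip2Codec.class);
            // Cap the split size (64 MB here) to raise map-task parallelism
            FileInputFormat.setMaxInputSplitSize(job, 64L * 1024 * 1024);
            // job.setMapperClass(...) / job.setReducerClass(...) omitted in this sketch
        }
    }
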
5

Lee, Kyong-Ha, Woo Lam Kang, and Young-Kyoon Suh. "Improving I/O Efficiency in Hadoop-Based Massive Data Analysis Programs." Scientific Programming 2018 (December 2, 2018): 1–9. http://dx.doi.org/10.1155/2018/2682085.

Abstract:
Apache Hadoop has been a popular parallel processing tool in the era of big data. While practitioners have rewritten many conventional analysis algorithms to make them customized to Hadoop, the issue of inefficient I/O in Hadoop-based programs has been repeatedly reported in the literature. In this article, we address the problem of the I/O inefficiency in Hadoop-based massive data analysis by introducing our efficient modification of Hadoop. We first incorporate a columnar data layout into the conventional Hadoop framework, without any modification of the Hadoop internals. We also provide Hadoop with indexing capability to save a huge amount of I/O while processing not only selection predicates but also star-join queries that are often used in many analysis tasks.
6

Adawiyah, Robiyatul, and Sirojul Munir. "Analisis Kecepatan Algoritma MapReduce Word Count Pada Cluster Hadoop Studi Kasus Pada Global Dataset of Events, Language and Tone (GDELT)." Jurnal Informatika Terpadu 6, no. 1 (March 6, 2020): 14–19. http://dx.doi.org/10.54914/jit.v6i1.214.

Abstract:
This study analyzes the speed of the MapReduce algorithm on a Hadoop cluster and measures the time required to process GDELT data on Hadoop. The research uses a qualitative analysis method. Based on the data analysis performed, it is concluded that the Word Count algorithm applied to the GDELT data set can run on a Hadoop cluster. The speed of the Word Count algorithm on MapReduce applied to the GDELT data set on Hadoop is affected when nodes are added; the study used 2 physical machine nodes. Hadoop can process large and numerous data sets because it processes data in a distributed manner. Hadoop's speed can be adjusted by adding nodes and through other settings such as block size.
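
For reference, the Word Count algorithm benchmarked in this study is the canonical Hadoop MapReduce example. A minimal Java sketch of its mapper and reducer (the driver class is omitted):

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class WordCount {

        public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE); // emit (word, 1) for every token
                }
            }
        }

        public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get(); // sum the partial counts for this word
                }
                context.write(key, new IntWritable(sum));
            }
        }
    }
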
7

Azeroual, Otmane, and Renaud Fabre. "Processing Big Data with Apache Hadoop in the Current Challenging Era of COVID-19." Big Data and Cognitive Computing 5, no. 1 (March 9, 2021): 12. http://dx.doi.org/10.3390/bdcc5010012.

Abstract:
Big data have become a global strategic issue, as increasingly large amounts of unstructured data challenge the IT infrastructure of global organizations and threaten their capacity for strategic forecasting. As experienced in former massive information issues, big data technologies, such as Hadoop, should efficiently tackle the incoming large amounts of data and provide organizations with relevant processed information that was formerly neither visible nor manageable. After having briefly recalled the strategic advantages of big data solutions in the introductory remarks, in the first part of this paper, we focus on the advantages of big data solutions in the currently difficult time of the COVID-19 pandemic. We characterize it as an endemic heterogeneous data context; we then outline the advantages of technologies such as Hadoop and its IT suitability in this context. In the second part, we identify two specific advantages of Hadoop solutions, globality combined with flexibility, and we notice that they are at work with a “Hadoop Fusion Approach” that we describe as an optimal response to the context. In the third part, we justify selected qualifications of globality and flexibility by the fact that Hadoop solutions enable comparable returns in opposite contexts of models of partial submodels and of models of final exact systems. In part four, we remark that in both these opposite contexts, Hadoop’s solutions allow a large range of needs to be fulfilled, which fits with requirements previously identified as the current heterogeneous data structure of COVID-19 information. In the final part, we propose a framework of strategic data processing conditions. To the best of our knowledge, they appear to be the most suitable to overcome COVID-19 massive information challenges.
8

Li, Pengcheng, Haidong Chen, Shipeng Li, Tinggui Yan, and Hang Qian. "Research on Distributed Calculation of Flight Parameters Based on Hadoop." Journal of Physics: Conference Series 2337, no. 1 (September 1, 2022): 012013. http://dx.doi.org/10.1088/1742-6596/2337/1/012013.

Abstract:
With the improvement of launch vehicle technology and the increase in launch missions, the contradiction between the large-scale calculation demands of launch vehicle flight parameters and the traditional standalone calculation mode has become increasingly prominent under intensive launch tasks, mainly reflected in slow calculation speed, low processing efficiency, limited bandwidth, and single points of failure. MapReduce, the distributed computing framework of the big data architecture Hadoop, running on a low-cost cluster, is innovatively applied to the large-scale calculation of a launch vehicle's flight parameters, relying on its distributed storage, coordination, and load balancing mechanisms. The method effectively improves computing efficiency, breaks through performance bottlenecks, and avoids single points of failure. Compared with traditional standalone deployment and Hadoop-based pseudo-distributed cluster deployment, fully distributed cluster deployment on Hadoop is the optimal deployment for calculating flight parameters. The results show that the parallel calculation of flight parameters based on Hadoop saves 51% of the time while producing results consistent with the standalone deployment.
9

Ji, Keungyeup, and Youngmi Kwon. "Hadoop MapReduce Performance Optimization Analysis by Calibrating Hadoop Parameters." Journal of Korean Institute of Information Technology 19, no. 6 (June 30, 2021): 9–19. http://dx.doi.org/10.14801/jkiit.2021.19.6.9.

10

Lee, Sungchul, Ju-Yeon Jo, and Yoohwan Kim. "Hadoop Performance Analysis Model with Deep Data Locality." Information 10, no. 7 (June 27, 2019): 222. http://dx.doi.org/10.3390/info10070222.

Abstract:
Background: Hadoop has become the base framework for big data systems via the simple concept that moving computation is cheaper than moving data. Hadoop increases data locality in the Hadoop Distributed File System (HDFS) to improve the performance of the system; network traffic among nodes is reduced by increasing the proportion of data-local tasks on each machine. Traditional research increased data locality in one of the MapReduce stages to increase Hadoop performance, but there has been no mathematical performance model for data locality in Hadoop. Methods: This study built a Hadoop performance analysis model with data locality for analyzing the entire MapReduce process. The paper explains the data locality concept in the map and shuffle stages and shows how to apply the model to increase the performance of a Hadoop system through deep data locality. Results: This research validated deep data locality for increasing the performance of Hadoop via three tests: a simulation-based test, a cloud test, and a physical test. According to the tests, the authors improved the Hadoop system by over 34% by using deep data locality. Conclusions: Deep data locality improved Hadoop performance by reducing data movement in HDFS.
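
The data locality that this model analyzes can be measured on a live cluster from Hadoop's built-in job counters. A small sketch, assuming a handle to a completed Job, that reports the fractions of data-local and rack-local map tasks:

    import org.apache.hadoop.mapreduce.Counters;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.JobCounter;

    public class LocalityReport {
        // Report what fraction of map tasks ran on a node (or rack) holding their input
        static void printLocality(Job completedJob) throws Exception {
            Counters counters = completedJob.getCounters();
            long total = counters.findCounter(JobCounter.TOTAL_LAUNCHED_MAPS).getValue();
            long dataLocal = counters.findCounter(JobCounter.DATA_LOCAL_MAPS).getValue();
            long rackLocal = counters.findCounter(JobCounter.RACK_LOCAL_MAPS).getValue();
            System.out.printf("data-local: %.1f%%, rack-local: %.1f%%%n",
                    100.0 * dataLocal / total, 100.0 * rackLocal / total);
        }
    }
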
11

Lu, An Sheng, Jian Jiang Cai, Wei Jin, and Lu Wang. "Research and Practice of Cloud Computing Based on Hadoop." Applied Mechanics and Materials 644-650 (September 2014): 3387–89. http://dx.doi.org/10.4028/www.scientific.net/amm.644-650.3387.

Abstract:
Hadoop is a computing platform for distributed parallel processing of massive data and is currently the most widely used cloud computing platform. This paper analyses and studies the Hadoop distributed file system HDFS, the MapReduce calculation model on the Hadoop platform, and the cloud computing model based on Hadoop; it introduces the process of building a Hadoop cloud computing platform and its operating environment, and proposes an implementation.
12

Zhang, Wei Feng, and Tin Wang. "Hadoop: Analysis of Cloud Computing Infrastructure." Applied Mechanics and Materials 475-476 (December 2013): 1201–6. http://dx.doi.org/10.4028/www.scientific.net/amm.475-476.1201.

Abstract:
As a kind of cloud computing infrastructure, Hadoop has attracted attention from many corporations and has been widely used. This paper introduces and analyzes the framework and characteristics of Hadoop in detail and discusses the future application opportunities of Hadoop in the field of communication. An application based on Hadoop was designed, and experimental results demonstrate the efficiency of Hadoop in dealing with big datasets.
13

Dittrich, Jens, Jorge-Arnulfo Quiané-Ruiz, Alekh Jindal, Yagiz Kargin, Vinay Setty, and Jörg Schad. "Hadoop++." Proceedings of the VLDB Endowment 3, no. 1-2 (September 2010): 515–29. http://dx.doi.org/10.14778/1920841.1920908.

14

Giri, Pratit Raj, and Gajendra Sharma. "Apache Hadoop Architecture, Applications, and Hadoop Distributed File System." Semiconductor Science and Information Devices 4, no. 1 (May 18, 2022): 14. http://dx.doi.org/10.30564/ssid.v4i1.4619.

Abstract:
Data and the internet are growing rapidly, which causes problems in the management of big data. For these kinds of problems, many software frameworks are used to increase the performance of distributed systems and to provide large-scale data storage. One of the most beneficial software frameworks used to utilize data in distributed systems is Hadoop. This paper introduces the Apache Hadoop architecture, the components of Hadoop, and their significance in managing vast volumes of data in a distributed system. The Hadoop Distributed File System enables the storage of enormous chunks of data over a distributed network, and the Hadoop framework maintains the fsImage and edits files, which support the availability and integrity of data. The paper includes cases of Hadoop implementation, such as weather monitoring and bioinformatics processing.
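
The HDFS storage layer described here is driven through the FileSystem client API. A minimal write-then-read sketch (the path is illustrative; the Configuration picks up the cluster's core-site.xml and hdfs-site.xml):

    import java.nio.charset.StandardCharsets;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class HdfsRoundTrip {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration(); // reads core-site.xml / hdfs-site.xml
            FileSystem fs = FileSystem.get(conf);
            Path p = new Path("/tmp/hello.txt");
            try (FSDataOutputStream out = fs.create(p, true)) { // overwrite if present
                out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
            }
            try (FSDataInputStream in = fs.open(p)) { // read the file back
                IOUtils.copyBytes(in, System.out, 4096, false);
            }
        }
    }
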
15

Hua, Guan-Jie, Che-Lun Hung, and Chuan Yi Tang. "Hadoop-MCC: Efficient Multiple Compound Comparison Algorithm Using Hadoop." Combinatorial Chemistry & High Throughput Screening 21, no. 2 (April 17, 2018): 84–92. http://dx.doi.org/10.2174/1386207321666180102120641.

Abstract:
Aim and Objective: In the past decade, drug design technologies have improved enormously. Computer-aided drug design (CADD) has played an important role in analysis and prediction in drug development, making the procedure more economical and efficient. However, computation with big data, such as ZINC containing more than 60 million compounds and GDB-13 with more than 930 million small molecules, poses a notable time-consumption problem. Therefore, we propose a novel heterogeneous high-performance computing method, named Hadoop-MCC, integrating Hadoop and GPU, to cope with big chemical structure data efficiently. Materials and Methods: Hadoop-MCC gains high availability and fault tolerance from Hadoop, which is used to scatter input data to GPU devices and gather the results from them. The Hadoop framework adopts a mapper/reducer computation model. In the proposed method, mappers are responsible for fetching SMILES data segments and performing the LINGO method on the GPU, then reducers collect all comparison results produced by the mappers. Due to the high availability of Hadoop, all LINGO computational jobs on mappers can be completed even if some of the mappers encounter problems. Results: LINGO comparisons are performed on each GPU device in parallel. According to the experimental results, the proposed method on multiple GPU devices achieves better computational performance than CUDA-MCC on a single GPU device. Conclusion: Hadoop-MCC achieves the scalability, high availability, and fault tolerance granted by Hadoop, as well as high performance, by integrating the computational power of both Hadoop and GPUs. It has been shown that a heterogeneous architecture such as Hadoop-MCC can effectively deliver better computational performance than a single GPU device.
16

Hamad, Faten. "An Overview of Hadoop Scheduler Algorithms." Modern Applied Science 12, no. 8 (July 26, 2018): 69. http://dx.doi.org/10.5539/mas.v12n8p69.

Abstract:
Hadoop is an open-source cloud computing system used in large-scale data processing, and it has become the basic computing platform for many internet companies. With the Hadoop platform, users can develop cloud computing applications and then submit tasks to the platform. Hadoop has strong fault tolerance and can easily increase the number of cluster nodes, expanding the cluster size linearly so that clusters can process larger datasets. However, Hadoop has some shortcomings, especially those exposed in actual use of the MapReduce scheduler, which calls for more research on Hadoop scheduling algorithms. This survey provides an overview of the default Hadoop scheduler algorithms and the problems they have. It also compares five Hadoop framework scheduling algorithms in terms of the default scheduler algorithm to be enhanced, the proposed scheduler algorithm, the type of cluster applied (heterogeneous or homogeneous), the methodology, and the classification of clusters based on performance evaluation. Finally, a new algorithm based on capacity scheduling and prospective resource utilization is proposed to enhance Hadoop scheduling.
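
For context, which scheduler a YARN cluster runs is itself a single configuration key; the property this sketch sets programmatically is normally placed in yarn-site.xml. A hedged sketch of selecting the Fair Scheduler, assuming a stock Hadoop/YARN distribution:

    import org.apache.hadoop.conf.Configuration;

    public class SchedulerChoice {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Swap the ResourceManager's scheduler implementation
            conf.set("yarn.resourcemanager.scheduler.class",
                    "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler");
            System.out.println(conf.get("yarn.resourcemanager.scheduler.class"));
        }
    }
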
17

Sun, Ya-ni, and Xinhua Chen. "Application and Realization of Improved Apriori Algorism in Hadoop Simulation Platform for Mass Data Process." International Journal of Online Engineering (iJOE) 12, no. 02 (February 29, 2016): 16. http://dx.doi.org/10.3991/ijoe.v12i02.5037.

Abstract:
This paper chose the open-source distributed cloud computing platform Hadoop from the Apache foundation as the basic platform for this research project. After studying and analyzing the Hadoop platform structure, it researches and discusses the distributed Apriori data mining algorithm on the Hadoop platform, presents improvements and a performance analysis, and finally completes the construction and simulation research of a Hadoop mass data processing platform.
18

Ravi Kumar, A., et al. "A Review on Design and Development of Performance Evaluation Model for Bio-Informatics Data Using Hadoop." Turkish Journal of Computer and Mathematics Education (TURCOMAT) 12, no. 2 (April 10, 2021): 1546–63. http://dx.doi.org/10.17762/turcomat.v12i2.1432.

Abstract:
The paper reviews the usage of the Hadoop platform in applications for structural bioinformatics. Hadoop offers structural bioinformatics a new way to analyze large fractions of the Protein Data Bank, which is crucial for high-throughput studies of, for example, protein-ligand docking, clustering of protein-ligand complexes, and structural alignment. In particular, we review several high-throughput analyses and their scalability in the literature using Hadoop. We find that these efforts typically drive an existing executable through MapReduce rather than rewriting the algorithms. Reported scalability shows variable behavior in comparison with other batch schedulers, particularly as direct comparisons on the same platform are usually not available. Direct comparisons of Hadoop with batch schedulers are missing in the literature, but we note some evidence that MPI implementations scale better than Hadoop. The interface and the effort required to structure an application for Hadoop remain a significant obstacle to its use in bioinformatics. This should improve over time as Hadoop interfaces such as Spark mature, the use of cloud platforms increases, and standardized approaches such as workflow languages are taken up.
19

Wu, Zhen Quan, and Bing Pan. "Research of Distributed Search Engine Based on Hadoop." Applied Mechanics and Materials 631-632 (September 2014): 171–74. http://dx.doi.org/10.4028/www.scientific.net/amm.631-632.171.

Abstract:
Combining the Map/Reduce programming model, the Hadoop distributed file system, Lucene inverted-file indexing, and ICTCLAS Chinese word segmentation, we designed and implemented a distributed search engine system based on Hadoop. Tests of the system in a four-node Hadoop cluster environment show that the Hadoop platform can be used in search engines to improve system performance, reliability, and scalability.
20

Chen, Feng Ping, Li Miao, and Yue Gao Tang. "Research of Hadoop Parameters Tuning Based on Function Monitoring." Applied Mechanics and Materials 621 (August 2014): 264–70. http://dx.doi.org/10.4028/www.scientific.net/amm.621.264.

Abstract:
Hadoop is a popular software framework that supports the distributed processing of large data sets. However, with Hadoop being a relatively new technology, practitioners and administrators often lack the expertise to tune it for better performance. Hadoop parameter configuration is one of the key factors that influence performance. In this article, we present a novel Hadoop parameter tuning method based on function monitoring. This method monitors function call information while a task runs to analyze why the performance of Hadoop changes when parameters are tuned, which helps practitioners and administrators tune parameters for better performance.
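
The kinds of parameters such tuning methods adjust can be set per job through the standard Configuration API. A hedged sketch with a few commonly tuned keys (the values are illustrative, not recommendations from the paper):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class TunedJob {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.setInt("mapreduce.task.io.sort.mb", 256);            // map-side sort buffer
            conf.setFloat("mapreduce.map.sort.spill.percent", 0.90f); // buffer spill threshold
            conf.setInt("mapreduce.job.reduces", 8);                  // number of reduce tasks
            conf.setBoolean("mapreduce.map.output.compress", true);   // compress shuffle data
            Job job = Job.getInstance(conf, "tuned-job");
            System.out.println(job.getConfiguration().get("mapreduce.task.io.sort.mb"));
        }
    }
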
21

Yue, Hang. "Unstructured Healthcare Data Archiving and Retrieval Using Hadoop and Drill." International Journal of Big Data and Analytics in Healthcare 3, no. 2 (July 2018): 28–44. http://dx.doi.org/10.4018/ijbdah.2018070103.

Abstract:
A healthcare hybrid Hadoop ecosystem is analyzed for unstructured healthcare data archives. This ecosystem is composed of components such as Pig, Hive, Sqoop, ZooKeeper, the Hadoop Distributed File System (HDFS), MapReduce, and HBase. Apache Drill is applied for unstructured healthcare data retrieval. This article discusses the combination of Hadoop and Drill for data analysis applications. Based on the analysis of the Hadoop components (including HBase design) and case studies of Drill query design for different unstructured healthcare data, the Hadoop ecosystem and Drill are valid tools to integrate and access voluminous, complex healthcare data. They can improve healthcare systems, achieve savings on patient care costs, optimize the healthcare supply chain, and infer useful knowledge from noisy and heterogeneous healthcare data sources.
22

Deng, Zhong Hua, Bing Fan, Ying Jun Lu, and Zhi Fang Li. "Discussion about Big Data Mining Based on Hadoop." Applied Mechanics and Materials 380-384 (August 2013): 2063–66. http://dx.doi.org/10.4028/www.scientific.net/amm.380-384.2063.

Abstract:
As a cloud computing platform, Hadoop has huge advantages for data mining. The main aspects of Hadoop for data mining are discussed, and a technical framework for big data mining based on Hadoop is analyzed.
23

Prabowo, Sidik, and Maman Abdurohman. "Studi Perbandingan Performa Algoritma Penjadwalan untuk Real Time Data Twitter pada Hadoop." Komputika : Jurnal Sistem Komputer 9, no. 1 (April 3, 2020): 43–50. http://dx.doi.org/10.34010/komputika.v9i1.2848.

Abstract:
Hadoop is an open-source, Java-based software framework. It consists of two main components: MapReduce and the Hadoop Distributed File System (HDFS). MapReduce consists of Map and Reduce functions used for data processing, while HDFS is the directory in which Hadoop data is stored. Because jobs often vary in their execution characteristics, an appropriate job scheduler is needed, and many job schedulers are available to match job characteristics. The Fair Scheduler works on the principle of ensuring that a job receives the same resources as other jobs, with the goal of improving average completion time. The Hadoop Fair Sojourn Protocol Scheduler is a scheduling algorithm in Hadoop that schedules based on the size of the submitted jobs. This study compares the performance of the two schedulers on Twitter data. The test results show that the Hadoop Fair Sojourn Protocol Scheduler performs better than the Fair Scheduler, with a 9.31% improvement in average completion time and 23.46% in job throughput, while the Fair Scheduler is superior in task fail rate by 23.98%.
24

Han, Yong Qi, Yun Zhang, and Wei Dong Guan. "Research on Building the Cloud Platform Based on Hadoop." Applied Mechanics and Materials 513-517 (February 2014): 2468–71. http://dx.doi.org/10.4028/www.scientific.net/amm.513-517.2468.

Abstract:
Based on research into the Hadoop cloud platform, this paper builds a Hadoop cloud platform in a network environment consisting of four servers, covering the deployment environment of the four computers, the design and arrangement of the nodes, and the installation and configuration of Hadoop.
25

Берсанов, М.-Д. А., Р. Ю. Исраилов, and Д. В. Андреянов. "Платформа для обработки больших данных HADOOP." ТЕНДЕНЦИИ РАЗВИТИЯ НАУКИ И ОБРАЗОВАНИЯ 97, no. 12 (2023): 32–34. http://dx.doi.org/10.18411/trnio-05-2023-651.

Abstract:
The article discusses the importance of big data processing in the modern digital ecosystem. It explains how the volume, processing speed, and variety of data have become more important than the content itself. It also describes the Hadoop technology, a distributed big data processing platform developed by a team of Yahoo! engineers in 2005. The history of Hadoop's creation and its key components are covered, namely the Hadoop Distributed File System (HDFS) and the MapReduce data processing system. The article concludes with a discussion of the advantages and challenges of using Hadoop and its growing popularity in the IT industry.
26

Gupta, Piyush, Pardeep Kumar, and Girdhar Gopal. "Sentiment Analysis on Hadoop with Hadoop Streaming." International Journal of Computer Applications 121, no. 11 (July 18, 2015): 4–8. http://dx.doi.org/10.5120/21582-4651.

27

Sahu, Kapil, Kaveri Bhatt, Amit Saxena, and Kaptan Singh. "Implementation of Big-Data Applications Using Map Reduce Framework." International Journal of Engineering and Computer Science 9, no. 08 (August 12, 2020): 25125–31. http://dx.doi.org/10.18535/ijecs/v9i08.4504.

Abstract:
As a result of the rapid development of cloud computing, it is fundamental to investigate the performance of different Hadoop MapReduce applications and to understand the performance bottlenecks in a cloud cluster that contribute to higher or lower performance. It is also essential to analyze the underlying hardware in cloud cluster servers to permit the optimization of software and hardware to achieve the highest performance possible. Hadoop is founded on MapReduce, which is among the most popular programming models for big data analysis in a parallel computing environment. In this paper, we present a detailed performance analysis, characterization, and evaluation of the Hadoop MapReduce WordCount application. The main aim of this paper is to demonstrate Hadoop MapReduce programming through hands-on development of Hadoop-based WordCount and Apriori applications: the word count problem using the Hadoop MapReduce framework, and the Apriori algorithm for finding frequent itemsets using the MapReduce framework.
28

Awasthi, Yogesh. "Enhancing approach for information security in Hadoop." Ukrainian Journal of Educational Studies and Information Technology 8, no. 1 (March 27, 2020): 39–49. http://dx.doi.org/10.32919/uesit.2020.01.04.

Abstract:
Hadoop, one of the recent trends in technology, is used as a framework for distributed storage. It is an open-source distributed computing framework implemented in Java and comprises two modules: MapReduce and the Hadoop Distributed File System (HDFS). MapReduce is designed for processing enormous data sets; it enables users to employ thousands of commodity machines in parallel effectively, and by simply defining map and reduce functions the user can process huge amounts of data. HDFS stores data on distributed clusters of machines. Hadoop is normally used in large clusters or public cloud services, for example by Yahoo!, Facebook, Twitter, and Amazon. The popularity of these applications has demonstrated the scalability of Hadoop, but it was designed without security for storing data. Using the Hadoop package, a proposed secure cloud computing system has been designed, in which Hadoop is used to establish and enhance the security of saving and managing user data. Hadoop has no strategy to assure the security and privacy of the files stored in the Hadoop Distributed File System (HDFS). As an encryption scheme for the files stored in HDFS, an asymmetric key cryptosystem is advocated: before saving data in HDFS, the proposed hybrid cipher based on RSA and Rabin encrypts the data. The user of the cloud may upload files in two ways, non-secure or secure.
29

Awasthi, Yogesh, and Ashish Sharma. "Enhancing Approach for Information Security in Hadoop." JITCE (Journal of Information Technology and Computer Engineering) 4, no. 01 (March 30, 2020): 5–9. http://dx.doi.org/10.25077/jitce.4.01.5-9.2020.

Abstract:
Using the Hadoop package, a proposed secure cloud computing system was designed in which Hadoop is used to establish and enhance the security of saving and managing user data. Apache produced Hadoop to overcome the big data problem, often using the MapReduce architecture to process vast amounts of data. Hadoop has no strategy to assure the security and privacy of the files stored in the Hadoop Distributed File System (HDFS). As an encryption scheme for the files stored in HDFS, an asymmetric key cryptosystem is advocated: before saving data in HDFS, the proposed hybrid cipher based on RSA and Rabin encrypts the data. The user of the cloud may upload files in two ways, non-secure or secure.
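
The encrypt-before-upload flow can be pictured with a standard hybrid construction. The sketch below uses the JDK's RSA and AES providers; note the authors' hybrid pairs RSA with Rabin, which the JDK does not ship, so this is an approximation of the flow rather than the paper's exact scheme:

    import java.security.KeyPair;
    import java.security.KeyPairGenerator;

    import javax.crypto.Cipher;
    import javax.crypto.KeyGenerator;
    import javax.crypto.SecretKey;

    public class EncryptBeforeUpload {
        public static void main(String[] args) throws Exception {
            // A fresh AES session key encrypts the file; RSA wraps the AES key
            KeyPairGenerator rsaGen = KeyPairGenerator.getInstance("RSA");
            rsaGen.initialize(2048);
            KeyPair rsa = rsaGen.generateKeyPair();

            KeyGenerator aesGen = KeyGenerator.getInstance("AES");
            aesGen.init(128);
            SecretKey aesKey = aesGen.generateKey();

            Cipher data = Cipher.getInstance("AES");
            data.init(Cipher.ENCRYPT_MODE, aesKey);
            byte[] cipherText = data.doFinal("file bytes bound for HDFS".getBytes());

            Cipher wrap = Cipher.getInstance("RSA");
            wrap.init(Cipher.WRAP_MODE, rsa.getPublic());
            byte[] wrappedKey = wrap.wrap(aesKey);

            // cipherText + wrappedKey would be written to HDFS instead of the plaintext
            System.out.println(cipherText.length + " encrypted bytes, key blob " + wrappedKey.length);
        }
    }

In such a design the wrapped key travels alongside the ciphertext, and only holders of the private key can unwrap it to decrypt the file.
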
30

Uriti, Archana, Surya Prakash Yalla, and Chunduru Anilkumar. "Understand the working of Sqoop and hive in Hadoop." Applied and Computational Engineering 6, no. 1 (June 14, 2023): 312–17. http://dx.doi.org/10.54254/2755-2721/6/20230798.

Abstract:
In past decades, the analysis of structured and consistent data has seen huge success, while analyzing multimedia data in unstructured format remains a challenging task. Big data refers to huge volumes of data that can be processed in a distributed format, and it can be analyzed using the Hadoop tool, which contains Hadoop Distributed File System (HDFS) storage and several inbuilt components. Hadoop manages distributed data placed in clusters. This paper shows the working of Sqoop and Hive in Hadoop. Sqoop (SQL-to-Hadoop) is a Hadoop component designed to efficiently import huge data from traditional databases into HDFS and vice versa. Hive is open-source software for managing large data files stored in HDFS. To show how they work, we take the application Instagram, a very popular social medium, and analyze the data generated from it, which can be mined and utilized using Sqoop and Hive. We show that Sqoop and Hive can produce results efficiently. This paper gives the details of how Sqoop and Hive work in Hadoop.
31

Rahman, Md Armanur, J. Hossen, Venkataseshaiah C, CK Ho, Tan Kim Geok, Aziza Sultana, Jesmeen M. Z. H., and Ferdous Hossain. "A Survey of Machine Learning Techniques for Self-tuning Hadoop Performance." International Journal of Electrical and Computer Engineering (IJECE) 8, no. 3 (June 1, 2018): 1854. http://dx.doi.org/10.11591/ijece.v8i3.pp1854-1862.

Abstract:
The Apache Hadoop framework is an open-source implementation of MapReduce for processing and storing big data. However, getting the best performance from it is a big challenge because of its large number of configuration parameters. In this paper, critical issues of the Hadoop system, big data, and machine learning are highlighted, and an analysis of some machine learning techniques applied so far to improving Hadoop performance is presented. Then, a promising machine learning technique using a deep learning algorithm is proposed for Hadoop system performance improvement.
32

Liang, Yu, and Chao Wu. "A Hadoop-enabled sensor-oriented information system for knowledge discovery about target-of-interest." Facta universitatis - series: Electronics and Energetics 29, no. 3 (2016): 437–50. http://dx.doi.org/10.2298/fuee1603437l.

Abstract:
To obtain a real-time situational awareness about the specific behavior of targets-of-interest using large-scale sensory data-set, this paper presents a generic sensor-oriented information system based on Hadoop Ecosystem, which is denoted as SOIS-Hadoop for simplicity. Robotic heterogeneous sensor nodes bound by wireless sensor network are used to track things-of-interest. Hadoop Ecosystem enables highly scalable and fault-tolerant acquisition, fusion and storage, retrieval, and processing of sensory data. In addition, SOIS-Hadoop employs temporally and spatially dependent mathematical model to formulate the expected behavior of targets-of-interest, based on which the observed behavior of targets can be analyzed and evaluated. Using two real-world sensor-oriented information processing and analysis problems as examples, the mechanism of SOIS-Hadoop is also presented and validated in detail.
33

Tripathi, A. K., S. Agrawal, and R. D. Gupta. "A COMPARATIVE ANALYSIS OF CONVENTIONAL HADOOP WITH PROPOSED CLOUD ENABLED HADOOP FRAMEWORK FOR SPATIAL BIG DATA PROCESSING." ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences IV-5 (November 15, 2018): 425–30. http://dx.doi.org/10.5194/isprs-annals-iv-5-425-2018.

Abstract:
The emergence of new tools and technologies to gather information generates the problem of processing spatial big data. The solution to this problem requires new research, techniques, innovation, and development. Spatial big data is characterized by the five V's: volume, velocity, veracity, variety, and value. Hadoop is the most widely used framework addressing these problems, but it requires high-performance computing resources to store and process such huge data. The emergence of cloud computing has provided on-demand, elastic, scalable, and pay-per-use computing resources that let users develop their own computing environments. The main objective of this paper is to develop a cloud-enabled Hadoop framework which combines cloud technology and high-performance computing resources with the conventional Hadoop framework to support spatial big data solutions. The paper also compares the conventional Hadoop framework with the proposed cloud-enabled Hadoop framework. It is observed that the proposed cloud-enabled Hadoop framework is much more efficient for spatial big data processing than the currently available solutions.
34

Sirisha, N., and K. V.D. Kiran. "Authorization of Data In Hadoop Using Apache Sentry." International Journal of Engineering & Technology 7, no. 3.6 (July 4, 2018): 234. http://dx.doi.org/10.14419/ijet.v7i3.6.14978.

Abstract:
Big data has become popular, as it can provide on-demand, reliable, and flexible services to users, such as storage and processing, and data security has become a major issue in big data. The open-source HDFS software is used to store huge amounts of data with high throughput and fault tolerance, and MapReduce is used for computation and processing. However, security is a significant concern in the Hadoop system: a security model was not designed into it, which became a major drawback of the Hadoop software. In terms of storage, metadata security, sensitive data, and general data security are serious issues in HDFS. With the importance of Hadoop in today's enterprises, there is also an increasing trend toward providing strong security features in enterprises. In recent years, only some levels of security, such as Kerberos, Transparent Data Encryption (TDE), encryption techniques, and hash techniques, have been shown for Hadoop. This paper presents efforts to address Hadoop authorization security issues using Apache Sentry in HDFS.
35

Zheng, Cong, and Li Qing Zhou. "Principle Study and Usage of Hadoop Benchmark." Advanced Materials Research 926-930 (May 2014): 1988–92. http://dx.doi.org/10.4028/www.scientific.net/amr.926-930.1988.

Abstract:
Hadoop is the most popular cloud computing platform with open source code. Its maturity and the completeness of its ecosystem attract a large number of technical personnel and companies, and with the development of a variety of assistive technologies it has become easier and easier to build a Hadoop platform. But judging whether a Hadoop platform has been built successfully, and whether it is robust enough to be put into actual production, requires a series of benchmark tests. This article introduces the principles and usage of Hadoop platform benchmark tools such as Slive, DFSIO, and TeraSort, and then compares their results.
36

Baranowski, Zbigniew, Emil Kleszcz, Prasanth Kothuri, Luca Canali, Riccardo Castellotti, Manuel Martin Marquez, Nuno Guilherme Matos de Barros, Evangelos Motesnitsalis, Piotr Mrowczynski, and Jose Carlos Luna Duran. "Evolution of the Hadoop Platform and Ecosystem for High Energy Physics." EPJ Web of Conferences 214 (2019): 04058. http://dx.doi.org/10.1051/epjconf/201921404058.

Abstract:
The interest in using scalable data processing solutions based on Apache Hadoop ecosystem is constantly growing in the High Energy Physics (HEP) community. This drives the need for increased reliability and availability of the central Hadoop service and underlying infrastructure provided to the community by the CERN IT department. This paper reports on the overall status of the Hadoop platform and related Hadoop and Spark service at CERN, detailing recent enhancements and features introduced in many areas including the service configuration, availability, alerting, monitoring and data protection, in order to meet the new requirements posed by the users’ community.
37

Buthukuri, Bhavani, and Sivaram Rajeyyagari. "Investigation on Processing of Real-Time Streaming Big Data." International Journal of Engineering & Technology 7, no. 3.13 (July 27, 2018): 79. http://dx.doi.org/10.14419/ijet.v7i3.13.16329.

Abstract:
MapReduce is the most widely used model for huge data processing; it is part of the Hadoop big data stack and provides quality and efficient results because of its processing functions. Hadoop is appropriate for batch jobs, but there is growing demand for non-batch workloads such as interactive jobs and high-rate data streams, for which Hadoop is considered inadequate; current approaches attempt to address these new challenges. In this paper, the problem is divided into two stages: real-time processing and stream processing of big data. For every stage, models are considered as alternatives and complements to Hadoop, and for every group we present the working systems and structures. Some experiments are conducted to compare the new approaches against available Hadoop-based solutions.
38

Octavyanti Hakim, Annisa, Heri Wijayanto, and I. Gde Putu Wirarama. "Performa Klaster Hadoop Mapreduce Pada Private Cloud Computing Untuk Komputasi Skyline Query." Jurnal Rekayasa Tropis, Teknologi, dan Inovasi (RETROTEKIN) 1, no. 2 (December 29, 2023): 40–56. http://dx.doi.org/10.30872/retrotekin.v1i2.1110.

Abstract:
To optimize big data processing with Hadoop, cloud computing provides easy-to-use infrastructure, combining private cloud services with Infrastructure as a Service (IaaS). In this study, the authors characterize and evaluate big data execution performance on virtual Hadoop MapReduce cluster instances built on the Universitas Mataram private cloud. Using the Skyline Query algorithm, the cluster was tested with varying data sizes, machine counts, and HDFS block sizes on three types of synthetic data: anti-correlated, correlated, and independent. Execution time was used to compare the results with a Hadoop cluster on physical infrastructure. Tests on the private cloud cluster show an increase in computation time as the data grows from 1.5 million to 12 million records on 4 machines: anti-correlated data (168%), correlated (194%), and independent (126%). A similar trend occurs on the physical Hadoop cluster. In the other scenarios, the private cloud cluster shows better performance as machines are added up to 7, while the physical Hadoop cluster suffers inter-node communication overhead when scaled to 7 machines. Processing 12 million records with a 512 MB HDFS block size on 7 machines is the optimal configuration, yielding the shortest execution time. Based on a statistical t-test on average computation times, it is concluded that the Hadoop cluster on the private cloud, with an Intel(R) Xeon(R) E3-1225 v5 @ 3.30 GHz and 16 GB RAM, outperforms the physical Hadoop cluster, with an Intel Core i5 CPU @ 3.00 GHz and 4 GB RAM, in executing the Skyline application.
39

Yu Dai. "3D Interior Design System Model Based on Computer Virtual Reality Technology." Journal of Electrical Systems 19, no. 4 (January 25, 2024): 84–101. http://dx.doi.org/10.52783/jes.625.

Abstract:
Globally, data volume increases exponentially with the proliferation of cloud computing. MapReduce has emerged as a prominent solution for handling this unprecedented growth efficiently, as it processes both structured and unstructured data. The dynamic landscape of virtual reality has seen a significant shift towards technology-driven approaches, with data analytics and personalized learning becoming increasingly important. This paper introduces an innovative framework that leverages the power of Hadoop and MapReduce to elevate 3D virtual reality experiences within diverse VR cloud settings. It presents the development of an efficient Cache-Based MapReduce Framework (CMF) in which cache algorithms are effectively used to process queries on large-scale cloud-based data. The Hadoop system processes data in single-node Hadoop clusters (pseudo-distributed) as well as heterogeneous Hadoop clusters (fully distributed nodes) within Amazon Web Services (AWS). The experimental analysis is evaluated on the SmallGutenberg and LargeGutenberg databases. The developed model achieves an average reduction in jobs of 48.01% with a reduction in execution time of 51.99%. The reductions in execution time for the 7-node, 9-node, 15-node, and 20-node CMF are measured as 49.91%, 51.38%, 54.71%, and 45.29% respectively.
40

Xu, Zhiwei, Bo Yan, and Yongqiang Zou. "Beyond Hadoop." International Journal of Cloud Applications and Computing 1, no. 1 (January 2011): 45–61. http://dx.doi.org/10.4018/ijcac.2011010104.

Abstract:
As a main subfield of cloud computing applications, internet services require large-scale data computing. Their workloads can be divided into two classes: customer-facing, query-processing interactive tasks that serve hundreds of millions of users within a short response time, and backend data analysis batch tasks that involve petabytes of data. Hadoop, an open source software suite, is used by many internet services as the main data computing platform, and by academia as a research platform and an optimization target. This paper presents five research directions for optimizing Hadoop: improving performance, utilization, power efficiency, availability, and different consistency constraints. The survey covers both backend analysis and customer-facing workloads. A total of 15 innovative techniques and systems are analyzed and compared, focusing on main research issues, innovative techniques, and optimized results.
41

Jankatti, Santosh, Raghavendra B. K., Raghavendra S., and Meenakshi Meenakshi. "Performance evaluation of Map-reduce jar pig hive and spark with machine learning using big data." International Journal of Electrical and Computer Engineering (IJECE) 10, no. 4 (August 1, 2020): 3811. http://dx.doi.org/10.11591/ijece.v10i4.pp3811-3818.

Abstract:
Big data is one of the biggest challenges, as we need huge processing power and good algorithms to make decisions. We need a Hadoop environment with Pig, Hive, machine learning, and other Hadoop ecosystem components. The data comes from industries, from the many devices and sensors around us, and from social media sites. According to McKinsey, there will be a shortage of 15,000,000 big data professionals by the end of 2020. There are many technologies to solve the problem of big data storage and processing, such as Apache Hadoop, Apache Spark, Apache Kafka, and many more. Here we analyse the processing speed for 4 GB of data on CloudxLab using Hadoop MapReduce with varying numbers of mappers and reducers, with Pig scripts and Hive queries, and in a Spark environment along with machine learning technology. From the results we can say that machine learning with Hadoop, along with Spark, enhances processing performance; Spark performs better than Hadoop MapReduce, Pig, and Hive; and Spark with Hive and machine learning gives the best performance compared with Pig, Hive, and the Hadoop MapReduce jar.
42

Aji, Ablimit, Fusheng Wang, Hoang Vo, Rubao Lee, Qiaoling Liu, Xiaodong Zhang, and Joel Saltz. "Hadoop GIS." Proceedings of the VLDB Endowment 6, no. 11 (August 27, 2013): 1009–20. http://dx.doi.org/10.14778/2536222.2536227.

43

Liroz-Gistau, Miguel, Reza Akbarinia, and Patrick Valduriez. "FP-Hadoop." Proceedings of the VLDB Endowment 8, no. 12 (August 2015): 1856–59. http://dx.doi.org/10.14778/2824032.2824085.

44

Mone, Gregory. "Beyond Hadoop." Communications of the ACM 56, no. 1 (January 2013): 22–24. http://dx.doi.org/10.1145/2398356.2398364.

45

Geetha, J., Uday Bhaskar N, and Chenna Reddy P. "An improved hadoop load rebalancer." International Journal of Engineering & Technology 7, no. 2.27 (August 6, 2018): 109. http://dx.doi.org/10.14419/ijet.v7i2.27.11775.

Abstract:
Hadoop has taken an important place in the market as a result of the quick growth of data. Load rebalancing in Hadoop is an area of major concern due to the unpredictable nature of tasks, new nodes added to the cluster, and varying node computing capacities. An efficient load rebalancer can help to improve performance and reduce computation time. Load rebalancers and schedulers are used interchangeably in many cases. The main idea of this paper is to explore how load balancers and schedulers work in native Hadoop; it also includes insights from some of the works that identify and address problems around schedulers and rebalancers. In this paper, an Improved Hadoop Load Rebalancer adopts a strategy of moving a task to the node which has a replica, is faster, and is topologically closer, which reduces network congestion and the execution time of Hadoop.
46

E. Laxmi Lydia, Dr, and M. Srinivasa Rao. "Applying compression algorithms on hadoop cluster implementing through apache tez and hadoop mapreduce." International Journal of Engineering & Technology 7, no. 2.26 (May 7, 2018): 80. http://dx.doi.org/10.14419/ijet.v7i2.26.12539.

Abstract:
The latest and most famous subject across the cloud research area is big data; its main characteristics are volume, velocity, and variety, which are difficult to manage through traditional software and methodologies. Data arriving from the various domains of big data is handled through Hadoop, an open framework developed to provide solutions. Big data analytics is done through the Hadoop MapReduce framework, the key engine of a Hadoop cluster, which is extensively used these days and relies on a batch processing system. Apache developed an engine named Tez, which supports an interactive query system and does not write temporary data into the Hadoop Distributed File System (HDFS). The paper focuses on a performance comparison of MapReduce and Tez, examined through the compression of input files and map output files. To compare the two engines we used the Bzip2 compression algorithm for the input files and Snappy for the map output files. The WordCount and TeraSort benchmarks are used in our experiments. For the WordCount benchmark, the results show that the Tez engine has a better execution time than the Hadoop MapReduce engine for both compressed and non-compressed data, reducing execution time by nearly 39% compared to the Hadoop MapReduce engine. Conversely, for the TeraSort benchmark, the Tez engine has a higher execution time than the Hadoop MapReduce engine.
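
The map-output compression toggled in these experiments corresponds to two stock configuration keys. A minimal sketch enabling Snappy for the intermediate shuffle data (assumes the native Snappy library is available on the cluster nodes):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.SnappyCodec;

    public class ShuffleCompression {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Compress the intermediate map output that flows through the shuffle
            conf.setBoolean("mapreduce.map.output.compress", true);
            conf.setClass("mapreduce.map.output.compress.codec",
                    SnappyCodec.class, CompressionCodec.class);
            System.out.println(conf.get("mapreduce.map.output.compress.codec"));
        }
    }
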
47

Zhang, Yun, Yu Xia Yao, and Ji Yang. "Build a Fully Distributed Hadoop Cluster Based on VM Scene." Advanced Materials Research 1049-1050 (October 2014): 1962–65. http://dx.doi.org/10.4028/www.scientific.net/amr.1049-1050.1962.

Abstract:
Hadoop learners are limited by practical problems such as a lack of hardware devices, so this paper shows how to build a multi-machine distributed Hadoop cluster in a VMWare virtual environment. Taking three hosts as an example, the research covers the design of the nodes and network topology, the installation and configuration of the virtual machines, passwordless SSH login, and the installation, configuration, and management of Hadoop.
48

Tyagi, Adhishtha, and Sonia Sharma. "A Framework of Security and Performance Enhancement for Hadoop." International Journal of Advanced Research in Computer Science and Software Engineering 7, no. 7 (July 30, 2017): 437. http://dx.doi.org/10.23956/ijarcsse/v7i6/0171.

Abstract:
The Hadoop framework has emerged as the most effective and widely adopted framework for big data processing, and the MapReduce programming model is used for processing as well as generating large data sets. Data security has become an important issue as far as storage is concerned. By default there is no security mechanism in Hadoop, yet it is the first choice of business analysts and industrialists for storing and managing data, so there is a need to introduce security solutions to Hadoop in order to secure important data in the Hadoop environment. We implemented and evaluated a Dynamic Task Splitting Scheduler (DTSS), which explores the tradeoffs between fairness and performance by splitting tasks dynamically before processing in Hadoop, along with AES-MR (Advanced Encryption Standard based encryption using MapReduce) encryption in the MapReduce paradigm. This paper is useful for beginners and researchers in understanding DTSS scheduling along with security.
49

Khan, Mukhtaj, Zhengwen Huang, Maozhen Li, Gareth A. Taylor, Phillip M. Ashton, and Mushtaq Khan. "Optimizing Hadoop Performance for Big Data Analytics in Smart Grid." Mathematical Problems in Engineering 2017 (2017): 1–11. http://dx.doi.org/10.1155/2017/2198262.

Abstract:
The rapid deployment of Phasor Measurement Units (PMUs) in power systems globally is leading to Big Data challenges. New high performance computing techniques are now required to process an ever-increasing volume of data from PMUs. To that end the Hadoop framework, an open source implementation of the MapReduce computing model, is gaining momentum for Big Data analytics in smart grid applications. However, Hadoop has over 190 configuration parameters, which can have a significant impact on the performance of the Hadoop framework. This paper presents an Enhanced Parallel Detrended Fluctuation Analysis (EPDFA) algorithm for scalable analytics on massive volumes of PMU data. The novel EPDFA algorithm builds on an enhanced Hadoop platform whose configuration parameters are optimized by Gene Expression Programming. Experimental results show that the EPDFA is 29 times faster than the sequential DFA in processing PMU data and 1.87 times faster than a parallel DFA, which utilizes the default Hadoop configuration settings.
50

Kennady, R., et al. "Increased Task Execution with a Bandwidth-Aware Hadoop Scheduling Approach." International Journal on Recent and Innovation Trends in Computing and Communication 11, no. 2 (February 25, 2023): 189–93. http://dx.doi.org/10.17762/ijritcc.v11i2.9830.

Abstract:
This research presents a novel bandwidth-aware Hadoop scheduling method that addresses the challenge of task scheduling in Hadoop clusters while considering the real-time network conditions. The proposed method involves the establishment of a job time completion model and a mathematical model for a Hadoop scheduling system. Furthermore, it transforms the Hadoop task scheduling problem into an optimization problem to find the task scheduling method that minimizes job completion time. By leveraging Software-Defined Networking (SDN) capabilities, a time slot-based network bandwidth allocation mechanism is introduced to allocate bandwidth fairly across network links. The proposed method also takes into account task locality and network bandwidth availability when allocating computational nodes for individual tasks. Through this approach, the limitations of existing methods, which fail to simultaneously consider global task scheduling and actual network bandwidth availability, are overcome. Experimental evaluations demonstrate the effectiveness of the proposed method in enhancing the performance of Hadoop task scheduling.