
Journal articles on the topic 'HDFS (Hadoop Distributed File System)'



Consult the top 50 journal articles for your research on the topic 'HDFS (Hadoop Distributed File System).'



Browse journal articles on a wide variety of disciplines and organise your bibliography correctly.

1

Gupta, Manish Kumar, and Rajendra Kumar Dwivedi. "Blockchain Enabled Hadoop Distributed File System Framework for Secure and Reliable Traceability." ADCAIJ: Advances in Distributed Computing and Artificial Intelligence Journal 12 (December 29, 2023): e31478. http://dx.doi.org/10.14201/adcaij.31478.

Full text
Abstract:
Hadoop Distributed File System (HDFS) is a distributed file system that allows large amounts of data to be stored and processed across multiple servers in a Hadoop cluster. HDFS also provides high throughput for data access. HDFS enables the management of vast amounts of data using commodity hardware. However, security vulnerabilities in HDFS can be manipulated for malicious purposes. This emphasizes the significance of establishing strong security measures to facilitate file sharing within Hadoop and implementing a reliable mechanism for verifying the legitimacy of shared files. The objective of this paper is to enhance the security of HDFS by utilizing a blockchain-based technique. The proposed model uses the Hyperledger Fabric platform at the enterprise level to leverage metadata of files, thereby establishing dependable security and traceability of data within HDFS. The analysis of results indicates that the proposed model incurs a slightly higher overhead compared to HDFS and requires more storage space. However, this is considered an acceptable trade-off for the improved security.
APA, Harvard, Vancouver, ISO, and other styles
2

Mohammad, Bahjat Al-Masadeh, Sanusi Azmi Mohd, and Sakinah Syed Ahmad Sharifah. "Tiny datablock in saving Hadoop distributed file system wasted memory." International Journal of Electrical and Computer Engineering (IJECE) 13, no. 2 (2023): 1757–72. https://doi.org/10.11591/ijece.v13i2.pp1757-1772.

Full text
Abstract:
Hadoop Distributed File System (HDFS) is the file system that Hadoop uses to store all incoming data. Since its introduction, HDFS has consumed a large amount of memory to serve even an ordinary dataset. The current file-saving mechanism in HDFS stores only one file per datablock, so a file of just 5 MB occupies an entire datablock and leaves the rest of its capacity unavailable to other incoming files, which is a considerable waste of memory when serving a normal-sized dataset. This paper proposes a method called tiny datablock HDFS (TD-HDFS) to increase the usability of HDFS memory and improve file-hosting capability by reducing the datablock size to a minimum capacity and then merging the related datablocks into one master datablock. The master datablock consists of tiny virtual datablocks that hold related small files together and exploit the full capacity of the master datablock. The result of this study is a running HDFS with a minimal amount of wasted memory and the same read/write performance. The results were examined through a comparison between standard HDFS file hosting and the proposed solution.
APA, Harvard, Vancouver, ISO, and other styles
3

Kapil, Gayatri, Alka Agrawal, Abdulaziz Attaallah, Abdullah Algarni, Rajeev Kumar, and Raees Ahmad Khan. "Attribute based honey encryption algorithm for securing big data: Hadoop distributed file system perspective." PeerJ Computer Science 6 (February 17, 2020): e259. http://dx.doi.org/10.7717/peerj-cs.259.

Full text
Abstract:
Hadoop has become a promising platform to reliably process and store big data. It provides flexible and low cost services to huge data through Hadoop Distributed File System (HDFS) storage. Unfortunately, absence of any inherent security mechanism in Hadoop increases the possibility of malicious attacks on the data processed or stored through Hadoop. In this scenario, securing the data stored in HDFS becomes a challenging task. Hence, researchers and practitioners have intensified their efforts in working on mechanisms that would protect user’s information collated in HDFS. This has led to the development of numerous encryption-decryption algorithms but their performance decreases as the file size increases. In the present study, the authors have enlisted a methodology to solve the issue of data security in Hadoop storage. The authors have integrated Attribute Based Encryption with the honey encryption on Hadoop, i.e., Attribute Based Honey Encryption (ABHE). This approach works on files that are encoded inside the HDFS and decoded inside the Mapper. In addition, the authors have evaluated the proposed ABHE algorithm by performing encryption-decryption on different sizes of files and have compared the same with existing ones including AES and AES with OTP algorithms. The ABHE algorithm shows considerable improvement in performance during the encryption-decryption of files.
APA, Harvard, Vancouver, ISO, and other styles
4

Awasthi, Yogesh. "Enhancing approach for information security in Hadoop." Ukrainian Journal of Educational Studies and Information Technology 8, no. 1 (2020): 39–49. http://dx.doi.org/10.32919/uesit.2020.01.04.

Full text
Abstract:
Hadoop, one of the ongoing trends in technology used as a framework for distributed storage, is an open-source distributed computing framework implemented in Java that comprises two modules: MapReduce and the Hadoop Distributed File System (HDFS). MapReduce is intended for processing enormous data sets; it enables users to employ thousands of commodity machines in parallel effectively, and by simply defining map and reduce functions the user can process huge amounts of data. HDFS stores data on distributed clusters of machines. Hadoop is normally used in large clusters or public cloud services, for example at Yahoo!, Facebook, Twitter, and Amazon. The scalability of Hadoop has been demonstrated by the popularity of these applications, yet it is designed without security for the data it stores. Using the Hadoop package, a secure cloud computing system is proposed in which Hadoop establishes and enhances security for saving and managing user data. Apache produced Hadoop to address this big data problem, typically using the MapReduce architecture to process vast amounts of data. Hadoop has no strategy to assure the security and privacy of the files stored in the Hadoop Distributed File System (HDFS). As an encryption scheme for the files stored in HDFS, an asymmetric-key cryptosystem is advocated: before data are saved in HDFS, the proposed hybrid cipher based on RSA and Rabin encrypts them. Cloud users may upload files in two ways, non-secure or secure.
APA, Harvard, Vancouver, ISO, and other styles
5

Srikanth, K., P. Venkateswarlu, and Ashok Suragala. "A FUNDAMENTAL CONCEPT OF MAPREDUCE WITH MASSIVE FILES DATASET IN BIG DATA USING HADOOP PSEUDO-DISTRIBUTION MODE." Global Journal of Engineering Science and Research Management 4, no. 5 (2017): 58–62. https://doi.org/10.5281/zenodo.801301.

Full text
Abstract:
The Hadoop Distributed File System (HDFS) and the MapReduce programming model are used for the storage and retrieval of big data. Big data can be any structured collection that exceeds the capabilities of conventional data management methods. Terabyte-sized files can easily be stored on HDFS and analyzed with MapReduce. This paper provides an introduction to Hadoop HDFS and MapReduce for storing large numbers of files and retrieving information from them. We present our experimental work on Hadoop, applying a number of files as input to the system and analyzing the system's performance. We studied the number of bytes written and read by the system and by MapReduce, and analyzed the behavior of the map and reduce methods as the number of files, and the number of bytes written and read by these tasks, increase.
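For illustration only (this sketch is not taken from the paper above): the canonical Hadoop WordCount mapper and reducer in Java look roughly like the following; the class names are placeholders and the driver class that configures the Job is omitted.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

  // Emits (word, 1) for every token found in the input split.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Sums the per-word counts produced by all mappers.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }
}
```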
APA, Harvard, Vancouver, ISO, and other styles
6

Malik, Vandana. "Hadoop Distributed File System (HDFS) with Its Architecture." International Journal for Research in Applied Science and Engineering Technology 13, no. 5 (2025): 6031–34. https://doi.org/10.22214/ijraset.2025.71584.

Full text
Abstract:
The exponential growth of big data has catalyzed the development of robust, scalable, and fault-tolerant storage systems. The Hadoop Distributed File System (HDFS) stands as a key pillar in the Hadoop ecosystem, providing a distributed, resilient storage infrastructure for managing petabytes of data. This paper investigates the core architecture of HDFS, including its design principles, components, and operational workflow. It also analyzes practical implementations, advantages, limitations, and future trends. Through case studies and real-world applications, the paper illustrates how HDFS supports the ever-growing demands of modern data-driven enterprises.
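As a minimal illustration of the client-facing side of this architecture (not code from the paper), the sketch below writes and then reads a file through Hadoop's FileSystem API; the NameNode URI and path are hypothetical, and in practice the address would come from fs.defaultFS in core-site.xml.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsReadWrite {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Hypothetical NameNode address.
    FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), conf);
    Path path = new Path("/demo/hello.txt");

    // Write: the client asks the NameNode where to place blocks, then streams to DataNodes.
    try (FSDataOutputStream out = fs.create(path, true)) {
      out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
    }

    // Read: block replicas are fetched from the DataNodes that hold them.
    try (FSDataInputStream in = fs.open(path)) {
      IOUtils.copyBytes(in, System.out, 4096, false);
    }
  }
}
```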
APA, Harvard, Vancouver, ISO, and other styles
7

Sudirman, Ahmad, Irawan, and Zawiyah Saharuna. "PENGEMBANGAN SISTEM BIG DATA: RANCANG BANGUN INFRASTRUKTUR DENGAN FRAMEWORK HADOOP." Journal of Informatics and Computer Engineering Research 1, no. 1 (2024): 25–32. https://doi.org/10.31963/jicer.v1i1.4919.

Full text
Abstract:
Hadoop is a distributed data storage platform that provides a parallel processing framework. HDFS (Hadoop Distributed File System) and MapReduce are two essential parts of Hadoop: HDFS is a distributed storage system built in Java, while MapReduce is a programming model for processing large data in a parallel, distributed manner. The research focused on testing data transfer speed on HDFS using three types of data (video, ISO image, and text) with total sizes of 512 MB, 1 GB, and 2 GB. Testing was carried out by loading data into HDFS with the Hadoop command while varying the block size among 128 MB, 256 MB, and 384 MB. Hadoop performed faster with the 384 MB block size than with 128 MB or 256 MB, because the data are divided into larger 384 MB blocks, so fewer blocks need to be mapped than with the 128 MB and 256 MB settings.
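The block-size comparison described above can be reproduced in more than one way; the hedged Java sketch below uses the standard FileSystem.create overload that accepts a per-file block size (the 384 MB value, replication factor, and paths are illustrative, and a dfs.blocksize override on the hadoop put command line would achieve a similar effect).

```java
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class BlockSizeExperiment {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    long blockSize = 384L * 1024 * 1024; // one of the tested block sizes (384 MB)
    short replication = 3;               // common HDFS default
    int bufferSize = 4096;

    Path src = new Path("file:///data/sample.iso"); // hypothetical local test file
    Path dst = new Path("/experiments/sample.iso");

    // The block size is fixed when the file is created, so copy the bytes explicitly.
    try (InputStream in = src.getFileSystem(conf).open(src);
         OutputStream out = fs.create(dst, true, bufferSize, replication, blockSize)) {
      IOUtils.copyBytes(in, out, bufferSize, false);
    }
  }
}
```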
APA, Harvard, Vancouver, ISO, and other styles
8

Awasthi, Yogesh, and Ashish Sharma. "Enhancing Approach for Information Security in Hadoop." JITCE (Journal of Information Technology and Computer Engineering) 4, no. 01 (2020): 5–9. http://dx.doi.org/10.25077/jitce.4.01.5-9.2020.

Full text
Abstract:
Using the Hadoop package, a secure cloud computing system was designed in which Hadoop establishes and enhances security for saving and managing user data. Apache produced Hadoop to address the big data problem, typically using the MapReduce architecture to process vast amounts of data. Hadoop has no strategy to assure the security and privacy of the files stored in the Hadoop Distributed File System (HDFS). As an encryption scheme for the files stored in HDFS, an asymmetric-key cryptosystem is advocated: before data are saved in HDFS, the proposed hybrid cipher based on RSA and Rabin encrypts them. Cloud users may upload files in two ways, non-secure or secure.
APA, Harvard, Vancouver, ISO, and other styles
9

Han, Yong Qi, Yun Zhang, and Shui Yu. "Research of Cloud Storage Based on Hadoop Distributed File System." Applied Mechanics and Materials 513-517 (February 2014): 2472–75. http://dx.doi.org/10.4028/www.scientific.net/amm.513-517.2472.

Full text
Abstract:
This paper discusses the application of cloud computing technology to store large amounts of agricultural remote-training video and other multimedia data. Four computers are used to build a Hadoop cloud platform, with a focus on the Hadoop Distributed File System (HDFS) principles and file storage, in order to achieve massive agricultural multimedia data storage.
APA, Harvard, Vancouver, ISO, and other styles
10

Zine-Dine, Fayçal, Sara Alcabnani, Ahmed Azouaoui, and Jamal El Kafi. "Enhance big data security based on HDFS using the hybrid approach." Indonesian Journal of Electrical Engineering and Computer Science 38, no. 2 (2025): 1256–64. https://doi.org/10.11591/ijeecs.v38.i2.pp1256-1264.

Full text
Abstract:
Hadoop has emerged as a prominent open-source framework for the storage, management, and processing of extensive big data through its distributed file system, known as Hadoop distributed file system (HDFS). This widespread adoption can be attributed to its capacity to provide reliable, scalable, and cost-effective solutions for managing large datasets across diverse sectors, including finance, healthcare, and social media. Nevertheless, as the significance and scale of big data applications continue to expand, the challenge of ensuring the security and safeguarding of sensitive data within Hadoop has become increasingly critical. In this study, the authors introduce a novel strategy aimed at bolstering data security within the Hadoop storage framework. This approach specifically employs a hybrid encryption technique that leverages the advantages of both advanced encryption standard (AES) and data encryption standard (DES) algorithms, whereby files are encrypted in HDFS and subsequently decrypted during the map task. To assess the efficacy of this method, the authors performed experiments with various file sizes, benchmarking the outcomes against other established security measures.
APA, Harvard, Vancouver, ISO, and other styles
11

Zine-Dine, Fayçal, Sara Alcabnani, Ahmed Azouaoui, and Jamal El Kafi. "Enhance big data security based on HDFS using the hybrid approach." Indonesian Journal of Electrical Engineering and Computer Science 38, no. 2 (2025): 1256–64. https://doi.org/10.11591/ijeecs.v38.i2.pp1256-1264.

Full text
Abstract:
Hadoop has emerged as a prominent open-source framework for the storage, management, and processing of extensive big data through its distributed file system, known as Hadoop distributed file system (HDFS). This widespread adoption can be attributed to its capacity to provide reliable, scalable, and cost-effective solutions for managing large datasets across diverse sectors, including finance, healthcare, and social media. Nevertheless, as the significance and scale of big data applications continue to expand, the challenge of ensuring the security and safeguarding of sensitive data within Hadoop has become increasingly critical. In this study, the authors introduce a novel strategy aimed at bolstering data security within the Hadoop storage framework. This approach specifically employs a hybrid encryption technique that leverages the advantages of both advanced encryption standard (AES) and data encryption standard (DES) algorithms, whereby files are encrypted in HDFS and subsequently decrypted during the map task. To assess the efficacy of this method, the authors performed experiments with various file sizes, benchmarking the outcomes against other established security measures.
APA, Harvard, Vancouver, ISO, and other styles
12

Elshayeb, M., and Leelavathi Rajamanickam. "HDFS Security Approaches and Visualization Tracking." Journal of Engineering & Technological Advances 3, no. 1 (2018): 49–60. http://dx.doi.org/10.35934/segi.v3i1.49.

Full text
Abstract:
Big Data refers to large-scale information management and analysis technologies that exceed the capability of traditional data processing technologies. In order to analyse complex data and identify patterns, it is very important to securely store, manage, and share large amounts of complex data. In recent years, databases have grown across various forms of data (text, images, and videos), in huge volumes and at high velocity, and data-intensive services that use the internet and depend on big data have come to the leading edge. Apache's Hadoop Distributed File System (HDFS) has emerged as an outstanding software component for cloud computing, combined with integrated pieces such as MapReduce. Hadoop is an open-source implementation of Google's MapReduce that provides a distributed file system and presents programmers with the map and reduce abstractions. This research surveys security approaches for the Hadoop Distributed File System and identifies the best security solution; it also aims to help businesses through big data visualization, which supports better data analysis. In today's data-centric world, big-data processing and analytics have become critical to most enterprise and government applications.
APA, Harvard, Vancouver, ISO, and other styles
13

Wu, Xing, and Mengqi Pei. "Image File Storage System Resembling Human Memory." International Journal of Software Science and Computational Intelligence 7, no. 2 (2015): 70–84. http://dx.doi.org/10.4018/ijssci.2015040104.

Full text
Abstract:
The Big Data era is characterized by the explosive increase of image files on the Internet, and massive numbers of image files bring great challenges to storage. What is required is not only storage efficiency for massive image files but also accuracy and robustness in their management and retrieval. To meet these requirements, a distributed image file storage system based on cognitive theory is proposed. Mirroring how the human brain functions, humans can associate image files with thousands of distinct object and action categories and store these files in a sorted manner. The authors therefore propose to store image files sorted by visual category, based on human cognition, to resemble human memory. The experimental results demonstrate that the proposed distributed image file system (DIFS) based on cognition performs better than the Hadoop Distributed File System (HDFS) and FastDFS.
APA, Harvard, Vancouver, ISO, and other styles
14

Lu, An Sheng, Jian Jiang Cai, Wei Jin, and Lu Wang. "Research and Practice of Cloud Computing Based on Hadoop." Applied Mechanics and Materials 644-650 (September 2014): 3387–89. http://dx.doi.org/10.4028/www.scientific.net/amm.644-650.3387.

Full text
Abstract:
Hadoop is a computing platform for the distributed, parallel processing of massive data and is currently the most widely used cloud computing platform. This paper analyses and studies the Hadoop Distributed File System (HDFS), the MapReduce computation model on the Hadoop platform, and the Hadoop-based cloud computing model. It also introduces the process of building a Hadoop cloud computing platform and its operating environment, and proposes an implementation.
APA, Harvard, Vancouver, ISO, and other styles
15

Zhang, Bo, Ya Yao Zuo, and Zu Chuan Zhang. "Research and Improvement of the Hot Small File Storage Performance under HDFS." Advanced Materials Research 756-759 (September 2013): 1450–54. http://dx.doi.org/10.4028/www.scientific.net/amr.756-759.1450.

Full text
Abstract:
In order to deal with the large number of small files and hotspot data in the Hadoop Distributed File System (HDFS), and building on existing proposals, this paper proposes a new hotspot data processing model. The model proposes changing the block size, introducing an efficient indexing mechanism, improving the dynamic replica management strategy, and designing a new HDFS architecture to save space, speed up system processing, and enhance security.
APA, Harvard, Vancouver, ISO, and other styles
16

Hanisah Kamaruzaman, Siti, Wan Nor Shuhadah Wan Nik, Mohamad Afendee Mohamed, and Zarina Mohamad. "Design and Implementation of Data-at-Rest Encryption for Hadoop." International Journal of Engineering & Technology 7, no. 2.15 (2018): 54. http://dx.doi.org/10.14419/ijet.v7i2.15.11212.

Full text
Abstract:
Security in cloud computing is paramount in order to ensure a high-quality Service Level Agreement (SLA) for cloud computing customers. This issue is more apparent when very large amounts of data are involved in this emerging computing environment. Hadoop is an open-source software framework that supports large data-set storage and processing in a distributed computing environment and is a well-known implementation of MapReduce. MapReduce is a common programming model for processing and handling large amounts of data, specifically in big data analysis. Further, the Hadoop Distributed File System (HDFS) is a distributed, scalable, and portable file system written in Java for the Hadoop framework. However, the main problem is that data at rest are not secure, as intruders can steal or alter the data stored in this computing environment. Therefore, the AES encryption algorithm has been implemented in HDFS to ensure the security of data stored in HDFS. It is shown that the implementation of the AES encryption algorithm is capable of securing data stored in HDFS to some extent.
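The paper's own implementation is not reproduced here; the sketch below is only a generic illustration of client-side AES encryption with the standard javax.crypto API before the ciphertext is written to HDFS. The target path is hypothetical, and a real deployment would obtain the key from a key management service rather than generating it in place.

```java
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;

import javax.crypto.Cipher;
import javax.crypto.CipherOutputStream;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class EncryptThenStore {
  public static void main(String[] args) throws Exception {
    // Generate a 128-bit AES key (illustrative only).
    KeyGenerator keyGen = KeyGenerator.getInstance("AES");
    keyGen.init(128);
    SecretKey key = keyGen.generateKey();

    byte[] iv = new byte[16];
    new SecureRandom().nextBytes(iv);

    Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
    cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));

    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path path = new Path("/secure/report.enc"); // hypothetical target path

    FSDataOutputStream out = fs.create(path, true);
    out.write(iv); // store the IV in clear alongside the ciphertext
    try (CipherOutputStream cout = new CipherOutputStream(out, cipher)) {
      // Only ciphertext reaches the DataNodes.
      cout.write("sensitive record".getBytes(StandardCharsets.UTF_8));
    } // closing the cipher stream flushes the final block and closes the HDFS stream
  }
}
```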
APA, Harvard, Vancouver, ISO, and other styles
17

Anjum, Ameena, and Shivleela Patil. "HDFS Erasure Coded Information Repository System for Hadoop Clusters." International Journal of Trend in Scientific Research and Development 2, no. 5 (2018): 1957–60. https://doi.org/10.31142/ijtsrd18206.

Full text
Abstract:
Existing disk-based archival storage systems are inadequate for Hadoop clusters because they are oblivious to data replicas and to the MapReduce programming model. To handle this issue, an erasure-coded data archival system is developed for Hadoop clusters, in which erasure codes are used to archive data replicas in the Hadoop Distributed File System (HDFS). Two archival schemes, HDFS-Grouping and HDFS-Pipeline, are introduced to speed up the data archival process. HDFS-Grouping, a MapReduce-based data archiving scheme, keeps each mapper's intermediate output key-value pairs in a local key-value store and merges all intermediate key-value pairs sharing the same key into a single pair, which is then shuffled to reducers to produce the final parity blocks. HDFS-Pipeline forms a data archival pipeline across multiple data nodes in a Hadoop cluster: it delivers the merged single key-value pair to the next node's local key-value store, and the last node in the pipeline is responsible for outputting the parity blocks. The system was implemented in a real-world Hadoop cluster. The experimental results show that HDFS-Grouping and HDFS-Pipeline speed up the baseline's shuffle and reduce phases by factors of 10 and 5, respectively. When the block size is larger than 32 MB, the system improves the performance of HDFS-RAID and HDFS-EC by approximately 31.8 and 15.7 percent, respectively. Anjum, Ameena, and Shivleela Patil. "HDFS: Erasure-Coded Information Repository System for Hadoop Clusters." International Journal of Trend in Scientific Research and Development (IJTSRD), ISSN 2456-6470, Volume 2, Issue 5, August 2018. URL: https://www.ijtsrd.com/papers/ijtsrd18206.pdf
APA, Harvard, Vancouver, ISO, and other styles
18

Uriti, Archana, Surya Prakash Yalla, and Chunduru Anilkumar. "Understand the working of Sqoop and hive in Hadoop." Applied and Computational Engineering 6, no. 1 (2023): 312–17. http://dx.doi.org/10.54254/2755-2721/6/20230798.

Full text
Abstract:
In past decades, structured and consistent data analysis has seen huge success, but analysing multimedia data in unstructured formats remains a challenging task. Here, big data refers to huge volumes of data that can be processed in a distributed manner. Big data can be analysed using the Hadoop tool, which contains the Hadoop Distributed File System (HDFS) for storage along with several built-in components. Hadoop manages distributed data organized across the cluster. This paper shows how Sqoop and Hive work within Hadoop. Sqoop (SQL-to-Hadoop) is a Hadoop component designed to efficiently import huge volumes of data from traditional databases into HDFS and vice versa. Hive is open-source software for managing large data files stored in HDFS. To demonstrate how they work, the paper takes Instagram, a highly popular social media application, and analyses data generated from it that can be mined and utilized using Sqoop and Hive, showing that Sqoop and Hive can deliver results efficiently. The paper details how Sqoop and Hive operate in Hadoop.
APA, Harvard, Vancouver, ISO, and other styles
19

Goyal, Shubh. "Using HDFS to Load, Search, and Retrieve Data from Local Data Nodes." International Journal for Research in Applied Science and Engineering Technology 9, no. 11 (2021): 656–59. http://dx.doi.org/10.22214/ijraset.2021.38877.

Full text
Abstract:
By utilizing the Hadoop environment, data may be loaded and searched from local data nodes. Because a dataset's capacity may be vast, loading and finding data with a query is often difficult. We suggest a method for dealing with data in local nodes that does not overlap with data acquired by script. The query's major purpose is to store information in a distributed environment and find it quickly. We define the script so as to eliminate duplicate data redundancy when searching and loading data dynamically. In addition, the Hadoop file system is available in a distributed environment. Keywords: HDFS; Hadoop distributed file system; replica; local; distributed; capacity; SQL; redundancy
APA, Harvard, Vancouver, ISO, and other styles
20

ABDALWAHID, Shadan Mohammed Jihad, Raghad Zuhair YOUSIF, and Shahab Wahhab KAREEM. "ENHANCING APPROACH USING HYBRID PAILLER AND RSA FOR INFORMATION SECURITY IN BIGDATA." Applied Computer Science 15, no. 4 (2019): 63–74. http://dx.doi.org/10.35784/acs-2019-30.

Full text
Abstract:
The amount of data processed and stored in the cloud is growing dramatically. Traditional storage devices, at both the hardware and software levels, cannot meet the requirements of the cloud. This fact motivates the need for a platform that can handle this problem. Hadoop is a widely deployed platform proposed to overcome this big data problem, which often uses the MapReduce architecture to process vast amounts of data in the cloud. Hadoop has no strategy to assure the safety and confidentiality of the files saved in the Hadoop Distributed File System (HDFS). In the cloud, the protection of sensitive data is a critical issue in which data encryption schemes play a vital role. This research proposes a hybrid system between two well-known asymmetric-key cryptosystems (RSA and Paillier) to encrypt the files stored in HDFS; before data are saved in HDFS, the proposed cryptosystem is used to encrypt them. Each user of the cloud may upload files in two ways, non-secure or secure. The hybrid system shows higher computational complexity and less latency in comparison to the RSA cryptosystem alone.
APA, Harvard, Vancouver, ISO, and other styles
21

Achandair, O., S. Bourekkadi, E. Elmahouti, S. Khoulji, and M. L. Kerkeb. "Solution for the future: small file management by optimizing Hadoop." International Journal of Engineering & Technology 7, no. 2.6 (2018): 221. http://dx.doi.org/10.14419/ijet.v7i2.6.10773.

Full text
Abstract:
The Hadoop Distributed File System (HDFS) is designed to reliably store very large files across the machines of a large cluster. It is one of the most used distributed file systems and offers high availability and scalability on low-cost hardware, and every Hadoop framework has HDFS as its storage component. Coupled with MapReduce, the processing component, HDFS has become a standard platform for big data management today. In terms of design, however, while HDFS can handle huge numbers of large files, it may not be very effective when deployed to handle large numbers of small files. This paper puts forward a new strategy for managing small files, consisting of two principal phases. The first phase consolidates a client's input files, storing them contiguously in a particular allocated block in the SequenceFile format, and continuing into subsequent blocks. In this way we avoid allocating separate blocks for different streams, which reduces requests for available blocks and also reduces the metadata memory on the NameNode, because a group of small files packaged in a SequenceFile on the same block requires one entry instead of one entry per small file. The second phase analyzes the attributes of the stored small files and distributes them so that the most frequently accessed files are referenced by an additional index in the MapFile format, reducing read overhead during random access.
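The following is a hedged sketch (not the authors' code) of the core idea in the first phase: packing many small files into a single SequenceFile keyed by file name. The container path and local directory are hypothetical; a MapFile could be written in the same style for the most frequently accessed files, adding an index for random lookups as described in the second phase.

```java
import java.io.File;
import java.nio.file.Files;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SmallFilePacker {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path container = new Path("/packed/smallfiles.seq"); // hypothetical container file in HDFS
    File localDir = new File("/data/small");              // hypothetical directory of small files

    // One SequenceFile holds many small files: key = original name, value = raw bytes.
    try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
        SequenceFile.Writer.file(container),
        SequenceFile.Writer.keyClass(Text.class),
        SequenceFile.Writer.valueClass(BytesWritable.class))) {
      File[] smallFiles = localDir.listFiles();
      if (smallFiles != null) {
        for (File f : smallFiles) {
          byte[] bytes = Files.readAllBytes(f.toPath());
          writer.append(new Text(f.getName()), new BytesWritable(bytes));
        }
      }
    }
  }
}
```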
APA, Harvard, Vancouver, ISO, and other styles
22

Bhatia, Raj Kumari, and Aakriti Bansal. "Deploying and Improving Hadoop on PseudoDistributed Mode." COMPUSOFT: An International Journal of Advanced Computer Technology 03, no. 10 (2014): 1136–39. https://doi.org/10.5281/zenodo.14759333.

Full text
Abstract:
Hadoop is an open-source framework comprising MapReduce and HDFS (Hadoop Distributed File System), which provides large-scale data storage and computing capabilities. Hadoop is expanding every day, and many cloud computing enterprises have been adopting it; it provides a platform for offering cloud computing services to customers. Hadoop can run in any of three modes: Standalone, Pseudo-Distributed, and Fully-Distributed. In this paper we improve execution time by configuring different schedulers. We implemented our method on Hadoop 2.2.0 in Pseudo-Distributed mode, which improves the overall job execution time.
APA, Harvard, Vancouver, ISO, and other styles
23

Ren, Yitong, Zhaojun Gu, Zhi Wang, et al. "System Log Detection Model Based on Conformal Prediction." Electronics 9, no. 2 (2020): 232. http://dx.doi.org/10.3390/electronics9020232.

Full text
Abstract:
With the rapid development of the Internet of Things (IoT), combining the IoT with machine learning, Hadoop, and other fields is a current development trend. The Hadoop Distributed File System (HDFS) is one of the core components of Hadoop and is used to manage files that are divided into data blocks distributed across the cluster. Once the distributed log data become abnormal, serious losses can follow. When machine learning algorithms are used for system log anomaly detection, threshold-based classification models output only simple normal or abnormal predictions. This paper uses the statistical learning method of the conformity measure to calculate the similarity between test data and past experience. Compared with detection methods based on a static threshold, the conformity-measure approach can dynamically adapt to changing log data. By adjusting the maximum fault tolerance, a system administrator can better manage and monitor the system logs. In addition, the computational efficiency of the statistical learning method for conformity measurement was improved. This paper implements an intranet anomaly detection model based on log analysis and conducts trial detection on HDFS data sets quickly and efficiently.
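As a worked illustration of the conformity-measure idea (not the paper's implementation), the conformal p-value of a new log sequence against a set of calibration nonconformity scores can be computed as below; the scores are placeholders.

```java
public class ConformalPValue {
  // p-value = fraction of calibration nonconformity scores that are at least
  // as large as the new example's score (with the +1 smoothing of conformal prediction).
  static double pValue(double[] calibrationScores, double newScore) {
    int greaterOrEqual = 0;
    for (double s : calibrationScores) {
      if (s >= newScore) {
        greaterOrEqual++;
      }
    }
    return (greaterOrEqual + 1.0) / (calibrationScores.length + 1.0);
  }

  public static void main(String[] args) {
    double[] normalScores = {0.20, 0.10, 0.40, 0.30, 0.25}; // placeholder scores from normal logs
    double suspicious = 0.90;                               // score of a new HDFS log sequence
    double p = pValue(normalScores, suspicious);
    // A p-value below the chosen significance level flags the sequence as anomalous.
    System.out.printf("p-value = %.3f%n", p);
  }
}
```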
APA, Harvard, Vancouver, ISO, and other styles
24

Ahlawat, Deepak, and Deepali Gupta. "Big Data Clustering and Hadoop Distributed File System Architecture." Journal of Computational and Theoretical Nanoscience 16, no. 9 (2019): 3824–29. http://dx.doi.org/10.1166/jctn.2019.8256.

Full text
Abstract:
Due to advancements in the technological world, there is a great surge in data, generated mainly by social websites, internet sites, and similar sources. The large data files are combined to create a big data architecture, and managing data files at such volume is not easy, so modern techniques have been developed to manage bulk data. To arrange and utilize such big data, the Hadoop Distributed File System (HDFS) architecture from Hadoop was presented in early 2015. This architecture is used when traditional methods are insufficient to manage the data. In this paper, a novel clustering algorithm is implemented to manage a large amount of data. The concepts and frameworks of Big Data are studied, and a novel algorithm is developed using K-means and cosine-similarity-based clustering. The developed clustering algorithm is evaluated using precision and recall, and prominent results are obtained that successfully address the big data issue.
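The cosine similarity used alongside K-means in clustering schemes of this kind reduces to a short formula, cos(a, b) = (a · b) / (|a||b|); the Java sketch below is illustrative and not the paper's code.

```java
public class CosineSimilarity {
  // Cosine similarity between two vectors, e.g. TF-IDF term weights of two documents.
  static double cosine(double[] a, double[] b) {
    double dot = 0.0, normA = 0.0, normB = 0.0;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    if (normA == 0.0 || normB == 0.0) {
      return 0.0; // treat similarity with a zero vector as 0
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
  }

  public static void main(String[] args) {
    double[] doc1 = {1.0, 0.0, 2.0};
    double[] doc2 = {0.5, 0.0, 1.0};
    System.out.println(cosine(doc1, doc2)); // 1.0: the vectors point in the same direction
  }
}
```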
APA, Harvard, Vancouver, ISO, and other styles
25

Hou, Qing, Lei Pan, Jia Xi Xu, and Kai Zhou. "Educational Resources Cloud Platform Based on Hadoop." Advanced Materials Research 912-914 (April 2014): 1249–53. http://dx.doi.org/10.4028/www.scientific.net/amr.912-914.1249.

Full text
Abstract:
As traditional educational resource platforms have deficiencies in storage, parallel processing, and cost, we designed a cloud platform for educational resources based on the Hadoop framework. The platform applies the HDFS distributed file system to store massive data in a distributed manner and applies the MapReduce distributed programming framework to process data in parallel and schedule resources, which solves the mass storage of resources and improves the efficiency of resource retrieval.
APA, Harvard, Vancouver, ISO, and other styles
26

Kareem, Shahab Wahhab, Raghad Zuhair Yousif, and Shadan Mohammed Jihad Abdalwahid. "An approach for enhancing data confidentiality in Hadoop." Indonesian Journal of Electrical Engineering and Computer Science 20, no. 3 (2020): 1547. http://dx.doi.org/10.11591/ijeecs.v20.i3.pp1547-1555.

Full text
Abstract:
The amount of data processed and stored in the cloud is growing dramatically. Traditional storage devices, at both the hardware and software levels, cannot meet the requirements of the cloud. This fact motivates the need for a platform that can handle this problem. Hadoop is a widely deployed platform proposed to overcome this big data problem, which often uses the MapReduce architecture to process vast amounts of data in the cloud. Hadoop has no strategy to assure the safety and confidentiality of the files saved in the Hadoop Distributed File System (HDFS). In the cloud, the protection of sensitive data is a critical issue in which data encryption schemes play a vital role. This research proposes a hybrid system between two well-known asymmetric-key cryptosystems (RSA and Paillier) to encrypt the files stored in HDFS; before data are saved in HDFS, the proposed cryptosystem is used to encrypt them. Each user of the cloud may upload files in two ways, non-secure or secure. The hybrid system shows higher computational complexity and less latency in comparison to the RSA cryptosystem alone.
APA, Harvard, Vancouver, ISO, and other styles
27

Yousif, Raghad Z., Shahab W. Kareem, and Shadan M. Abdalwahid. "Enhancing Approach for Information Security in Hadoop." Polytechnic Journal 10, no. 1 (2020): 81–87. http://dx.doi.org/10.25156/ptj.v10n1y2020.pp81-87.

Full text
Abstract:
Developing a secure Hadoop deployment, which is essentially a cloud computing environment, is an essential challenge for the cloud. A protection policy can be applied across various cloud services such as Platform as a Service (PaaS), Infrastructure as a Service (IaaS), and Software as a Service (SaaS), and can support most requirements in cloud computing. This motivates the need for a policy that addresses these challenges. Hadoop is a widely used platform recommended to tackle the big data problem, which usually uses the MapReduce design to organize huge amounts of data in the cloud. Hadoop has no policy to ensure the privacy and protection of the files saved in the Hadoop Distributed File System (HDFS). In the cloud, the safety of sensitive data is a significant problem in which encryption schemes play a vital role. This paper proposes a hybrid method between a pair of well-known asymmetric-key cryptosystems (RSA and Rabin) to encrypt the files saved in HDFS; before data are stored in HDFS, the proposed cryptosystem is employed to encrypt them. In the proposed system, the user of the cloud may upload files in two ways, secure or non-secure. The hybrid method presents higher computational complexity and smaller latency compared to the RSA cryptosystem alone.
APA, Harvard, Vancouver, ISO, and other styles
28

Malik, Vandana. "An Indication of HDFS and MapReduce Application." International Journal for Research in Applied Science and Engineering Technology 13, no. 5 (2025): 6035–37. https://doi.org/10.22214/ijraset.2025.71586.

Full text
Abstract:
In the era of big data, handling and processing large-scale datasets efficiently is paramount. The Hadoop ecosystem, particularly the Hadoop Distributed File System (HDFS) and MapReduce programming model, plays a crucial role in addressing these needs. This paper presents an in-depth analysis of HDFS and MapReduce, highlighting their architecture, functionality, and real-world applications. It explores how these technologies facilitate reliable storage and scalable processing of vast data volumes across distributed computing environments. Additionally, the paper discusses use cases in sectors like healthcare, finance, social media, and scientific research to demonstrate their practical significance.
APA, Harvard, Vancouver, ISO, and other styles
29

Liao, Wenzhe. "Application of Hadoop in the Document Storage Management System for Telecommunication Enterprise." International Journal of Interdisciplinary Telecommunications and Networking 8, no. 2 (2016): 58–68. http://dx.doi.org/10.4018/ijitn.2016040106.

Full text
Abstract:
In view of the information management process of a telecommunication enterprise, properly storing electronic documents is a challenge. This paper presents the design of a document storage management system based on Hadoop, which uses the distributed file system HDFS and the distributed database HBase to achieve efficient access to electronic office documents in a steel structure enterprise. The paper also describes an automatic small-file merge method using HBase, which simplifies the process of periodically joining small files by hand, resulting in improved system efficiency.
APA, Harvard, Vancouver, ISO, and other styles
30

Sun, Jun Xiong, Yan Chen, Tao Ying Li, Ren Yuan Wang, and Peng Hui Li. "Distributed File Information Management System Based on Hadoop." Advanced Materials Research 756-759 (September 2013): 820–23. http://dx.doi.org/10.4028/www.scientific.net/amr.756-759.820.

Full text
Abstract:
There are two main problems with storing system data on a single machine: limited storage space and low reliability. The concept of distribution solves both problems fundamentally: many independent machines are integrated as a whole, so that separate resources are pooled together. This paper focuses on developing a system, based on SSH, XFire, and Hadoop, to help users store and manage distributed files. All files stored in HDFS are encrypted to protect users' privacy. In order to save resources, the system is designed to avoid uploading duplicate files by checking each file's MD5 string.
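Duplicate detection by MD5 string, as mentioned above, can be sketched with the standard java.security API; the upload path and the surrounding index lookup are hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;

public class Md5Fingerprint {
  // Computes the hex MD5 string used to detect duplicates before uploading to HDFS.
  static String md5Hex(byte[] data) throws Exception {
    MessageDigest md = MessageDigest.getInstance("MD5");
    StringBuilder sb = new StringBuilder();
    for (byte b : md.digest(data)) {
      sb.append(String.format("%02x", b));
    }
    return sb.toString();
  }

  public static void main(String[] args) throws Exception {
    byte[] content = Files.readAllBytes(Paths.get("/tmp/upload.bin")); // hypothetical upload
    String fingerprint = md5Hex(content);
    // If this fingerprint already exists in the system's index, skip the upload
    // and point the new entry at the existing HDFS file instead.
    System.out.println("MD5 = " + fingerprint);
  }
}
```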
APA, Harvard, Vancouver, ISO, and other styles
31

IEVLEV, K. O., and M. G. GORODNICHEV. "COMPARATIVE ANALYSIS OF HDFS AND APACHE OZONE DATA STORAGE SYSTEMS." Computational Nanotechnology 12, no. 1 (2025): 26–33. https://doi.org/10.33693/2313-223x-2025-12-1-26-33.

Full text
Abstract:
Over the last few decades, both the volume of digital data in the globe and the variety of ways to use it have increased dramatically. For a long time, the Hadoop ecosystem, which is still widely utilized, has been synonymous with large data storage and processing platforms. However, during the past 20 years, Hadoop has been found to have a number of serious flaws, including the “small files problem” and uneven cluster resource usage. Various commercial and research organizations are faced with the issue of upgrading the data stack to improve resource utilization and increasing data processing efficiency. This study aims to examine the benefits and drawbacks of the next-generation data storage system, Apache Ozone, and to assess whether this technology is ready to completely supplant the Hadoop Distributed File System (HDFS).
APA, Harvard, Vancouver, ISO, and other styles
32

Elvira, Oktaviani, Mastura Diana Marieska, and Alvi Syahrini Utami. "Perbandingan Metode Mapreduce Berbasis Single Node Hadoop Pada Aplikasi Word Count." JUPITER 16, no. 1 (2024): 347–56. https://doi.org/10.5281/zenodo.11097045.

Full text
Abstract:
In the context of Big Data processing, Hadoop MapReduce is a framework used to develop software and to process large volumes of data in parallel. Word Count is a type of job used to count the occurrences of unique words in a text file. Processing speed is an important factor that must be considered in meeting Big Data processing standards. The research involved processing text files with the MapReduce method on the Hadoop Distributed File System (HDFS) using a single node, comparing word count results obtained with and without the MapReduce method. The results show that a Word Count implementation without MapReduce offers better speed for processing Indonesian-language text data than single-node Hadoop. In addition, a comparison of processing times between the Hadoop-based MapReduce Word Count program and Word Count without MapReduce shows that the program without MapReduce is faster. A large time reduction, up to 95% for a 5 MB file, can be achieved with the Word Count method without MapReduce, although the reduction decreases as the file size increases. Keywords: Big Data, Word Count, MapReduce, HDFS, Hadoop Single Node
APA, Harvard, Vancouver, ISO, and other styles
33

Bartus, Paul. "Using Hadoop Distributed and Deduplicated File System (HD2FS) in Astronomy." Proceedings of the International Astronomical Union 15, S367 (2019): 464–66. http://dx.doi.org/10.1017/s1743921321000387.

Full text
Abstract:
During the last years, the amount of data has skyrocketed. As a consequence, the data has become more expensive to store than to generate. The storage needs for astronomical data are also following this trend. Storage systems in Astronomy contain redundant copies of data such as identical files or within sub-file regions. We propose the use of the Hadoop Distributed and Deduplicated File System (HD2FS) in Astronomy. HD2FS is a deduplication storage system that was created to improve data storage capacity and efficiency in distributed file systems without compromising Input/Output performance. HD2FS can be developed by modifying existing storage system environments such as the Hadoop Distributed File System. By taking advantage of deduplication technology, we can better manage the underlying redundancy of data in astronomy and reduce the space needed to store these files in the file systems, thus allowing for more capacity per volume.
APA, Harvard, Vancouver, ISO, and other styles
34

Traynor, Daniel, and Terry Froy. "Using Lustre and Slurm to process Hadoop workloads and extending to the WLCG." EPJ Web of Conferences 214 (2019): 04049. http://dx.doi.org/10.1051/epjconf/201921404049.

Full text
Abstract:
The Queen Mary University of London Grid site has investigated the use of its Lustre file system to support Hadoop workflows. Lustre is an open-source, POSIX-compatible, clustered file system often used in high-performance computing clusters and is often paired with the Slurm batch system. Hadoop is an open-source software framework for distributed storage and processing of data, normally run on dedicated hardware using the HDFS file system and the Yarn batch system. Hadoop is an important modern tool for data analytics used by a large range of organisations, including CERN. By using our existing Lustre file system and Slurm batch system, the need for dedicated hardware is removed and only a single platform has to be maintained for data storage and processing. The motivation and benefits of using Hadoop with Lustre and Slurm are presented. The installation, benchmarks, limitations, and future plans are discussed. We also investigate using the standard WLCG Grid middleware Cream-CE service to provide a Grid-enabled Hadoop service.
APA, Harvard, Vancouver, ISO, and other styles
35

Song, Aibo, Maoxian Zhao, Yingying Xue, and Junzhou Luo. "MHDFS: A Memory-Based Hadoop Framework for Large Data Storage." Scientific Programming 2016 (2016): 1–12. http://dx.doi.org/10.1155/2016/1808396.

Full text
Abstract:
The Hadoop Distributed File System (HDFS) is undoubtedly the most popular framework for storing and processing large amounts of data on clusters of machines. Although a plethora of practices have been proposed for improving processing efficiency and resource utilization, traditional HDFS still suffers from low throughput and I/O rates due to its disk-based storage. In this paper, we attempt to address this problem by developing a memory-based Hadoop framework called MHDFS. First, a strategy for allocating and configuring reasonable memory resources for MHDFS is designed, and RAMFS is utilized to develop the framework. Then, we propose a new method to handle data replacement to disk when memory is excessively occupied; an algorithm for estimating and updating the replacement is designed based on a file-heat metric. Finally, substantial experiments are conducted which demonstrate the effectiveness of MHDFS and its advantage over conventional HDFS.
APA, Harvard, Vancouver, ISO, and other styles
36

Husain, Baydaa Hassan, and Subhi R. M. Zeebaree. "Improvised Distributions framework of Hadoop: A review." International Journal of Science and Business 5, no. 2 (2021): 31–41. https://doi.org/10.5281/zenodo.4461761.

Full text
Abstract:
Hadoop is an open-source framework that allows the distributed processing of large data sets across clusters of standard servers. With its two modules, the Hadoop Distributed File System (HDFS) and the MapReduce framework, it is designed to scale from single servers to thousands of machines, each providing local computation and storage. More than a decade after Hadoop emerged at the forefront as an open system for Big Data analysis, its growth has prompted several improvisations for particular data processing needs, based on the type of processing conditions at various stages of computation. By reviewing several studies, this paper presents the basic Hadoop system structure and describes MapReduce and HDFS efficiency, explaining how the Hadoop framework can overcome the "5Vs" challenges in Big Data. The Hadoop system offers many benefits, such as fault tolerance, reliability, high availability, scalability, decreased execution time, reduced latency, improved security, improved quality of data analysis, better scheduling models, and cost efficiency. On the other hand, there are barriers and challenges regarding regularly adjusting data, security issues, and load balancing. Finally, the benefits and challenges of the Hadoop system are presented, paving the way for future research to find solutions to these challenges.
APA, Harvard, Vancouver, ISO, and other styles
37

Ashok, Shivayogappa, and S. Supreeth. "A Comparison of HDFS File Formats: Avro, Parquet and ORC." International Journal of Advanced Science and Technology 29, no. 4 (2020): 4665–75. https://doi.org/10.5281/zenodo.7027910.

Full text
Abstract:
Hadoop is one of the standard platforms for managing and storing Big Data in distributed systems. However, the shortage of developers able to write MapReduce programs has pushed the adoption of SQL-based query systems into the Hadoop ecosystem, in an attempt to benefit from traditional relational database skills, especially in business processes and intelligent analytical processes. On top of the Hadoop environment a new framework has arrived, Apache Hive, as the standard data warehouse engine, and industry-leading developers continue to work on improvements in both query execution and data storage paradigms. In this work, structured file formats such as Avro, Parquet, and ORC are compared with text file formats to evaluate storage optimization and database query performance, and various kinds of query patterns are evaluated. The results show that the ORC and Parquet file formats take up less storage space than Avro and text formats because of their binary representations and compression techniques. Furthermore, aggregate queries over ORC and Parquet data are quicker than over Avro or text formats because the former two formats are well suited to column-based queries.
APA, Harvard, Vancouver, ISO, and other styles
38

Mahdaoui, Rabie, Manar Sais, Jaafar Abouchabaka, and Najat Rafalia. "Enhancing Hadoop distributed storage efficiency using multi-agent systems." Indonesian Journal of Electrical Engineering and Computer Science 34, no. 3 (2024): 1814. http://dx.doi.org/10.11591/ijeecs.v34.i3.pp1814-1822.

Full text
Abstract:
Distributed storage systems play a pivotal role in modern data-intensive applications, with the Hadoop Distributed File System (HDFS) being a prominent example. However, optimizing the efficiency of such systems remains a complex challenge. This research paper presents a novel approach to enhance the efficiency of distributed storage by leveraging multi-agent systems (MAS). Our research is centered on enhancing the efficiency of HDFS by incorporating intelligent agents that can dynamically assign storage tasks to nodes based on their performance characteristics. Utilizing a decentralized decision-making framework, the suggested MAS-based approach considers the real-time performance of nodes and allocates storage tasks adaptively. This strategy aims to alleviate performance bottlenecks and minimize data transfer latency. Through extensive experimental evaluation, we demonstrate the effectiveness of our approach in improving HDFS performance in terms of data storage, retrieval, and overall system efficiency. The results reveal significant reductions in job execution times and enhanced resource utilization, thereby offering a promising avenue for enhancing the efficiency of distributed storage systems.
APA, Harvard, Vancouver, ISO, and other styles
39

Mahdaoui, Rabie, Manar Sais, Jaafar Abouchabaka, and Najat Rafalia. "Enhancing Hadoop distributed storage efficiency using multi-agent systems." Indonesian Journal of Electrical Engineering and Computer Science 34, no. 3 (2024): 1814–22. https://doi.org/10.11591/ijeecs.v34.i3.pp1814-1822.

Full text
Abstract:
Distributed storage systems play a pivotal role in modern data-intensive applications, with the Hadoop Distributed File System (HDFS) being a prominent example. However, optimizing the efficiency of such systems remains a complex challenge. This research paper presents a novel approach to enhance the efficiency of distributed storage by leveraging multi-agent systems (MAS). Our research is centered on enhancing the efficiency of HDFS by incorporating intelligent agents that can dynamically assign storage tasks to nodes based on their performance characteristics. Utilizing a decentralized decision-making framework, the suggested MAS-based approach considers the real-time performance of nodes and allocates storage tasks adaptively. This strategy aims to alleviate performance bottlenecks and minimize data transfer latency. Through extensive experimental evaluation, we demonstrate the effectiveness of our approach in improving HDFS performance in terms of data storage, retrieval, and overall system efficiency. The results reveal significant reductions in job execution times and enhanced resource utilization, thereby offering a promising avenue for enhancing the efficiency of distributed storage systems.
APA, Harvard, Vancouver, ISO, and other styles
40

Narwade, Aditya Rajesh. "CLOUD BASED DUPLICATION REMOVAL SYSTEM." INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT 08, no. 03 (2024): 1–5. http://dx.doi.org/10.55041/ijsrem29104.

Full text
Abstract:
Deduplication involves eliminating duplicate or redundant data to reduce stored data volume, commonly used in data backup, network optimization, and storage management. However, traditional deduplication methods have limitations with encrypted data and security. The primary objective of this project is to develop new distributed deduplication systems that offer increased reliability. In these systems, data chunks are distributed across the Hadoop Distributed File System (HDFS), and a robust key management system is utilized to ensure secure deduplication with slave nodes. Instead of having multiple copies of the same content, deduplication removes redundant data by retaining only one physical copy and referring other instances to that copy. The granularity of deduplication can vary, ranging from an entire file to a data block. The MD5 and 3DES algorithms are used to enhance the deduplication process. The proposed approach in this project is the Proof of Ownership (POF) of the file. With this method, deduplication can effectively address the issues of reliability and label consistency in HDFS storage systems. The proposed system has successfully reduced the cost and time associated with uploading and downloading data, while also optimizing storage space. Key Words: Cloud computing, data storage, file checksum algorithms, computational infrastructure, duplication.
APA, Harvard, Vancouver, ISO, and other styles
41

Lakshmi Siva Rama Krishna, T., J. Priyanka, N. Nikhil Teja, Sd Mahiya Sultana, and B. Jabber. "An Efficient Data Replication Scheme for Hadoop Distributed File System." International Journal of Engineering & Technology 7, no. 2.32 (2018): 167. http://dx.doi.org/10.14419/ijet.v7i2.32.15396.

Full text
Abstract:
A distributed file system (DFS) is the storage component of a distributed system (DS). A DS consists of multiple autonomous nodes connected via a communication network to solve large problems and achieve more computing power. One design requirement of any DS is to provide replicas. In this paper, we propose a new replication algorithm that is more reliable than the existing replication algorithm used in DFS. The advantages of our proposed replication algorithm by incrementing nodes sequentially (RAINS) are that it distributes the storage load equally among all the nodes sequentially and that it guarantees a replica copy even when two racks in a DS are down, a feature not available in the existing DFS. We compared the existing replication algorithm used by the Hadoop Distributed File System (HDFS) with our proposed RAINS algorithm. The experimental results indicate that RAINS performs better when a larger number of racks fail in the DS.
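The RAINS algorithm itself is not reproduced here; for context, the sketch below only shows how standard HDFS exposes replication control through its Java API (the dfs.replication setting and FileSystem.setReplication), with a hypothetical file path.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationSettings {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Default replication factor for newly created files (normally set in hdfs-site.xml).
    conf.setInt("dfs.replication", 3);

    FileSystem fs = FileSystem.get(conf);
    Path path = new Path("/data/important.csv"); // hypothetical existing file

    // Request a different replication factor for one file; the NameNode schedules
    // the additional (or surplus) block copies asynchronously.
    boolean accepted = fs.setReplication(path, (short) 4);
    System.out.println("Replication change accepted: " + accepted);
  }
}
```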
APA, Harvard, Vancouver, ISO, and other styles
42

Kareem, Shahab Wahhab, Raghad Zuhair Yousif, and Shadan Mohammed Jihad Abdalwahid. "An approach for enhancing data confidentiality in hadoop." Indonesian Journal of Electrical Engineering and Computer Science 20, no. 3 (2020): 1547–55. https://doi.org/10.11591/ijeecs.v20.i3.pp1547-1555.

Full text
Abstract:
The quantity of data processed and stored in the cloud is rising dramatically. Common storage systems, at both the hardware and software levels, cannot satisfy the needs of the cloud. This fact motivates the requirement for frameworks that can manage this difficulty. Hadoop is a widely used framework proposed to meet the big data challenge, and it usually utilizes the MapReduce structure to process huge quantities of data in the cloud system. Hadoop has no policy to guarantee the protection and secrecy of the data collected in the Hadoop distributed file system (HDFS). In the cloud, the security of sensitive data is a significant issue in which data encryption schemes play a vital role. This research proposes a hybrid system between two popular asymmetric key cryptosystems (RSA and ElGamal) to encrypt the data collected in HDFS. Thus, before storing data in HDFS, the proposed cryptosystem is utilized to encrypt the data. The cloud user may upload data in two modes, non-secure or secure. The hybrid method provides stronger computational complexity and lower latency in comparison to the RSA cryptosystem alone.
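The abstract does not give the exact construction of the hybrid scheme, so the following is only a toy, textbook-style Python sketch showing how an RSA encryption step can be composed with an ElGamal step before data is handed to HDFS; the tiny keys are deliberately insecure and exist purely to keep the example runnable.

# Toy composition of textbook RSA and ElGamal over single bytes (illustration only;
# the parameters are tiny and insecure, and a real deployment would use a vetted library).
import random

# Textbook RSA key (classic small example)
n, e, d = 3233, 17, 2753          # n = 61 * 53, d = e^-1 mod phi(n)

# Textbook ElGamal key (prime chosen larger than n so the RSA output fits)
p, g = 7919, 2
x = 3389                          # private exponent
y = pow(g, x, p)                  # public key component

def encrypt_byte(m: int) -> tuple[int, int]:
    c_rsa = pow(m, e, n)                              # RSA layer
    k = random.randrange(2, p - 1)                    # fresh ElGamal nonce
    return pow(g, k, p), (c_rsa * pow(y, k, p)) % p   # ElGamal layer

def decrypt_byte(c1: int, c2: int) -> int:
    s = pow(c1, x, p)
    c_rsa = (c2 * pow(s, p - 2, p)) % p               # invert ElGamal (p is prime)
    return pow(c_rsa, d, n)                           # invert RSA

data = b"HDFS"
cipher = [encrypt_byte(b) for b in data]
plain = bytes(decrypt_byte(c1, c2) for c1, c2 in cipher)
assert plain == data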
APA, Harvard, Vancouver, ISO, and other styles
43

Su, Kai Kai, Wen Sheng Xu, and Jian Yong Li. "Research on Mass Manufacturing Resource Sensory Data Management Based on Hadoop." Key Engineering Materials 693 (May 2016): 1880–85. http://dx.doi.org/10.4028/www.scientific.net/kem.693.1880.

Full text
Abstract:
Aiming at the management issue of mass sensory data from manufacturing resources in cloud manufacturing, a management method for mass sensory data based on Hadoop is proposed. First, the characteristics of sensory data in cloud manufacturing are analyzed, and the meaning and advantages of the Internet of Things and cloud computing are elaborated. Then the structure of the cloud manufacturing service platform is proposed based on Hadoop, the information model of manufacturing resources in cloud manufacturing is defined, and the data cloud in the cloud manufacturing service platform is designed. The distributed storage of mass sensory data is implemented, and a universal distributed computing model for mass sensory data is established based on the characteristics of the Hadoop Distributed File System (HDFS).
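As a small illustration of the kind of distributed computing model the abstract mentions (not the platform's actual code), the Python sketch below expresses a sensory-data aggregation as map and reduce steps over (machine_id, reading) pairs; the record format is hypothetical.

# Illustrative MapReduce-style aggregation of machine sensor readings (local simulation).
from collections import defaultdict

# Hypothetical raw records: "machine_id,timestamp,temperature"
records = [
    "m01,2016-05-01T10:00,71.2",
    "m01,2016-05-01T10:01,73.8",
    "m02,2016-05-01T10:00,65.4",
]

def map_phase(line: str):
    machine, _, temp = line.split(",")
    yield machine, float(temp)               # emit (key, value) pairs

def reduce_phase(key: str, values: list[float]):
    return key, sum(values) / len(values)    # average temperature per machine

# Shuffle: group mapped values by key, then reduce each group.
groups = defaultdict(list)
for line in records:
    for k, v in map_phase(line):
        groups[k].append(v)

for k in sorted(groups):
    print(reduce_phase(k, groups[k]))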
APA, Harvard, Vancouver, ISO, and other styles
44

Jayakumar, N., and A. M. Kulkarni. "A Simple Measuring Model for Evaluating the Performance of Small Block Size Accesses in Lustre File System." Engineering, Technology & Applied Science Research 7, no. 6 (2017): 2313–18. http://dx.doi.org/10.48084/etasr.1557.

Full text
Abstract:
Storage performance is one of the vital characteristics of a big data environment. Data throughput can be increased to some extent using storage virtualization and parallel data paths. Technology has enhanced the various SANs and storage topologies to be adaptable to diverse applications that improve end-to-end performance. In big data environments, the most widely used file systems are HDFS (Hadoop Distributed File System) and Lustre. There are environments in which both HDFS and Lustre are connected and the applications work directly on Lustre. In the Lustre architecture with an out-of-band storage virtualization system, the separation of the data path from the metadata path is acceptable (and even desirable) for large files, since one MDT (Metadata Target) open RPC is typically a small fraction of the total number of read or write RPCs. This hurts small-file performance significantly when there is only a single read or write RPC for the file data. Since applications require data for processing, and considering an in-situ architecture that brings data or metadata close to the applications for processing, how in-situ processing can be exploited in Lustre is the domain of this dissertation work. Earlier research exploited Lustre's support for in-situ processing when Hadoop/MapReduce is integrated with Lustre, but scope for performance improvement still existed in Lustre. The aim of the research is to check whether it is feasible and beneficial to move small files to the MDT so that additional RPCs and I/O overhead can be eliminated and the read/write performance of the Lustre file system can be improved.
APA, Harvard, Vancouver, ISO, and other styles
45

Jayakumar, N., and A. M. Kulkarni. "A Simple Measuring Model for Evaluating the Performance of Small Block Size Accesses in Lustre File System." Engineering, Technology & Applied Science Research 7, no. 6 (2017): 2313–18. https://doi.org/10.5281/zenodo.1118996.

Full text
Abstract:
Storage performance is one of the vital characteristics of a big data environment. Data throughput can be increased to some extent using storage virtualization and parallel data paths. Technology has enhanced the various SANs and storage topologies to be adaptable to diverse applications that improve end-to-end performance. In big data environments, the most widely used file systems are HDFS (Hadoop Distributed File System) and Lustre. There are environments in which both HDFS and Lustre are connected and the applications work directly on Lustre. In the Lustre architecture with an out-of-band storage virtualization system, the separation of the data path from the metadata path is acceptable (and even desirable) for large files, since one MDT (Metadata Target) open RPC is typically a small fraction of the total number of read or write RPCs. This hurts small-file performance significantly when there is only a single read or write RPC for the file data. Since applications require data for processing, and considering an in-situ architecture that brings data or metadata close to the applications for processing, how in-situ processing can be exploited in Lustre is the domain of this dissertation work. Earlier research exploited Lustre's support for in-situ processing when Hadoop/MapReduce is integrated with Lustre, but scope for performance improvement still existed in Lustre. The aim of the research is to check whether it is feasible and beneficial to move small files to the MDT so that additional RPCs and I/O overhead can be eliminated and the read/write performance of the Lustre file system can be improved.
APA, Harvard, Vancouver, ISO, and other styles
46

Lee, Sungchul, Ju-Yeon Jo, and Yoohwan Kim. "Hadoop Performance Analysis Model with Deep Data Locality." Information 10, no. 7 (2019): 222. http://dx.doi.org/10.3390/info10070222.

Full text
Abstract:
Background: Hadoop has become the base framework for big data systems via the simple concept that moving computation is cheaper than moving data. Hadoop increases data locality in the Hadoop Distributed File System (HDFS) to improve the performance of the system. Network traffic among nodes in the big data system is reduced by increasing the proportion of data-local tasks on each machine. Traditional research increased data locality in one of the MapReduce stages to increase Hadoop performance. However, there is currently no mathematical performance model for data locality in Hadoop. Methods: This study built a Hadoop performance analysis model with data locality for analyzing the entire MapReduce process. In this paper, the data locality concept in the map stage and shuffle stage is explained. This research also shows how to apply the Hadoop performance analysis model to increase the performance of the Hadoop system by establishing deep data locality. Results: This research validated deep data locality for increasing the performance of Hadoop via three tests: a simulation-based test, a cloud test, and a physical test. According to the tests, the authors improved the Hadoop system by over 34% by using deep data locality. Conclusions: Deep data locality improved Hadoop performance by reducing data movement in HDFS.
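The paper's actual performance model is not reproduced in the abstract, so the Python sketch below only illustrates the underlying intuition: estimating map-stage time from the fraction of data-local tasks, where remote tasks pay an extra network transfer cost. All parameter values are made up.

# Illustrative (not the paper's) estimate of map-stage time under varying data locality.
def map_stage_time(num_tasks: int, local_fraction: float,
                   task_compute_s: float = 4.0,
                   remote_transfer_s: float = 3.0,
                   slots: int = 8) -> float:
    """Remote tasks take compute + transfer time; tasks run on `slots` parallel slots."""
    local_tasks = round(num_tasks * local_fraction)
    remote_tasks = num_tasks - local_tasks
    total_work = (local_tasks * task_compute_s
                  + remote_tasks * (task_compute_s + remote_transfer_s))
    return total_work / slots        # idealized: perfectly balanced waves

for frac in (0.5, 0.8, 1.0):
    print(f"locality {frac:.0%}: ~{map_stage_time(100, frac):.1f} s")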
APA, Harvard, Vancouver, ISO, and other styles
47

R., Sivaprasath, Vedeshvar L., Bharath M. D., Achari Magesh, and Anisha C. D. "Distributed Resource Management in Operating Systems: A Case Study on HDFS and YARN." Recent Research Reviews Journal 4, no. 1 (2025): 141–53. https://doi.org/10.36548/rrrj.2025.1.009.

Full text
Abstract:
This research study focuses on analysing the role of distributed resource management in enhancing the scalability and reliability of linked systems. The study presents a detailed analysis of the architectures, benefits, and inherent drawbacks of the Hadoop Distributed File System (HDFS) and Yet Another Resource Negotiator (YARN). YARN offers flexible resource scheduling through the Fair and Capacity schedulers, while HDFS offers fault-tolerant, scalable storage through a block-based, replicated, and locality-optimized design. Although robust, limitations such as resource contention in YARN and the NameNode's single point of failure in HDFS still exist. In order to address the evolving challenges in modern computing, this study also explores potential research domains such as serverless architecture for dynamic scaling, latency-conscious edge computing, and AI-based resource forecasting.
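To make the scheduling contrast in this abstract concrete, here is a minimal sketch (in plain Python rather than YARN configuration) of the two ideas it names: a capacity-style split of cluster memory across named queues versus a fair-style even split among running applications. The queue names and sizes are invented.

# Illustrative contrast between capacity-style and fair-style resource division.
CLUSTER_MEMORY_GB = 512

def capacity_shares(queue_capacity_pct: dict[str, float]) -> dict[str, float]:
    """Capacity-scheduler idea: each queue gets a fixed fraction of the cluster."""
    return {q: CLUSTER_MEMORY_GB * pct / 100.0
            for q, pct in queue_capacity_pct.items()}

def fair_shares(running_apps: list[str]) -> dict[str, float]:
    """Fair-scheduler idea: all running applications split the cluster evenly."""
    share = CLUSTER_MEMORY_GB / len(running_apps)
    return {app: share for app in running_apps}

print(capacity_shares({"etl": 50, "analytics": 30, "adhoc": 20}))
print(fair_shares(["job-1", "job-2", "job-3"]))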
APA, Harvard, Vancouver, ISO, and other styles
48

Li, Fengxia, Zhi Qu, and Ruiling Li. "Medical Cloud Computing Data Processing to Optimize the Effect of Drugs." Journal of Healthcare Engineering 2021 (March 19, 2021): 1–15. http://dx.doi.org/10.1155/2021/5560691.

Full text
Abstract:
In recent years, cloud computing technology has been steadily maturing. Hadoop originated from Apache Nutch and is an open-source cloud computing platform. The platform is characterized by large scale, virtualization, strong stability, strong versatility, and support for scalability. Given the characteristics of unstructured medical images, it is necessary and far-reaching to combine content-based medical image retrieval with the Hadoop cloud platform in this research. This study combines the impact mechanism of senile dementia vascular endothelial cells with cloud computing to construct a corresponding cloud-based image-set data retrieval platform. Moreover, this study uses Hadoop's core distributed file system, HDFS, to upload images, stores the images in HDFS and the image feature vectors in HBase, and uses the MapReduce programming model to perform parallel retrieval, with the nodes cooperating with each other. The results show that the proposed method has certain effects and can be applied to medical research.
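As an illustration of the retrieval step the abstract describes (images in HDFS, feature vectors in HBase, parallel matching via MapReduce), the Python sketch below performs the core computation locally: each "map" compares a stored feature vector with the query vector, and the "reduce" keeps the closest matches. The vector contents and top-k cutoff are illustrative.

# Illustrative content-based retrieval: rank stored feature vectors by distance
# to a query vector (a local stand-in for the MapReduce matching step).
import heapq
import math

def euclidean(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical (image_id -> feature vector) table, as it might be read from HBase.
features = {
    "img_001": [0.12, 0.80, 0.33],
    "img_002": [0.90, 0.10, 0.45],
    "img_003": [0.15, 0.75, 0.30],
}

def retrieve(query: list[float], k: int = 2) -> list[tuple[float, str]]:
    # "Map": score every stored vector; "Reduce": keep the k smallest distances.
    scored = ((euclidean(vec, query), img_id) for img_id, vec in features.items())
    return heapq.nsmallest(k, scored)

print(retrieve([0.14, 0.78, 0.31]))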
APA, Harvard, Vancouver, ISO, and other styles
49

Prabowo, Sidik, and Maman Abdurohman. "Studi Perbandingan Performa Algoritma Penjadwalan untuk Real Time Data Twitter pada Hadoop." Komputika : Jurnal Sistem Komputer 9, no. 1 (2020): 43–50. http://dx.doi.org/10.34010/komputika.v9i1.2848.

Full text
Abstract:
Hadoop is an open-source, Java-based software framework. It consists of two main components: MapReduce and the Hadoop Distributed File System (HDFS). MapReduce, made up of the Map and Reduce phases, is used for data processing, while HDFS is the directory in which Hadoop data is stored. When running jobs whose execution characteristics often vary, an appropriate job scheduler is required, and many job schedulers are available to match job characteristics. The Fair Scheduler is a scheduler whose principle is to ensure that a job receives the same resources as other jobs, with the goal of improving performance in terms of average completion time. The Hadoop Fair Sojourn Protocol Scheduler is a scheduling algorithm in Hadoop that schedules jobs based on the size of the submitted jobs. This study aims to compare the performance of the two schedulers on Twitter data. The test results show that the Hadoop Fair Sojourn Protocol Scheduler outperforms the Fair Scheduler in both average completion time, by 9.31%, and job throughput, by 23.46%, while the Fair Scheduler is superior on the task failure rate parameter, by 23.98%.
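To illustrate why a size-aware discipline such as HFSP can lower average completion time compared with treating all jobs alike, here is a deliberately simplified, single-slot Python sketch that runs the same jobs in submission order and in shortest-first order; it is not a model of the actual schedulers or of the Twitter workload.

# Simplified illustration: average completion time in submission order vs
# shortest-job-first order on a single execution slot (all jobs arrive at t=0).
def avg_completion(job_sizes: list[float]) -> float:
    t, completions = 0.0, []
    for size in job_sizes:
        t += size
        completions.append(t)
    return sum(completions) / len(completions)

jobs = [10.0, 2.0, 4.0]                                    # hypothetical job sizes (minutes)
print("submission order:", avg_completion(jobs))           # 12.67
print("shortest first  :", avg_completion(sorted(jobs)))   # 8.0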
APA, Harvard, Vancouver, ISO, and other styles
50

Li, Xiaolu, Zuoru Yang, Jinhong Li, et al. "Repair Pipelining for Erasure-coded Storage: Algorithms and Evaluation." ACM Transactions on Storage 17, no. 2 (2021): 1–29. http://dx.doi.org/10.1145/3436890.

Full text
Abstract:
We propose repair pipelining, a technique that speeds up the repair performance in general erasure-coded storage. By carefully scheduling the repair of failed data in small-size units across storage nodes in a pipelined manner, repair pipelining reduces the single-block repair time to approximately the same as the normal read time for a single block in homogeneous environments. We further design different extensions of repair pipelining algorithms for heterogeneous environments and multi-block repair operations. We implement a repair pipelining prototype, called ECPipe, and integrate it as a middleware system into two versions of Hadoop Distributed File System (HDFS) (namely, HDFS-RAID and HDFS-3) as well as Quantcast File System. Experiments on a local testbed and Amazon EC2 show that repair pipelining significantly improves the performance of degraded reads and full-node recovery over existing repair techniques.
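The published repair pipelining algorithms cover general erasure codes; the Python sketch below only conveys the pipelining idea on the simplest possible case, a single-parity XOR code, by repairing a lost block slice by slice along a chain of surviving nodes rather than pulling whole blocks to one repair node. The block size and slice count are arbitrary.

# Illustrative slice-by-slice ("pipelined") repair of a lost data block in a
# single-parity XOR code, where the lost block equals the XOR of all surviving blocks.
import os

BLOCK_SIZE, NUM_SLICES = 64 * 1024, 8
SLICE = BLOCK_SIZE // NUM_SLICES

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# Hypothetical stripe: three data blocks plus one parity block.
data = [os.urandom(BLOCK_SIZE) for _ in range(3)]
parity = data[0]
for blk in data[1:]:
    parity = xor_bytes(parity, blk)

lost_index = 1                                   # pretend data[1] disappeared
surviving = [b for i, b in enumerate(data) if i != lost_index] + [parity]

# Conventional repair ships all surviving blocks to one node before combining.
# Pipelined repair instead forwards a partial XOR of each small slice along the
# chain of surviving nodes, so transfers of different slices overlap in time.
repaired = bytearray()
for s in range(NUM_SLICES):
    lo, hi = s * SLICE, (s + 1) * SLICE
    partial = surviving[0][lo:hi]
    for blk in surviving[1:]:                    # each hop XORs in its own slice
        partial = xor_bytes(partial, blk[lo:hi])
    repaired += partial                          # last hop delivers the repaired slice

assert bytes(repaired) == data[lost_index]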
APA, Harvard, Vancouver, ISO, and other styles