To see the other types of publications on this topic, follow the link: Hadoop Distributed File System.

Journal articles on the topic 'Hadoop Distributed File System'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 journal articles for your research on the topic 'Hadoop Distributed File System.'

Next to every source in the list of references there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles from a wide variety of disciplines and organise your bibliography correctly.

1

Giri, Pratit Raj, and Gajendra Sharma. "Apache Hadoop Architecture, Applications, and Hadoop Distributed File System." Semiconductor Science and Information Devices 4, no. 1 (May 18, 2022): 14. http://dx.doi.org/10.30564/ssid.v4i1.4619.

Full text
Abstract:
Data and internet usage are growing rapidly, which creates problems in managing big data. For these kinds of problems, many software frameworks are used to increase the performance of distributed systems and to provide large-scale data storage. One of the most beneficial frameworks for working with data in distributed systems is Hadoop. This paper introduces the Apache Hadoop architecture, the components of Hadoop, and their significance in managing vast volumes of data in a distributed system. The Hadoop Distributed File System enables the storage of enormous chunks of data over a distributed network. The Hadoop framework maintains the fsImage and edits files, which support the availability and integrity of data. The paper also includes cases of Hadoop implementation, such as weather monitoring and bioinformatics processing.
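As a concrete companion to the architecture described above, here is a minimal, hypothetical sketch of writing and reading a file through the HDFS Java client API; the NameNode URI and paths are illustrative assumptions, not values from the paper.

```java
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsRoundTrip {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address; in practice this comes from core-site.xml.
        conf.set("fs.defaultFS", "hdfs://namenode.example.org:9000");
        try (FileSystem fs = FileSystem.get(conf)) {
            Path file = new Path("/demo/hello.txt");
            // Write: the client streams data to DataNodes; the NameNode records
            // the block locations in its namespace (persisted via fsImage/edits).
            try (FSDataOutputStream out = fs.create(file, true)) {
                out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
            }
            // Read the file back and copy it to stdout.
            try (FSDataInputStream in = fs.open(file)) {
                IOUtils.copyBytes(in, System.out, 4096, false);
            }
        }
    }
}
```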
APA, Harvard, Vancouver, ISO, and other styles
2

Bartus, Paul. "Using Hadoop Distributed and Deduplicated File System (HD2FS) in Astronomy." Proceedings of the International Astronomical Union 15, S367 (December 2019): 464–66. http://dx.doi.org/10.1017/s1743921321000387.

Full text
Abstract:
During the last few years, the amount of data has skyrocketed. As a consequence, data has become more expensive to store than to generate. The storage needs for astronomical data are also following this trend. Storage systems in astronomy contain redundant copies of data, such as identical files or identical sub-file regions. We propose the use of the Hadoop Distributed and Deduplicated File System (HD2FS) in astronomy. HD2FS is a deduplication storage system that was created to improve data storage capacity and efficiency in distributed file systems without compromising input/output performance. HD2FS can be developed by modifying existing storage system environments such as the Hadoop Distributed File System. By taking advantage of deduplication technology, we can better manage the underlying redundancy of data in astronomy and reduce the space needed to store these files, thus allowing for more capacity per volume.
APA, Harvard, Vancouver, ISO, and other styles
3

Wu, Zhen Quan, and Bing Pan. "Research of Distributed Search Engine Based on Hadoop." Applied Mechanics and Materials 631-632 (September 2014): 171–74. http://dx.doi.org/10.4028/www.scientific.net/amm.631-632.171.

Full text
Abstract:
Combining the MapReduce programming model, the Hadoop distributed file system, Lucene inverted-file indexing technology, and ICTCLAS Chinese word segmentation technology, we designed and implemented a distributed search engine system based on Hadoop. Testing the system in a four-node Hadoop cluster environment, the experimental results show that the Hadoop platform can be used in search engines to improve system performance, reliability, and scalability.
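To illustrate the kind of indexing job such a system relies on (not the authors' actual implementation), the following is a hedged sketch of a Hadoop MapReduce inverted index that maps each term to the documents containing it; class and path names are illustrative.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class InvertedIndex {

    // Map: emit (term, documentName) for every token in the input line.
    public static class TokenMapper extends Mapper<Object, Text, Text, Text> {
        @Override
        protected void map(Object key, Text value, Context ctx)
                throws IOException, InterruptedException {
            String doc = ((FileSplit) ctx.getInputSplit()).getPath().getName();
            for (String token : value.toString().toLowerCase().split("\\W+")) {
                if (!token.isEmpty()) {
                    ctx.write(new Text(token), new Text(doc));
                }
            }
        }
    }

    // Reduce: collect the distinct documents for each term into a posting list.
    public static class PostingReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text term, Iterable<Text> docs, Context ctx)
                throws IOException, InterruptedException {
            Set<String> postings = new HashSet<>();
            for (Text d : docs) {
                postings.add(d.toString());
            }
            ctx.write(term, new Text(String.join(",", postings)));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "inverted index");
        job.setJarByClass(InvertedIndex.class);
        job.setMapperClass(TokenMapper.class);
        job.setReducerClass(PostingReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```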
APA, Harvard, Vancouver, ISO, and other styles
4

Gemayel, Nader. "Analyzing Google File System and Hadoop Distributed File System." Research Journal of Information Technology 8, no. 3 (March 1, 2016): 66–74. http://dx.doi.org/10.3923/rjit.2016.66.74.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Kapil, Gayatri, Alka Agrawal, Abdulaziz Attaallah, Abdullah Algarni, Rajeev Kumar, and Raees Ahmad Khan. "Attribute based honey encryption algorithm for securing big data: Hadoop distributed file system perspective." PeerJ Computer Science 6 (February 17, 2020): e259. http://dx.doi.org/10.7717/peerj-cs.259.

Full text
Abstract:
Hadoop has become a promising platform to reliably process and store big data. It provides flexible and low-cost services for huge data through Hadoop Distributed File System (HDFS) storage. Unfortunately, the absence of any inherent security mechanism in Hadoop increases the possibility of malicious attacks on the data processed or stored through Hadoop. In this scenario, securing the data stored in HDFS becomes a challenging task. Hence, researchers and practitioners have intensified their efforts in working on mechanisms that would protect users' information collated in HDFS. This has led to the development of numerous encryption-decryption algorithms, but their performance decreases as the file size increases. In the present study, the authors lay out a methodology to solve the issue of data security in Hadoop storage by integrating Attribute Based Encryption with honey encryption on Hadoop, i.e., Attribute Based Honey Encryption (ABHE). This approach works on files that are encoded inside the HDFS and decoded inside the Mapper. In addition, the authors evaluate the proposed ABHE algorithm by performing encryption-decryption on files of different sizes and compare it with existing algorithms including AES and AES with OTP. The ABHE algorithm shows considerable improvement in performance during the encryption-decryption of files.
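The paper's ABHE construction is not reproduced here, but the general pattern it relies on, encrypting file contents on the client before they reach HDFS and decrypting them only where the key is held, can be sketched with a standard cipher as a stand-in. The AES key handling, NameNode URI, and file paths below are purely illustrative assumptions, not the paper's scheme.

```java
import javax.crypto.Cipher;
import javax.crypto.CipherOutputStream;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class EncryptThenStore {
    public static void main(String[] args) throws Exception {
        // Stand-in symmetric key; a real deployment would obtain the key from a
        // key-management service rather than generating it ad hoc.
        SecretKey key = KeyGenerator.getInstance("AES").generateKey();
        Cipher cipher = Cipher.getInstance("AES");
        cipher.init(Cipher.ENCRYPT_MODE, key);

        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode.example.org:9000"); // hypothetical
        try (FileSystem fs = FileSystem.get(conf);
             FSDataOutputStream raw = fs.create(new Path("/secure/report.bin"), true);
             CipherOutputStream out = new CipherOutputStream(raw, cipher)) {
            // Only ciphertext lands on the DataNodes; a Mapper holding the key would
            // wrap its input stream in a CipherInputStream to decrypt on read.
            out.write(Files.readAllBytes(Paths.get("report.csv")));
        }
    }
}
```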
APA, Harvard, Vancouver, ISO, and other styles
6

Han, Yong Qi, Yun Zhang, and Shui Yu. "Research of Cloud Storage Based on Hadoop Distributed File System." Applied Mechanics and Materials 513-517 (February 2014): 2472–75. http://dx.doi.org/10.4028/www.scientific.net/amm.513-517.2472.

Full text
Abstract:
This paper discusses the application of cloud computing technology to store the large amount of data found in agricultural remote-training videos and other multimedia content. Four computers are used to build a Hadoop cloud platform, and the paper focuses on the principles of the Hadoop Distributed File System (HDFS) and its file storage in order to achieve massive agricultural multimedia data storage.
APA, Harvard, Vancouver, ISO, and other styles
7

Wu, Xing, and Mengqi Pei. "Image File Storage System Resembling Human Memory." International Journal of Software Science and Computational Intelligence 7, no. 2 (April 2015): 70–84. http://dx.doi.org/10.4018/ijssci.2015040104.

Full text
Abstract:
The Big Data era is characterized by an explosive increase of image files on the Internet, and these massive image files bring great challenges to storage. What is required is not only storage efficiency for massive image files but also accuracy and robustness in their management and retrieval. To meet these requirements, a distributed image file storage system based on cognitive theory is proposed. Mirroring human brain function, in which people correlate images with thousands of distinct object and action categories and store them in a sorted fashion, the authors propose to store image files sorted by visual category, based on human cognition, so as to resemble human memory. The experimental results demonstrate that the proposed distributed image file system (DIFS) based on cognition performs better than the Hadoop Distributed File System (HDFS) and FastDFS.
APA, Harvard, Vancouver, ISO, and other styles
8

Hussain, G. Fayaz, and Tarakeswar T. "File Systems and Hadoop Distributed File System in Big Data." IJARCCE 5, no. 12 (December 30, 2016): 36–40. http://dx.doi.org/10.17148/ijarcce.2016.51207.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Ahlawat, Deepak, and Deepali Gupta. "Big Data Clustering and Hadoop Distributed File System Architecture." Journal of Computational and Theoretical Nanoscience 16, no. 9 (September 1, 2019): 3824–29. http://dx.doi.org/10.1166/jctn.2019.8256.

Full text
Abstract:
Due to advancement in the technological world, there is a great surge in data, the main sources being social websites, internet sites, and the like. The large data files are combined together to create a big data architecture, and managing data files of such volume is not easy, so modern techniques have been developed to manage bulk data. To arrange and utilize such big data, the Hadoop Distributed File System (HDFS) architecture from Hadoop is used when traditional methods are insufficient to manage the data. In this paper, a novel clustering algorithm is implemented to manage a large amount of data, and the concepts and frameworks of Big Data are studied. The algorithm is developed using K-means and cosine-based similarity clustering and is evaluated using the precision and recall parameters. The results obtained successfully address the big data management issue.
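As a small, generic illustration of the clustering ingredient named above (not the authors' exact algorithm), the following sketch computes cosine similarity between vectors and assigns a point to the most similar centroid, which is the assignment step of a K-means-style loop.

```java
public final class CosineAssignment {

    // Cosine similarity between two equal-length vectors.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB) + 1e-12);
    }

    // Assignment step: index of the centroid most similar to x.
    static int assign(double[] x, double[][] centroids) {
        int best = 0;
        double bestSim = Double.NEGATIVE_INFINITY;
        for (int k = 0; k < centroids.length; k++) {
            double sim = cosine(x, centroids[k]);
            if (sim > bestSim) {
                bestSim = sim;
                best = k;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] centroids = { {1, 0, 0}, {0, 1, 1} }; // toy centroids
        System.out.println(assign(new double[] {0.9, 0.1, 0.2}, centroids)); // prints 0
    }
}
```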
APA, Harvard, Vancouver, ISO, and other styles
10

Awasthi, Yogesh. "Enhancing approach for information security in Hadoop." Ukrainian Journal of Educational Studies and Information Technology 8, no. 1 (March 27, 2020): 39–49. http://dx.doi.org/10.32919/uesit.2020.01.04.

Full text
Abstract:
Hadoop, one of the current trends in technology used as a framework for distributed storage, is an open-source distributed computing framework implemented in Java that comprises two modules: MapReduce and the Hadoop Distributed File System (HDFS). MapReduce is intended for processing enormous data sets; it enables users to employ a large number of commodity machines in parallel effectively, so that by simply defining map and reduce functions a user can process huge amounts of data. HDFS is used for storing data on distributed clusters of machines. Hadoop is normally deployed on a large cluster or a public cloud service, for example at Yahoo!, Facebook, Twitter, and Amazon. The scalability of Hadoop is shown by the popularity of these applications, yet it is designed without security for the data it stores. Using the Hadoop package, a proposed secure cloud computing system has been designed in which Hadoop is extended to establish and enhance security for saving and managing user data. Apache produced Hadoop to overcome this big data problem, often using the MapReduce architecture to process vast amounts of data, but Hadoop has no strategy to assure the security and privacy of the files stored in the Hadoop distributed file system (HDFS). As an encryption scheme for the files stored in HDFS, an asymmetric key cryptosystem is advocated: before saving data in HDFS, the proposed hybrid cipher based on RSA and Rabin encrypts the data. The user of the cloud may upload files in two ways, non-secure or secure.
APA, Harvard, Vancouver, ISO, and other styles
11

Awasthi, Yogesh, and Ashish Sharma. "Enhancing Approach for Information Security in Hadoop." JITCE (Journal of Information Technology and Computer Engineering) 4, no. 01 (March 30, 2020): 5–9. http://dx.doi.org/10.25077/jitce.4.01.5-9.2020.

Full text
Abstract:
Using the Hadoop package, a proposed secure cloud computing system was designed in which Hadoop is extended to establish and enhance security for saving and managing user data. Apache produced the software tool termed Hadoop to overcome the big data problem, often using the MapReduce architecture to process vast amounts of data, but Hadoop has no strategy to assure the security and privacy of the files stored in the Hadoop Distributed File System (HDFS). As an encryption scheme for the files stored in HDFS, an asymmetric key cryptosystem is advocated: before saving data in HDFS, the proposed hybrid cipher based on RSA and Rabin encrypts the data. The user of the cloud may upload files in two ways, non-secure or secure.
APA, Harvard, Vancouver, ISO, and other styles
12

Goyal, Shubh. "Using HDFS to Load, Search, and Retrieve Data from Local Data Nodes." International Journal for Research in Applied Science and Engineering Technology 9, no. 11 (November 30, 2021): 656–59. http://dx.doi.org/10.22214/ijraset.2021.38877.

Full text
Abstract:
By utilizing the Hadoop environment, data may be loaded and searched from local data nodes. Because the dataset's capacity may be vast, loading and finding data with a query is often difficult. We suggest a method for dealing with data in local nodes that does not overlap with data acquired by the script. The query's major purpose is to store information in a distributed environment and look it up quickly. We define the script so as to eliminate duplicate data redundancy when searching and loading data dynamically. In addition, the Hadoop file system is available in a distributed environment. Keywords: HDFS; Hadoop distributed file system; replica; local; distributed; capacity; SQL; redundancy.
APA, Harvard, Vancouver, ISO, and other styles
13

Sun, Jun Xiong, Yan Chen, Tao Ying Li, Ren Yuan Wang, and Peng Hui Li. "Distributed File Information Management System Based on Hadoop." Advanced Materials Research 756-759 (September 2013): 820–23. http://dx.doi.org/10.4028/www.scientific.net/amr.756-759.820.

Full text
Abstract:
There are two main problems with storing system data on a single machine: limited storage space and low reliability. The concept of distribution solves these two problems fundamentally: many independent machines are integrated as a whole, so that their separate resources are pooled together. This paper focuses on developing a system, based on SSH, XFire and Hadoop, to help users store and manage distributed files. All the files stored in HDFS are encrypted to protect users' privacy. In order to save resources, the system is designed to avoid uploading duplicate files by checking each file's MD5 string.
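A minimal sketch of the duplicate check described above, assuming the MD5 strings of previously uploaded files are kept in some lookup structure (here a plain in-memory set; the real system would persist them alongside its file metadata):

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.util.HashSet;
import java.util.Set;

public class DuplicateCheck {
    // MD5 strings of files already stored; stand-in for the system's metadata store.
    private final Set<String> knownDigests = new HashSet<>();

    // Compute the hex MD5 digest of a local file by streaming it.
    static String md5Of(Path file) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (InputStream in = new DigestInputStream(Files.newInputStream(file), md)) {
            byte[] buf = new byte[8192];
            while (in.read(buf) != -1) { /* digest is updated as we read */ }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Returns true if the file should be uploaded (its digest has not been seen before).
    boolean shouldUpload(Path file) throws Exception {
        return knownDigests.add(md5Of(file));
    }
}
```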
APA, Harvard, Vancouver, ISO, and other styles
14

Lu, An Sheng, Jian Jiang Cai, Wei Jin, and Lu Wang. "Research and Practice of Cloud Computing Based on Hadoop." Applied Mechanics and Materials 644-650 (September 2014): 3387–89. http://dx.doi.org/10.4028/www.scientific.net/amm.644-650.3387.

Full text
Abstract:
Hadoop is a distributed, parallel computing platform for processing massive data and is currently the most widely used cloud computing platform. This paper analyses the Hadoop distributed file system (HDFS), the MapReduce computation model on the Hadoop platform, and the cloud computing model based on Hadoop; it introduces the process of building a Hadoop cloud computing platform and its operating environment, and proposes an implementation.
APA, Harvard, Vancouver, ISO, and other styles
15

Bende, Sachin, and Rajashree Shedge. "Dealing with Small Files Problem in Hadoop Distributed File System." Procedia Computer Science 79 (2016): 1001–12. http://dx.doi.org/10.1016/j.procs.2016.03.127.

Full text
APA, Harvard, Vancouver, ISO, and other styles
16

CAO, Ning, Zhong-hai WU, Hong-zhi LIU, and Qi-xun ZHANG. "Improving downloading performance in hadoop distributed file system." Journal of Computer Applications 30, no. 8 (September 2, 2010): 2060–65. http://dx.doi.org/10.3724/sp.j.1087.2010.02060.

Full text
APA, Harvard, Vancouver, ISO, and other styles
17

Hu, Daming, Deyun Chen, Shuhui Lou, and Shujun Pei. "Research on Reliability of Hadoop Distributed File System." International Journal of Multimedia and Ubiquitous Engineering 10, no. 11 (November 30, 2015): 315–26. http://dx.doi.org/10.14257/ijmue.2015.10.11.30.

Full text
APA, Harvard, Vancouver, ISO, and other styles
18

Lakshmi Siva Rama Krishna, T., J. Priyanka, N. Nikhil Teja, Sd Mahiya Sultana, and B. Jabber. "An Efficient Data Replication Scheme for Hadoop Distributed File System." International Journal of Engineering & Technology 7, no. 2.32 (May 31, 2018): 167. http://dx.doi.org/10.14419/ijet.v7i2.32.15396.

Full text
Abstract:
A distributed file system (DFS) is the storage component of a distributed system (DS). A DS consists of multiple autonomous nodes connected via a communication network to solve large problems and to achieve more computing power. One of the design requirements of any DS is to provide replicas. In this paper, we propose a new replication algorithm which is more reliable than the existing replication algorithm used in DFS. The advantage of our proposed replication algorithm by incrementing nodes sequentially (RAINS) is that it distributes the storage load equally among all the nodes sequentially and guarantees a replica copy even in case two racks in a DS are down, a feature not available in the existing DFS. We have compared the existing replication algorithm used by the Hadoop distributed file system (HDFS) with our proposed RAINS algorithm. The experimental results indicate that RAINS performs better when more racks fail in the DS.
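One plausible reading of "incrementing nodes sequentially", placing each successive replica on the next node in a fixed ordering that cycles across racks, can be sketched as follows. This is an illustrative interpretation, not the published RAINS pseudocode, and the node names are invented.

```java
import java.util.ArrayList;
import java.util.List;

public class SequentialPlacement {
    private final List<String> nodes;   // all DataNodes in a fixed, rack-interleaved order
    private int cursor = 0;             // next node to receive a replica

    SequentialPlacement(List<String> rackInterleavedNodes) {
        this.nodes = rackInterleavedNodes;
    }

    // Choose 'replicas' distinct nodes for one block, advancing the global cursor
    // so that load spreads evenly over successive blocks.
    List<String> placeBlock(int replicas) {
        List<String> chosen = new ArrayList<>();
        for (int i = 0; i < replicas && i < nodes.size(); i++) {
            chosen.add(nodes.get((cursor + i) % nodes.size()));
        }
        cursor = (cursor + replicas) % nodes.size();
        return chosen;
    }

    public static void main(String[] args) {
        // Interleaving racks in the node list keeps replicas of one block on different racks.
        SequentialPlacement p = new SequentialPlacement(List.of(
                "rackA-n1", "rackB-n1", "rackC-n1", "rackA-n2", "rackB-n2", "rackC-n2"));
        System.out.println(p.placeBlock(3)); // [rackA-n1, rackB-n1, rackC-n1]
        System.out.println(p.placeBlock(3)); // [rackA-n2, rackB-n2, rackC-n2]
    }
}
```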
APA, Harvard, Vancouver, ISO, and other styles
19

Traynor, Daniel, and Terry Froy. "Using Lustre and Slurm to process Hadoop workloads and extending to the WLCG." EPJ Web of Conferences 214 (2019): 04049. http://dx.doi.org/10.1051/epjconf/201921404049.

Full text
Abstract:
The Queen Mary University of London Grid site has investigated the use of its Lustre file system to support Hadoop workflows. Lustre is an open-source, POSIX-compatible, clustered file system often used in high-performance computing clusters and is often paired with the Slurm batch system. Hadoop is an open-source software framework for distributed storage and processing of data, normally run on dedicated hardware utilising the HDFS file system and the Yarn batch system. Hadoop is an important modern tool for data analytics used by a large range of organisations including CERN. By using our existing Lustre file system and Slurm batch system, the need for dedicated hardware is removed and only a single platform has to be maintained for data storage and processing. The motivation and benefits of using Hadoop with Lustre and Slurm are presented, and the installation, benchmarks, limitations and future plans are discussed. We also investigate using the standard WLCG Grid middleware Cream-CE service to provide a Grid-enabled Hadoop service.
APA, Harvard, Vancouver, ISO, and other styles
20

Maurya, Jay. "Contributions to Hadoop File System Architecture by Revising the File System Usage Along with Automatic Service." ECS Transactions 107, no. 1 (April 24, 2022): 2903–10. http://dx.doi.org/10.1149/10701.2903ecst.

Full text
Abstract:
The use of unstructured data by companies has become commonplace, and sharing on social media has risen heavily over the past decade: the sharing of images, audio, and video content by individual users and corporations can be observed everywhere. The current work focuses on revision contributions to the Hadoop framework so as to improve the performance of the ecosystem with respect to space and time. The architecture basically provides the usage of the Hadoop Distributed File System (HDFS) and MapReduce (MR). We propose certain revision contributions so that importing and processing tasks benefit in time and space usage in an effective and efficient manner. The work provides the service running in two different ways, which reduces the time requirements of cluster management; in the distributed environment, this revision helps reduce the waiting time for the start of the service. The other context we focus on is the local file system handler in the storage and processing of data: using the file system according to the proposed architecture handles the CPU context switches performed during import and export while jobs are running. The outcome of the work is a revised architecture that reflects service initiation by all the machines in the cluster and a file system revision approach that minimizes CPU context switches during the storage and processing aspects of the Hadoop cluster.
APA, Harvard, Vancouver, ISO, and other styles
21

Hou, Qing, Lei Pan, Jia Xi Xu, and Kai Zhou. "Educational Resources Cloud Platform Based on Hadoop." Advanced Materials Research 912-914 (April 2014): 1249–53. http://dx.doi.org/10.4028/www.scientific.net/amr.912-914.1249.

Full text
Abstract:
As the traditional educational resources platform has some deficiencies in storage, parallel processing, and cost, we designed a cloud platform for educational resources based on the Hadoop framework. The platform applies the HDFS distributed file system to store massive data in a distributed manner and applies the MapReduce distributed programming framework to process data in parallel and schedule resources, which solves the mass storage of resources and improves the efficiency of resource retrieval.
APA, Harvard, Vancouver, ISO, and other styles
22

Liao, Wenzhe. "Application of Hadoop in the Document Storage Management System for Telecommunication Enterprise." International Journal of Interdisciplinary Telecommunications and Networking 8, no. 2 (April 2016): 58–68. http://dx.doi.org/10.4018/ijitn.2016040106.

Full text
Abstract:
In view of the information management processes of a telecommunication enterprise, how to properly store electronic documents is a challenge. This paper presents the design of a document storage management system based on Hadoop, which uses the distributed file system HDFS and the distributed database HBase to achieve efficient access to electronic office documents in a steel-structure enterprise. The paper also describes an automatic small-file merge method using HBase, which simplifies the process of periodically joining small files by hand, resulting in improved system efficiency.
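As a rough illustration of storing a small document in HBase (with an illustrative table, column family, and ZooKeeper quorum, not the configuration described in the paper), a put of the raw file bytes keyed by document id might look like this:

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class StoreSmallDoc {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "zk.example.org"); // hypothetical quorum
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("documents"))) {
            byte[] content = Files.readAllBytes(Paths.get("contract-0042.pdf"));
            // Row key = document id; column family "f", qualifier "data" hold the bytes.
            Put put = new Put(Bytes.toBytes("contract-0042"));
            put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("data"), content);
            table.put(put);
        }
    }
}
```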
APA, Harvard, Vancouver, ISO, and other styles
23

Ren, Yitong, Zhaojun Gu, Zhi Wang, Zhihong Tian, Chunbo Liu, Hui Lu, Xiaojiang Du, and Mohsen Guizani. "System Log Detection Model Based on Conformal Prediction." Electronics 9, no. 2 (January 31, 2020): 232. http://dx.doi.org/10.3390/electronics9020232.

Full text
Abstract:
With the rapid development of the Internet of Things, combining the Internet of Things with machine learning, Hadoop, and other fields is a current development trend. The Hadoop Distributed File System (HDFS) is one of the core components of Hadoop and is used to process files that are divided into data blocks distributed across the cluster. Abnormalities in the distributed log data can cause serious losses. When machine learning algorithms are used for system log anomaly detection, the outputs of threshold-based classification models are only simple normal/abnormal predictions. This paper uses the statistical learning method of the conformity measure to calculate the similarity between test data and past experience. Compared with detection methods based on a static threshold, the conformity measure can dynamically adapt to changing log data, and by adjusting the maximum fault tolerance a system administrator can better manage and monitor the system logs. In addition, the computational efficiency of the statistical learning method for conformity measurement was improved. This paper implements an intranet anomaly detection model based on log analysis and conducts trial detection on HDFS data sets quickly and efficiently.
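A compact, generic sketch of the conformal idea used here, scoring a new log sequence against past nonconformity scores and turning it into a p-value to which a fault-tolerance threshold can be applied, is shown below; it illustrates the general method, not the authors' model, and the scores are invented.

```java
import java.util.List;

public class ConformalDetector {
    private final List<Double> calibrationScores; // nonconformity scores from past (normal) data
    private final double epsilon;                 // maximum tolerated fault rate, e.g. 0.05

    ConformalDetector(List<Double> calibrationScores, double epsilon) {
        this.calibrationScores = calibrationScores;
        this.epsilon = epsilon;
    }

    // Conformal p-value: fraction of calibration scores at least as extreme as the new score.
    double pValue(double newScore) {
        long atLeast = calibrationScores.stream().filter(s -> s >= newScore).count();
        return (atLeast + 1.0) / (calibrationScores.size() + 1.0);
    }

    // Flag the observation as anomalous when its p-value falls below the tolerance.
    boolean isAnomalous(double newScore) {
        return pValue(newScore) < epsilon;
    }

    public static void main(String[] args) {
        ConformalDetector d = new ConformalDetector(List.of(0.1, 0.2, 0.15, 0.3, 0.25), 0.2);
        System.out.println(d.isAnomalous(0.9));  // true: far outside past experience
        System.out.println(d.isAnomalous(0.18)); // false: consistent with past scores
    }
}
```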
APA, Harvard, Vancouver, ISO, and other styles
24

Hanafi, Idris, and Amal Abdel-Raouf. "P-Codec: Parallel Compressed File Decompression Algorithm for Hadoop." INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY 15, no. 8 (May 24, 2016): 6991–98. http://dx.doi.org/10.24297/ijct.v15i8.1500.

Full text
Abstract:
The increasing amount and size of data being handled by data analytic applications running on Hadoop has created a need for faster data processing. One of the effective methods for handling big data sizes is compression. Data compression not only makes network I/O processing faster, but also provides better utilization of resources. However, this approach defeats one of Hadoop’s main purposes, which is the parallelism of map and reduce tasks. The number of map tasks created is determined by the size of the file, so by compressing a large file, the number of mappers is reduced which in turn decreases parallelism. Consequently, standard Hadoop takes longer times to process. In this paper, we propose the design and implementation of a Parallel Compressed File Decompressor (P-Codec) that improves the performance of Hadoop when processing compressed data. P-Codec includes two modules; the first module decompresses data upon retrieval by a data node during the phase of uploading the data to the Hadoop Distributed File System (HDFS). This process reduces the runtime of a job by removing the burden of decompression during the MapReduce phase. The second P-Codec module is a decompressed map task divider that increases parallelism by dynamically changing the map task split sizes based on the size of the final decompressed block. Our experimental results using five different MapReduce benchmarks show an average improvement of approximately 80% compared to standard Hadoop.
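To make the parallelism argument concrete, here is a tiny sketch (with assumed sizes, not figures from the paper) of how the number of map tasks follows from file size and split size, and hence why one compressed, non-splittable file collapses to a single mapper until it is decompressed:

```java
public class SplitMath {
    // Number of map tasks Hadoop would schedule for a splittable file of the given size.
    static long numSplits(long fileBytes, long splitBytes) {
        return Math.max(1, (fileBytes + splitBytes - 1) / splitBytes); // ceiling division
    }

    public static void main(String[] args) {
        long splitSize = 128L * 1024 * 1024;          // assumed 128 MB split size
        long decompressed = 4L * 1024 * 1024 * 1024;  // assume ~4 GB after decompression

        // A 1 GB gzip file is not splittable, so it runs as a single mapper.
        System.out.println("Mappers while compressed (non-splittable): 1");
        // Once decompressed, the data can be divided into independent splits.
        System.out.println("Mappers after decompression: " + numSplits(decompressed, splitSize)); // 32
    }
}
```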
APA, Harvard, Vancouver, ISO, and other styles
25

Bhathal, Gurjit Singh, and Amardeep Singh Dhiman. "Big Data Security Challenges and Solution of Distributed Computing in Hadoop Environment: A Security Framework." Recent Advances in Computer Science and Communications 13, no. 4 (October 19, 2020): 790–97. http://dx.doi.org/10.2174/2213275912666190822095422.

Full text
Abstract:
Background: In the current internet scenario, large amounts of data are generated and processed. The Hadoop framework is widely used to store and process big data in a highly distributed manner, yet it is argued that the framework is not mature enough to deal with current cyberattacks on the data. Objective: The main objective of the proposed work is to provide a complete security approach comprising authorisation and authentication for the users and the Hadoop cluster nodes, and to secure the data at rest as well as in transit. Methods: The proposed algorithm uses the Kerberos network authentication protocol for authorisation and authentication and to validate the users and the cluster nodes. Ciphertext-Policy Attribute-Based Encryption (CP-ABE) is used for data at rest and data in transit: users encrypt files with their own set of attributes and store them on the Hadoop Distributed File System, and only intended users with matching parameters can decrypt those files. Results: The proposed algorithm was implemented with data sets of different sizes, processed with and without encryption. The results show little difference in processing time; performance was affected in the range of 0.8% to 3.1%, which also includes the impact of other factors such as system configuration, the number of parallel jobs running, and the virtual environment. Conclusion: The solutions available for handling the big data security problems faced in the Hadoop framework are inefficient or incomplete. A complete security framework is proposed for the Hadoop environment, and the solution is experimentally shown to have little effect on system performance for datasets of different sizes.
APA, Harvard, Vancouver, ISO, and other styles
26

Aung, Ohnmar, and Thandar Thein. "Enhancing NameNode Fault Tolerance in Hadoop Distributed File System." International Journal of Computer Applications 87, no. 12 (February 14, 2014): 41–47. http://dx.doi.org/10.5120/15264-4020.

Full text
APA, Harvard, Vancouver, ISO, and other styles
27

Chang, Ruay-Shiung, Chih-Shan Liao, Kuo-Zheng Fan, and Chia-Ming Wu. "Dynamic Deduplication Decision in a Hadoop Distributed File System." International Journal of Distributed Sensor Networks 10, no. 4 (January 2014): 630380. http://dx.doi.org/10.1155/2014/630380.

Full text
APA, Harvard, Vancouver, ISO, and other styles
28

Saranya, S., M. Sarumathi, B. Swathi, P. Victer Paul, S. Sampath Kumar, and T. Vengattaraman. "Dynamic Preclusion of Encroachment in Hadoop Distributed File System." Procedia Computer Science 50 (2015): 531–36. http://dx.doi.org/10.1016/j.procs.2015.04.027.

Full text
APA, Harvard, Vancouver, ISO, and other styles
29

Achandair, O., S. Bourekkadi, E. Elmahouti, S. Khoulji, and M. L. Kerkeb. "solution for the future: small file management by optimizing Hadoop." International Journal of Engineering & Technology 7, no. 2.6 (March 11, 2018): 221. http://dx.doi.org/10.14419/ijet.v7i2.6.10773.

Full text
Abstract:
The Hadoop Distributed File System (HDFS) is designed to reliably store very large files across the machines in a large cluster. It is one of the most widely used distributed file systems and offers high availability and scalability on low-cost hardware. Every Hadoop framework has HDFS as its storage component, and coupled with MapReduce, the processing component, HDFS and MapReduce have become the standard platform for managing big data today. By design HDFS can handle huge numbers of large files, but when deployed to handle large amounts of small files it may not be very effective. This paper puts forward a new strategy for managing small files. The approach consists of two principal phases. The first phase consolidates a client's input files, storing them contiguously in a particular allocated block in SequenceFile format, and so on into the next blocks. In this way we avoid using multiple block allocations for different streams, which reduces calls for available blocks and also reduces the metadata memory on the NameNode, since a group of small files packaged in a SequenceFile on the same block requires one entry instead of one for each small file. The second phase analyses the attributes of the stored small files so that they can be distributed in such a way that the most frequently accessed files are referenced by an additional index in MapFile format, reducing the read overhead during random access.
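A minimal sketch of the packing step in the first phase, appending many small local files into one HDFS SequenceFile keyed by file name, is given below; the staging directory, output path, and NameNode address are assumed for illustration.

```java
import java.io.File;
import java.nio.file.Files;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class PackSmallFiles {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode.example.org:9000"); // hypothetical
        Path packed = new Path("/packed/batch-0001.seq");

        File[] smallFiles = new File("/data/incoming").listFiles(); // assumed local staging dir
        if (smallFiles == null) return;
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(packed),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class))) {
            for (File f : smallFiles) {
                byte[] bytes = Files.readAllBytes(f.toPath());
                // Key = original file name, value = raw contents; one NameNode entry
                // now covers the whole batch instead of one entry per small file.
                writer.append(new Text(f.getName()), new BytesWritable(bytes));
            }
        }
    }
}
```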
APA, Harvard, Vancouver, ISO, and other styles
30

Elshayeb, M., and Leelavathi Rajamanickam. "HDFS Security Approaches and Visualization Tracking." Journal of Engineering & Technological Advances 3, no. 1 (2018): 49–60. http://dx.doi.org/10.35934/segi.v3i1.49.

Full text
Abstract:
Big Data refers to large-scale information management and analysis technologies that exceed the capability of traditional data processing technologies. In order to analyse complex data and to identify patterns, it is very important to securely store, manage, and share large amounts of complex data. In recent years, with database sizes increasing in various forms (text, images and videos), in huge volumes and with high velocity, data-intensive services that use the internet and depend on big data have come to the leading edge. Apache's Hadoop distributed file system (HDFS) is emerging as an outstanding software component for cloud computing, combined with integrated pieces such as MapReduce. Hadoop, which provides a distributed file system and presents to programmers the map and reduce abstractions, is an open-source implementation of Google's MapReduce. This research examines security approaches for the Big Data Hadoop distributed file system and identifies the best security solution; it will also help businesses through big data visualization, which supports better data analysis. In today's data-centric world, big-data processing and analytics have become critical to most enterprise and government applications.
APA, Harvard, Vancouver, ISO, and other styles
31

Chaudhary, Abhay, K. R. Batwada, Namita Mittal, and Emmanuel S. Pilli. "AdMap: a framework for advertising using MapReduce pipeline." Computer Science and Information Technologies 3, no. 2 (July 1, 2022): 82–93. http://dx.doi.org/10.11591/csit.v3i2.p82-93.

Full text
Abstract:
Due to tremendous development in digital marketing, there is a vast collection of consumer data. Whether for advertisements or for validating nearby services already added to the dataset systems, consumers are increasingly concerned with the amount of data collected, and hence a gap has formed between the producer and the client. To fill that gap, a framework is needed that can facilitate all the needs for query updating of the data. Present systems fall short when a vast amount of information repeatedly leads to a decision tree based approach. A systematic solution for the automated incorporation of data into a Hadoop distributed file system (HDFS) warehouse includes a data hub server, a generic data loading mechanism, and a metadata model. In our framework, the database is able to govern the data processing schema. In the future, as a variety of data is archived, the data lake will play a critical role in managing that data. To carry out a planned loading function, the configuration files of the immense catalogue are moved to the data hub server to attach the miscellaneous details dynamically to its schemas.
APA, Harvard, Vancouver, ISO, and other styles
32

Kharlampenkov, I. E., and A. U. Oshchepkov. "Conversion and Display of the Calculated Data of Spectral Remote Sensing Data on the Basis of GeoServer Extensions and Distributed Storage Technologies." Programmnaya Ingeneria 12, no. 2 (March 16, 2021): 107–12. http://dx.doi.org/10.17587/prin.12.107-112.

Full text
Abstract:
The article presents methods for caching and displaying data from spectral satellite images using libraries of distributed computing systems that are part of the Apache Hadoop ecosystem, together with GeoServer extensions. The authors give a brief overview of existing tools that can present remote sensing data using distributed information technologies. A distinctive feature is the way remote sensing data are converted inside Apache Parquet files for further display. This approach allows interaction with the distributed file system via the Kite SDK libraries and allows additional Apache Hadoop-based data processors to be plugged in as external services. A comparative analysis of existing tools such as GeoMesa and GeoWave is performed. The following steps are described: extracting data from Apache Parquet via the Kite SDK, converting these data to a GDAL Dataset, iterating over the received data, and saving them in the file system in BIL format, which is used here for the GeoServer cache. The extension was implemented and published under the Apache License on GitHub. The article concludes with instructions for installing and using the created extension.
APA, Harvard, Vancouver, ISO, and other styles
33

Uriti, Archana, Surya Prakash Yalla, and Chunduru Anilkumar. "Understand the working of Sqoop and hive in Hadoop." Applied and Computational Engineering 6, no. 1 (June 14, 2023): 312–17. http://dx.doi.org/10.54254/2755-2721/6/20230798.

Full text
Abstract:
In past decades, analysis of structured and consistent data has seen huge success, but analysing multimedia data in unstructured format remains a challenging task. Here, big data refers to the huge volumes of data that can be processed in a distributed manner. Big data can be analysed using the Hadoop tool, which contains the Hadoop Distributed File System (HDFS) storage layer and several inbuilt components, and which manages distributed data placed in clusters. This paper shows the working of Sqoop and Hive in Hadoop. Sqoop (SQL-to-Hadoop) is a Hadoop component designed to efficiently import huge data from traditional databases into HDFS and vice versa. Hive is open-source software for managing large data files stored in HDFS. To show how they work, we take Instagram, a highly popular social media application, and analyse the data generated from it, mining and utilizing it with Sqoop and Hive. The results demonstrate that Sqoop and Hive can deliver results efficiently, and the paper gives the details of how Sqoop and Hive work in Hadoop.
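For a sense of the query side of such a workflow, the following hedged sketch runs a HiveQL aggregation over a hypothetical table of imported posts through the standard Hive JDBC driver (the hive-jdbc dependency must be on the classpath); the host, database, table, column names, and credentials are assumptions, not details from the paper.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveTopHashtags {
    public static void main(String[] args) throws Exception {
        // HiveServer2 endpoint is hypothetical; the table would typically have been
        // populated beforehand, e.g. by a Sqoop import from a relational database.
        String url = "jdbc:hive2://hiveserver.example.org:10000/social";
        try (Connection conn = DriverManager.getConnection(url, "analyst", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT hashtag, COUNT(*) AS uses " +
                     "FROM instagram_posts GROUP BY hashtag " +
                     "ORDER BY uses DESC LIMIT 10")) {
            while (rs.next()) {
                System.out.println(rs.getString("hashtag") + "\t" + rs.getLong("uses"));
            }
        }
    }
}
```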
APA, Harvard, Vancouver, ISO, and other styles
34

Mao, Yingchi, Bicong Jia, Wei Min, and Jiulong Wang. "Optimization Scheme for Small Files Storage Based on Hadoop Distributed File System." International Journal of Database Theory and Application 8, no. 5 (October 31, 2015): 241–54. http://dx.doi.org/10.14257/ijdta.2015.8.5.21.

Full text
APA, Harvard, Vancouver, ISO, and other styles
35

Wei, Chih-Chiang, and Tzu-Hao Chou. "Typhoon Quantitative Rainfall Prediction from Big Data Analytics by Using the Apache Hadoop Spark Parallel Computing Framework." Atmosphere 11, no. 8 (August 17, 2020): 870. http://dx.doi.org/10.3390/atmos11080870.

Full text
Abstract:
Situated in the main tracks of typhoons in the Northwestern Pacific Ocean, Taiwan frequently encounters disasters from heavy rainfall during typhoons. Accurate and timely typhoon rainfall prediction is an imperative topic that must be addressed. The purpose of this study was to develop a Hadoop Spark distribute framework based on big-data technology, to accelerate the computation of typhoon rainfall prediction models. This study used deep neural networks (DNNs) and multiple linear regressions (MLRs) in machine learning, to establish rainfall prediction models and evaluate rainfall prediction accuracy. The Hadoop Spark distributed cluster-computing framework was the big-data technology used. The Hadoop Spark framework consisted of the Hadoop Distributed File System, MapReduce framework, and Spark, which was used as a new-generation technology to improve the efficiency of the distributed computing. The research area was Northern Taiwan, which contains four surface observation stations as the experimental sites. This study collected 271 typhoon events (from 1961 to 2017). The following results were obtained: (1) in machine-learning computation, prediction errors increased with prediction duration in the DNN and MLR models; and (2) the system of Hadoop Spark framework was faster than the standalone systems (single I7 central processing unit (CPU) and single E3 CPU). When complex computation is required in a model (e.g., DNN model parameter calibration), the big-data-based Hadoop Spark framework can be used to establish highly efficient computation environments. In summary, this study successfully used the big-data Hadoop Spark framework with machine learning, to develop rainfall prediction models with effectively improved computing efficiency. Therefore, the proposed system can solve problems regarding real-time typhoon rainfall prediction with high timeliness and accuracy.
APA, Harvard, Vancouver, ISO, and other styles
36

Suresh, S., and N. P. Gopalan. "Delay Scheduling Based Replication Scheme for Hadoop Distributed File System." International Journal of Information Technology and Computer Science 7, no. 4 (March 8, 2015): 73–78. http://dx.doi.org/10.5815/ijitcs.2015.04.08.

Full text
APA, Harvard, Vancouver, ISO, and other styles
37

Lee, Jungha, Jaehwa Chung, and Daewon Lee. "Efficient Data Replication Scheme based on Hadoop Distributed File System." International Journal of Software Engineering and Its Applications 9, no. 12 (December 31, 2015): 177–86. http://dx.doi.org/10.14257/ijseia.2015.9.12.16.

Full text
APA, Harvard, Vancouver, ISO, and other styles
38

Suganya, S., S. Selvamuthukumaran, S. Swaminathan, and V. Kalaichelvi. "Securing Hadoop Distributed File System using Intelligence based Sensitivity bits." Journal of Physics: Conference Series 1362 (November 2019): 012060. http://dx.doi.org/10.1088/1742-6596/1362/1/012060.

Full text
APA, Harvard, Vancouver, ISO, and other styles
39

Qin, Ren, Gao Jue, Gao Honghao, Bian Minjie, Xu Huahu, and Feng Weibing. "Research on Improved Hadoop Distributed File System in Cloud Rendering." International Journal of Database Theory and Application 9, no. 11 (November 30, 2016): 1–12. http://dx.doi.org/10.14257/ijdta.2016.9.11.01.

Full text
APA, Harvard, Vancouver, ISO, and other styles
40

Al-Masadeh, Mohammad Bahjat, Mohad Sanusi Azmi, and Sharifah Sakinah Syed Ahmad. "Tiny datablock in saving Hadoop distributed file system wasted memory." International Journal of Electrical and Computer Engineering (IJECE) 13, no. 2 (April 1, 2023): 1757. http://dx.doi.org/10.11591/ijece.v13i2.pp1757-1772.

Full text
Abstract:
The Hadoop distributed file system (HDFS) is the file system Hadoop uses to store all incoming data. Since its introduction, HDFS has consumed a large amount of memory to serve a normal dataset. The current file-saving mechanism in HDFS stores only one file in one datablock; thus a file of just 5 MB takes up a whole datablock, making the rest of that memory unavailable for other incoming files, which is a huge waste of memory when serving a normal-sized dataset. This paper proposes a method called tiny datablock-HDFS (TD-HDFS) to increase the usability of HDFS memory and increase file-hosting capability by reducing the datablock size to the minimum capacity and then merging all related datablocks into one master datablock. This master datablock consists of tiny virtual datablocks that hold related small files together and exploit the full memory of the master datablock. The result of this study is a running HDFS with a minimal amount of wasted memory and the same read/write performance. The results were examined through a comparison between standard HDFS file hosting and the proposed solution.
APA, Harvard, Vancouver, ISO, and other styles
41

Ünver, Mahmut, Atilla Ergüzen, and Erdal Erdal. "Design of a DFS to Manage Big Data in Distance Education Environments." JUCS - Journal of Universal Computer Science 28, no. 2 (February 28, 2022): 202–24. http://dx.doi.org/10.3897/jucs.69069.

Full text
Abstract:
Information technologies have invaded every aspect of our lives. Distance education was also affected by this phase and became an accepted model of education. The evolution of education into a digital platform has also brought unexpected problems, such as the increase in internet usage, the need for new software and devices that can connect to the Internet. Perhaps the most important of these problems is the management of the large amounts of data generated when all training activities are conducted remotely. Over the past decade, studies have provided important information about the quality of training and the benefits of distance learning. However, Big Data in distance education has been studied only to a limited extent, and to date no clear single solution has been found. In this study, a Distributed File Systems (DFS) is proposed and implemented to manage big data in distance education. The implemented ecosystem mainly contains the elements Dynamic Link Library (DLL), Windows Service Routines and distributed data nodes. DLL codes are required to connect Learning Management System (LMS) with the developed system. 67.72% of the files in the distance education system have small file size (<=16 MB) and 53.10% of the files are smaller than 1 MB. Therefore, a dedicated Big Data management platform was needed to manage and archive small file sizes. The proposed system was designed with a dynamic block structure to address this shortcoming. A serverless architecture has been chosen and implemented to make the platform more robust. Moreover, the developed platform also has compression and encryption features. According to system statistics, each written file was read 8.47 times, and for video archive files, this value was 20.95. In this way, a framework was developed in the Write Once Read Many architecture. A comprehensive performance analysis study was conducted using the operating system, NoSQL, RDBMS and Hadoop. Thus, for file sizes 1 MB and 50 MB, the developed system achieves a response time of 0.95 ms and 22.35 ms, respectively, while Hadoop, a popular DFS, has 4.01 ms and 47.88 ms, respectively.
APA, Harvard, Vancouver, ISO, and other styles
42

Zhang, Bo, Ya Yao Zuo, and Zu Chuan Zhang. "Research and Improvement of the Hot Small File Storage Performance under HDFS." Advanced Materials Research 756-759 (September 2013): 1450–54. http://dx.doi.org/10.4028/www.scientific.net/amr.756-759.1450.

Full text
Abstract:
In order to deal with the large number of small files and hotspot data in the Hadoop distributed file system (HDFS), and building on existing proposals, this paper proposes a new hotspot data processing model. The model proposes changing the block size, introducing an efficient indexing mechanism, improving the dynamic replica management strategy, and designing a new HDFS architecture to save space, speed up system processing, and enhance security.
APA, Harvard, Vancouver, ISO, and other styles
43

Cho, Joong-Yeon, Hyun-Wook Jin, Min Lee, and Karsten Schwan. "Dynamic core affinity for high-performance file upload on Hadoop Distributed File System." Parallel Computing 40, no. 10 (December 2014): 722–37. http://dx.doi.org/10.1016/j.parco.2014.07.005.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Su, Kai Kai, Wen Sheng Xu, and Jian Yong Li. "Research on Mass Manufacturing Resource Sensory Data Management Based on Hadoop." Key Engineering Materials 693 (May 2016): 1880–85. http://dx.doi.org/10.4028/www.scientific.net/kem.693.1880.

Full text
Abstract:
Aiming at the management of mass sensory data from manufacturing resources in cloud manufacturing, a management method for mass sensory data based on Hadoop is proposed. Firstly, the characteristics of sensory data in cloud manufacturing are analyzed, and the meaning and advantages of the Internet of Things and cloud computing are elaborated. Then the structure of the cloud manufacturing service platform is proposed based on Hadoop, the information model of manufacturing resources in cloud manufacturing is defined, and the data cloud in the cloud manufacturing service platform is designed. The distributed storage of mass sensory data is implemented and a universal distributed computing model for mass sensory data is established based on the characteristics of the Hadoop Distributed File System (HDFS).
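
To make the idea of a distributed computing model over HDFS-resident sensory data concrete, here is a minimal Hadoop MapReduce job that averages readings per manufacturing resource. The comma-separated input format, paths and class names are assumptions for illustration; the paper's actual model may differ.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Sketch of a MapReduce job over sensory data stored in HDFS. Each input line
// is assumed to look like "resourceId,value"; the job emits the average value
// per manufacturing resource. Class names and paths are illustrative.
public class SensorAverageJob {

    public static class ParseMapper extends Mapper<LongWritable, Text, Text, DoubleWritable> {
        @Override
        protected void map(LongWritable key, Text line, Context context)
                throws IOException, InterruptedException {
            String[] parts = line.toString().split(",");
            if (parts.length == 2) {
                context.write(new Text(parts[0]), new DoubleWritable(Double.parseDouble(parts[1])));
            }
        }
    }

    public static class AverageReducer extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
        @Override
        protected void reduce(Text resource, Iterable<DoubleWritable> values, Context context)
                throws IOException, InterruptedException {
            double sum = 0;
            long count = 0;
            for (DoubleWritable v : values) {
                sum += v.get();
                count++;
            }
            context.write(resource, new DoubleWritable(sum / count));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "sensor-average");
        job.setJarByClass(SensorAverageJob.class);
        job.setMapperClass(ParseMapper.class);
        job.setReducerClass(AverageReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);
        FileInputFormat.addInputPath(job, new Path("/sensors/raw"));
        FileOutputFormat.setOutputPath(job, new Path("/sensors/averages"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```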
APA, Harvard, Vancouver, ISO, and other styles
45

Shen, Bo, Wei Huang, and Xiaodi Li. "Design and Construction of Distributed JavaScript Parsing System." International Journal of Interdisciplinary Telecommunications and Networking 6, no. 4 (October 2014): 1–14. http://dx.doi.org/10.4018/ijitn.2014100101.

Full text
Abstract:
With the rapid development of Internet technology, JS (short for JavaScript), one of the most representative and powerful scripting languages, is becoming increasingly popular with developers and users. However, JS programming is more complex than the usual static techniques. In the fields of search engines and information acquisition, it is very difficult to obtain the information hidden in script code. In this paper, the authors design a distributed system for parsing the JS code embedded in HTML files and retrieving the underlying information. The authors describe how to extract JS code from HTML files and parse it. They also introduce a task scheduling algorithm for the JS parsing system that employs Hadoop distributed computing technology. The experimental results indicate that the proposed algorithm and system can achieve reasonable task scheduling efficiency and parse JS code rapidly.
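
The extraction step described above can be pictured with a small Java sketch that pulls <script> bodies out of an HTML string. A regular expression is used here purely for brevity; a real system would use a proper HTML parser, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified sketch of the "extract JS from HTML" step. A regex is used only
// for illustration; a production crawler would rely on a real HTML parser.
public class ScriptExtractor {

    private static final Pattern SCRIPT =
            Pattern.compile("<script[^>]*>(.*?)</script>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    public static List<String> extract(String html) {
        List<String> scripts = new ArrayList<>();
        Matcher m = SCRIPT.matcher(html);
        while (m.find()) {
            String body = m.group(1).trim();
            if (!body.isEmpty()) {          // skip external <script src=...> tags with empty bodies
                scripts.add(body);
            }
        }
        return scripts;
    }

    public static void main(String[] args) {
        String html = "<html><body><script>var x = 1 + 2;</script></body></html>";
        extract(html).forEach(System.out::println);
    }
}
```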
APA, Harvard, Vancouver, ISO, and other styles
46

Jayakumar, N., and A. M. Kulkarni. "A Simple Measuring Model for Evaluating the Performance of Small Block Size Accesses in Lustre File System." Engineering, Technology & Applied Science Research 7, no. 6 (December 18, 2017): 2313–18. http://dx.doi.org/10.48084/etasr.1557.

Full text
Abstract:
Storage performance is one of the vital characteristics of a big data environment. Data throughput can be increased to some extent using storage virtualization and parallel data paths. Technology has enhanced the various SANs and storage topologies to be adaptable to diverse applications and improve end-to-end performance. In big data environments the most commonly used file systems are HDFS (Hadoop Distributed File System) and Lustre. There are environments in which both HDFS and Lustre are connected, and the applications work directly on Lustre. In a Lustre architecture with an out-of-band storage virtualization system, the separation of the data path from the metadata path is acceptable (and even desirable) for large files, since one MDT (Metadata Target) open RPC is typically a small fraction of the total number of read or write RPCs. This hurts small-file performance significantly, however, when there is only a single read or write RPC for the file data. Since applications require data for processing, and in-situ architectures bring data or metadata close to the applications, how in-situ processing can be exploited in Lustre is the domain of this dissertation work. Earlier research exploited Lustre's support for in-situ processing when Hadoop/MapReduce is integrated with Lustre, but scope for performance improvement still existed in Lustre. The aim of the research is to check whether it is feasible and beneficial to move small files to the MDT so that additional RPCs and I/O overhead can be eliminated and the read/write performance of the Lustre file system can be improved.
APA, Harvard, Vancouver, ISO, and other styles
47

Yue, Hang. "Unstructured Healthcare Data Archiving and Retrieval Using Hadoop and Drill." International Journal of Big Data and Analytics in Healthcare 3, no. 2 (July 2018): 28–44. http://dx.doi.org/10.4018/ijbdah.2018070103.

Full text
Abstract:
A healthcare hybrid Hadoop ecosystem is analyzed for unstructured healthcare data archives. This hybrid ecosystem is composed of components such as Pig, Hive, Sqoop, ZooKeeper, the Hadoop Distributed File System (HDFS), MapReduce and HBase. In addition, Apache Drill is applied for unstructured healthcare data retrieval. This article discusses the combination of Hadoop and Drill for data analysis applications. Based on the analysis of the Hadoop components (including the HBase design) and the case studies of Drill query design for different kinds of unstructured healthcare data, the Hadoop ecosystem and Drill are valid tools to integrate and access voluminous, complex healthcare data. They can improve healthcare systems, achieve savings on patient care costs, optimize the healthcare supply chain and infer useful knowledge from noisy and heterogeneous healthcare data sources.
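
Drill exposes a standard JDBC interface, so a client-side query over a semi-structured healthcare archive might look like the sketch below. The connection URL, file path and field names are assumptions for illustration, and the Drill JDBC driver is expected to be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Sketch of querying semi-structured healthcare records through Drill's JDBC
// interface. URL, workspace and file path are assumptions, not values from the paper.
public class DrillQueryExample {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:drill:drillbit=localhost";   // assumed single-drillbit setup
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT patientId, diagnosis FROM dfs.`/archive/notes.json` LIMIT 10")) {
            while (rs.next()) {
                System.out.println(rs.getString("patientId") + " -> " + rs.getString("diagnosis"));
            }
        }
    }
}
```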
APA, Harvard, Vancouver, ISO, and other styles
48

ABDALWAHID, Shadan Mohammed Jihad, Raghad Zuhair YOUSIF, and Shahab Wahhab KAREEM. "ENHANCING APPROACH USING HYBRID PAILLER AND RSA FOR INFORMATION SECURITY IN BIGDATA." Applied Computer Science 15, no. 4 (December 30, 2019): 63–74. http://dx.doi.org/10.35784/acs-2019-30.

Full text
Abstract:
The amount of data processed and stored in the cloud is growing dramatically. Traditional storage devices, at both the hardware and software levels, cannot meet the requirements of the cloud. This fact motivates the need for a platform which can handle this problem. Hadoop is a platform that has been deployed to overcome this big data problem and often uses the MapReduce architecture to process the vast amounts of data of the cloud system. Hadoop has no strategy to assure the safety and confidentiality of the files saved inside the Hadoop Distributed File System (HDFS). In the cloud, the protection of sensitive data is a critical issue in which data encryption schemes play a vital role. This research proposes a hybrid system combining two well-known asymmetric key cryptosystems (RSA and Paillier) to encrypt the files stored in HDFS. Thus, before saving data in HDFS, the proposed cryptosystem is used to encrypt the data. Each user of the cloud may upload files in one of two ways, unsecured or secure. The hybrid system shows higher computational complexity and less latency in comparison to the RSA cryptosystem alone.
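
For readers unfamiliar with the Paillier half of the hybrid scheme, the following is a minimal textbook Paillier sketch in Java (key generation, encryption, decryption). It is not the authors' implementation, and the 512-bit primes are far too small for real use.

```java
import java.math.BigInteger;
import java.security.SecureRandom;

// Minimal textbook Paillier sketch, shown only to illustrate the Paillier half
// of a hybrid scheme. Key sizes and structure are for demonstration only.
public class PaillierSketch {

    private static final SecureRandom RNG = new SecureRandom();

    final BigInteger n, nSquared, g, lambda, mu;

    PaillierSketch(int bits) {
        BigInteger p = BigInteger.probablePrime(bits, RNG);
        BigInteger q = BigInteger.probablePrime(bits, RNG);
        n = p.multiply(q);
        nSquared = n.multiply(n);
        g = n.add(BigInteger.ONE);                       // common choice g = n + 1
        BigInteger pMinus1 = p.subtract(BigInteger.ONE);
        BigInteger qMinus1 = q.subtract(BigInteger.ONE);
        lambda = pMinus1.multiply(qMinus1).divide(pMinus1.gcd(qMinus1)); // lcm(p-1, q-1)
        mu = lambda.modInverse(n);                       // valid because g = n + 1
    }

    BigInteger encrypt(BigInteger m) {
        BigInteger r;
        do {                                             // random r in Z*_n
            r = new BigInteger(n.bitLength(), RNG).mod(n);
        } while (r.signum() == 0 || !r.gcd(n).equals(BigInteger.ONE));
        // c = g^m * r^n mod n^2
        return g.modPow(m, nSquared).multiply(r.modPow(n, nSquared)).mod(nSquared);
    }

    BigInteger decrypt(BigInteger c) {
        // L(x) = (x - 1) / n, then m = L(c^lambda mod n^2) * mu mod n
        BigInteger l = c.modPow(lambda, nSquared).subtract(BigInteger.ONE).divide(n);
        return l.multiply(mu).mod(n);
    }

    public static void main(String[] args) {
        PaillierSketch paillier = new PaillierSketch(512);
        BigInteger m = BigInteger.valueOf(42);
        System.out.println(paillier.decrypt(paillier.encrypt(m)));  // prints 42
    }
}
```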
APA, Harvard, Vancouver, ISO, and other styles
49

Hanisah Kamaruzaman, Siti, Wan Nor Shuhadah Wan Nik, Mohamad Afendee Mohamed, and Zarina Mohamad. "Design and Implementation of Data-at-Rest Encryption for Hadoop." International Journal of Engineering & Technology 7, no. 2.15 (April 6, 2018): 54. http://dx.doi.org/10.14419/ijet.v7i2.15.11212.

Full text
Abstract:
The security aspects of cloud computing are paramount in order to ensure a high-quality Service Level Agreement (SLA) for cloud computing customers. This issue is more apparent when very large amounts of data are involved in this emerging computing environment. Hadoop is an open-source software framework that supports the storage and processing of large data sets in a distributed computing environment and is a well-known implementation of MapReduce. MapReduce is a common programming model for processing and handling large amounts of data, specifically in big data analysis. Further, the Hadoop Distributed File System (HDFS) is a distributed, scalable and portable file system written in Java for the Hadoop framework. However, the main problem is that data at rest is not secure: intruders can steal or corrupt the data stored in this computing environment. Therefore, the AES encryption algorithm has been implemented in HDFS to ensure the security of data stored in HDFS. It is shown that the implementation of the AES encryption algorithm is capable of securing data stored in HDFS to some extent.
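
One common way to realize client-side data-at-rest encryption on the HDFS write path is to wrap the HDFS output stream in an AES CipherOutputStream, as in the hedged sketch below; key management is omitted and the target path is illustrative, so this should be read as one possible shape of the approach rather than the paper's exact code.

```java
import java.security.SecureRandom;

import javax.crypto.Cipher;
import javax.crypto.CipherOutputStream;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch of AES encryption on the write path: the client encrypts bytes before
// they reach HDFS by wrapping the HDFS output stream in a CipherOutputStream.
// Key storage and distribution are deliberately omitted.
public class EncryptedHdfsWriter {

    public static void main(String[] args) throws Exception {
        // Generate a fresh AES-128 key and a random IV (in practice these would
        // come from a key management service, not be created ad hoc).
        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(128);
        SecretKey key = keyGen.generateKey();
        byte[] iv = new byte[16];
        new SecureRandom().nextBytes(iv);

        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));

        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path target = new Path("/secure/records.enc");   // illustrative path

        try (FSDataOutputStream hdfsOut = fs.create(target);
             CipherOutputStream out = new CipherOutputStream(hdfsOut, cipher)) {
            out.write("sensitive payload".getBytes("UTF-8"));
        }
    }
}
```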
APA, Harvard, Vancouver, ISO, and other styles
50

Joshi, Brijesh Y., Poornashankar ., and Deepali Sawai. "Performance Tuning Of Apache Spark Framework In Big Data Processing with Respect To Block Size And Replication Factor." SAMRIDDHI : A Journal of Physical Sciences, Engineering and Technology 14, no. 02 (June 30, 2022): 152–58. http://dx.doi.org/10.18090/samriddhi.v14i02.4.

Full text
Abstract:
Apache Spark has recently become the most popular big data analytics framework, and it ships with default configuration values. HDFS stands for Hadoop Distributed File System; it means that large files are physically stored on multiple nodes in a distributed fashion. The block size determines how large files are distributed, while the replication factor determines how reliable the files are: if there is just one copy of each block of a given file and that node fails, the data in the file becomes unreadable. The block size and replication factor are configurable per file. This paper describes the results and analysis of an experimental study to determine the efficiency of tuning Apache Spark settings to minimize application execution time compared with the standard values. Based on a vast number of studies, we employed a trial-and-error strategy to fine-tune these values. We chose two workloads, WordCount and TeraSort, to test the Apache framework for comparative analysis, and used the elapsed time as the evaluation metric.
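
Since the abstract points out that block size and replication factor are configurable per file, the following sketch shows how a file can be created through the HDFS Java API with its own block size and replication factor. The 256 MB block size and replication factor of 2 are example values, not the paper's recommendation.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch of setting block size and replication factor per file via the HDFS
// Java API, the two parameters the paper tunes. Values are illustrative.
public class PerFileBlockConfig {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path output = new Path("/benchmarks/terasort-input.dat");
        short replication = 2;                    // number of copies of each block
        long blockSize = 256L * 1024 * 1024;      // 256 MB blocks for this file only
        int bufferSize = conf.getInt("io.file.buffer.size", 4096);

        try (FSDataOutputStream out =
                     fs.create(output, true, bufferSize, replication, blockSize)) {
            out.writeBytes("sample record\n");
        }
    }
}
```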
APA, Harvard, Vancouver, ISO, and other styles