Academic literature on the topic 'Hadoop Distributed File System'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Hadoop Distributed File System.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "Hadoop Distributed File System"

1

Giri, Pratit Raj, and Gajendra Sharma. "Apache Hadoop Architecture, Applications, and Hadoop Distributed File System." Semiconductor Science and Information Devices 4, no. 1 (May 18, 2022): 14. http://dx.doi.org/10.30564/ssid.v4i1.4619.

Full text
Abstract:
Data and internet usage are growing rapidly, which creates problems in the management of big data. For these kinds of problems, many software frameworks are used to increase the performance of distributed systems and to provide large-scale data storage. One of the most beneficial software frameworks for utilizing data in distributed systems is Hadoop. This paper introduces the Apache Hadoop architecture, the components of Hadoop, and their significance in managing vast volumes of data in a distributed system. The Hadoop Distributed File System enables the storage of enormous chunks of data over a distributed network. The Hadoop framework maintains the fsImage and edits files, which support the availability and integrity of data. This paper includes cases of Hadoop implementation, such as weather monitoring and bioinformatics processing.
APA, Harvard, Vancouver, ISO, and other styles
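The fsImage/edit-log mechanism this abstract mentions can be illustrated with a toy model: the namespace is periodically checkpointed to an fsImage, and operations since the last checkpoint are replayed from the edit log on restart. This is a sketch of the idea only, not Hadoop's actual classes; all names below are illustrative.

```python
import json

class ToyNameNode:
    """Toy model of HDFS namespace persistence: an fsImage checkpoint
    plus an edit log that is replayed on recovery (illustrative only)."""

    def __init__(self):
        self.namespace = {}   # path -> list of block ids
        self.edit_log = []    # operations since the last checkpoint

    def create(self, path, blocks):
        self.edit_log.append(("create", path, blocks))
        self.namespace[path] = blocks

    def delete(self, path):
        self.edit_log.append(("delete", path, None))
        self.namespace.pop(path, None)

    def checkpoint(self):
        """Merge the current namespace into a new fsImage (here: JSON)
        and clear the edit log."""
        fsimage = json.dumps(self.namespace)
        self.edit_log = []
        return fsimage

    @classmethod
    def recover(cls, fsimage, edit_log):
        """Rebuild the namespace from the last fsImage plus logged edits."""
        node = cls()
        node.namespace = json.loads(fsimage)
        for op, path, blocks in edit_log:
            if op == "create":
                node.namespace[path] = blocks
            else:
                node.namespace.pop(path, None)
        return node
```

Recovery from the checkpoint plus the replayed log yields the same namespace as before the simulated crash, which is the availability/integrity property the abstract refers to.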
2

Bartus, Paul. "Using Hadoop Distributed and Deduplicated File System (HD2FS) in Astronomy." Proceedings of the International Astronomical Union 15, S367 (December 2019): 464–66. http://dx.doi.org/10.1017/s1743921321000387.

Full text
Abstract:
During the last years, the amount of data has skyrocketed. As a consequence, the data has become more expensive to store than to generate. The storage needs for astronomical data are also following this trend. Storage systems in Astronomy contain redundant copies of data such as identical files or within sub-file regions. We propose the use of the Hadoop Distributed and Deduplicated File System (HD2FS) in Astronomy. HD2FS is a deduplication storage system that was created to improve data storage capacity and efficiency in distributed file systems without compromising Input/Output performance. HD2FS can be developed by modifying existing storage system environments such as the Hadoop Distributed File System. By taking advantage of deduplication technology, we can better manage the underlying redundancy of data in astronomy and reduce the space needed to store these files in the file systems, thus allowing for more capacity per volume.
APA, Harvard, Vancouver, ISO, and other styles
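The deduplication idea behind HD2FS can be sketched as a content-addressed chunk store: each chunk is keyed by its hash, so identical chunks are stored once no matter how many files reference them. This is a toy model under assumed names, not the paper's implementation.

```python
import hashlib

class DedupStore:
    """Toy content-addressed block store: identical chunks are kept once,
    illustrating the deduplication principle behind HD2FS."""

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size
        self.chunks = {}   # sha256 digest -> chunk bytes (stored once)
        self.files = {}    # file name -> ordered list of chunk digests

    def put(self, name, data: bytes):
        digests = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # dedup happens here
            digests.append(digest)
        self.files[name] = digests

    def get(self, name) -> bytes:
        """Reassemble a file from its chunk digests."""
        return b"".join(self.chunks[d] for d in self.files[name])
```

Storing two identical files consumes the space of one set of unique chunks, which is exactly the capacity gain the abstract describes for redundant astronomical data.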
3

Wu, Zhen Quan, and Bing Pan. "Research of Distributed Search Engine Based on Hadoop." Applied Mechanics and Materials 631-632 (September 2014): 171–74. http://dx.doi.org/10.4028/www.scientific.net/amm.631-632.171.

Full text
Abstract:
Combining the Map/Reduce programming model, the Hadoop distributed file system, Lucene inverted-file indexing technology, and ICTCLAS Chinese word segmentation technology, we designed and implemented a distributed search engine system based on Hadoop. Testing the system in a four-node Hadoop cluster environment, experimental results show that the Hadoop platform can be used in search engines to improve system performance, reliability, and scalability.
APA, Harvard, Vancouver, ISO, and other styles
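The core of such a system, building an inverted index in MapReduce style, can be sketched in a few lines: the map step emits (term, document-id) pairs, and the reduce step groups them into postings lists. A minimal sketch with hypothetical documents, not the paper's code:

```python
from collections import defaultdict

def map_phase(doc_id, text):
    """Map step: emit (term, doc_id) pairs for each word in a document."""
    for term in text.lower().split():
        yield term, doc_id

def reduce_phase(pairs):
    """Reduce step: group postings by term into an inverted index."""
    index = defaultdict(set)
    for term, doc_id in pairs:
        index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

# Hypothetical two-document corpus for illustration.
docs = {1: "hadoop distributed search", 2: "distributed file system"}
pairs = [p for doc_id, text in docs.items() for p in map_phase(doc_id, text)]
index = reduce_phase(pairs)
```

In a real Hadoop job the pairs would be shuffled across the cluster between the two phases; here the in-memory list stands in for that step.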
4

Gemayel, Nader. "Analyzing Google File System and Hadoop Distributed File System." Research Journal of Information Technology 8, no. 3 (March 1, 2016): 66–74. http://dx.doi.org/10.3923/rjit.2016.66.74.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Kapil, Gayatri, Alka Agrawal, Abdulaziz Attaallah, Abdullah Algarni, Rajeev Kumar, and Raees Ahmad Khan. "Attribute based honey encryption algorithm for securing big data: Hadoop distributed file system perspective." PeerJ Computer Science 6 (February 17, 2020): e259. http://dx.doi.org/10.7717/peerj-cs.259.

Full text
Abstract:
Hadoop has become a promising platform to reliably process and store big data. It provides flexible and low cost services to huge data through Hadoop Distributed File System (HDFS) storage. Unfortunately, absence of any inherent security mechanism in Hadoop increases the possibility of malicious attacks on the data processed or stored through Hadoop. In this scenario, securing the data stored in HDFS becomes a challenging task. Hence, researchers and practitioners have intensified their efforts in working on mechanisms that would protect user’s information collated in HDFS. This has led to the development of numerous encryption-decryption algorithms but their performance decreases as the file size increases. In the present study, the authors have enlisted a methodology to solve the issue of data security in Hadoop storage. The authors have integrated Attribute Based Encryption with the honey encryption on Hadoop, i.e., Attribute Based Honey Encryption (ABHE). This approach works on files that are encoded inside the HDFS and decoded inside the Mapper. In addition, the authors have evaluated the proposed ABHE algorithm by performing encryption-decryption on different sizes of files and have compared the same with existing ones including AES and AES with OTP algorithms. The ABHE algorithm shows considerable improvement in performance during the encryption-decryption of files.
APA, Harvard, Vancouver, ISO, and other styles
6

Han, Yong Qi, Yun Zhang, and Shui Yu. "Research of Cloud Storage Based on Hadoop Distributed File System." Applied Mechanics and Materials 513-517 (February 2014): 2472–75. http://dx.doi.org/10.4028/www.scientific.net/amm.513-517.2472.

Full text
Abstract:
This paper discusses the application of cloud computing technology to store large amounts of agricultural remote-training video and other multimedia data. Using four computers to build a Hadoop cloud platform, it focuses on the Hadoop Distributed File System (HDFS) principle and file storage to achieve massive agricultural multimedia data storage.
APA, Harvard, Vancouver, ISO, and other styles
7

Wu, Xing, and Mengqi Pei. "Image File Storage System Resembling Human Memory." International Journal of Software Science and Computational Intelligence 7, no. 2 (April 2015): 70–84. http://dx.doi.org/10.4018/ijssci.2015040104.

Full text
Abstract:
The Big Data era is characterized by the explosive increase of image files on the Internet, and massive image files bring great challenges to storage. What is required is not only storage efficiency for massive image files but also accuracy and robustness in their management and retrieval. To meet these requirements, a distributed image file storage system based on cognitive theory is proposed. Reflecting how the human brain works, humans can correlate image files with thousands of distinct object and action categories and store these files in sorted fashion. Thus the authors propose to sort and store image files according to different visual categories, based on human cognition, to resemble human memory. The experimental results demonstrate that the proposed distributed image file system (DIFS) based on cognition performs better than the Hadoop Distributed File System (HDFS) and FastDFS.
APA, Harvard, Vancouver, ISO, and other styles
8

Hussain, G. Fayaz, and Tarakeswar T. "File Systems and Hadoop Distributed File System in Big Data." IJARCCE 5, no. 12 (December 30, 2016): 36–40. http://dx.doi.org/10.17148/ijarcce.2016.51207.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Ahlawat, Deepak, and Deepali Gupta. "Big Data Clustering and Hadoop Distributed File System Architecture." Journal of Computational and Theoretical Nanoscience 16, no. 9 (September 1, 2019): 3824–29. http://dx.doi.org/10.1166/jctn.2019.8256.

Full text
Abstract:
Due to advancement in the technological world, there is a great surge in data. The main sources generating such a large amount of data are social websites, internet sites, etc. The large data files are combined together to create a big data architecture. Managing data files in such a large volume is not easy; therefore, modern techniques have been developed to manage bulk data. To arrange and utilize such big data, the Hadoop Distributed File System (HDFS) architecture from Hadoop was presented in the early stage of 2015. This architecture is used when traditional methods are insufficient to manage the data. In this paper, a novel clustering algorithm is implemented to manage a large amount of data. The concepts and frames of Big Data are studied. A novel algorithm is developed in this paper using K-means and cosine-similarity-based clustering. The developed clustering algorithm is evaluated using the precision and recall parameters. Prominent results are obtained which successfully manage the big data issue.
APA, Harvard, Vancouver, ISO, and other styles
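K-means with cosine similarity, the combination this abstract names, can be sketched with the standard library alone: points are assigned to the centroid with the highest cosine similarity, and centroids are recomputed as coordinate-wise means. A generic textbook sketch, not the authors' algorithm; initial centroids are fixed here for determinism.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def kmeans_cosine(points, centroids, iters=10):
    """K-means where assignment uses cosine similarity (toy sketch)."""
    clusters = [[] for _ in centroids]
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign each point to the most similar centroid.
            best = max(range(len(centroids)),
                       key=lambda i: cosine(p, centroids[i]))
            clusters[best].append(p)
        # Recompute centroids as coordinate-wise means; keep old
        # centroid if a cluster is empty.
        centroids = [
            [sum(c) / len(cl) for c in zip(*cl)] if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters
```

With two clearly separated directions, two iterations already suffice for the assignment to stabilize.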
10

Awasthi, Yogesh. "Enhancing approach for information security in Hadoop." Ukrainian Journal of Educational Studies and Information Technology 8, no. 1 (March 27, 2020): 39–49. http://dx.doi.org/10.32919/uesit.2020.01.04.

Full text
Abstract:
Hadoop, one of the recent trends in technology, is used as a framework for distributed storage. It is an open-source distributed computing framework implemented in Java and comprises two modules: MapReduce and the Hadoop Distributed File System (HDFS). MapReduce is designed for processing enormous data sets; it allows users to employ thousands of commodity machines in parallel effectively, and simply by defining map and reduce functions, a user can process huge amounts of data. HDFS is for storing data on distributed clusters of machines. Hadoop is commonly used in large clusters or public cloud services such as Yahoo!, Facebook, Twitter, and Amazon. The scalability of Hadoop is shown by the popularity of these applications, yet it is designed without security for stored data. Using the Hadoop package, a proposed secure cloud computing system has been designed, so that Hadoop would establish and enhance security for saving and managing user data. Apache produced Hadoop to overcome this big data problem, often using the MapReduce architecture to process vast amounts of data, but Hadoop has no strategy to assure the security and privacy of the files stored in the Hadoop Distributed File System (HDFS). As an encryption scheme for the files stored in HDFS, an asymmetric-key cryptosystem is advocated. Thus, before saving data in HDFS, the proposed hybrid cipher based on RSA and Rabin would encrypt the data. The user of the cloud may upload files in two ways, non-secure or secure.
APA, Harvard, Vancouver, ISO, and other styles
More sources

Dissertations / Theses on the topic "Hadoop Distributed File System"

1

Lorenzetto, Luca <1988>. "Evaluating performance of Hadoop Distributed File System." Master's Degree Thesis, Università Ca' Foscari Venezia, 2014. http://hdl.handle.net/10579/4773.

Full text
Abstract:
In recent years, a huge quantity of data produced by multiple sources has appeared. Dealing with this data has given rise to the so-called "big data problem", which can be faced only with new computing paradigms and platforms. Many vendors compete in this field, but to this day the de facto standard platform for big data is the open-source framework Apache Hadoop. Inspired by Google's private cluster platform, some independent developers created Hadoop and, following the structure published by Google's engineering team, developed a complete set of components for big data elaboration. One of the core components is the Hadoop Distributed File System. In this thesis work, we analyze its performance and identify some action points that can be tuned to improve its behavior in a real implementation.
APA, Harvard, Vancouver, ISO, and other styles
2

Polato, Ivanilton. "Energy savings and performance improvements with SSDs in the Hadoop Distributed File System." Universidade de São Paulo, 2016. http://www.teses.usp.br/teses/disponiveis/45/45134/tde-31102016-155908/.

Full text
Abstract:
Energy issues gathered strong attention over the past decade, reaching IT data processing infrastructures. Now, they need to cope with such responsibility, adjusting existing platforms to reach acceptable performance while promoting energy consumption reduction. As the de facto platform for Big Data, Apache Hadoop has evolved significantly over the last years, with more than 60 releases bringing new features. By implementing the MapReduce programming paradigm and leveraging HDFS, its distributed file system, Hadoop has become a reliable and fault tolerant middleware for parallel and distributed computing over large datasets. Nevertheless, Hadoop may struggle under certain workloads, resulting in poor performance and high energy consumption. Users increasingly demand that high performance computing solutions address sustainability and limit energy consumption. In this thesis, we introduce HDFSH, a hybrid storage mechanism for HDFS, which uses a combination of Hard Disks and Solid-State Disks to achieve higher performance while saving power in Hadoop computations. HDFSH brings, to the middleware, the best from HDs (affordable cost per GB and high storage capacity) and SSDs (high throughput and low energy consumption) in a configurable fashion, using dedicated storage zones for each storage device type. We implemented our mechanism as a block placement policy for HDFS, and assessed it over six recent releases of Hadoop with different architectural properties. Results indicate that our approach increases overall job performance while decreasing the energy consumption under most hybrid configurations evaluated. Our results also showed that, in many cases, storing only part of the data in SSDs results in significant energy savings and execution speedups
APA, Harvard, Vancouver, ISO, and other styles
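The hybrid storage-zone idea behind HDFSH can be sketched as a block-placement policy that routes a configurable fraction of a file's blocks to an SSD zone and the rest to an HD zone. This is a loose toy model; the thesis's actual policy, zone definitions, and parameters differ, and the names below are illustrative.

```python
class HybridPlacementPolicy:
    """Toy block-placement policy in the spirit of HDFSH: send a
    configurable fraction of a file's blocks to the SSD zone and the
    remainder to the HD zone (illustrative, not the thesis's code)."""

    def __init__(self, ssd_fraction=0.25):
        if not 0.0 <= ssd_fraction <= 1.0:
            raise ValueError("ssd_fraction must be in [0, 1]")
        self.ssd_fraction = ssd_fraction

    def place(self, num_blocks):
        """Return the storage zone chosen for each block of a file."""
        ssd_blocks = int(num_blocks * self.ssd_fraction)
        return ["SSD"] * ssd_blocks + ["HD"] * (num_blocks - ssd_blocks)
```

The design point the results highlight is visible even in the toy: only part of the data needs to land on SSDs to shift the throughput/energy balance, so `ssd_fraction` stays well below 1.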
3

Musatoiu, Mihai. "An approach to choosing the right distributed file system : Microsoft DFS vs. Hadoop DFS." Thesis, Blekinge Tekniska Högskola, Institutionen för programvaruteknik, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-844.

Full text
Abstract:
Context. An important goal of most IT groups is to manage server resources in such a way that their users are provided with fast, reliable and secure access to files. The modern needs of organizations imply that resources are often distributed geographically, asking for new design solutions for the file systems to remain highly available and efficient. This is where distributed file systems (DFSs) come into the picture. A distributed file system (DFS), as opposed to a "classical", local, file system, is accessible across some kind of network and allows clients to access files remotely as if they were stored locally. Objectives. This paper has the goal of comparatively analyzing two distributed file systems, Microsoft DFS (MSDFS) and Hadoop DFS (HDFS). The two systems come from different "worlds" (proprietary - Microsoft DFS - vs. open-source - Hadoop DFS); the abundance of solutions and the variety of choices that exist today make such a comparison more relevant. Methods. The comparative analysis is done on a cluster of 4 computers running dual-installations of Microsoft Windows Server 2012 R2 (the MSDFS environment) and Linux Ubuntu 14.04 (the HDFS environment). The comparison is done on read and write operations on files and sets of files of increasing sizes, as well as on a set of key usage scenarios. Results. Comparative results are produced for reading and writing operations of files of increasing size - 1 MB, 2 MB, 4 MB and so on up to 4096 MB - and of sets of small files (64 KB each) amounting to totals of 128 MB, 256 MB and so on up to 4096 MB. The results expose the behavior of the two DFSs on different types of stressful activities (when the size of the transferred file increases, as well as when the quantity of data is divided into (tens of) thousands of many small files). The behavior in the case of key usage scenarios is observed and analyzed. Conclusions. HDFS performs better at writing large files, while MSDFS is better at writing many small files. 
At read operations, the two show similar performance, with a slight advantage for MSDFS. In the key usage scenarios, HDFS shows more flexibility, but MSDFS could be the better choice depending on the needs of the users (for example, most of the common functions can be configured through the graphical user interface).
APA, Harvard, Vancouver, ISO, and other styles
4

Bhat, Adithya. "RDMA-based Plugin Design and Profiler for Apache and Enterprise Hadoop Distributed File system." The Ohio State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=osu1440188090.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Cheng, Lu. "Concentric layout, a new scientific data layout for matrix data set in Hadoop file system." Master's thesis, University of Central Florida, 2010. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/4545.

Full text
Abstract:
The data generated by scientific simulations, sensors, monitors, or optical telescopes has increased at a dramatic speed. In order to analyze the raw data time- and space-efficiently, a data pre-processing step is needed to achieve better performance in the data analysis phase. Current research shows an increasing trend of adopting the MapReduce framework for large-scale data processing. However, the data access patterns generally applied to scientific data sets are not supported by the current MapReduce framework directly. The gap between the requirements of analytics applications and the properties of the MapReduce framework motivates us to provide support for these data access patterns in the MapReduce framework. In our work, we studied the data access patterns in matrix files and propose a new concentric data layout solution to facilitate matrix data access and analysis in the MapReduce framework. Concentric data layout is a data layout which maintains the dimensional property at the chunk level. Contrary to the continuous data layout adopted in the current Hadoop framework by default, the concentric data layout stores the data from the same sub-matrix in one chunk. This matches well with matrix operations. The concentric data layout preprocesses the data beforehand and optimizes the subsequent run of the MapReduce application. The experiments indicate that the concentric data layout improves overall performance, reducing execution time by 38% when the file size is 16 GB; it also relieves the data overhead phenomenon and increases the effective data retrieval rate by 32% on average.
APA, Harvard, Vancouver, ISO, and other styles
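The contrast between the default row-major layout and a sub-matrix-per-chunk layout can be shown in a few lines: instead of writing the matrix row by row, each chunk holds one complete sub-matrix. A simplified sketch of the layout idea only, not the thesis's implementation.

```python
def concentric_chunks(matrix, chunk):
    """Group a square matrix into chunk x chunk sub-matrices so that each
    sub-matrix is stored contiguously in one chunk (toy version of the
    concentric-layout idea; assumes the matrix size divides evenly)."""
    n = len(matrix)
    chunks = {}
    for ci in range(0, n, chunk):
        for cj in range(0, n, chunk):
            # One chunk = one complete sub-matrix, not a run of full rows.
            chunks[(ci // chunk, cj // chunk)] = [
                row[cj:cj + chunk] for row in matrix[ci:ci + chunk]
            ]
    return chunks
```

Reading one sub-matrix then touches a single chunk, whereas the default continuous layout would scatter it across several chunks, which is the access-pattern mismatch the abstract describes.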
6

Sodhi, Bir Apaar Singh. "DATA MINING: TRACKING SUSPICIOUS LOGGING ACTIVITY USING HADOOP." CSUSB ScholarWorks, 2016. https://scholarworks.lib.csusb.edu/etd/271.

Full text
Abstract:
In this modern, highly interconnected era, an organization's top priority is to protect itself from the major security breaches that occur frequently within a communication environment. But it seems they often fail to do so. Every week there are new headlines about information being forged, funds being stolen, credit cards being misused, and so on. Personal computers are turned into "zombie machines" by hackers to steal confidential and financial information from sources without disclosing the hacker's true identity. These identity thieves rob private data and ruin the very purpose of privacy. The purpose of this project is to identify suspicious user activity by analyzing a log file, which can later help an investigative agency like the FBI track and monitor anonymous users who look for weaknesses in order to attack vulnerable parts of a system and gain access to it. The project also emphasizes the potential damage that a malicious activity could have on the system. This project uses the Hadoop framework to search and store log files of logging activity and then runs a MapReduce program to compute and analyze the results.
APA, Harvard, Vancouver, ISO, and other styles
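The MapReduce shape of such a log analysis can be sketched in two functions: the map step extracts a source identifier from each suspicious line, and the reduce step counts occurrences and flags sources over a threshold. The log line format and field positions here are assumed for illustration; the project's actual format and thresholds are not specified in the abstract.

```python
from collections import Counter

def map_failed_logins(lines):
    """Map step: emit the source IP for every failed-login line.
    Assumes (hypothetically) that the IP is the last whitespace field."""
    for line in lines:
        if "FAILED LOGIN" in line:
            yield line.split()[-1]

def reduce_counts(ips, threshold=3):
    """Reduce step: count failures per IP and flag repeat offenders."""
    counts = Counter(ips)
    return {ip: n for ip, n in counts.items() if n >= threshold}
```

In the Hadoop version the map output would be shuffled by IP before reduction; chaining the generators reproduces the same result on a single machine.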
7

Johannsen, Fabian, and Mattias Hellsing. "Hadoop Read Performance During Datanode Crashes." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-130466.

Full text
Abstract:
This bachelor thesis evaluates the impact of datanode crashes on the performance of the read operations of a Hadoop Distributed File System, HDFS. The goal is to better understand how datanode crashes, as well as how certain parameters, affect the  performance of the read operation by looking at the execution time of the get command. The parameters used are the number of crashed nodes, block size and file size. By setting up a Linux test environment with ten virtual machines and Hadoop installed on them and running tests on it, data has been collected in order to answer these questions. From this data the average execution time and standard deviation of the get command was calculated. The network activity during the tests was also measured. The results showed that neither the number of crashed nodes nor block size had any significant effect on the execution time. It also demonstrated that the execution time of the get command was not directly proportional to the size of the fetched file. The execution time was up to 4.5 times as long when the file size was four times as large. A four times larger file did sometimes result in more than a four times as long execution time. Although, the consequences of a datanode crash while fetching a small file appear to be much greater than with a large file. The average execution time increased by up to 36% when a large file was fetched but it increased by as much as 85% when fetching a small file.
APA, Harvard, Vancouver, ISO, and other styles
8

Caceres, Gutierrez Franco Jesus. "Towards an S3-based, DataNode-less implementation of HDFS." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-291125.

Full text
Abstract:
The relevance of data processing and analysis today cannot be overstated. The convergence of several technological advancements has fostered the proliferation of systems and infrastructure that together support the generation, transmission, and storage of nearly 15,000 exabytes of digital, analyzable data. The Hadoop Distributed File System (HDFS) is an open source system designed to leverage the storage capacity of thousands of servers, and is the file system component of an entire ecosystem of tools to transform and analyze massive data sets. While HDFS is used by organizations of all sizes, smaller ones are not as well-suited to organically grow their clusters to accommodate their ever-expanding data sets and processing needs. This is because larger clusters are concomitant with higher investment in servers, greater rates of failures to recover from, and the need to allocate more resources to maintenance and administration tasks. This poses a potential limitation down the road for organizations, and it might even deter some from venturing into the data world altogether. This thesis addresses this matter by presenting a novel implementation of HopsFS, an already improved version of HDFS, that requires no user-managed data servers. Instead, it relies on S3, a leading object storage service, for all its user-data storage needs. We compared the performance of both S3-based and regular clusters and found that such an architecture is not only feasible, but also perfectly viable in terms of read and write throughputs, in some cases even outperforming its original counterpart. Furthermore, our solution provides first-class elasticity, reliability, and availability, all while being remarkably more affordable.
APA, Harvard, Vancouver, ISO, and other styles
9

Benkő, Krisztián. "Zpracování velkých dat z rozsáhlých IoT sítí." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2019. http://www.nusl.cz/ntk/nusl-403820.

Full text
Abstract:
The goal of this diploma thesis is to design and develop a system for collecting, processing, and storing data from large IoT networks. The developed system introduces a complete solution able to process data from various IoT networks using the Apache Hadoop ecosystem. The data are processed in real time and stored in a NoSQL database, and they are also kept in the file system for potential later processing. The system is optimized and tested using data from an IQRF network. The data stored in the NoSQL database are visualized, and the system periodically generates derived predictions. Users are connected to this system via an information system that automatically generates notifications when monitored values are out of range.
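The out-of-range notification check described in the abstract above reduces to comparing each monitored reading against a configured interval. The sketch below shows that core check; the sensor names, units, and threshold values are invented examples, not taken from the thesis.

```python
# Minimal sketch of a threshold-based notification check, as described in
# the abstract above. Sensor names and ranges are illustrative assumptions.

RANGES = {
    "temperature": (-20.0, 60.0),  # assumed valid range in degrees C
    "humidity": (0.0, 100.0),      # assumed valid range in % RH
}


def out_of_range(readings: dict) -> list:
    """Return one alert message per monitored value outside its range."""
    alerts = []
    for sensor, value in readings.items():
        low, high = RANGES.get(sensor, (float("-inf"), float("inf")))
        if not (low <= value <= high):
            alerts.append(f"{sensor}={value} outside [{low}, {high}]")
    return alerts


if __name__ == "__main__":
    # -> ['temperature=72.5 outside [-20.0, 60.0]']
    print(out_of_range({"temperature": 72.5, "humidity": 40.0}))
```

In a pipeline like the one described, this check would run inside the real-time processing stage, with the resulting alerts pushed to users through the information system.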
APA, Harvard, Vancouver, ISO, and other styles
More sources

Books on the topic "Hadoop Distributed File System"

1

Ouleyey, Thomas. A LAN distributed file system. Oxford: Oxford Brookes University, 2004.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
2

White, Tom. Hadoop: The Definitive Guide. 2nd ed. Sebastopol: O'Reilly, 2010.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
3

Kumar, Vavilapalli Vinod, Eadline Doug 1956-, Niemiec Joseph, and Markham Jeff, eds. Apache Hadoop YARN: Moving beyond MapReduce and batch processing with Apache Hadoop 2. Upper Saddle River, NJ: Addison-Wesley, 2014.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
4

Siegel, Alexander W. Deceit: A flexible distributed file system. Ithaca, NY: Dept. of Computer Science, Cornell University, 1990.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
5

Kistler, James Jay. Disconnected Operation in a Distributed File System. Berlin, Heidelberg: Springer Berlin Heidelberg, 1995. http://dx.doi.org/10.1007/3-540-60627-0.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Kistler, James Jay. Disconnected operation in a distributed file system. New York: Springer, 1996.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
7

Kistler, James Jay. Disconnected operation in a distributed file system. Berlin: Springer, 1995.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
8

Gerald, Popek, and Walker Bruce James 1951-, eds. The LOCUS distributed system architecture. Cambridge, Mass: MIT Press, 1985.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
9

Wilkinson, Paul, Lars George, Jan Kunigk, and Ian Buss. Architecting Modern Data Platforms: A Guide to Enterprise Hadoop at Scale. Edited by Nicole Tache and Michele Cronin. Beijing: O'Reilly Media, 2019.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
10

Paul-Robert, Hering, and International Business Machines Corporation. International Technical Support Organization, eds. z/OS distributed file service zSeries File System implementation z/OS V1R11. [Poughkeepsie, NY]: IBM, International Technical Support Organization, 2010.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
More sources

Book chapters on the topic "Hadoop Distributed File System"

1

Xu, JunWu, and JunLing Liang. "Research on Distributed File System with Hadoop." In Communications in Computer and Information Science, 148–55. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012. http://dx.doi.org/10.1007/978-3-642-35211-9_19.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Qu, Wei, Siyao Cheng, and Hongzhi Wang. "Efficient File Accessing Techniques on Hadoop Distributed File Systems." In Communications in Computer and Information Science, 350–61. Singapore: Springer Singapore, 2016. http://dx.doi.org/10.1007/978-981-10-2053-7_31.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Bass, Len, Rick Kazman, and Ipek Ozkaya. "Developing Architectural Documentation for the Hadoop Distributed File System." In IFIP Advances in Information and Communication Technology, 50–61. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011. http://dx.doi.org/10.1007/978-3-642-24418-6_4.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Bari, Wasim, Ahmed Shiraz Memon, and Bernd Schuller. "Enhancing UNICORE Storage Management Using Hadoop Distributed File System." In Lecture Notes in Computer Science, 345–52. Berlin, Heidelberg: Springer Berlin Heidelberg, 2010. http://dx.doi.org/10.1007/978-3-642-14122-5_39.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Lakshmi Siva Rama Krishna, T., T. Ragunathan, and Sudheer Kumar Battula. "Customized Web User Interface for Hadoop Distributed File System." In Advances in Intelligent Systems and Computing, 567–76. New Delhi: Springer India, 2015. http://dx.doi.org/10.1007/978-81-322-2523-2_55.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Rajput, Dilip, Ankur Goyal, and Abhishek Tripathi. "Priority-Based Replication Management for Hadoop Distributed File System." In Lecture Notes on Data Engineering and Communications Technologies, 549–60. Singapore: Springer Nature Singapore, 2022. http://dx.doi.org/10.1007/978-981-16-9113-3_40.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Chattaraj, Durbadal, Sumit Bhagat, and Monalisa Sarma. "Storage Service Reliability and Availability Predictions of Hadoop Distributed File System." In Reliability, Safety and Hazard Assessment for Risk-Based Technologies, 617–26. Singapore: Springer Singapore, 2019. http://dx.doi.org/10.1007/978-981-13-9008-1_52.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Samanta, Pravin Kumar, Soumyadev Mukherjee, and Nirmal Kumar Rout. "Susceptibility Analysis of Novel Corona Virus Using Hadoop Distributed File System." In Advances in Smart Communication Technology and Information Processing, 337–46. Singapore: Springer Singapore, 2021. http://dx.doi.org/10.1007/978-981-15-9433-5_33.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Le, Hieu Hanh, Satoshi Hikida, and Haruo Yokota. "NameNode and DataNode Coupling for a Power-Proportional Hadoop Distributed File System." In Database Systems for Advanced Applications, 99–107. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013. http://dx.doi.org/10.1007/978-3-642-37450-0_7.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Satheesh, P., B. Srinivas, P. R. S. Naidu, and B. Prasanth Kumar. "Study on Efficient and Adaptive Reproducing Management in Hadoop Distributed File System." In Internet of Things and Personalized Healthcare Systems, 121–32. Singapore: Springer Singapore, 2018. http://dx.doi.org/10.1007/978-981-13-0866-6_12.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Hadoop Distributed File System"

1

Shvachko, Konstantin, Hairong Kuang, Sanjay Radia, and Robert Chansler. "The Hadoop Distributed File System." In 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST). IEEE, 2010. http://dx.doi.org/10.1109/msst.2010.5496972.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Suganya, S., and S. Selvamuthukumaran. "Hadoop Distributed File System Security -A Review." In 2018 International Conference on Current Trends towards Converging Technologies (ICCTCT). IEEE, 2018. http://dx.doi.org/10.1109/icctct.2018.8550957.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Dwivedi, Kalpana, and Sanjay Kumar Dubey. "Analytical review on Hadoop Distributed file system." In 2014 5th International Conference- Confluence The Next Generation Information Technology Summit. IEEE, 2014. http://dx.doi.org/10.1109/confluence.2014.6949336.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Shetty, Madhvaraj M., and D. H. Manjaiah. "Data security in Hadoop distributed file system." In 2016 International Conference on Emerging Technological Trends (ICETT). IEEE, 2016. http://dx.doi.org/10.1109/icett.2016.7873697.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Attebury, G., A. Baranovski, K. Bloom, B. Bockelman, D. Kcira, J. Letts, T. Levshina, et al. "Hadoop distributed file system for the Grid." In 2009 IEEE Nuclear Science Symposium and Medical Imaging Conference (NSS/MIC 2009). IEEE, 2009. http://dx.doi.org/10.1109/nssmic.2009.5402426.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Alange, Neeta, and Anjali Mathur. "Small Sized File Storage Problems in Hadoop Distributed File System." In 2019 International Conference on Smart Systems and Inventive Technology (ICSSIT). IEEE, 2019. http://dx.doi.org/10.1109/icssit46314.2019.8987739.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Xie, Yunyue, Abobaker Mohammed Qasem Farhan, and Meihua Zhou. "Performance Analysis of Hadoop Distributed File System Writing File Process." In 2018 International Conference on Intelligent Autonomous Systems (ICoIAS). IEEE, 2018. http://dx.doi.org/10.1109/icoias.2018.8494199.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Almansouri, Hatim Talal, and Youssef Masmoudi. "Hadoop Distributed File System for Big data analysis." In 2019 4th World Conference on Complex Systems (WCCS). IEEE, 2019. http://dx.doi.org/10.1109/icocs.2019.8930804.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Zhou, Wei, Jizhong Han, Zhang Zhang, and Jiao Dai. "Dynamic Random Access for Hadoop Distributed File System." In 2012 32nd International Conference on Distributed Computing Systems Workshops (ICDCS Workshops). IEEE, 2012. http://dx.doi.org/10.1109/icdcsw.2012.74.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Maneas, Stathis, and Bianca Schroeder. "The Evolution of the Hadoop Distributed File System." In 2018 32nd International Conference on Advanced Information Networking and Applications Workshops (WAINA). IEEE, 2018. http://dx.doi.org/10.1109/waina.2018.00065.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Reports on the topic "Hadoop Distributed File System"

1

Siegel, Alex, Kenneth Birman, and Keith Marzullo. Deceit: A Flexible Distributed File System. Fort Belvoir, VA: Defense Technical Information Center, December 1989. http://dx.doi.org/10.21236/ada215937.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Siegel, Alex, Kenneth Birman, and Keith Marzullo. Deceit: A Flexible Distributed File System. Fort Belvoir, VA: Defense Technical Information Center, November 1989. http://dx.doi.org/10.21236/ada218620.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Mukhopadhyay, Meenakshi. Performance analysis of a distributed file system. Portland State University Library, January 2000. http://dx.doi.org/10.15760/etd.6081.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Wachsmann, A. Part III: AFS - A Secure Distributed File System. Office of Scientific and Technical Information (OSTI), June 2005. http://dx.doi.org/10.2172/881149.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Atkin, Benjamin, and Kenneth P. Birman. MFS: an Adaptive Distributed File System for Mobile Hosts. Fort Belvoir, VA: Defense Technical Information Center, January 2003. http://dx.doi.org/10.21236/ada529351.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Gozani, Shai, Mary Gray, Srinivasan Keshav, Vijay Madisetti, Ethan Munson, Mendel Rosenblum, Steve Schoettler, Mark Sullivan, and Douglas Terry. GAFFES: The Design of a Globally Distributed File System. Fort Belvoir, VA: Defense Technical Information Center, June 1987. http://dx.doi.org/10.21236/ada619422.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Thornett, Richard G. File Conversion and Transfer from a Lanier No Problem Word Processor to a Defense Logistics Agency (DLA) Distributed Minicomputer System (DMINS). Fort Belvoir, VA: Defense Technical Information Center, September 1987. http://dx.doi.org/10.21236/ada199551.

Full text
APA, Harvard, Vancouver, ISO, and other styles