Dissertations / Theses on the topic 'HDFS (Hadoop Distributed File System)'

Author: Grafiati

Published: 5 June 2025

Last updated: 15 July 2025

Consult the top 24 dissertations / theses for your research on the topic 'HDFS (Hadoop Distributed File System).'

1

Musatoiu, Mihai. "An approach to choosing the right distributed file system: Microsoft DFS vs. Hadoop DFS." Thesis, Blekinge Tekniska Högskola, Institutionen för programvaruteknik, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-844.

Abstract:

Context. An important goal of most IT groups is to manage server resources in such a way that their users are provided with fast, reliable and secure access to files. The modern needs of organizations imply that resources are often distributed geographically, asking for new design solutions for the file systems to remain highly available and efficient. This is where distributed file systems (DFSs) come into the picture. A distributed file system (DFS), as opposed to a "classical", local, file system, is accessible across some kind of network and allows clients to access files remotely as if th

2

Bhat, Adithya. "RDMA-based Plugin Design and Profiler for Apache and Enterprise Hadoop Distributed File system." The Ohio State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=osu1440188090.

3

Polato, Ivanilton. "Energy savings and performance improvements with SSDs in the Hadoop Distributed File System." Universidade de São Paulo, 2016. http://www.teses.usp.br/teses/disponiveis/45/45134/tde-31102016-155908/.

Abstract:

Energy issues gathered strong attention over the past decade, reaching IT data processing infrastructures. Now, they need to cope with such responsibility, adjusting existing platforms to reach acceptable performance while promoting energy consumption reduction. As the de facto platform for Big Data, Apache Hadoop has evolved significantly over the last years, with more than 60 releases bringing new features. By implementing the MapReduce programming paradigm and leveraging HDFS, its distributed file system, Hadoop has become a reliable and fault tolerant middleware for parallel and distribute

4

Caceres, Gutierrez Franco Jesus. "Towards an S3-based, DataNode-less implementation of HDFS." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-291125.

Abstract:

The relevance of data processing and analysis today cannot be overstated. The convergence of several technological advancements has fostered the proliferation of systems and infrastructure that together support the generation, transmission, and storage of nearly 15,000 exabytes of digital, analyzable data. The Hadoop Distributed File System (HDFS) is an open source system designed to leverage the storage capacity of thousands of servers, and is the file system component of an entire ecosystem of tools to transform and analyze massive data sets. While HDFS is used by organizations of all sizes,

6

Benkő, Krisztián. "Zpracování velkých dat z rozsáhlých IoT sítí" [Processing big data from large IoT networks]. Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2019. http://www.nusl.cz/ntk/nusl-403820.

Abstract:

The goal of this diploma thesis is to design and develop a system for collecting, processing and storing data from large IoT networks. The developed system is a comprehensive solution able to process data from various IoT networks using the Apache Hadoop ecosystem. The data are processed in real time and stored in a NoSQL database, and are also stored in the file system for potential later processing. The system is optimized and tested using data from an IQRF network. The data stored in the NoSQL database are visualized and the system periodically generates derived predictions. Users are c

7

Lorenzetto, Luca. "Evaluating performance of Hadoop Distributed File System." Master's Degree Thesis, Università Ca' Foscari Venezia, 2014. http://hdl.handle.net/10579/4773.

Abstract:

In recent years, a huge quantity of data produced by multiple sources has appeared. Dealing with this data has given rise to the so-called "big data problem", which can be faced only with new computing paradigms and platforms. Many vendors compete in this field, but today the de facto standard platform for big data is the open-source framework Apache Hadoop. Inspired by Google's private cluster platform, some independent developers created Hadoop and, following the structure published by Google's engineering team, a complete set of components for big data processing has been developed. One of

8

Pradeep, Aakash. "P2PHDFS: AN IMPLEMENTATION OF STATISTIC MULTIPLEXED COMPUTING ARCHITECTURE IN HADOOP FILE SYSTEM." Master's thesis, Temple University Libraries, 2012. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/214757.

Abstract:

Computer and Information Science, M.S. The Peer to Peer Hadoop Distributed File System (P2PHDFS) is designed to store and process extremely large-scale data sets reliably. This is a first attempt at implementing the Statistic Multiplexed Computing Architecture concept proposed by Dr. Shi for the existing Hadoop File System (HDFS) to eliminate all single points of failure. Unlike HDFS, in P2PHDFS every node is designed to be equal and behaves as a file system server as well as a slave, which enables it to attain higher performance and higher reliability at the same time as the infrastructure u

9

Cheng, Lu. "Concentric layout, a new scientific data layout for matrix data set in Hadoop file system." Master's thesis, University of Central Florida, 2010. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/4545.

Abstract:

The data generated by scientific simulations, sensors, monitors, and optical telescopes has increased at a dramatic rate. To analyze the raw data efficiently in both time and space, a data pre-processing step is needed to achieve better performance in the data analysis phase. Current research shows an increasing trend of adopting the MapReduce framework for large-scale data processing. However, the data access patterns generally applied to scientific data sets are not directly supported by the current MapReduce framework. The gap between the requirements of analytics applications and the properties of Map

10

Sodhi, Bir Apaar Singh. "DATA MINING: TRACKING SUSPICIOUS LOGGING ACTIVITY USING HADOOP." CSUSB ScholarWorks, 2016. https://scholarworks.lib.csusb.edu/etd/271.

Abstract:

In this modern, highly interconnected era, an organization’s top priority is to protect itself from the major security breaches that occur frequently within a communication environment. Yet organizations often seem to fail at this. Every week there are new headlines about information being forged, funds being stolen, fraudulent use of credit cards, and so on. Hackers turn personal computers into “zombie machines” to steal confidential and financial information without disclosing the hacker’s true identity. These identity thieves rob private data and ruin the very purpos

11

Josefík, Martin. "Distribuovaný repositář digitálních forenzních dat" [Distributed repository of digital forensic data]. Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2018. http://www.nusl.cz/ntk/nusl-385958.

Abstract:

This work deals with the design of a distributed repository for storing digital forensic data. The theoretical part of the thesis describes digital forensics and its purpose. This part also explains Big Data and suitable storage systems, together with their properties, advantages and disadvantages. The main part of the thesis deals with the design and implementation of distributed storage for digital forensic data. The design also focuses on suitable indexing of stored data and on supporting new types of digital forensic data. The performance of the implemented system was evaluated for chosen t

12

Queirós, Jorge Afonso Barandas. "Implementing Hadoop distributed file system (hdfs) Cluster for BI Solution." Master's thesis, 2021. https://hdl.handle.net/10216/133038.

14

Liao, Jhih-Kai (廖治凱). "Fault-Tolerant Management Framework for Hadoop Distributed File System." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/20941508217703176221.

Abstract:

Master's thesis, Tamkang University, Department of Computer Science and Information Engineering (Master's Program), academic year 101. Due to the rapid development of the modern Internet, the mode of operation of a large number of applications has changed from a single machine to a cluster of machines over the network. This trend also contributed to the development of cloud computing technology: Google invented the MapReduce framework, the Google File System (GFS), and BigTable, and Yahoo invested in the open-source Hadoop project to implement the technologies proposed by Google. The Hadoop Distributed File System (HDFS) is based on the master/slave model to manage the entire file system. Sp

15

Cho, Chih-Yuan (卓志遠). "Performance Comparison of Hadoop Distributed File System and Ceph." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/27230766807802849865.

Abstract:

Master's thesis, Tunghai University, Department of Computer Science and Information Engineering, academic year 102. Cloud computing refers to accessing services on demand, at any time, from anywhere, and from any device. It is a model in which computing resources, including networks, servers, storage, applications, and services, can be easily accessed over the network as needed. In response to the popularity of cloud computing services, which produce large amounts of information and data, and in order to save the future of science and technology development, processing and analyzing massive data applications for key resear

16

Lin, Ying-Chen (林映辰). "A Load-Balancing Algorithm for Hadoop Distributed File System." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/79778516225414074113.

Abstract:

Master's thesis, Tamkang University, Department of Computer Science and Information Engineering (Master's Program), academic year 103. With the advancement of the Internet and increasing data demands, many enterprises are offering cloud services to their customers. Among various cloud computing platforms, the Apache Hadoop project has been widely adopted by many large organizations and enterprises. In the Hadoop ecosystem, the Hadoop Distributed File System (HDFS), Hadoop MapReduce, and HBase are open source equivalents of the Google File System (GFS), the MapReduce framework, and BigTable proposed by Google, respectively. To meet the requirement of horizontal scaling of storage in the big data era, HDFS ha

17

Huang, Hsin-Yi (黃心怡). "Realizing Prioritized MapReduce Service in Hadoop Distributed File System." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/85801068064786371658.

Abstract:

Master's thesis, Fu Jen Catholic University, Department of Computer Science and Information Engineering (Master's Program), academic year 104. Hadoop is a widely used, highly scalable software platform: a distributed system that can handle large amounts of data with high fault tolerance. Like other application software, a Hadoop system must be built on top of the operating system, and must communicate and coordinate with hardware through it. With the emergence of cloud computing and big data, the cloud software platform has become very important for supporting the implementation of cloud services. Hadoop has a mechanism for allocating resources to the work it performs. The work groups

18

Fan, Kuo-Zheng (范國拯). "Dynamic De-duplication Decision in a Hadoop Distributed File System." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/12180320597103126420.

Abstract:

Master's thesis, National Dong Hwa University, Department of Computer Science and Information Engineering, academic year 101. Nowadays, data is generated and updated every second, which makes coping with such fast-growing and varied data a serious challenge. The Hadoop Distributed File System (HDFS) is the first-choice solution for most people. However, data is usually protected against loss by keeping multiple backup copies, and HDFS does this too. These duplicates occupy a lot of storage space, which means a substantial investment in infrastructure is needed. This is not a good approach for everybody, since it may be unaffordable. Therefore, us

19

Chen, Chih-yi (陳致毅). "Design and Implementation of a QoS file transfer protocol over Hadoop distributed file system." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/10307767125626696193.

Abstract:

Master's thesis, National Sun Yat-sen University, Graduate Institute of Computer Science and Engineering, academic year 98. Cloud computing is pervasive in our daily life. For instance, I usually use Google’s Gmail to receive e-mail, Google Docs to edit documents online, and Google Calendar to plan my daily schedule. We can say that Google provides a “Platform as a Service (PaaS)”, which delivers a computing platform as a service, and that this platform sustains many cloud applications such as those mentioned above. However, Google’s cloud computing platform is private: we cannot inspect its source code or build cloud applications on it. Fortunately, there’s an open source projec

20

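Fischer, Axel. "Implementierung eines File Managers für das Hadoop Distributed Filesystem und Realisierung einer MapReduce Workflow Submission-Komponente" [Implementation of a file manager for the Hadoop Distributed Filesystem and realization of a MapReduce workflow submission component]. 2012. https://ul.qucosa.de/id/qucosa%3A17105.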

Abstract:

This bachelor's thesis describes the development of a file manager for the Hadoop Distributed Filesystem (HDFS) in connection with the development of the Dedoop prototype. The file manager covers the use cases refresh, rename, move, and delete. In addition, it allows uploads from and downloads to the user's local file system. Particular attention had to be paid to the special requirements of multi-user operation. The thesis also describes the development of a MapReduce workflow submission component for Dedoop, used for the transfer and

21

Sun, Yi-Feng (孫逸峰). "Enabling Prioritized Disk and Memory Service of Cloud Computing in Hadoop Distributed File System." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/02588097514362182433.

Abstract:

Master's thesis, Fu Jen Catholic University, Department of Computer Science and Information Engineering (Master's Program), academic year 102. Cloud computing has become more and more popular. Both governments and enterprises provide services by building public and private clouds. Among the platforms used in cloud computing, Hadoop is considered one of the most practical and stable systems. Nevertheless, like other regular software, Hadoop still relies on the underlying operating system to communicate with hardware in order to function properly. In modern computer systems, CPUs far outpace hard disk drives. The computer hard disk has become

22

VARSHNEY, PRATEEK KUMAR. "IMPLEMENTING PARALLEL PSO ALGORITHM USING MAPREDUCE ARCHITECTURE." Thesis, 2016. http://dspace.dtu.ac.in:8080/jspui/handle/repository/14678.

Abstract:

Optimization is the problem of finding the minimum or maximum of a given objective function relative to some set, often representing a range of choices available in a certain situation. Particle Swarm Optimization (PSO) is a simple and effective evolutionary algorithm, but it may take a considerable amount of time to optimize complex objective functions which are deceptive or expensive. To avoid being trapped in local optima, Particle Swarm Optimization requires extensive exploration for multimodal and multidimensional functions. Expensive functions whose computational complexity may arise from depe

23

Nethula, Shravya. "Implementation of the Hadoop MapReduce algorithm on virtualized shared storage systems." Thesis, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-11876.

Abstract:

Context. Hadoop is an open-source software framework developed for distributed storage and distributed processing of large sets of data. The implementation of the Hadoop MapReduce algorithm on virtualized shared storage by eliminating the concept of Hadoop Distributed File System (HDFS) is a challenging task. In this study, the Hadoop MapReduce algorithm is implemented on the Compuverde software that deals with virtualized shared storage of data. Objectives. In this study, the effect of using virtualized shared storage with Hadoop framework is identified. The main objective of this study is to d

24

Arpith, K. "IO Pattern Aware Methods to Improve the Performance and Lifetime of NAND SSD." Thesis, 2018. https://etd.iisc.ac.in/handle/2005/5376.

Abstract:

Modern SSDs can store multiple bits per transistor, which enables them to have higher storage capacities. The low cost per bit of such SSDs has made them a commercial success. As of 2018, cells able to store three bits are widely used, with Intel and Micron having just announced the availability of the first commercial SSD with quad-level cells. However, such high-density SSDs suffer from longer latencies to write and read data, resulting in reduced throughput, when compared to flash memories that store a single bit per cell. Also, they suffer from reduced reliability. Mechanism
