Academic literature on the topic "Hadoop"

Create an accurate citation in APA, MLA, Chicago, Harvard, and other styles

Consult the thematic lists of articles, books, theses, conference proceedings, and other academic sources on the topic "Hadoop".

Next to each source in the list of references there is an "Add to bibliography" button. Press it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Vancouver, Chicago, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "Hadoop"

1

Li, Xin Liang, and Jian De Zheng. "Improvement of Hadoop Security Mechanism". Applied Mechanics and Materials 484-485 (January 2014): 912–15. http://dx.doi.org/10.4028/www.scientific.net/amm.484-485.912.

Abstract
Hadoop, as an open-source cloud computing framework, is increasingly applied in many fields, while the weakness of its security mechanism has become one of the main problems hindering its development. This paper first analyzes the current security mechanisms of Hadoop; then, through a study of Hadoop's security mechanism and an analysis of the security risks in its current version, it proposes a corresponding solution based on secure multicast to resolve those risks. This can provide technical support for enterprises applying Hadoop under new security requirements.
2

Revathy, P., and Rajeswari Mukesh. "HadoopSec 2.0: Prescriptive analytics-based multi-model sensitivity-aware constraints centric block placement strategy for Hadoop". Journal of Intelligent & Fuzzy Systems 39, no. 6 (December 4, 2020): 8477–86. http://dx.doi.org/10.3233/jifs-189165.

Abstract
Like many open-source technologies, such as UNIX or TCP/IP, Hadoop was not created with security in mind; it nevertheless evolved over time and was widely adopted across large enterprises. Some of Hadoop's architectural features give it unique security issues. Given this vulnerability and the potential invasion of confidentiality by malicious attackers or internal users, organizations face challenges in implementing a strong security framework for Hadoop. Furthermore, the way data blocks are placed in a Hadoop cluster adds to the growing list of potential security vulnerabilities. Data privacy is compromised when critical, data-sensitive blocks are accessed by unauthorized users, or even misused by authorized ones. In this paper, we address the strategy of data block placement across the allotted DataNodes. Prescriptive analytics algorithms are used to determine the sensitivity index of the data and thereby decide on a placement that denies access to unauthorized users. This block placement strategy aims to adaptively distribute the data across the cluster using ML techniques to make the data infrastructure more secure.
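To make the placement idea concrete, the sketch below (not taken from the paper) assigns replicas of a block to a pool of nodes according to a precomputed sensitivity score; the node names, the threshold, and the scoring are assumptions for illustration only.

```python
# Toy sketch: blocks with a high sensitivity score go only to nodes marked
# as "secure"; everything here (scores, node labels, threshold) is
# illustrative, not the paper's actual algorithm or the HDFS API.
import random

SECURE_NODES = ["dn-secure-1", "dn-secure-2"]        # hypothetical labels
REGULAR_NODES = ["dn-3", "dn-4", "dn-5", "dn-6"]
SENSITIVITY_THRESHOLD = 0.7

def place_block(block_id: str, sensitivity: float, replicas: int = 3) -> list[str]:
    """Return the DataNodes chosen for one block."""
    pool = SECURE_NODES if sensitivity >= SENSITIVITY_THRESHOLD else REGULAR_NODES
    # fall back to the regular pool if the secure pool is too small
    candidates = pool if len(pool) >= replicas else pool + REGULAR_NODES
    return random.sample(candidates, replicas)

if __name__ == "__main__":
    for blk, score in [("blk_001", 0.92), ("blk_002", 0.15)]:
        print(blk, score, place_block(blk, score))
```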
3

Masadeh, M. B., M. S. Azmi, and S. S. S. Ahmad. "Available techniques in hadoop small file issue". International Journal of Electrical and Computer Engineering (IJECE) 10, no. 2 (April 1, 2020): 2097. http://dx.doi.org/10.11591/ijece.v10i2.pp2097-2101.

Abstract
Hadoop has been an optimal solution for processing and storing big data since its release in late 2006. Hadoop's data processing follows a master-slave model [1] that splits a large job into several smaller pieces so they can be processed separately; this technique was adopted instead of pushing one large file onto a costly supercomputer to extract useful information. Hadoop works very well with large files, but big data stored as many small files can cause performance problems: slow processing, delayed data access, high latency, and even a complete cluster shutdown [2]. In this paper we highlight one of Hadoop's limitations that affects data processing performance, the "big data in small files" problem, which occurs when a massive number of small files is pushed into a Hadoop cluster and can drive the cluster to a total shutdown. The paper also highlights native and proposed solutions for big data in small files, how they reduce the negative effects on a Hadoop cluster, and how they add performance to the storage and access mechanisms.
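The mitigations surveyed in this line of work (HAR archives, SequenceFiles, or pre-merging) all amount to packing many small files into fewer large ones before they reach HDFS. A minimal, hypothetical pre-merge step in Python might look like the following; the paths and the 128 MB target size are assumptions, not part of the paper.

```python
# Toy pre-merge step: pack many small files into ~128 MB bundles before
# uploading them to HDFS, so the NameNode tracks far fewer objects.
# Record boundaries and file names are not preserved in this simplification.
import os
from pathlib import Path

TARGET_BUNDLE_BYTES = 128 * 1024 * 1024  # roughly one HDFS block

def merge_small_files(src_dir: str, out_dir: str) -> None:
    os.makedirs(out_dir, exist_ok=True)
    bundle_idx, bundle_size, out = 0, 0, None
    for path in sorted(Path(src_dir).iterdir()):
        if not path.is_file():
            continue
        if out is None or bundle_size >= TARGET_BUNDLE_BYTES:
            if out:
                out.close()
            bundle_idx += 1
            bundle_size = 0
            out = open(Path(out_dir) / f"bundle_{bundle_idx:05d}.txt", "wb")
        data = path.read_bytes()
        out.write(data)
        bundle_size += len(data)
    if out:
        out.close()

# merge_small_files("./small_inputs", "./bundles")  # then: hdfs dfs -put ./bundles /data
```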
4

Hanafi, Idris, and Amal Abdel-Raouf. "P-Codec: Parallel Compressed File Decompression Algorithm for Hadoop". INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY 15, no. 8 (May 24, 2016): 6991–98. http://dx.doi.org/10.24297/ijct.v15i8.1500.

Abstract
The increasing amount and size of data being handled by data analytic applications running on Hadoop has created a need for faster data processing. One of the effective methods for handling big data sizes is compression. Data compression not only makes network I/O processing faster, but also provides better utilization of resources. However, this approach defeats one of Hadoop’s main purposes, which is the parallelism of map and reduce tasks. The number of map tasks created is determined by the size of the file, so by compressing a large file, the number of mappers is reduced which in turn decreases parallelism. Consequently, standard Hadoop takes longer times to process. In this paper, we propose the design and implementation of a Parallel Compressed File Decompressor (P-Codec) that improves the performance of Hadoop when processing compressed data. P-Codec includes two modules; the first module decompresses data upon retrieval by a data node during the phase of uploading the data to the Hadoop Distributed File System (HDFS). This process reduces the runtime of a job by removing the burden of decompression during the MapReduce phase. The second P-Codec module is a decompressed map task divider that increases parallelism by dynamically changing the map task split sizes based on the size of the final decompressed block. Our experimental results using five different MapReduce benchmarks show an average improvement of approximately 80% compared to standard Hadoop.
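The parallelism argument can be illustrated with simple arithmetic: the number of map tasks is roughly the input size divided by the split size, and a non-splittable compressed file collapses to a single mapper. The sizes in the sketch below are illustrative assumptions.

```python
# Back-of-the-envelope mapper counts: a non-splittable gzip file yields one
# map task regardless of its logical size, while the decompressed data is
# divided into many splits. Sizes are made-up numbers for illustration.
import math

SPLIT_SIZE = 128 * 1024 * 1024  # assumed HDFS block / split size

def num_map_tasks(size_bytes: int, splittable: bool) -> int:
    return math.ceil(size_bytes / SPLIT_SIZE) if splittable else 1

compressed = 2 * 1024**3      # 2 GB gzip file on HDFS
decompressed = 10 * 1024**3   # ~10 GB after decompression

print("mappers on compressed input:  ", num_map_tasks(compressed, splittable=False))   # 1
print("mappers on decompressed input:", num_map_tasks(decompressed, splittable=True))  # 80
```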
5

Lee, Kyong-Ha, Woo Lam Kang, and Young-Kyoon Suh. "Improving I/O Efficiency in Hadoop-Based Massive Data Analysis Programs". Scientific Programming 2018 (December 2, 2018): 1–9. http://dx.doi.org/10.1155/2018/2682085.

Abstract
Apache Hadoop has been a popular parallel processing tool in the era of big data. While practitioners have rewritten many conventional analysis algorithms to make them customized to Hadoop, the issue of inefficient I/O in Hadoop-based programs has been repeatedly reported in the literature. In this article, we address the problem of the I/O inefficiency in Hadoop-based massive data analysis by introducing our efficient modification of Hadoop. We first incorporate a columnar data layout into the conventional Hadoop framework, without any modification of the Hadoop internals. We also provide Hadoop with indexing capability to save a huge amount of I/O while processing not only selection predicates but also star-join queries that are often used in many analysis tasks.
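Why a columnar layout saves I/O for selection predicates can be shown with a back-of-the-envelope comparison of the bytes read when a query touches only two of ten equally sized columns; the table shape below is an assumption for illustration, not the authors' actual format.

```python
# Rough I/O comparison for a query that projects 2 of 10 equally sized
# columns: a row layout must scan every byte, a columnar layout only the
# projected columns. All sizes are illustrative assumptions.
ROWS = 100_000_000
COLUMNS = 10
BYTES_PER_VALUE = 8
projected_columns = 2

row_layout_bytes = ROWS * COLUMNS * BYTES_PER_VALUE
columnar_bytes = ROWS * projected_columns * BYTES_PER_VALUE

print(f"row layout reads      {row_layout_bytes / 1e9:.1f} GB")
print(f"columnar layout reads {columnar_bytes / 1e9:.1f} GB")
print(f"I/O reduced by factor {row_layout_bytes / columnar_bytes:.0f}x")
```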
6

Adawiyah, Robiyatul, and Sirojul Munir. "Analisis Kecepatan Algoritma MapReduce Word Count Pada Cluster Hadoop Studi Kasus Pada Global Dataset of Events, Language and Tone (GDELT)". Jurnal Informatika Terpadu 6, no. 1 (March 6, 2020): 14–19. http://dx.doi.org/10.54914/jit.v6i1.214.

Abstract
This study analyzes the speed of the MapReduce algorithm on a Hadoop cluster and measures the time needed to process GDELT data on Hadoop. The research uses a qualitative analysis method. Based on the data analysis, it is concluded that the Word Count algorithm applied to the GDELT dataset runs on a Hadoop cluster. The speed of the Word Count MapReduce algorithm applied to the GDELT dataset on Hadoop is affected when nodes are added; the study used 2 physical-machine nodes. Hadoop can process large volumes of data because it processes data in a distributed manner. Hadoop's speed can be tuned by adding nodes as well as through other settings such as the block size.
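For reference, the Word Count workload measured here is the canonical MapReduce example. A minimal Hadoop Streaming-style mapper and reducer in Python (reading stdin and writing tab-separated key/value pairs) is sketched below; the file name and the local pipeline shown in the comment are assumptions, not taken from the paper.

```python
# wordcount_streaming.py -- minimal mapper/reducer pair in the style used
# with Hadoop Streaming. Run the mapper over input splits and the reducer
# over the sorted mapper output, e.g. locally:
#   cat input.txt | python wordcount_streaming.py map | sort | python wordcount_streaming.py reduce
import sys

def mapper() -> None:
    for line in sys.stdin:
        for word in line.strip().split():
            print(f"{word}\t1")

def reducer() -> None:
    current, count = None, 0
    for line in sys.stdin:
        word, value = line.rstrip("\n").split("\t")
        if word != current:
            if current is not None:
                print(f"{current}\t{count}")
            current, count = word, 0
        count += int(value)
    if current is not None:
        print(f"{current}\t{count}")

if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()
```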
7

Azeroual, Otmane, and Renaud Fabre. "Processing Big Data with Apache Hadoop in the Current Challenging Era of COVID-19". Big Data and Cognitive Computing 5, no. 1 (March 9, 2021): 12. http://dx.doi.org/10.3390/bdcc5010012.

Abstract
Big data have become a global strategic issue, as increasingly large amounts of unstructured data challenge the IT infrastructure of global organizations and threaten their capacity for strategic forecasting. As experienced in former massive information issues, big data technologies, such as Hadoop, should efficiently tackle the incoming large amounts of data and provide organizations with relevant processed information that was formerly neither visible nor manageable. After having briefly recalled the strategic advantages of big data solutions in the introductory remarks, in the first part of this paper, we focus on the advantages of big data solutions in the currently difficult time of the COVID-19 pandemic. We characterize it as an endemic heterogeneous data context; we then outline the advantages of technologies such as Hadoop and its IT suitability in this context. In the second part, we identify two specific advantages of Hadoop solutions, globality combined with flexibility, and we notice that they are at work with a “Hadoop Fusion Approach” that we describe as an optimal response to the context. In the third part, we justify selected qualifications of globality and flexibility by the fact that Hadoop solutions enable comparable returns in opposite contexts of models of partial submodels and of models of final exact systems. In part four, we remark that in both these opposite contexts, Hadoop’s solutions allow a large range of needs to be fulfilled, which fits with requirements previously identified as the current heterogeneous data structure of COVID-19 information. In the final part, we propose a framework of strategic data processing conditions. To the best of our knowledge, they appear to be the most suitable to overcome COVID-19 massive information challenges.
8

Li, Pengcheng, Haidong Chen, Shipeng Li, Tinggui Yan, and Hang Qian. "Research on Distributed Calculation of Flight Parameters Based on Hadoop". Journal of Physics: Conference Series 2337, no. 1 (September 1, 2022): 012013. http://dx.doi.org/10.1088/1742-6596/2337/1/012013.

Abstract
With the improvement of launch vehicle technology and the increase in launch missions, the contradiction between the large-scale computation of launch-vehicle flight parameters and the traditional standalone calculation mode has become increasingly prominent under intensive launch tasks; it is mainly reflected in slow calculation speed, low processing efficiency, limited bandwidth, and single points of failure. MapReduce, the distributed computing framework of the big data processing architecture Hadoop, running on a low-cost cluster, is innovatively applied to the large-scale calculation of a launch vehicle's flight parameters, relying on its distributed storage, coordination, and load-balancing mechanisms. The method effectively improves computing efficiency, breaks through the performance bottleneck, and avoids single points of failure. Compared with traditional standalone deployment and pseudo-distributed cluster deployment based on Hadoop, fully distributed cluster deployment based on Hadoop is optimal for calculating flight parameters. The results show that the parallel calculation of flight parameters based on Hadoop can save 51% of the time while producing results consistent with the standalone deployment.
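The underlying idea, splitting a large flight-parameter computation into chunks that are processed in parallel and then merged, can be sketched without Hadoop at all; the chunking and the placeholder compute_parameters function below are illustrative assumptions only.

```python
# Toy illustration of the map step: telemetry is split into chunks, the
# per-chunk parameter computation runs in parallel, and results are merged.
# compute_parameters() is a placeholder, not the paper's actual model.
from concurrent.futures import ProcessPoolExecutor

def compute_parameters(chunk: list[float]) -> dict:
    return {"n": len(chunk), "mean": sum(chunk) / len(chunk)}

def split(data: list[float], n_chunks: int) -> list[list[float]]:
    size = max(1, len(data) // n_chunks)
    return [data[i:i + size] for i in range(0, len(data), size)]

if __name__ == "__main__":
    telemetry = [float(i % 97) for i in range(1_000_000)]  # stand-in data
    with ProcessPoolExecutor() as pool:
        partials = list(pool.map(compute_parameters, split(telemetry, 8)))
    total = sum(p["n"] for p in partials)
    mean = sum(p["mean"] * p["n"] for p in partials) / total
    print(total, round(mean, 3))
```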
9

Ji, Keungyeup, and Youngmi Kwon. "Hadoop MapReduce Performance Optimization Analysis by Calibrating Hadoop Parameters". Journal of Korean Institute of Information Technology 19, no. 6 (June 30, 2021): 9–19. http://dx.doi.org/10.14801/jkiit.2021.19.6.9.

10

Lee, Sungchul, Ju-Yeon Jo, and Yoohwan Kim. "Hadoop Performance Analysis Model with Deep Data Locality". Information 10, no. 7 (June 27, 2019): 222. http://dx.doi.org/10.3390/info10070222.

Abstract
Background: Hadoop has become the base framework of big data systems via the simple concept that moving computation is cheaper than moving data. Hadoop increases data locality in the Hadoop Distributed File System (HDFS) to improve the performance of the system; network traffic among nodes is reduced by raising the proportion of data-local tasks on each machine. Previous research increased data locality in one of the MapReduce stages to improve Hadoop performance, but there has been no mathematical performance model for data locality in Hadoop. Methods: This study builds a Hadoop performance analysis model with data locality for analyzing the entire MapReduce process. The paper explains the data-locality concept in the map stage and the shuffle stage, and shows how to apply the model to increase the performance of a Hadoop system by establishing deep data locality. Results: The research validated deep data locality for increasing Hadoop performance via three tests: a simulation-based test, a cloud test, and a physical test. According to the tests, the authors improved the Hadoop system by over 34% by using deep data locality. Conclusions: Deep data locality improved Hadoop performance by reducing data movement in HDFS.
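A toy model of the data-locality effect quantified here: a task scheduled on a node that already holds a replica of its block skips the network transfer. The costs and placements below are made-up numbers for illustration, not the paper's performance model.

```python
# Toy data-locality model: a task is "data-local" if its block has a replica
# on the node where it runs; otherwise the block must cross the network first.
# All costs and placements are illustrative assumptions.
import random

NODES = [f"node-{i}" for i in range(8)]
PROCESS_COST = 1.0          # seconds to process one block
NETWORK_COST = 0.8          # extra seconds when the block is remote

def task_time(replica_nodes: list[str], scheduled_on: str) -> float:
    return PROCESS_COST + (0.0 if scheduled_on in replica_nodes else NETWORK_COST)

random.seed(0)
blocks = [random.sample(NODES, 3) for _ in range(200)]    # 3 replicas per block

naive = sum(task_time(reps, random.choice(NODES)) for reps in blocks)
local = sum(task_time(reps, reps[0]) for reps in blocks)  # always pick a replica holder
print(f"random placement: {naive:.0f} s, data-local placement: {local:.0f} s")
```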

Theses on the topic "Hadoop"

1

Raja, Anitha. "A Coordination Framework for Deploying Hadoop MapReduce Jobs on Hadoop Cluster". Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-196951.

Abstract
Apache Hadoop is an open source framework that delivers reliable, scalable, and distributed computing. Hadoop services are provided for distributed data storage, data processing, data access, and security. MapReduce is the heart of the Hadoop framework and was designed to process vast amounts of data distributed over a large number of nodes. MapReduce has been used extensively to process structured and unstructured data in diverse fields such as e-commerce, web search, social networks, and scientific computation. Understanding the characteristics of Hadoop MapReduce workloads is the key to achieving improved configurations and refining system throughput. Thus far, MapReduce workload characterization in a large-scale production environment has not been well studied. In this thesis project, the focus is mainly on composing a Hadoop cluster (as an execution environment for data processing) to analyze two types of Hadoop MapReduce (MR) jobs via a proposed coordination framework. This coordination framework is referred to as a workload translator. The outcome of this work includes: (1) a parametric workload model for the target MR jobs, (2) a cluster specification to develop an improved cluster deployment strategy using the model and coordination framework, and (3) better scheduling and hence better performance of jobs (i.e. shorter job completion time). We implemented a prototype of our solution using Apache Tomcat on (OpenStack) Ubuntu Trusty Tahr, which uses RESTful APIs to (1) create a Hadoop cluster version 2.7.2 and (2) to scale up and scale down the number of workers in the cluster. The experimental results showed that with well tuned parameters, MR jobs can achieve a reduction in the job completion time and improved utilization of the hardware resources. The target audience for this thesis are developers. As future work, we suggest adding additional parameters to develop a more refined workload model for MR and similar jobs.
2

Savvidis, Evangelos. "Searching Metadata in Hadoop". Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-177467.

Abstract
The rapid expansion of the Internet has led to the Big Data era. Companies that provide services dealing with Big Data face two major issues: i) storing petabytes of data and ii) manipulating this data. On the one hand, the open-source Hadoop ecosystem, and particularly its distributed file system HDFS, takes care of the former issue by providing persistent storage for unprecedented amounts of data. For the latter, there are many approaches to data analytics – from map-reduce jobs to information retrieval and data discovery. This thesis provides a novel approach to information discovery, first by providing the means to create, manage, and associate metadata with HDFS files, and second by searching for files through their metadata using Elasticsearch. The work is composed of three parts: the first is the metadata designer/manager, an AngularJS front end; the second is the J2EE back end, which enables the front end to perform all metadata management actions using websockets; the third is the indexing of data into Elasticsearch, the distributed and scalable open-source search engine. Our work has shown that this approach works and that it greatly helps finding information in the vast sea of data in HDFS.
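The indexing half of such a design, pushing per-file metadata documents into Elasticsearch so they become searchable, can be sketched with plain HTTP calls; the endpoint, index name, and document fields below are assumptions, not the thesis's actual schema.

```python
# Toy sketch: index one HDFS file's metadata as a JSON document and run a
# full-text query against it. Endpoint, index name, and fields are assumed.
import requests

ES = "http://localhost:9200"
INDEX = "hdfs-metadata"

doc = {
    "path": "/data/projects/exp42/results.csv",
    "owner": "alice",
    "tags": ["experiment-42", "csv", "results"],
    "description": "aggregated measurements from experiment 42",
}

# index the document (Elasticsearch assigns an id)
requests.post(f"{ES}/{INDEX}/_doc", json=doc, timeout=10).raise_for_status()

# search by a free-text term in the description
query = {"query": {"match": {"description": "experiment"}}}
hits = requests.get(f"{ES}/{INDEX}/_search", json=query, timeout=10).json()
for hit in hits["hits"]["hits"]:
    print(hit["_source"]["path"])
```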
3

Bux, Marc Nicolas. "Scientific Workflows for Hadoop". Doctoral thesis, Humboldt-Universität zu Berlin, 2018. http://dx.doi.org/10.18452/19321.

Abstract
Scientific workflows provide a means to model, execute, and exchange the increasingly complex analysis pipelines necessary for today's data-driven science. Over the last decades, scientific workflow management systems have emerged to facilitate the design, execution, and monitoring of such workflows. At the same time, the amounts of data generated in various areas of science outpaced hardware advancements. Parallelization and distributed execution are generally proposed to deal with increasing amounts of data. However, the resources provided by distributed infrastructures are subject to heterogeneity, dynamic performance changes at runtime, and occasional failures. To leverage the scalability provided by these infrastructures despite the observed aspects of performance variability, workflow management systems have to progress: Parallelization potentials in scientific workflows have to be detected and exploited. Simulation frameworks, which are commonly employed for the evaluation of scheduling mechanisms, have to consider the instability encountered on the infrastructures they emulate. Adaptive scheduling mechanisms have to be employed to optimize resource utilization in the face of instability. State-of-the-art systems for scalable distributed resource management and storage, such as Apache Hadoop, have to be supported. This dissertation presents novel solutions for these aspirations. First, we introduce DynamicCloudSim, a cloud computing simulation framework that is able to adequately model the various aspects of variability encountered in computational clouds. Secondly, we outline ERA, an adaptive scheduling policy that optimizes workflow makespan by exploiting heterogeneity, replicating bottlenecks in workflow execution, and adapting to changes in the underlying infrastructure. Finally, we present Hi-WAY, an execution engine that integrates ERA and enables the highly scalable execution of scientific workflows written in a number of languages on Hadoop.
4

Wu, Yuanyuan. "HADOOP-EDF: LARGE-SCALE DISTRIBUTED PROCESSING OF ELECTROPHYSIOLOGICAL SIGNAL DATA IN HADOOP MAPREDUCE". UKnowledge, 2019. https://uknowledge.uky.edu/cs_etds/88.

Abstract
The rapidly growing volume of electrophysiological signals has been generated for clinical research in neurological disorders. The European Data Format (EDF) is a standard format for storing electrophysiological signals. However, the bottleneck of existing signal analysis tools for handling large-scale datasets is the sequential loading of large EDF files before performing an analysis. To overcome this, we developed Hadoop-EDF, a distributed signal processing tool that loads EDF data in parallel using Hadoop MapReduce. Hadoop-EDF uses a robust data partition algorithm that makes EDF data processable in parallel. We evaluate Hadoop-EDF's scalability and performance by leveraging two datasets from the National Sleep Research Resource and running experiments on Amazon Web Services clusters. On a 20-node cluster, Hadoop-EDF is 27 and 47 times faster than sequential processing of 200 small and 200 large files, respectively. The results demonstrate that Hadoop-EDF is more suitable and effective for processing large EDF files.
5

Büchler, Peter. "Indexing Genomic Data on Hadoop". Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-177298.

Abstract
In the last years Hadoop has been used as a standard backend for big data applications. Its best-known application, MapReduce, provides a powerful parallel programming paradigm. Big companies storing petabytes of data, like Facebook and Yahoo, have deployed their own Hadoop distributions for data analytics, interactive services, etc. Nevertheless, MapReduce's simplicity in its map stage always leads to a full scan of the input data and thus potentially wastes resources. Recently, new sources of big data, e.g. the 4K video format or genomic data, have appeared. Genomic data in its raw file format (FastQ) can take up hundreds of gigabytes per file. Simply using MapReduce for a population analysis would easily end up in a full data scan over terabytes of data. Obviously there is a need for more efficient ways of accessing the data by reducing the amount of data considered for the computation. Existing approaches introduce indexing structures into their respective Hadoop distributions. While some of them are made for specific data structures, e.g. key-value pairs, others strongly depend on the existence of a MapReduce framework. To overcome these problems we integrated an indexing structure into Hadoop's file system, the Hadoop Distributed File System (HDFS), working independently of MapReduce. This structure supports the definition of custom input formats and individual indexing strategies. The building of an index is integrated into the file writing process and is independent of software working in higher layers of Hadoop. As a proof of concept, MapReduce has been enabled to make use of these indexing structures by simply adding a new parameter to its job definition. A prototype and its evaluation show the advantages of using those structures, with genomic data (FastQ and SAM files) as a use case.
6

Schätzle, Alexander [Verfasser], and Georg [Akademischer Betreuer] Lausen. "Distributed RDF Querying on Hadoop". Freiburg: Universität, 2017. http://d-nb.info/1128574187/34.

7

Tabatabaei, Mahsa. "Evaluation of Security in Hadoop". Thesis, KTH, Kommunikationsnät, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-160269.

Abstract
There are different ways to store and process large amounts of data. Hadoop is widely used and one of the most popular platforms for storing huge amounts of data and processing them in parallel. While storing sensitive data, security plays an important role in keeping it safe. Security was not given much consideration when Hadoop was initially designed. The initial use of Hadoop was managing large amounts of public web data, so confidentiality of the stored data was not an issue. Initially, users and services in Hadoop were not authenticated; Hadoop is designed to run code on a distributed cluster of machines, so without proper authentication anyone could submit code and it would be executed. Different projects have started to improve the security of Hadoop. Two of these projects are Project Rhino and Project Sentry [1]. Project Rhino implements a splittable crypto codec to provide encryption for the data stored in the Hadoop distributed file system. It also develops centralized authentication by implementing Hadoop single sign-on, which prevents repeated authentication of users accessing the same services many times. From the authorization point of view, Project Rhino provides cell-based authorization for HBase [2]. Project Sentry provides fine-grained access control by supporting role-based authorization to which different services can be bound to provide authorization for their users [3]. It is possible to combine the security enhancements made in Project Rhino and Project Sentry to further improve performance and provide better mechanisms to secure Hadoop. In this thesis, the security of the system in Hadoop version 1 and Hadoop version 2 is evaluated and different security enhancements are proposed, considering the security improvements made by the two aforementioned projects, Project Rhino and Project Sentry, in terms of encryption, authentication, and authorization. The thesis suggests some high-level security improvements on the centralized authentication system (Hadoop Single Sign-on) implementation made by Project Rhino.
8

Дикий, В. С. "Сутність та особливості використання Hadoop". Thesis, Київський національний університет технологій та дизайну, 2017. https://er.knutd.edu.ua/handle/123456789/10420.

9

Brotánek, Jan. "Apache Hadoop jako analytická platforma". Master's thesis, Vysoká škola ekonomická v Praze, 2017. http://www.nusl.cz/ntk/nusl-358801.

Abstract
This diploma thesis focuses on integrating the Hadoop platform into a current data warehouse architecture. The theoretical part describes the properties of Big Data together with processing methods and models, and discusses the Hadoop framework, its components, and its distributions, as well as the components that enable end users, developers, and analysts to access a Hadoop cluster. The practical part presents a case study of batch data extraction from the current data warehouse on an Oracle platform with the aid of the Sqoop tool, its transformation into the relational structures of the Hive component, and its upload back to the original source. Data compression and query efficiency depending on various storage formats are also discussed, and the quality and consistency of the manipulated data are checked during all phases of the process. Part of the practical section covers ways of storing and capturing streaming data; the Flume tool is used to capture stream data, which is then transformed with the Pig tool. The purpose of implementing the process is to move part of the data and its processing from the current data warehouse to the Hadoop cluster; therefore, a process for integrating the current data warehouse with the Hortonworks Data Platform and its components was designed.
10

Nilsson, Johan. "Hadoop MapReduce in Eucalyptus Private Cloud". Thesis, Umeå universitet, Institutionen för datavetenskap, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-51309.

Abstract
This thesis investigates how to set up a private cloud using the Eucalyptus Cloud system, along with its usability, requirements, and limitations as an open-source cloud platform providing private cloud solutions. It also studies whether using the MapReduce framework, through Apache Hadoop's implementation on top of the private Eucalyptus cloud, can provide near-linear scalability in terms of time and the number of virtual machines in the cluster. The analysis has shown that Eucalyptus is lacking in a few usability areas when setting up the cloud infrastructure, in terms of private networking and DNS lookups, yet the API that Eucalyptus provides gives benefits when migrating from public clouds like Amazon. The MapReduce framework shows an initially near-linear relation which declines when the number of virtual machines reaches the maximum of the cloud infrastructure.

Books on the topic "Hadoop"

1

Venner, Jason. Pro Hadoop. Berkeley, CA: Apress, 2009. http://dx.doi.org/10.1007/978-1-4302-1943-9.

2

SpringerLink (Online service), ed. Pro Hadoop. Berkeley, CA: Apress, 2009.

3

Anthony, Benoy, Konstantin Boudnik, Cheryl Adams, Branky Shao, Cazen Lee, and Kai Sasaki. Professional Hadoop®. Indianapolis, IN, USA: John Wiley & Sons, Inc, 2016. http://dx.doi.org/10.1002/9781119281320.

4

Lakhe, Bhushan. Practical Hadoop Migration. Berkeley, CA: Apress, 2016. http://dx.doi.org/10.1007/978-1-4842-1287-5.

5

Vohra, Deepak. Practical Hadoop Ecosystem. Berkeley, CA: Apress, 2016. http://dx.doi.org/10.1007/978-1-4842-2199-0.

6

Lakhe, Bhushan. Practical Hadoop Security. Berkeley, CA: Apress, 2014. http://dx.doi.org/10.1007/978-1-4302-6545-0.

7

Wadkar, Sameer, and Madhu Siddalingaiah. Pro Apache Hadoop. Berkeley, CA: Apress, 2014. http://dx.doi.org/10.1007/978-1-4302-4864-4.

8

Hadoop in action. Greenwich, Conn: Manning Publications, 2011.

9

White, Tom. Hadoop: The Definitive Guide. 2nd ed. Sebastopol: O'Reilly, 2010.

10

Koitzsch, Kerry. Pro Hadoop Data Analytics. Berkeley, CA: Apress, 2017. http://dx.doi.org/10.1007/978-1-4842-1910-2.


Book chapters on the topic "Hadoop"

1

Srinivasa, K. G., Siddesh G. M., and Srinidhi H. "Hadoop". In Computer Communications and Networks, 29–53. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-77800-6_2.

2

de Souza Granha, Renata Ghisloti Duarte. "Hadoop". In Encyclopedia of Big Data Technologies, 1–5. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-63962-8_36-1.

3

Freiknecht, Jonas. "Hadoop". In Big Data in der Praxis, 19–179. München: Carl Hanser Verlag GmbH & Co. KG, 2014. http://dx.doi.org/10.3139/9783446441774.003.

4

de Souza Granha, Renata Ghisloti Duarte. "Hadoop". In Encyclopedia of Big Data Technologies, 913–17. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-319-77525-8_36.

5

Freiknecht, Jonas, and Stefan Papp. "Hadoop". In Big Data in der Praxis, 21–186. München: Carl Hanser Verlag GmbH & Co. KG, 2018. http://dx.doi.org/10.3139/9783446456013.003.

6

Shaw, Scott. "HADOOP TECHNOLOGY". In Internet of Things and Data Analytics Handbook, 383–97. Hoboken, NJ, USA: John Wiley & Sons, Inc., 2016. http://dx.doi.org/10.1002/9781119173601.ch22.

7

Wadkar, Sameer, and Madhu Siddalingaiah. "Installing Hadoop". In Pro Apache Hadoop, 381–90. Berkeley, CA: Apress, 2014. http://dx.doi.org/10.1007/978-1-4302-4864-4_18.

8

Wadkar, Sameer, and Madhu Siddalingaiah. "Hadoop Concepts". In Pro Apache Hadoop, 11–30. Berkeley, CA: Apress, 2014. http://dx.doi.org/10.1007/978-1-4302-4864-4_2.

9

Wadkar, Sameer, and Madhu Siddalingaiah. "Hadoop Administration". In Pro Apache Hadoop, 47–72. Berkeley, CA: Apress, 2014. http://dx.doi.org/10.1007/978-1-4302-4864-4_4.

10

Wadkar, Sameer, and Madhu Siddalingaiah. "Monitoring Hadoop". In Pro Apache Hadoop, 203–15. Berkeley, CA: Apress, 2014. http://dx.doi.org/10.1007/978-1-4302-4864-4_9.


Conference proceedings on the topic "Hadoop"

1

He, Wenting, Huimin Cui, Binbin Lu, Jiacheng Zhao, Shengmei Li, Gong Ruan, Jingling Xue, Xiaobing Feng, Wensen Yang, and Youliang Yan. "Hadoop+". In ICS'15: 2015 International Conference on Supercomputing. New York, NY, USA: ACM, 2015. http://dx.doi.org/10.1145/2751205.2751236.

2

Bhandarkar, Milind. "Hadoop". In KDD '13: The 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, NY, USA: ACM, 2013. http://dx.doi.org/10.1145/2487575.2491128.

3

Radhakrishnan, Srihari, Bryan J. Muscedere, and Khuzaima Daudjee. "V-Hadoop: Virtualized Hadoop using containers". In 2016 IEEE 15th International Symposium on Network Computing and Applications (NCA). IEEE, 2016. http://dx.doi.org/10.1109/nca.2016.7778624.

4

Alzuru, Icaro, Kevin Long, Bhaskar Gowda, David Zimmerman, and Tao Li. "Hadoop Characterization". In 2015 IEEE Trustcom/BigDataSE/ISPA. IEEE, 2015. http://dx.doi.org/10.1109/trustcom.2015.567.

5

Alarabi, Louai. "ST-Hadoop". In the 2017 ACM International Conference. New York, New York, USA: ACM Press, 2017. http://dx.doi.org/10.1145/3055167.3055181.

6

Zhang, Yunming. "HJ-Hadoop". In the 2013 companion publication for conference. New York, New York, USA: ACM Press, 2013. http://dx.doi.org/10.1145/2508075.2514875.

7

Chen, Yi-Wei, Shih-Hao Hung, Chia-Heng Tu, and Chih Wei Yeh. "Virtual Hadoop". In RACS '16: International Conference on Research in Adaptive and Convergent Systems. New York, NY, USA: ACM, 2016. http://dx.doi.org/10.1145/2987386.2987408.

8

Wang, Jianwu, Daniel Crawl, and Ilkay Altintas. "Kepler + Hadoop". In the 4th Workshop. New York, New York, USA: ACM Press, 2009. http://dx.doi.org/10.1145/1645164.1645176.

9

Haque, Riyaz, David M. Peixotto, and Vivek Sarkar. "CnC-Hadoop". In the 8th ACM International Conference. New York, New York, USA: ACM Press, 2011. http://dx.doi.org/10.1145/2016604.2016626.

10

Brown, Richard A. "Hadoop at home". In the 40th ACM technical symposium. New York, New York, USA: ACM Press, 2009. http://dx.doi.org/10.1145/1508865.1508904.


Reports on the topic "Hadoop"

1

Chou, T. T. High energy hadron-hadron collisions. Office of Scientific and Technical Information (OSTI), December 1991. http://dx.doi.org/10.2172/7206040.

2

Chou, T. T. High energy hadron-hadron collisions. Office of Scientific and Technical Information (OSTI), November 1990. http://dx.doi.org/10.2172/6138833.

3

Chou, T. T. High energy hadron-hadron collisions. Office of Scientific and Technical Information (OSTI), January 1990. http://dx.doi.org/10.2172/7017363.

4

Chou, T. T. High energy hadron-hadron collisions. Final report. Office of Scientific and Technical Information (OSTI), August 1995. http://dx.doi.org/10.2172/106596.

5

Chou, T. T. High energy hadron-hadron collisions. Annual progress report. Office of Scientific and Technical Information (OSTI), December 1992. http://dx.doi.org/10.2172/10157470.

6

Chou, T. T. High energy hadron-hadron collisions. Annual progress report. Office of Scientific and Technical Information (OSTI), December 1991. http://dx.doi.org/10.2172/10165579.

7

Savit, R. Multiplicities And Regge Poles In Deep Hadron-Hadron Scattering. Office of Scientific and Technical Information (OSTI), June 2018. http://dx.doi.org/10.2172/1453925.

8

Longacre, R. S. Hadron Molecules Revisted. Office of Scientific and Technical Information (OSTI), December 2013. http://dx.doi.org/10.2172/1122758.

9

Atwood, W. B., and J. A. Jaros. B HADRON LIFETIMES. Office of Scientific and Technical Information (OSTI), October 1991. http://dx.doi.org/10.2172/1449625.

10

Brodsky, Stanley J. Hadron Spin Dynamics. Office of Scientific and Technical Information (OSTI), January 2002. http://dx.doi.org/10.2172/798965.

