Dissertations / Theses on the topic 'Cloud data storage'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the top 50 dissertations / theses for your research on the topic 'Cloud data storage.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Press on it, and we will generate automatically the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as pdf and read online its abstract whenever available in the metadata.
Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.
Avlasovych, V. V. "Cloud data Storage." Thesis, Sumy State University, 2016. http://essuir.sumdu.edu.ua/handle/123456789/46877.
Full textCappelli, Gino. "Data cloud through google cloud storage." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2012. http://amslaurea.unibo.it/3032/.
Full textWang, Xing. "Benchmarking Cloud Storage Systems." Thesis, Norges teknisk-naturvitenskapelige universitet, Institutt for telematikk, 2014. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-26716.
Full textAl, Beshri Aiiad Ahmad M. "Outsourcing data storage without outsourcing trust in cloud computing." Thesis, Queensland University of Technology, 2013. https://eprints.qut.edu.au/61738/1/Aiiad_Ahmad_M._Al_Beshri_Thesis.pdf.
Full textBilbray, Kyle. "DSFS: a data storage facilitating service for maximizing security, availability, performance, and customizability." Thesis, Georgia Institute of Technology, 2014. http://hdl.handle.net/1853/52984.
Full textGonçalves, André Miguel Augusto. "Estimating data divergence in cloud computing storage systems." Master's thesis, Faculdade de Ciências e Tecnologia, 2013. http://hdl.handle.net/10362/10852.
Full textMany internet services are provided through cloud computing infrastructures that are composed of multiple data centers. To provide high availability and low latency, data is replicated in machines in different data centers, which introduces the complexity of guaranteeing that clients view data consistently. Data stores often opt for a relaxed approach to replication, guaranteeing only eventual consistency, since it improves latency of operations. However, this may lead to replicas having different values for the same data. One solution to control the divergence of data in eventually consistent systems is the usage of metrics that measure how stale data is for a replica. In the past, several algorithms have been proposed to estimate the value of these metrics in a deterministic way. An alternative solution is to rely on probabilistic metrics that estimate divergence with a certain degree of certainty. This relaxes the need to contact all replicas while still providing a relatively accurate measurement. In this work we designed and implemented a solution to estimate the divergence of data in eventually consistent data stores, that scale to many replicas by allowing clientside caching. Measuring the divergence when there is a large number of clients calls for the development of new algorithms that provide probabilistic guarantees. Additionally, unlike previous works, we intend to focus on measuring the divergence relative to a state that can lead to the violation of application invariants.
Partially funded by project PTDC/EIA EIA/108963/2008 and by an ERC Starting Grant, Agreement Number 307732
Kaaniche, Nesrine. "Cloud data storage security based on cryptographic mechanisms." Thesis, Evry, Institut national des télécommunications, 2014. http://www.theses.fr/2014TELE0033/document.
Full textRecent technological advances have given rise to the popularity and success of cloud. This new paradigm is gaining an expanding interest, since it provides cost efficient architectures that support the transmission, storage, and intensive computing of data. However, these promising storage services bring many challenging design issues, considerably due to the loss of data control. These challenges, namely data confidentiality and data integrity, have significant influence on the security and performances of the cloud system. This thesis aims at overcoming this trade-off, while considering two data security concerns. On one hand, we focus on data confidentiality preservation which becomes more complex with flexible data sharing among a dynamic group of users. It requires the secrecy of outsourced data and an efficient sharing of decrypting keys between different authorized users. For this purpose, we, first, proposed a new method relying on the use of ID-Based Cryptography (IBC), where each client acts as a Private Key Generator (PKG). That is, he generates his own public elements and derives his corresponding private key using a secret. Thanks to IBC properties, this contribution is shown to support data privacy and confidentiality, and to be resistant to unauthorized access to data during the sharing process, while considering two realistic threat models, namely an honest but curious server and a malicious user adversary. Second, we define CloudaSec, a public key based solution, which proposes the separation of subscription-based key management and confidentiality-oriented asymmetric encryption policies. That is, CloudaSec enables flexible and scalable deployment of the solution as well as strong security guarantees for outsourced data in cloud servers. Experimental results, under OpenStack Swift, have proven the efficiency of CloudaSec in scalable data sharing, while considering the impact of the cryptographic operations at the client side. On the other hand, we address the Proof of Data Possession (PDP) concern. In fact, the cloud customer should have an efficient way to perform periodical remote integrity verifications, without keeping the data locally, following three substantial aspects : security level, public verifiability, and performance. This concern is magnified by the client’s constrained storage and computation capabilities and the large size of outsourced data. In order to fulfill this security requirement, we first define a new zero-knowledge PDP proto- col that provides deterministic integrity verification guarantees, relying on the uniqueness of the Euclidean Division. These guarantees are considered as interesting, compared to several proposed schemes, presenting probabilistic approaches. Then, we propose SHoPS, a Set-Homomorphic Proof of Data Possession scheme, supporting the 3 levels of data verification. SHoPS enables the cloud client not only to obtain a proof of possession from the remote server, but also to verify that a given data file is distributed across multiple storage devices to achieve a certain desired level of fault tolerance. Indeed, we present the set homomorphism property, which extends malleability to set operations properties, such as union, intersection and inclusion. SHoPS presents high security level and low processing complexity. For instance, SHoPS saves energy within the cloud provider by distributing the computation over multiple nodes. Each node provides proofs of local data block sets. This is to make applicable, a resulting proof over sets of data blocks, satisfying several needs, such as, proofs aggregation
Kaaniche, Nesrine. "Cloud data storage security based on cryptographic mechanisms." Electronic Thesis or Diss., Evry, Institut national des télécommunications, 2014. http://www.theses.fr/2014TELE0033.
Full textRecent technological advances have given rise to the popularity and success of cloud. This new paradigm is gaining an expanding interest, since it provides cost efficient architectures that support the transmission, storage, and intensive computing of data. However, these promising storage services bring many challenging design issues, considerably due to the loss of data control. These challenges, namely data confidentiality and data integrity, have significant influence on the security and performances of the cloud system. This thesis aims at overcoming this trade-off, while considering two data security concerns. On one hand, we focus on data confidentiality preservation which becomes more complex with flexible data sharing among a dynamic group of users. It requires the secrecy of outsourced data and an efficient sharing of decrypting keys between different authorized users. For this purpose, we, first, proposed a new method relying on the use of ID-Based Cryptography (IBC), where each client acts as a Private Key Generator (PKG). That is, he generates his own public elements and derives his corresponding private key using a secret. Thanks to IBC properties, this contribution is shown to support data privacy and confidentiality, and to be resistant to unauthorized access to data during the sharing process, while considering two realistic threat models, namely an honest but curious server and a malicious user adversary. Second, we define CloudaSec, a public key based solution, which proposes the separation of subscription-based key management and confidentiality-oriented asymmetric encryption policies. That is, CloudaSec enables flexible and scalable deployment of the solution as well as strong security guarantees for outsourced data in cloud servers. Experimental results, under OpenStack Swift, have proven the efficiency of CloudaSec in scalable data sharing, while considering the impact of the cryptographic operations at the client side. On the other hand, we address the Proof of Data Possession (PDP) concern. In fact, the cloud customer should have an efficient way to perform periodical remote integrity verifications, without keeping the data locally, following three substantial aspects : security level, public verifiability, and performance. This concern is magnified by the client’s constrained storage and computation capabilities and the large size of outsourced data. In order to fulfill this security requirement, we first define a new zero-knowledge PDP proto- col that provides deterministic integrity verification guarantees, relying on the uniqueness of the Euclidean Division. These guarantees are considered as interesting, compared to several proposed schemes, presenting probabilistic approaches. Then, we propose SHoPS, a Set-Homomorphic Proof of Data Possession scheme, supporting the 3 levels of data verification. SHoPS enables the cloud client not only to obtain a proof of possession from the remote server, but also to verify that a given data file is distributed across multiple storage devices to achieve a certain desired level of fault tolerance. Indeed, we present the set homomorphism property, which extends malleability to set operations properties, such as union, intersection and inclusion. SHoPS presents high security level and low processing complexity. For instance, SHoPS saves energy within the cloud provider by distributing the computation over multiple nodes. Each node provides proofs of local data block sets. This is to make applicable, a resulting proof over sets of data blocks, satisfying several needs, such as, proofs aggregation
Noman, Ali. "Addressing the Data Location Assurance Problem of Cloud Storage Environments." Thesis, Université d'Ottawa / University of Ottawa, 2018. http://hdl.handle.net/10393/37375.
Full textArteaga, Clavijo Dulcardo Ariel. "Flash Caching for Cloud Computing Systems." FIU Digital Commons, 2016. http://digitalcommons.fiu.edu/etd/2496.
Full textYahya, Farashazillah. "A security framework to protect data in cloud storage." Thesis, University of Southampton, 2017. https://eprints.soton.ac.uk/415861/.
Full textKagalkar, Ramesh [Verfasser]. "PRIVACY PRESERVING PUBLIC AUDITING AND DATA INTEGRITY USING TPA FOR SECURE CLOUD STORAGE : SECURE CLOUD STORAGE / Ramesh Kagalkar." Hamburg : Anchor Academic Publishing, 2015. http://d-nb.info/1110039662/34.
Full textWu, Justin Chun. "Peering Through the Cloud—Investigating the Perceptions and Behaviors of Cloud Storage Users." BYU ScholarsArchive, 2016. https://scholarsarchive.byu.edu/etd/6175.
Full textBuop, George Onyango. "Data storage security for cloud computing using elliptic curve cryptography." Master's thesis, University of Cape Town, 2020. http://hdl.handle.net/11427/32489.
Full textVasilopoulos, Dimitrios. "Reconciling cloud storage functionalities with security : proofs of storage with data reliability and secure deduplication." Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS399.
Full textIn this thesis we study in depth the problem of verifiability in cloud storage systems. We study Proofs of Storage -a family of cryptographic protocols that enable a cloud storage provider to prove to a user that the integrity of her data has not been compromised- and we identify their limitations with respect to two key characteristics of cloud storage systems, namely, reliable data storage with automatic maintenance and data deduplication. To cope with the first characteristic, we introduce the notion of Proofs of Data Reliability, a comprehensive verification scheme that aims to resolve the conflict between reliable data storage verification and automatic maintenance. We further propose two Proofs of Data Reliability schemes, namely POROS and PORTOS, that succeed in verifying reliable data storage and, at the same time, enable the cloud storage provider to autonomously perform automatic maintenance operations. As regards to the second characteristic, we address the conflict between Proofs of Storage and deduplication. More precisely, inspired by previous attempts in solving the problem of deduplicating encrypted data, we propose message-locked PoR, a solution that combines Proofs of Storage with deduplication. In addition, we propose a novel message-locked key generation protocol which is more resilient against off-line dictionary attacks compared to existing solutions
Zhang, Gong. "Data and application migration in cloud based data centers --architectures and techniques." Diss., Georgia Institute of Technology, 2011. http://hdl.handle.net/1853/41078.
Full textOduyiga, Adeshola Oyesanya. "Security in Cloud Storage : A Suitable Security Algorithm for Data Protection." Thesis, Mittuniversitetet, Avdelningen för informationssystem och -teknologi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-34428.
Full textNavarro, Martín Joan. "From cluster databases to cloud storage: Providing transactional support on the cloud." Doctoral thesis, Universitat Ramon Llull, 2015. http://hdl.handle.net/10803/285655.
Full textDurante las últimas tres décadas, las limitaciones tecnológicas (por ejemplo la capacidad de los dispositivos de almacenamiento o el ancho de banda de las redes de comunicación) y las crecientes demandas de los usuarios (estructuras de información, volúmenes de datos) han conducido la evolución de las bases de datos distribuidas. Desde los primeros repositorios de datos para archivos planos que se desarrollaron en la década de los ochenta, se han producido importantes avances en los algoritmos de control de concurrencia, protocolos de replicación y en la gestión de transacciones. Sin embargo, los retos modernos de almacenamiento de datos que plantean el Big Data y el cloud computing—orientados a mejorar la limitaciones en cuanto a escalabilidad y elasticidad de las bases de datos estáticas—están empujando a los profesionales a relajar algunas propiedades importantes de los sistemas transaccionales clásicos, lo que excluye a varias aplicaciones las cuales no pueden encajar en esta estrategia debido a su alta dependencia transaccional. El propósito de esta tesis es abordar dos retos importantes todavía latentes en el campo de las bases de datos distribuidas: (1) las limitaciones en cuanto a escalabilidad de los sistemas transaccionales y (2) el soporte transaccional en repositorios de almacenamiento en la nube. Analizar las técnicas tradicionales de control de concurrencia y de replicación, utilizadas por las bases de datos clásicas para soportar transacciones, es fundamental para identificar las razones que hacen que estos sistemas degraden su rendimiento cuando el número de nodos y/o cantidad de datos crece. Además, este análisis está orientado a justificar el diseño de los repositorios en la nube que deliberadamente han dejado de lado el soporte transaccional. Efectivamente, acercar el paradigma del almacenamiento en la nube a las aplicaciones que tienen una fuerte dependencia en las transacciones es crucial para su adaptación a los requerimientos actuales en cuanto a volúmenes de datos y modelos de negocio. Esta tesis empieza con la propuesta de un simulador de protocolos para bases de datos distribuidas estáticas, el cual sirve como base para la revisión y comparativa de rendimiento de los protocolos de control de concurrencia y las técnicas de replicación existentes. En cuanto a la escalabilidad de las bases de datos y las transacciones, se estudian los efectos que tiene ejecutar distintos perfiles de transacción bajo diferentes condiciones. Este análisis continua con una revisión de los repositorios de almacenamiento en la nube existentes—que prometen encajar en entornos dinámicos que requieren alta escalabilidad y disponibilidad—, el cual permite evaluar los parámetros y características que estos sistemas han sacrificado con el fin de cumplir las necesidades actuales en cuanto a almacenamiento de datos a gran escala. Para explorar las posibilidades que ofrece el paradigma del cloud computing en un escenario real, se presenta el desarrollo de una arquitectura de almacenamiento de datos inspirada en el cloud computing para almacenar la información generada en las Smart Grids. Concretamente, se combinan las técnicas de replicación en bases de datos transaccionales y la propagación epidémica con los principios de diseño usados para construir los repositorios de datos en la nube. Las lecciones recogidas en el estudio de los protocolos de replicación y control de concurrencia en el simulador de base de datos, junto con las experiencias derivadas del desarrollo del repositorio de datos para las Smart Grids, desembocan en lo que hemos acuñado como Epidemia: una infraestructura de almacenamiento para Big Data concebida para proporcionar soporte transaccional en la nube. Además de heredar los beneficios de los repositorios en la nube altamente en cuanto a escalabilidad, Epidemia incluye una capa de gestión de transacciones que reenvía las transacciones de los clientes a un conjunto jerárquico de particiones de datos, lo que permite al sistema ofrecer distintos niveles de consistencia y adaptar elásticamente su configuración a nuevas demandas cargas de trabajo. Por último, los resultados experimentales ponen de manifiesto la viabilidad de nuestra contribución y alientan a los profesionales a continuar trabajando en esta área.
Over the past three decades, technology constraints (e.g., capacity of storage devices, communication networks bandwidth) and an ever-increasing set of user demands (e.g., information structures, data volumes) have driven the evolution of distributed databases. Since flat-file data repositories developed in the early eighties, there have been important advances in concurrency control algorithms, replication protocols, and transactions management. However, modern concerns in data storage posed by Big Data and cloud computing—related to overcome the scalability and elasticity limitations of classic databases—are pushing practitioners to relax some important properties featured by transactions, which excludes several applications that are unable to fit in this strategy due to their intrinsic transactional nature. The purpose of this thesis is to address two important challenges still latent in distributed databases: (1) the scalability limitations of transactional databases and (2) providing transactional support on cloud-based storage repositories. Analyzing the traditional concurrency control and replication techniques, used by classic databases to support transactions, is critical to identify the reasons that make these systems degrade their throughput when the number of nodes and/or amount of data rockets. Besides, this analysis is devoted to justify the design rationale behind cloud repositories in which transactions have been generally neglected. Furthermore, enabling applications which are strongly dependent on transactions to take advantage of the cloud storage paradigm is crucial for their adaptation to current data demands and business models. This dissertation starts by proposing a custom protocol simulator for static distributed databases, which serves as a basis for revising and comparing the performance of existing concurrency control protocols and replication techniques. As this thesis is especially concerned with transactions, the effects on the database scalability of different transaction profiles under different conditions are studied. This analysis is followed by a review of existing cloud storage repositories—that claim to be highly dynamic, scalable, and available—, which leads to an evaluation of the parameters and features that these systems have sacrificed in order to meet current large-scale data storage demands. To further explore the possibilities of the cloud computing paradigm in a real-world scenario, a cloud-inspired approach to store data from Smart Grids is presented. More specifically, the proposed architecture combines classic database replication techniques and epidemic updates propagation with the design principles of cloud-based storage. The key insights collected when prototyping the replication and concurrency control protocols at the database simulator, together with the experiences derived from building a large-scale storage repository for Smart Grids, are wrapped up into what we have coined as Epidemia: a storage infrastructure conceived to provide transactional support on the cloud. In addition to inheriting the benefits of highly-scalable cloud repositories, Epidemia includes a transaction management layer that forwards client transactions to a hierarchical set of data partitions, which allows the system to offer different consistency levels and elastically adapt its configuration to incoming workloads. Finally, experimental results highlight the feasibility of our contribution and encourage practitioners to further research in this area.
Gureya, Daharewa David. "Self-trained Proactive Elasticity Manager for Cloud-based Storage Services." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-187353.
Full textSyckor, Jens. "Dropbox & Co, alles schon ge-cloud?" Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2014. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-153998.
Full textYu, Shucheng. "Data Sharing on Untrusted Storage with Attribute-Based Encryption." Digital WPI, 2010. https://digitalcommons.wpi.edu/etd-dissertations/321.
Full textMeinel, Christoph, Maxim Schnjakin, Tobias Metzke, and Markus Freitag. "Anbieter von Cloud Speicherdiensten im Überblick." Universität Potsdam, 2014. http://opus.kobv.de/ubp/volltexte/2014/6878/.
Full textDue to the ever-increasing flood of digital information, more and more applications make use of cost-effective cloud storage services. The number of vendors that provide these services has increased significantly in recent years. The identification of an appropriate service provider requires an individual consideration of several criteria. This survey presents a comparison of some established basic storage providers. For this comparison, several criteria are extracted that are applicable to any of the selected providers and thus allow for an assessment that is as objective as possible. The criteria include factors like costs, legal information, security, performance, and supported interfaces. The presented criteria can be used to evaluate cloud storage providers in a specific use case in order to identify the most suitable service based on individual requirements.
Barreto, Andres E. "API-Based Acquisition of Evidence from Cloud Storage Providers." ScholarWorks@UNO, 2015. http://scholarworks.uno.edu/td/2030.
Full textRodrigues, João Miguel Cardia Melro. "TSKY: a dependable middleware solution for data privacy using public storage clouds." Master's thesis, Faculdade de Ciências e Tecnologia, 2013. http://hdl.handle.net/10362/11071.
Full textThis dissertation aims to take advantage of the virtues offered by data storage cloud based systems on the Internet, proposing a solution that avoids security issues by combining different providers’ solutions in a vision of a cloud-of-clouds storage and computing. The solution, TSKY System (or Trusted Sky), is implemented as a middleware system, featuring a set of components designed to establish and to enhance conditions for security, privacy, reliability and availability of data, with these conditions being secured and verifiable by the end-user, independently of each provider. These components, implement cryptographic tools, including threshold and homomorphic cryptographic schemes, combined with encryption, replication, and dynamic indexing mecha-nisms. The solution allows data management and distribution functions over data kept in different storage clouds, not necessarily trusted, improving and ensuring resilience and security guarantees against Byzantine faults and at-tacks. The generic approach of the TSKY system model and its implemented services are evaluated in the context of a Trusted Email Repository System (TSKY-TMS System). The TSKY-TMS system is a prototype that uses the base TSKY middleware services to store mailboxes and email Messages in a cloud-of-clouds.
Martello, Rosanna. "Cloud storage and processing of automotive Lithium-ion batteries data for RUL prediction." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021.
Find full textCao, Ning. "Secure and Reliable Data Outsourcing in Cloud Computing." Digital WPI, 2012. https://digitalcommons.wpi.edu/etd-dissertations/333.
Full textMahmud, A. S. M. Hasan. "Sustainable Resource Management for Cloud Data Centers." FIU Digital Commons, 2016. http://digitalcommons.fiu.edu/etd/2634.
Full textSENIGAGLIESI, LINDA. "Information-theoretic security techniques for data communications and storage." Doctoral thesis, Università Politecnica delle Marche, 2019. http://hdl.handle.net/11566/263165.
Full textThe last years have seen a growing need of security and privacy in many aspects of communications, together with the technological progress. Most of the implemented security solutions are based on the notion of computational security, and must be kept continuously updated to face new attacks and technology advancements. To meet the more and more strict requirements, solutions based on the information-theoretic paradigm are gaining interest to support pure cryptographic techniques, thanks to their capacity to achieve security independently on the attacker’s computing resources, also known as unconditional security. In this work we investigate how information-theoretic security can be applied to practical systems in order to ensure data security and privacy. We first start defining information-theoretic metrics to assess the secrecy performance of realistic wireless communication settings under practical conditions, together with a protocol that mixes coding techniques for physical layer security and cryptographic solutions. This scheme is able to achieve some level of semantic security at the presence of a passive attacker. At the same time, multiple scenarios are considered. We provide a security analysis for parallel relay channels, thus finding an optimal resource allocation that maximizes the secrecy rate. Successively, by exploiting a probabilistic model checker, we define the parameters for heterogeneous distributed storage systems that permit us to achieve perfect secrecy in practical conditions. For privacy purposes, we propose a scheme which guarantees private information retrieval of files for caching at the wireless edge against multiple spy nodes. We find the optimal content placement that minimizes the backhaul usage, thus reducing the communication cost of the system.
Brandenburger, Marcus Verfasser], Rüdiger [Akademischer Betreuer] [Kapitza, and Christian [Akademischer Betreuer] Cachin. "Securing Data Integrity from Cloud Storage to Blockchains / Marcus Brandenburger ; Rüdiger Kapitza, Christian Cachin." Braunschweig : Technische Universität Braunschweig, 2021. http://d-nb.info/1225863465/34.
Full textIkken, Sonia. "Efficient placement design and storage cost saving for big data workflow in cloud datacenters." Thesis, Evry, Institut national des télécommunications, 2017. http://www.theses.fr/2017TELE0020/document.
Full textThe typical cloud big data systems are the workflow-based including MapReduce which has emerged as the paradigm of choice for developing large scale data intensive applications. Data generated by such systems are huge, valuable and stored at multiple geographical locations for reuse. Indeed, workflow systems, composed of jobs using collaborative task-based models, present new dependency and intermediate data exchange needs. This gives rise to new issues when selecting distributed data and storage resources so that the execution of tasks or job is on time, and resource usage-cost-efficient. Furthermore, the performance of the tasks processing is governed by the efficiency of the intermediate data management. In this thesis we tackle the problem of intermediate data management in cloud multi-datacenters by considering the requirements of the workflow applications generating them. For this aim, we design and develop models and algorithms for big data placement problem in the underlying geo-distributed cloud infrastructure so that the data management cost of these applications is minimized. The first addressed problem is the study of the intermediate data access behavior of tasks running in MapReduce-Hadoop cluster. Our approach develops and explores Markov model that uses spatial locality of intermediate data blocks and analyzes spill file sequentiality through a prediction algorithm. Secondly, this thesis deals with storage cost minimization of intermediate data placement in federated cloud storage. Through a federation mechanism, we propose an exact ILP algorithm to assist multiple cloud datacenters hosting the generated intermediate data dependencies of pair of files. The proposed algorithm takes into account scientific user requirements, data dependency and data size. Finally, a more generic problem is addressed in this thesis that involve two variants of the placement problem: splittable and unsplittable intermediate data dependencies. The main goal is to minimize the operational data cost according to inter and intra-job dependencies
Ikken, Sonia. "Efficient placement design and storage cost saving for big data workflow in cloud datacenters." Electronic Thesis or Diss., Evry, Institut national des télécommunications, 2017. http://www.theses.fr/2017TELE0020.
Full textThe typical cloud big data systems are the workflow-based including MapReduce which has emerged as the paradigm of choice for developing large scale data intensive applications. Data generated by such systems are huge, valuable and stored at multiple geographical locations for reuse. Indeed, workflow systems, composed of jobs using collaborative task-based models, present new dependency and intermediate data exchange needs. This gives rise to new issues when selecting distributed data and storage resources so that the execution of tasks or job is on time, and resource usage-cost-efficient. Furthermore, the performance of the tasks processing is governed by the efficiency of the intermediate data management. In this thesis we tackle the problem of intermediate data management in cloud multi-datacenters by considering the requirements of the workflow applications generating them. For this aim, we design and develop models and algorithms for big data placement problem in the underlying geo-distributed cloud infrastructure so that the data management cost of these applications is minimized. The first addressed problem is the study of the intermediate data access behavior of tasks running in MapReduce-Hadoop cluster. Our approach develops and explores Markov model that uses spatial locality of intermediate data blocks and analyzes spill file sequentiality through a prediction algorithm. Secondly, this thesis deals with storage cost minimization of intermediate data placement in federated cloud storage. Through a federation mechanism, we propose an exact ILP algorithm to assist multiple cloud datacenters hosting the generated intermediate data dependencies of pair of files. The proposed algorithm takes into account scientific user requirements, data dependency and data size. Finally, a more generic problem is addressed in this thesis that involve two variants of the placement problem: splittable and unsplittable intermediate data dependencies. The main goal is to minimize the operational data cost according to inter and intra-job dependencies
Bäcklund, Simon, and Albin Ljungdahl. "Data storage for a small lumberprocessing company in Sweden." Thesis, Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-105088.
Full textCheng, Yue. "Workload-aware Efficient Storage Systems." Diss., Virginia Tech, 2017. http://hdl.handle.net/10919/78677.
Full textPh. D.
Carpen-Amarie, Alexandra. "BlobSeer as a data-storage facility for clouds : self-Adaptation, integration, evaluation." Thesis, Cachan, Ecole normale supérieure, 2011. http://www.theses.fr/2011DENS0066/document.
Full textThe emergence of Cloud computing brings forward many challenges that may limit the adoption rate of the Cloud paradigm. As data volumes processed by Cloud applications increase exponentially, designing efficient and secure solutions for data management emerges as a crucial requirement. The goal of this thesis is to enhance a distributed data-management system with self-management capabilities, so that it can meet the requirements of the Cloud storage services in terms of scalability, data availability, reliability and security. Furthermore, we aim at building a Cloud data service both compatible with state-of-the-art Cloud interfaces and able to deliver high-throughput data storage. To meet these goals, we proposed generic self-awareness, self-protection and self-configuration components targeted at distributed data-management systems. We validated them on top of BlobSeer, a large-scale data-management system designed to optimize highly-concurrent data accesses. Next, we devised and implemented a BlobSeer-based file system optimized to efficiently serve as a storage backend for Cloud services. We then integrated it within a real-world Cloud environment, the Nimbus platform. The benefits and drawbacks of using Cloud storage for real-life applications have been emphasized in evaluations that involved data-intensive MapReduce applications and tightly-coupled, high-performance computing applications
Chihoub, Houssem Eddine. "Managing consistency for big data applications : tradeoffs and self-adaptiveness." Thesis, Cachan, Ecole normale supérieure, 2013. http://www.theses.fr/2013DENS0059/document.
Full textIn the era of Big Data, data-intensive applications handle extremely large volumes of data while requiring fast processing times. A large number of such applications run in the cloud in order to benefit from cloud elasticity, easy on-demand deployments, and cost-efficient Pays-As-You-Go usage. In this context, replication is an essential feature in the cloud in order to deal with Big Data challenges. Therefore, replication therefore, enables high availability through multiple replicas, fast data access to local replicas, fault tolerance, and disaster recovery. However, replication introduces the major issue of data consistency across different copies. Consistency management is a critical for Big Data systems. Strong consistency models introduce serious limitations to systems scalability and performance due to the required synchronization efforts. In contrast, weak and eventual consistency models reduce the performance overhead and enable high levels of availability. However, these models may tolerate, under certain scenarios, too much temporal inconsistency. In this Ph.D thesis, we address this issue of consistency tradeoffs in large-scale Big Data systems and applications. We first, focus on consistency management at the storage system level. Accordingly, we propose an automated self-adaptive model (named Harmony) that scale up/down the consistency level at runtime when needed in order to provide as high performance as possible while preserving the application consistency requirements. In addition, we present a thorough study of consistency management impact on the monetary cost of running in the cloud. Hereafter, we leverage this study in order to propose a cost efficient consistency tuning (named Bismar) in the cloud. In a third direction, we study the consistency management impact on energy consumption within the data center. According to our findings, we investigate adaptive configurations of the storage system cluster that target energy saving. In order to complete our system-side study, we focus on the application level. Applications are different and so are their consistency requirements. Understanding such requirements at the storage system level is not possible. Therefore, we propose an application behavior modeling that apprehend the consistency requirements of an application. Based on the model, we propose an online prediction approach- named Chameleon that adapts to the application specific needs and provides customized consistency
Berg, Markus. "Privat molnlagring i arbetet : En fallstudie om hur ett IT-företaghanterar att anställda använder privatmolnlagring för arbetsrelateradinformation." Thesis, Högskolan i Skövde, Institutionen för informationsteknologi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-15655.
Full textIn this work, a case study was made on one company that is working with IT. The study was made to see how an IT company mitigates and handles the problem that employees use their private cloud storage to store work related information. A survey was sent to three different departments at the company to examine what employees think about private cloud storage and work related information and how and if it differs between different departments. One interview was also made with one informant from the same company to compare with the results from the survey. A second interview with another company was also made to validate the result from the first interview.The results from the survey showed that employees do store or have stored work related information in their private cloud storage, despite that the company policies forbid it. One of the departments that the survey was sent to was the information security team. It can be assumed that employees that work with information security have better understanding about security regarding information and how it should be stored. Despite that did the results show that the information security team was the department that used private cloud storage the most, compared to the other departments. The results from the interviews shows that the companies at least have policies on how and where the employees are allowed to store work related information. One company in particular works really hard and continuously to educate and enlighten its employees to take great care when dealing with work related information.
Vijayakumar, Sruthi. "Hadoop Based Data Intensive Computation on IAAS Cloud Platforms." UNF Digital Commons, 2015. http://digitalcommons.unf.edu/etd/567.
Full textNetshiongolwe, Mpho. "Investigating perceptions of reliability, efficiency and feasibility of data storage technology: A case study of cloud storage adoption at UCT Faculty of Science." Master's thesis, Faculty of Humanities, 2019. http://hdl.handle.net/11427/30935.
Full textRuty, Guillaume. "Towards more scalability and flexibility for distributed storage systems." Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLT006/document.
Full textThe exponentially growing demand for storage puts a huge stress on traditionnal distributed storage systems. While storage devices' performance have caught up with network devices in the last decade, their capacity do not grow as fast as the rate of data growth, especially with the rise of cloud big data applications. Furthermore, the performance balance between storage, network and compute devices has shifted and the assumptions that are the foundation for most distributed storage systems are not true anymore. This dissertation explains how several aspects of such storage systems can be modified and rethought to make a more efficient use of the resource at their disposal. It presents an original architecture that uses a distributed layer of metadata to provide flexible and scalable object-level storage, then proposes a scheduling algorithm improving how a generic storage system handles concurrent requests. Finally, it describes how to improve legacy filesystem-level caching for erasure-code-based distributed storage systems, before presenting a few other contributions made in the context of short research projects
Chiossi, Luca. "High-Performance Persistent Caching in Multi- and Hybrid- Cloud Environments." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2020. http://amslaurea.unibo.it/20089/.
Full textWiren, Jakob. "Data Storage Cost Optimization Based on Electricity Price Forecasting with Machine Learning in a Multi-Geographical Cloud Environment." Thesis, Linköpings universitet, Kommunikations- och transportsystem, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-152250.
Full textCeccarelli, Viviana. "Data Logger: cognitive analysis." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021.
Find full textNicolae, Bogdan. "BlobSeer : towards efficient data storage management for large-scale, distributed systems." Phd thesis, Université Rennes 1, 2010. http://tel.archives-ouvertes.fr/tel-00552271.
Full textNicolae, Bogdan. "BlobSeer : towards efficient data storage management for large-scale, distributed systems." Phd thesis, Rennes 1, 2010. http://www.theses.fr/2010REN1S123.
Full textAvec des volumes de données en forte augmentation et l'émergence d'infrastructures avec un fort passage à l'échelle (cloud computing, calcul petascale), une gestion distribuée des données devient un problème crucial qui présente plusieurs challenges. Cette thèse apporte plusieurs contributions afin de résoudre ces challenges. Premièrement, elle propose un ensemble de principes permettant de concevoir des systèmes de stockage distribués passant largement à l'échelle et optimisés pour des accès sous forte concurrence. En particulier, ellemet en évidence les avantages liés à l'utilisation du versionnement dans ce contexte. Ensuite, fondé sur ces principes, elle introduit un ensemble d'algorithmes de gestion distribuée de données et méta-données qui permettent un débit d'accès important sous haute concurrence. Enfin, elle montre comment mettre en oeuvre de façon efficace ces algorithmes, en résolvant des problèmes pratiques. Ces résultats sont utilisés pour construire BlobSeer, un prototype expérimental utilisé pour démontrer les avantages théoriques de cette approche dans des tests synthétiques, ansi que ses avantages pratiques dans des applications réelles: système de stockage pour applications MapReduce, système de stockage pour déployer et sauvegarder des images de machines virtuelles dans les clouds, et service de stockage de données pour des applications sur les clouds. Des analyses approfondies sur la plate-forme d'expérimentation Grid'5000 démontrent que BlobSeer passe largement à l'échelle et soutient un débit d'accès important même sous haute concurrence, dépassant d'une large marge plusieurs approches de l'état de l'art courant
Sharma, Sagar. "Towards Data and Model Confidentiality in Outsourced Machine Learning." Wright State University / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=wright1567529092809275.
Full textFramner, Erik. "A Configuration User Interface for Multi-Cloud Storage Based on Secret Sharing : An Exploratory Design Study." Thesis, Karlstads universitet, Handelshögskolan (from 2013), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kau:diva-71354.
Full textPRISMACLOUD
Khan, Syeduzzaman. "A PROBABILISTIC MACHINE LEARNING FRAMEWORK FOR CLOUD RESOURCE SELECTION ON THE CLOUD." Scholarly Commons, 2020. https://scholarlycommons.pacific.edu/uop_etds/3720.
Full textHabtu, Simon. "Indexing file metadata using a distributed search engine for searching files on a public cloud storage." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-232064.
Full textVisma Labs AB eller Visma ville genomföra experiment för att se om filmetadata skulle kunna indexeras för att söka efter filer på ett publikt moln. Med tanke på att lagring av filer på ett publikt moln är billigare än den nuvarande lagringslösningen, kan implementeringen spara Visma pengar som spenderas på dyra lagringskostnader. Denna studie är därför till för att hitta och utvärdera ett tillvägagångssätt valt för att indexera filmetadata och söka filer på ett offentligt molnlagring med den utvalda distribuerade sökmotorn Elasticsearch. Arkitekturen för den föreslagna lösningen har likenelser av en filtjänst och implementerades med flera containeriserade tjänster för att den ska fungera. Resultaten visar att filservicelösningen verkligen är möjlig men skulle behöva ytterligare modifikationer och fler resurser att fungera enligt Vismas krav.
Zampetakis, Stamatis. "Scalable algorithms for cloud-based Semantic Web data management." Thesis, Paris 11, 2015. http://www.theses.fr/2015PA112199/document.
Full textIn order to build smart systems, where machines are able to reason exactly like humans, data with semantics is a major requirement. This need led to the advent of the Semantic Web, proposing standard ways for representing and querying data with semantics. RDF is the prevalent data model used to describe web resources, and SPARQL is the query language that allows expressing queries over RDF data. Being able to store and query data with semantics triggered the development of many RDF data management systems. The rapid evolution of the Semantic Web provoked the shift from centralized data management systems to distributed ones. The first systems to appear relied on P2P and client-server architectures, while recently the focus moved to cloud computing.Cloud computing environments have strongly impacted research and development in distributed software platforms. Cloud providers offer distributed, shared-nothing infrastructures that may be used for data storage and processing. The main features of cloud computing involve scalability, fault-tolerance, and elastic allocation of computing and storage resources following the needs of the users.This thesis investigates the design and implementation of scalable algorithms and systems for cloud-based Semantic Web data management. In particular, we study the performance and cost of exploiting commercial cloud infrastructures to build Semantic Web data repositories, and the optimization of SPARQL queries for massively parallel frameworks.First, we introduce the basic concepts around Semantic Web and the main components and frameworks interacting in massively parallel cloud-based systems. In addition, we provide an extended overview of existing RDF data management systems in the centralized and distributed settings, emphasizing on the critical concepts of storage, indexing, query optimization, and infrastructure. Second, we present AMADA, an architecture for RDF data management using public cloud infrastructures. We follow the Software as a Service (SaaS) model, where the complete platform is running in the cloud and appropriate APIs are provided to the end-users for storing and retrieving RDF data. We explore various storage and querying strategies revealing pros and cons with respect to performance and also to monetary cost, which is a important new dimension to consider in public cloud services. Finally, we present CliqueSquare, a distributed RDF data management system built on top of Hadoop, incorporating a novel optimization algorithm that is able to produce massively parallel plans for SPARQL queries. We present a family of optimization algorithms, relying on n-ary (star) equality joins to build flat plans, and compare their ability to find the flattest possibles. Inspired by existing partitioning and indexing techniques we present a generic storage strategy suitable for storing RDF data in HDFS (Hadoop’s Distributed File System). Our experimental results validate the efficiency and effectiveness of the optimization algorithm demonstrating also the overall performance of the system
Trebulová, Debora. "Zálohování dat a datová úložiště." Master's thesis, Vysoké učení technické v Brně. Fakulta podnikatelská, 2017. http://www.nusl.cz/ntk/nusl-318599.
Full text