
Dissertations / Theses on the topic 'Cloud data storage'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the top 50 dissertations / theses for your research on the topic 'Cloud data storage.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.

1

Avlasovych, V. V. "Cloud Data Storage." Thesis, Sumy State University, 2016. http://essuir.sumdu.edu.ua/handle/123456789/46877.

Full text
Abstract:
Today, the Internet gives us many opportunities. One of them is cloud data storage. Cloud storage is a data storage model in which data is held in the network on a large number of servers, giving customers non-stop access to their data from anywhere, on any device with Internet access.
APA, Harvard, Vancouver, ISO, and other styles
2

Cappelli, Gino. "Data cloud through Google Cloud Storage." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2012. http://amslaurea.unibo.it/3032/.

Full text
Abstract:
Cloud Storage is a model of networked data storage in which data is kept on multiple servers, real and/or virtual, generally hosted at third-party facilities or on dedicated servers. Through this model it is possible to access personal or corporate information, whether videos, photographs, music, databases or files, in a "dematerialised" way, without knowing the physical location of the data, from anywhere in the world and with any suitable device. The advantages of this approach are numerous: virtually unlimited storage capacity, payment only for the storage actually used, files accessible from anywhere in the world, extremely low maintenance, and greater safety, since files are protected from the theft, fire or damage that could affect local computers. Google Cloud Storage falls into this category: it is a service for developers, provided by Google, that allows data to be saved and manipulated directly on Google's infrastructure. In more detail, Google Cloud Storage provides a programming interface that uses simple HTTP requests to perform operations on its infrastructure. Examples of supported operations are: uploading a file, downloading a file, deleting a file, listing the stored files, or obtaining the size of a given file. Each of these HTTP requests encapsulates information about the method used (the request type, such as GET, PUT, ...) and a "scope" (the resource on which to perform the request). It follows that it becomes possible to build an application that, using these HTTP requests, provides a cloud storage service (in which applications save data remotely, generally through third-party servers). In this thesis, after analysing all the details of the Google Cloud Storage service, an application called iHD was implemented that uses this service to save, manipulate and share data remotely (in the "cloud"). Common operations in this application allow folders to be shared among several users subscribed to the service, files to be uploaded and downloaded, and folders or files to be created or deleted. The need for an application of this kind arose from the strong growth, in the mobile-phone market, of devices with technologies and functions increasingly tied to the Internet and the connectivity it offers. The thesis also describes the design and implementation phases of the iHD application. In the design phase, all the functional and non-functional requirements of the application were analysed, along with the modules of which it is composed. Finally, regarding the implementation phase, the thesis presents all the classes and their methods for each module and, in some cases, how these classes were actually implemented in the programming language used.
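To make the HTTP interface described above concrete, here is a minimal sketch of the same four operations (upload, download, delete, list) against today's Google Cloud Storage JSON API using the requests library. The bucket name and OAuth bearer token are placeholders, and the 2012-era interface the thesis worked with differed in its details.

from urllib.parse import quote
import requests  # third-party: pip install requests

API = "https://storage.googleapis.com/storage/v1"
UPLOAD = "https://storage.googleapis.com/upload/storage/v1"

def _call(token, method, url, **kw):
    # Every operation is a single authenticated HTTP request, as the abstract notes.
    r = requests.request(method, url, headers={"Authorization": f"Bearer {token}"}, **kw)
    r.raise_for_status()
    return r

def upload(token, bucket, name, data: bytes):
    return _call(token, "POST", f"{UPLOAD}/b/{bucket}/o",
                 params={"uploadType": "media", "name": name}, data=data).json()

def download(token, bucket, name) -> bytes:
    # alt=media asks for the object's bytes rather than its metadata.
    return _call(token, "GET", f"{API}/b/{bucket}/o/{quote(name, safe='')}",
                 params={"alt": "media"}).content

def delete(token, bucket, name):
    _call(token, "DELETE", f"{API}/b/{bucket}/o/{quote(name, safe='')}")

def list_objects(token, bucket):
    return _call(token, "GET", f"{API}/b/{bucket}/o").json().get("items", [])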
APA, Harvard, Vancouver, ISO, and other styles
3

Wang, Xing. "Benchmarking Cloud Storage Systems." Thesis, Norges teknisk-naturvitenskapelige universitet, Institutt for telematikk, 2014. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-26716.

Full text
Abstract:
With the rise of cloud computing, many cloud storage systems like Dropbox, Google Drive and Mega have been built to provide decentralized and reliable file storage. It is thus of prime importance to know their features, performance, and the best way to make use of them. In this context, we introduce BenchCloud, a tool designed as part of this thesis to conveniently and efficiently benchmark any cloud storage system. First, we provide a study of six commonly-used cloud storage systems to identify the different types of features they offer. Then existing benchmarking tools for cloud systems are presented, and the requirements, design goals and internal architecture of BenchCloud are studied. Finally, we show how to use BenchCloud to analyze cloud storage systems, and run a series of experiments on Dropbox to show how BenchCloud can be used to benchmark and inspect various kinds of features of cloud storage systems.
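BenchCloud's own interface is not reproduced here, but a micro-benchmark in the same spirit can be sketched in a few lines: time repeated uploads and downloads of random payloads of several sizes. The two-function upload/download interface below is an assumption for the sketch, not BenchCloud's actual API.

import os
import statistics
import time

def benchmark(upload, download, sizes=(2**10, 2**20, 16 * 2**20), repeats=5):
    """Time uploads/downloads through caller-supplied hooks wrapping the
    storage client under test (hypothetical interface)."""
    results = {}
    for size in sizes:
        ups, downs = [], []
        for i in range(repeats):
            payload = os.urandom(size)           # fresh data defeats dedup caches
            name = f"bench-{size}-{i}"
            t0 = time.perf_counter()
            upload(name, payload)
            ups.append(time.perf_counter() - t0)
            t0 = time.perf_counter()
            download(name)
            downs.append(time.perf_counter() - t0)
        results[size] = {"upload_s": statistics.median(ups),
                         "download_s": statistics.median(downs),
                         "upload_MB_per_s": (size / 2**20) / statistics.median(ups)}
    return results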
APA, Harvard, Vancouver, ISO, and other styles
4

Al, Beshri Aiiad Ahmad M. "Outsourcing data storage without outsourcing trust in cloud computing." Thesis, Queensland University of Technology, 2013. https://eprints.qut.edu.au/61738/1/Aiiad_Ahmad_M._Al_Beshri_Thesis.pdf.

Full text
Abstract:
The main theme of this thesis is to allow the users of cloud services to outsource their data without the need to trust the cloud provider. The method is based on combining existing proof-of-storage schemes with distance-bounding protocols. Specifically, cloud customers will be able to verify the confidentiality, integrity, availability, fairness (or mutual non-repudiation), data freshness, geographic assurance and replication of their stored data directly, without having to rely on the word of the cloud provider.
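As an illustration of the combination the abstract describes, the toy verifier below accepts a challenge response only if it is both correct (proof of storage) and fast enough to have plausibly come from a nearby server (distance bounding). This is a sketch only: real schemes use homomorphic authenticators and many rapid single-bit rounds rather than a SHA-256 table and a single timed fetch.

import hashlib
import secrets
import time

class TimedAuditor:
    """Keep a digest per outsourced block; pass a challenge only when the
    returned block matches AND arrives within the round-trip budget implied
    by the server's claimed location."""

    def __init__(self, blocks, max_rtt_s=0.02):
        self.digests = [hashlib.sha256(b).digest() for b in blocks]
        self.max_rtt_s = max_rtt_s

    def challenge(self, fetch_block) -> bool:
        i = secrets.randbelow(len(self.digests))
        start = time.perf_counter()
        answer = fetch_block(i)                 # one round trip to the server
        rtt = time.perf_counter() - start
        return rtt <= self.max_rtt_s and hashlib.sha256(answer).digest() == self.digests[i]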
APA, Harvard, Vancouver, ISO, and other styles
5

Bilbray, Kyle. "DSFS: a data storage facilitating service for maximizing security, availability, performance, and customizability." Thesis, Georgia Institute of Technology, 2014. http://hdl.handle.net/1853/52984.

Full text
Abstract:
The objective of this thesis is to study methods for the flexible and secure storage of sensitive data in an unaltered cloud. While current cloud storage providers make guarantees on the availability and security of data once it enters their domain, clients are not given any options for customization. All availability and security measures, along with any resulting performance hits, are applied to all requests, regardless of the data's sensitivity or the client's wishes. In addition, once a client's data enters the cloud, it becomes vulnerable to different types of attacks. Other cloud users may access or disrupt the availability of their peers' data, and cloud providers cannot protect clients from the providers themselves in the event of a malicious administrator or a government directive. Current solutions use combinations of known encoding schemes and encryption techniques to provide confidentiality from peers and sometimes from the cloud service provider, but it is an all-or-nothing model: a client either uses the security methods of their system or does not, regardless of whether the client's data needs more or less protection and availability. Our approach, referred to as the Data Storage Facilitating Service (DSFS), provides a basic set of proven protection schemes with configurable parameters that encode input data into a number of fragments and intelligently scatter them across the target cloud. A client may choose the encoding scheme most appropriate for the sensitivity of their data. If none of the supported schemes is sufficient for the client's needs, or the client has their own custom encoding, DSFS can accept already-encoded fragments and perform secure placement. Evaluation of our prototype service demonstrates clear trade-offs in performance between the different levels of security that encoding provides, allowing clients to choose how much protection the importance of their data warrants. This amount of flexibility is unique to DSFS and turns it into a secure storage facilitator that can help clients as much or as little as required. We also see a significant effect on overhead from the service's location relative to its cloud when we compare the performance of our own setup with a commercial cloud service.
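DSFS's actual encoding schemes are not specified here, but a minimal example in the same fragment-scattering spirit is an n-way XOR split: any n-1 fragments are statistically independent of the plaintext, so shipping each fragment to a different provider keeps any single provider from learning anything.

import os

def split(data: bytes, n: int) -> list[bytes]:
    """XOR-split data into n fragments; all n are required to rebuild."""
    frags = [os.urandom(len(data)) for _ in range(n - 1)]
    last = bytearray(data)
    for frag in frags:              # last = data XOR frag_1 XOR ... XOR frag_{n-1}
        for i, b in enumerate(frag):
            last[i] ^= b
    return frags + [bytes(last)]

def combine(frags: list[bytes]) -> bytes:
    out = bytearray(len(frags[0]))
    for frag in frags:
        for i, b in enumerate(frag):
            out[i] ^= b
    return bytes(out)

# Scatter: ship frags[k] to provider k; reassemble only on the client.
assert combine(split(b"sensitive record", 3)) == b"sensitive record"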
APA, Harvard, Vancouver, ISO, and other styles
6

Gonçalves, André Miguel Augusto. "Estimating data divergence in cloud computing storage systems." Master's thesis, Faculdade de Ciências e Tecnologia, 2013. http://hdl.handle.net/10362/10852.

Full text
Abstract:
Dissertation submitted to obtain the Master's Degree in Informatics Engineering
Many internet services are provided through cloud computing infrastructures that are composed of multiple data centers. To provide high availability and low latency, data is replicated on machines in different data centers, which introduces the complexity of guaranteeing that clients view data consistently. Data stores often opt for a relaxed approach to replication, guaranteeing only eventual consistency, since it improves the latency of operations. However, this may lead to replicas having different values for the same data. One solution to control the divergence of data in eventually consistent systems is the use of metrics that measure how stale data is at a replica. In the past, several algorithms have been proposed to estimate the value of these metrics in a deterministic way. An alternative solution is to rely on probabilistic metrics that estimate divergence with a certain degree of certainty. This relaxes the need to contact all replicas while still providing a relatively accurate measurement. In this work we designed and implemented a solution to estimate the divergence of data in eventually consistent data stores that scales to many replicas by allowing client-side caching. Measuring the divergence when there is a large number of clients calls for the development of new algorithms that provide probabilistic guarantees. Additionally, unlike previous works, we focus on measuring the divergence relative to a state that can lead to the violation of application invariants.
Partially funded by project PTDC/EIA EIA/108963/2008 and by an ERC Starting Grant, Agreement Number 307732
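As a sketch of the probabilistic flavour of metric the abstract mentions, staleness can be estimated by polling a random sample of replicas rather than all of them. The version(key) interface below is assumed for illustration; with sample size s, the standard error of the estimated stale fraction p is roughly sqrt(p(1-p)/s), so a handful of probes already gives a usable figure.

import random

def estimate_stale_fraction(replicas, key, reference_version, sample_size=5):
    """Fraction of sampled replicas still lagging behind reference_version
    for this key; avoids contacting every replica."""
    sample = random.sample(list(replicas), min(sample_size, len(replicas)))
    stale = sum(1 for r in sample if r.version(key) < reference_version)
    return stale / len(sample)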
APA, Harvard, Vancouver, ISO, and other styles
7

Kaaniche, Nesrine. "Cloud data storage security based on cryptographic mechanisms." Thesis, Evry, Institut national des télécommunications, 2014. http://www.theses.fr/2014TELE0033/document.

Full text
Abstract:
Recent technological advances have given rise to the popularity and success of the cloud. This new paradigm is gaining expanding interest, since it provides cost-efficient architectures that support the transmission, storage, and intensive computing of data. However, these promising storage services bring many challenging design issues, considerably due to the loss of data control. These challenges, namely data confidentiality and data integrity, have a significant influence on the security and performance of the cloud system. This thesis aims at overcoming this trade-off, while considering two data security concerns. On one hand, we focus on data confidentiality preservation, which becomes more complex with flexible data sharing among a dynamic group of users. It requires the secrecy of outsourced data and an efficient sharing of decryption keys between different authorized users. For this purpose, we first propose a new method relying on the use of ID-Based Cryptography (IBC), where each client acts as a Private Key Generator (PKG). That is, he generates his own public elements and derives his corresponding private key using a secret. Thanks to IBC properties, this contribution is shown to support data privacy and confidentiality, and to be resistant to unauthorized access to data during the sharing process, while considering two realistic threat models, namely an honest-but-curious server and a malicious user adversary. Second, we define CloudaSec, a public-key-based solution, which proposes the separation of subscription-based key management and confidentiality-oriented asymmetric encryption policies. That is, CloudaSec enables flexible and scalable deployment of the solution as well as strong security guarantees for outsourced data in cloud servers. Experimental results, under OpenStack Swift, have proven the efficiency of CloudaSec in scalable data sharing, while considering the impact of the cryptographic operations at the client side. On the other hand, we address the Proof of Data Possession (PDP) concern. In fact, the cloud customer should have an efficient way to perform periodic remote integrity verifications, without keeping the data locally, following three substantial aspects: security level, public verifiability, and performance. This concern is magnified by the client's constrained storage and computation capabilities and the large size of outsourced data. In order to fulfill this security requirement, we first define a new zero-knowledge PDP protocol that provides deterministic integrity verification guarantees, relying on the uniqueness of the Euclidean division. These guarantees are considered interesting compared to several proposed schemes that present probabilistic approaches. Then, we propose SHoPS, a Set-Homomorphic Proof of Data Possession scheme supporting the three levels of data verification. SHoPS enables the cloud client not only to obtain a proof of possession from the remote server, but also to verify that a given data file is distributed across multiple storage devices to achieve a certain desired level of fault tolerance. Indeed, we present the set homomorphism property, which extends malleability to set operations such as union, intersection, and inclusion. SHoPS presents a high security level and low processing complexity. For instance, SHoPS saves energy within the cloud provider by distributing the computation over multiple nodes, where each node provides proofs of local data block sets. This makes the resulting proof applicable over sets of data blocks, satisfying several needs such as proof aggregation.
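A toy version of integrity checking built on the uniqueness of the Euclidean division, far simpler than the thesis's zero-knowledge protocol and without its public verifiability, looks like this: the client precomputes remainders of the file, viewed as one large integer, modulo random challenges, and later asks the server to reproduce them.

import secrets

def pdp_setup(file_bytes: bytes, num_challenges=100, modulus_bits=128):
    """Client side: keep only (modulus, remainder) pairs, then delete the file."""
    d = int.from_bytes(file_bytes, "big")
    return [(n, d % n) for n in
            (secrets.randbits(modulus_bits) | 1 for _ in range(num_challenges))]

def pdp_prove(file_bytes: bytes, n: int) -> int:
    """Server side: by d = q*n + r with 0 <= r < n, the remainder is unique,
    so only a server actually holding the file can answer reliably."""
    return int.from_bytes(file_bytes, "big") % n

def pdp_audit(challenges: list, ask_server) -> bool:
    n, expected = challenges.pop()   # each precomputed challenge is single-use
    return ask_server(n) == expected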
APA, Harvard, Vancouver, ISO, and other styles
8

Noman, Ali. "Addressing the Data Location Assurance Problem of Cloud Storage Environments." Thesis, Université d'Ottawa / University of Ottawa, 2018. http://hdl.handle.net/10393/37375.

Full text
Abstract:
In a cloud storage environment, providing geo-location assurance of data to a cloud user is very challenging, as the cloud storage provider physically controls the data and it would be difficult for the user to detect whether the data is stored in datacenters/storage servers other than the ones where it is supposed to be. We name this problem the "Data Location Assurance Problem" of a cloud storage environment. Aside from privacy and security concerns, the lack of geo-location assurance for cloud data has been identified as one of the main reasons why organizations that deal with sensitive data (e.g., financial data, health-related data, and data related to Personally Identifiable Information, PII) cannot adopt a cloud storage solution even if they might wish to. It might seem that cryptographic techniques such as Proof of Data Possession (PDP) can solve this problem; however, we show that those cryptographic techniques alone cannot. In this thesis, we address the data location assurance (DLA) problem of the cloud storage environment, which includes but is not limited to investigating the necessity for a good data location assurance solution as well as the challenges involved in providing one; we then come up with efficient solutions for the DLA problem. Note that, for the totally dishonest cloud storage server attack model, it may be impossible to offer a solution for the DLA problem. So the main objective of this thesis is to come up with solutions for the DLA problem under the different system and attack models (from less adversarial to more adversarial ones) found in existing cloud storage environments, so that the needs of the cloud storage applications that exist today can be met.
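The physical intuition behind such solutions fits in one formula: signals travel through fiber at roughly 200,000 km/s, so a measured round-trip time to the storage server gives a hard upper bound on its distance. A minimal worked example:

SPEED_IN_FIBER_KM_PER_S = 200_000   # light in fiber: roughly 2/3 of c

def max_distance_km(rtt_s: float, processing_delay_s: float = 0.0) -> float:
    """Hard upper bound on the responder's distance from one timed challenge."""
    travel = max(rtt_s - processing_delay_s, 0.0)
    return travel / 2 * SPEED_IN_FIBER_KM_PER_S   # halve: out and back

# A 10 ms round trip bounds the server to within ~1000 km of the verifier.
assert round(max_distance_km(0.010)) == 1000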
APA, Harvard, Vancouver, ISO, and other styles
9

Arteaga, Clavijo Dulcardo Ariel. "Flash Caching for Cloud Computing Systems." FIU Digital Commons, 2016. http://digitalcommons.fiu.edu/etd/2496.

Full text
Abstract:
As the size of cloud systems and the number of hosted virtual machines (VMs) rapidly grow, the scalability of shared VM storage systems becomes a serious issue. Client-side flash-based caching has the potential to improve the performance of cloud VM storage by employing flash storage available on the VM hosts to exploit the locality inherent in VM IOs. However, there are several challenges to the effective use of flash caching in cloud systems. First, cache configurations such as size, write policy, metadata persistency and RAID level have significant impacts on flash caching. Second, the typical capacity of flash devices is limited compared to the dataset size of consolidated VMs. Finally, flash devices wear out and face serious endurance issues, which are aggravated by their use for caching. This dissertation presents research addressing these problems of cloud flash caching in the following three aspects. First, it presents a thorough study of different cache configurations, including a new cache-optimized RAID configuration, using a large amount of long-term traces collected from real-world public and private clouds. Second, it studies an on-demand flash cache management solution for meeting VM cache demands and minimizing device wear-out. It uses a new cache demand model, Reuse Working Set (RWS), to capture the data with good temporal locality, and uses the RWS size (RWSS) to model a workload's cache demand. Finally, to handle situations where a cache is insufficient for the VMs' demands, it employs dynamic cache migration to balance cache load across hosts by live-migrating cached data along with the VMs. The results show that the cache-optimized RAID improves performance by 137% without sacrificing reliability, compared to traditional RAID. The RWSS-based on-demand cache allocation reduces a workload's cache usage by 78% and lowers the amount of writes sent to the cache device by 40%, compared to traditional working-set-based cache allocation. Combining on-demand cache allocation with dynamic cache migration for 12 concurrent VMs, the results show a 28% higher hit ratio and 28% lower 90th-percentile IO latency, compared to the case without cache allocation.
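As a rough sketch of the cache-demand idea (the dissertation's precise RWS definition may differ), the function below scans an IO trace with a sliding window and counts only the blocks referenced more than once, i.e. the data whose temporal locality a cache can actually exploit.

from collections import Counter

def rwss(trace, window):
    """Peak Reuse Working Set Size over a sliding window of the IO trace.

    Blocks touched a single time gain nothing from being cached, so only
    blocks referenced more than once inside the window are counted.
    `trace` is a sequence of block addresses in access order (assumed format).
    """
    peak = 0
    for end in range(window, len(trace) + 1):
        counts = Counter(trace[end - window:end])
        peak = max(peak, sum(1 for c in counts.values() if c > 1))
    return peak   # provision the VM's cache for its peak reuse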
APA, Harvard, Vancouver, ISO, and other styles
10

Yahya, Farashazillah. "A security framework to protect data in cloud storage." Thesis, University of Southampton, 2017. https://eprints.soton.ac.uk/415861/.

Full text
Abstract:
According to the Cisco Global Cloud Index, cloud storage users will store 1.6 gigabytes of data per month by 2019, compared to 992 megabytes per month in 2014. With this trend, it has been shown that more and more data will reside in cloud storage, and it is expected to grow further. As cloud storage is becoming an option for users for keeping their data online, it comes with security concerns for protecting data from threats. This thesis addresses the need to investigate the security factors that will enable efficient security protection for data in cloud storage and the relationships that exist between the different security factors. Consequently, this research has developed a conceptual framework that supports security in cloud storage. The main contribution of this research is the development of a Cloud Storage Security Framework (CSSF) to support an integrative approach to understanding and evaluating security in cloud storage. The framework enables understanding of the makeup of security in cloud storage and measures the understanding of security in cloud storage. Drawing upon established theories and prior research findings, the framework indicates that security in cloud storage can be determined by nine factors: (1) implementation of security policies in cloud storage, and security measures that relate to (2) protecting the data accessed in cloud storage; (3) modifications of data stored; (4) accessibility of data stored in cloud storage; (5) non-repudiation of the data stored; (6) authenticity of the original data; (7) reliability of the cloud storage services; (8) accountability of service provision; and (9) auditability of the data accessed and stored in cloud storage. An example of CSSF application has been demonstrated through the development of a measuring instrument called the Security Rating Score (SecRaS); through a series of experiments, SecRaS has been validated and used in a research scenario. The instrument consists of several items generated using the goal-question-metric approach. These potential items were evaluated in a series of experiments: security experts assessed them using the content validity ratio, while security practitioners took part in the validation study. The validation study completed two experiments that looked into correlation analyses and internal reliability. The SecRaS instrument was later applied in a research scenario: the validated instrument was distributed, and 218 usable responses were received. Using structural equation modelling, the data revealed a good fit of the measurement analyses and structural model. The key findings were as follows: the relationships between factors were found to have both direct and indirect effects. In establishing the relationships among the factors, the structural model proposes three types of causal relationships in terms of how security implementation in cloud storage can be affected by the security factors. This thesis presents a detailed discussion of the CSSF development, confirmation, and application in a research scenario. For security managers, CSSF offers a new paradigm for how stakeholders can make cloud storage security implementation successful. For security practitioners, the CSSF enables deconstruction of the concept of security in cloud storage into smaller, conceptually distinct and manageable factors to guide the design of security in cloud storage. For researchers, the CSSF provides a common framework in which to conceptualise their research, making it easier to see how the security factors fit into the larger picture.
APA, Harvard, Vancouver, ISO, and other styles
11

Kagalkar, Ramesh [Verfasser]. "PRIVACY PRESERVING PUBLIC AUDITING AND DATA INTEGRITY USING TPA FOR SECURE CLOUD STORAGE : SECURE CLOUD STORAGE / Ramesh Kagalkar." Hamburg : Anchor Academic Publishing, 2015. http://d-nb.info/1110039662/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
12

Wu, Justin Chun. "Peering Through the Cloud—Investigating the Perceptions and Behaviors of Cloud Storage Users." BYU ScholarsArchive, 2016. https://scholarsarchive.byu.edu/etd/6175.

Full text
Abstract:
We present the results of a survey and interviews focused on user perceptions and behaviors with respect to cloud storage services. In particular, we study behaviors such as which services are used, what types of data are stored, and how collaboration and sharing are performed. We also investigate user attitudes toward cloud storage on topics such as payment, privacy, security, and robustness. We find that users are drawn to cloud storage because it enables robust, ubiquitous access to their files, as well as enabling sharing and collaborative efforts. However, users' preferred medium for file sharing continues to be email, due to its ubiquity and role as "lowest common denominator." Privacy and security are of great concern to users, and though users vocally describe feeling "safe" on the cloud, this is because they actively filter the content they store in cloud services. Payment is a sensitive issue, with users exhibiting a strong aversion to any form of direct payment, preferring even disliked alternative funding mechanisms such as targeted advertising. Finally, the cloud serves as an important backup location for users, although space limitations prevent them from using it as a full backup solution.
APA, Harvard, Vancouver, ISO, and other styles
13

Buop, George Onyango. "Data storage security for cloud computing using elliptic curve cryptography." Master's thesis, University of Cape Town, 2020. http://hdl.handle.net/11427/32489.

Full text
Abstract:
Institutions and enterprises are moving towards more service availability and managed risk while, at the same time, aiming to reduce cost. Cloud Computing is a growing technology, thriving in the fields of information communication and data storage. With the proliferation of online activity, more and more information is saved as data every day. This means that more data is being stored in the cloud than ever before. Data that is stored online often holds private information, such as addresses, payment details and medical documentation; this becomes the target of cyber criminals. There is therefore a growing need to protect this data from threats and issues such as data breach and leakage, data loss, and account takeover or hijacking, among others. Cryptography refers to securing information and communication techniques based on mathematical concepts and algorithms which transform messages in ways that are hard to decipher. Cryptography is one of the techniques by which we can protect data stored in the cloud, as it enables the security properties of data confidentiality and integrity. This research investigates the security issues that affect storage of data in the cloud. This thesis also discusses previous research work and the currently available technology and techniques that are used for securing data in the cloud. This thesis then presents a novel scheme for the security of data stored in Cloud Computing by using the Elliptic Curve Integrated Encryption Scheme (ECIES), which provides confidentiality and integrity. This scheme also uses Identity-Based Cryptography (IBC) for more efficient key management. The proposed scheme combines the security of Identity-Based Cryptography (IBC), Trusted Cloud (TC), and Elliptic Curve Cryptography (ECC) to reduce system complexity and provide more security for cloud computing applications. The research shows that it is possible to securely store confidential user data on a Public Cloud such as Amazon S3 or Windows Azure Storage without the need to trust the Cloud Provider and with minimal overhead in processing time. The results of implementing the proposed scheme show faster and more efficient communication when it comes to key generation as well as encryption and decryption. The difference in the time taken for these operations is a result of the use of the ECC algorithm, which has a small key size and is hence highly efficient compared with other types of asymmetric cryptography. The results obtained show the scheme is more efficient when compared with other techniques in the literature.
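For readers unfamiliar with ECIES, the hybrid construction at its core can be sketched with the cryptography package: an ephemeral ECDH exchange feeds a KDF that keys an authenticated cipher, giving both confidentiality and integrity. This is a generic sketch under standard parameters, not the thesis's implementation, and it omits the IBC-based key management.

import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

def _derive_key(shared_secret: bytes) -> bytes:
    return HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
                info=b"ecies-demo").derive(shared_secret)

def ecies_encrypt(recipient_pub, plaintext: bytes):
    eph = ec.generate_private_key(ec.SECP256R1())      # fresh key per message
    key = _derive_key(eph.exchange(ec.ECDH(), recipient_pub))
    nonce = os.urandom(12)
    ct = AESGCM(key).encrypt(nonce, plaintext, None)   # confidentiality + integrity
    return eph.public_key(), nonce, ct

def ecies_decrypt(recipient_priv, eph_pub, nonce, ct) -> bytes:
    key = _derive_key(recipient_priv.exchange(ec.ECDH(), eph_pub))
    return AESGCM(key).decrypt(nonce, ct, None)

# priv = ec.generate_private_key(ec.SECP256R1())
# assert ecies_decrypt(priv, *ecies_encrypt(priv.public_key(), b"hi")) == b"hi"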
APA, Harvard, Vancouver, ISO, and other styles
14

Vasilopoulos, Dimitrios. "Reconciling cloud storage functionalities with security : proofs of storage with data reliability and secure deduplication." Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS399.

Full text
Abstract:
In this thesis we study in depth the problem of verifiability in cloud storage systems. We study Proofs of Storage (a family of cryptographic protocols that enable a cloud storage provider to prove to a user that the integrity of her data has not been compromised) and we identify their limitations with respect to two key characteristics of cloud storage systems, namely, reliable data storage with automatic maintenance and data deduplication. To cope with the first characteristic, we introduce the notion of Proofs of Data Reliability, a comprehensive verification scheme that aims to resolve the conflict between reliable data storage verification and automatic maintenance. We further propose two Proofs of Data Reliability schemes, namely POROS and PORTOS, that succeed in verifying reliable data storage and, at the same time, enable the cloud storage provider to autonomously perform automatic maintenance operations. As regards the second characteristic, we address the conflict between Proofs of Storage and deduplication. More precisely, inspired by previous attempts at solving the problem of deduplicating encrypted data, we propose message-locked PoR, a solution that combines Proofs of Storage with deduplication. In addition, we propose a novel message-locked key generation protocol which is more resilient against off-line dictionary attacks than existing solutions.
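The baseline message-locked scheme, convergent encryption, fits in a few lines and also makes the off-line dictionary-attack problem visible: because the key is derived from the message itself, identical plaintexts produce identical ciphertexts (which is what enables deduplication), but an attacker can test guesses of predictable files in exactly the same way. The sketch below is illustrative, not the thesis's protocol.

import hashlib
from cryptography.hazmat.primitives.ciphers.aead import AESGCM  # pip install cryptography

def convergent_encrypt(plaintext: bytes):
    key = hashlib.sha256(plaintext).digest()         # K = H(M): message-locked
    nonce = hashlib.sha256(key).digest()[:12]        # deterministic, so equal
    ct = AESGCM(key).encrypt(nonce, plaintext, None) # files yield equal ciphertexts
    dedup_tag = hashlib.sha256(ct).hexdigest()       # provider-side dedup index
    return key, dedup_tag, ct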
APA, Harvard, Vancouver, ISO, and other styles
15

Zhang, Gong. "Data and application migration in cloud based data centers --architectures and techniques." Diss., Georgia Institute of Technology, 2011. http://hdl.handle.net/1853/41078.

Full text
Abstract:
Computing and communication have continued to impact the way we run business, the way we learn, and the way we live. The rapid evolution of computing technology has also expedited the growth of digital data, the workload of services, and the complexity of applications. Today, the cost of managing storage hardware ranges from two to ten times the acquisition cost of the hardware itself. We see an increasing demand for technologies that transfer the management burden from humans to software. Data migration and application migration are popular technologies that enable computing and data storage management to be autonomic and self-managing. In this dissertation, we examine important issues in designing and developing scalable architectures and techniques for efficient and effective data migration and application migration. The first contribution we have made is to investigate the opportunity of automated data migration across multi-tier storage systems. The significant IO improvement of Solid State Disks (SSD) over traditional rotational hard disks (HDD) motivates the integration of SSDs into the existing storage hierarchy for enhanced performance. We developed an adaptive look-ahead data migration approach to effectively integrate SSDs into the multi-tiered storage architecture. When using the fast and expensive SSD tier to store high-temperature data (hot data) while placing relatively low-temperature data (cold data) in the HDD tier, an important function is to manage the migration of data as its access patterns change from hot to cold and vice versa. For example, workloads during day time in typical banking applications can be dramatically different from those during night time. We designed and implemented an adaptive look-ahead data migration model. A unique feature of our automated migration approach is its ability to dynamically adapt the data migration schedule to achieve optimal migration effectiveness by taking into account application-specific characteristics and I/O profiles as well as workload deadlines. Our experiments, running over real system traces, show that the basic look-ahead data migration model is effective in improving system resource utilization, and that the adaptive look-ahead migration model is more efficient for continuously improving and tuning the performance and scalability of multi-tier storage systems. The second main contribution we have made in this dissertation research is to address the challenge of ensuring reliability and balancing loads across a network of computing nodes managed in a decentralized service computing system. Considering the provision of location-based services for geographically distributed mobile users, the continuous and massive service request workloads pose significant technical challenges for the system to guarantee scalable and reliable service provision. We design and develop a decentralized service computing architecture, called Reliable GeoGrid, with two unique features. First, we develop a distributed workload migration scheme with controlled replication, which utilizes a shortcut-based optimization to increase the resilience of the system against various node failures and network partition failures. Second, we devise a dynamic load balancing technique to scale the system in anticipation of unexpected workload changes. Our experimental results show that the Reliable GeoGrid architecture is highly scalable under changing service workloads with moving hotspots, and highly reliable in the presence of massive node failures. The third research thrust in this dissertation is focused on studying the process of migrating applications from local physical data centers to the cloud. We design migration experiments, study the error types, and build an error model. Based on the analysis and observations from the migration experiments, we propose the CloudMig system, which provides both configuration validation and installation automation, effectively reducing configuration errors and installation complexity. In this dissertation, I provide an in-depth discussion of the principles of migration and its applications in improving data storage performance, balancing service workloads, and adapting to the cloud platform.
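A stripped-down flavour of the look-ahead idea can be sketched as follows; the dissertation's model additionally adapts the migration schedule to I/O profiles and deadlines, which this sketch omits. Blocks are ranked by access count in the predicted next window (e.g. the upcoming daytime banking workload) and the hottest are promoted into the SSD tier.

from collections import Counter

def plan_migration(predicted_trace, ssd_capacity, on_ssd: set):
    """Return (promote, demote) block sets for the next workload window.

    predicted_trace: block addresses expected in the next window (assumed
    input format); ssd_capacity: how many blocks the SSD tier can hold.
    """
    hot = {addr for addr, _ in Counter(predicted_trace).most_common(ssd_capacity)}
    return hot - on_ssd, on_ssd - hot   # move hot data in, cold data out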
APA, Harvard, Vancouver, ISO, and other styles
16

Oduyiga, Adeshola Oyesanya. "Security in Cloud Storage : A Suitable Security Algorithm for Data Protection." Thesis, Mittuniversitetet, Avdelningen för informationssystem och -teknologi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-34428.

Full text
Abstract:
The purpose of this thesis work was to conduct a general survey of existing security techniques and come up with a suitable algorithm for data security in cloud storage. Cloud storage is an infrastructure, or a model of computer data storage, in which digital data is stored in logical pools. It unifies object storage for both developers and enterprises, from live application data to cloud archives. It helps save valuable space on PCs or mobile devices and provides easy storage and access of data anywhere in the world. However, just as the benefits of cloud computing abound, so do the risks involved. If data is not well secured or encrypted before being deployed for storage in the cloud, then, in case of negligence on the side of the developers, hackers can gain unauthorized access to it. The behavior of existing security algorithms was studied, including the encryption and decryption process of each algorithm and its weaknesses against attacks. Apart from data encryption, security policies also play an important role in cloud storage, which is also covered in this report. The research was conducted through the use of online publications, literature reviews, books, academic publications and reputable research materials. The study showed that, regardless of the challenges in cloud storage, there is a suitable algorithm for protecting data against attack in the cloud.
APA, Harvard, Vancouver, ISO, and other styles
17

Navarro, Martín Joan. "From cluster databases to cloud storage: Providing transactional support on the cloud." Doctoral thesis, Universitat Ramon Llull, 2015. http://hdl.handle.net/10803/285655.

Full text
Abstract:
Over the past three decades, technology constraints (e.g., the capacity of storage devices or the bandwidth of communication networks) and an ever-increasing set of user demands (e.g., information structures, data volumes) have driven the evolution of distributed databases. Since the flat-file data repositories developed in the early eighties, there have been important advances in concurrency control algorithms, replication protocols, and transaction management. However, modern concerns in data storage posed by Big Data and cloud computing—aimed at overcoming the scalability and elasticity limitations of classic databases—are pushing practitioners to relax some important properties featured by transactions, which excludes several applications that are unable to fit into this strategy due to their intrinsic transactional nature. The purpose of this thesis is to address two important challenges still latent in distributed databases: (1) the scalability limitations of transactional databases and (2) providing transactional support on cloud-based storage repositories. Analyzing the traditional concurrency control and replication techniques used by classic databases to support transactions is critical to identify the reasons that make these systems degrade their throughput when the number of nodes and/or the amount of data grows. Besides, this analysis is devoted to justifying the design rationale behind cloud repositories, in which transactions have generally been neglected. Furthermore, enabling applications that are strongly dependent on transactions to take advantage of the cloud storage paradigm is crucial for their adaptation to current data demands and business models. This dissertation starts by proposing a custom protocol simulator for static distributed databases, which serves as a basis for revising and comparing the performance of existing concurrency control protocols and replication techniques. As this thesis is especially concerned with transactions, the effects of different transaction profiles on database scalability are studied under different conditions. This analysis is followed by a review of existing cloud storage repositories—systems that claim to be highly dynamic, scalable, and available—which leads to an evaluation of the parameters and features that these systems have sacrificed in order to meet current large-scale data storage demands. To further explore the possibilities of the cloud computing paradigm in a real-world scenario, a cloud-inspired approach to storing data from Smart Grids is presented. More specifically, the proposed architecture combines classic database replication techniques and epidemic update propagation with the design principles of cloud-based storage. The key insights collected when prototyping the replication and concurrency control protocols in the database simulator, together with the experiences derived from building a large-scale storage repository for Smart Grids, are wrapped up into what we have coined Epidemia: a storage infrastructure conceived to provide transactional support on the cloud. In addition to inheriting the benefits of highly scalable cloud repositories, Epidemia includes a transaction management layer that forwards client transactions to a hierarchical set of data partitions, which allows the system to offer different consistency levels and elastically adapt its configuration to incoming workloads. Finally, experimental results highlight the feasibility of our contribution and encourage practitioners to pursue further research in this area.
APA, Harvard, Vancouver, ISO, and other styles
19

Gureya, Daharewa David. "Self-trained Proactive Elasticity Manager for Cloud-based Storage Services." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-187353.

Full text
Abstract:
The pay-as-you-go pricing model and the illusion of unlimited resources make cloud computing a conducive environment for the provision of elastic services, where different resources are dynamically requested and released in response to changes in demand. The benefit of elastic resource allocation to cloud systems is to minimize resource provisioning costs while meeting service level objectives (SLOs). With the emergence of elastic services, and more particularly elastic key-value stores that can scale horizontally by adding/removing servers, organizations perceive potential in being able to reduce the cost and complexity of large-scale Web 2.0 applications. A well-designed elasticity controller helps reduce the cost of hosting services through dynamic resource provisioning and, at the same time, does not compromise service quality. An elasticity controller often needs to be trained, either online or offline, in order to make it intelligent enough to decide on spawning or removing instances when workloads increase or decrease. However, there are two main issues in the process of control model training. Many recent works train their models offline and apply them to an online system. This approach may lead the elasticity controller to make inaccurate decisions, since not all parameters can be considered when building the model offline. Complete training of the model requires significant effort, including modifying system setups and changing system configurations. Worse, some models can even include several dimensions of system parameters. To overcome these limitations, we present the design and evaluation of a self-trained proactive elasticity manager for cloud-based elastic key-value stores. Our elasticity controller uses online profiling and support vector machines (SVM) to provide a black-box performance model of an application's SLO violation for a given resource demand. The model is dynamically updated to adapt to operating environment changes such as workload pattern variations, data rebalancing, changes in data size, etc. We have implemented and evaluated our controller using the Apache Cassandra key-value store in an OpenStack Cloud environment. Our experiments with artificial workload traces show that our controller guarantees a high level of SLO commitment while keeping the overall resource utilization optimal.
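To make the controller's core idea concrete, the following is a minimal illustrative sketch (not the thesis code; the profiling samples and the scaling routine are invented): an SVM classifier learns whether a given (request rate, server count) pair violates the SLO, and the controller picks the smallest cluster size predicted to be safe.

```python
# Minimal sketch of an SVM-based black-box performance model for elastic
# scaling; training samples are hypothetical online-profiling measurements.
import numpy as np
from sklearn.svm import SVC

# Each row: [requests/s, number of servers]; label 1 means SLO violated.
X = np.array([[500, 2], [900, 2], [900, 3], [1400, 3], [1400, 4], [2000, 4]])
y = np.array([0, 1, 0, 1, 0, 1])

model = SVC(kernel="rbf", gamma="scale")
model.fit(X, y)

def servers_needed(request_rate, max_servers=16):
    """Smallest server count the model predicts will satisfy the SLO."""
    for n in range(1, max_servers + 1):
        if model.predict([[request_rate, n]])[0] == 0:
            return n
    return max_servers

print(servers_needed(1200))  # scaling decision for the current workload
```

In the self-trained setting described above, the sample set would grow at runtime and the model would be refit periodically, so the controller adapts to workload changes without offline training.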
APA, Harvard, Vancouver, ISO, and other styles
20

Syckor, Jens. "Dropbox & Co, alles schon ge-cloud?" Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2014. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-153998.

Full text
Abstract:
Cloud storage services have become a standard for exchanging large volumes of data in virtual communities, both in private settings and in the public sector. Ease of use and seamless integration into applications, operating systems, and end devices are essential building blocks of this success story.
APA, Harvard, Vancouver, ISO, and other styles
21

Yu, Shucheng. "Data Sharing on Untrusted Storage with Attribute-Based Encryption." Digital WPI, 2010. https://digitalcommons.wpi.edu/etd-dissertations/321.

Full text
Abstract:
"Storing data on untrusted storage makes secure data sharing a challenge issue. On one hand, data access policies should be enforced on these storage servers; on the other hand, confidentiality of sensitive data should be well protected against them. Cryptographic methods are usually applied to address this issue -- only encrypted data are stored on storage servers while retaining secret key(s) to the data owner herself; user access is granted by issuing the corresponding data decryption keys. The main challenges for cryptographic methods include simultaneously achieving system scalability and fine-grained data access control, efficient key/user management, user accountability and etc. To address these challenge issues, this dissertation studies and enhances a novel public-key cryptography -- attribute-based encryption (ABE), and applies it for fine-grained data access control on untrusted storage. The first part of this dissertation discusses the necessity of applying ABE to secure data sharing on untrusted storage and addresses several security issues for ABE. More specifically, we propose three enhancement schemes for ABE: In the first enhancement scheme, we focus on how to revoke users in ABE with the help of untrusted servers. In this work, we enable the data owner to delegate most computation-intensive tasks pertained to user revocation to untrusted servers without disclosing data content to them. In the second enhancement scheme, we address key abuse attacks in ABE, in which authorized but malicious users abuse their access privileges by sharing their decryption keys with unauthorized users. Our proposed scheme makes it possible for the data owner to efficiently disclose the original key owner's identity merely by checking the input and output of a suspicious user's decryption device. Our third enhancement schemes study the issue of privacy preservation in ABE. Specifically, our proposed schemes hide the data owner's access policy not only to the untrusted servers but also to all the users. The second part presents our ABE-based secure data sharing solutions for two specific applications -- Cloud Computing and Wireless Sensor Networks (WSNs). In Cloud Computing cloud servers are usually operated by third-party providers, which are almost certain to be outside the trust domain of cloud users. To secure data storage and sharing for cloud users, our proposed scheme lets the data owner (also a cloud user) generate her own ABE keys for data encryption and take the full control on key distribution/revocation. The main challenge in this work is to make the computation load affordable to the data owner and data consumers (both are cloud users). We address this challenge by uniquely combining various computation delegation techniques with ABE and allow both the data owner and data consumers to securely mitigate most computation-intensive tasks to cloud servers which are envisaged to have unlimited resources. In WSNs, wireless sensor nodes are often unattendedly deployed in the field and vulnerable to strong attacks such as memory breach. For securing storage and sharing of data on distributed storage sensor nodes while retaining data confidentiality, sensor nodes encrypt their collected data using ABE public keys and store encrypted data on storage nodes. Authorized users are given corresponding decryption keys to read data. The main challenge in this case is that sensor nodes are extremely resource-constrained and can just afford limited computation/communication load. 
Taking this into account we divide the lifetime of sensor nodes into phases and distribute the computation tasks into each phase. We also revised the original ABE scheme to make the overhead pertained to user revocation minimal for sensor nodes. Feasibility of the scheme is demonstrated by experiments on real sensor platforms. "
APA, Harvard, Vancouver, ISO, and other styles
22

Meinel, Christoph, Maxim Schnjakin, Tobias Metzke, and Markus Freitag. "Anbieter von Cloud Speicherdiensten im Überblick." Universität Potsdam, 2014. http://opus.kobv.de/ubp/volltexte/2014/6878/.

Full text
Abstract:
Due to the ever-increasing flood of digital information, more and more applications make use of cost-effective cloud storage services. The number of vendors that provide these services has increased significantly in recent years. The identification of an appropriate service provider requires an individual consideration of several criteria. This survey presents a comparison of some established basic storage providers. For this comparison, several criteria are extracted that are applicable to any of the selected providers and thus allow for an assessment that is as objective as possible. The criteria include factors like costs, legal information, security, performance, and supported interfaces. The presented criteria can be used to evaluate cloud storage providers in a specific use case in order to identify the most suitable service based on individual requirements.
APA, Harvard, Vancouver, ISO, and other styles
23

Barreto, Andres E. "API-Based Acquisition of Evidence from Cloud Storage Providers." ScholarWorks@UNO, 2015. http://scholarworks.uno.edu/td/2030.

Full text
Abstract:
Cloud computing, and cloud storage services in particular, pose a new challenge to digital forensic investigations. Currently, evidence acquisition for such services still follows the traditional approach of collecting artifacts on a client device. In this work, we show that such an approach not only requires substantial upfront investment in reverse engineering each service, but is also inherently incomplete, as it misses prior versions of the artifacts as well as cloud-only artifacts that do not have standard serialized representations on the client. We introduce the concept of API-based evidence acquisition for cloud services, which addresses these concerns by utilizing the officially supported API of the service. To demonstrate the utility of this approach, we present a proof-of-concept acquisition tool, kumodd, which can acquire evidence from four major cloud storage providers: Google Drive, Microsoft OneDrive, Dropbox, and Box. The implementation provides both command-line and web user interfaces, and can be readily incorporated into established forensic processes.
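As an illustration of the API-based acquisition idea, here is a hedged Python sketch (not kumodd itself): it enumerates files and their prior revisions through the provider's officially supported REST API, here the Google Drive v3 endpoints, assuming a valid OAuth 2.0 access token with read scope.

```python
# Sketch of API-based evidence acquisition: ask the service itself for file
# metadata and prior revisions instead of carving client-side artifacts.
import requests

API = "https://www.googleapis.com/drive/v3"
TOKEN = "REPLACE_WITH_OAUTH_TOKEN"  # obtained via the provider's OAuth flow
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

def list_files():
    resp = requests.get(f"{API}/files", headers=HEADERS,
                        params={"fields": "files(id,name,md5Checksum)"})
    resp.raise_for_status()
    return resp.json().get("files", [])

def list_revisions(file_id):
    # Prior versions are often cloud-only evidence with no client-side copy.
    resp = requests.get(f"{API}/files/{file_id}/revisions", headers=HEADERS)
    resp.raise_for_status()
    return resp.json().get("revisions", [])

for f in list_files():
    print(f["name"], [r["id"] for r in list_revisions(f["id"])])
```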
APA, Harvard, Vancouver, ISO, and other styles
24

Rodrigues, João Miguel Cardia Melro. "TSKY: a dependable middleware solution for data privacy using public storage clouds." Master's thesis, Faculdade de Ciências e Tecnologia, 2013. http://hdl.handle.net/10362/11071.

Full text
Abstract:
Dissertation submitted to obtain the Master's degree in Computer Engineering
This dissertation aims to take advantage of the virtues offered by cloud-based data storage systems on the Internet, proposing a solution that avoids security issues by combining different providers' solutions in a vision of cloud-of-clouds storage and computing. The solution, the TSKY System (or Trusted Sky), is implemented as a middleware system featuring a set of components designed to establish and enhance conditions for security, privacy, reliability, and availability of data, with these conditions being secured and verifiable by the end-user, independently of each provider. These components implement cryptographic tools, including threshold and homomorphic cryptographic schemes, combined with encryption, replication, and dynamic indexing mechanisms. The solution allows data management and distribution functions over data kept in different storage clouds, not necessarily trusted, improving and ensuring resilience and security guarantees against Byzantine faults and attacks. The generic approach of the TSKY system model and its implemented services are evaluated in the context of a Trusted Email Repository System (TSKY-TMS System). The TSKY-TMS system is a prototype that uses the base TSKY middleware services to store mailboxes and email messages in a cloud-of-clouds.
APA, Harvard, Vancouver, ISO, and other styles
25

Martello, Rosanna. "Cloud storage and processing of automotive Lithium-ion batteries data for RUL prediction." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021.

Find full text
Abstract:
Lithium-ion batteries are the ideal choice for electric and hybrid vehicles, but their high cost and relatively short life represent an open issue for the automotive industry. For this reason, the estimation of a battery's Remaining Useful Life (RUL) and State of Health (SoH) is a primary goal in the automotive sector. Cloud computing provides all the resources necessary to store, process, and analyze the sensor data coming from connected vehicles in order to develop Predictive Maintenance tasks. This project describes the work done during my internship at FEV Italia s.r.l. The aims were to design an architecture for managing the data coming from a vehicle fleet and to develop algorithms able to predict the SoH and the RUL of Lithium-ion batteries. The designed architecture is based on three Amazon Web Services: Amazon Elastic Compute Cloud, Amazon Simple Storage Service, and Amazon Relational Database Service. After data processing and feature extraction, the RUL and SoH estimations are performed by training two Neural Networks.
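A minimal sketch of the ingest step in such an AWS-based pipeline (bucket, key, and file names are invented for illustration): raw battery telemetry is pushed to S3, from which compute nodes later fetch it for feature extraction and model training.

```python
# Upload one vehicle's daily telemetry to S3, then stream it back on a
# processing node; all names here are hypothetical examples.
import boto3

s3 = boto3.client("s3")

s3.upload_file("vehicle42_2021-01-15.csv",
               "fleet-telemetry",                 # hypothetical bucket
               "raw/vehicle42/2021-01-15.csv")

obj = s3.get_object(Bucket="fleet-telemetry",
                    Key="raw/vehicle42/2021-01-15.csv")
for line in obj["Body"].iter_lines():
    pass  # parse voltage/current/temperature samples for feature extraction
```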
APA, Harvard, Vancouver, ISO, and other styles
26

Cao, Ning. "Secure and Reliable Data Outsourcing in Cloud Computing." Digital WPI, 2012. https://digitalcommons.wpi.edu/etd-dissertations/333.

Full text
Abstract:
"The many advantages of cloud computing are increasingly attracting individuals and organizations to outsource their data from local to remote cloud servers. In addition to cloud infrastructure and platform providers, such as Amazon, Google, and Microsoft, more and more cloud application providers are emerging which are dedicated to offering more accessible and user friendly data storage services to cloud customers. It is a clear trend that cloud data outsourcing is becoming a pervasive service. Along with the widespread enthusiasm on cloud computing, however, concerns on data security with cloud data storage are arising in terms of reliability and privacy which raise as the primary obstacles to the adoption of the cloud. To address these challenging issues, this dissertation explores the problem of secure and reliable data outsourcing in cloud computing. We focus on deploying the most fundamental data services, e.g., data management and data utilization, while considering reliability and privacy assurance. The first part of this dissertation discusses secure and reliable cloud data management to guarantee the data correctness and availability, given the difficulty that data are no longer locally possessed by data owners. We design a secure cloud storage service which addresses the reliability issue with near-optimal overall performance. By allowing a third party to perform the public integrity verification, data owners are significantly released from the onerous work of periodically checking data integrity. To completely free the data owner from the burden of being online after data outsourcing, we propose an exact repair solution so that no metadata needs to be generated on the fly for the repaired data. The second part presents our privacy-preserving data utilization solutions supporting two categories of semantics - keyword search and graph query. For protecting data privacy, sensitive data has to be encrypted before outsourcing, which obsoletes traditional data utilization based on plaintext keyword search. We define and solve the challenging problem of privacy-preserving multi- keyword ranked search over encrypted data in cloud computing. We establish a set of strict privacy requirements for such a secure cloud data utilization system to become a reality. We first propose a basic idea for keyword search based on secure inner product computation, and then give two improved schemes to achieve various stringent privacy requirements in two different threat models. We also investigate some further enhancements of our ranked search mechanism, including supporting more search semantics, i.e., TF × IDF, and dynamic data operations. As a general data structure to describe the relation between entities, the graph has been increasingly used to model complicated structures and schemaless data, such as the personal social network, the relational database, XML documents and chemical compounds. In the case that these data contains sensitive information and need to be encrypted before outsourcing to the cloud, it is a very challenging task to effectively utilize such graph-structured data after encryption. We define and solve the problem of privacy-preserving query over encrypted graph-structured data in cloud computing. By utilizing the principle of filtering-and-verification, we pre-build a feature-based index to provide feature-related information about each encrypted data graph, and then choose the efficient inner product as the pruning tool to carry out the filtering procedure."
APA, Harvard, Vancouver, ISO, and other styles
27

Mahmud, A. S. M. Hasan. "Sustainable Resource Management for Cloud Data Centers." FIU Digital Commons, 2016. http://digitalcommons.fiu.edu/etd/2634.

Full text
Abstract:
In recent years, the demand for data center computing has increased significantly due to the growing popularity of cloud applications and Internet-based services. Today's large data centers host hundreds of thousands of servers, and the peak power rating of a single data center may even exceed 100 MW. The combined electricity consumption of global data centers accounts for about 3% of worldwide production, raising serious concerns about their carbon footprint. Utility providers and governments are consistently pressuring data center operators to reduce their carbon footprint and energy consumption. While these operators (e.g., Apple, Facebook, and Google) have taken steps to reduce their carbon footprints (e.g., by installing on-site/off-site renewable energy facilities), they are aggressively looking for new approaches that do not require expensive hardware installation or modification. This dissertation focuses on developing algorithms and systems to improve the sustainability of data centers without incurring significant additional operational or setup costs. In the first part, we propose a provably-efficient resource management solution for a self-managed data center to cap and reduce carbon emissions while maintaining satisfactory service performance. Our solution reduces the carbon emission of a self-managed data center to net-zero level and achieves carbon neutrality. In the second part, we consider minimizing the carbon emission in a hybrid data center infrastructure that includes geographically distributed self-managed and colocation data centers. This part identifies and addresses the challenges of resource management in a hybrid data center infrastructure and proposes an efficient distributed solution to jointly optimize workload and resource allocation in both self-managed and colocation data centers. In the final part, we explore sustainable resource management from the cloud service user's point of view. A cloud service user purchases computing resources (e.g., virtual machines) from the service provider and does not have direct control over the carbon emission of the service provider's data center. Our proposed solution encourages a user to take part in sustainable (both economic and environmental) computing by limiting its spending on cloud resource purchases while satisfying its application performance requirements.
APA, Harvard, Vancouver, ISO, and other styles
28

SENIGAGLIESI, LINDA. "Information-theoretic security techniques for data communications and storage." Doctoral thesis, Università Politecnica delle Marche, 2019. http://hdl.handle.net/11566/263165.

Full text
Abstract:
Recent years have seen a growing need for security and privacy in many aspects of communications, accompanying technological progress. Most implemented security solutions are based on the notion of computational security and must be kept continuously updated to face new attacks and technological advancements. To meet increasingly strict requirements, solutions based on the information-theoretic paradigm are gaining interest as a complement to purely cryptographic techniques, thanks to their capacity to achieve security independently of the attacker's computing resources, also known as unconditional security. In this work we investigate how information-theoretic security can be applied to practical systems in order to ensure data security and privacy. We first define information-theoretic metrics to assess the secrecy performance of realistic wireless communication settings under practical conditions, together with a protocol that mixes coding techniques for physical layer security with cryptographic solutions. This scheme is able to achieve some level of semantic security in the presence of a passive attacker. At the same time, multiple scenarios are considered. We provide a security analysis for parallel relay channels, thus finding an optimal resource allocation that maximizes the secrecy rate. Subsequently, by exploiting a probabilistic model checker, we define the parameters for heterogeneous distributed storage systems that allow us to achieve perfect secrecy in practical conditions. For privacy purposes, we propose a scheme which guarantees private information retrieval of files cached at the wireless edge in the presence of multiple spy nodes. We find the optimal content placement that minimizes the backhaul usage, thus reducing the communication cost of the system.
APA, Harvard, Vancouver, ISO, and other styles
29

Brandenburger, Marcus [Verfasser], Rüdiger [Akademischer Betreuer] Kapitza, and Christian [Akademischer Betreuer] Cachin. "Securing Data Integrity from Cloud Storage to Blockchains / Marcus Brandenburger ; Rüdiger Kapitza, Christian Cachin." Braunschweig : Technische Universität Braunschweig, 2021. http://d-nb.info/1225863465/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
30

Ikken, Sonia. "Efficient placement design and storage cost saving for big data workflow in cloud datacenters." Thesis, Evry, Institut national des télécommunications, 2017. http://www.theses.fr/2017TELE0020/document.

Full text
Abstract:
Typical cloud big data systems are workflow-based, including MapReduce, which has emerged as the paradigm of choice for developing large-scale data-intensive applications. Data generated by such systems are huge, valuable, and stored at multiple geographical locations for reuse. Indeed, workflow systems, composed of jobs using collaborative task-based models, present new dependency and intermediate data exchange needs. This gives rise to new issues when selecting distributed data and storage resources, so that tasks or jobs execute on time and resource usage is cost-efficient. Furthermore, the performance of task processing is governed by the efficiency of intermediate data management. In this thesis we tackle the problem of intermediate data management in cloud multi-datacenters by considering the requirements of the workflow applications generating the data. To this end, we design and develop models and algorithms for the big data placement problem in the underlying geo-distributed cloud infrastructure, so that the data management cost of these applications is minimized. The first problem addressed is the study of the intermediate data access behavior of tasks running in a MapReduce-Hadoop cluster. Our approach develops and explores a Markov model that uses the spatial locality of intermediate data blocks and analyzes spill file sequentiality through a prediction algorithm. Secondly, this thesis deals with minimizing the storage cost of intermediate data placement in federated cloud storage. Through a federation mechanism, we propose an exact ILP algorithm to assist multiple cloud datacenters hosting the generated intermediate data dependencies of pairs of files. The proposed algorithm takes into account scientific user requirements, data dependency, and data size. Finally, a more generic problem is addressed in this thesis, involving two variants of the placement problem: splittable and unsplittable intermediate data dependencies. The main goal is to minimize the operational data cost according to inter- and intra-job dependencies.
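A toy version of such a placement ILP, sketched with PuLP under invented sizes, prices, and capacities (the thesis model additionally encodes file-pair dependencies and transfer costs): each intermediate file is assigned to exactly one datacenter so that the total storage cost is minimized.

```python
from pulp import (LpProblem, LpVariable, LpMinimize, lpSum,
                  LpBinary, PULP_CBC_CMD)

files = {"f1": 40, "f2": 25, "f3": 60}               # sizes in GB
price = {"dc1": 0.023, "dc2": 0.020, "dc3": 0.026}   # $/GB-month
cap   = {"dc1": 80, "dc2": 50, "dc3": 100}           # free capacity in GB

prob = LpProblem("intermediate_data_placement", LpMinimize)
x = {(f, d): LpVariable(f"x_{f}_{d}", cat=LpBinary)
     for f in files for d in price}

# Objective: total monthly storage cost of the placement.
prob += lpSum(files[f] * price[d] * x[f, d] for f in files for d in price)

# Each file is stored in exactly one datacenter.
for f in files:
    prob += lpSum(x[f, d] for d in price) == 1

# Respect each datacenter's free capacity.
for d in price:
    prob += lpSum(files[f] * x[f, d] for f in files) <= cap[d]

prob.solve(PULP_CBC_CMD(msg=False))
print({fd: 1 for fd, var in x.items() if var.value() == 1})
```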
APA, Harvard, Vancouver, ISO, and other styles
32

Bäcklund, Simon, and Albin Ljungdahl. "Data storage for a small lumberprocessing company in Sweden." Thesis, Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-105088.

Full text
Abstract:
The world is becoming increasingly digitized, and with this trend comes an increasing need for storing data for companies of all sizes. For smaller enterprises, this could prove to be a major challenge due to limitations in knowledge and financial assets. The purpose of this study is therefore to investigate how smaller companies can satisfy their needs for data storage, and which database management system to use, without letting these shortcomings hold back their development and growth. To fulfill this purpose, a small wood processing company in Sweden is examined and used as an example. To investigate and answer the problem, a literature review is conducted to gain knowledge about data storage and the different options that exist. Microsoft Access, MySQL, and MongoDB are selected for evaluation, and their performance is compared in controlled experiments. The results of this study indicate that, due to the small amount of data that the example company possesses, the simplicity of Microsoft Access trumps the high performance of its competitors. However, with increasingly developed internet infrastructure, hosting a database in the cloud has become feasible. If hosting the database in the cloud is the desired solution, Microsoft Access has a higher operating cost than the other alternatives, making MySQL come out on top.
APA, Harvard, Vancouver, ISO, and other styles
33

Cheng, Yue. "Workload-aware Efficient Storage Systems." Diss., Virginia Tech, 2017. http://hdl.handle.net/10919/78677.

Full text
Abstract:
The growing disparity in the data storage and retrieval needs of modern applications is driving the proliferation of a wide variety of storage systems (e.g., key-value stores, cloud storage services, distributed filesystems, and flash caches). While extant storage systems are designed and tuned for a specific set of applications targeting a range of workload characteristics, they lack the flexibility to adapt to ever-changing workload behaviors. Moreover, the complexities of implementing modern storage systems and adapting to ever-changing storage requirements present unique opportunities and engineering challenges. In this dissertation, we design and develop a series of novel data management and storage system solutions by applying a simple yet effective rule: workload awareness. We find that simple workload-aware data management strategies are effective in improving the efficiency of modern storage systems, sometimes by an order of magnitude. The first two works tackle data management and storage space allocation issues at the distributed and cloud storage level, while the third work focuses on low-level data management problems in the local storage system, on which many high-level storage and data-intensive applications rely. In the first part of this dissertation, we propose and develop MBal, a high-performance in-memory object caching framework with adaptive multi-phase load balancing, which supports not only horizontal (scale-out) but also vertical (scale-up) scalability. MBal is able to make efficient use of available resources in the cloud through its fine-grained, partitioned, lockless design. In the second part, we design and build CAST, a Cloud Analytics Storage Tiering solution that cloud tenants can use to reduce the monetary cost and improve the performance of analytics workloads. The approach takes the first step towards providing storage tiering support for data analytics in the cloud. Furthermore, we propose a hybrid cloud object storage system that can effectively engage both cloud service providers and cloud tenants via a novel dynamic pricing mechanism. In the third part, targeting local storage, we explore offline algorithms for flash caching in terms of both hit ratio and flash lifespan. We design and implement a multi-stage heuristic by synthesizing several techniques that manage data at the granularity of a flash erasure unit (which we call a container) to approximate the offline optimal algorithm. In the fourth part, we focus on how to enable fast prototyping of efficient distributed key-value stores targeting a proxy-based layered architecture. In this work, we design and build a framework that significantly reduces the engineering effort required to build a full-fledged distributed key-value store. Our dissertation shows that simple workload-aware data management strategies can bring huge benefits in terms of both efficiency (i.e., performance, monetary cost, etc.) and flexibility (i.e., ease-of-use, ease-of-deployment, programmability, etc.). The principles of leveraging workload dynamicity and storage heterogeneity can be used to guide next-generation storage system software design, especially when faced with new storage hardware technologies.
Ph. D.
APA, Harvard, Vancouver, ISO, and other styles
34

Carpen-Amarie, Alexandra. "BlobSeer as a data-storage facility for clouds : self-Adaptation, integration, evaluation." Thesis, Cachan, Ecole normale supérieure, 2011. http://www.theses.fr/2011DENS0066/document.

Full text
Abstract:
The emergence of Cloud computing brings forward many challenges that may limit the adoption rate of the Cloud paradigm. As data volumes processed by Cloud applications increase exponentially, designing efficient and secure solutions for data management emerges as a crucial requirement. The goal of this thesis is to enhance a distributed data-management system with self-management capabilities, so that it can meet the requirements of the Cloud storage services in terms of scalability, data availability, reliability and security. Furthermore, we aim at building a Cloud data service both compatible with state-of-the-art Cloud interfaces and able to deliver high-throughput data storage. To meet these goals, we proposed generic self-awareness, self-protection and self-configuration components targeted at distributed data-management systems. We validated them on top of BlobSeer, a large-scale data-management system designed to optimize highly-concurrent data accesses. Next, we devised and implemented a BlobSeer-based file system optimized to efficiently serve as a storage backend for Cloud services. We then integrated it within a real-world Cloud environment, the Nimbus platform. The benefits and drawbacks of using Cloud storage for real-life applications have been emphasized in evaluations that involved data-intensive MapReduce applications and tightly-coupled, high-performance computing applications
APA, Harvard, Vancouver, ISO, and other styles
35

Chihoub, Houssem Eddine. "Managing consistency for big data applications : tradeoffs and self-adaptiveness." Thesis, Cachan, Ecole normale supérieure, 2013. http://www.theses.fr/2013DENS0059/document.

Full text
Abstract:
In the era of Big Data, data-intensive applications handle extremely large volumes of data while requiring fast processing times. A large number of such applications run in the cloud in order to benefit from cloud elasticity, easy on-demand deployment, and cost-efficient Pay-As-You-Go usage. In this context, replication is an essential feature of the cloud for dealing with Big Data challenges: it enables high availability through multiple replicas, fast data access to local replicas, fault tolerance, and disaster recovery. However, replication introduces the major issue of data consistency across different copies. Consistency management is critical for Big Data systems. Strong consistency models introduce serious limitations to system scalability and performance due to the required synchronization efforts. In contrast, weak and eventual consistency models reduce the performance overhead and enable high levels of availability. However, these models may tolerate, under certain scenarios, too much temporal inconsistency. In this Ph.D. thesis, we address this issue of consistency tradeoffs in large-scale Big Data systems and applications. We first focus on consistency management at the storage system level. Accordingly, we propose an automated self-adaptive model (named Harmony) that scales the consistency level up or down at runtime when needed, in order to provide performance as high as possible while preserving the application's consistency requirements. In addition, we present a thorough study of the impact of consistency management on the monetary cost of running in the cloud. We then leverage this study to propose a cost-efficient consistency tuning approach (named Bismar) in the cloud. In a third direction, we study the impact of consistency management on energy consumption within the data center. Based on our findings, we investigate adaptive configurations of the storage system cluster that target energy saving. To complete our system-side study, we focus on the application level. Applications are different, and so are their consistency requirements. Understanding such requirements at the storage system level is not possible. Therefore, we propose an application behavior model that captures the consistency requirements of an application. Based on the model, we propose an online prediction approach (named Chameleon) that adapts to the application's specific needs and provides customized consistency.
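A toy sketch of the self-adaptive idea (the stale-read estimator below is invented for illustration and is not Harmony's actual model): estimate how likely a read is to return stale data from the observed write rate and replication lag, and contact more replicas only when that estimate exceeds the application's tolerance.

```python
def stale_read_probability(write_rate_hz, replication_lag_s):
    # Crude model: a read is stale if it lands inside the propagation
    # window of some recent write (probability capped at 1).
    return min(1.0, write_rate_hz * replication_lag_s)

def replicas_to_read(write_rate_hz, replication_lag_s, tolerance,
                     max_replicas=3):
    """Read consistency level: number of replicas to contact per read."""
    level = 1  # eventual consistency: cheapest and fastest
    while (stale_read_probability(write_rate_hz, replication_lag_s) / level
           > tolerance and level < max_replicas):
        level += 1  # stronger consistency: involve more replicas
    return level

# High write pressure and replication lag push the controller toward
# quorum-like reads; light workloads fall back to eventual consistency.
print(replicas_to_read(write_rate_hz=50, replication_lag_s=0.02,
                       tolerance=0.2))
```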
APA, Harvard, Vancouver, ISO, and other styles
36

Berg, Markus. "Privat molnlagring i arbetet : En fallstudie om hur ett IT-företaghanterar att anställda använder privatmolnlagring för arbetsrelateradinformation." Thesis, Högskolan i Skövde, Institutionen för informationsteknologi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-15655.

Full text
Abstract:
In this work, a case study was made of one company working in IT. The study was made to see how an IT company mitigates and handles the problem of employees using their private cloud storage to store work-related information. A survey was sent to three different departments at the company to examine what employees think about private cloud storage and work-related information, and how and whether this differs between departments. One interview was also conducted with an informant from the same company to compare with the results from the survey. A second interview with another company was also conducted to validate the results from the first interview. The results from the survey showed that employees do store, or have stored, work-related information in their private cloud storage, despite company policies forbidding it. One of the departments that the survey was sent to was the information security team. It can be assumed that employees who work with information security have a better understanding of security regarding information and how it should be stored. Despite that, the results showed that the information security team was the department that used private cloud storage the most, compared to the other departments. The results from the interviews show that the companies at least have policies on how and where employees are allowed to store work-related information. One company in particular works continuously to educate and encourage its employees to take great care when dealing with work-related information.
APA, Harvard, Vancouver, ISO, and other styles
37

Vijayakumar, Sruthi. "Hadoop Based Data Intensive Computation on IAAS Cloud Platforms." UNF Digital Commons, 2015. http://digitalcommons.unf.edu/etd/567.

Full text
Abstract:
Cloud computing is a relatively new form of computing which uses virtualized resources. It is dynamically scalable and is often provided as a pay-per-use service over the Internet, an Intranet, or both. With increasing demand for data storage in the cloud, the study of data-intensive applications is becoming a primary focus. Data-intensive applications are those which involve high CPU usage and process large volumes of data, typically hundreds of gigabytes, terabytes, or petabytes in size. The research in this thesis is focused on Amazon's Elastic Compute Cloud (EC2) and Amazon Elastic MapReduce (EMR) using the HiBench Hadoop benchmark suite, which is used for performing and evaluating Hadoop-based data-intensive computation on both of these cloud platforms. Both quantitative and qualitative comparisons of Amazon EC2 and Amazon EMR are presented, along with their pricing models and suggestions for future research.
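As a hedged illustration of how such EMR experiments are typically driven programmatically (release label, instance types, and bucket names are examples, not taken from the thesis), a cluster can be launched with boto3 and benchmark workloads submitted as steps:

```python
# Launch a transient EMR Hadoop cluster and submit one example step;
# HiBench workloads would be driven the same way, as additional steps.
import boto3

emr = boto3.client("emr", region_name="us-east-1")

response = emr.run_job_flow(
    Name="hadoop-benchmark",
    ReleaseLabel="emr-6.10.0",
    Applications=[{"Name": "Hadoop"}],
    Instances={
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 4,                  # 1 master + 3 core nodes
        "KeepJobFlowAliveWhenNoSteps": False,
    },
    Steps=[{
        "Name": "wordcount",
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["hadoop-mapreduce-examples", "wordcount",
                     "s3://my-bucket/input", "s3://my-bucket/output"],
        },
    }],
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print(response["JobFlowId"])
```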
APA, Harvard, Vancouver, ISO, and other styles
38

Netshiongolwe, Mpho. "Investigating perceptions of reliability, efficiency and feasibility of data storage technology: A case study of cloud storage adoption at UCT Faculty of Science." Master's thesis, Faculty of Humanities, 2019. http://hdl.handle.net/11427/30935.

Full text
Abstract:
Within an increasing number of organisations cloud storage is becoming more common as large amounts of data from people and projects are being produced, exchanged and stored (Chang & Wills, 2016: 56). In fact, “technology has evolved and has allowed increasingly large and efficient data storage, which in turn has allowed increasingly sophisticated ways to use it” (Staff, 2016: n.p.). Thus, the aim of this study is to investigate perceptions of the reliability, efficiency and feasibility of data storage technology. The investigation is done by addressing claims and perceptions of data storage technology within the Faculty of Science at UCT. This study intends to determine whether cloud storage is the future of storing, managing and preserving digital data. The study used a qualitative research method grounded in Management Fashion Theory. Data was collected from three case studies from the Faculty of Science, and also from a desktop internet search on the marketing of cloud storage. Data collection from the case studies was facilitated through semi-structured interviews with three researchers and academics who are working on cloud storage projects. The main themes that guided the dialogue during data collection originated from the reviewed literature. The study concludes that cloud storage is the way forward for storing, sharing and managing research data. Academic researchers find storing data in the cloud beneficial; however, it comes with challenges such as costs, security, access, privacy, control and ethics.
APA, Harvard, Vancouver, ISO, and other styles
39

Ruty, Guillaume. "Towards more scalability and flexibility for distributed storage systems." Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLT006/document.

Full text
Abstract:
The exponentially growing demand for storage puts huge stress on traditional distributed storage systems. While storage devices' performance has caught up with network devices in the last decade, their capacity does not grow as fast as the rate of data growth, especially with the rise of cloud big data applications. Furthermore, the performance balance between storage, network and compute devices has shifted, and the assumptions that are the foundation of most distributed storage systems are no longer true. This dissertation explains how several aspects of such storage systems can be modified and rethought to make more efficient use of the resources at their disposal. It presents an original architecture that uses a distributed layer of metadata to provide flexible and scalable object-level storage, then proposes a scheduling algorithm that improves how a generic storage system handles concurrent requests. Finally, it describes how to improve legacy filesystem-level caching for erasure-code-based distributed storage systems, before presenting a few other contributions made in the context of short research projects.
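As a toy illustration of the fairness goal described above (not the dissertation's actual scheduling algorithm), the sketch below serves per-client request queues round-robin so that one heavy client cannot starve the others:

```python
# A toy fair scheduler: each client gets a FIFO queue, and the scheduler
# interleaves clients round-robin instead of serving requests in arrival order.
from collections import deque

class FairScheduler:
    def __init__(self):
        self.queues = {}          # client id -> deque of pending requests
        self.order = deque()      # round-robin order of active clients

    def submit(self, client, request):
        if client not in self.queues:
            self.queues[client] = deque()
            self.order.append(client)
        self.queues[client].append(request)

    def next_request(self):
        if not self.order:
            return None
        client = self.order.popleft()
        request = self.queues[client].popleft()
        if self.queues[client]:   # client still busy: requeue at the back
            self.order.append(client)
        else:
            del self.queues[client]
        return client, request

sched = FairScheduler()
sched.submit("A", "read blob-1")
sched.submit("A", "read blob-2")
sched.submit("B", "write blob-3")
while (item := sched.next_request()) is not None:
    print(item)   # A, B, A -- interleaved rather than A, A, B
```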
APA, Harvard, Vancouver, ISO, and other styles
40

Chiossi, Luca. "High-Performance Persistent Caching in Multi- and Hybrid- Cloud Environments." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2020. http://amslaurea.unibo.it/20089/.

Full text
Abstract:
The working model known as Multi Cloud is emerging as a natural evolution of Cloud Computing in response to companies' new business needs. A typical example is the model known as Hybrid Cloud, where a Private Cloud is connected to a Public Cloud to let applications scale on demand while still meeting privacy, cost and security requirements. Since data is spread across different infrastructures, when applications running in one data center need data stored remotely, they must go over the network connecting the infrastructures. This has a strong negative impact on data-intensive workloads, which suffer the delays caused by the low bandwidth and high latency typical of network connections. Artificial Intelligence and Scientific Computing applications are examples of such workloads: thanks to the ever-increasing use of accelerators such as GPUs and FPGAs, they can consume data faster than it becomes available. Implementing a cache layer that fetches and stores compute data from the slow (remote) storage device onto the faster (but more expensive) one where the computation runs appears to be the best way to strike an optimal trade-off between the cost of storage devices offered as Cloud services and the high computing speed of modern applications. The cache system presented in this work was developed taking into account the peculiarities of Cloud storage services that use the S3 API to communicate with clients. The proposed solution was built on the Ceph distributed storage system, which implements many of the services characterizing the S3 semantics and, being designed for Cloud environments, fits well into Multi Cloud scenarios.
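A minimal read-through cache in front of an S3-compatible store can be sketched as follows; this illustrates the general idea rather than the thesis's system, and the endpoint, bucket and cache directory are assumptions (boto3 works against both AWS S3 and Ceph's RADOS Gateway):

```python
# A read-through cache sketch: repeated reads are served from local (fast)
# storage, and only cache misses cross the network to the remote S3 store.
import os
import boto3

s3 = boto3.client("s3", endpoint_url="http://ceph-rgw.local:7480")  # assumed endpoint
CACHE_DIR = "/tmp/s3cache"

def cached_get(bucket: str, key: str) -> bytes:
    """Return the object body, caching it locally after the first fetch."""
    local_path = os.path.join(CACHE_DIR, bucket, key.replace("/", "_"))
    if os.path.exists(local_path):               # cache hit: no network round trip
        with open(local_path, "rb") as f:
            return f.read()
    obj = s3.get_object(Bucket=bucket, Key=key)  # cache miss: fetch remotely
    data = obj["Body"].read()
    os.makedirs(os.path.dirname(local_path), exist_ok=True)
    with open(local_path, "wb") as f:            # populate the cache
        f.write(data)
    return data

payload = cached_get("datasets", "train/part-000.bin")  # hypothetical object
```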
APA, Harvard, Vancouver, ISO, and other styles
41

Wiren, Jakob. "Data Storage Cost Optimization Based on Electricity Price Forecasting with Machine Learning in a Multi-Geographical Cloud Environment." Thesis, Linköpings universitet, Kommunikations- och transportsystem, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-152250.

Full text
Abstract:
As the increased demand for cloud computing leads to increased electricity costs for cloud providers, there is an incentive to investigate new methods of lowering electricity costs in data centers. Electricity price markets suffer from sudden price spikes as well as irregularities between different geographical electricity markets. This thesis investigates whether it is possible to leverage these volatilities and irregularities between different electricity price markets to offload or move storage in order to reduce electricity costs for data storage. By forecasting four different electricity price markets, it was possible to predict sudden price spikes and leverage these forecasts in a simple optimization model that offloads data storage between data centers, successfully reducing electricity costs for data storage.
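A toy version of such a cost-driven placement decision might look like the following sketch, where the regions, forecast prices and migration cost are made-up values rather than the thesis's model:

```python
# A toy placement policy: move stored data to another region only when the
# forecast electricity saving over the horizon exceeds a one-off migration cost.
forecast = {                       # hypothetical price forecasts per region
    "eu-north": [31.0, 29.5, 28.0, 30.5],
    "eu-west":  [42.0, 55.0, 61.0, 58.0],   # forecast price spike
    "us-east":  [35.0, 34.0, 36.0, 33.5],
}
current_region = "eu-west"
migration_cost = 20.0              # hypothetical cost of moving the data once

def plan_placement(forecast, current, migration_cost):
    expected = {r: sum(p) for r, p in forecast.items()}   # cost over the horizon
    best = min(expected, key=expected.get)
    saving = expected[current] - expected[best]
    return best if saving > migration_cost else current

target = plan_placement(forecast, current_region, migration_cost)
print("Keep data in" if target == current_region else "Move data to", target)
```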
APA, Harvard, Vancouver, ISO, and other styles
42

Ceccarelli, Viviana. "Data Logger: cognitive analysis." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021.

Find full text
Abstract:
This work is based on data related to the driving experience of a motorcycle in all its aspects. The driver's point of view is supported by the analysis of the data produced by the sensors and collected by a data logger. The two aspects, human and technical, have been standardized from a technological point of view so that they can be analysed and related to each other. The need for cognitive algorithms led to the use of Cloud PaaS and SaaS technologies, which allowed the rapid realization of the project without technical constraints. All of this is part of the specific requirements set by Ducati's PQ department, whose aim is to evaluate the on-road behaviour of new models or technological improvements before series production. The final solution for the proposed data collection was realized using different services of IBM Cloud: a web app was built around a virtual assistant based on cognitive technology that guides the user in the collection of information and verifies its consistency. The application required backend services such as an Object Storage and a Database, to store multimedia content and data respectively. Concerning the front-end, the automation of the collection mode led to the introduction of an acquisition process that is easier to use and always available in terms of reachability and flexibility. The information related to the data collected by the logger did not need to be mediated by further technological components. The technology used for the analytical part is Microsoft's Power BI. The integration with the two data sources is plug and play: the data are updated automatically. Several types of analysis have been carried out, ranging from the reconstruction and correlation of measurements on a route to text analysis of driver sensations. In addition, a PCB has been developed to solve an issue related to the data logger itself.
APA, Harvard, Vancouver, ISO, and other styles
43

Nicolae, Bogdan. "BlobSeer : towards efficient data storage management for large-scale, distributed systems." Phd thesis, Université Rennes 1, 2010. http://tel.archives-ouvertes.fr/tel-00552271.

Full text
Abstract:
With data volumes increasing at a high rate and the emergence of highly scalable infrastructures (cloud computing, petascale computing), distributed management of data becomes a crucial issue that faces many challenges. This thesis brings several contributions in order to address such challenges. First, it proposes a set of principles for designing highly scalable distributed storage systems that are optimized for heavy data access concurrency. In particular, it highlights the potentially large benefits of using versioning in this context. Second, based on these principles, it introduces a series of distributed data and metadata management algorithms that enable a high throughput under concurrency. Third, it shows how to efficiently implement these algorithms in practice, dealing with key issues such as high-performance parallel transfers, efficient maintenance of distributed data structures, fault tolerance, etc. These results are used to build BlobSeer, an experimental prototype that is used to demonstrate both the theoretical benefits of the approach in synthetic benchmarks, as well as the practical benefits in real-life, applicative scenarios: as a storage backend for MapReduce applications, as a storage backend for deployment and snapshotting of virtual machine images in clouds, and as a quality-of-service-enabled data storage service for cloud applications. Extensive experimentation on the Grid'5000 testbed shows that BlobSeer remains scalable and sustains a high throughput even under heavy access concurrency, outperforming several state-of-the-art approaches by a large margin.
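The versioning principle can be illustrated with a minimal sketch; this is a toy, not BlobSeer itself, which uses distributed metadata rather than the single lock below. Writes publish new immutable versions instead of mutating data in place, so readers always see a consistent snapshot and old versions remain readable:

```python
# A toy versioned blob: every write appends a new immutable version.
import threading

class VersionedBlob:
    def __init__(self):
        self._versions = [b""]         # version 0: the empty blob
        self._lock = threading.Lock()

    def write(self, offset: int, chunk: bytes) -> int:
        """Publish a new immutable version; returns its version number."""
        with self._lock:
            base = self._versions[-1]
            new = base[:offset] + chunk + base[offset + len(chunk):]
            self._versions.append(new)
            return len(self._versions) - 1

    def read(self, version=None) -> bytes:
        with self._lock:
            if version is None:
                version = len(self._versions) - 1
            return self._versions[version]   # old snapshots stay readable

blob = VersionedBlob()
v1 = blob.write(0, b"hello")
v2 = blob.write(5, b" world")
assert blob.read(v1) == b"hello"
assert blob.read(v2) == b"hello world"
```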
APA, Harvard, Vancouver, ISO, and other styles
44

Nicolae, Bogdan. "BlobSeer : towards efficient data storage management for large-scale, distributed systems." Phd thesis, Rennes 1, 2010. http://www.theses.fr/2010REN1S123.

Full text
Abstract:
With data volumes increasing at a high rate and the emergence of highly scalable infrastructures (cloud computing, petascale computing), distributed management of data becomes a crucial issue that faces many challenges. This thesis brings several contributions in order to address such challenges. First, it proposes a set of principles for designing highly scalable distributed storage systems that are optimized for heavy data access concurrency. In particular, it highlights the potentially large benefits of using versioning in this context. Second, based on these principles, it introduces a series of distributed data and metadata management algorithms that enable a high throughput under concurrency. Third, it shows how to efficiently implement these algorithms in practice, dealing with key issues such as high-performance parallel transfers, efficient maintenance of distributed data structures, fault tolerance, etc. These results are used to build BlobSeer, an experimental prototype that is used to demonstrate both the theoretical benefits of the approach in synthetic benchmarks, as well as the practical benefits in real-life, applicative scenarios: as a storage backend for MapReduce applications, as a storage backend for deployment and snapshotting of virtual machine images in clouds, and as a quality-of-service-enabled data storage service for cloud applications. Extensive experimentation on the Grid'5000 testbed shows that BlobSeer remains scalable and sustains a high throughput even under heavy access concurrency, outperforming several state-of-the-art approaches by a large margin.
APA, Harvard, Vancouver, ISO, and other styles
45

Sharma, Sagar. "Towards Data and Model Confidentiality in Outsourced Machine Learning." Wright State University / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=wright1567529092809275.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

Framner, Erik. "A Configuration User Interface for Multi-Cloud Storage Based on Secret Sharing : An Exploratory Design Study." Thesis, Karlstads universitet, Handelshögskolan (from 2013), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kau:diva-71354.

Full text
Abstract:
Storing personal information in a secure and reliable manner may be crucial for organizational as well as private users. Encryption protects the confidentiality of data against adversaries, but if the cryptographic key is lost, the information will not be obtainable for authorized individuals either. Redundancy may protect information against availability issues or data loss, but also comes with greater storage overhead and cost. Cloud storage serves as an attractive alternative to traditional storage as one is released from maintenance responsibilities and does not have to invest in in-house IT resources. However, cloud adoption is commonly hindered by privacy concerns. Instead of relying on the security of a single cloud, this study aims to investigate the applicability of a multi-cloud solution based on Secret Sharing, and to identify suitable options and guidelines for a configuration user interface (UI). Interviews were conducted with technically skilled people representing prospective users, followed by walkthroughs of a UI prototype. Although the solution would (theoretically) allow the use of less “trustworthy” clouds without compromising data confidentiality, the research results indicate that trust factors such as compliance with EU laws may still be a crucial prerequisite for users to adopt cloud services. Users may worry about cloud storage providers colluding, and the solution may not be perceived as adequately secure without the use of encryption. The configuration of the Secret Sharing parameters is difficult to comprehend even for technically skilled individuals, and default values could, or even should, be recommended to the user.
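For readers unfamiliar with the underlying primitive, a compact Shamir secret-sharing sketch is given below; it is illustrative only (not the PRISMACLOUD implementation) and shows how a secret split into n shares, e.g. one per cloud provider, can be reconstructed from any k of them:

```python
# Shamir secret sharing over a prime field: a random degree-(k-1) polynomial
# with the secret as constant term; any k points recover it by interpolation.
# For real use, draw coefficients with the `secrets` module, not `random`.
import random

PRIME = 2**127 - 1  # a Mersenne prime large enough for small integer secrets

def split(secret: int, n: int, k: int):
    """Split `secret` into n shares with reconstruction threshold k."""
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(k - 1)]
    def f(x):
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the secret from k shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

shares = split(123456789, n=5, k=3)       # e.g. one share per cloud provider
assert reconstruct(shares[:3]) == 123456789
assert reconstruct(shares[1:4]) == 123456789
```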
PRISMACLOUD
APA, Harvard, Vancouver, ISO, and other styles
47

Khan, Syeduzzaman. "A PROBABILISTIC MACHINE LEARNING FRAMEWORK FOR CLOUD RESOURCE SELECTION ON THE CLOUD." Scholarly Commons, 2020. https://scholarlycommons.pacific.edu/uop_etds/3720.

Full text
Abstract:
The execution of scientific applications on the Cloud comes with great flexibility, scalability, cost-effectiveness, and substantial computing power. Market-leading Cloud service providers such as Amazon Web Services (AWS), Azure, and Google Cloud Platform (GCP) offer various general-purpose, memory-intensive, and compute-intensive Cloud instances for the execution of scientific applications. The scientific community, especially small research institutions and undergraduate universities, faces many hurdles while conducting high-performance computing research in the absence of large dedicated clusters. The Cloud provides a lucrative alternative to dedicated clusters, but the wide range of Cloud computing choices makes instance selection difficult for end-users. This thesis aims to simplify Cloud instance selection for end-users by proposing a probabilistic machine learning framework that allows users to select a suitable Cloud instance for their scientific applications. This research builds on the previously proposed A2Cloud-RF framework that recommends high-performing Cloud instances by profiling the application and the selected Cloud instances. The framework produces a set of objective scores, called the A2Cloud scores, which denote the compatibility level between the application and the selected Cloud instances. When used alone, the A2Cloud scores become increasingly unwieldy with an increasing number of tested Cloud instances. Additionally, the framework only examines raw application performance and does not consider execution cost to guide resource selection. To improve the usability of the framework and assist with economical instance selection, this research adds two Naïve Bayes (NB) classifiers that consider both the application's performance and execution cost: 1) NB with a Random Forest Classifier (RFC) and 2) a standalone NB module. NB with an RFC augments the A2Cloud-RF framework's final instance ratings with the execution cost metric. In the training phase, the classifier builds the frequency and probability tables, and it recommends a Cloud instance based on the highest posterior probability for the selected application. The standalone NB classifier uses the generated A2Cloud score (an intermediate result from the A2Cloud-RF framework) and the execution cost metric to construct an NB classifier, forming a frequency table and probability (prior and likelihood) tables. To recommend a Cloud instance for a test application, the classifier calculates the posterior probability for every Cloud instance and recommends the one with the highest posterior probability. This study executes eight real-world applications on 20 Cloud instances from AWS, Azure, GCP, and Linode. We train the NB classifiers using 80% of this dataset and employ the remaining 20% for testing. The testing yields more than 90% recommendation accuracy for the chosen applications and Cloud instances. Because of the imbalanced nature of the dataset and the multi-class nature of the classification, we consider the confusion matrix (true positive, false positive, true negative, and false negative) and the F1 score, with scores above 0.9, to describe the model performance. The final goal of this research is to make Cloud computing an accessible resource for conducting high-performance scientific executions by enabling users to select an effective Cloud instance across multiple providers.
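As a hedged sketch of this kind of classification (with made-up features and labels rather than the thesis's dataset), a Gaussian Naïve Bayes model can recommend an instance class from a compatibility score and an execution-cost feature:

```python
# Toy Naive Bayes instance recommendation: the features and labels are
# synthetic stand-ins for A2Cloud-style scores and per-hour execution cost.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
# Hypothetical features: [compatibility score 0-100, cost per hour in USD]
X = rng.random((200, 2)) * [100.0, 3.0]
# Hypothetical labels: 0 = general purpose, 1 = compute-, 2 = memory-optimized
y = (X[:, 0] > 50).astype(int) + (X[:, 1] > 1.5).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = GaussianNB().fit(X_tr, y_tr)    # builds per-class likelihoods (training)
pred = clf.predict(X_te)              # picks the class with highest posterior
print("F1 (weighted):", round(f1_score(y_te, pred, average="weighted"), 3))
```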
APA, Harvard, Vancouver, ISO, and other styles
48

Habtu, Simon. "Indexing file metadata using a distributed search engine for searching files on a public cloud storage." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-232064.

Full text
Abstract:
Visma Labs AB, or Visma, wanted to conduct experiments to see if file metadata could be indexed for searching files on a public cloud storage. Given that storing files in a public cloud storage is cheaper than the current storage solution, the implementation could save Visma money otherwise spent on expensive storage costs. The aim of the thesis is therefore to find and evaluate an approach for indexing file metadata and searching files on a public cloud storage with the chosen distributed search engine, Elasticsearch. The architecture of the proposed solution is similar to a file service and was implemented using several containerized services. The results show that the file service solution is indeed feasible but would need further tuning and more resources to function according to the demands of Visma.
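A minimal sketch of the approach, with an assumed index name and document fields rather than Visma's actual service (elasticsearch-py 8.x API), indexes only the metadata while the file bodies stay in cheap cloud storage:

```python
# Index file metadata in Elasticsearch; each document points at the blob's
# location in cloud storage, so searches never touch the file bodies.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")   # assumed local dev cluster

doc = {
    "name": "report-2018.pdf",
    "path": "customers/acme/reports/",
    "size_bytes": 482133,
    "content_type": "application/pdf",
    "storage_url": "s3://files-bucket/ab/cd/ef123",  # pointer to the blob
}
es.index(index="file-metadata", id="ef123", document=doc)

# Find PDF files whose name matches "report".
hits = es.search(index="file-metadata", query={
    "bool": {
        "must": [{"match": {"name": "report"}}],
        "filter": [{"term": {"content_type": "application/pdf"}}],
    },
})
for h in hits["hits"]["hits"]:
    print(h["_source"]["name"], "->", h["_source"]["storage_url"])
```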
APA, Harvard, Vancouver, ISO, and other styles
49

Zampetakis, Stamatis. "Scalable algorithms for cloud-based Semantic Web data management." Thesis, Paris 11, 2015. http://www.theses.fr/2015PA112199/document.

Full text
Abstract:
In order to build smart systems, where machines are able to reason exactly like humans, data with semantics is a major requirement. This need led to the advent of the Semantic Web, proposing standard ways of representing and querying data with semantics. RDF is the prevalent data model used to describe web resources, and SPARQL is the query language that allows expressing queries over RDF data. Being able to store and query data with semantics triggered the development of many RDF data management systems. The rapid evolution of the Semantic Web provoked the shift from centralized data management systems to distributed ones. The first systems to appear relied on P2P and client-server architectures, while recently the focus has moved to cloud computing. Cloud computing environments have strongly impacted research and development in distributed software platforms. Cloud providers offer distributed, shared-nothing infrastructures that may be used for data storage and processing. The main features of cloud computing involve scalability, fault tolerance, and elastic allocation of computing and storage resources following the needs of the users. This thesis investigates the design and implementation of scalable algorithms and systems for cloud-based Semantic Web data management. In particular, we study the performance and cost of exploiting commercial cloud infrastructures to build Semantic Web data repositories, and the optimization of SPARQL queries for massively parallel frameworks. First, we introduce the basic concepts around the Semantic Web and the main components and frameworks interacting in massively parallel cloud-based systems. In addition, we provide an extended overview of existing RDF data management systems in the centralized and distributed settings, emphasizing the critical concepts of storage, indexing, query optimization, and infrastructure. Second, we present AMADA, an architecture for RDF data management using public cloud infrastructures. We follow the Software as a Service (SaaS) model, where the complete platform is running in the cloud and appropriate APIs are provided to the end-users for storing and retrieving RDF data. We explore various storage and querying strategies, revealing pros and cons with respect to performance and also to monetary cost, which is an important new dimension to consider in public cloud services. Finally, we present CliqueSquare, a distributed RDF data management system built on top of Hadoop, incorporating a novel optimization algorithm that is able to produce massively parallel plans for SPARQL queries. We present a family of optimization algorithms, relying on n-ary (star) equality joins to build flat plans, and compare their ability to find the flattest plans possible. Inspired by existing partitioning and indexing techniques, we present a generic storage strategy suitable for storing RDF data in HDFS (Hadoop's Distributed File System). Our experimental results validate the efficiency and effectiveness of the optimization algorithm, also demonstrating the overall performance of the system.
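The star-join idea behind flat plans can be sketched as follows; this toy (not CliqueSquare's optimizer) simply groups SPARQL-like triple patterns by their join variable, so that each group could become one n-ary equality join evaluated in a single parallel stage:

```python
# Group triple patterns into "stars" sharing a join variable; each star is
# a candidate n-ary join. Assumes subject-star queries for simplicity.
from collections import defaultdict

def group_into_stars(triple_patterns):
    """triple_patterns: (subject, predicate, object) with '?x'-style variables."""
    stars = defaultdict(list)
    for s, p, o in triple_patterns:
        joinvar = s if s.startswith("?") else o
        stars[joinvar].append((s, p, o))
    return dict(stars)

query = [  # hypothetical patterns, roughly: articles and their authors' names
    ("?a", "rdf:type", "Article"),
    ("?a", "hasAuthor", "?p"),
    ("?p", "rdf:type", "Person"),
    ("?p", "name", "?n"),
]
for var, patterns in group_into_stars(query).items():
    print(var, "-> n-ary join of", len(patterns), "patterns:", patterns)
```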
APA, Harvard, Vancouver, ISO, and other styles
50

Trebulová, Debora. "Zálohování dat a datová úložiště." Master's thesis, Vysoké učení technické v Brně. Fakulta podnikatelská, 2017. http://www.nusl.cz/ntk/nusl-318599.

Full text
Abstract:
This diploma thesis focuses on ways of backing up data and their practical use in a specific proposal for Transroute Group s.r.o. The introductory part presents the theoretical knowledge on this issue. The next part of the thesis deals with the analysis of the current state of backup in the company. This section is followed by a chapter where several solutions are presented, each with its financial evaluation. The final part covers the choice of a specific solution and a time estimate for its implementation.
APA, Harvard, Vancouver, ISO, and other styles