Dissertations / Theses on the topic 'Multi-cloud Data'

Consult the top 47 dissertations / theses for your research on the topic 'Multi-cloud Data.'

1

Fan, Qi. "Multi-Objective Optimization for Data Analytics in the Cloud." Electronic Thesis or Diss., Institut polytechnique de Paris, 2024. http://www.theses.fr/2024IPPAX069.

Abstract:
Big data query processing has become increasingly important, prompting the development and cloud deployment of numerous systems. However, automatically tuning the numerous parameters in these big data systems introduces growing complexity in meeting users' performance goals and budgetary constraints. Determining optimal configurations is challenging due to the need to address: 1) multiple competing performance goals and budgetary constraints, such as low latency and low cost, 2) a high-dimensional parameter space with complex parameter control, and 3) the requirement for high computational efficiency in cloud use, typically within 1-2 seconds. To address the above challenges, this thesis proposes efficient multi-objective optimization (MOO) algorithms for a cloud optimizer to meet various user objectives. It computes Pareto optimal configurations for big data queries within a high-dimensional parameter space while adhering to stringent solving time requirements. More specifically, this thesis introduces the following contributions. The first contribution of this thesis is a benchmarking analysis of existing MOO methods and solvers, identifying their limitations, particularly in terms of efficiency and the quality of Pareto solutions, when applied to cloud optimization. The second contribution introduces MOO algorithms designed to compute Pareto optimal solutions for query stages, which are units defined by shuffle boundaries. In production-scale big data processing, each stage operates within a high-dimensional parameter space, with thousands of parallel instances. Each instance requires resource parameters determined upon assignment to one of thousands of machines, as exemplified by systems like MaxCompute. To achieve Pareto optimality for each query stage, we propose a novel hierarchical MOO approach. This method decomposes the stage-level MOO problem into multiple parallel instance-level MOO problems and efficiently derives stage-level MOO solutions from instance-level MOO solutions. Evaluation results using production workloads demonstrate that our hierarchical MOO approach outperforms existing MOO methods by 4% to 77% in terms of performance and up to 48% in cost reduction while operating within 0.02 to 0.23 seconds compared to current optimizers and schedulers. Our third contribution aims to achieve Pareto optimality for the entire query with finer-granularity control of parameters. In big data systems like Spark, some parameters can be tuned independently for each query stage, while others are shared across all stages, introducing a high-dimensional parameter space and complex constraints. To address this challenge, we propose a new approach called Hierarchical MOO with Constraints (HMOOC). This method decomposes the optimization problem of a large parameter space into smaller subproblems, each constrained to use the same shared parameters. Given that these subproblems are not independent, we develop techniques to generate a sufficiently large set of candidate solutions and efficiently aggregate them to form global Pareto optimal solutions. Evaluation results using TPC-H and TPC-DS benchmarks demonstrate that HMOOC outperforms existing MOO methods, achieving a 4.7% to 54.1% improvement in hypervolume and an 81% to 98.3% reduction in solving time.
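To illustrate the core computation such an optimizer performs, here is a minimal Pareto-front sketch over hypothetical (latency, cost) configuration samples; it is illustrative only, not the thesis's hierarchical algorithm:

```python
from typing import List, Tuple

def pareto_front(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Return the Pareto-optimal points when minimizing both objectives,
    e.g. (latency_seconds, dollar_cost) for candidate configurations."""
    pts = sorted(points)                  # sort by the first objective
    front, best_cost = [], float("inf")
    for latency, cost in pts:
        if cost < best_cost:              # strictly better on the second objective
            front.append((latency, cost))
            best_cost = cost
    return front

# Hypothetical configurations sampled from a tuning run
candidates = [(1.2, 0.30), (0.8, 0.55), (1.0, 0.35), (0.8, 0.50), (2.0, 0.10)]
print(pareto_front(candidates))
# [(0.8, 0.5), (1.0, 0.35), (1.2, 0.3), (2.0, 0.1)]
```

Real optimizers must do this over thousands of dimensions under a sub-second budget, which is what motivates the hierarchical decomposition the abstract describes.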
2

Jung, Gueyoung. "Multi-dimensional optimization for cloud based multi-tier applications." Diss., Georgia Institute of Technology, 2010. http://hdl.handle.net/1853/37267.

Abstract:
Emerging trends toward cloud computing and virtualization have been opening new avenues to meet enormous demands of space, resource utilization, and energy efficiency in modern data centers. By being allowed to host many multi-tier applications in consolidated environments, cloud infrastructure providers enable resources to be shared among these applications at a very fine granularity. Meanwhile, resource virtualization has recently gained considerable attention in the design of computer systems and become a key ingredient for cloud computing. It provides significant improvement of aggregated power efficiency and high resource utilization by enabling resource consolidation. It also allows infrastructure providers to manage their resources in an agile way under highly dynamic conditions. However, these trends also raise significant challenges to researchers and practitioners to successfully achieve agile resource management in consolidated environments. First, they must deal with the very different responsiveness of different applications, while handling dynamic changes in resource demands as applications' workloads change over time. Second, when provisioning resources, they must consider management costs such as power consumption and adaptation overheads (i.e., overheads incurred by dynamically reconfiguring resources). Dynamic provisioning of virtual resources entails an inherent performance-power tradeoff, and indiscriminate adaptations can result in significant overheads on power consumption and end-to-end performance. Hence, to achieve agile resource management, it is important to thoroughly investigate various performance characteristics of deployed applications, precisely integrate the costs caused by adaptations, and then balance benefits and costs. Fundamentally, the research question is how to dynamically provision available resources for all deployed applications to maximize overall utility under time-varying workloads, while considering such management costs. Given the scope of the problem space, this dissertation aims to develop an optimization system that not only meets the performance requirements of deployed applications, but also addresses tradeoffs between performance, power consumption, and adaptation overheads. To this end, this dissertation makes two distinct contributions. First, I show that adaptations applied to cloud infrastructures can cause significant overheads on not only end-to-end response time, but also server power consumption. Moreover, I show that such costs can vary in intensity and time scale with workload, adaptation types, and the performance characteristics of hosted applications. Second, I address multi-dimensional optimization between server power consumption, performance benefit, and the transient costs incurred by various adaptations. Additionally, I incorporate the overhead of the optimization procedure itself into the problem formulation. Typically, system optimization approaches entail intensive computations and potentially long delays to deal with a huge search space in cloud computing infrastructures; this type of cost cannot be ignored when adaptation plans are designed. In this multi-dimensional optimization work, a scalable optimization algorithm and a hierarchical adaptation architecture are developed to handle many applications, hosting servers, and adaptations, supporting adaptation decisions at multiple time scales.
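As a toy illustration of the performance-power-adaptation tradeoff (all weights and plan numbers below are invented; the dissertation's formulation is far richer), candidate adaptation plans can be scored by a net utility:

```python
def plan_utility(perf_benefit: float, power_watts: float, adaptation_cost: float,
                 w_perf: float = 1.0, w_power: float = 0.02, w_adapt: float = 0.5) -> float:
    """Net utility of an adaptation plan; the weights are illustrative assumptions."""
    return w_perf * perf_benefit - w_power * power_watts - w_adapt * adaptation_cost

# Pick the best of several hypothetical plans: (benefit, power draw, transient cost)
plans = {"scale_out": (10.0, 250.0, 4.0), "migrate_vm": (6.0, 180.0, 6.0), "no_op": (0.0, 150.0, 0.0)}
best = max(plans, key=lambda p: plan_utility(*plans[p]))
print(best)  # scale_out: 10 - 5 - 2 = 3.0, vs migrate_vm: -0.6 and no_op: -3.0
```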
3

Schmidt, Eric Otto. "Cloud properties as inferred from HIRS/2 multi-spectral data." Diss., Georgia Institute of Technology, 1991. http://hdl.handle.net/1853/26817.

4

Xhagjika, Vamis. "Resource, data and application management for cloud federations and multi-clouds." Doctoral thesis, Universitat Politècnica de Catalunya, 2017. http://hdl.handle.net/10803/409728.

Abstract:
Distributed Real-Time Media Processing refers to classes of highly distributed, delay-intolerant applications that account for the majority of the data traffic generated in the world today. Real-time audio/video conferencing and live content streaming are of particular research interest, as technology forecasts predict video traffic surpassing every other type of data traffic in the world in the near future. These applications are very sensitive both to communication properties such as latency, jitter, packet loss, and bit rate, and to backend stream-processing load profiles. In this work we provide a novel and generalized large-scale Multi-Cloud architectural blueprint for ISPs and carrier providers that permits smart geo-distributed service placement in order to optimize the latency and locality of stream-processing applications. We also provide a self-managed Intra-Cloud federation algorithm based on gradient topologies to optimize routes in a live media-streaming backend. Additionally, we introduce a novel distributed Network Bandwidth Manager that improves system stability by arbitrating network bandwidth between multiple Cloud services sharing the same network infrastructure. Finally, an empirical study is provided connecting media quality parameters and Cloud backend load profiles, including an algorithm for stream allocation on Cloud Selective Forwarding units. Overall, the thesis seeks to improve real-time media processing both at the network level and through Cloud backend optimizations.
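One classic way a bandwidth manager can arbitrate a shared link is max-min fair allocation; the sketch below is a standard textbook algorithm under assumed demands, not necessarily the mechanism used by this thesis's Network Bandwidth Manager:

```python
def max_min_fair(capacity: float, demands: dict) -> dict:
    """Progressively satisfy the smallest demands; split leftover capacity evenly."""
    alloc, remaining = {}, capacity
    pending = dict(demands)
    while pending:
        fair_share = remaining / len(pending)
        # Services demanding less than the fair share are fully satisfied this round.
        satisfied = {s: d for s, d in pending.items() if d <= fair_share}
        if not satisfied:
            for s in pending:                 # everyone is bottlenecked: equal split
                alloc[s] = fair_share
            return alloc
        for s, d in satisfied.items():
            alloc[s] = d
            remaining -= d
            del pending[s]
    return alloc

# Hypothetical services sharing a 10 Gbps link
print(max_min_fair(10.0, {"conferencing": 2.0, "streaming": 7.0, "backup": 5.0}))
# {'conferencing': 2.0, 'streaming': 4.0, 'backup': 4.0}
```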
5

Mohamad, Baraa. "Medical Data Management on the cloud." Thesis, Clermont-Ferrand 2, 2015. http://www.theses.fr/2015CLF22582.

Abstract:
Medical data management has become a real challenge due to the emergence of new imaging technologies providing high image resolutions. This thesis focuses in particular on the management of DICOM files. DICOM is one of the most important medical standards. DICOM files have a special data format in which one file may contain regular data, multimedia data and services. These files are extremely heterogeneous (the schema of a file cannot be predicted) and have large data sizes. The characteristics of DICOM files, added to the requirements of medical data management in general in terms of availability and accessibility, have led us to formulate our research question as follows: is it possible to build a system that (1) is highly available, (2) supports any medical images (different specialties, modalities and physicians' practices), (3) can store extremely large and ever-increasing volumes of data, (4) provides expressive access, and (5) is cost-effective? To answer this question we have built a hybrid (row-column) cloud-enabled storage system. The idea of this solution is to disperse DICOM attributes thoughtfully, depending on their characteristics, over both data layouts in a way that provides the best of the row-oriented and column-oriented storage models in one system, all while exploiting the features of the cloud that enable us to ensure the availability and portability of medical data. Storing data in such a hybrid layout opens the door to a second research question: how to process queries efficiently over this hybrid storage while enabling new and more efficient query plans. The originality of our proposal comes from the fact that there is currently no system that stores data in such a hybrid layout (i.e., an attribute resides either in a row-oriented database or in a column-oriented one, and a given query can interrogate both storage models at the same time) and studies query processing over it. The experimental prototypes implemented in this thesis show interesting results and open the door to multiple optimizations and research questions.
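The attribute-dispersal idea can be sketched as a simple router that sends each DICOM attribute to a row store or a column store based on its characteristics; the rules below are hypothetical, and the thesis's actual placement criteria are more involved:

```python
def place_attribute(name: str, avg_size_bytes: int, queried_alone: bool) -> str:
    """Decide the storage layout for one DICOM attribute.
    Heuristic: large binary payloads and attributes queried in isolation suit a
    column store; small attributes fetched together suit a row store."""
    if avg_size_bytes > 64 * 1024 or queried_alone:
        return "column_store"
    return "row_store"

attributes = [
    ("PatientName", 64, False),
    ("StudyDate", 8, False),
    ("PixelData", 5_000_000, True),   # the image payload itself
]
layout = {n: place_attribute(n, s, alone) for n, s, alone in attributes}
print(layout)
# {'PatientName': 'row_store', 'StudyDate': 'row_store', 'PixelData': 'column_store'}
```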
6

Pagliari, Alessio. "Network as an On-Demand Service for Multi-Cloud Workloads." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2017.

Abstract:
The PrEstoCloud project aims to enable on-demand resource scaling of Big Data applications to the cloud. In this context, we have to deal with the huge amount of data processed and, in particular, its transportation between one cloud and another. The scope of this thesis is to develop a network-level architecture that can deal with Big Data application challenges and can be integrated into the PrEstoCloud consortium while staying transparent to the application level. However, connecting multiple cloud providers in this context presents a series of challenges: the architecture should adapt to a variable number of clouds to connect, it has to bypass the limitations of the cloud infrastructure and, most importantly, it must have a general design able to work with every cloud provider. In this report, we present a general VPN-based Inter-Cloud architecture able to work in every kind of environment. We implemented a prototype with IPSec and OpenVPN, connecting the I3S laboratory with Amazon AWS and Azure. We evaluate our architecture and the tools used in two ways: (i) we test the stability of the architecture over time via latency tests; (ii) we perform non-intrusive Pathload tests in Amazon AWS, showing the usability of the available-bandwidth estimator in the cloud, the AWS network characteristics discovered through the tests, and a final comparison of the VPN tools' overhead.
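The latency-stability test can be approximated with periodic TCP round-trip measurements (an illustrative sketch; the endpoint is a placeholder and the thesis used its own tooling):

```python
import socket, statistics, time

def tcp_rtt_ms(host: str, port: int = 443, timeout: float = 2.0) -> float:
    """Measure one TCP connect round trip in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000.0

def sample_latency(host: str, n: int = 10, interval_s: float = 1.0) -> None:
    samples = [tcp_rtt_ms(host) for _ in range(n) if time.sleep(interval_s) is None]
    print(f"{host}: median={statistics.median(samples):.1f} ms, "
          f"jitter(stdev)={statistics.stdev(samples):.1f} ms")

sample_latency("example.com")  # placeholder endpoint for a VPN gateway under test
```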
7

Xu, Zichen. "Energy Modeling and Management for Data Services in Multi-Tier Mobile Cloud Architectures." The Ohio State University, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=osu1468272637.

8

de, Carvalho Tiago Filipe Rodrigues. "Integrated Approach to Dynamic and Distributed Cloud Data Center Management." Research Showcase @ CMU, 2016. http://repository.cmu.edu/dissertations/739.

Abstract:
Management solutions for current and future Infrastructure-as-a-Service (IaaS) Data Centers (DCs) face complex challenges. First, DCs are now very large infrastructures holding hundreds of thousands, if not millions, of servers and applications. Second, DCs are highly heterogeneous. DC infrastructures consist of servers and network devices with different capabilities, from various vendors and different generations. Cloud applications are owned by different tenants and have different characteristics and requirements. Third, most DC elements are highly dynamic. Applications can change over time. During their lifetime, their logical architectures evolve and change according to workload and resource requirements. Failures and bursty resource demand can lead to unstable states affecting a large number of services. Global and centralized approaches limit scalability and are not suitable for large dynamic DC environments with multiple tenants with different application requirements. We propose a novel, fully distributed and dynamic management paradigm for highly diverse and volatile DC environments. We develop LAMA, a novel framework for managing large-scale cloud infrastructures based on a multi-agent system (MAS). Provider agents collaborate to advertise and manage available resources, while app agents provide integrated and customized application management. Distributing management tasks allows LAMA to scale naturally; the integrated approach improves its efficiency. Proximity to the application and knowledge of the DC environment allow agents to quickly react to changes in performance and to pre-plan for potential failures. We implement and deploy LAMA in a testbed server cluster. We demonstrate how LAMA improves the scalability of management tasks such as provisioning and monitoring. We evaluate LAMA in light of state-of-the-art open-source frameworks. LAMA enables customized dynamic management strategies for multi-tier applications. These strategies can be configured to respond to failures and workload changes within the limits of the desired SLA for each application.
9

Liu, Kun. "Multi-View Oriented 3D Data Processing." Thesis, Université de Lorraine, 2015. http://www.theses.fr/2015LORR0273/document.

Abstract:
Point cloud refinement and surface reconstruction are two fundamental problems in geometry processing. Most existing methods have been targeted at range-sensor data and turn out to be ill-adapted to multi-view data. In this thesis, two novel methods are proposed, one for each problem, with special attention to multi-view data. The first method smooths point clouds originating from multi-view reconstruction without impairing the data. The problem is formulated as a nonlinear constrained optimization and addressed as a series of unconstrained optimization problems by means of a barrier method. The second method triangulates point clouds into meshes using an advancing-front strategy directed by a sphere-packing criterion. The method is algorithmically simple and can produce high-quality meshes efficiently. Experiments on synthetic and real-world data demonstrate the robustness and the efficiency of the methods. The developed methods are suitable for applications that require accurate and consistent position information, such as photogrammetry and tracking in computer vision.
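The barrier-method idea, solving a constrained problem as a sequence of unconstrained ones, can be shown on a toy problem (minimize f(x) = (x - 2)^2 subject to x <= 1); this is a generic textbook sketch, not the thesis's formulation:

```python
import math
from scipy.optimize import minimize_scalar

def barrier_solve(f, g, lo: float, hi: float, mu: float = 1.0,
                  shrink: float = 0.1, rounds: int = 8) -> float:
    """Solve min f(x) s.t. g(x) <= 0 as a sequence of unconstrained problems
    min f(x) - mu*log(-g(x)), driving the barrier weight mu toward zero."""
    x = lo
    for _ in range(rounds):
        def phi(t, m=mu):
            # Large finite penalty outside the feasible region keeps the solver sane.
            return f(t) - m * math.log(-g(t)) if g(t) < 0 else 1e30
        x = minimize_scalar(phi, bounds=(lo, hi), method="bounded").x
        mu *= shrink
    return x

f = lambda x: (x - 2.0) ** 2     # objective pulls x toward 2
g = lambda x: x - 1.0            # constraint: x <= 1
print(round(barrier_solve(f, g, lo=-5.0, hi=5.0), 3))   # ~1.0 (constraint active)
```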
10

Breschi, Valentina. "Model learning from data: from centralized multi-model regression to distributed cloud-aided single-model estimation." Thesis, IMT Alti Studi Lucca, 2018. http://e-theses.imtlucca.it/256/1/Breschi_phdthesis.pdf.

Abstract:
This thesis presents a collection of methods for learning models from data, looking at this problem from two perspectives: learning multiple models from a single data source and how to switch among them, and learning a single model from data collected from multiple sources. Regarding the first, to describe complex phenomena with simple yet complete models, we propose a computationally efficient method for Piecewise Affine (PWA) regression. This approach relies on the combined use of (i) multi-model Recursive Least-Squares (RLS) and (ii) piecewise linear multi-category discrimination, and shows good performance when used for the identification of Piecewise Affine dynamical systems with eXogenous inputs (PWARX) and Linear Parameter Varying (LPV) models. The technique for PWA regression is then extended to handle the problem of black-box identification of Discrete Hybrid Automata (DHA) from input/output observations, with hidden operating modes. The method for DHA identification is based on multi-model RLS and multicategory discrimination, and it can approximate both the continuous affine dynamics and the Finite State Machine (FSM) governing the logical dynamics of the DHA. Two more approaches are presented to tackle the problem of learning models that jump over time. While the technique designed to learn Rarely Jump Models (RJMs) from data relies on the combined solution of a convex optimization problem and the use of Dynamic Programming, the method proposed for learning Markov Jump Models (MJMs) is based on the joint use of clustering plus multi-model RLS and a probabilistic clustering technique. The results of the tests performed on the method for RJM learning have motivated the design of two techniques for Non-Intrusive Load Monitoring, i.e., estimating the power consumed by the appliances in a household from aggregated measurements, which are also presented in the thesis. In particular, we propose methods based on (i) the optimization of a least-squares error cost function, modified to account for changes in the appliances' operating regime, and (ii) multi-model Kalman filters. Regarding the second perspective, we propose methods for cloud-aided consensus-based parameter estimation over a multitude of similar devices (such as a mass-produced fleet). In particular, we focus on the design of RLS-based estimators that can handle (i) linear and (ii) nonlinear consensus constraints and (iii) multi-class estimation.
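As background for the RLS building block used throughout the thesis, here is the standard recursive least-squares update (textbook form with a forgetting factor, not the thesis's multi-model variant; the ARX example data is synthetic):

```python
import numpy as np

class RLS:
    """Standard recursive least-squares estimator for y = phi^T theta + noise."""
    def __init__(self, n_params: int, lam: float = 0.99, delta: float = 100.0):
        self.theta = np.zeros(n_params)       # parameter estimate
        self.P = delta * np.eye(n_params)     # inverse covariance
        self.lam = lam                        # forgetting factor

    def update(self, phi: np.ndarray, y: float) -> np.ndarray:
        Pphi = self.P @ phi
        k = Pphi / (self.lam + phi @ Pphi)            # gain vector
        self.theta = self.theta + k * (y - phi @ self.theta)
        self.P = (self.P - np.outer(k, Pphi)) / self.lam
        return self.theta

# Identify a hypothetical ARX model y_t = a*y_{t-1} + b*u_{t-1}
rng = np.random.default_rng(0)
a_true, b_true, y_prev, est = 0.8, 0.5, 0.0, RLS(2)
for _ in range(500):
    u = rng.normal()
    y = a_true * y_prev + b_true * u + 0.01 * rng.normal()
    est.update(np.array([y_prev, u]), y)
    y_prev = y
print(est.theta.round(2))   # approximately [0.8 0.5]
```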
11

Chiossi, Luca. "High-Performance Persistent Caching in Multi- and Hybrid- Cloud Environments." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2020. http://amslaurea.unibo.it/20089/.

Abstract:
The working model known as Multi-Cloud is emerging as a natural evolution of Cloud Computing in response to the new business needs of companies. A typical example is the model known as Hybrid Cloud, where a Private Cloud is connected to a Public Cloud so that applications can scale on demand while simultaneously meeting privacy, cost and security requirements. Given the distribution of data across different infrastructures, when applications running in one data center need to use remotely stored data, they must go through the network connecting the infrastructures. This has strong negative impacts on data-intensive workloads, which consequently suffer delays due to the low bandwidth and high latency typical of network connections. Artificial Intelligence and Scientific Computing applications are examples of such workloads: thanks to the ever-growing use of accelerators such as GPUs and FPGAs, they become able to consume data faster than it becomes available. Implementing a cache layer that serves and stores compute data from the slow (remote) storage device to the faster (but expensive) one where computations are performed appears to be the best solution for the optimal compromise between the cost of storage devices offered as Cloud services and the high computing speed of modern applications. The cache system presented in this work was developed taking into account all the peculiarities of Cloud storage services that use S3 APIs to communicate with clients. The proposed solution was obtained by working with the Ceph distributed storage system, which implements many of the services characterizing the S3 semantics and, being designed for Cloud environments, fits well into Multi-Cloud scenarios.
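The read-through caching idea can be sketched as follows (an in-memory toy standing in for the fast tier, with a callable standing in for a slow S3/Ceph GET; the thesis's cache is persistent and far more elaborate):

```python
from collections import OrderedDict
from typing import Callable

class ReadThroughCache:
    """LRU read-through cache: serve hot objects locally, fetch misses remotely."""
    def __init__(self, fetch_remote: Callable[[str], bytes], capacity: int = 128):
        self.fetch_remote = fetch_remote          # e.g. an S3 GET wrapped in a function
        self.capacity = capacity
        self.store: OrderedDict[str, bytes] = OrderedDict()

    def get(self, key: str) -> bytes:
        if key in self.store:                     # hit: refresh LRU position
            self.store.move_to_end(key)
            return self.store[key]
        data = self.fetch_remote(key)             # miss: pay the slow network path once
        self.store[key] = data
        if len(self.store) > self.capacity:       # evict the least recently used object
            self.store.popitem(last=False)
        return data

# Hypothetical slow backend standing in for a remote S3/Ceph bucket
cache = ReadThroughCache(lambda k: f"object:{k}".encode(), capacity=2)
cache.get("a"); cache.get("b"); cache.get("a"); cache.get("c")   # "b" is evicted
print(list(cache.store))   # ['a', 'c']
```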
12

Tang, Yuzhe. "Secure and high-performance big-data systems in the cloud." Diss., Georgia Institute of Technology, 2014. http://hdl.handle.net/1853/53995.

Abstract:
Cloud computing and big data technology continue to revolutionize how computing and data analysis are delivered today and in the future. To store and process the fast-changing big data, various scalable systems (e.g. key-value stores and MapReduce) have recently emerged in industry. However, there is a huge gap between what these open-source software systems can offer and what the real-world applications demand. First, scalable key-value stores are designed for simple data access methods, which limit their use in advanced database applications. Second, existing systems in the cloud need automatic performance optimization for better resource management with minimized operational overhead. Third, the demand continues to grow for privacy-preserving search and information sharing between autonomous data providers, as exemplified by Healthcare information networks. My Ph.D. research aims at bridging these gaps. First, I proposed HINDEX, for secondary index support on top of write-optimized key-value stores (e.g. HBase and Cassandra). To update the index structure efficiently in the face of an intensive write stream, HINDEX synchronously executes append-only operations and defers the so-called index-repair operations, which are expensive. The core contribution of HINDEX is a scheduling framework for deferred and lightweight execution of index repairs. HINDEX has been implemented and is currently being transferred to an IBM big data product. Second, I proposed Auto-pipelining for automatic performance optimization of streaming applications on multi-core machines. The goal is to prevent the bottleneck scenario in which the streaming system is blocked by a single core while all other cores are idling, which wastes resources. To partition the streaming workload evenly to all the cores and to search for the best partitioning among many possibilities, I proposed a heuristic-based search strategy that achieves locally optimal partitioning with lightweight search overhead. The key idea is to use a white-box approach to search for the theoretically best partitioning and then use a black-box approach to verify the effectiveness of such partitioning. The proposed technique, called Auto-pipelining, is implemented on IBM Stream S. Third, I proposed ε-PPI, a suite of privacy-preserving index algorithms that allow data sharing among unknown parties while maintaining a desired level of data privacy. To differentiate the privacy concerns of different persons, I proposed a personalized privacy definition and substantiated this new privacy requirement by the injection of false positives in the published ε-PPI data. To construct the ε-PPI securely and efficiently, I proposed to optimize the performance of multi-party computations, which are otherwise expensive; the key idea is to use an addition-homomorphic secret sharing mechanism, which is inexpensive, and to do the distributed computation in a scalable P2P overlay.
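The append-and-defer idea behind HINDEX can be illustrated with a toy secondary index (a schematic reduction, not the actual HBase/Cassandra-based implementation): writes append to the index immediately, while expensive deletions of stale postings are queued and repaired lazily.

```python
from collections import defaultdict

class DeferredSecondaryIndex:
    """Write-optimized secondary index: appends are synchronous, repairs deferred."""
    def __init__(self):
        self.base = {}                        # primary key -> value
        self.index = defaultdict(set)         # value -> primary keys (may be stale)
        self.repair_queue = []                # (old_value, key) pairs to clean up later

    def put(self, key, value):
        if key in self.base:                  # defer removal of the stale index entry
            self.repair_queue.append((self.base[key], key))
        self.base[key] = value
        self.index[value].add(key)            # cheap append-only index update

    def query(self, value):
        # Filter out stale postings by double-checking against the base table.
        return {k for k in self.index[value] if self.base.get(k) == value}

    def repair(self, budget: int = 100):
        # Run during idle periods: drop stale postings without blocking writes.
        for _ in range(min(budget, len(self.repair_queue))):
            old_value, key = self.repair_queue.pop()
            if self.base.get(key) != old_value:
                self.index[old_value].discard(key)

idx = DeferredSecondaryIndex()
idx.put("row1", "red"); idx.put("row1", "blue")
print(idx.query("red"), idx.query("blue"))   # set() {'row1'}
idx.repair()
```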
13

Pagano, F. "A DISTRIBUTED APPROACH TO PRIVACY ON THE CLOUD." Doctoral thesis, Università degli Studi di Milano, 2012. http://hdl.handle.net/2434/172441.

Abstract:
The increasing adoption of Cloud-based data processing and storage poses a number of privacy issues. Users wish to preserve full control over their sensitive data and cannot accept that it be fully accessible to an external storage provider. Previous research in this area was mostly directed at techniques to protect data stored on untrusted database servers; however, I argue that the Cloud architecture presents a number of specific problems and issues. This dissertation contains a detailed analysis of open issues. To handle them, I present a novel approach where confidential data is stored in a highly distributed partitioned database, partly located on the Cloud and partly on the clients. In my approach, data can be either private or shared; the latter is shared in a secure manner by means of simple grant-and-revoke permissions. I have developed a proof-of-concept implementation using an in-memory RDBMS with row-level data encryption in order to achieve fine-grained data access control. This type of approach is rarely adopted in conventional outsourced RDBMSs because it requires several complex steps. Benchmarks of my proof-of-concept implementation show that my approach overcomes most of the problems.
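A schematic of row-level encryption with grant-and-revoke, using the `cryptography` package's Fernet; the per-row-key design below is an assumption made for illustration, not the dissertation's exact scheme:

```python
from cryptography.fernet import Fernet

class RowStore:
    """Each row is encrypted under its own key; access = possession of the key."""
    def __init__(self):
        self.rows = {}        # row_id -> ciphertext (what the cloud provider sees)
        self.keys = {}        # row_id -> key, held client-side by the data owner

    def insert(self, row_id: str, data: bytes):
        key = Fernet.generate_key()
        self.keys[row_id] = key
        self.rows[row_id] = Fernet(key).encrypt(data)

    def grant(self, row_id: str) -> bytes:
        return self.keys[row_id]            # hand the per-row key to a grantee

    def revoke(self, row_id: str):
        # Re-encrypt under a fresh key; old grantees can no longer read new data.
        data = Fernet(self.keys[row_id]).decrypt(self.rows[row_id])
        self.insert(row_id, data)

store = RowStore()
store.insert("r1", b"sensitive payload")
alice_key = store.grant("r1")
print(Fernet(alice_key).decrypt(store.rows["r1"]))   # b'sensitive payload'
store.revoke("r1")                                   # alice_key is now useless
```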
14

Liu, Kun. "Multi-View Oriented 3D Data Processing." Electronic Thesis or Diss., Université de Lorraine, 2015. http://www.theses.fr/2015LORR0273.

Abstract:
Point cloud refinement and surface reconstruction are two fundamental problems in geometry processing. Most existing methods have been targeted at range-sensor data and turn out to be ill-adapted to multi-view data. In this thesis, two novel methods are proposed, one for each problem, with special attention to multi-view data. The first method smooths point clouds originating from multi-view reconstruction without impairing the data. The problem is formulated as a nonlinear constrained optimization and addressed as a series of unconstrained optimization problems by means of a barrier method. The second method triangulates point clouds into meshes using an advancing-front strategy directed by a sphere-packing criterion. The method is algorithmically simple and can produce high-quality meshes efficiently. Experiments on synthetic and real-world data demonstrate the robustness and the efficiency of the methods. The developed methods are suitable for applications that require accurate and consistent position information, such as photogrammetry and tracking in computer vision.
15

Wiren, Jakob. "Data Storage Cost Optimization Based on Electricity Price Forecasting with Machine Learning in a Multi-Geographical Cloud Environment." Thesis, Linköpings universitet, Kommunikations- och transportsystem, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-152250.

Abstract:
As the increasing demand for cloud computing leads to increased electricity costs for cloud providers, there is an incentive to investigate new methods to lower electricity costs in data centers. Electricity price markets suffer from sudden price spikes as well as irregularities between different geographical electricity markets. This thesis investigates whether it is possible to leverage these volatilities and irregularities between different electricity price markets to offload or move storage in order to reduce electricity costs for data storage. By forecasting four different electricity price markets, it was possible to predict sudden price spikes and leverage these forecasts in a simple optimization model to offload storage of data in data centers and successfully reduce electricity costs for data storage.
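The optimization step can be illustrated with a tiny dynamic program over forecast prices (all numbers and the migration cost are hypothetical; the thesis couples such a model with machine-learned forecasts): choose, hour by hour, which region should hold the data so that forecast energy cost plus migration cost is minimized.

```python
def cheapest_placement(prices: dict, migration_cost: float) -> list:
    """prices: region -> list of forecast hourly storage-energy costs.
    Returns the cost-minimizing region per hour, by dynamic programming."""
    regions = list(prices)
    horizon = len(next(iter(prices.values())))
    # cost[r] = best total cost so far if the data currently sits in region r
    cost = {r: prices[r][0] for r in regions}
    choice = {r: [r] for r in regions}
    for t in range(1, horizon):
        new_cost, new_choice = {}, {}
        for r in regions:
            # Stay at the best predecessor, or migrate from it at a price.
            prev = min(regions, key=lambda p: cost[p] + (0 if p == r else migration_cost))
            new_cost[r] = cost[prev] + (0 if prev == r else migration_cost) + prices[r][t]
            new_choice[r] = choice[prev] + [r]
        cost, choice = new_cost, new_choice
    return choice[min(regions, key=cost.get)]

forecast = {"eu-north": [1.0, 4.0, 4.0], "us-east": [3.0, 1.5, 1.5]}
print(cheapest_placement(forecast, migration_cost=1.0))
# ['eu-north', 'us-east', 'us-east']
```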
16

Al-ou'n, Ashraf M. S. "VM Allocation in Cloud Datacenters Based on the Multi-Agent System. An Investigation into the Design and Response Time Analysis of a Multi-Agent-based Virtual Machine (VM) Allocation/Placement Policy in Cloud Datacenters." Thesis, University of Bradford, 2017. http://hdl.handle.net/10454/16067.

Abstract:
Recent years have witnessed a surge in demand for infrastructure and services to cover the high demands of processing large volumes of data and applications, resulting in mega Cloud Datacenters. A datacenter is highly complex, and it is increasingly difficult to identify and allocate, efficiently and quickly, an appropriate host for a requested virtual machine (VM). Establishing good awareness of all the datacenter's resources enables allocation ("placement") policies to make the best decisions, reducing the time needed to allocate and create the VM(s) at the appropriate host(s). However, current placement ("allocation") algorithms and policies do not focus efficiently on awareness of the datacenter's resources and, moreover, are based on conventional static techniques, which adversely impact the allocation process. This thesis proposes a new Agent-based allocation/placement policy that employs features of the Multi-Agent system paradigm to obtain good awareness of Cloud Datacenter resources and provide efficient allocation decisions for requested VMs. Specifically, (a) the Multi-Agent concept is used as part of the placement policy, (b) a Contract Net Protocol is devised to establish good awareness, and (c) a verification process is developed to check VM specifications across all their dimensions during allocation. The results show a reduction in the response time of VM allocation and improved usage of occupied resources. The proposed Agent-based policy was implemented using the CloudSim toolkit and compared, through a series of typical numerical experiments, with the toolkit's default policy. The comparative study was carried out in terms of the duration of VM allocation and other aspects, such as the number of available VM types and the amount of occupied resources. Moreover, a two-stage comparative study is presented in this thesis. First, the proposed policy is compared with four state-of-the-art algorithms, namely the Random algorithm and three one-dimensional Bin-Packing algorithms. Second, the three Bin-Packing algorithms were enhanced with a two-dimensional verification structure and compared against the new algorithm of the Agent-based policy. This rigorous comparative study showed that, across the numerical experiments of all stages, the proposed Agent-based policy had superior performance in terms of allocation times. Finally, avenues for future work arising from this thesis are outlined.
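The Contract Net interaction can be reduced to a few lines (an illustrative sketch: the manager announces a VM request, host agents bid, and the tightest fit wins; the thesis's protocol adds messaging, awareness updates, and multi-dimensional verification):

```python
from dataclasses import dataclass

@dataclass
class HostAgent:
    name: str
    free_cpus: int
    free_ram_gb: int

    def bid(self, req_cpus: int, req_ram_gb: int):
        """Return a bid (leftover capacity after placement) or None to decline."""
        if self.free_cpus < req_cpus or self.free_ram_gb < req_ram_gb:
            return None
        return (self.free_cpus - req_cpus) + (self.free_ram_gb - req_ram_gb)

def contract_net_allocate(hosts, req_cpus, req_ram_gb):
    """Manager side: call for proposals, collect bids, award the tightest fit."""
    bids = [(h.bid(req_cpus, req_ram_gb), h) for h in hosts]
    bids = [(b, h) for b, h in bids if b is not None]
    if not bids:
        return None                               # no host can take the VM
    _, winner = min(bids, key=lambda bh: bh[0])   # best fit: smallest leftover
    winner.free_cpus -= req_cpus
    winner.free_ram_gb -= req_ram_gb
    return winner.name

hosts = [HostAgent("h1", 8, 32), HostAgent("h2", 4, 8), HostAgent("h3", 16, 64)]
print(contract_net_allocate(hosts, req_cpus=4, req_ram_gb=8))   # h2 (exact fit)
```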
17

Framner, Erik. "A Configuration User Interface for Multi-Cloud Storage Based on Secret Sharing : An Exploratory Design Study." Thesis, Karlstads universitet, Handelshögskolan (from 2013), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kau:diva-71354.

Abstract:
Storing personal information in a secure and reliable manner may be crucial for organizational as well as private users. Encryption protects the confidentiality of data against adversaries, but if the cryptographic key is lost, the information will not be obtainable by authorized individuals either. Redundancy may protect information against availability issues or data loss, but also comes with greater storage overhead and cost. Cloud storage serves as an attractive alternative to traditional storage, as one is relieved of maintenance responsibilities and does not have to invest in in-house IT resources. However, cloud adoption is commonly hindered by privacy concerns. Instead of relying on the security of a single cloud, this study investigates the applicability of a multi-cloud solution based on Secret Sharing, and identifies suitable options and guidelines for a configuration user interface (UI). Interviews were conducted with technically skilled people representing prospective users, followed by walkthroughs of a UI prototype. Although the solution would (theoretically) allow the use of less "trustworthy" clouds without compromising data confidentiality, the research results indicate that trust factors such as compliance with EU laws may still be a crucial prerequisite for users to adopt cloud services. Users may worry about cloud storage providers colluding, and the solution may not be perceived as adequately secure without the use of encryption. The configuration of the Secret Sharing parameters is difficult to comprehend even for technically skilled individuals, and sensible default values should be recommended to the user. (PRISMACLOUD)
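For readers unfamiliar with the primitive, here is a compact Shamir Secret Sharing sketch over a prime field (illustrative only; production implementations are hardened well beyond this, and real code should draw randomness from the `secrets` module). With n = 5 and k = 3, the data survives two unavailable providers, while any two colluding providers learn nothing:

```python
import random

P = 2**127 - 1   # a Mersenne prime, large enough for small integer secrets

def split(secret: int, n: int, k: int):
    """Split `secret` into n shares; any k of them reconstruct it."""
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    def f(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 over GF(P)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

shares = split(123456789, n=5, k=3)   # e.g. five cloud providers, threshold three
print(reconstruct(shares[:3]))        # 123456789, from any three shares
```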
18

Vítek, Daniel. "Cloud computing s ohledem na technologické aspekty a změny v infrastruktuře." Master's thesis, Vysoká škola ekonomická v Praze, 2010. http://www.nusl.cz/ntk/nusl-72548.

Abstract:
This thesis discusses the new way of delivering IT services over the Internet widely known as cloud computing. In its opening part, cloud computing is put into the historical context of the evolution of enterprise computing, and the dominant issues the IT department faces today are mentioned. Further, the paper deals with the several components that make up the architecture of cloud computing and reviews the benefits and drawbacks an enterprise can expect when adopting this new model. One of the primary aims of this thesis is to identify the impact of technology trends on cloud computing. The thesis brings together four major computing trends, namely virtualization, multi-tenant architecture, service-oriented architecture and grid computing. Another aim is to focus on two trends related to IT infrastructure that will lead to fundamental changes in the IT industry. The first of them is the emergence of extremely large-scale data centers in low-cost locations, which can serve a tremendous number of customers and achieve considerable economies of scale. The second trend this paper points out is the shift from multi-purpose all-in-one computers to a wide range of mobile devices dedicated to specific user needs. The last aim of this thesis is to clarify the economic impact of cloud computing in terms of costs and changes in business models. The thesis concludes by evaluating the current adoption and predicting the future trend of cloud computing.
19

Camacho, Rodriguez Jesus. "Efficient techniques for large-scale Web data management." Thesis, Paris 11, 2014. http://www.theses.fr/2014PA112229/document.

Abstract:
The recent development of commercial cloud computing environments has strongly impacted research and development in distributed software platforms. Cloud providers offer a distributed, shared-nothing infrastructure that may be used for data storage and processing. In parallel with the development of cloud platforms, programming models that seamlessly parallelize the execution of data-intensive tasks over large clusters of commodity machines have received significant attention, starting with the MapReduce model, very well known by now, and continuing with other novel and more expressive frameworks. As these models are increasingly used to express analytical-style data processing tasks, the need arises for higher-level languages that ease the burden of writing complex queries for these systems. This thesis investigates the efficient management of Web data on large-scale infrastructures. In particular, we study the performance and cost of exploiting cloud services to build Web data warehouses, and the parallelization and optimization of query languages tailored towards querying Web data declaratively. First, we present AMADA, an architecture for warehousing large-scale Web data in commercial cloud platforms. AMADA operates in a Software as a Service (SaaS) approach, allowing users to upload, store, and query large volumes of Web data. Since cloud users incur monetary costs directly connected to their consumption of resources, our focus is not only on query performance from an execution-time perspective, but also on the monetary costs associated with this processing. In particular, we study the applicability of several content indexing strategies, and show that they lead not only to reduced query evaluation time, but also, importantly, to reduced monetary costs associated with the exploitation of the cloud-based warehouse. Second, we consider the efficient parallelization of the execution of complex queries over XML documents, implemented within our system PAXQuery. We provide novel algorithms showing how to translate such queries into plans expressed in the PArallelization ConTracts (PACT) programming model. These plans are then optimized and executed in parallel by the Stratosphere system. We demonstrate the efficiency and scalability of our approach through experiments on hundreds of GB of XML data. Finally, we present a novel approach for identifying and reusing common subexpressions occurring in Pig Latin scripts. In particular, we lay the foundation of our reuse-based algorithms by formalizing the semantics of the Pig Latin query language with extended nested relational algebra for bags. Our algorithm, named PigReuse, operates on the algebraic representations of Pig Latin scripts, identifies subexpression merging opportunities, selects the best ones to execute based on a cost function, and merges other equivalent expressions to share their results. We bring several extensions to the algorithm to improve its performance. Our experimental results demonstrate the efficiency and effectiveness of our reuse-based algorithms and optimization strategies.
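The heart of a reuse pass like PigReuse can be illustrated by hashing canonical forms of operator subtrees to expose merge candidates (a schematic with a toy expression encoding, not the PigReuse algorithm itself):

```python
from collections import defaultdict

def canon(node) -> str:
    """Canonical string for an operator tree; commutative ops sort their children."""
    if isinstance(node, str):                 # a base relation
        return node
    op, children = node[0], node[1:]
    parts = [canon(c) for c in children]
    if op in {"join", "union"}:               # order-insensitive operators
        parts.sort()
    return f"{op}({','.join(parts)})"

def common_subexpressions(scripts):
    """Count how often each canonical subtree occurs across all scripts."""
    seen = defaultdict(int)
    def visit(node):
        seen[canon(node)] += 1
        if not isinstance(node, str):
            for child in node[1:]:
                visit(child)
    for s in scripts:
        visit(s)
    return {k: v for k, v in seen.items() if v > 1}

# Two hypothetical scripts sharing a filtered join (children given in swapped order)
q1 = ("foreach", ("join", ("filter", "A"), "B"))
q2 = ("group", ("join", "B", ("filter", "A")))
print(common_subexpressions([q1, q2]))
# {'join(B,filter(A))': 2, 'filter(A)': 2, 'A': 2, 'B': 2}
```

A cost model would then decide which of these shared subtrees are worth materializing once and reusing, which is where the thesis's contribution lies.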
20

Arres, Billel. "Optimisation des performances dans les entrepôts distribués avec Mapreduce : traitement des problèmes de partionnement et de distribution des données." Thesis, Lyon, 2016. http://www.theses.fr/2016LYSE2012.

Abstract:
In this manuscript, we address the problems of data partitioning and distribution for large-scale data warehouses distributed with MapReduce. First, we address the problem of data distribution. In this case, we propose a strategy to optimize data placement on distributed systems, based on the colocation principle. The objective is to optimize query performance through the definition of an intentional data distribution schema that reduces the amount of data transferred between nodes during processing, specifically during MapReduce's shuffle phase. Second, we propose a new approach to improve data partitioning and placement in distributed file systems, especially Hadoop-based systems, Hadoop being the standard implementation of the MapReduce paradigm. The aim is to overcome the default data partitioning and placement policies, which do not take any relational data characteristics into account. Our proposal proceeds in two steps: based on a query workload, it defines an efficient partitioning schema; after that, the system defines a data distribution schema that best meets users' needs by colocating data blocks on the same or the closest nodes. The objective in this case is to optimize query execution and parallel processing performance by improving data access. Our third proposal addresses the problem of workload dynamicity, since users' analytical needs evolve over time. In this case, we propose the use of multi-agent systems (MAS) as an extension of our data partitioning and placement approach. Through the autonomy and self-control that characterize MAS, we developed a platform that automatically defines new distribution schemas as new queries arrive in the system, and rebalances data according to the new schema. This relieves the system administrator of the burden of managing load balancing, besides improving query performance through careful data partitioning and placement policies. Finally, to validate our contributions, we conducted a set of experiments to evaluate the different approaches proposed in this manuscript. We study the impact of intentional data partitioning and distribution on the data warehouse loading phase, the execution of analytical queries, OLAP cube construction, and load balancing. We also define a cost model that allowed us to evaluate and validate the partitioning strategy proposed in this work.
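The colocation principle can be sketched by hash-partitioning two tables on their join key so that matching blocks land on the same node, making the join shuffle-free (illustrative, not the thesis's placement algorithm):

```python
def colocate(table_a, table_b, key, n_nodes: int):
    """Assign rows of both tables to nodes by hashing the shared join key."""
    nodes = [{"A": [], "B": []} for _ in range(n_nodes)]
    for row in table_a:
        nodes[hash(row[key]) % n_nodes]["A"].append(row)
    for row in table_b:
        nodes[hash(row[key]) % n_nodes]["B"].append(row)
    return nodes

def local_join(nodes, key):
    """Each node joins only its own blocks: no cross-node data transfer."""
    out = []
    for node in nodes:
        index = {}
        for a in node["A"]:
            index.setdefault(a[key], []).append(a)
        for b in node["B"]:
            for a in index.get(b[key], []):
                out.append({**a, **b})
    return out

facts = [{"cust": c, "amount": 10 * c} for c in range(6)]
dims = [{"cust": c, "region": "eu" if c % 2 else "us"} for c in range(6)]
print(len(local_join(colocate(facts, dims, "cust", n_nodes=3), "cust")))   # 6
```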
APA, Harvard, Vancouver, ISO, and other styles
21

Moreira, Leonardo Oliveira. "Abordagem para Qualidade de Serviço em Banco de Dados Multi-Inquilinos em Nuvem." Universidade Federal do Ceará, 2014. http://www.teses.ufc.br/tde_busca/arquivo.php?codArquivo=12688.

Full text
Abstract:
Fundação de Amparo à Pesquisa do Estado do Ceará. Cloud computing is a well-established paradigm of computing resource usage, whereby hardware infrastructure, software and platforms for the development of new applications are offered as services available remotely and globally. Cloud computing users give up their own infrastructure to obtain it through the services offered by cloud providers, to which they delegate aspects of Quality of Service (QoS), assuming costs proportional to the amount of resources they use under a pay-per-use model. These QoS guarantees are established between the service provider and the user, and are expressed through Service Level Agreements (SLAs). Such an agreement consists of contracts that specify a level of quality that must be met, and penalties in case of failure. The majority of cloud applications are data-driven, and thus Database Management Systems (DBMSs) are potential candidates for cloud deployment. A cloud DBMS should handle a wide range of applications, or tenants. Multi-tenant models have been used to consolidate multiple tenants within a single DBMS, favoring the efficient sharing of resources and managing a large number of tenants with irregular workload patterns. On the other hand, cloud providers must reduce operational costs while keeping quality levels as agreed. For many applications, most of the time spent processing requests is related to the DBMS runtime, so it becomes important to apply a quality model to the DBMS to obtain good performance. Dynamic provisioning techniques are geared towards treating irregular workloads so that SLA violations are avoided; a strategy is therefore needed to adjust the cloud as soon as behavior that may violate the SLA of a given tenant (database) is predicted. Allocation techniques are applied in order to exploit the resources of the environment before resorting to provisioning: based on monitoring systems and optimization models, they decide the best place to assign a given tenant to. In order to transfer a tenant efficiently, with minimal service interruption, Live Migration techniques are adopted. It is believed that the combination of these three techniques can contribute to the development of a robust QoS solution for cloud databases that minimizes SLA violations. Faced with these challenges, this thesis proposes an approach, called PMDB, to improve DBMS QoS in a multi-tenant cloud. The approach aims to reduce the number of SLA violations and take advantage of the available resources using techniques that perform workload prediction, allocation and migration of tenants when resources of greater capacity are needed. An architecture was proposed and a prototype implementing these techniques was developed, together with monitoring and QoS strategies oriented to cloud database applications. Some performance-oriented experiments were then specified to show the effectiveness of the approach.
APA, Harvard, Vancouver, ISO, and other styles
22

Thompson-Arjona, William G. "Curricular Optimization: Solving for the Optimal Student Success Pathway." UKnowledge, 2019. https://uknowledge.uky.edu/ece_etds/139.

Full text
Abstract:
Considering the significant investment in higher education made by students and their families, graduating in a timely manner is of the utmost importance. Delay attributed to dropping out or retaking a course adds cost and negatively affects a student's academic progression. Considering this, it becomes paramount for institutions to focus on student success in relation to term scheduling. Often overlooked, the complexity of a course schedule may be one of the most important factors in whether or not a student successfully completes his or her degree. More often than not, students entering an institution as first-time full-time (FSFT) freshmen follow the advised and published schedule given by administrators. Providing the optimal schedule that gives the student the highest probability of success is critical. In an effort to create this optimal schedule, this thesis introduces a novel optimization algorithm whose objective is to separate courses which, when taken together, hurt students' pass rates. Inversely, we combine synergistic relationships that improve a student's probability of success when the courses are taken in the same semester. Using actual student data at the University of Kentucky, we categorically find these positive and negative combinations by analyzing recorded pass rates. Using the Julia language on top of the Gurobi solver, we solve for the optimal degree plan of a student in the electrical engineering program using linear and non-linear multi-objective optimization. A user interface is created for administrators to optimize their curricula at main.optimizeplans.com.
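As a rough illustration of the underlying idea (not the thesis's actual Julia/Gurobi model), the following Python sketch assigns courses to semesters so that pairs with a negative observed pass-rate interaction are separated and synergistic pairs are co-scheduled; the score table is invented for the example:

```python
from itertools import product

# Hypothetical pairwise scores learned from pass-rate data:
# positive = synergistic when taken together, negative = harmful.
score = {("calc1", "physics1"): +0.4,
         ("calc1", "chem1"): -0.3,
         ("physics1", "chem1"): -0.2,
         ("chem1", "intro_eng"): +0.1}

courses = ["calc1", "physics1", "chem1", "intro_eng"]

def term_score(assign):
    """Sum scores over course pairs placed in the same term."""
    total = 0.0
    for (a, b), s in score.items():
        if assign[a] == assign[b]:
            total += s
    return total

# Exhaustively search assignments of 4 courses to 2 terms,
# requiring exactly 2 courses per term.
best = max((dict(zip(courses, terms)) for terms in product((0, 1), repeat=4)
            if sum(terms) == 2),
           key=term_score)
print(best, term_score(best))
```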
APA, Harvard, Vancouver, ISO, and other styles
23

Velthuis, Paul. "New authentication mechanism using certificates for big data analytic tools." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-215694.

Full text
Abstract:
Companies analyse large amounts of sensitive data on clusters of machines, using a framework such as Apache Hadoop to handle inter-process communication, and big data analytic tools such as Apache Spark and Apache Flink to analyse the growing amounts of data. Big data analytic tools are mainly tested on performance and reliability; security and authentication have not been sufficiently considered and lag behind. The goal of this research is to improve the authentication and security of data analytic tools. Currently, the aforementioned big data analytic tools use Kerberos for authentication. Kerberos has difficulties in providing multi-factor authentication, and attacks on Kerberos can abuse the authentication. To improve the authentication, an analysis of the authentication in Hadoop and the data analytic tools is performed. The research describes their characteristics to gain an overview of the security of Hadoop and the data analytic tools. One characteristic is the usage of transport layer security (TLS) for securing data transportation. TLS usually establishes connections with certificates, and recently, certificates with a short time to live can be handed out automatically. This thesis develops a new authentication mechanism using certificates for data analytic tools on clusters of machines, providing advantages over Kerberos. To evaluate the possibility of replacing Kerberos, the mechanism is implemented in Spark. As a result, the new implementation provides several improvements. The certificates used for authentication are valid only for a short time and are thus less vulnerable to abuse. Further, the authentication mechanism addresses new requirements coming from businesses, such as providing multi-factor authentication and scalability. In this research a new authentication mechanism is developed, implemented and evaluated, giving better data protection through improved authentication.
APA, Harvard, Vancouver, ISO, and other styles
24

Bondiombouy, Carlyna. "Query Processing in Multistore Systems." Thesis, Montpellier, 2017. http://www.theses.fr/2017MONTS056/document.

Full text
Abstract:
Cloud computing is having a major impact on data management, with a proliferation of new, scalable data management solutions such as distributed file and object storage, NoSQL databases and big data processing frameworks. This also leads to a wide diversification of DBMS interfaces and the loss of a common programming paradigm, making it very hard for a user to integrate data sitting in specialized data stores, e.g. relational, document and graph data stores. In this thesis, we address the problem of query processing with multiple cloud data stores, where the data stores have different models, languages and APIs. This thesis has been prepared in the context of the CoherentPaaS European project and, in particular, the CloudMdsQL multistore system. CloudMdsQL is a functional query language able to exploit the full power of local data stores, by simply allowing some local data store native queries to be called as functions, and at the same time be optimized, e.g. by pushing down select predicates, using bind join, performing join ordering, or planning intermediate data shipping. In this thesis, we propose an extension of CloudMdsQL to take full advantage of the functionality of underlying data processing frameworks such as Spark, by allowing the ad-hoc usage of user-defined map/filter/reduce (MFR) operators in combination with traditional SQL statements. This allows performing joins between relational and HDFS big data. Our solution allows for optimization by enabling subquery rewriting, so that bind join can be used and filter conditions can be pushed down and applied by the data processing framework as early as possible. We validated our solution by implementing the MFR extension as part of the CloudMdsQL query engine. Based on this prototype, we provide an experimental validation of multistore query processing in a cluster to evaluate the impact of optimization on performance. More specifically, we explore the performance benefit of using bind join and select pushdown under different conditions. Overall, our performance evaluation illustrates the CloudMdsQL query engine's ability to optimize a query and choose the most efficient execution strategy.
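The bind join optimization mentioned above can be sketched independently of CloudMdsQL syntax. The following hedged Python fragment (the toy data stores and the inner_query_fn callback are invented stand-ins) shows the core idea: evaluate the outer side first, then push its distinct join keys into the inner store's native query so that only matching rows are shipped:

```python
def bind_join(outer_rows, key, inner_query_fn):
    """Bind join: evaluate the outer side first, then push its distinct
    join-key values down into the inner data store's native query, so the
    inner store only returns matching rows.

    inner_query_fn(keys) stands for a native query against the other store,
    e.g. "SELECT ... WHERE k IN (keys)".
    """
    keys = {row[key] for row in outer_rows}
    inner_rows = inner_query_fn(keys)           # single filtered native call
    by_key = {}
    for r in inner_rows:
        by_key.setdefault(r[key], []).append(r)
    return [dict(o, **i) for o in outer_rows for i in by_key.get(o[key], [])]

# Toy stand-ins for two data stores:
outer = [{"k": 1, "city": "Paris"}, {"k": 2, "city": "Lyon"}]
inner_store = [{"k": 1, "temp": 12}, {"k": 3, "temp": 9}]
result = bind_join(outer, "k", lambda ks: [r for r in inner_store if r["k"] in ks])
print(result)  # only k=1 matches
```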
APA, Harvard, Vancouver, ISO, and other styles
25

Athamnah, Malek. "ENABLING MULTI-PARTY COLLABORATIVE DATA ACCESS." Diss., Temple University Libraries, 2018. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/528695.

Full text
Abstract:
Computer and Information Science. Ph.D. Cloud computing has brought availability of services at unprecedented scales, but data accessibility considerations become more complex due to the involvement of multiple parties in providing the infrastructure. In this thesis, we discuss the problem of enabling cooperative data access in a multi-cloud environment where the data is owned and managed by multiple enterprises. We consider a multi-party collaboration scheme whereby a set of parties collectively decide accessibility to data from individual parties, using different data models such as relational databases and graph databases. In order to implement desired business services, parties need to share a selected portion of information with one another. We consider a model with a set of authorization rules over the joins of basic relations, defined by the cooperating parties; the accessible information is constrained by these rules. Specifically, the following critical issues were examined. We combine rule enforcement and query planning, and devise an algorithm which simultaneously checks the enforceability of each rule and generates the minimum-cost plan for its execution, using a cost metric, whenever enforcement is possible. We also consider other forms of limiting access to the shared data using safety properties and selection conditions, and propose algorithms for both forms to remove any conflicts or violations between the limited accesses and model queries. We use graph databases with our authorization rules and query planning model to conduct similarity search between tuples, representing relational database tuples as a graph with weighted edges, which enables queries involving "similarity" across the tuples; we propose an algorithm that exploits the correlations between attributes to create virtual attributes that capture much of the data variance and enhance the speed at which similarity search occurs. Finally, we propose a framework for defining test functionalities, their composition, and their access control, and discuss an algorithm to determine the realization of a given test via valid compositions of individual functionalities in a way that minimizes the number of parties involved. The research significance resides in solving real-world issues that arise in using cloud services for enterprises. After extensive evaluations, the results revealed that the collaborative data access model improves security during cooperative data processing; that systematically and efficiently resolving access-rule conflicts minimizes possible data leakage; and that a systematic approach to control-failure diagnosis reduces troubleshooting time, all of which improve availability and resiliency. The study contributes to knowledge, literature, and practice, and opens up space for further studies in various aspects of secure data cooperation in large-scale cyber and cyber-physical infrastructures.
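As a loose, simplified model of the authorization-rule idea (the rule structure and contents below are invented for illustration and do not reproduce the dissertation's formalism), a rule can be viewed as a set of attributes exposed over the join of a set of relations, and a query is checked against the rule set:

```python
# A rule authorizes a set of attributes over the join of a set of relations.
# This is a simplified model with invented rule contents for illustration.
rules = [
    {"relations": {"orders", "customers"}, "attrs": {"order_id", "city", "total"}},
    {"relations": {"orders"}, "attrs": {"order_id", "date"}},
]

def authorized(query_relations, query_attrs):
    """A query is allowed if some rule joins a superset of its relations
    and exposes all the attributes it asks for."""
    return any(query_relations <= r["relations"] and query_attrs <= r["attrs"]
               for r in rules)

print(authorized({"orders"}, {"order_id", "date"}))            # True
print(authorized({"orders", "customers"}, {"city", "date"}))   # False: 'date'
                                                               # not exposed on the join
```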
APA, Harvard, Vancouver, ISO, and other styles
26

Amir, Mohammad. "Semantically-enriched and semi-Autonomous collaboration framework for the Web of Things. Design, implementation and evaluation of a multi-party collaboration framework with semantic annotation and representation of sensors in the Web of Things and a case study on disaster management." Thesis, University of Bradford, 2015. http://hdl.handle.net/10454/14363.

Full text
Abstract:
This thesis proposes a collaboration framework for the Web of Things based on the concepts of Service-Oriented Architecture and integrated with semantic web technologies to offer new possibilities in terms of efficient asset management during operations requiring multi-actor collaboration. The motivation for the project comes from the rise in disasters, where effective cross-organisation collaboration can increase the efficiency of critical information dissemination. The organisational boundaries of participants, as well as their IT capability and trust issues, hinder the deployment of a multi-party collaboration framework, thereby preventing timely dissemination of critical data. In order to tackle some of these issues, this thesis proposes a new collaboration framework consisting of a resource-based data model, a resource-oriented access control mechanism and semantic technologies utilising the Semantic Sensor Network Ontology, which can be used simultaneously by multiple actors without impacting each other's networks and thus increase the efficiency of disaster management and relief operations. The generic design of the framework enables future extensions, thus enabling its exploitation across many application domains. The performance of the framework is evaluated in two areas: the capability of the access control mechanism to scale with an increasing number of devices, and the capability of the semantic annotation process to increase in efficiency as more information is provided. The results demonstrate that the proposed framework is fit for purpose.
APA, Harvard, Vancouver, ISO, and other styles
27

Kashyap, Pradyumna Krishna. "Project-based Multi-tenant Container Registry For Hopsworks." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-284561.

Full text
Abstract:
There has been substantial growth in the usage of data in the past decade; cloud technologies and big data platforms have gained popularity as they help in processing such data on a large scale. Hopsworks is one such managed platform for scale-out data science. It is an open-source platform for the development and operation of Machine Learning models, available on-premise and as a managed platform in the cloud. As most of these platforms provide data science environments that collate the required libraries to work with, Hopsworks provides users with Anaconda environments. Hopsworks provides multi-tenancy, ensuring a secure model to manage sensitive data in the shared platform. Most Hopsworks features are built around projects; each project includes an Anaconda environment that provides users with a number of libraries capable of processing data. Each project creation triggers the creation of a base Anaconda environment, and each added library updates this environment. For an on-premise application, as data science teams are diverse and work towards building repeatable and scalable models, it becomes increasingly important to manage these environments in a central location locally. The purpose of the thesis is to provide secure storage for these Anaconda environments. As Hopsworks uses a Kubernetes cluster to serve models, these environments can be containerized and stored in a secure container registry on the Kubernetes cluster. The provided solution also aims to extend the multi-tenancy feature of Hopsworks onto the hosted local storage. The implementation comprises two parts: the first is to host a compatible open-source container registry to store the container images on a local Kubernetes cluster, with fault tolerance and avoiding a single point of failure; the second is to leverage the multi-tenancy feature in Hopsworks by storing the images in the self-sufficient secure registry with project-level isolation.
APA, Harvard, Vancouver, ISO, and other styles
28

Khaleel, Ali. "Optimisation of a Hadoop cluster based on SDN in cloud computing for big data applications." Thesis, Brunel University, 2018. http://bura.brunel.ac.uk/handle/2438/17076.

Full text
Abstract:
Big data has received a great deal of attention from many sectors, including academia, industry and government. The Hadoop framework has emerged to support its storage and analysis using the MapReduce programming module. However, this framework is a complex system that has more than 150 parameters, some of which can exert a considerable effect on the performance of a Hadoop job. The optimal tuning of the Hadoop parameters is a difficult and time-consuming task. In this thesis, an optimisation approach is presented to improve the performance of the Hadoop framework by setting the values of the Hadoop parameters automatically. Specifically, genetic programming is used to construct a fitness function that represents the interrelations among the Hadoop parameters. Then, a genetic algorithm is employed to search for the optimum or near-optimum values of the Hadoop parameters. A Hadoop cluster was configured on two servers at Brunel University London to evaluate the performance of the proposed optimisation approach. The experimental results show that the performance of a Hadoop MapReduce job for 20 GB on the Word Count application is improved by 69.63% and 30.31% when compared to the default settings and the state of the art, respectively, whilst on the TeraSort application it is improved by 73.39% and 55.93%. For better optimisation, SDN is also employed to improve the performance of a Hadoop job. The experimental results show that the performance of a Hadoop job in an SDN network for 50 GB is improved by 32.8% when compared to a traditional network, whilst on the TeraSort application the improvement for 50 GB is on average 38.7%. An effective computing platform is also presented in this thesis to support solar irradiation data analytics. It is built on RHIPE to provide fast analysis and calculation for solar irradiation datasets. The performance of RHIPE is compared with the R language in terms of accuracy, scalability and speedup. The speedup of RHIPE is evaluated by Gustafson's Law, which is revised to enhance the performance of parallel computation on intensive irradiation data sets in a cluster computing environment like Hadoop. The performance of the proposed work is evaluated using a Hadoop cluster based on the Microsoft Azure cloud, and the experimental results show that RHIPE provides considerable improvements over the R language. Finally, an effective routing algorithm based on SDN to improve the performance of a Hadoop job in a large-scale cluster in a data centre network is presented. The proposed algorithm improves the performance of a Hadoop job during the shuffle phase by allocating efficient paths for each shuffling flow according to the network resource demand of each flow as well as their size and number; it also allocates alternative paths for each shuffling flow in the case of any link crash or failure. The algorithm is evaluated on two network topologies, namely fat-tree and leaf-spine, built with the EstiNet emulator software. The experimental results show that the proposed approach improves the performance of a Hadoop job in a data centre network.
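A minimal sketch of the GA search loop described above, in Python: the parameter names are real Hadoop configuration keys, but the value ranges are illustrative, and the fitness function is an arbitrary stand-in for the genetic-programming performance model built in the thesis:

```python
import random

# A few Hadoop parameters and candidate values (ranges are illustrative).
PARAM_SPACE = {
    "mapreduce.task.io.sort.mb": [50, 100, 200, 400],
    "mapreduce.task.io.sort.factor": [10, 50, 100],
    "mapreduce.reduce.shuffle.parallelcopies": [5, 10, 20],
}
NAMES = list(PARAM_SPACE)

def fitness(cfg):
    """Stand-in for the thesis's GP-built performance model:
    here, just an arbitrary smooth function of the settings."""
    return -(abs(cfg[0] - 200) / 200 + abs(cfg[1] - 50) / 50 + abs(cfg[2] - 10) / 10)

def random_cfg():
    return [random.choice(PARAM_SPACE[n]) for n in NAMES]

def evolve(pop_size=20, generations=30, mut_rate=0.2):
    pop = [random_cfg() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]            # elitist selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, len(NAMES))  # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < mut_rate:         # point mutation
                i = random.randrange(len(NAMES))
                child[i] = random.choice(PARAM_SPACE[NAMES[i]])
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
print(dict(zip(NAMES, best)))
```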
APA, Harvard, Vancouver, ISO, and other styles
29

Sousa, FlÃvio Rubens de Carvalho. "RepliC: ReplicaÃÃo ElÃstica de Banco de Dados Multi-Inquilino em Nuvem com Qualidade de ServiÃo." Universidade Federal do CearÃ, 2013. http://www.teses.ufc.br/tde_busca/arquivo.php?codArquivo=9121.

Full text
Abstract:
Economic factors are leading to the growth of infrastructures and facilities that provide computing as a service, known as Cloud Computing, where companies and individuals can rent computing and storage capacity instead of making the large capital investments required to build and install large-scale computing equipment. In the cloud, the service user has some guarantees, such as performance and availability. These quality of service (QoS) guarantees are defined between the service provider and the user and expressed through a service level agreement (SLA). This agreement consists of contracts that specify a quality level that must be met, and penalties in case of failure. Many companies depend on an SLA and expect cloud providers to offer SLAs based on performance characteristics. In general, however, providers base their SLAs only on the availability of the services offered. Database management systems (DBMSs) for cloud computing must handle a large number of applications, or tenants. Multi-tenant approaches have been used to host several tenants within a single DBMS, favoring the efficient sharing of resources and managing a large number of tenants with irregular workload patterns. On the other hand, cloud providers must reduce operational costs while guaranteeing quality. In this context, a key feature is database replication, which improves availability, performance and, consequently, quality of service. Data replication techniques have been used to improve availability, performance and scalability in various environments. However, most database replication strategies have focused on the scalability and consistency aspects of the system with a static number of replicas; aspects related to elasticity for multi-tenant databases have received little attention. These issues are important in cloud environments, since providers need to add replicas according to the workload to avoid SLA violations, need to remove replicas when the workload decreases, and need to consolidate tenants. To address this problem, this work presents RepliC, an approach for database replication in the cloud focused on quality of service, elasticity and efficient resource utilization through multi-tenant techniques. RepliC uses information from the DBMSs and the provider to provision resources dynamically. In order to evaluate RepliC, experiments measuring quality of service and elasticity are presented. The results of these experiments confirm that RepliC guarantees quality with a small number of SLA violations while using resources efficiently.
APA, Harvard, Vancouver, ISO, and other styles
30

Tang, Hsueh-Ling, and 唐雪玲. "Multi-cue Pedestrian detection from 3D point cloud data." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/78102839918306483517.

Full text
Abstract:
Master's thesis, National Taiwan University of Science and Technology, Department of Computer Science and Information Engineering, 105. Pedestrian detection is one of the key technologies of driver assistance systems. In order to prevent potential collisions, pedestrians should always be accurately identified, whether during the day or at night. Since visual images taken at night are not clear, this thesis proposes a method for recognizing pedestrians using a high-definition LIDAR without visual images. In order to handle the long-distance sparse-point problem, a novel solution is introduced to improve performance. The proposed method maps the three-dimensional point cloud to a two-dimensional plane by a distance-aware expansion approach; the corresponding 2D contour and its associated 2D features are then extracted. In addition to hand-crafted features, deep-learned features are also considered in this thesis. Based on multiple cues, the proposed method obtains significant performance boosts over state-of-the-art approaches, by 23% in terms of F1-measure.
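A rough sketch of a distance-aware projection, assuming a spherical range-image mapping and an expansion footprint that grows with range (the resolution, field of view and radii are invented; the thesis's actual expansion scheme may differ):

```python
import math

def project_with_expansion(points, h=64, w=512, base_radius=1):
    """Project 3D LIDAR points onto a 2D spherical range image.
    Distant points are 'expanded' over a larger pixel neighbourhood to
    compensate for their sparsity (a rough stand-in for a distance-aware
    expansion; resolution, vertical FOV of +/-15 degrees and radii are
    invented for the sketch).
    """
    img = [[0.0] * w for _ in range(h)]
    for x, y, z in points:
        d = math.sqrt(x * x + y * y + z * z)
        if d == 0:
            continue
        yaw = math.atan2(y, x)                       # horizontal angle
        pitch = math.asin(z / d)                     # vertical angle
        u = int((yaw / math.pi + 1) / 2 * (w - 1))
        v = int((1 - (pitch + math.pi / 12) / (math.pi / 6)) * (h - 1))
        r = base_radius + int(d // 10)               # bigger footprint when far
        for dv in range(-r, r + 1):
            for du in range(-r, r + 1):
                vv, uu = v + dv, (u + du) % w
                if 0 <= vv < h:
                    img[vv][uu] = max(img[vv][uu], d)
    return img

img = project_with_expansion([(5.0, 1.0, 0.2), (40.0, -3.0, 1.0)])
print(sum(1 for row in img for p in row if p > 0), "pixels filled")
```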
APA, Harvard, Vancouver, ISO, and other styles
31

Liu, Hsing Yen, and 劉信彥. "Implementation of Data Query Across Multi-Tenants in Cloud." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/46412659743377792269.

Full text
Abstract:
Master's thesis, Tunghai University, Department of Computer Science and Information Engineering, 102. SaaS cloud computing allows thousands of users to operate without interfering with one another. From the software supplier's point of view, since every user shares the same package of hardware and software, the extra expense of providing each user a separate instance can be avoided. With great potential in the software market, SaaS is foreseen to become the trend of the IT business. Multi-tenant technology increases resource utilization and reduces operational cost by sharing software and hardware resources across a large number of users. However, porting software to the cloud requires overcoming software limitations as well as environmental factors. To quickly convert traditional software to a multi-tenant structure and solve the data isolation issues, this research allows developers to program against a traditional relational database while also supporting queries across multiple tenants. Based on the Chunk Table data structure and an SQL statement conversion mechanism, traditional relational SQL statements are automatically converted into SQL statements that conform to the Chunk Table layout. We implemented cross-tenant queries: real-time queries across multiple tenants over Chunk Tables, as well as non-real-time cross-tenant queries using materialized views.
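A minimal sketch of the Chunk Table rewriting idea (the physical layout, catalog and generated SQL below are invented following the generic pattern the abstract describes, not the thesis's exact schema):

```python
# Minimal sketch of the Chunk Table idea. All tenants share one physical table:
#   chunk_table(tenant_id, logical_table, row_id, chunk_id, str1, num1)
# and a metadata catalog maps each logical column to a chunk slot.
CATALOG = {
    ("invoice", "customer"): ("chunk_id = 0", "str1"),
    ("invoice", "amount"):   ("chunk_id = 0", "num1"),
    ("invoice", "city"):     ("chunk_id = 1", "str1"),
}

def rewrite(tenant_id, logical_table, columns):
    """Rewrite SELECT columns FROM logical_table (for one tenant) into a
    self-join of the shared chunk table, one alias per referenced chunk."""
    joins, select = {}, []
    for col in columns:
        chunk_pred, slot = CATALOG[(logical_table, col)]
        alias = joins.setdefault(chunk_pred, f"c{len(joins)}")
        select.append(f"{alias}.{slot} AS {col}")
    aliases = list(joins.values())
    sql = f"SELECT {', '.join(select)}\nFROM " + ", ".join(
        f"chunk_table {a}" for a in aliases)
    preds = [f"{a}.tenant_id = {tenant_id} AND {a}.logical_table = '{logical_table}'"
             f" AND {a}.{p}" for p, a in joins.items()]
    preds += [f"{aliases[0]}.row_id = {a}.row_id" for a in aliases[1:]]
    return sql + "\nWHERE " + "\n  AND ".join(preds)

print(rewrite(42, "invoice", ["customer", "amount", "city"]))
```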
APA, Harvard, Vancouver, ISO, and other styles
32

Hsieh, Fei-Ju, and 謝斐如. "Semantics-based Multi-Keyword Search over Encrypted Cloud Data." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/q58x9g.

Full text
Abstract:
Master's thesis, National Taiwan University of Science and Technology, Department of Computer Science and Information Engineering, 105. Cloud storage has gained popularity in recent years. With the increasing quantity of data outsourced to cloud storage, keyword search over encrypted cloud data with privacy preservation has become an important topic. The majority of techniques in the literature only provide exact single or multiple keyword search, in which the keywords have to exactly match those in a pre-defined dictionary. However, restricting users' keywords to the pre-defined dictionary is impractical for real-world applications, and existing fuzzy keyword search schemes only focus on dealing with spelling mistakes; the flexibility of keywords used in the search is not considered. This thesis addresses the problem of semantic multi-keyword search over encrypted cloud data. Users can use keywords beyond the pre-defined dictionary of the dataset, with the flexibility of their own choice. The similarity of the given keywords with the search index of each document is then calculated, and an adequate set of documents is selected as the search result based on this similarity. In addition, privacy is preserved while the search is executed by the third-party service provider. Experiments are conducted using a real-world dataset of papers. The experimental analyses show that the proposed scheme can perform semantic multi-keyword search effectively and efficiently.
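The keyword-to-index similarity scoring can be sketched as follows; the similarity table is invented, and a real system would derive similarities from a thesaurus or an embedding model and evaluate them over encrypted indexes rather than plaintext:

```python
# Toy semantic keyword scoring with an invented similarity table.
SIM = {("car", "automobile"): 0.9, ("car", "vehicle"): 0.7, ("cloud", "cloud"): 1.0}

def sim(a, b):
    return 1.0 if a == b else max(SIM.get((a, b), 0.0), SIM.get((b, a), 0.0))

def score(query_keywords, doc_index):
    """Each query keyword contributes its best match in the document index,
    so keywords outside the pre-defined dictionary can still score."""
    return sum(max(sim(q, k) for k in doc_index) for q in query_keywords)

docs = {"d1": ["automobile", "insurance"], "d2": ["cloud", "storage"]}
query = ["car", "cloud"]
ranked = sorted(docs, key=lambda d: score(query, docs[d]), reverse=True)
print([(d, round(score(query, docs[d]), 2)) for d in ranked])
```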
APA, Harvard, Vancouver, ISO, and other styles
33

SHIH, WAN-NI, and 施宛妮. "Efficient Multi-keyword Approximate Search on Encrypted Cloud Data." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/yezd9k.

Full text
Abstract:
Master's thesis, National Taiwan University of Science and Technology, Department of Computer Science and Information Engineering, 105. With the growing popularity of cloud computing, data owners outsource their data to the cloud for convenient management. Despite the advantages of cloud storage, data security demands close attention: before outsourcing sensitive data, data owners should encrypt it to meet privacy requirements. However, encryption makes keyword-based search over the data difficult. Some existing techniques provide keyword search over encrypted data, but these strategies suffer from inefficient searchable vectors: they maintain a large number of words as the elements used to construct the index for each document, which causes the amount of computation to increase sharply. In this thesis, we propose a novel keyword search scheme over encrypted data that reduces the size of the index vectors. First, we analyze the data through latent semantic analysis to discover the associations between different words. Then, we select the principal dimensions that retain the important information of the dataset. After transforming the data from the high-dimensional space to a low-dimensional space, we provide appropriate results for the user's request. Our proposed scheme supports approximate keyword search through up-front data analysis rather than maintaining a large number of words. Experiments on a real-world dataset show that our scheme not only provides flexible queries for users, but also effectively reduces the amount of computation.
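A small sketch of the latent semantic analysis step using a truncated SVD, assuming a toy term-document matrix; this shows only the dimensionality reduction and ranking, not the encryption layer:

```python
import numpy as np

# Term-document matrix (rows = words, columns = documents); values invented.
A = np.array([[2., 0., 1., 0.],
              [1., 1., 0., 0.],
              [0., 2., 0., 1.],
              [0., 0., 1., 2.]])

k = 2                                   # number of latent dimensions kept
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Uk, sk = U[:, :k], s[:k]

def to_latent(term_vector):
    """Fold a query's term vector into the k-dimensional latent space."""
    return term_vector @ Uk / sk

doc_latent = Vt[:k].T                   # documents already live in that space
query = np.array([1., 1., 0., 0.])     # query mentioning words 0 and 1
q = to_latent(query)

# Rank documents by cosine similarity in the reduced space.
cos = doc_latent @ q / (np.linalg.norm(doc_latent, axis=1) * np.linalg.norm(q))
print(np.argsort(-cos))
```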
APA, Harvard, Vancouver, ISO, and other styles
34

Li, Yen-Yi, and 李彥儀. "Multi-response Optimization for Cloud Computing Data Center Cooling System." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/80153498792909874658.

Full text
Abstract:
Master's thesis, National Chiao Tung University, Department of Industrial Engineering and Management, 100. Cloud computing is an emerging industry and has been viewed in recent years as a new business model for IT services. Cloud computing must rely on robust and powerful data centers to fulfill the needs of customers. A data center must keep running in order to maintain highly efficient computing and storage services, and this continuous operation can cause computers to crash due to overheating. Therefore, the heat dissipation problem is a serious challenge for data center supervisors. Heat dissipation is usually associated with power consumption, water consumption and carbon emissions, and can therefore be considered a multi-response problem. Previous studies related to heat dissipation are based on engineering methods to improve the cooling system; statistical methods are rarely utilized to solve heat dissipation problems. Therefore, the main objective of this study is to develop a multi-response optimization algorithm using Design of Experiments (DOE), Weighted Principal Component Analysis (WPCA) and the Response Surface Method (RSM) to find the optimal parameter settings of the cooling system for a cloud computing data center. Finally, a simulated case is used to demonstrate the effectiveness of the proposed procedure.
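A hedged sketch of the WPCA step, combining several standardized responses into a single index weighted by explained variance (the response values are simulated; in practice the responses would first be oriented so that one direction is uniformly better, e.g. via signal-to-noise ratios):

```python
import numpy as np

# Simulated responses per experimental run: power (kW), water (t), CO2 (kg).
# Smaller is better for all three; the numbers are illustrative only.
Y = np.array([[120., 3.2, 90.],
              [100., 2.9, 85.],
              [140., 3.8, 99.],
              [110., 3.0, 88.]])

Z = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)   # standardize responses
cov = np.cov(Z, rowvar=False)
eigval, eigvec = np.linalg.eigh(cov)               # ascending eigenvalues
order = np.argsort(eigval)[::-1]
eigval, eigvec = eigval[order], eigvec[:, order]

weights = eigval / eigval.sum()                    # explained-variance weights
scores = Z @ eigvec                                # principal component scores
mpi = scores @ weights                             # multi-response index (WPCA)

print("best run:", int(np.argmin(mpi)))            # lower = better here
```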
APA, Harvard, Vancouver, ISO, and other styles
35

Hsieh, Cheng-Han, and 謝承翰. "Data Placement Optimization of Erasure Code-based Multi-Cloud Storage." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/h34g6z.

Full text
Abstract:
Master's thesis, National Tsing Hua University, Department of Computer Science, 105. Cloud storage systems have become popular recently due to the ever-growing demand for storage space. Cloud providers offer large but cheap storage services; people and companies can use them without paying for hardware or electricity, and companies can build their own storage services on top of them. For cloud storage, erasure codes can be used to improve data availability and have the potential to reduce download time. An erasure code encodes files into chunks that are placed in different storage regions for higher availability. Because these chunks are smaller than the original file, download time can be reduced by parallel downloading, and placing chunks in different regions guards against regional failures. However, each region has different request costs, storage costs, latency and bandwidth, and user location can strongly influence download latency and access cost. Given these multiple issues, the key question is how to choose candidate regions for chunks so that all requirements are fulfilled. In the past, most theses focused on specific features, so their models are not realistic enough; many aspects need to be taken into consideration to come closer to the real world. In this thesis, we propose a method using erasure codes and linear programming to account for multiple requirements at the same time and find the best placement strategy. The experiments show that our work can save up to 66% in cost and achieve up to 50% performance improvement.
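The thesis formulates placement as a linear program; the brute-force sketch below (with invented per-region costs and availabilities) captures the same decision for small instances: choose n regions for an (n, k) erasure code that minimize cost subject to an availability target:

```python
from itertools import combinations

# Candidate regions with per-chunk monthly storage cost and availability
# (all numbers invented for the sketch).
REGIONS = {"us-e": (0.020, 0.99), "us-w": (0.022, 0.99),
           "eu":   (0.025, 0.995), "asia": (0.028, 0.99)}

def availability(chosen, k):
    """P(at least k of n chunks reachable), regions failing independently."""
    n = len(chosen)
    total = 0.0
    for up in range(k, n + 1):
        for alive in combinations(chosen, up):
            p = 1.0
            for r in chosen:
                a = REGIONS[r][1]
                p *= a if r in alive else (1 - a)
            total += p
    return total

def best_placement(n, k, target):
    """Pick n regions for an (n, k) erasure code: cheapest set whose
    probability of k surviving chunks meets the availability target."""
    feasible = [(sum(REGIONS[r][0] for r in c), c)
                for c in combinations(REGIONS, n)
                if availability(c, k) >= target]
    return min(feasible) if feasible else None

print(best_placement(n=3, k=2, target=0.9997))
```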
APA, Harvard, Vancouver, ISO, and other styles
36

Lee, Shu-yi, and 李姝儀. "Extracting Corner Feature Points for Registration of Multi-Station Point Cloud Data." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/31993221990438481880.

Full text
Abstract:
Master's thesis, National Cheng Kung University, Department of Surveying Engineering, 93. A terrestrial laser scanner can rapidly acquire accurate and dense 3D point clouds covering the surfaces of scanned objects such as buildings. The point clouds provide the detailed data necessary for accurate building modeling. In order to acquire complete data points on a scanned building, the scanning operations must be performed at multiple stations. Each station has its own coordinate system representing the 3D position of each laser point. Therefore, all coordinate systems of the different scanning stations must be transformed into a common system to register laser points acquired at different stations. This thesis proposes a semi-automatic method for the registration of terrestrial laser point sets acquired at different stations. First, a point cloud on a local plane is selected manually, and a mathematical plane is fitted to it in a least-squares manner. Three local planes on an object corner are determined in this way, and their intersection point is computed by solving the three plane equations. Such points are used as tie points for transforming the different laser coordinate systems into a common system. The transformation degrades when tie points are inaccurate, insufficient, or poorly distributed; this thesis suggests applying suitable "virtual corner points" to solve this problem. The test results show that suitable virtual corner points can indeed improve the geometric conditioning of the transformation and thus raise the accuracy of the corresponding transformation parameters, with improvement rates of 36% to 71%. On the other hand, the ratio of the RMSD value to the scanning distance S becomes a constant of 0.00015 for S > 50 m, where RMSD denotes the root mean square of the perpendicular distances from the laser points to their corresponding plane.
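The plane-fitting and corner-intersection steps translate directly into a short numpy sketch (synthetic noisy planes stand in for real scans):

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through a point cloud patch: returns (n, d)
    with unit normal n and offset d such that n . x = d."""
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    # The normal is the singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(pts - centroid)
    n = vt[-1]
    return n, n @ centroid

def corner_from_planes(planes):
    """Intersect three planes n_i . x = d_i by solving the 3x3 system."""
    N = np.array([p[0] for p in planes])
    d = np.array([p[1] for p in planes])
    return np.linalg.solve(N, d)

# Noisy samples from the planes x=1, y=2, z=3: their corner is (1, 2, 3).
rng = np.random.default_rng(0)
def noisy(base, axis):
    pts = rng.uniform(0, 5, size=(50, 3)); pts[:, axis] = base
    return pts + rng.normal(0, 0.01, size=(50, 3))

planes = [fit_plane(noisy(1, 0)), fit_plane(noisy(2, 1)), fit_plane(noisy(3, 2))]
print(corner_from_planes(planes))   # ~ [1. 2. 3.]
```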
APA, Harvard, Vancouver, ISO, and other styles
37

Liu, Hsuan-Lin, and 劉宣麟. "A Multi-node Data Mining System with Cloud Technology -Using Decision Tree." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/4psz37.

Full text
Abstract:
Master's thesis, National Formosa University, Institute of Information Management, 102. With the improvement of information technology, data mining can be used to analyze various kinds of data. The parameters of a mining method strongly affect the quality of its results, so researchers must constantly spend large amounts of computational time to find the optimal parameter set. However, current commercial mining tools are unable to handle multiple data models at one time, and processing a mining model over a large data set takes considerable time. This study proposes a new architecture using the open-source statistical language R as its base, choosing the decision tree model as the evaluation method. C# is used to design the user interface as well as a work server and the R-language script programs. Applying cloud service concepts, we develop a multi-node processing architecture in which the proposed mining process coordinates all available hosts to improve solving performance. This system saves computation time while trying to find the best parameter combination for each model. Finally, we provide system limitation tests (data size, RAM usage) compared with some commercial mining software, and evaluate the feasibility of this architecture.
APA, Harvard, Vancouver, ISO, and other styles
38

Jheng, Jhu-Jyun, and 鄭竹君. "Multi-Objective Optimization Using Genetic Algorithm for Resource Prediction in Cloud Data Center." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/43422933457376963821.

Full text
Abstract:
Master's thesis, National Ilan University, Department of Electronic Engineering, 102. In recent years, resource demands in cloud environments have increased steadily. In order to allocate resources effectively for optimal utilization, the workload prediction of virtual machines (VMs) is a vital issue. Workload prediction in cloud environments faces various problems, such as workload variability, prediction accuracy and real-time prediction. Resource prediction usually goes hand in hand with resource allocation; however, unnecessary VM migrations lead to redundant resource consumption. We observe that resource allocation in a data center usually executes periodically. In order to achieve real-time prediction, not only the execution time but also the finish time should be considered: resource prediction and allocation should be completed before the next time slot. In this thesis, we formulate resource prediction in a cloud data center as a Multi-Objective Optimization (MOO) problem and use a genetic algorithm (GA) as our prediction tool. The VM allocation status in the previous time slot is regarded as history data, and we exploit the evolutionary nature of the GA to predict the resources needed in the next time slot. Moreover, we propose a placement algorithm to allocate resources and VMs before the next time slot arrives. The simulation results show that the proposed prediction scheme achieves higher resource utilization, lower energy consumption, and multi-objective resource optimization.
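The placement step can be sketched with a simple first-fit decreasing heuristic (a stand-in chosen for illustration; the thesis's own placement algorithm may differ):

```python
def place_vms(predicted_demand, host_capacity):
    """First-fit decreasing placement of VMs onto hosts: consolidating
    predicted demand onto as few hosts as possible reduces energy use.

    predicted_demand: {vm_name: predicted CPU share for the next time slot}
    """
    placement, free = {}, []                 # free[i] = remaining capacity
    for vm, demand in sorted(predicted_demand.items(),
                             key=lambda kv: -kv[1]):
        for i, cap in enumerate(free):
            if demand <= cap:                # first host that still fits
                free[i] -= demand
                placement[vm] = i
                break
        else:                                # open a new host
            free.append(host_capacity - demand)
            placement[vm] = len(free) - 1
    return placement, len(free)

demand = {"vm1": 0.6, "vm2": 0.5, "vm3": 0.3, "vm4": 0.4}
print(place_vms(demand, host_capacity=1.0))  # 2 hosts instead of 4
```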
APA, Harvard, Vancouver, ISO, and other styles
39

Jayaraman, P. P., C. Perera, D. Georgakopoulos, S. Dustdar, Dhaval Thakker, and R. Ranjan. "Analytics-as-a-Service in a Multi-Cloud Environment through Semantically-enabled Hierarchical Data Processing." 2016. http://hdl.handle.net/10454/8523.

Full text
Abstract:
A large number of cloud middleware platforms and tools are deployed to support a variety of Internet of Things (IoT) data analytics tasks. It is common practice that such cloud platforms are only used by their owners to achieve their primary and predefined objectives, where raw and processed data are consumed only by them. However, allowing third parties to access processed data to achieve their own objectives significantly increases integration and cooperation, and can also lead to innovative uses of the data. Multi-cloud, privacy-aware environments facilitate such data access, allowing different parties to share processed data to reduce computation resource consumption collectively. However, there are interoperability issues in such environments that involve heterogeneous data and analytics-as-a-service providers. There is a lack of both architectural blueprints that can support such diverse, multi-cloud environments and corresponding empirical studies that show the feasibility of such architectures. In this paper, we outline an innovative hierarchical data processing architecture that utilises semantics at all levels of the IoT stack in multi-cloud environments. We demonstrate the feasibility of such an architecture by building a system based on it, using OpenIoT as middleware and Google Cloud and Microsoft Azure as cloud environments. The evaluation shows that the system is scalable and has no significant limitations or overheads.
APA, Harvard, Vancouver, ISO, and other styles
40

Yaish, HM. "A multi-tenant database framework for software and cloud computing applications." Thesis, 2014. http://hdl.handle.net/10453/30379.

Full text
Abstract:
University of Technology, Sydney. Faculty of Engineering and Information Technology. Cloud computing is a new computing paradigm that transforms access to computing resources from internal data centres to external service providers. This approach is rapidly becoming a standard for offering cost-effective and elastic computing services used over the internet. Software as a Service (SaaS) is one of the cloud computing service models; it exploits economies of scale for SaaS service providers by offering the same software and computing environment to multiple tenants. This contemporary multi-tenant service requires a multi-tenant database design that can accommodate data for multiple tenants in one single database schema. Due to database resource sharing in this service, the multi-tenant schema should be highly secure, optimized, configurable, and extendable at runtime to fulfil the application requirements of different tenants. However, traditional Relational Database Management Systems (RDBMS) do not support such multi-tenant database schema capabilities, and it is a significant challenge to enable an RDBMS to support them. One solution is to use an intermediate software layer that mediates between multi-tenant applications and the RDBMS, converting multi-tenant queries into regular database queries and executing them in the RDBMS. Developing such a multi-tenant software layer to manage and access tenants' data is a hard and complex problem with significant complexities that involve a longer development lifecycle. There are two main contributions in this thesis. First, a proposal for a novel multi-tenant schema technique called Elastic Extension Tables (EET). Second, a proposal for a multi-tenant database framework prototype to implement the EET schema in an RDBMS. This approach can be used to develop a software layer that mediates between software applications and an RDBMS, aiming to facilitate the development of software applications, and of multi-tenant SaaS and Big Data applications, for both cloud service providers and their tenants. Extensive experiments were conducted to evaluate the feasibility and effectiveness of the EET multi-tenant database schema by comparing it with Universal Table Schema Mapping (UTSM), which is commercially used. The significant performance improvements obtained using EET compared to UTSM make the EET schema a good candidate for implementing multi-tenant databases and multi-tenant applications. Furthermore, a prototype of the EET framework was developed, and several experiments were performed to verify the practicability and effectiveness of a framework based on the EET multi-tenant database schema. The results indicate that the EET framework is suitable for the development of software applications in general, and multi-tenant SaaS and Big Data applications in particular.
APA, Harvard, Vancouver, ISO, and other styles
41

Libório, João Paulo de Oliveira. "Privacy-Enhanced Dependable and Searchable Storage in a Cloud-of-Clouds." Master's thesis, 2016. http://hdl.handle.net/10362/20619.

Full text
Abstract:
In this dissertation we propose a solution for a trustable and privacy-enhanced storage architecture based on a multi-cloud approach. The solution provides the necessary support for multi-modal online search operations on data that is always kept encrypted in the cloud services used. We implemented a system prototype and conducted an experimental evaluation. Our results show that the proposal offers security and privacy guarantees and provides efficient information retrieval capabilities without sacrificing precision and recall on the supported search operations. There is a constant increase in the demand for cloud services, particularly cloud-based storage services. These services are currently used by different applications as outsourced storage services, with some interesting advantages. Most personal and mobile applications also offer the user the choice of using the cloud to store their data, transparently and sometimes without full user awareness of the privacy conditions, to overcome local storage limitations. Companies might also find it cheaper to outsource databases and key-value stores than to rely on storage solutions in private data centers. This raises concerns about data privacy guarantees and the danger of data leakage: a cloud system administrator can easily access unprotected data and could also forge, modify or delete data, violating privacy, integrity, availability and authenticity conditions. A possible solution would be to encrypt all data and add authenticity and integrity proofs before sending it to the cloud, then decrypt and verify authenticity or integrity on download. However, this approach works only for backup purposes or when big data is not involved, and is impractical for online search over large amounts of cloud-stored data that must be searched, accessed and retrieved dynamically. Such solutions also impose high latency and a high volume of cloud inbound/outbound traffic, increasing operational costs. Moreover, for mobile or embedded devices, the power, computation and communication constraints cannot be ignored, since indexing, encrypting/decrypting and signing/verifying all data is computationally expensive. To overcome these drawbacks, this dissertation proposes a solution for a trustable and privacy-enhanced storage architecture based on a multi-cloud approach, providing privacy-enhanced, dependable and searchable support. Our solution provides the necessary support for dependable cloud storage and multi-modal online search operations over always-encrypted data in a cloud-of-clouds. We implemented a system prototype and conducted an experimental evaluation of the proposed solution, involving the use of conventional storage clouds as well as a high-speed in-memory cloud-storage backend. Our results show that the proposal offers the required dependability properties and privacy guarantees, providing efficient information retrieval capabilities without sacrificing precision and recall in the supported indexing and search operations.
APA, Harvard, Vancouver, ISO, and other styles
42

Sabih, Rafia. "Balancing Money and Time for OLAP Queries on Cloud Databases." Thesis, 2016. http://etd.iisc.ac.in/handle/2005/2931.

Full text
Abstract:
Enterprise Database Management Systems (DBMSs) have to contend with resource-intensive and time-varying workloads, making them well-suited candidates for migration to cloud platforms: specifically, they can dynamically leverage the resource elasticity while retaining affordability through the pay-as-you-go rental interface. The current design of database engine components lays emphasis on maximizing computing efficiency, but to fully capitalize on the cloud's benefits, the monetary outlays of these computations also need to be factored into the planning exercise. In this thesis, we investigate this contemporary problem in the context of industrial-strength deployments of relational database systems on real-world cloud platforms. Specifically, we consider how the traditional metric used to compare query execution plans, namely response-time, can be augmented to incorporate monetary costs in the decision process. The challenge here is that execution-time and monetary costs are adversarial metrics, with a decrease in one entailing a rise in the other. For instance, a Virtual Machine (VM) with rich physical resources (RAM, cores, etc.) decreases the query response-time, but is expensive with regard to rental rates. In a nutshell, there is a tradeoff between money and time, and our goal therefore is to identify the VM that offers the best tradeoff between these two competing considerations. In our study, we profile the behavior of money versus time for a given query, and define the best tradeoff as the "knee", that is, the location on the profile with the minimum Euclidean distance from the origin. To study the performance of industrial-strength database engines on real-world cloud infrastructure, we have deployed a commercial DBMS on Google cloud services. On this platform, we have carried out extensive experimentation with the TPC-DS decision-support benchmark, an industry-wide standard for evaluating database system performance. Our experiments demonstrate that the choice of VM for hosting the database server is a crucial decision, because: (i) the variation in time and money across VMs is significant for a given query, and (ii) no one VM offers the best money-time tradeoff across all queries. To efficiently identify the VM with the best tradeoff from a large suite of available configurations, we propose a technique to characterize the money-time profile of a given query. The core of this technique is a VM pruning mechanism that exploits the partial ordering of the VMs on their resources. It processes the minimal and maximal VMs of this poset for estimated query response-time. If the response-times on these extreme VMs are similar, then all the VMs sandwiched between them are pruned from further consideration. Otherwise, the already processed VMs are set aside, and the minimal and maximal VMs of the remaining unprocessed VMs are evaluated for their response-times. Finally, the knee VM is identified from the processed VMs as the one with the minimum Euclidean distance from the origin in the money-time space. We theoretically prove that this technique always identifies the knee VM; further, if it is acceptable to find a "near-optimal" knee by providing a relaxation factor on the response-time distance from the optimal knee, the technique can find a satisfactory knee even more efficiently under these relaxed conditions.
We propose two flavors of this approach. The first, named Plan-based Identification of Knee (PIK), prunes the VMs using complete plan information received from the database engine API. The second, a sub-plan-based pruning algorithm called Sub-Plan-based Identification of Knee (SPIK), further increases the efficiency of knee identification but requires modifications to the query optimizer. We have evaluated PIK on a commercial system and found that it often requires processing only 20% of the total VMs; its efficiency is increased significantly further by allowing a 10-20% relaxation in response-time. For evaluating SPIK, we prototyped it on an open-source engine, PostgreSQL 9.3, and also implemented it as a Java wrapper program with the commercial engine. Experimentally, the processing done by SPIK is found to be only 40% of that of the PIK approach. From an overall perspective, therefore, this thesis facilitates the desired migration of enterprise databases to cloud platforms by identifying the VM(s) that offer competitive tradeoffs between money and time for a given query.
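The knee computation itself is compact. Below is a minimal sketch with invented VM figures; normalising each metric to [0, 1] before taking the Euclidean distance is an assumption on our part (so that neither metric dominates by sheer numeric range), not a detail spelled out in the abstract.

```python
import math

# Invented money-time profile for one query: hourly rental cost (USD) and
# estimated response-time (seconds) per VM. Figures are illustrative only.
vms = {
    "small":  {"cost": 0.05, "time": 380.0},
    "medium": {"cost": 0.19, "time": 120.0},
    "large":  {"cost": 0.47, "time": 95.0},
}

def knee(profile: dict) -> str:
    """Return the VM closest to the origin in normalised money-time space."""
    max_cost = max(v["cost"] for v in profile.values())
    max_time = max(v["time"] for v in profile.values())
    return min(
        profile,
        key=lambda name: math.hypot(
            profile[name]["cost"] / max_cost,
            profile[name]["time"] / max_time,
        ),
    )

print(knee(vms))  # the VM offering the best money-time tradeoff by this definition
```

The pruning mechanism then avoids evaluating most VMs at all: only the minimal and maximal VMs of the resource poset need response-time estimates when their times turn out similar.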
APA, Harvard, Vancouver, ISO, and other styles
43

Sabih, Rafia. "Balancing Money and Time for OLAP Queries on Cloud Databases." Thesis, 2016. http://etd.iisc.ernet.in/handle/2005/2931.

Full text
Abstract:
Enterprise Database Management Systems (DBMSs) have to contend with resource-intensive and time-varying workloads, making them well-suited candidates for migration to cloud platforms: specifically, they can dynamically leverage the resource elasticity while retaining affordability through the pay-as-you-go rental interface. The current design of database engine components lays emphasis on maximizing computing efficiency, but to fully capitalize on the cloud's benefits, the monetary outlays of these computations also need to be factored into the planning exercise. In this thesis, we investigate this contemporary problem in the context of industrial-strength deployments of relational database systems on real-world cloud platforms. Specifically, we consider how the traditional metric used to compare query execution plans, namely response-time, can be augmented to incorporate monetary costs in the decision process. The challenge here is that execution-time and monetary costs are adversarial metrics, with a decrease in one entailing a rise in the other. For instance, a Virtual Machine (VM) with rich physical resources (RAM, cores, etc.) decreases the query response-time, but is expensive with regard to rental rates. In a nutshell, there is a tradeoff between money and time, and our goal therefore is to identify the VM that offers the best tradeoff between these two competing considerations. In our study, we profile the behavior of money versus time for a given query, and define the best tradeoff as the "knee", that is, the location on the profile with the minimum Euclidean distance from the origin. To study the performance of industrial-strength database engines on real-world cloud infrastructure, we have deployed a commercial DBMS on Google cloud services. On this platform, we have carried out extensive experimentation with the TPC-DS decision-support benchmark, an industry-wide standard for evaluating database system performance. Our experiments demonstrate that the choice of VM for hosting the database server is a crucial decision, because: (i) the variation in time and money across VMs is significant for a given query, and (ii) no one VM offers the best money-time tradeoff across all queries. To efficiently identify the VM with the best tradeoff from a large suite of available configurations, we propose a technique to characterize the money-time profile of a given query. The core of this technique is a VM pruning mechanism that exploits the partial ordering of the VMs on their resources. It processes the minimal and maximal VMs of this poset for estimated query response-time. If the response-times on these extreme VMs are similar, then all the VMs sandwiched between them are pruned from further consideration. Otherwise, the already processed VMs are set aside, and the minimal and maximal VMs of the remaining unprocessed VMs are evaluated for their response-times. Finally, the knee VM is identified from the processed VMs as the one with the minimum Euclidean distance from the origin in the money-time space. We theoretically prove that this technique always identifies the knee VM; further, if it is acceptable to find a "near-optimal" knee by providing a relaxation factor on the response-time distance from the optimal knee, the technique can find a satisfactory knee even more efficiently under these relaxed conditions.
We propose two flavors of this approach. The first, named Plan-based Identification of Knee (PIK), prunes the VMs using complete plan information received from the database engine API. The second, a sub-plan-based pruning algorithm called Sub-Plan-based Identification of Knee (SPIK), further increases the efficiency of knee identification but requires modifications to the query optimizer. We have evaluated PIK on a commercial system and found that it often requires processing only 20% of the total VMs; its efficiency is increased significantly further by allowing a 10-20% relaxation in response-time. For evaluating SPIK, we prototyped it on an open-source engine, PostgreSQL 9.3, and also implemented it as a Java wrapper program with the commercial engine. Experimentally, the processing done by SPIK is found to be only 40% of that of the PIK approach. From an overall perspective, therefore, this thesis facilitates the desired migration of enterprise databases to cloud platforms by identifying the VM(s) that offer competitive tradeoffs between money and time for a given query.
APA, Harvard, Vancouver, ISO, and other styles
44

Comandur Jagannathan Raghunathan, Bharath Kumar. "Semantic Labeling of Large Geographic Areas Using Multi-Date and Multi-View Satellite Images and Noisy OpenStreetMap Labels." Thesis, 2020.

Find full text
Abstract:
This dissertation addresses the problem of how to design a convolutional neural network (CNN) for assigning semantic labels to points on the ground, given satellite image coverage over the area and, for the ground truth, the noisy labels in OpenStreetMap (OSM). This problem is made challenging by three facts: (1) most of the images are likely to have been recorded from off-nadir viewpoints for the area of interest on the ground; (2) the user-supplied labels in OSM are frequently inaccurate and, not uncommonly, entirely missing; and (3) the size of the area covered on the ground must be large enough to possess any engineering utility. As this dissertation demonstrates, solving this problem requires that we first construct a DSM (Digital Surface Model) from a stereo fusion of the available images, and subsequently use the DSM to map the individual pixels in the satellite images to points on the ground. That creates an association between the pixels in the images and the noisy labels in OSM. The CNN-based solution we present yields a 4-8% improvement in the per-class segmentation IoU (Intersection over Union) scores compared to traditional approaches that use the views independently of one another. The system we present is end-to-end automated, which facilitates comparing classifiers trained directly on true orthophotos vis-à-vis first training them on the off-nadir images and subsequently translating the predicted labels to geographical coordinates. This work also presents, for arguably the first time, an in-depth discussion of large-area image alignment and DSM construction using tens of true multi-date and multi-view WorldView-3 satellite images on a distributed OpenStack cloud computing platform.
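For reference, the per-class IoU score reported above is the ratio of the intersection to the union of the predicted and ground-truth pixel sets for each class. A minimal sketch, in which the label arrays and class ids are illustrative:

```python
import numpy as np

def per_class_iou(pred: np.ndarray, truth: np.ndarray, n_classes: int) -> list[float]:
    """Intersection-over-Union per class; NaN when a class is absent from both."""
    scores = []
    for c in range(n_classes):
        inter = np.logical_and(pred == c, truth == c).sum()
        union = np.logical_or(pred == c, truth == c).sum()
        scores.append(float(inter) / union if union else float("nan"))
    return scores

pred = np.array([[0, 1], [1, 2]])   # predicted label map (illustrative)
truth = np.array([[0, 1], [2, 2]])  # OSM-derived label map (illustrative)
print(per_class_iou(pred, truth, n_classes=3))
```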
APA, Harvard, Vancouver, ISO, and other styles
45

Pileththuwasan, Gallege Lahiru Sandakith. "Design, development and experimentation of a discovery service with multi-level matching." Thesis, 2013. http://hdl.handle.net/1805/3695.

Full text
Abstract:
Indiana University-Purdue University Indianapolis (IUPUI)

The contribution of this thesis focuses on addressing the challenges of improving and integrating the UniFrame Discovery Service (URDS) and Multi-level Matching (MLM) concepts. The objective was to find enhancements for both URDS and MLM and to address the need for a comprehensive discovery service that goes beyond simple attribute-based matching. The thesis presents a detailed discussion of the development of an enhanced version of URDS with MLM (proURDS). After implementing proURDS, it details experiments with different deployments of URDS components and different configurations of MLM. The experiments and analysis were carried out using proURDS-produced MLM contracts. proURDS refers to a public dataset called the QWS dataset, which includes actual information about software components (i.e., web services) harvested from the Internet. proURDS implements the different matching operations as independent operators at each level of matching (i.e., General, Syntactic, Semantic, Synchronization, and QoS). Finally, a case study was carried out with the deployed proURDS, addressing real-world component discovery requirements from the earth science domain; it uses contracts collected from public portals that provide geographical and weather-related data.
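The idea of matching operations as independent operators per level can be sketched as a simple filtering pipeline. Only the level structure below follows the abstract; the matcher logic, field names, and example services are invented for illustration.

```python
from typing import Callable

Matcher = Callable[[dict, dict], bool]

def general_match(req: dict, svc: dict) -> bool:
    # Coarse attribute match, e.g. service category.
    return req.get("category") == svc.get("category")

def qos_match(req: dict, svc: dict) -> bool:
    # QoS-level match, e.g. response-time bound.
    return svc.get("response_time_ms", float("inf")) <= req.get("max_response_time_ms", float("inf"))

# Two of the five levels shown; Syntactic, Semantic and Synchronization
# operators would slot into the same pipeline.
LEVELS: list[tuple[str, Matcher]] = [("General", general_match), ("QoS", qos_match)]

def discover(request: dict, services: list[dict]) -> list[dict]:
    candidates = services
    for _, matcher in LEVELS:  # each level independently narrows the candidate set
        candidates = [s for s in candidates if matcher(request, s)]
    return candidates

services = [
    {"category": "weather", "response_time_ms": 120},
    {"category": "weather", "response_time_ms": 900},
]
print(discover({"category": "weather", "max_response_time_ms": 200}, services))
```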
APA, Harvard, Vancouver, ISO, and other styles
46

Kiran, Mariam. "Modelling Cities as a collection of TeraSystems - Computational challenges in Multi-Agent Approach." 2015. http://hdl.handle.net/10454/9056.

Full text
Abstract:
Agent-based modeling techniques are ideal for modeling massive complex systems such as insect colonies, biological cellular systems, and even cities. However, these models are themselves extremely complex to code, test, simulate and analyze. This paper discusses the challenges of using agent-based models to model complete cities as a complex system. We argue that cities are in fact collections of complex models, each itself a massive system of millions of agents, working together to form one system consisting of on the order of a billion agents of different types, such as people, communities and technologies, interacting together. Because of these agent numbers and complexity challenges, present-day hardware architectures are unable to cope with the simulation and processing of such models. To accommodate these issues, this paper proposes a Tera-modeling framework (the prefix denoting the enormous order of magnitude of the agent counts involved), which utilizes current Cloud computing and Big Data processing technologies to model a city by allowing effectively unlimited resources and complex interactions. The paper also makes the case for bringing research communities together in interdisciplinary research to build a complete, reliable model of a city.
APA, Harvard, Vancouver, ISO, and other styles
47

Nettis, Andrea. "Seismic fragility and risk assessment of large bridge portfolios: efficient mechanical approaches based on multi-source data collection and integration." Doctoral thesis, 2021. http://hdl.handle.net/11589/229598.

Full text
Abstract:
In earthquake-prone countries, most existing bridges were designed in the past without appropriate anti-seismic regulations and can induce significant direct or indirect losses if subjected to severe seismic ground shaking. The main challenges in the extensive seismic risk assessment of existing bridges are the large number of structures to be inspected and the limited available resources. Therefore, time- and cost-saving approaches for providing seismic risk metrics for existing bridges are needed. This dissertation investigates efficient methodologies for bridge-specific seismic risk assessment within portfolio analysis, using multi-source data integration and simplified mechanical approaches. A methodology for multi-source data collection is described, and the applicability of remote-sensing data in populating inventories for structural analysis purposes is discussed. A procedure for using Remotely Piloted Aircraft Systems and photogrammetry to retrieve exhaustive structural information is presented. The effectiveness of displacement-based assessment approaches used together with the capacity spectrum method (CSM) for seismic performance assessment is analysed, considering continuous-deck reinforced-concrete (RC) and steel-truss multi-span bridges. A fragility analysis methodology based on cloud analysis using the CSM results is also presented; the CSM is applied with real (i.e., recorded) ground-motion spectra, as opposed to code-based conventional spectra, to explicitly account for record-to-record variability. Finally, a seismic risk assessment framework combining the proposed efficient data collection and simplified probabilistic seismic assessment methodologies is presented. It accounts for the influence of knowledge-based uncertainties associated with an initially incomplete data collection. The proposed approach is applied and tested on eight simply-supported RC bridges of the Basilicata national road network.
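The fragility-analysis step can be illustrated with the standard cloud-analysis recipe of regressing log demand on log intensity (ln EDP = a + b ln IM) and treating the residual dispersion as record-to-record variability. The sketch below assumes a lognormal demand model and uses invented demand/intensity samples and capacity threshold; it illustrates the general technique, not the dissertation's CSM-derived data.

```python
import numpy as np
from math import erf, log, sqrt

# Invented cloud-analysis samples: intensity measure (e.g. spectral
# acceleration in g) and the corresponding demand (e.g. drift) per record.
im = np.array([0.10, 0.15, 0.22, 0.30, 0.45, 0.60])
edp = np.array([0.002, 0.004, 0.005, 0.009, 0.014, 0.020])

# Linear regression in log space: ln(EDP) = a + b * ln(IM).
b, a = np.polyfit(np.log(im), np.log(edp), 1)
# Dispersion of residuals captures record-to-record variability.
beta = np.std(np.log(edp) - (a + b * np.log(im)))

def fragility(im_level: float, capacity: float = 0.01) -> float:
    """P(demand >= capacity | IM), under the lognormal demand assumption."""
    z = (a + b * log(im_level) - log(capacity)) / beta
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))  # standard normal CDF

print(fragility(0.3))  # exceedance probability at IM = 0.3 g (illustrative)
```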
APA, Harvard, Vancouver, ISO, and other styles