
Dissertations / Theses on the topic 'Cloud workflow'


Consult the top 50 dissertations / theses for your research on the topic 'Cloud workflow.'

Next to every source in the list of references there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.

1

Chaudhry, Nauman Riaz. "Workflow framework for cloud-based distributed simulation." Thesis, Brunel University, 2016. http://bura.brunel.ac.uk/handle/2438/14778.

Full text
Abstract:
Although distributed simulation (DS) using parallel computing has received considerable research and development in a number of compute-intensive fields, it has still to be significantly adopted by the wider simulation community. According to scientific literature, major reasons for low adoption of cloud-based services for DS execution are the perceived complexities of understanding and managing the underlying architecture and software for deploying DS models, as well as the remaining challenges in performance and interoperability of cloud-based DS. The focus of this study, therefore, has been to design and test the feasibility of a well-integrated, generic, workflow structured framework that is universal in character and transparent in implementation. The choice of a workflow framework for implementing cloud-based DS was influenced by the ability of scientific workflow management systems to define, execute, and actively manage computing workflows. As a result of this study, a hybrid workflow framework, combined with four cloud-based implementation services, has been used to develop an integrated potential standard for workflow implementation of cloud-based DS, which has been named the WORLDS framework (Workflow Framework for Cloud-based Distributed Simulation). The main contribution of this research study is the WORLDS framework itself, which identifies five services (including a Parametric Study Service) that can potentially be provided through the use of workflow technologies to deliver effective cloud-based distributed simulation that is transparently provisioned for the user. This takes DS a significant step closer to its provision as a viable cloud-based service (DSaaS). In addition, the study introduces a simple workflow solution to applying parametric studies to distributed simulations. Further research to confirm the generic nature of the workflow framework, to apply and test modified HLA standards, and to introduce a simulation analytics function by modifying the workflow is anticipated.
APA, Harvard, Vancouver, ISO, and other styles
2

Lopez, Israel Casas. "Scientific Workflow Scheduling for Cloud Computing Environments." Thesis, The University of Sydney, 2016. http://hdl.handle.net/2123/16769.

Full text
Abstract:
The scheduling of workflow applications consists of assigning their tasks to computer resources to fulfill a final goal such as minimizing total workflow execution time. For this reason, workflow scheduling plays a crucial role in efficiently running experiments. Workflows often have many discrete tasks, and the number of possible task distributions and the consequent time required to evaluate each configuration quickly become prohibitively large. A proper solution to the scheduling problem requires the analysis of tasks and resources, production of an accurate environment model and, most importantly, the adaptation of optimization techniques. This study is a major step toward solving the scheduling problem by not only addressing these issues but also optimizing the runtime and reducing monetary cost, two of the most important variables. This study proposes three scheduling algorithms capable of answering key issues to solve the scheduling problem. Firstly, it unveils BaRRS, a scheduling solution that exploits parallelism and optimizes runtime and monetary cost. Secondly, it proposes GA-ETI, a scheduler capable of returning the number of resources that a given workflow requires for execution. Finally, it describes PSO-DS, a scheduler based on particle swarm optimization to efficiently schedule large workflows. To test the algorithms, five well-known benchmarks are selected that represent different scientific applications. The experiments found that the novel algorithms' solutions substantially improve efficiency, reducing makespan by 11% to 78%. The proposed frameworks open a path for building a complete system that encompasses the capabilities of a workflow manager, scheduler, and a cloud resource broker in order to offer scientists a single tool to run computationally intensive applications.
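
To make the scheduling objective concrete, the following minimal sketch (with made-up task runtimes and a hypothetical task-to-VM assignment, not code from the thesis) shows how a workflow can be modelled as a DAG of dependent tasks and how the makespan of one candidate assignment is evaluated; schedulers such as BaRRS, GA-ETI and PSO-DS search the space of such assignments.

```python
# Minimal sketch: a workflow as a DAG of tasks plus the makespan of one
# hypothetical task-to-VM assignment (all names and runtimes are made up).
from collections import defaultdict

tasks = {"prep": 4.0, "align": 10.0, "stats": 6.0, "report": 2.0}   # runtimes in minutes
deps = {"align": ["prep"], "stats": ["prep"], "report": ["align", "stats"]}
assignment = {"prep": "vm1", "align": "vm1", "stats": "vm2", "report": "vm1"}

def makespan(tasks, deps, assignment):
    finish = {}                       # task -> finish time
    vm_free = defaultdict(float)      # VM -> time at which it becomes available
    remaining = dict(tasks)
    while remaining:
        for t in list(remaining):
            preds = deps.get(t, [])
            if all(p in finish for p in preds):                # dependencies satisfied
                ready = max((finish[p] for p in preds), default=0.0)
                start = max(ready, vm_free[assignment[t]])     # wait for data and for the VM
                finish[t] = start + tasks[t]
                vm_free[assignment[t]] = finish[t]
                del remaining[t]
    return max(finish.values())

print(makespan(tasks, deps, assignment))   # 16.0 for this toy assignment
```
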
APA, Harvard, Vancouver, ISO, and other styles
3

Cao, Fei. "Efficient Scientific Workflow Scheduling in Cloud Environment." OpenSIUC, 2014. https://opensiuc.lib.siu.edu/dissertations/802.

Full text
Abstract:
Cloud computing enables the delivery of remote computing, software and storage services through web browsers following a pay-as-you-go model. In addition to successful commercial applications, many research efforts, including the DOE Magellan Cloud project, focus on discovering the opportunities and challenges arising from the computing and data-intensive scientific applications that are not well addressed by the current supercomputers, Linux clusters and Grid technologies. The elastic resource provisioning, non-interfering resource sharing and flexible customized configuration provided by the Cloud infrastructure have shed light on efficient execution of many scientific applications modeled as Directed Acyclic Graph (DAG) structured workflows to enforce the intricate dependency among a large number of different processing tasks. Meanwhile, the Cloud environment poses various challenges. Cloud providers and Cloud users pursue different goals: providers aim to maximize profit by achieving higher resource utilization, and users want to minimize expenses while meeting their performance requirements. Moreover, due to the expanding Cloud services and emerging newer technologies, the ever-increasing heterogeneity of the Cloud environment complicates the challenges for both parties. In this thesis, we address the workflow scheduling problem for different applications and various objectives. For batch applications, due to the increasing deployment of data centers and computer servers around the globe, escalated by higher electricity prices, the energy cost of running the computing, communication and cooling, together with the amount of CO2 emissions, has skyrocketed. In order to maintain sustainable Cloud computing in the face of ever-increasing problem complexity and big data sizes in the coming decades, we design and develop an energy-aware scientific workflow scheduling algorithm to minimize energy consumption and CO2 emission while still satisfying certain Quality of Service (QoS) requirements such as the response time specified in a Service Level Agreement (SLA). Furthermore, the underlying Cloud hardware/Virtual Machine (VM) resource availability is time-dependent because of the dual operation modes, namely on-demand and reservation instances, at various Cloud data centers. We also apply techniques such as Dynamic Voltage and Frequency Scaling (DVFS) and DNS schemes to further reduce energy consumption within acceptable performance bounds. Our multiple-step resource provision and allocation algorithm achieves the response time requirement in the forward task scheduling step and minimizes the VM overhead for reduced energy consumption and a higher resource utilization rate in the backward task scheduling step. We also evaluate the candidacy of multiple data centers from the energy and performance efficiency perspectives, as different data centers have various energy and cost-related parameters. For streaming applications, we formulate scheduling problems with two different objectives: one is to maximize the throughput under a budget constraint, while the other is to minimize execution cost under a minimum throughput constraint. Two algorithms, Budget-constrained RATE (B-RATE) and Budget-constrained SWAP (B-SWAP), are designed for the first objective; another two, Throughput-constrained RATE (TP-RATE) and Throughput-constrained SWAP (TP-SWAP), are developed for the second.
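
As background for the DVFS technique mentioned above, the sketch below (an assumed cubic dynamic-power model with hypothetical frequencies, runtimes and deadline, not the thesis's model) picks the lowest CPU frequency that still meets a task's response-time bound and compares the resulting dynamic energy with running at full speed.

```python
# Assumed model: runtime scales inversely with frequency, dynamic power ~ f^3.
frequencies = [1.0, 1.5, 2.0, 2.6]        # GHz, hypothetical DVFS levels
base_runtime = 30.0                       # minutes at the highest frequency
deadline = 50.0                           # minutes (QoS bound)

def runtime(f):
    return base_runtime * max(frequencies) / f

def dynamic_energy(f):
    return (f ** 3) * runtime(f)          # arbitrary energy units

feasible = [f for f in frequencies if runtime(f) <= deadline]
f_best = min(feasible)                    # lowest feasible frequency saves the most energy
print(f_best, dynamic_energy(f_best), dynamic_energy(max(frequencies)))
```

With these numbers the task can be slowed from 2.6 GHz to 2.0 GHz, cutting the dynamic energy by roughly 40% while still meeting the 50-minute bound.
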
APA, Harvard, Vancouver, ISO, and other styles
4

Gonzalez, Nelson Mimura. "MPSF: cloud scheduling framework for distributed workflow execution." Universidade de São Paulo, 2016. http://www.teses.usp.br/teses/disponiveis/3/3141/tde-03032017-083914/.

Full text
Abstract:
Cloud computing represents a distributed computing paradigm that gained notoriety due to its properties related to on-demand elastic and dynamic resource provisioning. These characteristics are highly desirable for the execution of workflows, in particular scientific workflows that require a great amount of computing resources and that handle large-scale data. One of the main questions in this sense is how to manage the resources of one or more cloud infrastructures to execute workflows while optimizing resource utilization and minimizing the total duration of the execution of tasks (makespan). The more complex the infrastructure and the tasks to be executed are, the higher the risk of incorrectly estimating the amount of resources to be assigned to each task, leading to both performance and monetary costs. Scenarios which are inherently more complex, such as hybrid and multiclouds, are rarely considered by existing resource management solutions. Moreover, a thorough review of relevant related work revealed that most of the solutions do not address data-intensive workflows, a characteristic that is increasingly evident for modern scientific workflows. In this sense, this proposal presents MPSF, the Multiphase Proactive Scheduling Framework, a cloud resource management solution based on multiple scheduling phases that continuously assess the system to optimize resource utilization and task distribution. MPSF defines models to describe and characterize workflows and resources. MPSF also defines performance and reliability models to improve load distribution among nodes and to mitigate the effects of performance fluctuations and potential failures that might occur in the system. Finally, MPSF defines a framework and an architecture to integrate all these components and deliver a solution that can be implemented and tested in real applications. Experimental results show that MPSF is able to predict with much better accuracy the duration of workflows and workflow phases, as well as providing performance gains compared to greedy approaches.
APA, Harvard, Vancouver, ISO, and other styles
5

Ahmad, M. K. H. "Scientific workflow execution reproducibility using cloud-aware provenance." Thesis, University of the West of England, Bristol, 2016. http://eprints.uwe.ac.uk/27390/.

Full text
Abstract:
Scientific experiments and projects such as CMS and neuGRIDforYou (N4U) are annually producing data of the order of petabytes. They adopt scientific workflows to analyse this large amount of data in order to extract meaningful information. These workflows are executed over distributed resources, both compute and storage in nature, provided by the Grid and recently by the Cloud. The Cloud is becoming the playing field for scientists as it provides scalability and on-demand resource provisioning. Reproducing a workflow execution to verify results is vital for scientists and has proven to be a challenge. According to a study (Belhajjame et al. 2012), around 80% of workflows cannot be reproduced, and 12% of these failures are due to the lack of information about the execution environment. The dynamic and on-demand provisioning capability of the Cloud makes this more challenging. To overcome these challenges, this research aims to investigate how to capture the execution provenance of a scientific workflow along with the resources used to execute the workflow in a Cloud infrastructure. This information will then enable a scientist to reproduce workflow-based scientific experiments on the Cloud infrastructure by re-provisioning similar resources on the Cloud. Provenance has been recognised as information that helps in debugging, verifying and reproducing a scientific workflow execution. Recent adoption of Cloud-based scientific workflows presents an opportunity to investigate the suitability of existing approaches or to propose new approaches to collect provenance information from the Cloud and to utilize it for workflow reproducibility on the Cloud. From the literature analysis, it was found that the existing approaches for the Grid or the Cloud do not provide detailed resource information and also do not present an automatic provenance capturing approach for the Cloud environment. To mitigate these challenges and fill the knowledge gap, a provenance-based approach, ReCAP, has been proposed in this thesis. In ReCAP, workflow execution reproducibility is achieved by (a) capturing the Cloud-aware provenance (CAP), (b) re-provisioning similar resources on the Cloud and re-executing the workflow on them, and (c) comparing the provenance graph structure, including the Cloud resource information, and the outputs of workflows. ReCAP captures the Cloud resource information and links it with the workflow provenance to generate Cloud-aware provenance. The Cloud-aware provenance consists of configuration parameters relating to hardware and software describing a resource on the Cloud. This information, once captured, aids in re-provisioning the same execution infrastructure on the Cloud for workflow re-execution. Since resources on the Cloud can be used in a static or dynamic (i.e. destroyed when a task is finished) manner, this presents a challenge for the devised provenance capturing approach. In order to deal with these scenarios, different capturing and mapping approaches have been presented in this thesis. These mapping approaches work outside the virtual machine and collect resource information from the Cloud middleware, thus they do not affect job performance. The impact of the collected Cloud resource information on the job as well as on the workflow execution has been evaluated through various experiments in this thesis. In ReCAP, the workflow reproducibility is verified by comparing the provenance graph structure, infrastructure details and the output produced by the workflows.
To compare the provenance graphs, the captured provenance information including infrastructure details is translated to a graph model. These graphs of original execution and the reproduced execution are then compared in order to analyse their similarity. In this regard, two comparison approaches have been presented that can produce a qualitative analysis as well as quantitative analysis about the graph structure. The ReCAP framework and its constituent components are evaluated using different scientific workflows such as ReconAll and Montage from the domains of neuroscience (i.e. N4U) and astronomy respectively. The results have shown that ReCAP has been able to capture the Cloud-aware provenance and demonstrate the workflow execution reproducibility by re-provisioning the same resources on the Cloud. The results have also demonstrated that the provenance comparison approaches can determine the similarity between the two given provenance graphs. The results of workflow output comparison have shown that this approach is suitable to compare the outputs of scientific workflows, especially for deterministic workflows.
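
The graph comparison step can be pictured with a small sketch: the snippet below (using networkx, with hypothetical task names and VM flavours; ReCAP's actual comparison is richer) checks whether two provenance graphs have the same structure and the same Cloud resource attributes on matching nodes.

```python
# Sketch of a structure-plus-resource comparison of two provenance graphs
# (hypothetical example, not the ReCAP implementation).
import networkx as nx
from networkx.algorithms.isomorphism import categorical_node_match

def provenance_graph(edges, flavors):
    g = nx.DiGraph()
    g.add_edges_from(edges)
    nx.set_node_attributes(g, flavors, name="flavor")   # VM flavour used per task
    return g

original = provenance_graph([("stage_in", "recon"), ("recon", "stage_out")],
                            {"stage_in": "m1.small", "recon": "m1.large", "stage_out": "m1.small"})
reproduced = provenance_graph([("stage_in", "recon"), ("recon", "stage_out")],
                              {"stage_in": "m1.small", "recon": "m1.large", "stage_out": "m1.small"})

same = nx.is_isomorphic(original, reproduced,
                        node_match=categorical_node_match("flavor", None))
print("same structure and resources:", same)
```
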
APA, Harvard, Vancouver, ISO, and other styles
6

Salvucci, Enrico. "MLOps - Standardizing the Machine Learning Workflow." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/23645/.

Full text
Abstract:
MLOps is a very recent approach aimed at reducing the time needed to get a Machine Learning model into production; this methodology inherits its main features from DevOps and applies them to Machine Learning, adding further features specific to data analysis. This thesis, which is the result of an internship at Data Reply, is aimed at studying this new approach and exploring different tools to build an MLOps architecture; another goal is to use these tools to implement an MLOps architecture (preferably using Open Source software). This study provides a deep analysis of MLOps features, also compared to DevOps; furthermore, an in-depth survey of the tools available on the market to build an MLOps architecture is offered, focusing on Open Source tools. The reference architecture, designed by adopting an exploratory approach, is implemented through MLflow, Kubeflow and BentoML and deployed using Google Cloud Platform; furthermore, the architecture is compared to different use cases of companies that have recently started adopting MLOps. MLOps is rapidly evolving and maturing; for these reasons, many companies are starting to adopt this methodology. Based on the study conducted in this thesis, companies dealing with Machine Learning should consider adopting MLOps. This thesis can be a starting point to explore MLOps both theoretically and practically (also by relying on the implemented reference architecture and its code).
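
As an indication of what the tracking layer of such an architecture does, here is a minimal MLflow sketch (parameter and metric names are invented; the thesis architecture combines MLflow with Kubeflow and BentoML and automates this step in a pipeline).

```python
# Minimal experiment-tracking sketch with MLflow; values are hypothetical.
import mlflow

with mlflow.start_run(run_name="baseline"):
    mlflow.log_param("model", "logistic_regression")   # record the configuration
    mlflow.log_param("C", 1.0)
    mlflow.log_metric("accuracy", 0.87)                # record the evaluation result
# Runs land in the configured tracking backend (a local ./mlruns folder by default)
# and can then be compared, promoted and deployed by the rest of the MLOps pipeline.
```
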
APA, Harvard, Vancouver, ISO, and other styles
7

Brodard, Zacharie. "Workflow management and scheduling in a cloud computing context." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-249714.

Full text
Abstract:
Public cloud providers have had a tremendous impact on the software engineering world recently by offering on-demand computing infrastructures and scalable managed solutions. The goal of this master's degree project is to determine whether these services offered by public cloud providers can improve the design of workflow management and scheduling systems for data-driven workflows. A cloud-based architecture for a distributed workflow management system was designed, implemented and experimented on. It confirmed the value of cloud computing solutions for workflow scheduling problems by making it possible to launch workloads efficiently on elastic computing resources.
APA, Harvard, Vancouver, ISO, and other styles
8

Sharif, Nabavi Shaghayeh. "Privacy-Aware Workflow Scheduling Algorithms in Hybrid Cloud Environments." Thesis, The University of Sydney, 2017. http://hdl.handle.net/2123/16576.

Full text
Abstract:
Hybrid clouds have gained popularity with many organisations in recent times due to their ability to provide additional computing capacity to private clouds. A hybrid cloud complements private cloud computing resources when there is resource scarcity. However, deploying distributed applications, such as workflows, in hybrid clouds introduces new challenges such as the privacy of tasks and data in scheduling workflows. The key problem is the danger of exposing private data and tasks in a third-party public cloud infrastructure, especially in healthcare applications. In this thesis, we tackle the problem of scheduling workflows in hybrid clouds while considering the multi-level privacy and deadline constraints. This research is different from most studies on workflow scheduling in which the main goal is to achieve a balance between desirable yet incompatible constraints. Although many others have addressed the trade-off between cost and time, or between single-level privacy and cost, their work still suffers from insufficient consideration of the trade-off between multi-level privacy constraints and time. To address such shortcomings in the literature, we introduced a new privacy-preserving method to execute workflows with multi-level privacy of tasks and data. This privacy-preserving method assists private cloud providers and workflow owners to deploy their workflows in the hybrid cloud environment without the concern of exposing sensitive information in the public cloud environment. We proposed three static (off-line) scheduling algorithms to minimise the execution cost of a single workflow in hybrid clouds while meeting the privacy and deadline constraints. The evaluations of these algorithms indicate their efficiency in minimising the scheduling cost in time-pressured scenarios. We further proposed two formulations using mixed integer linear programming (MILP), namely, the discrete-time model and the continuous-time model, to schedule single workflows in a hybrid cloud environment while satisfying deadline and privacy constraints. We investigated the advantages and disadvantages of using these models and their impacts on the scheduling cost, as well as their solving time. We observed that using MILP to formulate and solve the scheduling problem would produce satisfactory results in reducing the workflow execution cost in hybrid clouds. We improved the continuous-time model by applying a greedy heuristic. This resulted in a faster solving time at the price of a higher scheduling cost. In addition, we introduced two dynamic (on-line) multiple workflow scheduling algorithms. These algorithms considered the dynamic nature of the hybrid cloud environment where the availability of cloud resources can be increased or decreased at any point of the scheduling time. In a nutshell, we considered both static and dynamic configurations for hybrid clouds in developing the scheduling algorithms. We evaluated the scheduling algorithms (online and off-line) using real-world workflow datasets. The results show that the proposed scheduling algorithms are efficient in reducing the cost of executing workflows in hybrid clouds under multi-level privacy and deadline constraints in time-pressured scenarios.
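
To fix ideas, a much-simplified MILP of this kind can be written as follows (our notation, not the thesis's discrete- or continuous-time formulations): x_{i,r} places task i on resource r, c_r is the unit price, d_{i,r} the task duration, ℓ_i the task's privacy level, ℓ^max_r the highest level a resource may host, f_i the task finish time and D the deadline.

```latex
\begin{align*}
\min\ & \sum_{i \in T} \sum_{r \in R} c_r \, d_{i,r} \, x_{i,r}
    && \text{(total execution cost)} \\
\text{s.t.}\ & \sum_{r \in R} x_{i,r} = 1 \quad \forall i \in T
    && \text{(each task is placed exactly once)} \\
& x_{i,r} = 0 \quad \forall i \in T,\ r \in R_{\mathrm{public}} \text{ with } \ell_i > \ell^{\max}_r
    && \text{(privacy levels are respected)} \\
& f_i \le D \quad \forall i \in T
    && \text{(finish times meet the deadline)} \\
& x_{i,r} \in \{0, 1\}
\end{align*}
```
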
APA, Harvard, Vancouver, ISO, and other styles
9

Chen, Ziwei. "Workflow Management Service based on an Event-driven Computational Cloud." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-141696.

Full text
Abstract:
The event-driven computing paradigm, also known as trigger computing, is widely used in computer technology. Computer systems, such as database systems, introduce trigger mechanisms to reduce repetitive human intervention. With the growing complexity of industrial use case requirements, independent and isolated triggers cannot fulfil the demands any more. Fortunately, an independent trigger can be triggered by the result produced by other triggered actions, and that enables the modelling of complex use cases, where the chains or graphs that consist of triggers are called workflows. Therefore, workflow construction and manipulation become a must for implementing the complex use cases. As the development platform of this study, VISION Cloud is a computational storage system that executes small programs called storlets as independent computation units in the storage. Similar to the trigger mechanism in database systems, storlets are triggered by specific events and then execute computations. As a result, one storlet can also be triggered by the result produced by other storlets, which is called a connection between storlets. Due to the growing complexity of use case requirements, an urgent demand is to have storlet workflow management supported in the VISION system. Furthermore, because of the existence of connections between storlets in VISION, problems such as non-terminating triggering and unexpected overwriting appear as side effects. This study develops a management service that consists of an analysis engine and a multi-level visualization interface. The analysis engine checks the connections between storlets by utilizing automated theorem proving and deterministic finite automata. The involved storlets and their connections are displayed as graphs via the multi-level visualization interface. Furthermore, the aforementioned connection problems are detected with graph theory algorithms. Finally, experimental results with practical use case examples demonstrate the correctness and comprehensiveness of the service. Algorithm performance and possible optimization are also discussed. They lead the way for future work to create a portable framework of event-driven workflow management services.
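
The structural part of that analysis can be illustrated with a few lines of code: model the storlet connections as a directed graph and flag cycles as potential non-terminating trigger chains (storlet names below are invented; the thesis additionally reasons about trigger conditions with theorem proving, which plain graph analysis cannot do).

```python
# Sketch: detect potential non-terminating trigger chains as cycles in the
# storlet connection graph (hypothetical storlets, not VISION Cloud code).
import networkx as nx

connections = [("thumbnailer", "indexer"),    # thumbnailer output triggers the indexer
               ("indexer", "notifier"),
               ("notifier", "thumbnailer")]   # ...which would re-trigger the thumbnailer

graph = nx.DiGraph(connections)
cycles = list(nx.simple_cycles(graph))
if cycles:
    print("possible non-terminating trigger chains:", cycles)
```
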
APA, Harvard, Vancouver, ISO, and other styles
10

Kiaian, Mousavy Sayyed Ali. "A learning based workflow scheduling approach on volatile cloud resources." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-282528.

Full text
Abstract:
Workflows, originally from the business world, provide a systematic organization to an otherwise chaotic complex process. Therefore, they have become dominant and popular in scientific computation, where complex and broad-scale data analysis and scientific automation are required. In recent years, demand for reliable algorithms for workflow optimization problems, mainly scheduling and resource provisioning, has grown considerably. There are various algorithms and proposals to optimize these problems. However, most of these provisioning schemes and algorithms do not account for reliability and robustness. Besides, those that do rely on assumptions and handcrafted heuristics with manual parameter assignment to provide solutions. In this thesis, a new workflow scheduling algorithm is proposed that learns the heuristics required for reliability and robustness consideration in a volatile cloud environment, particularly on Amazon EC2 spot instances. Furthermore, the algorithm uses the learned data to propose an efficient scheduling strategy that prioritizes reliability but also considers minimization of execution time. The proposed algorithm mainly improves upon failure rate and reliability in comparison to the other tested algorithms, such as Heterogeneous Earliest Finish Time (HEFT) and ReplicateAll, while at the same time maintaining an acceptable degradation in makespan compared to vanilla HEFT, making it more reliable in an unreliable environment as a result. We have discovered that our proposed algorithm performs 5% worse than the baseline HEFT regarding total execution time. However, we realised that it wastes 52% fewer resources compared to the baseline HEFT and uses 40% fewer resources compared to the ReplicateAll algorithm as a result of the reduced failure rate in the unreliable environment.
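
For readers unfamiliar with the HEFT baseline referred to above, the sketch below shows its upward-rank prioritisation on a toy DAG (runtimes and communication costs are invented; the learned reliability weighting that the thesis adds on top is not shown).

```python
# HEFT-style upward ranks on a toy workflow; tasks with higher rank are scheduled first.
from functools import lru_cache

avg_runtime = {"a": 5.0, "b": 8.0, "c": 3.0, "d": 4.0}    # mean runtime across VM types
comm = {("a", "b"): 2.0, ("a", "c"): 1.0, ("b", "d"): 3.0, ("c", "d"): 1.0}
succ = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}

@lru_cache(maxsize=None)
def upward_rank(task):
    # runtime of the task plus the most expensive path to the exit task
    tail = max((comm[(task, s)] + upward_rank(s) for s in succ[task]), default=0.0)
    return avg_runtime[task] + tail

order = sorted(succ, key=upward_rank, reverse=True)
print(order)   # ['a', 'b', 'c', 'd'] for these numbers
```
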
APA, Harvard, Vancouver, ISO, and other styles
11

Nagavaram, Ashish. "Cloud Based Dynamic Workflow with QOS For Mass Spectrometry Data Analysis." The Ohio State University, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=osu1322681210.

Full text
APA, Harvard, Vancouver, ISO, and other styles
12

Meeramohideen, Mohamed Nabeel. "Data-Intensive Biocomputing in the Cloud." Thesis, Virginia Tech, 2013. http://hdl.handle.net/10919/23847.

Full text
Abstract:
Next-generation sequencing (NGS) technologies have made it possible to rapidly sequence the human genome, heralding a new era of health-care innovations based on personalized genetic information. However, these NGS technologies generate data at a rate that far outstrips Moore's Law. As a consequence, analyzing this exponentially increasing data deluge requires enormous computational and storage resources, resources that many life science institutions do not have access to. As such, cloud computing has emerged as an obvious, but still nascent, solution. This thesis intends to investigate and design an efficient framework for running and managing large-scale data-intensive scientific applications in the cloud. Based on the lessons learned from our parallel implementation of a genome analysis pipeline in the cloud, we aim to provide a framework for users to run such data-intensive scientific workflows using a hybrid setup of client and cloud resources. We first present SeqInCloud, our highly scalable parallel implementation of a popular genetic variant pipeline called genome analysis toolkit (GATK), on the Windows Azure HDInsight cloud platform. Together with a parallel implementation of GATK on Hadoop, we evaluate the potential of using cloud computing for large-scale DNA analysis and present a detailed study on efficiently utilizing cloud resources for running data-intensive, life-science applications. Based on our experience from running SeqInCloud on Azure, we present CloudFlow, a feature-rich workflow manager for running MapReduce-based bioinformatic pipelines utilizing both client and cloud resources. CloudFlow, built on top of an existing MapReduce-based workflow manager called Cloudgene, provides unique features that are not offered by existing MapReduce-based workflow managers, such as enabling simultaneous use of client and cloud resources, automatic data-dependency handling between client and cloud resources, and the flexibility of implementing user-defined plugins for data transformations. In general, we believe that our work attempts to increase the adoption of cloud resources for running data-intensive scientific workloads.
APA, Harvard, Vancouver, ISO, and other styles
13

Pietri, Ilia. "Cost-efficient resource management for scientific workflows on the cloud." Thesis, University of Manchester, 2016. https://www.research.manchester.ac.uk/portal/en/theses/costefficient-resource-management-for-scientific-workflows-on-the-cloud(4cfe73ce-1de9-411b-8288-f463d6b52680).html.

Full text
Abstract:
Scientific workflows are used in many scientific fields to abstract complex computations (tasks) and data or flow dependencies between them. High performance computing (HPC) systems have been widely used for the execution of scientific workflows. Cloud computing has gained popularity by offering users on-demand provisioning of resources and providing the ability to choose from a wide range of possible configurations. To do so, resources are made available in the form of virtual machines (VMs), described as a set of resource characteristics, e.g. amount of CPU and memory. The notion of VMs enables the use of different resource combinations which facilitates the deployment of the applications and the management of the resources. A problem that arises is determining the configuration, such as the number and type of resources, that leads to efficient resource provisioning. For example, allocating a large amount of resources may reduce application execution time, albeit at the expense of increased costs. This thesis investigates the challenges that arise in resource provisioning and task scheduling of scientific workflows and explores ways to address them, developing approaches to improve energy efficiency for scientific workflows and meet the user's objectives, e.g. makespan and monetary cost. The motivation stems from the wide range of options that make it possible to select cost-efficient configurations and improve resource utilisation. The contributions of this thesis are the following. (i) A survey of the issues arising in resource management in cloud computing; The survey focuses on VM management, cost efficiency and the deployment of scientific workflows. (ii) A performance model to estimate the workflow execution time for a different number of resources based on the workflow structure; The model can be used to estimate the respective user and energy costs in order to determine configurations that lead to efficient resource provisioning and achieve a balance between various conflicting goals. (iii) Two energy-aware scheduling algorithms that maximise the number of completed workflows from an ensemble under energy and budget or deadline constraints; The algorithms address the problem of energy-aware resource provisioning and scheduling for scientific workflow ensembles. (iv) An energy-aware algorithm that selects the frequency to be used for each workflow task in order to achieve energy savings without exceeding the workflow deadline; The algorithm takes into account the different requirements and constraints that arise depending on the workflow and system characteristics. (v) Two cost-based frequency selection algorithms that choose the CPU frequency for each provisioned resource in order to achieve cost-efficient resource configurations for the user and complete the workflow within the deadline; Decision making is based on both the workflow characteristics and the pricing model of the provider.
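
The kind of trade-off such a performance model supports can be sketched as follows (an Amdahl-style split into sequential and parallelizable work is our simplification, with hypothetical numbers, not the thesis's model): estimate the makespan for each VM count and pick the cheapest count that still meets the deadline.

```python
# Sketch: choose the cheapest VM count whose estimated makespan meets the deadline.
critical_path = 2.0        # hours of inherently sequential work (assumed)
parallel_work = 12.0       # hours of work that can be spread over the VMs (assumed)
price_per_vm_hour = 0.10   # hypothetical price
deadline = 5.0             # hours

def estimated_makespan(n_vms):
    return critical_path + parallel_work / n_vms

def monetary_cost(n_vms):
    return n_vms * price_per_vm_hour * estimated_makespan(n_vms)

candidates = [n for n in range(1, 17) if estimated_makespan(n) <= deadline]
best = min(candidates, key=monetary_cost)
print(best, estimated_makespan(best), monetary_cost(best))   # 4 VMs meet the 5 h deadline
```
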
APA, Harvard, Vancouver, ISO, and other styles
14

Truong, Huu Tram. "Workflow-based applications performance and execution cost optimization on cloud infrastructures." Nice, 2010. http://www.theses.fr/2010NICE4091.

Full text
Abstract:
Cloud computing is increasingly exploited to tackle the computing challenges raised in both science and industry. Clouds provide computing, network and storage resources on demand to satisfy the needs of large-scale distributed applications. To adapt to the diversity of cloud infrastructures and usage, new tools and models are needed. Estimating the amount of resources consumed by each application in particular is a difficult problem, both for end users who aim at minimizing their cost and for infrastructure providers who aim at controlling their resource allocation. Although a quasi-unlimited amount of resources may be allocated, a trade-off has to be found between the allocated infrastructure cost, the expected performance and the optimal performance achievable, which depends on the level of parallelization of the applications. Focusing on medical image analysis, a scientific domain representative of the large class of data-intensive distributed applications, this thesis proposes a fine-grained cost function model relying on the expertise captured from the application. Based on this cost function model, four resource allocation strategies are proposed. Taking into account both computing and network resources, these strategies help users to determine the amount of resources to reserve and compose their execution environment. In addition, the data transfer overhead and the low reliability level, which are well-known problems of large-scale distributed systems impacting application performance and infrastructure usage cost, are also considered. The experiments reported in this thesis were carried out on the Aladdin/Grid'5000 infrastructure, using the HIPerNet virtualization middleware. This virtual platform manager enables the joint virtualization of computing and network resources. A real medical image analysis application was considered for all experimental validations. The experimental results assess the validity of the approach in terms of infrastructure cost and application performance control. Our contributions both facilitate the exploitation of cloud infrastructures, delivering a higher quality of service to end users, and help the planning of cloud resource delivery.
APA, Harvard, Vancouver, ISO, and other styles
15

Muresan, Adrian. "Scheduling and deployment of large-scale applications on Cloud platforms." PhD thesis, École normale supérieure de Lyon - ENS LYON, 2012. http://tel.archives-ouvertes.fr/tel-00786475.

Full text
Abstract:
Infrastructure as a service (IaaS) Cloud platforms are increasingly used in the IT industry. IaaS platforms are providers of virtual resources from a catalogue of predefined types. Improvements in virtualization technology make it possible to create and destroy virtual machines on the fly, with a low overhead. As a result, the great benefit of IaaS platforms is the ability to scale a virtual platform on the fly, while only paying for the used resources. From a research point of view, IaaS platforms raise new questions in terms of making efficient virtual platform scaling decisions and then efficiently scheduling applications on dynamic platforms. The current thesis is a step forward towards exploring and answering these questions. The first contribution of the current work is focused on resource management. We have worked on the topic of automatically scaling cloud client applications to meet changing platform usage. There have been various studies showing self-similarities in web platform traffic, which implies the existence of usage patterns that may or may not be periodical. We have developed an automatic platform scaling strategy that predicted platform usage by identifying non-periodic usage patterns and extrapolating future platform usage based on them. Next, we have focused on extending an existing grid platform with on-demand resources from an IaaS platform. We have developed an extension to the DIET (Distributed Interactive Engineering Toolkit) middleware that uses a virtual market based approach to perform resource allocation. Each user is given a sum of virtual currency that he will use for running his tasks. This mechanism helps in ensuring fair platform sharing between users. The third and final contribution targets application management for IaaS platforms. We have studied and developed an allocation strategy for budget-constrained workflow applications that target IaaS Cloud platforms. The workflow abstraction is very common amongst scientific applications. It is easy to find examples in any field from bioinformatics to geology. In this work we have considered a general model of workflow applications that comprise parallel tasks and permit non-deterministic transitions. We have elaborated two budget-constrained allocation strategies for this type of workflow. The problem is a bi-criteria optimization problem as we are optimizing both budget and workflow makespan. This work has been practically validated by implementing it on top of the Nimbus open source cloud platform and the DIET MADAG workflow engine. This is being tested with a cosmological simulation workflow application called RAMSES. This is a parallel MPI application that, as part of this work, has been ported for execution on dynamic virtual platforms. Both theoretical simulations and practical experiments have shown encouraging results and improvements.
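
The usage-prediction idea can be pictured with a small sketch (not the thesis's predictor, and with invented traffic numbers): find the past usage window most similar to the most recent one, use the value that followed it as the forecast, and size the virtual platform accordingly.

```python
# Sketch: nearest-pattern forecast of platform load, then a VM count for it.
requests_per_min = [120, 150, 230, 400, 390, 260, 180, 160,
                    140, 170, 250, 420, 410, 270, 190, 150, 130, 180, 260]
window = 3
capacity_per_vm = 150                      # requests/min one VM can absorb (assumed)

recent = requests_per_min[-window:]
best_next, best_dist = None, float("inf")
for start in range(len(requests_per_min) - window - 1):
    candidate = requests_per_min[start:start + window]
    dist = sum((a - b) ** 2 for a, b in zip(candidate, recent))
    if dist < best_dist:                   # most similar historical window so far
        best_dist, best_next = dist, requests_per_min[start + window]

vms_needed = -(-best_next // capacity_per_vm)   # ceiling division
print(best_next, vms_needed)
```
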
APA, Harvard, Vancouver, ISO, and other styles
16

Qasha, Rawaa Putros Polos. "Automatic deployment and reproducibility of workflow on the Cloud using container virtualization." Thesis, University of Newcastle upon Tyne, 2017. http://hdl.handle.net/10443/4037.

Full text
Abstract:
Cloud computing is a service-oriented approach to distributed computing that has many attractive features, including on-demand access to large compute resources. One type of cloud application is scientific workflows, which are playing an increasingly important role in building applications from heterogeneous components. Workflows are increasingly used in science as a means to capture, share, and publish computational analysis. Clouds can offer a number of benefits to workflow systems, including the dynamic provisioning of the resources needed for computation and storage, which has the potential to dramatically increase the ability to quickly extract new results from the huge amounts of data now being collected. However, there is an increasing number of Cloud computing platforms, each with different functionality and interfaces. It therefore becomes increasingly challenging to define workflows in a portable way so that they can be run reliably on different clouds. As a consequence, workflow developers face the problem of deciding which Cloud to select and - more importantly for the long term - how to avoid vendor lock-in. A further issue that has arisen with workflows is that it is common for them to stop being executable a relatively short time after they were created. This can be due to the external resources required to execute a workflow - such as data and services - becoming unavailable. It can also be caused by changes in the execution environment on which the workflow depends, such as changes to a library causing an error when a workflow service is executed. This "workflow decay" issue is recognised as an impediment to the reuse of workflows and the reproducibility of their results. It is becoming a major problem, as the reproducibility of science is increasingly dependent on the reproducibility of scientific workflows. In this thesis we present new solutions to address these challenges. We propose a new approach to workflow modelling that offers a portable and re-usable description of the workflow using the TOSCA specification language. Our approach addresses portability by allowing workflow components to be systematically specified and automatically deployed on a range of clouds, or in local computing environments, using container virtualisation techniques. To address the issues of reproducibility and workflow decay, our modelling and deployment approach has also been integrated with source control and container management techniques to create a new framework that efficiently supports dynamic workflow deployment, (re-)execution and reproducibility. To improve deployment performance, we extend the framework with a number of new optimisation techniques, and evaluate their effect on a range of real and synthetic workflows.
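
One ingredient of such container-based deployment can be sketched in a few lines (a generic illustration with a hypothetical image and command, not the thesis framework, and it assumes a local Docker installation): each workflow step runs inside an image that pins its software environment, with the data directory mounted from the host.

```python
# Sketch: run one workflow step in a container so its environment is pinned by an image.
import subprocess

def run_step(image, command, host_dir, work_dir="/data"):
    """Run a single workflow step in a container with the host data directory mounted."""
    subprocess.run(["docker", "run", "--rm",
                    "-v", f"{host_dir}:{work_dir}", "-w", work_dir,
                    image, *command], check=True)

# Hypothetical step: any image tag (ideally a digest) fixes the execution environment.
run_step("python:3.11-slim", ["python", "-c", "print('step done')"], host_dir=".")
```
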
APA, Harvard, Vancouver, ISO, and other styles
17

Croubois, Hadrien. "Toward an autonomic engine for scientific workflows and elastic Cloud infrastructure." Thesis, Lyon, 2018. http://www.theses.fr/2018LYSEN061/document.

Full text
Abstract:
The constant development of scientific and industrial computation infrastructures requires the concurrent development of scheduling and deployment mechanisms to manage such infrastructures. Throughout the last decade, the emergence of the Cloud paradigm raised many hopes, but achieving full platform autonomicity is still an ongoing challenge. Work undertaken during this PhD aimed at building a workflow engine that integrated the logic needed to manage workflow execution and Cloud deployment on its own. More precisely, we focus on Cloud solutions with a dedicated Data as a Service (DaaS) data management component. Our objective was to automate the execution of workflows submitted by many users on elastic Cloud resources. This contribution proposes a modular middleware infrastructure and details the implementation of the underlying modules:
• A workflow clustering algorithm that optimises data locality in the context of DaaS-centered communications;
• A dynamic scheduler that executes clustered workflows on Cloud resources;
• A deployment manager that handles the allocation and deallocation of Cloud resources according to the workload characteristics and users' requirements.
All these modules have been implemented in a simulator to analyse their behaviour and measure their effectiveness when running both synthetic and real scientific workflows. We also implemented these modules in the Diet middleware to give it new features and prove the versatility of this approach. Simulations running the WASABI workflow (waves analysis based inference, a framework for the reconstruction of gene regulatory networks) showed that our approach can decrease the deployment cost by up to 44% while meeting the required deadlines.
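
The clustering idea in the first module can be illustrated with a toy grouping (task and file names are invented, and the thesis's algorithm also weighs communication volumes and DaaS access costs): tasks that exchange the same intermediate file are merged into one cluster so the file does not have to round-trip through the DaaS between them.

```python
# Sketch: union-find grouping of tasks that share intermediate files (toy data).
from collections import defaultdict

task_files = {"t1": {"raw.dat"}, "t2": {"raw.dat", "mid.dat"},
              "t3": {"mid.dat"}, "t4": {"other.dat"}}

parent = {t: t for t in task_files}
def find(t):
    while parent[t] != t:
        parent[t] = parent[parent[t]]      # path compression
        t = parent[t]
    return t
def union(a, b):
    parent[find(a)] = find(b)

by_file = defaultdict(list)
for task, files in task_files.items():
    for f in files:
        by_file[f].append(task)
for sharers in by_file.values():           # merge all tasks touching the same file
    for other in sharers[1:]:
        union(sharers[0], other)

clusters = defaultdict(set)
for t in task_files:
    clusters[find(t)].add(t)
print(list(clusters.values()))             # two clusters: {t1, t2, t3} and {t4}
```
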
APA, Harvard, Vancouver, ISO, and other styles
18

Ikken, Sonia. "Efficient placement design and storage cost saving for big data workflow in cloud datacenters." Thesis, Evry, Institut national des télécommunications, 2017. http://www.theses.fr/2017TELE0020/document.

Full text
Abstract:
Typical cloud big data systems are workflow-based, including MapReduce, which has emerged as the paradigm of choice for developing large-scale data-intensive applications. Data generated by such systems are huge, valuable and stored at multiple geographical locations for reuse. Indeed, workflow systems, composed of jobs using collaborative task-based models, present new dependency and intermediate data exchange needs. This gives rise to new issues when selecting distributed data and storage resources, so that the execution of tasks or jobs is on time and resource usage is cost-efficient. Furthermore, the performance of task processing is governed by the efficiency of the intermediate data management. In this thesis we tackle the problem of intermediate data management in cloud multi-datacenters by considering the requirements of the workflow applications generating them. To this aim, we design and develop models and algorithms for the big data placement problem in the underlying geo-distributed cloud infrastructure so that the data management cost of these applications is minimized. The first addressed problem is the study of the intermediate data access behavior of tasks running in a MapReduce-Hadoop cluster. Our approach develops and explores a Markov model that uses the spatial locality of intermediate data blocks and analyzes spill file sequentiality through a prediction algorithm. Secondly, this thesis deals with storage cost minimization of intermediate data placement in federated cloud storage. Through a federation mechanism, we propose an exact ILP algorithm to assist multiple cloud datacenters hosting the generated intermediate data dependencies of pairs of files. The proposed algorithm takes into account scientific user requirements, data dependency and data size. Finally, a more generic problem is addressed in this thesis that involves two variants of the placement problem: splittable and unsplittable intermediate data dependencies. The main goal is to minimize the operational data cost according to inter- and intra-job dependencies.
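
A drastically simplified version of such a placement ILP can be written with an off-the-shelf solver (the model below, with invented sizes, prices and capacities, places each file in exactly one datacenter to minimise storage cost; the thesis's formulation additionally models file dependencies and splittable placements).

```python
# Much-simplified placement ILP sketch with PuLP (requires the pulp package).
from pulp import LpProblem, LpMinimize, LpVariable, LpBinary, lpSum, value

files = {"f1": 40, "f2": 25, "f3": 60}                # size in GB (hypothetical)
datacenters = {"dc_eu": 0.020, "dc_us": 0.023}        # $/GB/month (hypothetical)
capacity = {"dc_eu": 80, "dc_us": 100}                # GB

prob = LpProblem("intermediate_data_placement", LpMinimize)
x = {(f, d): LpVariable(f"x_{f}_{d}", cat=LpBinary)
     for f in files for d in datacenters}

prob += lpSum(files[f] * datacenters[d] * x[f, d] for f in files for d in datacenters)
for f in files:                                       # each file stored exactly once
    prob += lpSum(x[f, d] for d in datacenters) == 1
for d in datacenters:                                 # respect datacenter capacity
    prob += lpSum(files[f] * x[f, d] for f in files) <= capacity[d]

prob.solve()
placement = {f: d for (f, d) in x if value(x[f, d]) > 0.5}
print(placement, value(prob.objective))
```
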
APA, Harvard, Vancouver, ISO, and other styles
19

Ikken, Sonia. "Efficient placement design and storage cost saving for big data workflow in cloud datacenters." Electronic Thesis or Diss., Evry, Institut national des télécommunications, 2017. http://www.theses.fr/2017TELE0020.

Full text
Abstract:
Typical cloud big data systems are workflow-based, including MapReduce, which has emerged as the paradigm of choice for developing large-scale data-intensive applications. The data generated by such systems are huge, valuable and stored at multiple geographical locations for reuse. Indeed, workflow systems, composed of jobs using collaborative task-based models, present new dependency and intermediate data exchange needs. This gives rise to new issues when selecting distributed data and storage resources, so that the execution of tasks or jobs finishes on time and resource usage is cost-efficient. Furthermore, the performance of task processing is governed by the efficiency of intermediate data management. In this thesis we tackle the problem of intermediate data management in cloud multi-datacenters by considering the requirements of the workflow applications that generate them. To this end, we design and develop models and algorithms for the big data placement problem in the underlying geo-distributed cloud infrastructure, so that the data management cost of these applications is minimized. The first problem addressed is the study of the intermediate data access behavior of tasks running in a MapReduce-Hadoop cluster. Our approach develops and explores a Markov model that uses the spatial locality of intermediate data blocks and analyzes spill file sequentiality through a prediction algorithm. Secondly, this thesis deals with storage cost minimization of intermediate data placement in federated cloud storage. Through a federation mechanism, we propose an exact ILP algorithm to assist multiple cloud datacenters hosting the generated intermediate data dependencies of pairs of files. The proposed algorithm takes into account scientific user requirements, data dependency and data size. Finally, a more generic problem is addressed that involves two variants of the placement problem: splittable and unsplittable intermediate data dependencies. The main goal is to minimize the operational data cost according to inter- and intra-job dependencies.
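To make the kind of formulation described above concrete, the following sketch states a small dependency-aware placement problem as an ILP with the PuLP library. It is an illustration only, not the thesis's algorithm; the file sizes, prices, capacities and the single dependency are invented assumptions.

```python
# Illustrative ILP: place intermediate files across federated datacenters while
# minimizing storage cost plus the transfer cost paid when dependent files are split.
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary, value

files = {"f0": 40, "f1": 25, "f2": 60}              # file sizes in GB (assumed)
dcs = {"dc_eu": 0.020, "dc_us": 0.023}              # storage price in $/GB (assumed)
deps = {("f0", "f1"): 30}                           # GB exchanged between dependent files
transfer_price = 0.09                               # $/GB across datacenters (assumed)
capacity = {"dc_eu": 80, "dc_us": 120}              # GB available per datacenter (assumed)

prob = LpProblem("intermediate_data_placement", LpMinimize)
x = {(f, d): LpVariable(f"x_{f}_{d}", cat=LpBinary) for f in files for d in dcs}
z = {(p, d): LpVariable(f"z_{p[0]}_{p[1]}_{d}", cat=LpBinary) for p in deps for d in dcs}

# Objective: storage cost + transfer cost whenever a dependent pair is not co-located.
prob += (lpSum(files[f] * dcs[d] * x[f, d] for f in files for d in dcs)
         + lpSum(deps[p] * transfer_price * (1 - lpSum(z[p, d] for d in dcs)) for p in deps))

for f in files:                                     # each file is stored on exactly one site
    prob += lpSum(x[f, d] for d in dcs) == 1
for d in dcs:                                       # per-site storage capacity
    prob += lpSum(files[f] * x[f, d] for f in files) <= capacity[d]
for (i, j) in deps:                                 # z can be 1 only if both files are on d
    for d in dcs:
        prob += z[(i, j), d] <= x[i, d]
        prob += z[(i, j), d] <= x[j, d]

prob.solve()
print({f: d for (f, d), var in x.items() if value(var) == 1})
```

The auxiliary co-location variables z linearize the "pair split across sites" condition so that the model stays a pure ILP.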
APA, Harvard, Vancouver, ISO, and other styles
20

Perera, Shelan. "Efficient and Cost-effective Workflow Based on Containers for Distributed Reproducible Experiments." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-194209.

Full text
Abstract:
Reproducing distributed experiments is a challenging task for many researchers, and several factors make this problem hard to solve. In order to reproduce distributed experiments, researchers need to perform complex deployments that involve many dependent software stacks with many configurations and manual orchestration. Further, researchers need to allocate a large amount of money for clusters of machines and then spend their valuable time to perform those experiments. Also, some researchers spend a lot of time validating a distributed scenario in a real environment, as most pseudo-distributed setups do not exhibit the characteristics of a real distributed system. Karamel addresses the inconvenience caused by manual orchestration by providing a comprehensive orchestration platform to deploy and run distributed experiments. Still, this solution may incur expenses similar to a manual distributed setup, since it uses virtual machines underneath. Moreover, it does not provide a quick feedback loop for validating a distributed setup, as terminating and provisioning new virtual machines takes considerable time. Therefore, we provide a solution by integrating Docker so that it coexists seamlessly with the virtual-machine-based deployment model. Our solution encapsulates the container-based deployment model so that users can reproduce distributed experiments in a cost-effective and efficient manner. In this project, we introduce a novel container-based deployment model that is not possible with the conventional virtual-machine-based deployment model. Further, we evaluate our solution with a real deployment of the Apache Hadoop TeraSort experiment, a benchmark for the Apache Hadoop MapReduce platform, in order to show how this model can be used to save cost and improve efficiency.
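As a rough illustration of why containers shorten the feedback loop compared with virtual machines, the sketch below provisions and tears down a small experiment cluster with the Docker SDK for Python. The image name and the benchmark script are placeholders, not artifacts of Karamel or of this thesis.

```python
# Minimal sketch of container-based experiment provisioning with the Docker SDK
# (docker-py): create a network, start a few worker containers, run a command,
# and tear everything down in seconds rather than minutes.
import docker

client = docker.from_env()
net = client.networks.create("experiment-net", driver="bridge")

workers = []
for i in range(3):                               # a 3-node toy cluster
    c = client.containers.run(
        "hadoop-worker:latest",                  # assumed, pre-built image
        name=f"worker-{i}",
        network="experiment-net",
        detach=True,
    )
    workers.append(c)

# Run the benchmark inside the first container and read its output.
exit_code, output = workers[0].exec_run("bash run_terasort.sh")   # assumed script
print(exit_code, output.decode()[:200])

for c in workers:                                # containers are removed almost instantly,
    c.remove(force=True)                         # unlike full virtual machines
net.remove()
```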
APA, Harvard, Vancouver, ISO, and other styles
21

Abidi, Leila. "Revisiter les grilles de PCs avec des technologies du Web et le Cloud computing." Thesis, Sorbonne Paris Cité, 2015. http://www.theses.fr/2015USPCD006/document.

Full text
Abstract:
The context of this thesis lies at the intersection of grid computing, the new Web technologies, and Clouds and on-demand services. Since their advent in the 1990s, distributed platforms, and more precisely Grid Computing systems, have evolved continuously, giving rise to multiple research efforts. Desktop Grids have been proposed as an alternative to supercomputers through the federation of thousands of desktop computers. The implementation details of such a grid architecture, in terms of resource-sharing mechanisms, remain very hard to pin down. Meanwhile, the Web has completely changed the way we access information and has become an essential part of our daily lives. Equipment, in turn, has evolved from desktops and laptops to tablets, media players, game consoles, smartphones and NetPCs. This evolution requires adapting and rethinking the Desktop Grid applications and middleware developed in recent years. Our contribution is a Desktop Grid middleware called RedisDG. In its operation, RedisDG remains similar to most grid computing middleware: it can execute applications in the form of "bags of tasks" in a distributed environment, monitor the nodes, and validate and certify the results. The innovation of RedisDG lies in the integration of formal modeling and verification into its design phase, which is unconventional but highly relevant in our domain. Our approach is to rethink Desktop Grids from a reflection and a formal framework that allow them to be developed rigorously and to better accommodate future technological developments. RedisDG is able to operate on small devices such as smartphones and tablets as well as on more traditional devices (PCs). The system is entirely based on the publish-subscribe paradigm; it is developed in Python and uses Redis as an advanced key-value cache and store.
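Since RedisDG builds on publish/subscribe over Redis, a minimal sketch of that coordination pattern with the redis-py client is given below; the channel names and message fields are illustrative assumptions, not RedisDG's actual protocol.

```python
# Sketch of the publish/subscribe pattern underlying a Desktop Grid coordinator.
import json
import redis

r = redis.Redis(host="localhost", port=6379)

def submit_bag_of_tasks(tasks):
    """Coordinator side: publish each independent task to the 'tasks' channel."""
    for t in tasks:
        r.publish("tasks", json.dumps(t))

def worker_loop():
    """Worker side (PC, tablet or smartphone): consume tasks and report results."""
    sub = r.pubsub()
    sub.subscribe("tasks")
    for msg in sub.listen():
        if msg["type"] != "message":
            continue
        task = json.loads(msg["data"])
        result = {"task_id": task["id"], "status": "done"}   # placeholder execution
        r.publish("results", json.dumps(result))
```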
APA, Harvard, Vancouver, ISO, and other styles
22

Miccoli, Roberta. "Implementation of a complete sensor data collection and edge-cloud communication workflow within the WeLight project." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/22563/.

Full text
Abstract:
This thesis aims at developing the full workflow of data collection from a laser sensor connected to a mobile application, working as an edge device, which subsequently transmits the data to a Cloud platform for analysis and processing. The project is part of the We Light (WErable LIGHTing for smart apparels) project, in collaboration with TTLab of the INFN (National Institute of Nuclear Physics). The goal of We Light is to create an intelligent sports shirt, equipped with sensors that take information from the external environment and send it to a mobile device. The latter then sends the data via an application to an open-source Cloud platform in order to create a real IoT system. The smart T-shirt is capable of emitting different levels of light depending on the perceived external light, with the aim of ensuring greater safety for people who practice sport on the road. The thesis objective is to employ a prototype board provided by the CNR-IMAMOTER to collect data and send it to the specially created application via a Bluetooth Low Energy connection. Furthermore, the connection between the edge device and the ThingsBoard (TB) IoT platform is established via the MQTT protocol. Several device authentication techniques are implemented on TB and a dedicated dashboard is created to display data from the IoT device; the user is also able to view data in numerical and even graphical form directly in the application without necessarily having to access TB. The app created is useful and versatile and can be adapted for other IoT purposes, not only within the We Light project.
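For readers unfamiliar with the edge-to-cloud leg described above, the following hedged sketch publishes one telemetry sample to a ThingsBoard instance over MQTT using paho-mqtt (1.x client API). The host, the device access token and the telemetry fields are assumptions; only the v1/devices/me/telemetry topic and the token-as-username convention follow ThingsBoard's standard device API.

```python
# Publish one sensor reading from the edge device to ThingsBoard over MQTT.
import json
import paho.mqtt.client as mqtt

THINGSBOARD_HOST = "demo.thingsboard.io"     # assumed instance
ACCESS_TOKEN = "DEVICE_ACCESS_TOKEN"         # device credential issued by ThingsBoard

client = mqtt.Client()
client.username_pw_set(ACCESS_TOKEN)         # the token is sent as the MQTT username
client.connect(THINGSBOARD_HOST, 1883, keepalive=60)
client.loop_start()

sample = {"lux": 312, "led_level": 2}        # illustrative light reading and LED state
client.publish("v1/devices/me/telemetry", json.dumps(sample), qos=1)

client.loop_stop()
client.disconnect()
```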
APA, Harvard, Vancouver, ISO, and other styles
23

Carrión, Collado Abel Antonio. "Management of generic and multi-platform workflows for exploiting heterogeneous environments on e-Science." Doctoral thesis, Universitat Politècnica de València, 2017. http://hdl.handle.net/10251/86179.

Full text
Abstract:
Scientific Workflows (SWFs) are widely used to model applications in e-Science. In this programming model, scientific applications are described as a set of tasks that have dependencies among them. During the last decades, the execution of scientific workflows has been successfully performed in the available computing infrastructures (supercomputers, clusters and grids) using software programs called Workflow Management Systems (WMSs), which orchestrate the workload on top of these computing infrastructures. However, because each computing infrastructure has its own architecture and each scientific applications exploits efficiently one of these infrastructures, it is necessary to organize the way in which they are executed. WMSs need to get the most out of all the available computing and storage resources. Traditionally, scientific workflow applications have been extensively deployed in high-performance computing infrastructures (such as supercomputers and clusters) and grids. But, in the last years, the advent of cloud computing infrastructures has opened the door of using on-demand infrastructures to complement or even replace local infrastructures. However, new issues have arisen, such as the integration of hybrid resources or the compromise between infrastructure reutilization and elasticity, everything on the basis of cost-efficiency. The main contribution of this thesis is an ad-hoc solution for managing workflows exploiting the capabilities of cloud computing orchestrators to deploy resources on demand according to the workload and to combine heterogeneous cloud providers (such as on-premise clouds and public clouds) and traditional infrastructures (supercomputers and clusters) to minimize costs and response time. The thesis does not propose yet another WMS, but demonstrates the benefits of the integration of cloud orchestration when running complex workflows. The thesis shows several configuration experiments and multiple heterogeneous backends from a realistic comparative genomics workflow called Orthosearch, to migrate memory-intensive workload to public infrastructures while keeping other blocks of the experiment running locally. The running time and cost of the experiments is computed and best practices are suggested.<br>Los flujos de trabajo científicos son comúnmente usados para modelar aplicaciones en e-Ciencia. En este modelo de programación, las aplicaciones científicas se describen como un conjunto de tareas que tienen dependencias entre ellas. Durante las últimas décadas, la ejecución de flujos de trabajo científicos se ha llevado a cabo con éxito en las infraestructuras de computación disponibles (supercomputadores, clústers y grids) haciendo uso de programas software llamados Gestores de Flujos de Trabajos, los cuales distribuyen la carga de trabajo en estas infraestructuras de computación. Sin embargo, debido a que cada infraestructura de computación posee su propia arquitectura y cada aplicación científica explota eficientemente una de estas infraestructuras, es necesario organizar la manera en que se ejecutan. Los Gestores de Flujos de Trabajo necesitan aprovechar el máximo todos los recursos de computación y almacenamiento disponibles. Habitualmente, las aplicaciones científicas de flujos de trabajos han sido ejecutadas en recursos de computación de altas prestaciones (tales como supercomputadores y clústers) y grids. 
Sin embargo, en los últimos años, la aparición de las infraestructuras de computación en la nube ha posibilitado el uso de infraestructuras bajo demanda para complementar o incluso reemplazar infraestructuras locales. No obstante, este hecho plantea nuevas cuestiones, tales como la integración de recursos híbridos o el compromiso entre la reutilización de la infraestructura y la elasticidad, todo ello teniendo en cuenta que sea eficiente en el coste. La principal contribución de esta tesis es una solución ad-hoc para gestionar flujos de trabajos explotando las capacidades de los orquestadores de recursos de computación en la nube para desplegar recursos bajo demando según la carga de trabajo y combinar proveedores de computación en la nube heterogéneos (privados y públicos) e infraestructuras tradicionales (supercomputadores y clústers) para minimizar el coste y el tiempo de respuesta. La tesis no propone otro gestor de flujos de trabajo más, sino que demuestra los beneficios de la integración de la orquestación de la computación en la nube cuando se ejecutan flujos de trabajo complejos. La tesis muestra experimentos con diferentes configuraciones y múltiples plataformas heterogéneas, haciendo uso de un flujo de trabajo real de genómica comparativa llamado Orthosearch, para traspasar cargas de trabajo intensivas de memoria a infraestructuras públicas mientras se mantienen otros bloques del experimento ejecutándose localmente. El tiempo de respuesta y el coste de los experimentos son calculados, además de sugerir buenas prácticas.<br>Els fluxos de treball científics són comunament usats per a modelar aplicacions en e-Ciència. En aquest model de programació, les aplicacions científiques es descriuen com un conjunt de tasques que tenen dependències entre elles. Durant les últimes dècades, l'execució de fluxos de treball científics s'ha dut a terme amb èxit en les infraestructures de computació disponibles (supercomputadors, clústers i grids) fent ús de programari anomenat Gestors de Fluxos de Treballs, els quals distribueixen la càrrega de treball en aquestes infraestructures de computació. No obstant açò, a causa que cada infraestructura de computació posseeix la seua pròpia arquitectura i cada aplicació científica explota eficientment una d'aquestes infraestructures, és necessari organitzar la manera en què s'executen. Els Gestors de Fluxos de Treball necessiten aprofitar el màxim tots els recursos de computació i emmagatzematge disponibles. Habitualment, les aplicacions científiques de fluxos de treballs han sigut executades en recursos de computació d'altes prestacions (tals com supercomputadors i clústers) i grids. No obstant açò, en els últims anys, l'aparició de les infraestructures de computació en el núvol ha possibilitat l'ús d'infraestructures sota demanda per a complementar o fins i tot reemplaçar infraestructures locals. No obstant açò, aquest fet planteja noves qüestions, tals com la integració de recursos híbrids o el compromís entre la reutilització de la infraestructura i l'elasticitat, tot açò tenint en compte que siga eficient en el cost. La principal contribució d'aquesta tesi és una solució ad-hoc per a gestionar fluxos de treballs explotant les capacitats dels orquestadors de recursos de computació en el núvol per a desplegar recursos baix demande segons la càrrega de treball i combinar proveïdors de computació en el núvol heterogenis (privats i públics) i infraestructures tradicionals (supercomputadors i clústers) per a minimitzar el cost i el temps de resposta. 
La tesi no proposa un gestor de fluxos de treball més, sinó que demostra els beneficis de la integració de l'orquestració de la computació en el núvol quan s'executen fluxos de treball complexos. La tesi mostra experiments amb diferents configuracions i múltiples plataformes heterogènies, fent ús d'un flux de treball real de genòmica comparativa anomenat Orthosearch, per a traspassar càrregues de treball intensives de memòria a infraestructures públiques mentre es mantenen altres blocs de l'experiment executant-se localment. El temps de resposta i el cost dels experiments són calculats, a més de suggerir bones pràctiques.<br>Carrión Collado, AA. (2017). Management of generic and multi-platform workflows for exploiting heterogeneous environments on e-Science [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/86179<br>TESIS
APA, Harvard, Vancouver, ISO, and other styles
24

Bux, Marc Nicolas. "Scientific Workflows for Hadoop." Doctoral thesis, Humboldt-Universität zu Berlin, 2018. http://dx.doi.org/10.18452/19321.

Full text
Abstract:
Scientific workflows provide a means to model, execute, and exchange the increasingly complex analysis pipelines necessary for today's data-driven science. Over the last decades, scientific workflow management systems have emerged to facilitate the design, execution, and monitoring of such workflows. At the same time, the amounts of data generated in various areas of science have outpaced hardware advancements. Parallelization and distributed execution are generally proposed to deal with increasing amounts of data. However, the resources provided by distributed infrastructures are subject to heterogeneity, dynamic performance changes at runtime, and occasional failures. To leverage the scalability provided by these infrastructures despite the observed aspects of performance variability, workflow management systems have to progress: parallelization potential in scientific workflows has to be detected and exploited; simulation frameworks, which are commonly employed for the evaluation of scheduling mechanisms, have to consider the instability encountered on the infrastructures they emulate; adaptive scheduling mechanisms have to be employed to optimize resource utilization in the face of instability; and state-of-the-art systems for scalable distributed resource management and storage, such as Apache Hadoop, have to be supported. This dissertation presents novel solutions for these requirements. First, we introduce DynamicCloudSim, a cloud computing simulation framework that is able to adequately model the various aspects of variability encountered in computational clouds. Secondly, we outline ERA, an adaptive scheduling policy that optimizes workflow makespan by exploiting heterogeneity, replicating bottlenecks in workflow execution, and adapting to changes in the underlying infrastructure. Finally, we present Hi-WAY, an execution engine that integrates ERA and enables the highly scalable execution of scientific workflows written in a number of languages on Hadoop.
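The replication idea behind adaptive policies such as ERA can be pictured with a small, self-contained Python sketch: when a task looks like a straggler, a replica is launched and whichever copy finishes first wins. The timings and the thread-pool setup are invented for illustration and are not the dissertation's scheduler.

```python
# Toy speculative replication: re-launch a lagging task on an idle (fast) slot.
import time
import random
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

def run_task(task_id, slowdown):
    time.sleep(0.1 * slowdown)                 # emulate heterogeneous node speed
    return task_id, slowdown

def run_with_speculation(task_id, executor):
    primary = executor.submit(run_task, task_id, random.uniform(1, 5))
    done, _ = wait([primary], timeout=0.2)     # give the primary a grace period
    if not done:                               # primary looks like a straggler:
        replica = executor.submit(run_task, task_id, 1.0)   # replicate on a fast node
        done, _ = wait([primary, replica], return_when=FIRST_COMPLETED)
    return next(iter(done)).result()           # keep whichever copy finished first

with ThreadPoolExecutor(max_workers=8) as pool:
    for tid in range(5):
        print("finished", run_with_speculation(tid, pool))
```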
APA, Harvard, Vancouver, ISO, and other styles
25

Castelán, Maldonado Edgar. "Modelo para el diseño de sistemas gestores de workflows con funcionalidades colaborativas, cloud y móviles." Doctoral thesis, Universitat Politècnica de Catalunya, 2014. http://hdl.handle.net/10803/283574.

Full text
Abstract:
This research was developed in the context of WfMS (Workflow Management Systems), mobile applications, cloud computing, and collaborative systems. Currently the design of WfMS is based on the reference model proposed by the WfMC (Workflow Management Coalition). The problem that exists today in the design and development of WfMS is that the reference model proposed by the WfMC was designed many years before the rise of mobile technologies, cloud computing and collaborative systems. It is important to create a new model for the design of WfMS, taking into account the new technological features and functionalities offered by mobile devices, collaborative and cloud computing services, along with the new paradigms of collaboration that can emerge when using these technological solutions. The general objective of this research is to obtain a model for the design of WfMS with collaborative, cloud and mobile functionalities. For the development of this research we use the design-science research paradigm from the field of information systems. It is fully oriented to problem solving and has as its main goal the development and evaluation of artifacts that serve a practical purpose; these artifacts must be relevant and novel in their application environment. The steps undertaken to carry out this research with the design-science methodology were: 1) A problem in the field of WfMS was identified. 2) The features the artifact should have in order to solve the problem were proposed, as well as how the artifact would be represented, in this case as a model. 3) The design processes that should be used to build the artifact were identified and proposed. 4) The artifact and the design process were theoretically justified with methodologies and models widely accepted in the field of information systems. 5) During the design cycle a model for evaluating software architectures was applied. 6) In order to introduce the artifact into its field of application we carried out an implementation of the model, resulting in a WfMS mobile application with collaborative, cloud and mobile functionalities. We conducted a Delphi study to assess the functionalities of the new artifact and demonstrate its utility in its field of application. 7) The result of this research adds to the knowledge base a new model for the design and development of WfMS with collaborative, cloud and mobile functionalities. An article with the results of this research was published and presented to the scientific community. 8) The main objective and all the objectives of this research have been satisfactorily completed; this research has shown that the designed artifact solves the proposed problem and provides utility in the field of WfMS. This research proposes a new methodology for conducting a Delphi study using the WfMS mobile application developed in this research. Each of the Delphi questionnaires is carried out through a workflow, and cloud collaborative tools are used to store both the questionnaires and the results of their evaluation. This research makes the following contributions: a model for the design of WfMS with collaborative, cloud and mobile functionalities; a Concrete Architecture resulting from the implementation of the new model; a Software Architecture for the development of WfMS resulting from the implementation of the Concrete Architecture; a WfMS mobile application for the iOS platform resulting from the implementation of the software architecture; and a methodology for conducting a Delphi study using a WfMS mobile application with cloud collaborative tools.
APA, Harvard, Vancouver, ISO, and other styles
26

Silva, Jefferson de Carvalho. "A framework for building component-based applications on a cloud computing platform for high performance computing services." Universidade Federal do Ceará, 2016. http://www.teses.ufc.br/tde_busca/arquivo.php?codArquivo=16543.

Full text
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior. Developing High Performance Computing (HPC) applications that optimally access the available computing resources at a higher level of abstraction is a challenge for many scientists. To address this problem, we present a proposal for a component computing cloud called HPC Shelf, on which HPC applications run, and the SAFe framework, a front-end aimed at creating applications on HPC Shelf and the author's main contribution. SAFe is based on Scientific Workflow Management System (SWMS) projects and allows the specification of computational solutions formed by components to solve problems specified by the expert user through a high-level interface. For that purpose, it implements SAFeSWL, an architectural and orchestration description language for describing scientific workflows. Compared with other SWMS alternatives, besides freeing expert users from concerns about the construction of parallel and efficient computational solutions from the components offered by the cloud, SAFe integrates a system of contextual contracts aligned with a system of dynamic discovery (resolution) of components. In addition, SAFeSWL allows explicit control of the life cycle stages (resolution, deployment, instantiation and execution) of components through embedded operators, aimed at optimizing the use of cloud resources and minimizing the overall execution cost of computational solutions (workflows). Montage and Map/Reduce are the case studies applied for demonstration, evaluation and validation of the particular features of SAFe in building HPC applications aimed at the HPC Shelf platform.
APA, Harvard, Vancouver, ISO, and other styles
27

Senna, Carlos Roberto 1956. "CEO - uma infraestrutura para orquestração de workflows de serviços em ambientes computacionais híbridos." [s.n.], 2014. http://repositorio.unicamp.br/jspui/handle/REPOSIP/275535.

Full text
Abstract:
Advisor: Edmundo Roberto Mauro Madeira. Doctoral thesis, Universidade Estadual de Campinas, Instituto de Computação. Over the last few years good results have been achieved in parallel and distributed computing initiatives that culminated in computational clouds, transforming the Internet into a virtual processing plant. Complex applications organized as workflows are in general candidates for parallelization and can have their performance strongly improved when executed on grids, clouds, or hybrid environments. However, a significant part of preparing this environment (software and hardware) is still up to the user before all that computing power can effectively be used to execute workflows. This thesis presents an infrastructure for managing the execution of workflows of tightly coupled services in hybrid environments composed of grids and computational clouds. The infrastructure, called Cloud Execution Orchestration (CEO), consists of a middleware that manages the execution of workflows and a language, the CEO Language (CEOL), specially designed for building service workflows for these environments. The proposed infrastructure supports the execution of workflows interacting with the domains of the environment (grids and clouds) transparently, without any user intervention. With CEOL the user can build abstract workflows, without specifying the location of computational resources, since resources are chosen in conjunction with the scheduler and duly prepared by CEO before execution. Besides functionality for provisioning services on demand during workflow execution, the macro architecture facilitates connecting private clouds with public clouds and supports parallel processing, as it can operate in fully hybrid environments formed by the combination of computational grids and clouds.
APA, Harvard, Vancouver, ISO, and other styles
28

Muresan, Adrian. "Ordonnancement et déploiement d'applications de gestion de données à grande échelle sur des plates-formes de type Clouds." PhD thesis, École normale supérieure de Lyon - ENS LYON, 2012. http://tel.archives-ouvertes.fr/tel-00793092.

Full text
Abstract:
L'usage des plateformes de Cloud Computing offrant une Infrastructure en tant que service (IaaS) a augmenté au sein de l'industrie. Les infrastructures IaaS fournissent des ressources virtuelles depuis un catalogue de types prédéfinis. Les avancées dans le domaine de la virtualisation rendent possible la création et la destruction de machines virtuelles au fur et à mesure, avec un faible surcout d'exploitation. En conséquence, le bénéfice offert par les plate-formes IaaS est la possibilité de dimensionner une architecture virtuelle au fur et à mesure de l'utilisation, et de payer uniquement les ressources utilisées. D'un point de vue scientifique, les plateformes IaaS soulèvent de nouvelles questions concernant l'efficacité des décisions prises en terme de passage à l'échelle, et également l'ordonnancement des applications sur les plateformes dynamiques. Les travaux de cette thèse explorent ce thème et proposent des solutions à ces deux problématiques. La première contribution décrite dans cette thèse concerne la gestion des ressources. Nous avons travaillé sur le redimensionnement automatique des applications clientes de Cloud afin de modéliser les variations d'utilisation de la plateforme. De nombreuses études ont montré des autosimilarités dans le trafic web des plateformes, ce qui implique l'existence de motifs répétitifs pouvant être périodiques ou non. Nous avons développé une stratégie automatique de dimensionnement, capable de prédire le temps d'utilisation de la plateforme en identifiant les motifs répétitifs non périodiques. Dans un second temps, nous avons proposé d'étendre les fonctionnalités d'un intergiciel de grilles, en implémentant une utilisation des ressources à la demandes.Nous avons développé une extension pour l'intergiciel DIET (Distributed Interactive Engineering Toolkit), qui utilise un marché virtuel pour gérer l'allocation des ressources. Chaque utilisateur se voit attribué un montant de monnaie virtuelle qu'il utilisera pour exécuter ses tâches. Le mécanisme d'aide assure un partage équitable des ressources de la plateforme entre les différents utilisateurs. La troisième et dernière contribution vise la gestion d'applications pour les plateformes IaaS. Nous avons étudié et développé une stratégie d'allocation des ressources pour les applications de type workflow avec des contraintes budgétaires. L'abstraction des applications de type workflow est très fréquente au sein des applications scientifiques, dans des domaines variés allant de la géologie à la bioinformatique. Dans ces travaux, nous avons considéré un modèle général d'applications de type workflow qui contient des tâches parallèles et permet des transitions non déterministes. Nous avons élaboré deux stratégies d'allocations à contraintes budgétaires pour ce type d'applications. Le problème est une optimisation à deux critères dans la mesure où nous optimisons le budget et le temps total du flux d'opérations. Ces travaux ont été validés de façon expérimentale par leurs implémentations au sein de la plateforme de Cloud libre Nimbus et de moteur de workflow MADAG présent au sein de DIET. Les tests ont été effectuées sur une simulation de cosmologie appelée RAMSES. RAMSES est une application parallèle qui, dans le cadre de ces travaux, a été portée sur des plateformes virtuelles dynamiques. L'ensemble des résultats théoriques et pratiques ont débouché sur des résultats encourageants et des améliorations.
APA, Harvard, Vancouver, ISO, and other styles
29

Schroeter, Julia. "Feature-based configuration management of reconfigurable cloud applications." Doctoral thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2014. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-141415.

Full text
Abstract:
A recent trend in software industry is to provide enterprise applications in the cloud that are accessible everywhere and on any device. As the market is highly competitive, customer orientation plays an important role. Companies therefore start providing applications as a service, which are directly configurable by customers in an online self-service portal. However, customer configurations are usually deployed in separated application instances. Thus, each instance is provisioned manually and must be maintained separately. Due to the induced redundancy in software and hardware components, resources are not optimally utilized. A multi-tenant aware application architecture eliminates redundancy, as a single application instance serves multiple customers renting the application. The combination of a configuration self-service portal with a multi-tenant aware application architecture allows serving customers just-in-time by automating the deployment process. Furthermore, self-service portals improve application scalability in terms of functionality, as customers can adapt application configurations on themselves according to their changing demands. However, the configurability of current multi-tenant aware applications is rather limited. Solutions implementing variability are mainly developed for a single business case and cannot be directly transferred to other application scenarios. The goal of this thesis is to provide a generic framework for handling application variability, automating configuration and reconfiguration processes essential for self-service portals, while exploiting the advantages of multi-tenancy. A promising solution to achieve this goal is the application of software product line methods. In software product line research, feature models are in wide use to express variability of software intense systems on an abstract level, as features are a common notion in software engineering and prominent in matching customer requirements against product functionality. This thesis introduces a framework for feature-based configuration management of reconfigurable cloud applications. The contribution is three-fold. First, a development strategy for flexible multi-tenant aware applications is proposed, capable of integrating customer configurations at application runtime. Second, a generic method for defining concern-specific configuration perspectives is contributed. Perspectives can be tailored for certain application scopes and facilitate the handling of numerous configuration options. Third, a novel method is proposed to model and automate structured configuration processes that adapt to varying stakeholders and reduce configuration redundancies. Therefore, configuration processes are modeled as workflows and adapted by applying rewrite rules triggered by stakeholder events. The applicability of the proposed concepts is evaluated in different case studies in the industrial and academic context. Summarizing, the introduced framework for feature-based configuration management is a foundation for automating configuration and reconfiguration processes of multi-tenant aware cloud applications, while enabling application scalability in terms of functionality.
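As a toy illustration of feature-based configuration checking (not the framework proposed in the thesis), the sketch below validates a tenant's configuration against a hand-written feature model with mandatory, requires and excludes relations; all feature names are invented.

```python
# Validate a tenant configuration (a set of selected features) against a tiny,
# hand-written feature model: mandatory features, requires- and excludes-relations.
FEATURES = {"app", "reporting", "export_pdf", "export_csv", "sso", "ldap"}
MANDATORY = {"app"}
REQUIRES = {"export_pdf": "reporting", "export_csv": "reporting", "ldap": "sso"}
EXCLUDES = [("export_pdf", "export_csv")]       # tenants pick at most one export format

def is_valid(config):
    if not config <= FEATURES:                  # unknown features are rejected
        return False
    if not MANDATORY <= config:                 # mandatory features must be selected
        return False
    if any(f in config and req not in config for f, req in REQUIRES.items()):
        return False
    if any(a in config and b in config for a, b in EXCLUDES):
        return False
    return True

print(is_valid({"app", "reporting", "export_pdf", "sso", "ldap"}))   # True
print(is_valid({"app", "export_csv"}))                               # False: needs reporting
```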
APA, Harvard, Vancouver, ISO, and other styles
30

Genez, Thiago Augusto Lopes 1987. "Escalonamento de workflows para provedores de SaaS/PaaS considerando dois níveis de SLA." [s.n.], 2012. http://repositorio.unicamp.br/jspui/handle/REPOSIP/275686.

Full text
Abstract:
Advisor: Edmundo Roberto Mauro Madeira. Master's dissertation, Universidade Estadual de Campinas, Instituto de Computação. Cloud computing offers utility computing according to the user's needs on a "pay-per-use" basis. Customers can make use of the cloud via Software as a Service (SaaS), Platform as a Service (PaaS), or Infrastructure as a Service (IaaS) providers. ...Note: The complete abstract is available with the full electronic document
APA, Harvard, Vancouver, ISO, and other styles
31

De, Paris Renata. "An effective method to optimize docking-based virtual screening in a clustered fully-flexible receptor model deployed on cloud platforms." Pontifícia Universidade Católica do Rio Grande do Sul, 2016. http://tede2.pucrs.br/tede2/handle/tede/7329.

Full text
Abstract:
Conselho Nacional de Pesquisa e Desenvolvimento Científico e Tecnológico - CNPq. The use of conformations obtained from molecular dynamics trajectories in molecular docking experiments is the most accurate approach to simulate the behavior of receptors and ligands in molecular environments. However, such simulations are computationally expensive and their execution may become an infeasible task due to the large amount of structural information typically considered to represent the explicit flexibility of receptors. In addition, the computational demand increases when Fully-Flexible Receptor (FFR) models are routinely applied for the screening of large compound libraries. This study presents a novel method to optimize docking-based virtual screening of FFR models by reducing the size of FFR models at docking runtime and scaling docking workflow invocations out onto virtual machines from cloud platforms. For this purpose, we developed e-FReDock, a cloud-based scientific workflow that assists in faster high-throughput docking simulations of flexible receptors and ligands. e-FReDock is based on a free-parameter selective method to perform ensemble docking experiments with multiple ligands from a clustered FFR model. The e-FReDock input data were generated by applying six clustering methods for partitioning conformations with different features in their substrate-binding cavities, aiming at identifying groups of snapshots with favorable interactions for specific ligands at docking runtime. Experimental results show the high quality of the Reduced Fully-Flexible Receptor (RFFR) models achieved by e-FReDock in two distinct sets of analyses. The first analysis shows that e-FReDock is able to preserve the quality of the FFR model between 84.00% and 94.00%, while reducing its dimensionality by 49.68% on average. The second analysis reports that the resulting RFFR models are able to reach better docking results than those obtained from the rigid version of the FFR model for 97.00% of the ligands tested.
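The clustering step that produces a reduced ensemble of snapshots can be pictured with a short scikit-learn sketch; the random descriptors of the binding cavity and the number of clusters are assumptions made purely for illustration.

```python
# Group MD snapshots by similarity of (assumed) binding-cavity descriptors, then keep
# one representative conformation per cluster to form a reduced receptor ensemble.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
snapshots = rng.random((500, 6))                 # 500 snapshots x 6 cavity descriptors

km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(snapshots)
representatives = []
for c in range(km.n_clusters):
    members = np.where(km.labels_ == c)[0]
    centre = km.cluster_centers_[c]
    # representative = member closest to the cluster centre
    closest = members[np.argmin(np.linalg.norm(snapshots[members] - centre, axis=1))]
    representatives.append(int(closest))

print("snapshots kept for docking:", sorted(representatives))
```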
APA, Harvard, Vancouver, ISO, and other styles
32

Kašpar, Jan. "Posouzení informačního systému firmy a návrh změn." Master's thesis, Vysoké učení technické v Brně. Fakulta podnikatelská, 2016. http://www.nusl.cz/ntk/nusl-241578.

Full text
Abstract:
The Master's thesis deals with the design of the information system of the company KLAS Nekoř a.s. Based on an analysis, I present suggestions to improve the company's information system. The thesis assesses the current state of the company's information system and establishes requirements for the selection of the optimal solution. It also describes the steps for implementing a new information system at KLAS Nekoř a.s.
APA, Harvard, Vancouver, ISO, and other styles
33

Liu, Ke. "Scheduling algorithms for instance-intensive cloud workflows." Swinburne Research Bank, 2009. http://hdl.handle.net/1959.3/68752.

Full text
Abstract:
Thesis (PhD) - Swinburne University of Technology, Faculty of Engineering and Industrial Sciences, Centre for Complex Software Systems and Services, 2009. A thesis submitted to CS3 - Centre for Complex Software Systems and Services, Faculty of Engineering and Industrial Sciences, Swinburne University of Technology for the degree of Doctor of Philosophy, 2009. Typescript. "June 2009". Bibliography: p. 122-135.
APA, Harvard, Vancouver, ISO, and other styles
34

Liu, Ji. "Gestion multisite de workflows scientifiques dans le cloud." Thesis, Montpellier, 2016. http://www.theses.fr/2016MONTT260/document.

Full text
Abstract:
Large-scale in silico scientific experiments generally contain multiple computational activities to process big data. Scientific Workflows (SWfs) enable scientists to model the data processing activities. Since SWfs deal with large amounts of data, data-intensive SWfs are an important issue. In a data-intensive SWf, the activities are related by data or control dependencies and one activity may consist of multiple tasks to process different parts of the experimental data. In order to automatically execute data-intensive SWfs, Scientific Workflow Management Systems (SWfMSs) can be used to exploit High Performance Computing (HPC) environments provided by a cluster, grid or cloud. In addition, SWfMSs generate provenance data for tracing the execution of SWfs. Since a cloud offers stable services, diverse resources, and virtually infinite computing and storage capacity, it becomes an interesting infrastructure for SWf execution. Clouds basically provide three types of services, i.e. Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS) and Software-as-a-Service (SaaS). SWfMSs can be deployed in the cloud using Virtual Machines (VMs) to execute data-intensive SWfs. With a pay-as-you-go method, the users of clouds do not need to buy physical machines and the maintenance of the machines is ensured by the cloud providers. Nowadays, a cloud is typically made of several sites (or data centers), each with its own resources and data. Since a data-intensive SWf may process distributed data at different sites, the SWf execution should be adapted to multisite clouds while using distributed computing or storage resources. In this thesis, we study methods to execute data-intensive SWfs in a multisite cloud environment. Some SWfMSs already exist, while most of them are designed for computer clusters, grids or a single cloud site. In addition, the existing approaches are limited to static computing resources or single-site execution. We propose SWf partitioning algorithms and a task scheduling algorithm for SWf execution in a multisite cloud. Our proposed algorithms can significantly reduce the overall SWf execution time in a multisite cloud. In particular, we propose a general solution based on multi-objective scheduling in order to execute SWfs in a multisite cloud. The general solution is composed of a cost model, a VM provisioning algorithm, and an activity scheduling algorithm. The VM provisioning algorithm is based on our proposed cost model to generate VM provisioning plans to execute SWfs at a single cloud site. The activity scheduling algorithm enables SWf execution with the minimum cost, composed of execution time and monetary cost, in a multisite cloud. We made extensive experiments and the results show that our algorithms can reduce considerably the overall cost of SWf execution in a multisite cloud.
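A minimal sketch of the kind of weighted cost model such a multi-objective approach relies on is given below; the weights, expected bounds and the two candidate provisioning plans are invented for illustration and do not reproduce the thesis's actual model.

```python
# Combined cost of a provisioning plan: weighted sum of normalized execution time
# and normalized monetary cost (lower is better).
def combined_cost(exec_time, money, desired_time, desired_money, w_time=0.5):
    """w_time in [0, 1] balances execution time against monetary expense."""
    w_money = 1.0 - w_time
    return w_time * (exec_time / desired_time) + w_money * (money / desired_money)

# Compare two hypothetical provisioning plans for the same workflow.
plans = {
    "few_large_vms": {"exec_time": 3600.0, "money": 4.80},
    "many_small_vms": {"exec_time": 5400.0, "money": 2.10},
}
for name, p in plans.items():
    c = combined_cost(p["exec_time"], p["money"],
                      desired_time=4000.0, desired_money=5.0, w_time=0.6)
    print(name, round(c, 3))
```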
APA, Harvard, Vancouver, ISO, and other styles
35

Jiang, Qingye. "Executing Large Scale Scientific Workflows in Public Clouds." Thesis, The University of Sydney, 2015. http://hdl.handle.net/2123/13888.

Full text
Abstract:
Scientists in different fields, such as high-energy physics, earth science, and astronomy, are developing large-scale workflow applications. In many use cases, scientists need to run a set of interrelated but independent workflows (i.e., workflow ensembles) for the entire scientific analysis. As a workflow ensemble usually contains many sub-workflows, each with hundreds or thousands of jobs under precedence constraints, executing such an ensemble raises significant cost concerns even with elastic, pay-as-you-go cloud resources. In this thesis, we develop a set of methods to optimize the execution of large-scale scientific workflows in public clouds under both cost and deadline constraints, using a two-step approach. Firstly, we present a set of methods to optimize the execution of scientific workflows in public clouds, with the Montage astronomical mosaic engine running on Amazon EC2 as an example. Secondly, we address three main challenges in realizing the benefits of public clouds when executing large-scale workflow ensembles: (1) execution coordination, (2) resource provisioning, and (3) data staging. To this end, we develop a new pulling-based workflow execution system with a profiling-based resource provisioning strategy. Our results show that our system can achieve an 80% speed-up, by removing scheduling overhead, compared to the well-known Pegasus workflow management system when running scientific workflow ensembles. In addition, our evaluation using Montage workflow ensembles on Amazon EC2 clusters of around 1,000 cores demonstrates the cost-effectiveness of our resource provisioning strategy within the deadline.
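As a purely illustrative aside, the "pulling-based" execution idea mentioned above can be pictured as workers fetching ready jobs from a shared queue rather than a central scheduler pushing work to them; the minimal sketch below uses Python threads as stand-ins for cloud workers and is not the system described in the thesis.

```python
# Minimal illustration of pull-based execution: workers fetch ready jobs themselves,
# so no per-job scheduling decision is made by a central dispatcher.
import queue
import threading
import time

ready_jobs = queue.Queue()

def worker(worker_id):
    while True:
        try:
            job = ready_jobs.get(timeout=1)   # pull the next ready job
        except queue.Empty:
            return                            # nothing left to do
        time.sleep(0.01)                      # placeholder for the real job payload
        print(f"worker {worker_id} finished {job}")
        ready_jobs.task_done()

for j in range(20):                           # jobs whose precedence constraints are met
    ready_jobs.put(f"job-{j}")

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```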
APA, Harvard, Vancouver, ISO, and other styles
36

Yassa, Sonia. "Allocation optimale multicontraintes des workflows aux ressources d’un environnement Cloud Computing." Thesis, Cergy-Pontoise, 2014. http://www.theses.fr/2014CERG0730/document.

Full text
Abstract:
Cloud Computing is increasingly recognized as a new way to use computing, storage and network services on demand, in a transparent and efficient way. In this thesis, we address the problem of scheduling workflows on the distributed, heterogeneous infrastructures of Cloud Computing. Existing workflow scheduling approaches in the cloud mainly focus on the bi-objective optimization of makespan and cost. In this thesis, we propose new workflow scheduling algorithms based on metaheuristics. Our algorithms are able to handle more than two QoS (Quality of Service) metrics, namely makespan, cost, reliability, availability and, in the case of physical resources, energy. In addition, they address several constraints according to the requirements specified in the SLA (Service Level Agreement). Our algorithms have been evaluated by simulation, using (1) synthetic workflows and real-world scientific workflows with different structures as applications, and (2) the features of Amazon EC2 services as cloud resources. The obtained results show the effectiveness of our algorithms when dealing with multiple QoS metrics. Our algorithms produce one or more solutions, some of which outperform the solution produced by the HEFT heuristic on all the QoS metrics considered, including the makespan, for which HEFT is expected to give good results.
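For readers unfamiliar with how a metaheuristic can weigh several QoS metrics at once, the sketch below shows one possible single fitness value combining makespan, cost, reliability, availability and energy under SLA targets; the weights, penalty and normalization are assumptions, not the thesis' actual formulation.

```python
# Illustrative multi-QoS fitness for a candidate workflow schedule.
# Lower is better; SLA violations are penalized. Weights and targets are hypothetical.

def fitness(metrics, sla, weights, penalty=10.0):
    score = 0.0
    for name, weight in weights.items():
        target = sla[name]
        if name in ("reliability", "availability"):
            # Higher is better: penalize any shortfall against the SLA target.
            score += weight * max(0.0, (target - metrics[name]) / target)
            if metrics[name] < target:
                score += penalty
        else:
            # Lower is better (makespan, cost, energy).
            score += weight * metrics[name] / target
            if metrics[name] > target:
                score += penalty
    return score

candidate = {"makespan": 3200, "cost": 14.0, "reliability": 0.97,
             "availability": 0.995, "energy": 5.1}
sla = {"makespan": 3600, "cost": 20.0, "reliability": 0.95,
       "availability": 0.99, "energy": 6.0}
weights = {"makespan": 0.3, "cost": 0.3, "reliability": 0.2,
           "availability": 0.1, "energy": 0.1}
print(f"fitness = {fitness(candidate, sla, weights):.3f}")
```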
APA, Harvard, Vancouver, ISO, and other styles
37

Khaleel, Mustafa Ibrahim. "ENERGY-AWARE JOB SCHEDULING AND CONSOLIDATION APPROACHES FOR WORKFLOWS IN CLOUD." OpenSIUC, 2016. https://opensiuc.lib.siu.edu/dissertations/1165.

Full text
Abstract:
Cloud computing offers several types of on-demand, scalable access to software, computing resources, and storage services through web browsers, based on a pay-as-you-go model. In order to meet the growing demand of active users and reduce the skyrocketing cost of electricity for powering data centers, cloud service providers are highly motivated to implement performance-guaranteed and cost-effective job schedulers. Many researchers have been focusing on scheduling jobs for high performance, and their primary concern has been execution time. As a result, little attention was paid to energy consumption and energy costs. However, nowadays energy cost has gained more and more attention from service providers. This new reality has posed many new challenges for providers who are concerned both about meeting execution time constraints and about reducing energy costs. In recent years, there has been a growing body of research focused on improving resource utilization by adopting new strategies and ideas that improve energy efficiency while maintaining high system throughput. One of these strategies is known as task consolidation, one of the most effective techniques for increasing system-wide resource utilization. The research clearly shows that by switching off idle servers to sleep mode a vast amount of energy can be saved. In this research, a job scheduling approach called the multi-procedure energy-aware heuristic scientific workflow scheduling method, referred to as Time and Energy Aware Scheduling (TEAS), is proposed to tackle an energy optimization problem. This method is based on a rigorous cost and energy model that can be used to maximize resource utilization. The objectives focus on maximizing resource utilization and minimizing power consumption without compromising Quality of Service (QoS), such as the workflow response time specified in the Service Level Agreements (SLA). The scientific applications are formulated as Directed Acyclic Graph (DAG)-structured workflows to be processed as a group, using virtualization techniques over cloud resources. Furthermore, the availability of the underlying cloud hardware/Virtual Machine (VM) resources is time-dependent because of the dual operation modes of on-demand and reservation. The resource provisioning and allocation algorithm can be separated into three steps with different objectives. The first step (Datacenter Selection) selects the most efficient data center to execute module applications. The second step (Time and Energy Aware Scheduling Forward Mapping) primarily focuses on estimating the execution time of scheduling a batch of workflows over VMs on the underlying cloud servers, with the objective of achieving the minimum End-to-End Delay (EED). The last, and most important, step relates to energy saving and resource utilization (Time and Energy Aware Scheduling Backward Mapping) and is concerned with minimizing energy consumption. This is accomplished by restricting CPU usage between double thresholds, keeping the total CPU utilization of all the VMs allocated to a single server between these two thresholds. In addition, a cloud module can migrate to other servers to either reduce the number of active servers or achieve better performance; in this case, the communication cost is factored into the energy cost model.
The performance of our algorithm is compared to algorithms such as the Pegasus Workflow Management System, the Minimum Power Consumption (MPC-MPC) algorithm, and a Greedy algorithm. The simulation results show that the Time and Energy Aware Scheduling heuristic can significantly decrease the power consumption of cloud servers while keeping resource utilization of the underlying clouds high.
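The double-threshold rule described above can be pictured with a small decision function: a host whose aggregate VM CPU load falls below the lower bound becomes a candidate for consolidation and sleep, while one above the upper bound must offload VMs. The thresholds and actions below are illustrative, not the TEAS values.

```python
# Double-threshold consolidation check: keep each host's aggregate VM CPU load
# between a lower and an upper bound; otherwise flag VMs for migration or the
# host for switch-off. Threshold values are illustrative only.

LOWER, UPPER = 0.3, 0.8

def host_action(vm_cpu_shares):
    load = sum(vm_cpu_shares)
    if not vm_cpu_shares or load < LOWER:
        # Under-utilized: migrate the remaining VMs away and put the host to sleep.
        return "consolidate-and-sleep"
    if load > UPPER:
        # Over-utilized: migrate VMs away until the load drops below UPPER.
        return "offload-vms"
    return "keep"

hosts = {"h1": [0.05, 0.10], "h2": [0.35, 0.30, 0.25], "h3": [0.40, 0.30]}
for name, vms in hosts.items():
    print(name, "->", host_action(vms))
```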
APA, Harvard, Vancouver, ISO, and other styles
38

Chiu, David T. "Auspice: Automatic Service Planning in Cloud/Grid Environments." The Ohio State University, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=osu1275012033.

Full text
APA, Harvard, Vancouver, ISO, and other styles
39

Wen, Zhenyu. "Partitioning workflow applications over federated clouds to meet non-functional requirements." Thesis, University of Newcastle upon Tyne, 2016. http://hdl.handle.net/10443/3343.

Full text
Abstract:
With cloud computing, users can acquire computer resources when they need them, on a pay-as-you-go business model. Because of this, many applications are now being deployed in the cloud, and there are many different cloud providers worldwide. Importantly, all these infrastructure providers offer services with different levels of quality. For example, cloud data centres are governed by the privacy and security policies of the country where the centre is located, while many organisations have created their own internal "private cloud" to meet security needs. With all these varieties and uncertainties, application developers who decide to host their system in the cloud face the issue of which cloud to choose to get the best operational conditions in terms of price, reliability and security. The decision becomes even more complicated if their application consists of a number of distributed components, each with slightly different requirements. Rather than trying to identify the single best cloud for an application, this thesis considers an alternative approach, that is, combining different clouds to meet users' non-functional requirements. Cloud federation offers the ability to distribute a single application across two or more clouds, so that the application can benefit from the advantages of each one of them. The key challenge for this approach is how to find the distribution (or deployment) of application components which can yield the greatest benefits. In this thesis, we tackle this problem and propose a set of algorithms, and a framework, to partition a workflow-based application over federated clouds in order to exploit the strengths of each cloud. The specific goal is to split a distributed application structured as a workflow such that the security and reliability requirements of each component are met, whilst the overall cost of execution is minimised. To achieve this, we propose and evaluate a cloud broker for partitioning a workflow application over federated clouds. The broker integrates with the e-Science Central cloud platform to automatically deploy a workflow over public and private clouds. We developed a deployment planning algorithm to partition a large workflow application across federated clouds so as to meet security requirements and minimise the monetary cost. A more generic framework is then proposed to model, quantify and guide the partitioning and deployment of workflows over federated clouds. This framework considers the situation where changes in cloud availability (including cloud failure) arise during workflow execution.
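One way to picture the partitioning problem is a toy greedy planner that places each workflow component on the cheapest federated cloud satisfying its security and reliability requirements; the clouds, levels and prices below are invented, and the thesis' actual planning algorithm is more sophisticated.

```python
# Toy partitioner: place each workflow component on the cheapest federated cloud
# that meets its security and reliability requirements. All values are hypothetical.

clouds = [
    {"name": "public-A", "security": 1, "reliability": 0.990, "price": 0.08},
    {"name": "public-B", "security": 2, "reliability": 0.995, "price": 0.12},
    {"name": "private",  "security": 3, "reliability": 0.999, "price": 0.30},
]

components = [
    {"name": "ingest",    "security": 1, "reliability": 0.990, "hours": 2},
    {"name": "anonymise", "security": 3, "reliability": 0.990, "hours": 1},
    {"name": "analyse",   "security": 2, "reliability": 0.995, "hours": 5},
]

def partition(components, clouds):
    plan, total = {}, 0.0
    for comp in components:
        feasible = [c for c in clouds
                    if c["security"] >= comp["security"]
                    and c["reliability"] >= comp["reliability"]]
        if not feasible:
            raise ValueError(f"no cloud satisfies {comp['name']}")
        choice = min(feasible, key=lambda c: c["price"])   # cheapest feasible cloud
        plan[comp["name"]] = choice["name"]
        total += choice["price"] * comp["hours"]
    return plan, total

plan, cost = partition(components, clouds)
print(plan, f"estimated cost: ${cost:.2f}")
```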
APA, Harvard, Vancouver, ISO, and other styles
40

Teixeira, Eduardo Cotrin. "Informações de suporte ao escalonamento de workflows científicos para a execução em plataformas de computação em nuvem." Universidade de São Paulo, 2016. http://www.teses.usp.br/teses/disponiveis/45/45134/tde-28062016-155756/.

Full text
Abstract:
Science has been using computing resources to perform scientific processes and experiments that can be modeled as workflows handling large data volumes and performing actions such as selection, analysis and visualization of these data according to a specific procedure. Scientific workflows have been used by scientists from many areas, such as astronomy and bioinformatics, and tend to be computationally intensive and heavily focused on handling large data volumes, which requires using high-performance computing platforms such as grids or clouds. For workflow execution on these platforms it is necessary to assign the workflow activities to the available computational resources, a process known as scheduling. Cloud computing platforms have proved to be a viable alternative for scientific workflow execution, but scheduling in the cloud must take into account specific constraints such as a limited budget or the type of computing resources to be used in the execution. In this context, information such as the estimated duration of the execution, or time and cost limits (referred to here generally as scheduling support information), becomes important for efficient scheduling and execution, aiming to achieve the expected results. This work identifies support information that can be added to scientific workflow models to support efficient scheduling and execution on cloud computing platforms. We propose and analyze a classification of such information and its use in Scientific Workflow Management Systems (SWMS). To assess the impact of support information on scheduling, experiments were conducted with scientific workflow models carrying different support information, scheduled with algorithms that were adapted to consider the added information. The experiments showed a reduction of up to 59% in the financial cost of workflow execution in the cloud, and a reduction of up to 8.6% in the makespan, compared to the execution of the same workflows scheduled without any supporting information available.
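As a loose illustration of "scheduling support information", the sketch below attaches a deadline, a budget and estimated task durations to a workflow model and uses them to discard infeasible scheduling plans; the field names and numbers are hypothetical.

```python
# Sketch: attach scheduling support information to a workflow model and use it
# to filter candidate scheduling plans. Field names are illustrative.
from dataclasses import dataclass, field

@dataclass
class SupportInfo:
    deadline_s: float            # hard limit on makespan
    budget_usd: float            # hard limit on monetary cost
    est_task_durations: dict = field(default_factory=dict)  # task -> estimated seconds

@dataclass
class Plan:
    name: str
    makespan_s: float
    cost_usd: float

def feasible(plans, info):
    return [p for p in plans
            if p.makespan_s <= info.deadline_s and p.cost_usd <= info.budget_usd]

info = SupportInfo(deadline_s=7200, budget_usd=25.0,
                   est_task_durations={"align": 600, "render": 1800})
plans = [Plan("2 small VMs", 9000, 12.0), Plan("8 small VMs", 4100, 21.5),
         Plan("16 large VMs", 2300, 48.0)]
print([p.name for p in feasible(plans, info)])   # only plans meeting both limits
```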
APA, Harvard, Vancouver, ISO, and other styles
41

Pineda, Morales Luis Eduardo. "Efficient support for data-intensive scientific workflows on geo-distributed clouds." Thesis, Rennes, INSA, 2017. http://www.theses.fr/2017ISAR0012/document.

Full text
Abstract:
By 2020, the digital universe is expected to reach 44 zettabytes, as it is doubling every two years. Data come in the most diverse shapes and from the most geographically dispersed sources ever. The data explosion calls for applications capable of highly scalable, distributed computation, and for infrastructures with massive storage and processing power to support them. These large-scale applications are often expressed as workflows that help define data dependencies between their different components. More and more scientific workflows are executed on clouds, for they are a cost-effective alternative for intensive computing. Sometimes, workflows must be executed across multiple geo-distributed cloud data centers, either because they exceed a single site's capacity due to their huge storage and computation requirements, or because the data they process are scattered in different locations. Multisite workflow execution brings about several issues for which little support has been developed: there is no common file system for data transfer, inter-site latencies are high, and centralized management becomes a bottleneck. This thesis consists of three contributions towards bridging the gap between single- and multisite workflow execution. First, we present several design strategies to efficiently support the execution of workflow engines across multisite clouds, by reducing the cost of metadata operations.
Then, we take one step further and explain how selective handling of metadata, classified by frequency of access, improves workflow performance in a multisite environment. Finally, we look into a different approach to optimize cloud workflow execution by studying execution parameters to model and steer elastic scaling.
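The "selective handling of metadata, classified by frequency of access" can be sketched as a hot/cold placement rule: hot metadata is replicated to every site, cold metadata stays at its home site. The threshold and policy below are assumptions made for illustration only.

```python
# Sketch: hot/cold metadata classification by access frequency. Hot records are
# replicated to all sites; cold ones stay at their home site. Values are illustrative.

SITES = ["eu-west", "us-east", "asia-se"]
HOT_THRESHOLD = 100            # accesses per hour, hypothetical cut-off

def placement(metadata_records):
    plan = {}
    for rec in metadata_records:
        if rec["accesses_per_hour"] >= HOT_THRESHOLD:
            plan[rec["key"]] = SITES               # replicate everywhere
        else:
            plan[rec["key"]] = [rec["home_site"]]  # keep local, fetch remotely on demand
    return plan

records = [
    {"key": "/wf/run42/stage-in", "home_site": "eu-west", "accesses_per_hour": 450},
    {"key": "/wf/run42/tmp-0031", "home_site": "us-east", "accesses_per_hour": 3},
]
for key, sites in placement(records).items():
    print(key, "->", sites)
```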
APA, Harvard, Vancouver, ISO, and other styles
42

Barbosa, Margarida de Carvalho Jerónimo. "As-built building information modeling (BIM) workflows." Doctoral thesis, Universidade de Lisboa, Faculdade de Arquitetura, 2018. http://hdl.handle.net/10400.5/16380.

Full text
Abstract:
Doctoral thesis in Architecture, with a specialization in Conservation and Restoration, presented at the Faculdade de Arquitetura da Universidade de Lisboa for the degree of Doctor. Building information modeling (BIM) is most often used for the construction of new buildings. By using BIM in such projects, collaboration among stakeholders in an architecture, engineering and construction project is improved. This scenario might also be targeted for interventions in existing buildings. This thesis intends to enhance the processes of recording, documenting and managing information by establishing a set of workflow guidelines to efficiently model existing structures with BIM tools from point cloud data, complemented with any other appropriate methods. There are several challenges hampering BIM software adoption for planning interventions in existing buildings. Volk et al. (2014) outlines that the main obstacles to as-built BIM adoption are: the required modeling/conversion effort from captured building data into semantic BIM objects; the difficulty in maintaining information in a BIM; and the difficulties in handling uncertain data, objects and relations occurring in existing buildings.
From this analysis, a case was developed for devising BIM workflow guidelines for modeling existing buildings. The proposed content for BIM guidelines includes tolerances and standards for modeling existing building elements. This allows stakeholders to have a common understanding and agreement of what is supposed to be modeled and exchanged. In this thesis, the authors investigate a set of research questions that were formed and posed, framing obstacles and directing the research focus in four parts: 1. the different kinds of building data acquired; 2. the different kinds of building data analysis processes; 3. the use of standards and as-built BIM; and 4. as-built BIM workflows and guidelines for architectural offices. From this research, the authors conclude that there is a need for better use of the documentation on which architectural intervention project decisions are based. Different kinds of data, not just geometric, are needed as a basis for the analysis of the current building state. Non-geometric information can refer to physical characteristics of the built fabric, such as materials, appearance and condition. Furthermore, environmental, structural and mechanical building performance, as well as cultural, historical and architectural values, style and age, are vital to the understanding of the current state of the building. This information is necessary for further analysis, allowing the understanding of the actions necessary to intervene. Accurate and up-to-date information can be generated through ADP and TLS surveys. The final products of ADP and TLS are point clouds, which can be used to complement each other. The combination of these techniques with a traditional RTS survey provides an accurate and up-to-date base that, along with other existing information, allows the planning of building interventions. As-built BIM adoption problems refer mainly to the analysis and generation of building geometry, which is usually a step prior to linking non-geometric building information. For this reason the present thesis focuses mainly on finding guidelines to decrease the difficulty of generating the as-built BIM elements. To handle uncertain data and unclear or hidden semantic information, one can complement the original data with the additional missing information. The workflows in the present thesis mainly address the missing visible information. In the case of refurbishment projects, the hidden information can be acquired to some extent with ADP or TLS surveys after the demolition of some elements and wall layers. This allows a better understanding of the non-visible material layers of a building element whenever there is a partial demolition. This process is only useful if a part of the element's material is removed; it cannot be applied to non-intervened elements. The handling of visible missing data, objects and relations can be done by integrating different kinds of data from different kinds of sources. Workflows to connect them in a more integrated way should be implemented. Different workflows can create additional missing information, used as a complement or as a basis for decision making when no data is available. Regarding the addition of missing data through point cloud generation, the case studies outlined the importance of planning the survey, with all parties understanding what the project needs are. In addition to accuracy, the level of interpretation and the modelling tolerances required by the project must also be agreed and understood.
Not all survey tools and methods are suitable for all buildings: the scale, materials and accessibility of the building play a major role in survey planning. To handle the high modeling/conversion effort, one has to understand the current workflows used to analyse building geometry. As-built BIMs are mostly generated manually from CAD drawings and/or PCM data. These are used as a geometric base input from which information is extracted. The information used to plan the building intervention should be checked, confirming that it is a representation of the as-is state of the building. The 3D survey techniques used to capture the as-is state of the building should be integrated into the as-built BIM workflow, to capture the building data on which intervention decisions are made. The output of these techniques should be integrated with different kinds of data to provide the most accurate and complete basis. The architectural company should have the technical skills to know what to ask for and how to use it appropriately. Modeling requirements should focus primarily on the content of this process: what to model, how to develop the elements in the model, what information the model should contain, and how information in the model should be exchanged. The point cloud survey should be done after stipulating the project goal, standards, tolerances and modeling content. Tolerances and modeling guidelines change across companies and countries. Regardless of these differences, the standards documents have the purpose of producing and receiving information in a consistent data format, in efficient exchange workflows between project stakeholders. Critical thinking about the modeling workflow, and the communication and agreement between all parties involved in the project, are the prime products of the guidelines in this thesis. The establishment and agreement of modeling tolerances, and of the level of development and detail present in the BIMs, between the different parties involved in the project is more important than which of the existing definitions currently in use by the AEC industry is chosen. Automated or semi-automated tools for element shape extraction, the elimination or reduction of repetitive tasks during BIM development, and the analysis of environment or scenario conditions are also ways of decreasing the modeling effort. One of the reasons why standards are needed is to structure and improve collaboration, not only with outside parties but also inside architectural offices. Data and workflow standards are very hard to implement daily in a practical way, often resulting in confusing data and workflows, which reduce the quality of communication and project outputs. As-built BIM standards, exactly like BIM standards, contribute to the creation of reliable and useful information. To update a BIM during the building life-cycle, one needs to acquire information on the as-is building state. Monitoring data, whether consisting of photos, PCM, sensor data, or data resulting from the comparison of PCM and BIMs, can be a way of updating existing BIMs. It allows information to be added continuously, documenting the building's evolution and history, and evaluating possible preventive interventions for its enhancement. BIM environments are not often used to document existing buildings or interventions in existing buildings. The authors propose to improve this situation by using BIM standards and/or guidelines, and give an initial overview of the components that should be included in such a standard and/or guideline.
APA, Harvard, Vancouver, ISO, and other styles
43

Bendoukha, Sofiane [Verfasser], and Norbert [Akademischer Betreuer] Ritter. "Multi-Agent Approach for Managing Workflows in an Inter-Cloud Environment / Sofiane Bendoukha ; Betreuer: Norbert Ritter." Hamburg : Staats- und Universitätsbibliothek Hamburg, 2017. http://d-nb.info/1124591133/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Bendoukha, Sofiane [Verfasser], and Norbert [Akademischer Betreuer] Ritter. "Multi-Agent Approach for Managing Workflows in an Inter-Cloud Environment / Sofiane Bendoukha ; Betreuer: Norbert Ritter." Hamburg : Staats- und Universitätsbibliothek Hamburg, 2017. http://nbn-resolving.de/urn:nbn:de:gbv:18-82417.

Full text
APA, Harvard, Vancouver, ISO, and other styles
45

Christensen, Scott D. "A Comprehensive Python Toolkit for Harnessing Cloud-Based High-Throughput Computing to Support Hydrologic Modeling Workflows." BYU ScholarsArchive, 2016. https://scholarsarchive.byu.edu/etd/5667.

Full text
Abstract:
Advances in water resources modeling are improving the information that can be supplied to support decisions that affect the safety and sustainability of society, but these advances result in models being more computationally demanding. To facilitate the use of cost-effective computing resources to meet the increased demand through high-throughput computing (HTC) and cloud computing in modeling workflows and web applications, I developed a comprehensive Python toolkit that provides the following features: (1) programmatic access to diverse, dynamically scalable computing resources; (2) a batch scheduling system to queue and dispatch the jobs to the computing resources; (3) data management for job inputs and outputs; and (4) the ability for jobs to be dynamically created, submitted, and monitored from the scripting environment. To compose this comprehensive computing toolkit, I created two Python libraries (TethysCluster and CondorPy) that leverage two existing software tools (StarCluster and HTCondor). I further facilitated access to HTC in web applications by using these libraries to create powerful and flexible computing tools for Tethys Platform, a development and hosting platform for web-based water resources applications. I tested this toolkit while collaborating with other researchers to perform several modeling applications that required scalable computing. These applications included a parameter sweep with 57,600 realizations of a distributed, hydrologic model; a set of web applications for retrieving and formatting data; a web application for evaluating the hydrologic impact of land-use change; and an operational, national-scale, high-resolution, ensemble streamflow forecasting tool. In each of these applications the toolkit was successful in automating the process of running the large-scale modeling computations in an HTC environment.
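CondorPy's own API is not shown in the abstract, so rather than guess at it, the sketch below drives HTCondor directly: it writes a standard submit description for a parameter sweep and queues it with condor_submit. The executable name and paths are placeholders, and an HTCondor pool is assumed to be available.

```python
# Sketch: queue a parameter sweep on an HTCondor pool by generating a submit
# description file and calling condor_submit. Executable and paths are placeholders.
import subprocess
from pathlib import Path

SUBMIT_TEMPLATE = """\
executable   = run_model.sh
arguments    = $(Process)
output       = logs/out.$(Process)
error        = logs/err.$(Process)
log          = logs/sweep.log
request_cpus = 1
queue {n_jobs}
"""

def submit_sweep(n_jobs, workdir="."):
    workdir = Path(workdir)
    (workdir / "logs").mkdir(exist_ok=True)
    submit_file = workdir / "sweep.sub"
    submit_file.write_text(SUBMIT_TEMPLATE.format(n_jobs=n_jobs))
    # condor_submit queues one job per $(Process) value, 0 .. n_jobs-1.
    subprocess.run(["condor_submit", str(submit_file)], cwd=workdir, check=True)

if __name__ == "__main__":
    submit_sweep(n_jobs=100)
```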
APA, Harvard, Vancouver, ISO, and other styles
46

de, Beste Eugene. "Enabling the processing of bioinformatics workflows where data is located through the use of cloud and container technologies." University of the Western Cape, 2019. http://hdl.handle.net/11394/6767.

Full text
Abstract:
The growing size of raw data and the lack of internet communication technology to keep up with that growth are introducing unique challenges to academic researchers. This is especially true for those residing in rural areas or countries with sub-par telecommunication infrastructure. In this project I investigate the usefulness of cloud computing technology, data analysis workflow languages and portable computation for institutions that generate data. I introduce the concept of a software solution that could be used to simplify the way that researchers execute their analysis on data sets at remote sources, rather than having to move the data. The scope of this project involved conceptualising and designing a software system to simplify the use of a cloud environment, as well as implementing a working prototype of said software for the OpenStack cloud computing platform. I conclude that it is possible to improve the performance of research pipelines by removing the need for researchers to have operating system or cloud computing knowledge, and that utilising technologies such as this can ease the burden of moving data.
APA, Harvard, Vancouver, ISO, and other styles
47

Truong, Huu Tram. "Optimisation des performances et du coût de flots applicatifs s'exécutant sur des infrastructures de cloud." Phd thesis, Université de Nice Sophia-Antipolis, 2010. http://tel.archives-ouvertes.fr/tel-00805511.

Full text
Abstract:
Virtual cloud infrastructures are increasingly exploited to tackle compute-intensive challenges in science as well as in industry. They provide on-demand computing, communication and storage resources to meet the needs of large-scale applications. New tools and models are needed to adapt to the diversity of these infrastructures. Estimating the amount of resources consumed by each application is a particularly difficult problem, both for users who aim to minimize their costs and for infrastructure providers who aim to control resource allocation. Even though a virtually unlimited amount of resources can be allocated, a trade-off must be found between (i) the cost of the allocated infrastructure, (ii) the expected performance and (iii) the optimal achievable performance, which depends on the level of parallelism inherent to the application. Starting from the use case of medical image analysis, a scientific domain representative of a large number of large-scale applications, this thesis proposes a fine-grained cost model that relies on expertise extracted from the application formalized as a workflow. Four resource allocation strategies based on this cost model are introduced. By taking both computing and communication resources into account, these strategies allow users to determine the amount of computing resources and bandwidth to reserve in order to compose their execution environment. In addition, data transfer optimization and the low reliability of large-scale systems, two well-known problems that impact application performance and therefore the cost of using the infrastructure, are also taken into consideration. The experiments reported in this thesis were carried out on the Aladdin/Grid'5000 platform, using the HIPerNet middleware. This virtual platform manager enables the virtualization of computing and communication resources. A real medical image analysis application was used for all the experimental validations. The experimental results show the validity of the approach in terms of controlling the infrastructure cost and the application performance. Our contributions facilitate both the exploitation of cloud infrastructures, offering a better quality of service to users, and the planning of virtualized resource provisioning.
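A back-of-the-envelope version of the trade-off formalized in the thesis (allocated CPUs and bandwidth versus cost and runtime) might look like the toy model below; the application profile and prices are invented, and the actual cost model in the thesis is far finer-grained.

```python
# Toy trade-off between allocated CPUs/bandwidth and cost for a workflow whose
# runtime has a serial part, a parallel part and a data-transfer part.
# Prices and the application profile are invented for illustration.

def runtime_h(n_cpus, bandwidth_mbps, serial_h=0.5, parallel_h=20.0, data_gb=50.0):
    transfer_h = data_gb * 8000.0 / bandwidth_mbps / 3600.0   # GB -> megabits -> hours
    return serial_h + parallel_h / n_cpus + transfer_h

def cost_usd(n_cpus, bandwidth_mbps, cpu_price=0.05, bw_price_per_100mbps=0.02):
    t = runtime_h(n_cpus, bandwidth_mbps)
    return t * (n_cpus * cpu_price + bandwidth_mbps / 100.0 * bw_price_per_100mbps)

for n, bw in [(4, 100), (16, 100), (16, 1000), (64, 1000)]:
    print(f"{n:>3} CPUs, {bw:>5} Mbps -> {runtime_h(n, bw):5.2f} h, ${cost_usd(n, bw):.2f}")
```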
APA, Harvard, Vancouver, ISO, and other styles
48

Cavalcante, Everton Ranielly de Sousa. "Cloud Integrator: uma plataforma para composição de serviços em ambientes de computação em nuvem." Universidade Federal do Rio Grande do Norte, 2013. http://repositorio.ufrn.br:8080/jspui/handle/123456789/18065.

Full text
Abstract:
With the advance of the Cloud Computing paradigm, a single service offered by a cloud platform may not be enough to meet all of an application's requirements. To fulfill such requirements, it may be necessary, instead of a single service, to use a composition of services that aggregates services provided by different cloud platforms. In order to generate aggregated value for the user, this composition of services provided by several Cloud Computing platforms requires a solution in terms of platform integration, which encompasses the manipulation of a large number of non-interoperable APIs and protocols from different platform vendors. In this scenario, this work presents Cloud Integrator, a middleware platform for composing services provided by different Cloud Computing platforms. Besides providing an environment that facilitates the development and execution of applications that use such services, Cloud Integrator works as a mediator by providing mechanisms for building applications through the composition and selection of semantic Web services that take into account metadata about the services, such as QoS (Quality of Service), prices, etc. Moreover, the proposed middleware platform provides an adaptation mechanism that can be triggered in case of failure or quality degradation of one or more services used by the running application, in order to ensure its quality and availability. In this work, through a case study consisting of an application that uses services provided by different cloud platforms, Cloud Integrator is evaluated in terms of the efficiency of the service composition, selection and adaptation processes it performs, as well as the potential of using this middleware in heterogeneous computational cloud scenarios.
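The QoS- and price-aware selection step, together with re-selection on failure, can be pictured with the schematic below; the service metadata, weights and scoring are invented and do not reflect Cloud Integrator's actual interfaces.

```python
# Schematic QoS/price-aware selection among equivalent cloud services, with a
# re-selection hook for adaptation on failure. Metadata and weights are invented.

def score(service, weights):
    # Higher availability is better; lower latency and price are better.
    return (weights["availability"] * service["availability"]
            - weights["latency"] * service["latency_ms"] / 1000.0
            - weights["price"] * service["price_per_call"])

def select(candidates, weights, excluded=frozenset()):
    usable = [s for s in candidates if s["name"] not in excluded]
    return max(usable, key=lambda s: score(s, weights)) if usable else None

catalog = [
    {"name": "storage-A", "availability": 0.999, "latency_ms": 120, "price_per_call": 0.0004},
    {"name": "storage-B", "availability": 0.995, "latency_ms": 60,  "price_per_call": 0.0007},
]
weights = {"availability": 1.0, "latency": 0.5, "price": 100.0}

chosen = select(catalog, weights)
print("selected:", chosen["name"])
# Adaptation: if the chosen service fails or degrades, re-select excluding it.
fallback = select(catalog, weights, excluded={chosen["name"]})
print("fallback:", fallback["name"])
```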
APA, Harvard, Vancouver, ISO, and other styles
49

Chieregato, Federico. "Modelling task execution time in Directed Acyclic Graphs for efficient distributed management." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2022.

Find full text
Abstract:
In this thesis, a framework is presented that predicts the execution time of tasks in Directed Acyclic Graphs (DAGs); each task is the smallest unit of work that executes a function over a set of inputs and, in this scenario, represents a vertex in a DAG. The thesis includes an implementation for extracting profiling information from Apache Spark, as well as an evaluation of the framework on the Spark decision support benchmark TPC-DS and on an in-house, completely different DAG runtime system running real-world DAGs from computational quantum chemistry applications. Speeding up execution in Spark or other workflows is an important problem for many real-time applications; since it is impractical to generate a predictive model that considers the actual values of the inputs to tasks, the use of surrogates, such as the number of parents and the mean parent duration of a task, has been explored. For this reason, this solution takes the name PRODIGIOUS: Performance modelling of DAGs via surrogate features. Since the duration of a task is a float value, different regression algorithms were studied, tuning the hyperparameters through GridSearchCV. The main objective of PRODIGIOUS is to understand not only whether the use of surrogates instead of actual inputs is enough to predict the execution time of tasks of the same DAG type, but also whether it is possible to predict the execution time of tasks of different DAG types, creating a DAG-agnostic framework that could help scientists and computer engineers make their workflows more efficient. Other agnostic features chosen were the cores for each task, the RAM of the benchmark, the data access type, and the number of executors.
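Since the abstract explicitly mentions GridSearchCV and surrogate features such as the number of parents and the mean parent duration, a minimal, self-contained version of that setup could look as follows; the data here is synthetic, standing in for the Spark/TPC-DS and quantum chemistry profiles used in the thesis, and the choice of regressor and parameter grid is an assumption.

```python
# Minimal surrogate-feature regression for task duration: GridSearchCV over a
# regressor using surrogate features only. The data is synthetic.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([
    rng.integers(0, 8, n),          # number of parent tasks
    rng.uniform(0.1, 30.0, n),      # mean parent duration (s)
    rng.integers(1, 9, n),          # cores per task
    rng.uniform(2, 64, n),          # executor RAM (GB)
])
# Synthetic ground truth: duration loosely depends on the surrogates plus noise.
y = 0.5 * X[:, 1] + 2.0 * X[:, 0] + 10.0 / X[:, 2] + rng.normal(0, 1.0, n)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 10]},
    cv=5, scoring="neg_mean_absolute_error", n_jobs=-1)
search.fit(X_train, y_train)
print("best params:", search.best_params_)
print("test MAE:", round(-search.score(X_test, y_test), 3))
```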
APA, Harvard, Vancouver, ISO, and other styles
50

Iqbal, Muhammad Safdar. "The Multi-tiered Future of Storage: Understanding Cost and Performance Trade-offs in Modern Storage Systems." Thesis, Virginia Tech, 2017. http://hdl.handle.net/10919/79142.

Full text
Abstract:
In the last decade, the landscape of storage hardware and software has changed considerably. Storage hardware has diversified from hard disk drives and solid state drives to include persistent memory (PMEM) devices such as phase change memory (PCM) and Flash-backed DRAM. On the software side, the increasing adoption of cloud services for building and deploying consumer and enterprise applications is driving the use of cloud storage services. Cloud providers have responded by providing a plethora of choices of storage services, each of which has unique performance characteristics and pricing. We argue this variety represents an opportunity for modern storage systems, and it can be leveraged to improve their operational costs. We propose that storage tiering is an effective technique for balancing operational or deployment costs and performance in such modern storage systems. We demonstrate this via three key techniques. First, THMCache, which leverages tiering to conserve the lifetime of PMEM devices, hence saving hardware upgrade costs. Second, CAST, which leverages tiering between multiple types of cloud storage to deliver higher utility (i.e. performance per unit of cost) for cloud tenants. Third, we propose a dynamic pricing scheme for cloud storage services, which leverages tiering to increase the cloud provider's profit or offset their management costs.
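A tiny numeric illustration of the "performance per unit of cost" argument behind CAST-style tiering; the tier names, throughputs and prices below are invented.

```python
# Pick a storage tier by utility (throughput per dollar), optionally subject to a
# minimum throughput requirement. Tier characteristics are invented for illustration.

tiers = [
    {"name": "object-standard",   "mbps": 90,  "usd_per_gb_month": 0.023},
    {"name": "object-infrequent", "mbps": 60,  "usd_per_gb_month": 0.0125},
    {"name": "block-ssd",         "mbps": 400, "usd_per_gb_month": 0.10},
]

def best_tier(tiers, min_mbps=0):
    eligible = [t for t in tiers if t["mbps"] >= min_mbps]
    return max(eligible, key=lambda t: t["mbps"] / t["usd_per_gb_month"])

print(best_tier(tiers)["name"])                # highest raw utility
print(best_tier(tiers, min_mbps=200)["name"])  # utility under a performance floor
```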
APA, Harvard, Vancouver, ISO, and other styles