
Dissertations / Theses on the topic 'Data-Intensive Systems'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 49 dissertations / theses for your research on the topic 'Data-Intensive Systems.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses from a wide variety of disciplines and organise your bibliography correctly.

1

Xu, Yiqi. "Storage Management of Data-intensive Computing Systems." FIU Digital Commons, 2016. http://digitalcommons.fiu.edu/etd/2474.

Full text
Abstract:
Computing systems are becoming increasingly data-intensive because of the explosion of data and the need to process it, and storage management is critical to application performance in such data-intensive computing systems. However, existing resource management frameworks in these systems lack support for storage management, which causes unpredictable performance degradation when applications are under I/O contention. Storage management of data-intensive systems is a challenging problem because I/O resources cannot be easily partitioned and distributed storage systems require scalable management. This dissertation presents solutions to address these challenges for typical data-intensive systems, including high-performance computing (HPC) systems and big-data systems. For HPC systems, the dissertation presents vPFS, a performance virtualization layer for parallel file system (PFS) based storage systems. It employs user-level PFS proxies to interpose and schedule parallel I/Os on a per-application basis. Based on this framework, it enables SFQ(D)+, a new proportional-share scheduling algorithm which provides diverse applications with good performance isolation and resource utilization. To manage an HPC system's total I/O service, it also provides two complementary synchronization schemes to coordinate the scheduling of large numbers of storage nodes in a scalable manner. For big-data systems, the dissertation presents IBIS, an interposition-based big-data I/O scheduler. By interposing the different I/O phases of big-data applications, it schedules the I/Os transparently to the applications. It enables a new proportional-share scheduling algorithm, SFQ(D2), to address the dynamics of the underlying storage by adaptively adjusting the I/O concurrency. Moreover, it employs a scalable broker to coordinate the distributed I/O schedulers and provide proportional sharing of a big-data system's total I/O service. Experimental evaluations show that these solutions have low overhead and provide strong I/O performance isolation. For example, vPFS' overhead is less than 3% in throughput and it delivers proportional sharing within 96% of the target for diverse workloads; and IBIS provides up to 99% better performance isolation for WordCount and 30% better proportional slowdown for TeraSort and TeraGen than native YARN.
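The proportional-share idea behind SFQ(D)+ and SFQ(D2) can be illustrated with a small sketch. The snippet below is not the vPFS or IBIS implementation; it is a minimal start-time fair queuing loop in which per-application weights control each application's share of the I/O service, and all function names, workloads and weights are invented for illustration. A real SFQ(D) scheduler would additionally cap the number of requests outstanding at the device (the D parameter), and SFQ(D2) would adapt that cap to the storage dynamics.

import heapq

def sfq_dispatch_order(requests, weights):
    """Illustrative start-time fair queuing over backlogged applications.
    requests: dict app -> list of request costs (e.g. sizes)
    weights:  dict app -> proportional-share weight
    Returns the dispatch order implied by the start tags."""
    last_finish = {app: 0.0 for app in requests}
    tagged = []
    for app, costs in requests.items():
        for cost in costs:
            start = last_finish[app]              # backlogged app: start at its last finish tag
            finish = start + cost / weights[app]  # larger weight => tags grow more slowly
            last_finish[app] = finish
            tagged.append((start, finish, app, cost))
    # dispatch in increasing start-tag order, ties broken by finish tag
    return [(app, cost) for _, _, app, cost in sorted(tagged)]

# Example: app "A" with weight 2 receives roughly twice the service of "B"
order = sfq_dispatch_order({"A": [4, 4, 4], "B": [4, 4, 4]}, {"A": 2.0, "B": 1.0})
print(order)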
APA, Harvard, Vancouver, ISO, and other styles
2

Cai, Simin. "Systematic Design of Data Management for Real-Time Data-Intensive Applications." Licentiate thesis, Mälardalens högskola, Inbyggda system, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-35369.

Full text
Abstract:
Modern real-time data-intensive systems generate large amounts of data that are processed using complex data-related computations such as data aggregation. In order to maintain the consistency of data, such computations must be both logically correct (producing correct and consistent results) and temporally correct (completing before specified deadlines). One solution to ensure logical and temporal correctness is to model these computations as transactions and manage them using a Real-Time Database Management System (RTDBMS). Ideally, depending on the particular system, the transactions are customized with the desired logical and temporal correctness properties, which are achieved by the customized RTDBMS with appropriate run-time mechanisms. However, developing such a data management solution with provided guarantees is not easy, partly due to inadequate support for systematic analysis during the design. Firstly, designers do not have means to identify the characteristics of the computations, especially data aggregation, and to reason about their implications. Design flaws might not be discovered, and thus they may be propagated to the implementation. Secondly, trade-off analysis of conflicting properties, such as conflicts between transaction isolation and temporal correctness, is mainly performed ad-hoc, which increases the risk of unpredictable behavior. In this thesis, we propose a systematic approach to develop transaction-based data management with data aggregation support for real-time systems. Our approach includes the following contributions: (i) a taxonomy of data aggregation, (ii) a process for customizing transaction models and RTDBMS, and (iii) a pattern-based method of modeling transactions in the timed automata framework, which we show how to verify with respect to transaction isolation and temporal correctness. Our proposed taxonomy of data aggregation processes helps in identifying their common and variable characteristics, based on which their implications can be reasoned about. Our proposed process allows designers to derive transaction models with desired properties for the data-related computations from system requirements, and decide the appropriate run-time mechanisms for the customized RTDBMS to achieve the desired properties. To perform systematic trade-off analysis between transaction isolation and temporal correctness specifically, we propose a method to create formal models of transactions with concurrency control, based on which the isolation and temporal correctness properties can be verified by model checking, using the UPPAAL tool. By applying the proposed approach to the development of an industrial demonstrator, we validate the applicability of our approach.
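UPPAAL model checking cannot be reproduced in a few lines, but the two properties the abstract targets, transaction isolation and temporal correctness, can be hinted at with a much simpler sketch: a conflict-serializability test over a transaction schedule plus a naive worst-case deadline check. This is purely illustrative, assuming invented transaction names and timing figures; it is not the pattern-based timed-automata method of the thesis.

from itertools import combinations

def conflict_serializable(schedule):
    """schedule: list of (txn, op, item) with op in {'r', 'w'}.
    Builds a precedence graph over conflicting operations and reports
    True if it is acyclic (i.e. the schedule is conflict-serializable)."""
    edges = set()
    for (t1, op1, x1), (t2, op2, x2) in combinations(schedule, 2):
        if t1 != t2 and x1 == x2 and 'w' in (op1, op2):
            edges.add((t1, t2))  # earlier conflicting op precedes the later one
    nodes = {t for t, _, _ in schedule}
    def cyclic(node, stack, seen):
        seen.add(node); stack.add(node)
        for a, b in edges:
            if a == node and (b in stack or (b not in seen and cyclic(b, stack, seen))):
                return True
        stack.discard(node)
        return False
    seen = set()
    return not any(cyclic(n, set(), seen) for n in nodes if n not in seen)

def meets_deadlines(wcets, deadlines):
    """Naive temporal check: worst-case execution times run back to back
    in deadline order must each finish before their deadline."""
    elapsed = 0
    for txn in sorted(deadlines, key=deadlines.get):
        elapsed += wcets[txn]
        if elapsed > deadlines[txn]:
            return False
    return True

schedule = [("T1", "r", "x"), ("T2", "w", "x"), ("T1", "w", "x")]
print(conflict_serializable(schedule))                            # False: T1->T2 and T2->T1
print(meets_deadlines({"T1": 3, "T2": 5}, {"T1": 4, "T2": 10}))   # True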
APA, Harvard, Vancouver, ISO, and other styles
3

Schnell, Felicia. "Multicast Communication for Increased Data Exchange in Data-Intensive Distributed Systems." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-232132.

Full text
Abstract:
Modern applications are required to handle and communicate an increasing amount of data. Meanwhile, distributed systems containing multiple computationally weak components become more common, resulting in a problematic situation. Choosing among the communication strategies used for delivering messages between entities therefore becomes crucial in order to efficiently utilize available resources. Systems where identical data is delivered to many recipients are common nowadays, but they may rely on an underlying communication strategy based on direct interaction between sender and receiver, which is insufficient. Multicasting refers to a technique for group communication where messages can be distributed to participating nodes in a single transmission. This technique was developed to circumvent the problem of high workload on the sender side and redundant traffic in the network, and constitutes the focus of this thesis. Within the area of Electronic Warfare and self-protection systems, time is a critical aspect in providing relevant information for decision making. Self-protection systems developed by Saab, used in military aircraft, must provide situational awareness to guarantee that correct decisions can be made at the right time. With more advanced systems, where the amount of data to be transmitted increases, fast communication is essential to achieve quality of service. This thesis investigates how the deployment of multicast in a distributed data-intensive system could prepare a system for increased data exchange. The result is a communication design which allows the system to distribute messages to a group of receivers with less effort from the sender and with less redundant traffic transferred over the same link. Comparative measurements are conducted between the new implementation and the old system. The evaluation shows that the multicast solution can significantly decrease both the time for message handling and the workload on endpoints.
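As a concrete illustration of the group-communication idea described in the abstract, the sketch below shows a minimal IP multicast sender and receiver using standard UDP sockets. The group address and port are arbitrary examples and this is in no way Saab's implementation; it only shows how a single transmission can reach every subscribed receiver.

import socket
import struct

GROUP, PORT = "239.1.1.1", 5007   # example administratively scoped multicast group

def send(message: bytes):
    # one transmission reaches every receiver that has joined the group
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)  # stay on the local network
    sock.sendto(message, (GROUP, PORT))
    sock.close()

def receive():
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PORT))
    # join the multicast group on all interfaces
    mreq = struct.pack("4sl", socket.inet_aton(GROUP), socket.INADDR_ANY)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    data, sender = sock.recvfrom(4096)
    sock.close()
    return data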
APA, Harvard, Vancouver, ISO, and other styles
4

Yeom, Jae-seung. "Optimizing Data Accesses for Scaling Data-intensive Scientific Applications." Diss., Virginia Tech, 2014. http://hdl.handle.net/10919/64180.

Full text
Abstract:
Data-intensive scientific applications often process an enormous amount of data. The scalability of such applications depends critically on how to manage the locality of data. Our study explores two common types of applications that are vastly different in terms of memory access pattern and workload variation. One includes those with multi-stride accesses in regular nested parallel loops. The other is for processing large-scale irregular social network graphs. In the former case, the memory location or the data item accessed in a loop is predictable and the load on processing a unit work (an array element) is relatively uniform with no significant variation. On the other hand, in the latter case, the data access per unit work (a vertex) is highly irregular in terms of the number of accesses and the locations being accessed. This property is further tied to the load and presents significant challenges in the scalability of the application performance. Designing platforms to support extreme performance scaling requires understanding of how application specific information can be used to control the locality and improve the performance. Such insights are necessary to determine which control and which abstraction to provide for interfacing an underlying system and an application as well as for designing a new system. Our goal is to expose common requirements of data-intensive scientific applications for scalability. For the former type of applications, those with regular accesses and uniform workload, we contribute new methods to improve the temporal locality of software-managed local memories, and optimize the critical path of scheduling data transfers for multi-dimensional arrays in nested loops. In particular, we provide a runtime framework allowing transparent optimization by source-to-source compilers or automatic fine tuning by programmers. Finally, we demonstrate the effectiveness of the approach by comparing against a state-of-the-art language-based framework. For the latter type, those with irregular accesses and non-uniform workload, we analyze how the heavy-tailed property of input graphs limits the scalability of the application. Then, we introduce an application-specific workload model as well as a decomposition method that allows us to optimize locality with the custom load balancing constraints of the application. Finally, we demonstrate unprecedented strong scaling of a contagion simulation on two state-of-the-art high performance computing platforms.
Ph. D.
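For the regular-access case in the abstract above, a standard way to improve temporal locality in software-managed local memories is to tile the nested loops so that each block of the array is staged and reused before moving on. The sketch below is a generic blocked traversal with invented names; it illustrates the idea only and is not the runtime framework developed in the dissertation.

def tiled_apply(a, tile, fn):
    """Process a 2-D list 'a' in tile x tile blocks so that each block stays
    hot in a small local memory (or cache) while it is being worked on."""
    n, m = len(a), len(a[0])
    for ii in range(0, n, tile):
        for jj in range(0, m, tile):
            # stage one block, operate on it, then move to the next block
            for i in range(ii, min(ii + tile, n)):
                for j in range(jj, min(jj + tile, m)):
                    a[i][j] = fn(a[i][j])
    return a

# Example: square every element of a 6x6 array using 2x2 tiles
grid = [[i * 6 + j for j in range(6)] for i in range(6)]
tiled_apply(grid, 2, lambda v: v * v)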
APA, Harvard, Vancouver, ISO, and other styles
5

Khemiri, Wael. "Data-intensive interactive workflows for visual analytics." Phd thesis, Université Paris Sud - Paris XI, 2011. http://tel.archives-ouvertes.fr/tel-00659227.

Full text
Abstract:
The increasing amounts of electronic data of all forms, produced by humans (e.g. Web pages, structured content such as Wikipedia or the blogosphere, etc.) and/or automatic tools (loggers, sensors, Web services, scientific programs or analysis tools, etc.), lead to a situation of unprecedented potential for extracting new knowledge, finding new correlations, or simply making sense of the data. Visual analytics aims at combining interactive data visualization with data analysis tasks. Given the explosion in volume and complexity of scientific data, e.g. associated with biological or physical processes or social networks, visual analytics is called to play an important role in scientific data management. Most visual analytics platforms, however, are memory-based, and are therefore limited in the volume of data handled. Moreover, each new algorithm (e.g. for clustering) must be integrated into the platform by hand. Finally, they lack the capability to define and deploy well-structured processes where users with different roles interact in a coordinated way, sharing the same data and possibly the same visualizations. This work is at the convergence of three research areas: information visualization, database query processing and optimization, and workflow modeling. It provides two main contributions: (i) we propose a generic architecture for deploying a visual analytics platform on top of a database management system (DBMS); (ii) we show how to propagate data changes to the DBMS and the visualizations through the workflow process. Our approach has been implemented in a prototype called EdiFlow and validated through several applications. It clearly demonstrates that visual analytics applications can benefit from the robust storage and automatic process deployment provided by the DBMS while obtaining good performance, and thus provides scalability. Conversely, it could also be integrated into a data-intensive scientific workflow platform in order to increase its visualization features.
APA, Harvard, Vancouver, ISO, and other styles
6

Vijayakumar, Sruthi. "Hadoop Based Data Intensive Computation on IAAS Cloud Platforms." UNF Digital Commons, 2015. http://digitalcommons.unf.edu/etd/567.

Full text
Abstract:
Cloud computing is a relatively new form of computing which uses virtualized resources. It is dynamically scalable and is often provided as a pay-per-use service over the Internet, an Intranet, or both. With increasing demand for data storage in the cloud, the study of data-intensive applications is becoming a primary focus. Data-intensive applications are those which involve high CPU usage and process large volumes of data, typically hundreds of gigabytes, terabytes or petabytes in size. The research in this thesis is focused on Amazon's Elastic Compute Cloud (EC2) and Amazon Elastic MapReduce (EMR) using the HiBench Hadoop benchmark suite, which is used for performing and evaluating Hadoop-based data-intensive computation on both of these cloud platforms. Both quantitative and qualitative comparisons of Amazon EC2 and Amazon EMR are presented, along with their pricing models and suggestions for future research.
APA, Harvard, Vancouver, ISO, and other styles
7

Maheshwari, Ketan. "Data-intensive scientific workflows : representations of parallelism and enactment on distributed systems." Nice, 2011. http://www.theses.fr/2011NICE4007.

Full text
Abstract:
Porting data-intensive applications onto large-scale distributed computing infrastructures is not trivial. Bridging the gap between an application and its workflow expression poses challenges at different levels. The challenge at the end-user level is the need to express the application's logic and data flow requirements from a non-technical domain. At the infrastructure level, it is a challenge to port the application such that maximum exploitation of the underlying resources can take place. Workflows enable distributed application deployment by capturing the application components' interconnections and the flow among them. However, workflow expressions and engines need enhancements to meet the challenges outlined. Facilitation of a concise expression of parallelism, data combinations and higher-level data structures in a coherent fashion is required. This work aims to fulfill these requirements. It is driven by use cases in the medical image processing domain. Various strategies are developed to express asynchronous and maximally parallel execution of complex flows efficiently, by providing concise expressions and enactments interfaced with large-scale distributed computing infrastructures. The main contributions of this research are: a) a rich workflow language with two representations, and fruitful results from the experiments carried out on the enactment of medical image processing application workflows on the European Grid Infrastructure (EGI); and b) an extension of an existing workflow environment (Taverna) to interface with grid computing infrastructures.
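One of the data-combination constructs the abstract alludes to is conventionally called dot and cross product iteration in workflow systems such as Taverna. The sketch below shows the two iteration strategies over invented example inputs; it only illustrates the concept, not the workflow language proposed in the thesis.

from itertools import product

images = ["scan_a.nii", "scan_b.nii"]
params = ["smooth=2mm", "smooth=4mm"]

# dot product: pair inputs positionally, one task per pair
dot_tasks = list(zip(images, params))

# cross product: every image against every parameter set,
# which exposes the maximum number of independent tasks
cross_tasks = list(product(images, params))

print(dot_tasks)    # [('scan_a.nii', 'smooth=2mm'), ('scan_b.nii', 'smooth=4mm')]
print(cross_tasks)  # 4 independent tasks that can run in parallel on a grid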
APA, Harvard, Vancouver, ISO, and other styles
8

Schäler, Martin [Verfasser], and Gunter [Akademischer Betreuer] Saake. "Minimal-invasive provenance integration into data-intensive systems / Martin Schäler. Betreuer: Gunter Saake." Magdeburg : Universitätsbibliothek, 2014. http://d-nb.info/1066295352/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Shang, Pengju. "Research in high performance and low power computer systems for data-intensive environment." Doctoral diss., University of Central Florida, 2011. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/5033.

Full text
Abstract:
The evolution of computer science and engineering is always motivated by the requirements for better performance, power efficiency, security, user interface (UI), etc {CM02}. The first two factors are potential tradeoffs: better performance usually requires better hardware, e.g., CPUs with a larger number of transistors or disks with higher rotation speed; however, the increasing number of transistors on a single die or chip shows super-linear growth in CPU power consumption {FAA08a}, and a change in disk rotation speed has a quadratic effect on disk power consumption {GSK03}. We propose three new systematic approaches, as shown in Figure 1.1, Transactional RAID, data-affinity-aware data placement (DAFA), and modeless power management, to tackle the performance problem in database systems and large-scale clusters or cloud platforms, and the power management problem in chip multiprocessors, respectively.
The first design, Transactional RAID (TRAID), is motivated by the fact that in recent years more storage system applications have employed transaction processing techniques to ensure data integrity and consistency. In transaction processing systems (TPS), the log is a kind of redundancy used to ensure the transaction ACID (atomicity, consistency, isolation, durability) properties and data recoverability. Furthermore, highly reliable storage systems, such as redundant arrays of inexpensive disks (RAID), are widely used as the underlying storage for databases to guarantee system reliability and availability with high I/O performance. However, databases and storage systems tend to implement their independent fault-tolerance mechanisms {GR93, Tho05} from their own perspectives, thereby leading to potentially high overhead. We observe the overlapped redundancies between the TPS and RAID systems, and propose a novel reliable storage architecture called Transactional RAID (TRAID). TRAID deduplicates this overlap by logging only one compact version (XOR results) of the recovery references for the updated data. It minimizes the amount of log content as well as the log flushing overhead, thereby boosting the overall transaction processing performance. At the same time, TRAID guarantees comparable RAID reliability, the same recovery correctness, and the ACID semantics of traditional transaction processing systems.
On the other hand, the emerging myriad of data-intensive applications places a demand for high-performance computing resources with massive storage. Academia and industry pioneers have been developing big-data parallel computing frameworks and large-scale distributed file systems (DFS), widely used to facilitate high-performance runs of data-intensive applications such as bioinformatics {Sch09}, astronomy {RSG10}, and high-energy physics {LGC06}. Our recent work {SMW10} reported that the data distribution in a DFS can significantly affect the efficiency of data processing and hence the overall application performance. This is especially true for applications with sophisticated access patterns. For example, Yahoo's Hadoop {refg} clusters employ a random data placement strategy for load balance and simplicity {reff}. This allows MapReduce {DG08} programs to access all the data (without distinguishing interest locality) at full parallelism. Our work focuses on Hadoop systems. We observed that the data distribution is one of the most important factors that affect parallel programming performance. However, the default Hadoop adopts a random data distribution strategy, which does not consider data semantics, specifically data affinity. We propose a Data-Affinity-Aware (DAFA) data placement scheme to address this problem. DAFA builds a history data access graph to exploit the data affinity. According to the data affinity, DAFA re-organizes data to maximize the parallelism of the affinitive data, subject to the overall load balance. This enables DAFA to realize the maximum number of map tasks with data locality.
Besides system performance, power consumption is another important concern of current computer systems. In the U.S. alone, the energy used by servers which could be saved comes to 3.17 million tons of carbon dioxide, or 580,678 cars {Kar09}. However, the goals of high performance and low energy consumption are at odds with each other. An ideal power management strategy should be able to dynamically respond to changes (whether linear, nonlinear, or non-model) in workloads and system configuration without violating the performance requirement. We propose a novel power management scheme called MAR (modeless, adaptive, rule-based) for multiprocessor systems to minimize the CPU power consumption under performance constraints. By using richer feedback factors, e.g. the I/O wait, MAR is able to accurately describe the relationships among core frequencies, performance and power consumption. We adopt a modeless control model to reduce the complexity of system modeling. MAR is designed for CMP (Chip Multi Processor) systems by employing multi-input/multi-output (MIMO) theory and per-core level DVFS (Dynamic Voltage and Frequency Scaling).
ID: 030423445; System requirements: World Wide Web browser and PDF reader; Mode of access: World Wide Web; Thesis (Ph.D.)--University of Central Florida, 2011; Includes bibliographical references (p. 119-128).
Ph.D.
Doctorate
Electrical Engineering and Computer Science
Engineering and Computer Science
Computer Science
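The data-affinity idea behind DAFA, as described in the abstract above, can be sketched very simply: count how often pairs of blocks are accessed together, then place strongly co-accessed blocks on the same node while keeping nodes roughly balanced. The snippet below is a toy greedy version of that idea with invented names and data; it is not the algorithm from the dissertation.

from collections import Counter
from itertools import combinations

def affinity_placement(access_log, nodes):
    """access_log: list of sets of blocks accessed together by one job.
    Returns {node: [blocks]}, preferring to co-locate affinitive blocks and
    breaking ties toward the least-loaded node."""
    affinity = Counter()
    for blocks in access_log:
        for a, b in combinations(sorted(blocks), 2):
            affinity[(a, b)] += 1            # edge weights of the history access graph
    placement = {n: [] for n in nodes}
    for block in sorted({b for s in access_log for b in s}):
        def score(node):
            # affinity of this block to blocks already placed on the node
            return sum(affinity[tuple(sorted((block, other)))] for other in placement[node])
        best = max(nodes, key=lambda n: (score(n), -len(placement[n])))
        placement[best].append(block)
    return placement

log = [{"b1", "b2"}, {"b1", "b2"}, {"b3", "b4"}]
print(affinity_placement(log, ["node1", "node2"]))  # {'node1': ['b1', 'b2'], 'node2': ['b3', 'b4']}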
APA, Harvard, Vancouver, ISO, and other styles
10

Saito, Yasushi. "Functionally homogeneous clustering : a framework for building scalable data-intensive internet services /." Thesis, Connect to this title online; UW restricted, 2001. http://hdl.handle.net/1773/6936.

Full text
APA, Harvard, Vancouver, ISO, and other styles
11

Goldhill, David Raymond. "Identifying priorities in intensive care : a description of a system for collecting intensive care data, an analysis of the data collected, a critique of aspects of severity scoring systems used to compare intensive care outcome, identification of priorities in intensive care and proposals to improve outcome for intensive care patients." Thesis, Queen Mary, University of London, 1999. http://qmro.qmul.ac.uk/xmlui/handle/123456789/1405.

Full text
Abstract:
This thesis reviews the requirements for intensive care audit data and describes the development of ICARUS (Intensive Care Audit and Resource Utilisation System), a system to collect and analyse intensive care audit information. By the end of 1998 ICARUS contained information on over 45,000 intensive care admissions. A study was performed to determine the accuracy of the data collection and entry in ICARUS. The data in ICARUS was used to investigate some limitations of the APACHE II severity scoring system. The studies examined the effect of changes in physiological values and post-intensive care deaths, and the effect of casemix adjustment on mortality predicted by APACHE II. A hypothesis is presented that excess intensive care mortality in the United Kingdom may be concealed by intensive care mortality prediction models. A critical analysis of ICARUS data was undertaken to identify patient groups most likely to benefit from intensive care. This analysis revealed a high mortality in critically ill patients admitted from the wards to the intensive care unit. To help identify critically ill ward patients, the physiological values and procedures in the 24 hours before intensive care admission from the ward were recorded: examination of the results suggested that management of these patients could be improved. This led to the setting up of a patient at risk team (PART). Two studies report the effect of the PART on patients on the wards and on the patients admitted from the wards to the intensive care unit. Additional care for surgical patients on the wards is suggested as a way of improving the management of high-risk postoperative patients. The thesis concludes by discussing the benefits of the ICARUS system and speculating on the direction that should be taken for intensive care audit in the future.
APA, Harvard, Vancouver, ISO, and other styles
12

Gamatié, Abdoulaye. "Design and Analysis for Multi-Clock and Data-Intensive Applications on Multiprocessor Systems-on-Chip." Habilitation à diriger des recherches, Université des Sciences et Technologie de Lille - Lille I, 2012. http://tel.archives-ouvertes.fr/tel-00756967.

Full text
Abstract:
With the growing integration of functions, modern embedded systems are becoming highly intelligent and sophisticated. The most emblematic examples of this trend are the latest generations of mobile phones, which offer their users a wide range of services for communication, music, video, photography, Internet access, and so on. These services are realized through a number of applications that process huge amounts of information, referred to as data-intensive applications. These applications are also characterized by multi-clock behaviors, because they often comprise components operating at different activation rates during execution. Embedded systems frequently have real-time constraints. For example, a video processing application is typically subject to constraints on the frame rate or display latency. For this reason, execution platforms must often provide the required computing power. Parallelism plays a central role in meeting this expectation. Integrating several cores or processors on a single chip, leading to multiprocessor systems-on-chip (MPSoCs), is a key solution for providing applications with sufficient performance at a reduced energy cost. In order to find a good trade-off between performance and energy consumption, resource heterogeneity is exploited in MPSoCs by including processing units with varied characteristics. Typically, conventional processors are combined with accelerators (graphics processing units or hardware accelerators). Besides heterogeneity, adaptivity is another important characteristic of modern embedded systems. It allows performance parameters to be managed flexibly according to variations in the environment and in a system's execution platform. In such a context, the complexity of developing modern embedded systems is evident. It raises a number of challenges addressed by our contributions, as follows: 1) first, since MPSoCs are distributed systems, how can the correctness of their design be successfully addressed, so that the functional properties of the deployed multi-clock applications can be guaranteed? This is studied by considering a correct-by-construction distribution methodology for these applications on multiprocessor platforms. 2) Then, for the data-intensive applications to be executed on such platforms, how can their design and analysis be addressed adequately, while fully taking into account their reactive nature and their potential parallelism? 3) Finally, considering the execution of these applications on MPSoCs, how can their non-functional properties (for example, execution time or energy) be analyzed in order to predict their performance? The answer to this question should then serve for the exploration of complex design spaces. Our work aims to address the three challenges above in a pragmatic manner, by adopting a model-based vision.
To this end, it considers two complementary dataflow modeling paradigms: "polychronous modeling", related to the reactive synchronous approach, and "repetitive structure modeling", related to array-oriented programming for data parallelism. The first paradigm makes it possible to reason about multi-clock systems in which components interact without assuming the existence of a reference clock. The second paradigm is expressive enough to allow the specification of a system's massive parallelism.
APA, Harvard, Vancouver, ISO, and other styles
13

Krishnajith, Anaththa Pathiranage Dhanushka. "Memory management and parallelization of data intensive all-to-all comparison in shared-memory systems." Thesis, Queensland University of Technology, 2014. https://eprints.qut.edu.au/79187/1/Anaththa%20Pathiranage%20Dhanushka_Krishnajith_Thesis.pdf.

Full text
Abstract:
This thesis presents a novel program parallelization technique incorporating dynamic and static scheduling. It utilizes a problem-specific pattern developed from prior knowledge of the targeted problem abstraction. Suitable for solving complex parallelization problems such as memory-constrained, data-intensive all-to-all comparison, the technique delivers more robust and faster task scheduling than state-of-the-art techniques. The technique achieves good performance in data-intensive bioinformatics applications.
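The core scheduling problem named in the abstract, all-to-all comparison under a memory limit, can be illustrated with a simple blocked schedule: split the items into blocks, compare all pairs within a block, and compare each pair of blocks, so that conceptually only one or two blocks need to be resident at a time. The sketch below uses invented names and data and is far simpler than the pattern-based scheduler developed in the thesis.

from itertools import combinations

def blocked_all_to_all(items, compare, block_size):
    """Cover every unordered pair of items by iterating block by block, so a
    memory-limited implementation only needs two blocks loaded at once."""
    blocks = [items[i:i + block_size] for i in range(0, len(items), block_size)]
    results = {}
    for bi, bj in combinations(range(len(blocks)), 2):
        # cross-block pairs: two blocks considered together
        for a in blocks[bi]:
            for b in blocks[bj]:
                results[(a, b)] = compare(a, b)
    for block in blocks:
        # within-block pairs: a single block considered on its own
        for a, b in combinations(block, 2):
            results[(a, b)] = compare(a, b)
    return results

# Example: pairwise similarity of short sequences with a block size of 2
seqs = ["ACGT", "ACGA", "TTGA", "ACGG"]
sims = blocked_all_to_all(seqs, lambda a, b: sum(x == y for x, y in zip(a, b)), 2)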
APA, Harvard, Vancouver, ISO, and other styles
14

Bicer, Tekin. "Supporting Fault Tolerance and Dynamic Load Balancing in FREERIDE-G." The Ohio State University, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=osu1267638588.

Full text
APA, Harvard, Vancouver, ISO, and other styles
15

Martí, Fraiz Jonathan. "dataClay : next generation object storage." Doctoral thesis, Universitat Politècnica de Catalunya, 2017. http://hdl.handle.net/10803/405907.

Full text
Abstract:
Existing solutions for data sharing are not fully compatible with multi-provider contexts. Traditionally, providers offer their datasets through hermetic Data Services with restricted APIs. Therefore, consumers are compelled to adapt their applications to the current functionality, and their chances of contributing their own know-how are very limited. With regard to data management, the current database management systems (DBMSs) that sustain these Data Services are designed for a single-provider scenario, forcing a centralized administration conducted by the single role of the database administrator (DBA). This DBA defines the conceptual schema and the corresponding integrity constraints, and determines the external schema to be offered to the end users. The problem is that a multi-provider environment cannot assume the existence of a central role for the administration of all the datasets. In terms of data processing, the different representations of the data model at different tiers, from the application level to the Data Service or DBMS layers, cause applications to dedicate between 20% and 50% of their code to performing the corresponding transformations. This has a negative impact both on developers' productivity and on the global performance of data-intensive workflows. In light of the foregoing, this thesis proposes three novel techniques that enable a data store to support a multi-provider ecosystem, facilitating collaboration among all the players and the development of efficient data-intensive applications. In particular, and after the convenient decentralization of the database administration, this thesis contributes to the community with: 1) the proper mechanisms to enable consumers to extend the current schema and functionality without compromising providers' constraints; 2) the proper mechanisms to enable any provider to define its own policies and integrity constraints in a way that will never be jeopardized; 3) the integration of a parallel programming model with the data model to drastically reduce data transformations, designed to be compliant with near-future storage devices. These contributions have been validated by means of the design and implementation of dataClay, as an example of a multi-provider data store that fulfills the defined requirements. Furthermore, regarding the first and third contributions, different performance analyses are presented to evaluate and prove their feasibility (note that the second contribution is purely logical).
APA, Harvard, Vancouver, ISO, and other styles
16

Farahanchi, Ali. "The impact of strategic investment on success of capital-intensive ventures." Thesis, Massachusetts Institute of Technology, 2017. http://hdl.handle.net/1721.1/112623.

Full text
Abstract:
Thesis: Ph. D. in Engineering Systems, Massachusetts Institute of Technology, School of Engineering, Institute for Data, Systems, and Society, 2017.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 408-417).
Established companies in technology-enabled industries such as software, telecommunications, pharmaceuticals, and semiconductors have used corporate venture capital as a lever to access and screen technological advances, and to drive innovation outside the traditional firm boundaries. Recent years have witnessed the emergence of a new wave of corporate venture capital funds that increasingly interact and compete with traditional venture capital firms in the entrepreneurial ecosystem. The incremental benefits of financing a startup through corporate venture capital have been a subject of study by researchers across the Economics, Finance, Strategy, and Innovation fields. First, this thesis examines entrepreneurs' rationale for raising capital from corporate investors. Through the analysis of an online survey conducted with startups based in the US and founded between 2010-15, we identify that startups that operate in capital-intensive industries, such as life sciences and manufacturing, raise capital from corporate investors in order to establish strategic partnerships with corporates, significantly more than do startups in capital-light industries such as enterprise and consumer software. Second, through an empirical analysis of a panel of 8,190 startups founded in the US between 2000-10, this thesis shows that corporate venture capital is more beneficial to startups that operate in capital-intensive industries. Using a bivariate probit model, this thesis shows that startups backed by corporate venture capital are more likely to be acquired or go public, and that the likelihood of an exit event increases as the capital intensity of the industry magnifies, as measured by the level of fixed assets on companies' balance sheets. In addition, we provide empirical evidence that participation of corporate venture capital in a financing round helps a capital-intensive startup to raise further funding from reputable traditional venture capital firms. Third, this thesis presents empirical evidence that establishing strategic collaboration between capital-intensive startups and the corporate parents of venture capital firms, in the form of joint research, product development, or commercialization, is a main source of value for startups. Using data gathered on 130 corporate news announcements on strategic collaborations, this thesis shows that capital-intensive startups backed by corporate venture capital are significantly more likely to succeed when they establish strategic collaboration with corporate parents. The final contribution of this thesis is a formal assessment of traditional venture capital firms' investment behavior in the presence of corporate investors. We present a game-theoretic model and identify the circumstances under which traditional venture capital firms benefit financially from corporate investors' participation in financing a capital-intensive startup. By leveraging data gathered on 8,190 startups, we apply the game-theoretic model and the Monte Carlo method to simulate financial returns for a traditional venture capital firm investing in a capital-intensive startup in the pharmaceutical industry.
by Ali Farahanchi.
Ph. D. in Engineering Systems
APA, Harvard, Vancouver, ISO, and other styles
17

Fumai, Nicola. "A database for an intensive care unit patient data management system." Thesis, McGill University, 1992. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=22500.

Full text
Abstract:
Computerization has had a large impact on hospital intensive care units, allowing continuous monitoring and display of physiological patient data. Treatment of the critically ill patient, however, now requires assimilating large amounts of patient data.
Computers can help by processing the data and displaying the information in easy to understand formats. Also, knowledge-based systems can provide advice in diagnosis and treatment of patients. If these systems are to be effective, they must be integrated into the total hospital information system and the separate computer data must be jointly integrated into a new database which will become the primary medical record.
This thesis presents the design and implementation of a computerized database for an intensive care unit patient data management system being developed for the Montreal Children's Hospital. The database integrates data from the various PDMS components into one logical information store. The patient data currently managed includes physiological parameter data, patient administrative data and fluid balance data.
A simulator design is also described, which allows for thorough validation and verification of the Patient Data Management System. This simulator can easily be extended for use as a teaching and training tool for PDMS users.
The database and simulator were developed in C and implemented under the OS/2 operating system environment. The database is based on the OS/2 Extended Edition relational Database Manager.
APA, Harvard, Vancouver, ISO, and other styles
18

Baker, Lawrence S. M. (Lawrence M.), Massachusetts Institute of Technology. "Characterisation of glucose management in intensive care." Thesis, Massachusetts Institute of Technology, 2019. https://hdl.handle.net/1721.1/124577.

Full text
Abstract:
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Thesis: S.M. in Technology and Policy, Massachusetts Institute of Technology, School of Engineering, Institute for Data, Systems, and Society, 2019
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 121-130).
Patients in intensive care routinely have their blood glucose monitored and controlled using insulin. Two decades of on-going research has attempted to establish optimal glucose targets and treatment policy for patients with hyperglycemia in the intensive care unit (ICU). These efforts rely on the assumption that health care providers can reliably meet given targets. Significant proportions of the ICU population are either hypoglycemic or hyperglycemic and poor blood glucose control may lead to adverse patient outcomes. This thesis analyses approximately 20,000 ICU stays at the Beth Israel Deaconess Medical Center (BIDMC) which occurred between 2008 and 2018. These data are used to describe the state of clinical practice in the ICU and identify areas where treatment may be suboptimal. Even at a world-renowned teaching hospital, blood sugars are not optimally managed. 41.8% of diabetics and 14.2% of non-diabetics are severely hyperglycemic (>215mg/dL) each day. Insulin boluses are given more frequently than insulin infusions, despite guidelines recommending infusions for most critical care patients. When infusions are given, rates do not follow a consistent set of rules. Blood sugar management faces several challenges, including unreliable readings. Laboratory and fingerstick measurements that were taken at the same time had an R² of only 0.63 and the fingerstick measurements read on average 10mg/dL higher. Overcoming these challenges is an important part of improving care in the ICU. It is hoped that publicly sharing the code used to extract and clean data used for analysis will encourage further research. Code can be found at https://github.com/lawbaker/MIMIC-Glucose-Management
by Lawrence Baker.
S.M. in Technology and Policy
S.M. in Technology and Policy, Massachusetts Institute of Technology, School of Engineering, Institute for Data, Systems, and Society
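The agreement statistics quoted in the abstract above (the R² and average offset between paired fingerstick and laboratory readings) can be reproduced on any set of paired measurements with a few lines. The values below are invented example data, not BIDMC data; the actual extraction and analysis code is in the repository linked in the abstract.

def paired_agreement(lab, fingerstick):
    """Return (r_squared, mean_difference) for paired glucose readings in mg/dL,
    where r_squared is the squared Pearson correlation and mean_difference is
    the average fingerstick-minus-lab offset."""
    n = len(lab)
    mean_lab = sum(lab) / n
    mean_fs = sum(fingerstick) / n
    cov = sum((x - mean_lab) * (y - mean_fs) for x, y in zip(lab, fingerstick))
    var_lab = sum((x - mean_lab) ** 2 for x in lab)
    var_fs = sum((y - mean_fs) ** 2 for y in fingerstick)
    r_squared = cov * cov / (var_lab * var_fs)
    mean_diff = sum(y - x for x, y in zip(lab, fingerstick)) / n
    return r_squared, mean_diff

lab_values = [110, 145, 180, 95, 210]
fingerstick_values = [122, 150, 195, 101, 215]
print(paired_agreement(lab_values, fingerstick_values))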
APA, Harvard, Vancouver, ISO, and other styles
19

Suthakar, Uthayanath. "A scalable data store and analytic platform for real-time monitoring of data-intensive scientific infrastructure." Thesis, Brunel University, 2017. http://bura.brunel.ac.uk/handle/2438/15788.

Full text
Abstract:
Real-time monitoring of data-intensive scientific infrastructures, covering jobs, data transfers, and hardware failures, is vital for efficient operation. Due to the high volume and velocity of the events that are produced, traditional methods are no longer optimal. Several techniques, as well as enabling architectures, are available to address the Big Data problem. In this respect, this thesis complements existing survey work by contributing an extensive literature review of both traditional and emerging Big Data architectures. Scalability, low latency, fault tolerance, and intelligence are key challenges for the traditional architecture, whereas Big Data technologies and approaches have become increasingly popular for use cases that demand scalable, data-intensive (parallel) processing, fault tolerance (through data replication), and support for low-latency computations. In the context of a scalable data store and analytics platform for monitoring data-intensive scientific infrastructure, the Lambda Architecture was adapted and evaluated on the Worldwide LHC Computing Grid, where it proved effective, especially for computationally and data-intensive use cases. In this thesis, an efficient strategy for the collection and storage of large volumes of data for computation is presented. By moving the transformation logic out of the data pipeline and into the analytics layers, it simplifies the architecture and the overall process: the time required is reduced, untampered raw data are kept at the storage level for fault tolerance, and the required transformation can be done when needed. An optimised Lambda Architecture (OLA) is presented, which models an efficient way of joining the batch layer and the streaming layer with minimal code duplication in order to support scalability, low latency, and fault tolerance. A few models were evaluated: a pure streaming layer, a pure batch layer, and the combination of both batch and streaming layers. Experimental results demonstrate that the OLA performed better than the traditional architecture as well as the Lambda Architecture. The OLA was also enhanced by adding an intelligence layer for predicting data access patterns. The intelligence layer actively adapts and updates the model built by the batch layer, which eliminates the re-training time while providing a high level of accuracy using Deep Learning techniques. The fundamental contribution to knowledge is a scalable, low-latency, fault-tolerant, intelligent, and heterogeneous architecture for monitoring a data-intensive scientific infrastructure that can benefit from Big Data technologies and approaches.
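The batch/speed-layer split that the Lambda Architecture (and the optimised OLA variant above) relies on can be summarised in a few lines: precomputed batch views are merged with incremental real-time views at query time, while the raw master dataset is kept untouched. The sketch below is a generic toy illustration with invented event fields; it is not the WLCG monitoring implementation.

from collections import Counter

class LambdaView:
    """Toy serving layer: the batch view is rebuilt from all raw events, the
    real-time view is updated incrementally, and queries merge the two."""
    def __init__(self):
        self.master_dataset = []      # immutable, append-only raw events
        self.batch_view = Counter()   # recomputed periodically from the master dataset
        self.realtime_view = Counter()

    def ingest(self, event):
        self.master_dataset.append(event)        # raw data kept untampered for fault tolerance
        self.realtime_view[event["site"]] += 1    # speed layer: low-latency update

    def run_batch(self):
        self.batch_view = Counter(e["site"] for e in self.master_dataset)
        self.realtime_view.clear()                # its contents are now absorbed by the batch view

    def query(self, site):
        return self.batch_view[site] + self.realtime_view[site]

monitor = LambdaView()
monitor.ingest({"site": "CERN", "type": "transfer"})
monitor.ingest({"site": "CERN", "type": "job"})
print(monitor.query("CERN"))  # 2, served before any batch run has happened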
APA, Harvard, Vancouver, ISO, and other styles
20

Paz, Alvarez Alfonso. "Deviation occurrence analysis in a human intensive production environment by using MES data." Thesis, KTH, Industriell produktion, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-230674.

Full text
Abstract:
Despite decades of automation initiatives, manual assembly still represents one of the most cost-effective approaches in scenarios with high product variety and complex geometry. It represents 50% of total production time and 20% of total production cost. Understanding human performance and its impact on the assembly line is key to improving the line's overall performance. In this thesis work, the deviations occurring in the line are studied in order to understand how human workers are affected by certain operational aspects of the assembly line. To do so, three different influence factors have been chosen and their impact on human performance observed: i. how past events occurring in the line affect the current action of the worker; ii. how scheduled stops affect the current action of the worker; iii. how the theoretical cycle time affects the performance of the worker. In order to observe these influence relationships, data gathered on the shop floor from SCANIA's Manufacturing Execution System (MES) has been used. By applying Knowledge Discovery in Databases (KDD) methods, the data has been indexed and analyzed, providing the necessary results for the study. Finally, from the results shown, it can be inferred that variability in the functioning of the line does have an impact on overall human performance. However, due to the complexity of the manufacturing system, the impact on human performance might not be as regular as initially thought.
APA, Harvard, Vancouver, ISO, and other styles
21

Shahzad, Khurram. "Energy Efficient Wireless Sensor Node Architecture for Data and Computation Intensive Applications." Doctoral thesis, Mittuniversitetet, Avdelningen för elektronikkonstruktion, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-21956.

Full text
Abstract:
Wireless Sensor Networks (WSNs), in addition to enabling monitoring solutions for numerous new application areas, have gained huge popularity as cost-effective, dynamically scalable, easy-to-deploy and maintainable alternatives to conventional infrastructure-based monitoring solutions. A WSN consists of spatially distributed autonomous wireless sensor nodes that measure desired physical phenomena and operate in a collaborative manner to relay the acquired information wirelessly to a central location. A wireless sensor node, integrating the resources required to enable infrastructure-less distributed monitoring, is constrained by its size, cost and energy. In order to address these constraints, a typical wireless sensor node is designed around low-power and low-cost modules that in turn provide limited communication and processing performance. Data- and computation-intensive wireless monitoring applications, on the other hand, not only demand higher communication bandwidth and computational performance but also require practically feasible operational lifetimes so as to reduce the maintenance cost associated with the replacement of batteries. In relation to the communication and processing requirements of such applications and the constraints associated with a typical wireless sensor node, this thesis explores an energy-efficient wireless sensor node architecture that enables the realization of data- and computation-intensive applications. Architectures enabling raw data transmission and in-sensor processing with various technological alternatives are explored. The potential architectural alternatives are evaluated both analytically and quantitatively with regard to different design parameters, in particular performance and energy consumption. For quantitative evaluation purposes, the experiments are conducted on vibration- and image-based industrial condition monitoring applications that are not only data- and computation-intensive but also of practical importance. Regarding the choice of an appropriate wireless technology in an architecture enabling raw data transmission, standards-based communication technologies including infrared, mobile broadband, WiMAX, LAN, Bluetooth, and ZigBee are investigated. With regard to in-sensor processing, different architectures comprising sequential processors and FPGAs are realized to evaluate different design parameters, especially performance and energy efficiency. Afterwards, the architectures enabling raw data transmission only and those involving in-sensor processing are evaluated so as to find an energy-efficient solution. The results of this investigation show that the in-sensor processing architecture, comprising an FPGA for computation purposes, is more energy efficient than the other alternatives for data- and computation-intensive applications. Based on the results obtained and the experience gained in the architectural evaluation study, an FPGA-based high-performance wireless sensor platform, the SENTIOF, is designed and developed. In addition to performance, the SENTIOF is designed to enable dynamic optimization of energy consumption. This includes enabling integrated modules to be completely switched off and providing fast configuration support for the FPGA. In order to validate the results of the evaluation studies, and to assess the performance and energy consumption of real implementations, both the vibration- and image-based industrial monitoring applications are realized using the SENTIOF.
In terms of computational performance, the real-time processing goals are achieved for both of these applications. For example, in the case of vibration-based monitoring, real-time processing of tri-axial (horizontal, vertical and axial) vibration data is achieved for sampling rates of more than 100 kHz. With regard to energy consumption, based on the measured power consumption, which also includes the power consumed during the FPGA's configuration process, the operational lifetimes are estimated using a single-cell battery (similar to an AA battery in shape and size) with a typical capacity of 2600 mAh. In the case of vibration-based condition monitoring, an operational lifetime of more than two years can be achieved for a duty-cycle interval of 10 minutes or more. The achievable operational lifetime of image-based monitoring is more than three years for a duty-cycle interval of 5 minutes or more.
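To make the lifetime figures above concrete, the following back-of-the-envelope Python sketch estimates battery life from a duty cycle; the capacity and duty-cycle interval come from the abstract, while the active and sleep currents are assumed values for illustration, not the thesis's measurements:

    # Rough duty-cycle lifetime estimate; active/sleep currents are assumptions for illustration.
    battery_mAh = 2600.0      # single-cell battery capacity quoted in the abstract
    interval_s  = 10 * 60.0   # duty-cycle interval of 10 minutes
    active_s    = 0.5         # assumed awake time per cycle (sampling, FPGA processing, radio)
    active_mA   = 150.0       # assumed average current while active
    sleep_mA    = 0.02        # assumed current while the node sleeps

    avg_mA = (active_s * active_mA + (interval_s - active_s) * sleep_mA) / interval_s
    lifetime_years = battery_mAh / avg_mA / 24 / 365
    print(f"average current {avg_mA:.3f} mA -> estimated lifetime {lifetime_years:.1f} years")

With these assumed currents the estimate comes out at roughly two years, in line with the lifetime reported above; the thesis's own figures are of course based on measured consumption.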
APA, Harvard, Vancouver, ISO, and other styles
22

Wang, Yuying. "Type-2 fuzzy probabilistic system for proactive monitoring of uncertain data-intensive seasonal time series." Thesis, De Montfort University, 2014. http://hdl.handle.net/2086/11059.

Full text
Abstract:
This research realises a type-2 fuzzy probabilistic system for proactive monitoring of uncertain data-intensive seasonal time series, with both theoretical and practical implications. In this thesis, a new form of representation, the J˜-plane, is proposed for concave and unnormalized type-2 fuzzy events as well as convex and normalized ones, which helps bridge the gap between higher-order fuzzy probability and real-world problems. With the J˜-plane representation, the investigation of type-2 fuzzy probability theory and the proposal of a type-2 fuzzy probabilistic system become possible. Based on the J˜-plane representation, a new fuzzy system model - a type-2 fuzzy probabilistic system - is proposed, incorporating probabilistic inference with type-2 fuzzy sets. As a special case study, a type-2 fuzzy SARIMA system is proposed and tested in forecasting singleton and uncertain non-singleton benchmark data - the Mackey-Glass time series. The results show that the type-2 fuzzy SARIMA system achieves significant improvements over its predecessors - the classical statistical SARIMA model and type-1 and general type-2 fuzzy logic systems - in both the singleton and the non-singleton experiments, whereas a SARIMA model cannot forecast non-singleton data at all. The type-2 fuzzy SARIMA system is applied in a real-world scenario - WSS CAPS proactive monitoring - and compared with the statistical SARIMA model and type-1 and general type-2 fuzzy logic systems to show that the type-2 fuzzy SARIMA system can monitor practical uncertain data-intensive seasonal time series proactively and accurately, whereas its predecessors cannot deal with this at all. The series of concepts, algorithms, experiments, practical implementations and comparisons proves that a type-2 fuzzy probabilistic system is viable in practice, and that type-2 fuzzy systems can evolve from rule-based fuzzy systems to systems incorporating probabilistic inference with type-2 fuzzy sets.
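The Mackey-Glass series mentioned above is a standard chaotic benchmark that can be reproduced in a few lines of Python; the sketch below uses the commonly quoted parameters (beta = 0.2, gamma = 0.1, n = 10, tau = 17) and simple Euler integration, and only shows what the benchmark data look like, not the experimental setup of the thesis:

    import numpy as np

    def mackey_glass(length=1000, tau=17, beta=0.2, gamma=0.1, n=10, dt=1.0, x0=1.2):
        """Generate a Mackey-Glass series: dx/dt = beta*x(t-tau)/(1+x(t-tau)^n) - gamma*x(t)."""
        history = int(tau / dt)
        x = np.full(length + history, x0)
        for t in range(history, length + history - 1):
            x_tau = x[t - history]
            x[t + 1] = x[t] + dt * (beta * x_tau / (1.0 + x_tau ** n) - gamma * x[t])
        return x[history:]

    series = mackey_glass()
    print(series[:5])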
APA, Harvard, Vancouver, ISO, and other styles
23

Jiang, Wei. "A Map-Reduce-Like System for Programming and Optimizing Data-Intensive Computations on Emerging Parallel Architectures." The Ohio State University, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=osu1343677821.

Full text
APA, Harvard, Vancouver, ISO, and other styles
24

Oluwaseun, Ajayi Olabode. "An evaluation of galaxy and ruffus-scripting workflows system for DNA-seq analysis." University of the Western Cape, 2018. http://hdl.handle.net/11394/6765.

Full text
Abstract:
Magister Scientiae - MSc
Functional genomics determines the biological functions of genes on a global scale by using large volumes of data obtained through techniques including next-generation sequencing (NGS). The application of NGS in biomedical research is gaining momentum, and with its adoption becoming more widespread, there is an increasing need for access to customizable computational workflows that can simplify, and offer access to, computationally intensive analyses of genomic data. In this study, analysis workflows were designed and implemented in the Galaxy and Ruffus frameworks with a view to addressing the challenges faced in biomedical research. Galaxy, a graphical web-based framework, allows researchers to build a graphical NGS data analysis pipeline for accessible, reproducible, and collaborative data sharing. Ruffus, a UNIX command-line framework used by bioinformaticians as a Python library for writing scripts in an object-oriented style, allows a workflow to be built in terms of task dependencies and execution logic. In this study, a dual data analysis technique was explored which focuses on a comparative evaluation of the Galaxy and Ruffus frameworks used for composing analysis pipelines. To this end, we developed an analysis pipeline in Galaxy and in Ruffus for the analysis of Mycobacterium tuberculosis sequence data. Furthermore, this study aimed to compare the Galaxy framework to Ruffus, with preliminary analysis revealing that the analysis pipeline in Galaxy displayed a higher percentage of load and store instructions. In comparison, pipelines in Ruffus tended to be CPU bound and memory intensive. The CPU usage, memory utilization, and runtime execution are graphically represented in this study. Our evaluation suggests that workflow frameworks have distinctly different features, from ease of use, flexibility, and portability to architectural design.
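To give a flavour of the Ruffus style of pipeline composition described above, the following minimal sketch chains two tasks through file suffixes; the file names and placeholder task bodies are invented and stand in for real alignment and variant-calling commands:

    from ruffus import originate, transform, suffix, pipeline_run

    @originate(["sample1.fastq", "sample2.fastq"])
    def make_inputs(output_file):
        # create empty stand-in input files
        open(output_file, "w").close()

    @transform(make_inputs, suffix(".fastq"), ".sam")
    def align(input_file, output_file):
        # placeholder for an aligner invocation (e.g. run via subprocess)
        with open(output_file, "w") as out:
            out.write(f"aligned {input_file}\n")

    @transform(align, suffix(".sam"), ".vcf")
    def call_variants(input_file, output_file):
        # placeholder for a variant caller
        with open(output_file, "w") as out:
            out.write(f"variants from {input_file}\n")

    if __name__ == "__main__":
        pipeline_run([call_variants])   # Ruffus resolves the task dependency graph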
APA, Harvard, Vancouver, ISO, and other styles
25

Teng, Sin Yong. "Intelligent Energy-Savings and Process Improvement Strategies in Energy-Intensive Industries." Doctoral thesis, Vysoké učení technické v Brně. Fakulta strojního inženýrství, 2020. http://www.nusl.cz/ntk/nusl-433427.

Full text
Abstract:
As new technologies for energy-intensive industries continue to be developed, existing plants gradually fall behind in efficiency and productivity. Fierce market competition and environmental legislation push these traditional plants towards decommissioning and shutdown. Process improvement and retrofit projects are essential for maintaining the operational performance of such plants. Current approaches to process improvement are mainly process integration, process optimization and process intensification. In general, these areas make use of mathematical optimization, the practitioner's experience and operational heuristics, and they serve as the foundation for process improvement. Their performance, however, can be further enhanced by modern computational intelligence. The purpose of this work is therefore to apply advanced artificial intelligence and machine learning techniques to process improvement in energy-intensive industrial processes. The work adopts an approach that tackles this problem by simulating industrial systems and contributes the following: (i) application of machine learning techniques, including one-shot learning and neuro-evolution, for data-driven modelling and optimization of individual units; (ii) application of dimensionality reduction (e.g. principal component analysis, autoencoders) for multi-objective optimization of processes with multiple units; (iii) design of a new tool for analysing problematic parts of a system in order to eliminate them (bottleneck tree analysis – BOTA), together with a proposed extension that allows multi-dimensional problems to be solved with a data-driven approach; (iv) demonstration of the effectiveness of Monte Carlo simulation, neural networks and decision trees for decision-making when integrating a new process technology into existing processes; (v) comparison of the Hierarchical Temporal Memory (HTM) technique and dual optimization with several predictive tools for supporting real-time operations management; (vi) implementation of an artificial neural network within an interface for the conventional process graph (P-graph); and (vii) highlighting the future of artificial intelligence and process engineering in biosystems through a commercially driven multi-omics paradigm.
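Point (ii) above relies on standard dimensionality reduction; as a generic illustration (not the implementation used in the thesis), the following sketch applies PCA via the SVD to a made-up multi-unit process data matrix and keeps enough components to explain 95% of the variance:

    import numpy as np

    # Made-up multi-unit process data: 200 operating snapshots of 12 correlated sensor tags.
    rng = np.random.default_rng(0)
    latent = rng.normal(size=(200, 3))                      # three hidden operating factors
    X = latent @ rng.normal(size=(3, 12)) + 0.05 * rng.normal(size=(200, 12))

    Xc = X - X.mean(axis=0)                                 # centre each variable
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    k = int(np.searchsorted(np.cumsum(explained), 0.95)) + 1   # components for 95% variance
    Z = Xc @ Vt[:k].T                                       # reduced coordinates for the optimiser
    print(f"reduced from {X.shape[1]} to {k} dimensions")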
APA, Harvard, Vancouver, ISO, and other styles
26

Nilsson, Johanna, and Helena Roos. ""PDMS skapar flera nyanser av patientsäkerhet" : En kvalitativ intervjustudie om intensivvårdssjuksköterskors erfarenheter av att arbeta med ett Patient Data Management System." Thesis, Linnéuniversitetet, Institutionen för hälso- och vårdvetenskap (HV), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-75749.

Full text
Abstract:
Bakgrund: PDMS, Patient Data Management System, är ett kliniskt informationssystem som är speciellt utvecklat för intensivvård som genererar stora mängder patientdata. Systemet samlar automatiskt in patientdata från övervakning och medicinskteknisk apparatur och presenterar informationen på ett överskådligt sätt. Tidigare forskning belyser framförallt användbarheten, den minskade dokumentationstiden samt fördelar med läkemedelshanteringen men visar motsägelsefulla resultat vad gäller vad den frigjorda tiden används till. Syfte: att belysa intensivvårdssjuksköterskors erfarenhet av att använda PDMS i vårdarbetet. Metod: kvalitativ intervjustudie med intensivvårdssjuksköterskor som analyserats med hjälp av en kvalitativ innehållsanalys. Resultat: i resultatet framträder fem kategorier; patientnära vårdande, evidensbaserad vård, olika former av kvalitetsutveckling, säker vård och informatik. Dessa kategorier återger intensivvårdssjuksköterskornas erfarenheter av att arbeta med PDMS och resultatet visar tydligt att PDMS ökar patientsäkerheten på flera sätt. Konklusion: Ökad vårdkvalitet, minskad dokumentationstid, mer lättförståeligt kontinuerligt lärande för personalen, möjlighet till uppföljning och forskning samt säkrare läkemedelshantering anses vara de största vinsterna med PDMS. Sammantaget bidrar samtliga faktorer till en ökad patientsäkerhet.
Background: PDMS, Patient Data Management System, is a clinical information system specially developed for intensive care which generates a large amount of patient data. The system automatically collects patient data from monitoring and medical equipment and presents the information in a clear overall view. Previous research highlights in particular the usability of the system, the reduced time spent on documentation and the benefits for handling medications, but shows contradictory results in terms of what the freed-up time is used for. Aim: to highlight intensive care nurses' experiences of using PDMS in nursing. Method: a qualitative interview study with intensive care nurses, analyzed with qualitative content analysis. Results: five categories emerged; close-to-patient care, evidence-based care, different forms of quality development, safe care and informatics. These categories reflect the experiences of working with PDMS among intensive care nurses, and the results clearly demonstrate that PDMS increases patient safety in several ways. Conclusion: increased quality of care, reduced documentation time, more easily understood continuous learning for the staff, opportunities for follow-up and research, and safer handling of medications are considered the biggest gains with PDMS. Overall, all factors contribute to increased patient safety.
APA, Harvard, Vancouver, ISO, and other styles
27

Callerström, Emma. "Clinicians' demands on monitoring support in an Intensive Care Unit : A pilot study, at Capio S:t Görans Hospital." Thesis, KTH, Skolan för teknik och hälsa (STH), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-202541.

Full text
Abstract:
Patients treated at intensive care units (ICUs) are failing in one or several organs and require appropriate monitoring and treatment in order to maintain a meaningful life. Today, clinicians in intensive care units (ICUs) manage a large amount of data generated from monitoring devices. The monitoring parameters can either be noted down manually on a monitoring sheet or, for some parameters, transferred automatically to storage. In both cases the information is stored with the aim of supporting clinicians throughout the intensive care and being easily accessible. Patient data management systems (PDMSs) help ICUs to retrieve and integrate data. Before introducing a new configuration of a patient data system, the ICU is required to make a careful analysis of what data it wishes to register. This pilot study provides knowledge of how monitoring is performed in an intensive care unit in an emergency hospital in Stockholm. The aim of this thesis project was to collect data about what the clinicians require and what equipment they use today for monitoring. Requirement elicitation is a technique for collecting requirements. The methods used to collect data were active observations and qualitative interviews. Patterns have been found in what the assistant nurses, nurses and physicians require of systems supporting the clinicians with monitoring parameters. Assistant nurses would like to be relieved of tasks such as taking notes manually. They also question the need for automated data collection since they are present observing the patient at the bedside. Nurses describe a demanding burden of care and do not want additional activities that increase that burden. Physicians require support in order to see how an intervention leads to a certain result for individual patients. The results also show that information about decision support exists but that there is no easy way to apply it that is better than the ones used today. Clinicians state that there is a need to be able to evaluate the clinical work with the help of monitoring parameters. The results provide knowledge about the areas in which the clinicians' needs are not supported well enough by the existing tools. To conclude, the results show that the demands on monitoring support differ depending on the clinicians' profession and experience. Monitoring at the ICU is performed while observing individual patients, parameters from medical devices, results from medical tests and physical examinations. Information from all these sources is considered by the clinicians and needs to be supported accordingly before clinicians commit to actions resulting in a certain treatment, diagnosis and/or care.
Patienter som vårdas på intensivvårdsavdelningar har svikt i ett eller flera organ. Övervakning sker av patienterna för att kunna bidra till den vård som behövs för att upprätthålla ett meningsfullt liv. Idag hanterar sjukvårdspersonal en stor mängd data som genereras från övervakningsutrustning och system förknippade med övervakningsutrustning. Övervakningsparametrar kan antecknas för hand på ett övervakningspapper eller direkt sparas i digitalt format. Parametrarna sparas med syfte att vara ett lättillgängligt underlag under hela intensivvårdsprocessen. Patient data management systems (PDMSs) förenklar hämtning och integrering av data på exempelvis intensivvårdsavdelningar. Innan en ny konfiguration av ett patientdatasystem erhålls, är det eftersträvansvärt att intensivvårdsavdelningen analyserar vilken data som skall hanteras. Detta examensarbete bidrog till kunskap om hur övervakning utförs på en intensivvårdsavdelning, på ett akutsjukhus i Stockholm. Målet med detta examensarbete var att insamla data om vad klinikerna behöver och vilken utrustning och system som de använder idag för att utföra övervakning. Behovsframkallning är en teknik som kan användas för att insamla krav. I detta projekt insamlades data genom aktiva observationer och kvalitativa intervjuer. Mönster har hittats bland undersköterskornas, sjuksköterskornas och läkarnas behov av teknisk support från system och utrustning som stödjer sjukvårdspersonalen under övervakningen av en patient. Undersköterskor uttrycker ett behov av att bli avlastade från uppgifter så som att manuellt skriva ner vitala parametervärden. De ifrågasätter behovet av automatiserad datahämtning eftersom de ständigt är närvarande bredvid patienten. Sjuksköterskor beskriver en hög vårdtyngd och önskar att inte bli tillägnade fler aktiviteter som ökar den vårdtyngden. Läkare beskriver ett behov av ökat stöd för hur en intervention leder till resultat för individuella patienter. Resultaten visar att det finns information om möjliga kliniska beslutsstöd utan givet sätt att applicera dessa, bättre än de sätt som används idag. Sjukvårdspersonalen hävdar att det finns ett behov av att utvärdera det kliniska arbetet med hjälp av övervakningsparametrar. Resultaten utgör kunskap om vilka områden där sjukvårdspersonalens behov inte har stöd av nuvarande verktyg. Resultaten visar att beroende på vilken profession och erfarenhet som sjukvårdspersonalen har, är behoven olika. På intensivvårdsavdelningen sker övervakning då enskilda patienter visuellt observeras såväl som övervakningsparametrar från medicintekniska produkter, resultat från medicinska tester och fysiska examinationer. Det finns behov att integrera och presentera information från dessa källor givet kunskap om att sjukvårdspersonalen fattar beslut på dessa som resulterar i behandling, diagnostik och/eller vård.
APA, Harvard, Vancouver, ISO, and other styles
28

Ortscheid, Julius, and Thomas Jensen. "Patient Data Management System (PDMS) : Anestesi- och intensivvårdspersonalens upplevelser av implementering och arbete med PDMS." Thesis, Linnéuniversitetet, Institutionen för hälso- och vårdvetenskap (HV), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-64031.

Full text
Abstract:
Titel: Patient Data Management System (PDMS) – Anestesi- och intensivvårdspersonalens upplevelser av implementering och arbete med PDMS. Bakgrund: Dagens och framtidens sjukvård innebär en ökande användning av digitala system i omvårdnaden. Patient Data Management System (PDMS) är ett kliniskt informationssystem och beslutsstöd som implementeras allt mer på svenska sjukhus. Tidigare forskning visar på skilda upplevelser av digitala systems påverkan på omvårdnaden, arbetsbelastning och tidsåtgång. Syfte: Syftet är att beskriva anestesi- och intensivvårdspersonalens upplevelser av implementering och arbete med PDMS. Metod: Studien genomfördes som en intervjustudie med kvalitativ ansats. Resultat: I resultatet framträder fyra teman, införandeprocessen, användarvänlighet, informationsöverföring samt patientsäkerhet. Dessa fyra teman skildrar vårdpersonalens upplevelser av införandet och arbetet med PDMS. Konklusion: PDMS implementeras på allt fler sjukhus i Sverige. Vårdpersonalen anser att det är mycket viktigt med information och utbildning inför implementeringen av PDMS. Helhetssynen på sjukhusets datasystem är viktigt då det framkommer att olika system inte alltid kommunicerar med varandra. Det leder till ökad arbetsbelastning och ökad risk för patientsäkerheten. Mer forskning om PDMS påverkan på omvårdnadsarbetet och patientsäkerheten behövs.
Title: Patient Data Management System (PDMS) – Anesthesia and intensive care staff experiences of implementation and work with PDMS. Background: Today's and future healthcare means an increasing use of digital systems in nursing care. Patient Data Management System (PDMS) is a clinical information system and clinical decision support which is being implemented in Swedish hospitals. Previous research shows differing experiences of the impact of digital systems on nursing care, workload and patient safety. Aim: The purpose was to describe anesthesia and intensive care unit staff experiences of implementation and work with PDMS. Method: The study was conducted through interviews with a qualitative approach. Results: In the results four themes appear: process of introduction, serviceability, transfer of information and patient safety. The four themes depict the anesthesia and intensive care unit staff experiences of the implementation and work with PDMS. Conclusion: PDMS is implemented in an increasing number of Swedish hospitals. The anesthesia and intensive care unit staff consider information and education before the implementation of PDMS very important. A comprehensive view of the hospital's computer systems is important, since these systems appear not always to be synchronized with each other. That leads to an increased workload and also an increased risk regarding patient safety. More research on the impact of PDMS on nursing and patient safety is needed.
APA, Harvard, Vancouver, ISO, and other styles
29

Brossier, David. "Élaboration et validation d'une base de données haute résolution destinée à la calibration d'un patient virtuel utilisable pour l'enseignement et la prise en charge personnalisée des patients en réanimation pédiatrique Perpetual and Virtual Patients for Cardiorespiratory Physiological Studies Creating a High-Frequency Electronic Database in the PICU: The Perpetual Patient Qualitative subjective assessment of a high-resolution database in a paediatric intensive care unit-Elaborating the perpetual patient's ID card Validation Process of a High-Resolution Database in a Pediatric Intensive Care Unit – Describing the Perpetual Patient’s Validation Evaluation of SIMULRESP©: a simulation software of child and teenager cardiorespiratory physiology." Thesis, Normandie, 2019. http://www.theses.fr/2019NORMC428.

Full text
Abstract:
La complexité des patients de réanimation justifie le recours à des systèmes d’aide à la décision thérapeutique. Ces systèmes rassemblent des protocoles automatisés de prise en charge permettant le respect des recommandations et des simulateurs physiologiques ou patients virtuels, utilisables pour personnaliser de façon sécuritaire les prises en charge. Ces dispositifs fonctionnant à partir d’algorithmes et d’équations mathématiques ne peuvent être développés qu’à partir d’un grand nombre de données de patients. Le principal objectif de cette thèse était la mise en place d’une base de données haute résolution automatiquement collectée de patients de réanimation pédiatrique dont le but sera de servir au développement et à la validation d’un simulateur physiologique : SimulResp© . Ce travail présente l’ensemble du processus de mise en place de la base de données, du concept jusqu’à son utilisation
The complexity of the patients in the intensive care unit requires the use of clinical decision support systems. These systems bring together automated management protocols that enable adherence to guidelines and virtual physiological or patient simulators that can be used to safely customize management. These devices operating from algorithms and mathematical equations can only be developed from a large number of patients’ data. The main objective of the work was the elaboration of a high resolution database automatically collected from critically ill children. This database will be used to develop and validate a physiological simulator called SimulResp© . This manuscript presents the whole process of setting up the database from concept to use
APA, Harvard, Vancouver, ISO, and other styles
30

Bailly, Sébastien. "Utilisation des antifongiques chez le patient non neutropénique en réanimation." Thesis, Université Grenoble Alpes (ComUE), 2015. http://www.theses.fr/2015GREAS013/document.

Full text
Abstract:
Les levures du genre Candida figurent parmi les pathogènes majeurs isolés chez les patients en soins intensifs et sont responsables d'infections systémiques : les candidoses invasives. Le retard et le manque de fiabilité du diagnostic sont susceptibles d'aggraver l'état du patient et d'augmenter le risque de décès à court terme. Pour respecter les objectifs de traitement, les experts recommandent de traiter le plus précocement possible les patients à haut risque de candidose invasive. Cette attitude permet de proposer un traitement précoce aux malades atteints, mais peut entraîner un traitement inutile et coûteux et favoriser l'émergence de souches de moindre sensibilité aux antifongiques utilisés.Ce travail applique des méthodes statistiques modernes à des données observationnelles longitudinales. Il étudie l'impact des traitements antifongiques systémiques sur la répartition des quatre principales espèces de Candida dans les différents prélèvements de patients en réanimation médicale, sur leur sensibilité à ces antifongiques, sur le diagnostic des candidémies ainsi que sur le pronostic des patients. Les analyses de séries de données temporelles à l'aide de modèles ARIMA (moyenne mobile autorégressive intégrée) ont confirmé l'impact négatif de l'utilisation des antifongiques sur la sensibilité des principales espèces de Candida ainsi que la modification de leur répartition sur une période de dix ans. L'utilisation de modèles hiérarchiques sur données répétées a montré que le traitement influence négativement la détection des levures et augmente le délai de positivité des hémocultures dans le diagnostic des candidémies. Enfin, l'utilisation des méthodes d'inférence causale a montré qu'un traitement antifongique préventif n'a pas d'impact sur le pronostic des patients non neutropéniques, non transplantés et qu'il est possible de commencer une désescalade précoce du traitement antifongique entre le premier et le cinquième jour après son initiation sans aggraver le pronostic
Candida species are among the main pathogens isolated from patients in intensive care units (ICUs) and are responsible for a serious systemic infection: invasive candidiasis. A late and unreliable diagnosis of invasive candidiasis aggravates the patient's status and increases the risk of short-term death. The current guidelines recommend an early treatment of patients at high risk of invasive candidiasis, even in the absence of a documented fungal infection. However, increased antifungal drug consumption is correlated with increased costs and the emergence of drug resistance, whereas there is as yet no consensus about the benefits of probabilistic antifungal treatment. The present work used modern statistical methods on longitudinal observational data. It investigated the impact of systemic antifungal treatment (SAT) on the distribution of the four Candida species most frequently isolated from ICU patients, their susceptibilities to SATs, the diagnosis of candidemia, and the prognosis of ICU patients. The use of autoregressive integrated moving average (ARIMA) models for time series confirmed the negative impact of SAT use on the susceptibilities of the four Candida species and on their relative distribution over a ten-year period. Hierarchical models for repeated measures showed that SAT has a negative impact on the diagnosis of candidemia: it decreases the rate of positive blood cultures and increases the time to positivity of these cultures. Finally, the use of causal inference models showed that early SAT has no impact on the prognosis of non-neutropenic, non-transplanted patients and that SAT de-escalation within 5 days after its initiation in critically ill patients is safe and does not influence the prognosis.
APA, Harvard, Vancouver, ISO, and other styles
31

Ramraj, Varun. "Exploiting whole-PDB analysis in novel bioinformatics applications." Thesis, University of Oxford, 2014. http://ora.ox.ac.uk/objects/uuid:6c59c813-2a4c-440c-940b-d334c02dd075.

Full text
Abstract:
The Protein Data Bank (PDB) is the definitive electronic repository for experimentally-derived protein structures, composed mainly of those determined by X-ray crystallography. Approximately 200 new structures are added weekly to the PDB, and at the time of writing, it contains approximately 97,000 structures. This represents an expanding wealth of high-quality information but there seem to be few bioinformatics tools that consider and analyse these data as an ensemble. This thesis explores the development of three efficient, fast algorithms and software implementations to study protein structure using the entire PDB. The first project is a crystal-form matching tool that takes a unit cell and quickly (< 1 second) retrieves the most related matches from the PDB. The unit cell matches are combined with sequence alignments using a novel Family Clustering Algorithm to display the results in a user-friendly way. The software tool, Nearest-cell, has been incorporated into the X-ray data collection pipeline at the Diamond Light Source, and is also available as a public web service. The bulk of the thesis is devoted to the study and prediction of protein disorder. Initially, trying to update and extend an existing predictor, RONN, the limitations of the method were exposed and a novel predictor (called MoreRONN) was developed that incorporates a novel sequence-based clustering approach to disorder data inferred from the PDB and DisProt. MoreRONN is now clearly the best-in-class disorder predictor and will soon be offered as a public web service. The third project explores the development of a clustering algorithm for protein structural fragments that can work on the scale of the whole PDB. While protein structures have long been clustered into loose families, there has to date been no comprehensive analytical clustering of short (~6 residue) fragments. A novel fragment clustering tool was built that is now leading to a public database of fragment families and representative structural fragments that should prove extremely helpful for both basic understanding and experimentation. Together, these three projects exemplify how cutting-edge computational approaches applied to extensive protein structure libraries can provide user-friendly tools that address critical everyday issues for structural biologists.
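As a loose illustration of the crystal-form matching idea behind Nearest-cell (not its actual metric or search structure), the sketch below scores unit cells by normalised differences in cell lengths and angles and returns the closest entry from a small made-up table:

    def cell_distance(c1, c2):
        """Crude dissimilarity between unit cells given as (a, b, c, alpha, beta, gamma)."""
        lengths = sum(abs(x - y) / max(x, y) for x, y in zip(c1[:3], c2[:3]))
        angles = sum(abs(x - y) / 180.0 for x, y in zip(c1[3:], c2[3:]))
        return lengths + angles

    # Made-up reference cells; a real tool would index the whole PDB.
    reference_cells = {
        "entry_A": (78.0, 78.0, 37.1, 90.0, 90.0, 90.0),
        "entry_B": (45.3, 62.8, 90.4, 90.0, 101.2, 90.0),
    }
    query = (78.1, 78.1, 37.2, 90.0, 90.0, 90.0)
    best = min(reference_cells.items(), key=lambda kv: cell_distance(query, kv[1]))
    print("closest crystal form:", best[0])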
APA, Harvard, Vancouver, ISO, and other styles
32

Chiossi, Luca. "High-Performance Persistent Caching in Multi- and Hybrid- Cloud Environments." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2020. http://amslaurea.unibo.it/20089/.

Full text
Abstract:
The working model known as Multi Cloud is emerging as a natural evolution of Cloud Computing in response to the new business needs of companies. A typical example is the Hybrid Cloud model, in which a Private Cloud is connected to a Public Cloud so that applications can scale on demand while at the same time meeting privacy, cost and security requirements. Given that data are distributed over different facilities, whenever applications running in one data centre need to use data stored remotely, they must go through the network that connects the different infrastructures. This has a strong negative impact on data-intensive workloads, which suffer from the delays caused by the low bandwidth and high latency typical of network connections. Artificial Intelligence and Scientific Computing applications are examples of this type of workload: thanks to the ever-growing use of accelerators such as GPUs and FPGAs, they become able to consume data faster than the data become available. Implementing a cache layer that serves and stores computation data from the slow (remote) storage device to the faster (but more expensive) one where the computations are executed appears to be the best way to find the optimal trade-off between the cost of the storage devices offered as Cloud services and the high computing speed of modern applications. The cache system presented in this work was developed taking into account all the peculiarities of Cloud storage services that use the S3 API to communicate with clients. The proposed solution was obtained by working with the Ceph distributed storage system, which implements many of the services characterizing the S3 semantics and which, being designed to operate in Cloud environments, fits well into Multi Cloud scenarios.
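The read-through caching idea described above can be sketched in a few lines of Python against any S3-compatible endpoint (AWS S3 or a Ceph RADOS Gateway); the cache directory, bucket and key names are placeholders, and the sketch omits the eviction, consistency and concurrency handling a real cache tier would need:

    import hashlib
    import os

    import boto3

    CACHE_DIR = "/tmp/s3cache"      # fast local storage standing in for the cache tier
    s3 = boto3.client("s3")         # works against AWS S3 or an S3-compatible gateway

    def cached_get(bucket, key):
        """Read-through cache: serve from local disk if present, otherwise fetch once from S3."""
        os.makedirs(CACHE_DIR, exist_ok=True)
        name = hashlib.sha256(f"{bucket}/{key}".encode()).hexdigest()
        path = os.path.join(CACHE_DIR, name)
        if not os.path.exists(path):
            obj = s3.get_object(Bucket=bucket, Key=key)
            with open(path, "wb") as f:
                f.write(obj["Body"].read())
        with open(path, "rb") as f:
            return f.read()

With placeholder names, cached_get("my-bucket", "dataset/part-0001") would hit the network only on the first call and read from local disk afterwards.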
APA, Harvard, Vancouver, ISO, and other styles
33

Herodotou, Herodotos. "Automatic Tuning of Data-Intensive Analytical Workloads." Diss., 2012. http://hdl.handle.net/10161/5415.

Full text
Abstract:

Modern industrial, government, and academic organizations are collecting massive amounts of data ("Big Data") at an unprecedented scale and pace. The ability to perform timely and cost-effective analytical processing of such large datasets in order to extract deep insights is now a key ingredient for success. These insights can drive automated processes for advertisement placement, improve customer relationship management, and lead to major scientific breakthroughs.

Existing database systems are adapting to the new status quo while large-scale dataflow systems (like Dryad and MapReduce) are becoming popular for executing analytical workloads on Big Data. Ensuring good and robust performance automatically on such systems poses several challenges. First, workloads often analyze a hybrid mix of structured and unstructured datasets stored in nontraditional data layouts. The structure and properties of the data may not be known upfront, and will evolve over time. Complex analysis techniques and rapid development needs necessitate the use of both declarative and procedural programming languages for workload specification. Finally, the space of workload tuning choices is very large and high-dimensional, spanning configuration parameter settings, cluster resource provisioning (spurred by recent innovations in cloud computing), and data layouts.

We have developed a novel dynamic optimization approach that can form the basis for tuning workload performance automatically across different tuning scenarios and systems. Our solution is based on (i) collecting monitoring information in order to learn the run-time behavior of workloads, (ii) deploying appropriate models to predict the impact of hypothetical tuning choices on workload behavior, and (iii) using efficient search strategies to find tuning choices that give good workload performance. The dynamic nature enables our solution to overcome the new challenges posed by Big Data, and also makes our solution applicable to both MapReduce and Database systems. We have developed the first cost-based optimization framework for MapReduce systems for determining the cluster resources and configuration parameter settings to meet desired requirements on execution time and cost for a given analytic workload. We have also developed a novel tuning-based optimizer in Database systems to collect targeted run-time information, perform optimization, and repeat as needed to perform fine-grained tuning of SQL queries.
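The tuning loop sketched above (collect monitoring data, predict the impact of hypothetical choices, search the space) can be caricatured in a few lines of Python; the configuration knobs, their ranges and the cost function below are invented placeholders rather than the parameters or models actually used in this work:

    import itertools
    import random

    # Hypothetical search space of MapReduce-style knobs (names and ranges are illustrative).
    space = {
        "map_tasks":       [4, 8, 16, 32],
        "reduce_tasks":    [2, 4, 8],
        "io_sort_mb":      [64, 128, 256],
        "compress_output": [False, True],
    }

    def predicted_runtime(cfg):
        """Stand-in for a model fitted to monitoring profiles; returns estimated seconds."""
        base = 1000.0 / cfg["map_tasks"] + 400.0 / cfg["reduce_tasks"]
        return base + (20.0 if cfg["compress_output"] else 5.0) + 0.05 * cfg["io_sort_mb"]

    candidates = [dict(zip(space, vals)) for vals in itertools.product(*space.values())]
    sample = random.sample(candidates, k=min(50, len(candidates)))
    best = min(sample, key=predicted_runtime)
    print("best sampled configuration:", best, "->", round(predicted_runtime(best), 1), "s")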


Dissertation
APA, Harvard, Vancouver, ISO, and other styles
34

Yu, Boyang. "On exploiting location flexibility in data-intensive distributed systems." Thesis, 2016. http://hdl.handle.net/1828/7602.

Full text
Abstract:
With the fast growth of data-intensive distributed systems today, more novel and principled approaches are needed to improve system efficiency, ensure the service quality required to satisfy user requirements, and lower the system's running cost. This dissertation studies design issues in data-intensive distributed systems, which are differentiated from other systems by the heavy workload of data movement and are characterized by the fact that the destination of each data flow is limited to a subset of the available locations, such as the servers holding the requested data. Besides, even among the feasible subset, different locations may result in different performance. The studies in this dissertation improve data-intensive systems by exploiting the flexibility in data storage location. They address how to determine data placement from measured request patterns, using the proposed hypergraph models for data placement, in order to improve a series of performance metrics such as data access latency, system throughput and various costs. To implement the proposal with lower overhead, a sketch-based data placement scheme is presented, which constructs a sparsified hypergraph under a distributed and streaming-based system model, achieving a good approximation of the performance improvement. As the network can potentially become the bottleneck of distributed data-intensive systems due to the frequent data movement among storage nodes, online data placement by reinforcement learning is proposed, which intelligently determines the storage location of each data item at the moment the item is written or updated, with joint awareness of network conditions and request patterns. Meanwhile, noticing that distributed memory caches are an effective means of lowering the load on backend storage systems, the auto-scaling of memory cache clusters is studied, which tries to balance the energy cost of the service against the performance ensured. As the outcome of this dissertation, the designed schemes and methods essentially help to improve the running efficiency of data-intensive distributed systems. Therefore, they can either help to improve the user-perceived service quality under the same level of system resource investment, or help to lower the monetary expense and energy consumption of maintaining the system under the same performance standard. From these two perspectives, both the end users and the system providers can benefit from the results of the studies.
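As a toy illustration of placement driven by measured request patterns (a greedy stand-in for the hypergraph-based placement described above, not the dissertation's algorithm), the following sketch co-locates items that are frequently requested together, subject to a per-node capacity; item and node names are invented:

    from collections import defaultdict

    # Each request touches a set of items (one hyperedge per request); names are illustrative.
    requests = [{"a", "b"}, {"a", "b", "c"}, {"c", "d"}, {"a", "b"}, {"d"}]
    nodes, capacity = ["node1", "node2"], 3

    # Count how often two items appear in the same request.
    co = defaultdict(int)
    for req in requests:
        for x in req:
            for y in req:
                if x != y:
                    co[(x, y)] += 1

    placement, load = {}, {n: 0 for n in nodes}
    for item in sorted({i for req in requests for i in req}):
        def affinity(node):
            return sum(co[(item, other)] for other, where in placement.items() if where == node)
        # Choose the node with the strongest co-access affinity that still has capacity.
        target = max((n for n in nodes if load[n] < capacity), key=affinity)
        placement[item] = target
        load[target] += 1
    print(placement)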
Graduate
APA, Harvard, Vancouver, ISO, and other styles
35

Khoshkbar, Foroushha Ali Reza. "Workload Modelling and Elasticity Management of Data-Intensive Systems." Phd thesis, 2018. http://hdl.handle.net/1885/154330.

Full text
Abstract:
Efficiently and effectively processing large volumes of data (often at high velocity) using an optimal mix of data-intensive systems (e.g., batch processing, stream processing, NoSQL) is the key step in the big data value chain. The availability and affordability of these data-intensive systems as cloud managed services (e.g., Amazon Elastic MapReduce, Amazon DynamoDB) have enabled data scientists and software engineers to deploy versatile data analytics flow applications, such as click-stream analysis and collaborative filtering, with less effort. Although easy to deploy, run-time performance and elasticity management of these complex data analytics flow applications has emerged as a major challenge. As we discuss later in this thesis, data analytics flow applications combine multiple programming models for performing specialized and pre-defined sets of activities, such as ingestion, analytics, and storage of data. To support users across such heterogeneous workloads, where they are charged for every CPU cycle used and every data byte transferred in or out of the cloud datacenter, we need a set of intelligent performance and workload management techniques and tools. Our research methodology investigates and develops these techniques and tools by significantly extending well-known formal models available from other disciplines of computer science, including machine learning, optimization and control theory. To this end, this PhD dissertation makes the following core research contributions: a) it investigates novel workload prediction models (based on machine learning techniques, such as Mixture Density Networks) to forecast how performance parameters of data-intensive systems are affected by run-time variations in dataflow behaviours (e.g. data volume, data velocity, query mix); b) it investigates a control-theoretic approach for managing the elasticity of data-intensive systems to ensure the achievement of service level objectives. In the former (a), we propose a novel application of Mixture Density Networks to distribution-based resource and performance modelling of both stream and batch processing data-intensive systems. We argue that the distribution-based resource and performance modelling approach, unlike existing single-point techniques, is able to predict the whole spectrum of resource usage and performance behaviours as probability distribution functions. Therefore, it provides more valuable statistical measures about system performance at run-time. To demonstrate the usefulness of our technique, we apply it to the following workload management activities: i) predictable auto-scaling policy setting, which highlights the potential of distribution prediction in the consistent definition of cloud elasticity rules; and ii) designing a predictive admission controller which is able to efficiently admit or reject incoming queries based on probabilistic service level agreement compliance goals. In the latter (b), we apply advanced techniques in control and optimization theory to design an adaptive control scheme that is able to continuously detect and self-adapt to workload changes in order to meet the users' service level objectives. Moreover, we also develop a workload management tool called Flower for end-to-end elasticity management of different data-intensive systems across the data analytics flows. Through extensive numerical and empirical evaluation we validate the proposed models, techniques and tools.
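One practical use of a predicted performance distribution, as opposed to a single-point estimate, is checking a probabilistic service level objective before acting. The sketch below draws samples from a toy two-component mixture (standing in for the output of a Mixture Density Network) and scales out when the predicted 95th-percentile latency exceeds an assumed SLO of 250 ms; the thresholds and numbers are illustrative, not values from the thesis:

    import numpy as np

    def scale_decision(predicted_latencies_ms, slo_ms=250.0, target_quantile=0.95):
        """Scale out when the predicted latency quantile is likely to violate the SLO."""
        q = float(np.quantile(predicted_latencies_ms, target_quantile))
        if q > slo_ms:
            return "scale out", q
        if q < 0.5 * slo_ms:
            return "scale in", q
        return "hold", q

    # Toy predicted distribution: a two-component mixture standing in for an MDN output.
    rng = np.random.default_rng(1)
    samples = np.concatenate([rng.normal(180, 20, 800), rng.normal(400, 50, 200)])
    print(scale_decision(samples))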
APA, Harvard, Vancouver, ISO, and other styles
36

Albanese, Ilijc. "Periodic Data Structures for Bandwidth-intensive Applications." Thesis, 2014. http://hdl.handle.net/1828/5851.

Full text
Abstract:
Current telecommunication infrastructure is undergoing significant changes. Such changes involve the type of traffic traveling through the network as well as the requirements imposed by the new traffic mix (e.g. strict delay control and low end-to-end delay). In this new networking scenario, the current infrastructure, which remained almost unchanged for the last several decades, is struggling to adapt, and its limitations in terms of power consumption, scalability, and economical viability have become more evident. In this dissertation we explore the potential advantages of using periodic data structures to handle efficiently bandwidth-intensive transactions, which constitute a significant portion of today's network traffic. We start by implementing an approach that can work as a standalone system aiming to provide the same advantages promised by all-optical approaches such as OBS and OFS. We show that our approach is able to provide similar advantages (e.g. energy efficiency, link utilization, and low computational load for the network hardware) while avoiding the drawbacks (e.g. use of optical buffers, inefficient resource utilization, and costly deployment), using commercially available hardware. Aware of the issues of large scale hardware redeployment, we adapt our approach to work within the current transport network architecture, reusing most of the hardware and protocols that are already in place, offering a more gradual evolutionary path, while retaining the advantages of our standalone system. We then apply our approach to Data Center Networks (DCNs), showing its ability to achieve significant improvements in terms of network performance stability, predictability, performance isolation, agility, and goodput with respect to popular DCN approaches. We also show our approach is able to work in concert with many proposed and deployed DCN architectures, providing DCNs with a simple, efficient, and versatile protocol to handle bandwidth-intensive applications within the DCs.
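The abstract above centres on periodically structured transmission; as a very rough illustration of the general idea (not the mechanism designed in the thesis), the following sketch builds a repeating frame of time slots and gives each flow a number of slots proportional to its declared demand:

    def build_periodic_schedule(flows, slots_per_frame):
        """Toy periodic schedule: each flow gets slots proportional to its demand; the frame repeats."""
        total = sum(flows.values())
        schedule, position = [None] * slots_per_frame, 0
        for flow, demand in sorted(flows.items(), key=lambda kv: -kv[1]):
            for _ in range(max(1, round(slots_per_frame * demand / total))):
                if position < slots_per_frame:
                    schedule[position] = flow
                    position += 1
        return schedule

    # Invented flows and demands.
    print(build_periodic_schedule({"bulk_transfer": 6, "backup": 3, "telemetry": 1}, 10))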
Graduate
APA, Harvard, Vancouver, ISO, and other styles
37

Borisov, Nedyalko Krasimirov. "Integrated Management of the Persistent-Storage and Data-Processing Layers in Data-Intensive Computing Systems." Diss., 2012. http://hdl.handle.net/10161/5806.

Full text
Abstract:

Over the next decade, it is estimated that the number of servers (virtual and physical) in enterprise datacenters will grow by a factor of 10, the amount of data managed by these datacenters will grow by a factor of 50, and the number of files the datacenter has to deal with will grow by a factor of 75. Meanwhile, skilled information technology (IT) staff to manage the growing number of servers and data will increase less than 1.5 times. Thus, a system administrator will face the challenging task of managing larger and larger numbers of production systems. We have developed solutions to make the system administrator more productive by automating some of the hard and time-consuming tasks in system management. In particular, we make new contributions in the Monitoring, Problem Diagnosing, and Testing phases of the system management cycle.

We start by describing our contributions in the Monitoring phase. We have developed a tool called Amulet that can continuously monitor and proactively detect problems on production systems. A notoriously hard problem that Amulet can detect is that of data corruption where bits of data in persistent storage differ from their true values. Once a problem is detected, our DiaDS tool helps in diagnosing the cause of the problem. DiaDS uses a novel combination of machine learning techniques and domain knowledge encoded in a symptoms database to guide the system administrator towards the root cause of the problem.

Before applying any change (e.g., changing a configuration parameter setting) to the production system, the system administrator needs to thoroughly understand the effect that this change can have. Well-meaning changes to production systems have led to performance or availability problems in the past. For this phase, our Flex tool enables administrators to evaluate the change hypothetically in a manner that is fairly accurate while avoiding overheads on the production system. We have conducted a comprehensive evaluation of Amulet, DiaDS, and Flex in terms of effectiveness, efficiency, and integration of these contributions in the system management cycle, and of how these tools bring data-intensive computing systems closer to the goal of self-managing systems.
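To give a flavour of the proactive corruption checking mentioned for the Monitoring phase (a minimal stand-in, not how Amulet itself works), the sketch below hashes a set of files and compares the digests against a manifest from the previous scan; it assumes the scanned files are not expected to change between scans:

    import hashlib
    import json
    import os

    def scan_for_corruption(paths, manifest="checksums.json"):
        """Return files whose SHA-256 digest changed since the last scan; update the manifest."""
        old = {}
        if os.path.exists(manifest):
            with open(manifest) as f:
                old = json.load(f)
        new, corrupted = {}, []
        for path in paths:
            with open(path, "rb") as f:
                new[path] = hashlib.sha256(f.read()).hexdigest()
            if path in old and old[path] != new[path]:
                corrupted.append(path)
        with open(manifest, "w") as f:
            json.dump(new, f)
        return corrupted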


Dissertation
APA, Harvard, Vancouver, ISO, and other styles
38

"EIS for ICU: information requirements determination." 1997. http://library.cuhk.edu.hk/record=b5889218.

Full text
Abstract:
by Leung Ho-Yin.
Thesis (M.Phil.)--Chinese University of Hong Kong, 1997.
Includes bibliographical references (leaves 82-89).
Abstract --- p.ii
Table of Contents --- p.iv
LIST of Figures --- p.viii
List of Tables --- p.ix
Acknowledgments --- p.xi
Chapter
Chapter 1. --- Introduction --- p.1
Chapter 1.1 --- Intensive Care Unit --- p.2
Chapter 1.1.1 --- Expensive Costs of Intensive Care --- p.2
Chapter 1.1.2 --- Tremendous Demands with Limited Resources --- p.3
Chapter 1.1.3 --- Conflicting Roles of ICU Physicians --- p.3
Chapter 1.1.4 --- Disorganized Patient Information --- p.4
Chapter 1.2 --- ICU Management Problems --- p.5
Chapter 1.3 --- Executive Information Systems (EIS) for ICU Physician --- p.6
Chapter 1.4 --- Determine Information Requirements of the EIS --- p.7
Chapter 1.5 --- Scope of the Study --- p.8
Chapter 1.6 --- Organization of the Report --- p.8
Chapter 2. --- Literature Review --- p.9
Chapter 2.1 --- Intensive Care Unit --- p.9
Chapter 2.1.1 --- Costs of ICU --- p.10
Chapter 2.2 --- ICU Physicians are Executives --- p.10
Chapter 2.3 --- Computers in ICU --- p.11
Chapter 2.3.1 --- Record Keeping --- p.11
Chapter 2.3.2 --- Data Management --- p.12
Chapter 2.3.3 --- Decision Making --- p.13
Chapter 2.4 --- Problems Facing ICU Physicians --- p.14
Chapter 2.4.1 --- Conflicting Role --- p.14
Chapter 2.4.2 --- Information Overload --- p.14
Chapter 2.4.3 --- Poor Information Quality --- p.15
Chapter 2.4.4 --- Technophobia --- p.16
Chapter 2.5 --- Executive Information Systems --- p.16
Chapter 2.5.1 --- Definition --- p.16
Chapter 2.5.2 --- Characteristics of EIS --- p.17
Chapter 2.5.3 --- EIS in Healthcare Industry --- p.20
Chapter 2.6 --- Determining Information Requirements --- p.20
Chapter 2.6.1 --- Strategies and Methods to Determine Information Requirements --- p.21
Chapter 2.6.2 --- Critical Success Factors Analysis --- p.25
Chapter 2.6.2.1 --- Definition of CSFs --- p.26
Chapter 2.6.2.2 --- Different Executives Have Different CSFs and Different Information Needs --- p.26
Chapter 2.6.2.3 --- Hierarchical Nature of CSFs --- p.26
Chapter 2.6.2.4 --- Steps in the CSFs Approach --- p.28
Chapter 2.6.2.5 --- "Critical Information, Assumptions, and Decisions" --- p.29
Chapter 3. --- Research Methodology --- p.31
Chapter 3.1 --- Literature Review --- p.31
Chapter 3.2 --- Design a Methodology for Information Requirements Determination --- p.32
Chapter 3.3 --- ICU Admission Case Study --- p.34
Chapter 3.4 --- Analysis and Validation --- p.35
Chapter 3.5 --- COPD Survey: The Importance of Medical History --- p.36
Chapter 3.5.1 --- Chronic Obstructive Pulmonary Disease --- p.36
Chapter 3.5.2 --- The Survey --- p.38
Chapter 4. --- A Three-Stage Methodology --- p.41
Chapter 4.1 --- Stage 1 - Understanding ICU Operations --- p.42
Chapter 4.2 --- Stage 2 - Determine CSFs within the ICU --- p.43
Chapter 4.2.1 --- CSFs Analysis Steps in the Study --- p.44
Chapter 4.2.2 --- Step 1: Determine CSFs of ICUs --- p.44
Chapter 4.2.3 --- Step 2: Determine CSFs of the ICU Physicians --- p.45
Chapter 4.2.4 --- Step 3: Determine CSFs of the ICU Admission --- p.45
Chapter 4.3 --- Stage 3 - Determine Information Requirements --- p.45
Chapter 4.4 --- Importance of Medical History: A COPD Survey --- p.46
Chapter 4.4.1 --- COPD Questionnaire --- p.46
Chapter 5. --- Findings --- p.48
Chapter 5.1 --- Findings in Stage 1 --- p.48
Chapter 5.1.1 --- Decision Making in ICU --- p.49
Chapter 5.2 --- Findings in Stage 2 - CSFs --- p.54
Chapter 5.2.1 --- CSFs of the ICU --- p.54
Chapter 5.2.2 --- CSFs of the ICU Physicians --- p.56
Chapter 5.2.3 --- CSFs of the ICU Admission --- p.56
Chapter 5.3 --- Findings in Stage 3 --- p.58
Chapter 5.3.1 --- Types of Information Requirement --- p.58
Chapter 5.3.2 --- Detailed Contents of the Information Requirements --- p.59
Chapter 6. --- Analysis --- p.65
Chapter 6.1 --- A Three-Stage Methodology for Information Requirements Determination --- p.65
Chapter 6.1.1 --- Comparison of the Three-Stage Methodology with CSFs Analysis --- p.66
Chapter 6.1.2 --- A Case Study Using the Three-Stage Methodology --- p.67
Chapter 6.2 --- Roles of Information Types in Admission Decision --- p.68
Chapter 6.2.1 --- Admitting Patients from Different Sources --- p.69
Chapter 6.2.2 --- Admitting Patients with Different Diseases --- p.70
Chapter 6.3 --- The Importance of Medical History --- p.71
Chapter 7 --- Conclusions --- p.78
Bibliography --- p.82
Interviews --- p.90
Appendices --- p.91
APA, Harvard, Vancouver, ISO, and other styles
39

"Executive information systems (EIS): its roles in decision making on patients' discharge in intensive care unit." Chinese University of Hong Kong, 1995. http://library.cuhk.edu.hk/record=b5888309.

Full text
Abstract:
by Chow Wai-hung.
Thesis (M.B.A.)--Chinese University of Hong Kong, 1995.
Includes bibliographical references (leaves 56-57).
ABSTRACT --- p.ii
TABLE OF CONTENTS --- p.iv
LIST OF FIGURES --- p.vi
LIST OF TABLES --- p.vii
ACKNOWLEDGMENT --- p.viii
Chapter
Chapter I. --- INTRODUCTION --- p.1
Intensive Care Services --- p.1
Clinician as an Information Processor --- p.2
Executive Information System (EIS) for Intensive Care Services --- p.7
Scope of the Study --- p.7
The Organization of the Remaining Report --- p.8
Chapter II. --- LITERATURE REVIEW --- p.9
Sickness Scoring Systems --- p.9
Executive Information Systems (EIS) --- p.15
Information Requirements Determination for EIS --- p.17
Future Direction of EIS in Intensive Care --- p.20
Chapter III. --- RESEARCH METHODOLOGY --- p.22
Survey by Mailed Questionnaire --- p.23
Personal Interview --- p.24
Subjects Selection --- p.26
Analysis --- p.27
Chapter IV. --- RESULTS AND FINDINGS --- p.28
Part 1 - Questionnaires --- p.29
Part 2 - Interviews --- p.31
Chapter V. --- ANALYSIS AND DISCUSSION --- p.44
Analysis of Results and Findings --- p.44
Evaluation on Information Requirements Determination for an EIS --- p.50
Chapter VI. --- CONCLUSION --- p.52
Chapter VII. --- FUTURE DIRECTION OF DECISION SUPPORT IN CRITICAL CARE --- p.54
REFERENCES --- p.56
INTERVIEWS --- p.59
APPENDIX --- p.60
Chapter 1. --- A Sample of Hospital Information System Requirement Survey Questionnaire --- p.61
Chapter 2. --- Samples of Visual Display --- p.67
Chapter 3. --- A Sample of Format of a Structured Report --- p.70
APA, Harvard, Vancouver, ISO, and other styles
40

Hübert, Heiko [Verfasser]. "MEMTRACE: a memory, performance and energy profiler targeting RISC-based embedded systems for data intensive applications / von Heiko Hübert." 2009. http://d-nb.info/995210012/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Braga, André Filipe Gonçalves Névoa Fernandes. "Pervasive patient timeline." Master's thesis, 2015. http://hdl.handle.net/1822/40094.

Full text
Abstract:
Integrated master's dissertation in Engineering and Management of Information Systems
Em Medicina Intensiva, a apresentação de informação médica nas Unidades de Cuidados Intensivos (UCI) é feita de diversas formas (gráficos, tabelas, texto, …), pois depende do tipo de análises realizadas, dos dados recolhidos em tempo real pelos sistemas de monitorização, entre outros. A forma como é apresentada a informação pode dificultar a leitura da condição clínica dos doentes por parte dos profissionais de saúde, principalmente quando há a necessidade de um cruzamento entre vários tipos de dados clínicos/fontes de informação. A evolução das tecnologias para novos padrões como a ubiquidade e o pervasive torna possível a recolha e o armazenamento de vários tipos de informação, possibilitando um acesso em temporeal sem restrições de espaço e tempo. A representação de timelines em papel transformou-se em algo desatualizado e por vezes inutilizável devido às diversas vantagens da representação em formato digital. O uso de Sistemas de Apoio à Decisão Clínica (SADC) em UCI não é uma novidade, sendo que a sua principal função é facilitar o processo de tomada de decisão dos profissionais de saúde. No entanto, a associação de timelines a SADC, com o intuito de melhorar a forma como a informação é apresentada, é uma abordagem inovadora, especialmente nas UCI. Este trabalho procura explorar uma nova forma de apresentar a informação relativa aos doentes, tendo por base o espaço temporal em que os eventos ocorrem. Através do desenvolvimento de uma Pervasive Patient Timeline interativa, os profissionais de saúde terão acesso a um ambiente, em tempo real, onde podem consultar o historial clínico dos doentes, desde a sua admissão na unidade de cuidados intensivos até ao momento da alta. Torna-se assim possível visualizar os dados relativos a sinais vitais, análises clínicas, entre outros. A incorporação de modelos de Data Mining (DM) produzidos pelo sistema INTCare é também uma realidade possível, tendo neste âmbito sido induzidos modelos de DM para a previsão da toma de vasopressores, que foram incorporados na Pervasive Patient Timeline. Deste modo os profissionais de saúde passam assim a ter uma nova plataforma capaz de os ajudar a tomarem decisões de uma forma mais precisa.
In Intensive Care Medicine, medical information in the Intensive Care Unit (ICU) is presented in many forms (graphics, tables, text, ...), depending on the type of exams performed, the data collected in real time by monitoring systems, among other factors. The way in which information is presented can make it difficult for health professionals to read the clinical condition of patients; when several types of clinical data and information sources have to be cross-referenced, the situation is even worse. The evolution of technologies towards emerging paradigms such as ubiquitous and pervasive computing makes it possible to gather and store various types of information, thus making it available in real time and anywhere. Also, with the advancement of technologies, the representation of timelines on paper has turned into something outdated and sometimes unusable, due to the many advantages of representation in digital format. The use of Clinical Decision Support Systems (CDSS) is not a novelty, and their main function is to facilitate the decision-making process, through predictive models, continuous information monitoring, among others. However, associating timelines with CDSS, in order to improve the way information is presented, is an innovative approach, especially in the ICU. This work seeks to explore a new way of presenting information about patients, based on the time frame in which events occur. By developing an interactive Pervasive Patient Timeline, health professionals will have access to an environment, in real time, where they can consult the medical history of patients. The medical history will be available from the moment patients are admitted to the ICU until their discharge, allowing health professionals to analyze data regarding vital signs, medication, exams, among others. The incorporation of Data Mining (DM) models produced by the INTCare system is also a reality; in this context, DM models were induced for predicting the intake of vasopressors, and these were incorporated in the Pervasive Patient Timeline. Thus, health professionals will have a new platform that can help them make decisions in a more accurate manner.
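At its core, a patient timeline is a chronological merge of events from heterogeneous sources; the sketch below shows such a merge in Python with invented field names and example events, which are not the INTCare schema or real patient data:

    from datetime import datetime

    # Illustrative event records from different ICU sources; values are invented.
    vitals = [("2015-03-01T10:00", "heart rate 118 bpm"), ("2015-03-01T10:05", "MAP 62 mmHg")]
    labs = [("2015-03-01T09:40", "lactate 3.1 mmol/L")]
    meds = [("2015-03-01T10:07", "noradrenaline started")]

    def build_timeline(*sources):
        """Merge (timestamp, text) events from several sources into one chronological view."""
        events = [(datetime.fromisoformat(ts), text) for source in sources for ts, text in source]
        return sorted(events)

    for ts, text in build_timeline(vitals, labs, meds):
        print(ts.isoformat(), "-", text)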
APA, Harvard, Vancouver, ISO, and other styles
42

Ribeiro, Ana Catarina Vieira. "Previsão dos fatores de risco e caracterização de doentes internados nos cuidados intensivos." Master's thesis, 2016. http://hdl.handle.net/1822/54545.

Full text
Abstract:
Dissertation of the Integrated Master's in Engineering and Management of Information Systems
Intensive Medicine is one of the most critical areas of medicine. Its multidisciplinary character makes it a very wide area that gathers all kinds of health professionals, as well as a place with special equipment and conditions known as the Intensive Care Unit (ICU). Given its critical environment, the need to forecast ICU admissions becomes evident: besides representing additional costs for institutions and occupying resources unnecessarily, unplanned admissions are risky for patients who are already debilitated. Over the years, Information Systems have accompanied the development of medicine and have become essential instruments for the treatment of patients, especially through Clinical Decision Support Systems that present relevant information about patients without the need to analyse clinical data manually. Therefore, the use of Decision Support Systems (DSS) is crucial in medicine, particularly in Intensive Medicine, where decisions must very often be taken quickly and always in the best interest of the patient. A DSS may be built from different techniques, such as Data Mining (DM). This dissertation involves knowledge discovery in databases extracted from the Clinical Decision Support System in use at the Centro Hospitalar do Porto (CHP), the INTCare system. A set of DM techniques was used, namely Clustering and Classification, based on different algorithms and evaluation metrics. Natural patterns were thereby discovered in the data, in particular through the formation of two groups of characteristics (clusters) of patients admitted to the ICU and the identification of the most critical attributes in these clusters. Moreover, predictions were obtained with approximately 97% ability to correctly identify admitted patients (sensitivity) which, despite producing too many false positives (63% specificity), yielded models that allow doctors to act proactively and preventively, one of the main motivations of this dissertation. This dissertation increases the number of studies that apply DM techniques in Intensive Medicine, particularly for predicting ICU admissions, thereby contributing knowledge to the scientific community not only of DM but also of medicine, in order to enhance the clinical decision-making process and to improve the services rendered to patients.
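To make the sensitivity and specificity figures above easier to interpret, the short sketch below shows how these metrics are computed from a binary confusion matrix. It is purely illustrative; the counts are hypothetical and are not the dissertation's actual results.

```python
# Hypothetical confusion-matrix counts for an ICU-admission classifier
# (illustrative only; not the dissertation's data).
tp, fn = 97, 3    # admitted patients predicted correctly / missed
tn, fp = 63, 37   # non-admitted patients predicted correctly / falsely flagged

sensitivity = tp / (tp + fn)               # recall on the admitted class
specificity = tn / (tn + fp)               # recall on the non-admitted class
accuracy = (tp + tn) / (tp + fn + tn + fp)

print(f"sensitivity={sensitivity:.1%}, specificity={specificity:.1%}, "
      f"accuracy={accuracy:.1%}")
```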
APA, Harvard, Vancouver, ISO, and other styles
43

(7022108), Gowtham Kaki. "Automatic Reasoning Techniques for Non-Serializable Data-Intensive Applications." Thesis, 2019.

Find full text
Abstract:

The performance bottlenecks in modern data-intensive applications have induced database implementors to forsake high-level abstractions and trade off simplicity and ease of reasoning for performance. Among the first casualties of this trade-off are the well-known ACID guarantees, which simplify reasoning about concurrent database transactions. ACID semantics have become increasingly obsolete in practice because serializable isolation, an integral aspect of ACID, is exorbitantly expensive. Databases, including the popular commercial offerings, default to weaker levels of isolation where the effects of concurrent transactions are visible to each other. Such weak isolation guarantees, however, are extremely hard to reason about, and have led to serious safety violations in real applications. The problem is further complicated in a distributed setting with asynchronous state replication, where high availability and low latency requirements compel large-scale web applications to embrace weaker forms of consistency (e.g., eventual consistency) besides weak isolation. Given the serious practical implications of safety violations in data-intensive applications, there is a pressing need to extend the state of the art in program verification to reach non-serializable data-intensive applications operating in a weakly-consistent distributed setting.

This thesis sets out to do just that. It introduces new language abstractions, program logics, reasoning methods, and automated verification and synthesis techniques that collectively allow programmers to reason about non-serializable data-intensive applications in the same way as their serializable counterparts. The contributions made are broadly threefold. Firstly, the thesis introduces a uniform formal model to reason about weakly isolated (non-serializable) transactions on a sequentially consistent (SC) relational database machine. A reasoning method that relates the semantics of weak isolation to the semantics of the database program is presented, and an automation technique, implemented in a tool called ACIDifier, is also described. The second contribution of this thesis is a relaxation of the machine model from sequential consistency to a specifiable level of weak consistency, and a generalization of the data model from relational to schema-less or key-value. A specification language to express weak consistency semantics at the machine level is described, and a bounded verification technique, implemented in a tool called Q9, is presented that bridges the gap between consistency specifications and program semantics, thus allowing high-level safety properties to be verified under arbitrary consistency levels. The final contribution of the thesis is a programming model inspired by version control systems that guarantees correct-by-construction replicated data types (RDTs) for building complex distributed applications with arbitrarily-structured replicated state. A technique based on decomposing inductively-defined data types into characteristic relations is presented, which is used to reason about the semantics of the data type under state replication, and eventually derive its correct-by-construction replicated variant automatically. An implementation of the programming model, called Quark, on top of content-addressable storage is described, and the practicality of the programming model is demonstrated with the help of various case studies.
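As a concrete reminder of why weakly isolated transactions are hard to reason about, here is a minimal sketch (written for this summary, not drawn from the thesis) of the classic lost-update anomaly: two concurrent withdrawals both read the same balance before either write becomes visible, so one update is silently lost, an interleaving that serializable isolation would forbid.

```python
# Minimal illustration of a lost update under weak isolation
# (hypothetical schedule and account data; not code from the thesis).
balance = {"acct": 100}

def tx_read():
    # Each transaction first reads the current balance (its local snapshot).
    return balance["acct"]

def tx_write(snapshot, amount):
    # ...and later writes snapshot - amount, unaware of concurrent writers.
    balance["acct"] = snapshot - amount

# Interleaved schedule: T1 and T2 both read before either one writes.
s1 = tx_read()      # T1 sees 100
s2 = tx_read()      # T2 sees 100
tx_write(s1, 30)    # T1 writes 70
tx_write(s2, 50)    # T2 writes 50, silently discarding T1's update

print(balance["acct"])  # 50, whereas any serializable execution yields 20
```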

APA, Harvard, Vancouver, ISO, and other styles
44

Yu-Tang Huang and 黃昱棠. "A Memcached-Based Inter-Framework Caching System for Multi-Layer Data-Intensive Computing." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/nf48e2.

Full text
Abstract:
Master's thesis
National Cheng Kung University
Institute of Computer and Communication Engineering
102
In the age of information explosion, conventional computing platforms cannot cope with the huge amount of data. MapReduce is a parallel, distributed framework proposed by Google for data-intensive computing. Hadoop implements the MapReduce framework and the Hadoop Distributed File System (HDFS) to process large amounts of data on a cluster. Nowadays, many research organizations and enterprises build their own Hadoop platforms to process large-scale data, and various frameworks have been proposed for different requirements: for example, Storm is used to process streaming data and Spark is used for interactive queries. Fast data access and transfer within and across frameworks has therefore become an important topic. In this thesis, we propose a system called "Inter-Framework Caching" that improves the Hadoop 2.0 framework. The purpose of this thesis is to provide an inter-framework distributed cache storage system that speeds up data access and transfer, reducing disk access frequency and improving performance.
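The abstract gives no implementation details, but the core idea, letting one framework publish intermediate results to a shared in-memory cache so that another framework can read them without going back to disk, can be sketched as follows. This is a conceptual example using the pymemcache client and made-up key names; it is not code from the thesis, whose caching layer is integrated into Hadoop 2.0 itself.

```python
# Conceptual sketch of inter-framework caching through memcached:
# a producer job (e.g. MapReduce) publishes an intermediate partition,
# and a consumer job (e.g. Spark or Storm) reads it from memory instead
# of re-reading HDFS. Key names and the HDFS fallback are hypothetical.
from typing import Optional
from pymemcache.client.base import Client

cache = Client(("localhost", 11211))

def publish_partition(job_id: str, part: int, data: bytes) -> None:
    # Producer side: cache the partition output under a job-scoped key.
    cache.set(f"{job_id}/part-{part:05d}", data, expire=3600)

def fetch_partition(job_id: str, part: int) -> Optional[bytes]:
    # Consumer side: try the cache first, fall back to HDFS on a miss.
    value = cache.get(f"{job_id}/part-{part:05d}")
    if value is None:
        value = read_from_hdfs(job_id, part)  # placeholder for the slow path
    return value

def read_from_hdfs(job_id: str, part: int) -> bytes:
    raise NotImplementedError("stand-in for a real HDFS read")
```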
APA, Harvard, Vancouver, ISO, and other styles
45

Armstrong, Hannah Marie. "Evaluation of an Intensive Data Collection System for Tennessee Surface Water Quality Assessment and Watershed Model Calibration." 2011. http://trace.tennessee.edu/utk_gradthes/948.

Full text
Abstract:
Water quality regulators, such as the Tennessee Department of Environment and Conservation, are challenged by data scarcity when identifying surface water quality impairment causes and pollutant sources. Surface water quality model users also seek to identify pollutant sources and to design and place best management practices to efficiently improve water quality, but have insufficient data for model calibration. This research documents the design and evaluation of a novel, intensive water quality data collection system consisting of an automatic sampler, bi-weekly grab sampling, and a long-term deployment sonde. The design emphasized gathering data for common impairment causes (pathogens, siltation, nutrients, and dissolved oxygen (DO)) and for water quality criteria not currently being evaluated (pH and temperature rate of change and diurnal DO fluctuations). In addition, the system was designed to gather data for watershed model calibration in rural, un-gauged watersheds, because agriculture is listed as the predominant source of water quality impairment in Tennessee. Thus, the system was unmanned to reduce labor input, self-powered because of limited access to the electrical grid, provided sample preservation (refrigeration at low pH), and included stage measurement. Two identical prototype systems were installed in adjacent ecoregion 67g watersheds in Greene County, Tennessee: Lick Creek, impaired for pathogens, nutrients, and low DO, and Little Chucky Creek, which is unimpaired and a former ecoregion reference stream. The two primary objectives of this research were to evaluate the system power demand and to determine whether a large water quality dataset improved impairment cause and source identification. A 270-watt solar-panel power supply ultimately failed at Lick Creek during the summer when the refrigerated sampler cooling demand peaked, but was sufficient at Little Chucky Creek. System power supply design equations are provided, and with optimization the power supply used would likely be sufficient. The data collected did significantly improve insight into impairment cause identification. For example, total phosphorus rather than total nitrogen concentrations and low DO appeared to be a potential cause of impairment at Lick Creek. The system design was reliable and could be used to calibrate watershed models to improve source assessment.
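The power-supply design equations mentioned above are not reproduced in the abstract, so the following is only a generic, hypothetical daily energy-budget check of the kind such a design involves; the load and insolation figures are invented for illustration.

```python
# Hypothetical daily energy budget for an off-grid monitoring station
# (illustrative numbers only; not the thesis's design equations).
panel_watts = 270          # rated solar-panel output
peak_sun_hours = 4.0       # site-dependent equivalent full-sun hours per day
system_efficiency = 0.75   # charge-controller, battery and wiring losses

loads = {                  # average draw in watts, duty cycle over the day
    "refrigerated_sampler": (60.0, 0.50),
    "sonde_and_logger": (2.0, 1.00),
}

supply_wh = panel_watts * peak_sun_hours * system_efficiency
demand_wh = sum(watts * duty * 24 for watts, duty in loads.values())

status = "OK" if supply_wh >= demand_wh else "deficit"
print(f"supply {supply_wh:.0f} Wh/day vs demand {demand_wh:.0f} Wh/day -> {status}")
```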
APA, Harvard, Vancouver, ISO, and other styles
46

Portela, Filipe. "Pervasive intelligent decision support in critical health care." Doctoral thesis, 2013. http://hdl.handle.net/1822/27792.

Full text
Abstract:
Doctoral thesis (area of specialisation in Information Systems and Technologies)
Intensive Care Units (ICU) are recognized as critical environments, since patients admitted to these units typically find themselves in situations of organ failure or serious health conditions. ICU professionals (doctors and nurses) dedicate most of their time to caring for patients, relegating all documentation tasks to a secondary plane. Tasks such as recording vital signs, treatment planning and calculation of indicators are only performed when patients are in a stable clinical condition, so these records can occur with a lag of several hours. Since this is a critical environment, the Process of Decision Making (PDM) has to be fast, objective and effective. Any error or delay in the implementation of a particular decision may result in the loss of a human life. Aiming to minimize the human effort in bureaucratic processes and improve the PDM, dematerialization of information is required, eliminating paper-based recording and promoting the automatic, electronic registration of patient data in real time. These data can then be used as a complement to the PDM, e.g. in Decision Support Systems that use Data Mining (DM) models. At the same time it is important for the PDM to overcome barriers of time and space, making the platforms as universal as possible, accessible anywhere and anytime, regardless of the devices used. In this sense, a proliferation of pervasive systems in healthcare has been observed. These systems focus on providing healthcare to anyone, anytime and anywhere by removing restrictions of time and place, increasing both the coverage and the quality of health care. This approach is mainly based on information that is stored and available online. With the aim of supporting the PDM, a set of tests was carried out using static DM models and data that had been collected and entered manually in the Euricus database. Preliminary results of these tests showed that it was possible to predict organ failure and the outcome of a patient using DM techniques, taking a set of physiological and clinical variables as input. High rates of sensitivity were achieved: cardiovascular - 93.4%; respiratory - 96.2%; renal - 98.1%; liver - 98.3%; haematologic - 97.5%; and outcome - 98.3%. Upon completion of this study a challenge emerged: how to achieve the same results, but in a dynamic way and in real time? A research question was postulated: "To what extent may Intelligent Decision Support Systems (IDSS) be appropriate for critical clinical settings in a pervasive way?". The research work included: 1. Perceiving what challenges a universal approach brings to IDSS in the context of critical environments; 2. Understanding how pervasive approaches can be adapted to critical environments; 3. Developing and testing predictive models for pervasive approaches in health care. The main results achieved in this work made it possible: 1. To prove the adequacy of the pervasive approach in critical environments; 2. To design a new architecture that includes the information requirements for a pervasive approach and is able to automate the process of knowledge discovery in databases; 3. To develop models able to support pervasive intelligent decisions automatically and in real time, and to induce DM ensembles in real time able to adapt autonomously in order to achieve predefined quality thresholds (total error <= 40%, sensitivity >= 85% and accuracy >= 60%).
The main contributions of this work include new knowledge that helps overcome the requirements of a pervasive approach in critical environments. Some barriers inherent to information systems, such as the acquisition and processing of data in real time and the induction of adaptive ensembles in real time using DM, have been broken. The dissemination of results is done via devices located anywhere, at any time.
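The quality thresholds quoted above (total error <= 40%, sensitivity >= 85%, accuracy >= 60%) suggest a simple gating rule for deciding whether a freshly induced ensemble can be put to use. The sketch below is only one possible reading of that rule, not code from the thesis.

```python
# Hedged sketch: gate a newly induced DM ensemble on the quality
# thresholds quoted in the abstract before it serves predictions.
THRESHOLDS = {"total_error": 0.40, "sensitivity": 0.85, "accuracy": 0.60}

def meets_quality_thresholds(metrics: dict) -> bool:
    """metrics maps 'total_error', 'sensitivity', 'accuracy' to values in [0, 1]."""
    return (metrics["total_error"] <= THRESHOLDS["total_error"]
            and metrics["sensitivity"] >= THRESHOLDS["sensitivity"]
            and metrics["accuracy"] >= THRESHOLDS["accuracy"])

# Hypothetical candidate ensemble evaluated on the latest real-time data:
candidate = {"total_error": 0.31, "sensitivity": 0.93, "accuracy": 0.71}
print(meets_quality_thresholds(candidate))  # True -> the ensemble may be used
```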
APA, Harvard, Vancouver, ISO, and other styles
47

Yang, Tao. "Brand and usability in content-intensive websites." Thesis, 2014. http://hdl.handle.net/1805/4667.

Full text
Abstract:
Indiana University-Purdue University Indianapolis (IUPUI)
Our connections to the digital world are invoked by brands, but the intersection of branding and interaction design is still an under-investigated area. Particularly, current websites are designed not only to support essential user tasks, but also to communicate an institution's intended brand values and traits. What we do not yet know, however, is which design factors affect which aspect of a brand. To demystify this issue, three sub-projects were conducted. The first project developed a systematic approach for evaluating the branding effectiveness of content-intensive websites (BREW). BREW gauges users' brand perceptions on four well-known branding constructs: brand as product, brand as organization, user image, and brand as person. It also provides rich guidelines for eBranding researchers in regard to planning and executing a user study and making improvement recommendations based on the study results. The second project offered a standardized perceived usability questionnaire entitled DEEP (design-oriented evaluation of perceived web usability). DEEP captures the perceived website usability on five design-oriented dimensions: content, information architecture, navigation, layout consistency, and visual guidance. While existing questionnaires assess more holistic concepts, such as ease-of-use and learnability, DEEP can more transparently reveal where the problem actually lies. Moreover, DEEP suggests that the two most critical and reliable usability dimensions are interface consistency and visual guidance. Capitalizing on the BREW approach and the findings from DEEP, a controlled experiment (N=261) was conducted by manipulating interface consistency and visual guidance of an anonymized university website to see how these variables may affect the university's image. Unexpectedly, consistency did not significantly predict brand image, while the effect of visual guidance on brand perception showed a remarkable gender difference. When visual guidance was significantly worsened, females became much less satisfied with the university in terms of brand as product (e.g., teaching and research quality) and user image (e.g., students' characteristics). In contrast, males' perceptions of the university's brand image stayed the same in most circumstances. The reason for this gender difference was revealed through a further path analysis and a follow-up interview, which inspired new research directions to unpack even more the nexus between branding and interaction design.
APA, Harvard, Vancouver, ISO, and other styles
48

Chen, Tseng-Yi, and 陳增益. "Based on a Novel Economic Evaluation Model to Design an Energy-efficient and Reliable Storage Mechanism with Associated Tools for Data-intensive Archive System." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/30953709336081462185.

Full text
Abstract:
Doctoral dissertation
National Tsing Hua University
Department of Computer Science
103
Recently, green data centers have garnered much attention due to the dramatic growth of data in every conceivable industry and application. With high network bandwidth, mobile applications and user clients routinely back up program/user data to remote data centers. In addition to the data from users, a data center usually employs a data fault-tolerance mechanism that generates redundant data, so as to keep user data from being lost or corrupted. To preserve this mass of data, the storage system consumes about 27%-35% of the power in a typical data center. To reduce the energy consumption of storage systems, previous studies conserved power by switching idle disks to standby/sleep modes. However, according to research conducted by Google and the IDEMA standard, frequently setting a disk to standby mode increases its Annual Failure Rate and reduces its lifespan, and in most cases the authors did not analyze the reliability of their solutions. To address this issue, we propose an evaluation function called E3SaRC (Economic Evaluation of Energy saving with Reliability Constraint), which comprehensively evaluates the effects of an energy-efficient solution by considering the cost of hardware failure when applying energy saving schemes. With system reliability and energy efficiency in mind, this study proposes an energy-efficient and reliable storage system composed of an energy-efficient storage scheme with a data fault-tolerance algorithm, an adaptive simulation tool and a monitoring framework. First, because power consumption is the central issue of this dissertation, we developed a data placement mechanism called CacheRAID, based on a Redundant Array of Independent Disks (RAID-5) architecture, to mitigate the random access problems that implicitly exist in RAID techniques and thereby reduce the energy consumption of RAID disks. Regarding system reliability, CacheRAID applies a control mechanism to the spin-down algorithm. To further enhance the energy efficiency of the proposed system, an adaptive simulation tool is proposed that finds the best system parameters for CacheRAID by quickly simulating the current workload on the storage system. Finally, the contributions of this dissertation are presented in two parts. In the first part, our experimental results show that the proposed storage system can reduce the power consumption of a conventional software RAID-5 system by 65-80%. Moreover, according to the E3SaRC measurement, the overall saved cost of CacheRAID is the largest among the systems we compared. Second, the analytical results demonstrate that the measurement error of the proposed simulation tool is 2.5% lower than that achieved in real-world energy estimation experiments, so the proposed tool can accurately simulate the power consumption of a storage system under different system settings. According to the experimental results, the proposed system can significantly reduce storage system power consumption and increase system reliability.
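The abstract does not give the E3SaRC formula, so the sketch below only illustrates the general idea it describes: weighing the electricity cost saved by an energy-saving scheme against the expected cost of the additional disk failures that more frequent spin-downs cause. The function shape and all numbers are assumptions made for illustration, not the dissertation's model.

```python
# Hedged sketch of an economic evaluation in the spirit of E3SaRC
# (assumed formula and made-up numbers; not the dissertation's model).
def net_saving(saved_kwh_per_year: float,
               electricity_cost_per_kwh: float,
               baseline_afr: float,
               afr_with_scheme: float,
               num_disks: int,
               disk_replacement_cost: float) -> float:
    energy_saving = saved_kwh_per_year * electricity_cost_per_kwh
    extra_failures = (afr_with_scheme - baseline_afr) * num_disks
    failure_cost = extra_failures * disk_replacement_cost
    return energy_saving - failure_cost

# Example: an aggressive spin-down policy on a 100-disk array.
print(net_saving(saved_kwh_per_year=4000, electricity_cost_per_kwh=0.12,
                 baseline_afr=0.02, afr_with_scheme=0.035,
                 num_disks=100, disk_replacement_cost=150))
# 480 - 1.5 * 150 = 255 -> positive, so the scheme still pays off here
```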
APA, Harvard, Vancouver, ISO, and other styles
49

Brossier, David. "Élaboration et validation d’une base de données haute résolution destinée à la calibration d’un patient virtuel utilisable pour l’enseignement et la prise en charge personnalisée des patients en réanimation pédiatrique." Thesis, 2019. http://hdl.handle.net/1866/24620.

Full text
Abstract:
International joint supervision (cotutelle) with the Université de Caen
The complexity of patients in the intensive care unit requires the use of clinical decision support systems. These systems bring together automated management protocols that enable adherence to guidelines, and virtual physiological or patient simulators that can be used to safely customize management. These devices, which operate on algorithms and mathematical equations, can only be developed from a large amount of patient data. The main objective of this work was the elaboration of a high-resolution database automatically collected from critically ill children. This database will be used to develop and validate a physiological simulator called SimulResp©. This manuscript presents the whole process of setting up the database, from concept to use.
APA, Harvard, Vancouver, ISO, and other styles
