Dissertations on the topic "Data freshness and consistency"


Consult the top 50 dissertations for research on the topic "Data freshness and consistency".

Next to every entry in the list there is an "Add to bibliography" button. Click it, and we will automatically generate a bibliographic reference to the selected work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the scholarly publication in .pdf format and read its abstract online, whenever these details are available in the item's metadata.

Browse dissertations across a wide range of disciplines and compile your bibliography correctly.

1

Bedewy, Ahmed M. "OPTIMIZING DATA FRESHNESS IN INFORMATION UPDATE SYSTEMS." The Ohio State University, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=osu1618573325086709.

2

Mueller, G. "Data Consistency Checks on Flight Test Data." International Foundation for Telemetering, 2014. http://hdl.handle.net/10150/577405.

Abstract:
ITC/USA 2014 Conference Proceedings / The Fiftieth Annual International Telemetering Conference and Technical Exhibition / October 20-23, 2014 / Town and Country Resort & Convention Center, San Diego, CA
This paper reflects the principal results of a study performed internally by Airbus's flight test centers. The purpose of this study was to share the body of knowledge concerning data consistency checks between all Airbus business units. An analysis of the test process is followed by the identification of the process stakeholders involved in ensuring data consistency. In the main part of the paper, several different possibilities for improving data consistency are listed; it is left to the discretion of the reader to determine the appropriateness of these methods.
3

Tran, Sy Nguyen. "Consistency techniques for test data generation." Université catholique de Louvain, 2005. http://edoc.bib.ucl.ac.be:81/ETD-db/collection/available/BelnUcetd-05272005-173308/.

Abstract:
This thesis presents a new approach for automated test data generation of imperative programs containing integer, boolean and/or float variables. A test program (with procedure calls) is represented by an Interprocedural Control Flow Graph (ICFG). The classical testing criteria (statement, branch, and path coverage), widely used in unit testing, are extended to the ICFG. Path coverage is the core of our approach. Given a specified path of the ICFG, a path constraint is derived and solved to obtain a test case. The constraint solving is carried out based on a consistency notion. For statement (and branch) coverage, paths reaching a specified node or branch are dynamically constructed. The search for suitable paths is guided by the interprocedural control dependences of the program. The search is also pruned by our consistency filter. Finally, test data are generated by the application of the proposed path coverage algorithm. A prototype system implements our approach for C programs. Experimental results, including complex numerical programs, demonstrate the feasibility of the method and the efficiency of the system, as well as its versatility and flexibility to different classes of problems (integer and/or float variables; arrays, procedures, path coverage, statement coverage).
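To make the path-coverage idea concrete, here is a deliberately naive sketch (my own illustration, not the thesis's ICFG- and consistency-based solver): a path constraint is modelled as a conjunction of branch predicates, and a test case is any assignment over finite input domains that satisfies all of them.

```python
from itertools import product

def generate_test_data(path_constraint, domains):
    """Return the first input assignment that satisfies every branch predicate."""
    names = list(domains)
    for values in product(*(domains[n] for n in names)):
        candidate = dict(zip(names, values))
        if all(pred(candidate) for pred in path_constraint):
            return candidate            # a consistent assignment = a test case
    return None                         # the path is infeasible over these domains

# path through `if x > 0:` then `if x + y == 10:` in some hypothetical program under test
path = [lambda v: v["x"] > 0, lambda v: v["x"] + v["y"] == 10]
print(generate_test_data(path, {"x": range(-5, 6), "y": range(-5, 16)}))   # {'x': 1, 'y': 9}
```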
4

Yu, Wenyuan. "Improving data quality : data consistency, deduplication, currency and accuracy." Thesis, University of Edinburgh, 2013. http://hdl.handle.net/1842/8899.

Abstract:
Data quality is one of the key problems in data management. An unprecedented amount of data has been accumulated and has become a valuable asset of an organization. The value of the data relies greatly on its quality. However, data is often dirty in real life. It may be inconsistent, duplicated, stale, inaccurate or incomplete, which can reduce its usability and increase the cost of businesses. Consequently the need for improving data quality arises, which comprises five central issues, namely data consistency, data deduplication, data currency, data accuracy and information completeness. This thesis presents the results of our work on the first four issues, with regard to data consistency, deduplication, currency and accuracy. The first part of the thesis investigates incremental verifications of data consistencies in distributed data. Given a distributed database D, a set S of conditional functional dependencies (CFDs), the set V of violations of the CFDs in D, and updates ΔD to D, it is to find, with minimum data shipment, changes ΔV to V in response to ΔD. Although the problems are intractable, we show that they are bounded: there exist algorithms to detect errors such that their computational cost and data shipment are both linear in the size of ΔD and ΔV, independent of the size of the database D. Such incremental algorithms are provided for both vertically and horizontally partitioned data, and we show that the algorithms are optimal. The second part of the thesis studies the interaction between record matching and data repairing. Record matching, the main technique underlying data deduplication, aims to identify tuples that refer to the same real-world object, and repairing is to make a database consistent by fixing errors in the data using constraints. These are treated as separate processes in most data cleaning systems, based on heuristic solutions. However, our studies show that repairing can effectively help us identify matches, and vice versa. To capture the interaction, a uniform framework that seamlessly unifies repairing and matching operations is proposed to clean a database based on integrity constraints, matching rules and master data. The third part of the thesis presents our study of finding certain fixes that are absolutely correct for data repairing. Data repairing methods based on integrity constraints are normally heuristic, and they may not find certain fixes. Worse still, they may even introduce new errors when attempting to repair the data, which may not work well when repairing critical data such as medical records, in which a seemingly minor error often has disastrous consequences. We propose a framework and an algorithm to find certain fixes, based on master data, a class of editing rules and user interactions. A prototype system is also developed. The fourth part of the thesis introduces inferring data currency and consistency for conflict resolution, where data currency aims to identify the current values of entities, and conflict resolution is to combine tuples that pertain to the same real-world entity into a single tuple and resolve conflicts, which is also an important issue for data deduplication. We show that data currency and consistency help each other in resolving conflicts. We study a number of associated fundamental problems, and develop an approach for conflict resolution by inferring data currency and consistency.
The last part of the thesis reports our study of data accuracy on the longstanding relative accuracy problem which is to determine, given tuples t1 and t2 that refer to the same entity e, whether t1[A] is more accurate than t2[A], i.e., t1[A] is closer to the true value of the A attribute of e than t2[A]. We introduce a class of accuracy rules and an inference system with a chase procedure to deduce relative accuracy, and the related fundamental problems are studied. We also propose a framework and algorithms for inferring accurate values with users’ interaction.
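As a small, hedged illustration of the kind of constraint this work revolves around (a toy of my own, not the thesis's incremental detection algorithms), the snippet below flags violations of a conditional functional dependency: for tuples matching a pattern (country = 'UK'), the zip code must determine the city.

```python
from collections import defaultdict

def cfd_violations(rows, pattern, lhs, rhs):
    """Return tuples that violate the CFD: for rows matching `pattern`, lhs -> rhs."""
    rhs_values, members = defaultdict(set), defaultdict(list)
    for row in rows:
        if pattern(row):
            key = tuple(row[a] for a in lhs)
            rhs_values[key].add(row[rhs])
            members[key].append(row)
    # any left-hand-side group mapping to more than one right-hand-side value is a violation
    return [t for key, vals in rhs_values.items() if len(vals) > 1 for t in members[key]]

rows = [
    {"country": "UK", "zip": "EH8", "city": "Edinburgh"},
    {"country": "UK", "zip": "EH8", "city": "London"},    # conflicts with the row above
    {"country": "US", "zip": "EH8", "city": "Anywhere"},  # the pattern does not apply here
]
print(cfd_violations(rows, lambda r: r["country"] == "UK", ["zip"], "city"))
```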
5

Ntaryamira, Evariste. "Une méthode asynchrone généralisée préservant la qualité des données des systèmes temps réel embarqués : cas de l’autopilote PX4-RT." Thesis, Sorbonne université, 2021. https://tel.archives-ouvertes.fr/tel-03789654.

Abstract:
Real-time embedded systems, despite their limited resources, are evolving very quickly. For such systems it is not enough to ensure that no job misses its deadline; it is also mandatory to ensure the good quality of the data being transmitted from task to task. Data quality constraints are expressed by the maintenance of a set of properties that a data sample must exhibit to be considered relevant, and trade-offs must be found between the scheduling constraints of the system and those applied to the data. To ensure such properties, we consider the wait-free mechanism. The size of each communication buffer is based on the lifetime-bound method, and access to shared resources follows the single-writer, many-readers principle. To capture all the communication particularities introduced by the uORB communication mechanism, we model the interactions between tasks by a bipartite graph, called the communication graph, which is composed of sets of so-called domain messages. To enhance the predictability of inter-task communication, we extend the Liu and Layland model with a communication-state parameter used to control writing/reading points. We consider two types of data constraints: local data constraints and global data constraints. To verify the local data constraints, we rely on a sub-sampling mechanism. Regarding the global data constraints, we introduce two new mechanisms: the "last reader tags" mechanism and the "scroll or overwrite" mechanism. These two mechanisms are to some extent complementary: the first one works at the beginning of the spindle while the second one works at the end of the spindle.
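The following is a minimal sketch of the single-writer, many-readers wait-free idea (a simplification of mine, not the PX4/uORB implementation); in Python, assigning an object reference is enough to keep samples whole, whereas a C implementation would size the slot array using the lifetime-bound analysis mentioned above.

```python
class WaitFreeSample:
    """Single producer publishes; any number of readers read without ever blocking."""

    def __init__(self, slots=2):
        self._slots = [None] * slots   # slot count would come from the lifetime-bound analysis
        self._latest = 0               # index of the last completely written slot

    def publish(self, sample):         # called only by the single producer task
        target = (self._latest + 1) % len(self._slots)
        self._slots[target] = sample   # write into a slot readers are not directed to
        self._latest = target          # flipping the index makes the new sample visible

    def read(self):                    # called by any reader task; never blocks
        return self._slots[self._latest]

channel = WaitFreeSample()
channel.publish({"timestamp": 1, "attitude": (0.0, 0.1, 0.9)})
print(channel.read())
```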
6

湯志輝 and Chi-fai Tong. "On checking the temporal consistency of data." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1993. http://hub.hku.hk/bib/B31211914.

7

Tong, Chi-fai. "On checking the temporal consistency of data /." [Hong Kong : University of Hong Kong], 1993. http://sunzi.lib.hku.hk/hkuto/record.jsp?B13570353.

8

Shah, Nikhil Jeevanlal. "A simulation framework to ensure data consistency in sensor networks." Manhattan, Kan. : Kansas State University, 2008. http://hdl.handle.net/2097/541.

9

Gustafsson, Thomas. "Maintaining data consistency in embedded databases for vehicular systems." Licentiate thesis, Linköping : Univ, 2004. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-5681.

10

Khan, Tareq Jamal. "Robust, fault-tolerant majority based key-value data store supporting multiple data consistency." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-42474.

Abstract:
Web 2.0 has significantly transformed the way modern society works nowadays. In today's Web, information not only flows top down from the web sites to the readers; it also flows bottom up, contributed by the mass of users. Hugely popular Web 2.0 applications like Wikis, social applications (e.g. Facebook, MySpace), media sharing applications (e.g. YouTube, Flickr), blogging and numerous others generate lots of user-generated content and make heavy use of the underlying storage. The data storage system is the heart of these applications, as all user activities are translated to read and write requests and directed to the database for further action. Hence the focus is on the storage that serves data to support the applications, and its reliable and efficient design is instrumental for applications to perform in line with expectations. Large-scale storage systems are being used by popular social networking services like Facebook and MySpace, where millions of users' data have been stored and fully accessed by these companies. However, from the users' point of view there has been justified concern about user data ownership and lack of control over personal data. For example, on more than one occasion Facebook has exercised its control over users' data without respecting users' rights to ownership of their own content and manipulated data for its own business interest without users' knowledge or consent. The thesis proposes, designs and implements a large-scale, robust and fault-tolerant key-value data storage prototype that is peer-to-peer based and intends to back away from the client-server paradigm, with a view to relieving the companies from data storage and management responsibilities and letting users control their own personal data. Several read and write APIs (similar to Yahoo!'s PNUTS but different in terms of underlying design and the environment they are targeted for) with various data consistency guarantees are provided, from which a wide range of web applications would be able to choose the APIs according to their data consistency, performance and availability requirements. An analytical comparison is also made against the PNUTS system, which targets a more stable environment. For evaluation, simulation has been carried out to test the system availability, scalability and fault-tolerance in a dynamic environment. The results are then analyzed and the conclusion is drawn that the system is scalable, available and shows acceptable performance.
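A hedged sketch of the majority-based idea follows (my own simplification; the class name and API are illustrative, not the thesis's actual interface): writes are applied to a majority of replicas with a version number, a "critical" read consults a majority and returns the newest version, and a "read any" consults a single replica for speed at the cost of possibly stale data.

```python
class MajorityKVStore:
    """Versioned writes and reads over a majority of replicas (quorums always overlap)."""

    def __init__(self, replicas):
        self.replicas = replicas                 # each replica: dict key -> (version, value)

    def _majority(self):
        return len(self.replicas) // 2 + 1

    def write(self, key, value):
        version = 1 + max(r.get(key, (0, None))[0] for r in self.replicas)
        for replica in self.replicas[: self._majority()]:   # any majority would do in practice
            replica[key] = (version, value)

    def read_critical(self, key):                # strongest read: ask a majority, pick newest
        answers = [r.get(key, (0, None)) for r in self.replicas[: self._majority()]]
        return max(answers)[1]

    def read_any(self, key):                     # weakest read: one replica, possibly stale
        return self.replicas[0].get(key, (0, None))[1]

store = MajorityKVStore([{}, {}, {}])
store.write("profile:42", "v1")
print(store.read_critical("profile:42"), store.read_any("profile:42"))
```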
11

Chihoub, Houssem Eddine. "Managing consistency for big data applications : tradeoffs and self-adaptiveness." Thesis, Cachan, Ecole normale supérieure, 2013. http://www.theses.fr/2013DENS0059/document.

Abstract:
In the era of Big Data, data-intensive applications handle extremely large volumes of data while requiring fast processing times. A large number of such applications run in the cloud in order to benefit from cloud elasticity, easy on-demand deployments, and cost-efficient pay-as-you-go usage. In this context, replication is an essential feature in the cloud in order to deal with Big Data challenges: it enables high availability through multiple replicas, fast data access to local replicas, fault tolerance, and disaster recovery. However, replication introduces the major issue of data consistency across different copies. Consistency management is critical for Big Data systems. Strong consistency models introduce serious limitations to system scalability and performance due to the required synchronization efforts. In contrast, weak and eventual consistency models reduce the performance overhead and enable high levels of availability. However, these models may tolerate, under certain scenarios, too much temporal inconsistency. In this PhD thesis, we address this issue of consistency tradeoffs in large-scale Big Data systems and applications. We first focus on consistency management at the storage-system level. Accordingly, we propose an automated self-adaptive model (named Harmony) that scales the consistency level up and down at runtime when needed, in order to provide as high performance as possible while preserving the application's consistency requirements. In addition, we present a thorough study of the impact of consistency management on the monetary cost of running in the cloud, and we leverage this study to propose a cost-efficient consistency tuning (named Bismar) in the cloud. In a third direction, we study the impact of consistency management on energy consumption within the data center. Based on our findings, we investigate adaptive configurations of the storage-system cluster that target energy saving. In order to complete our system-side study, we focus on the application level. Applications are different and so are their consistency requirements, and understanding such requirements at the storage-system level is not possible. Therefore, we propose an application behavior model that apprehends the consistency requirements of an application. Based on the model, we propose an online prediction approach, named Chameleon, that adapts to the application's specific needs and provides customized consistency.
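The sketch below conveys the flavour of such self-adaptive tuning (thresholds, level names and the staleness estimate are my assumptions, not Harmony's actual estimator): the consistency level is raised when the estimated rate of stale reads exceeds what the application tolerates, and relaxed when there is comfortable slack.

```python
def choose_consistency_level(write_rate, replication_delay, tolerated_stale_rate):
    """Pick a replica-involvement level from a crude estimate of the stale-read rate."""
    # probability that a read overlaps an in-flight replica update (rough approximation)
    estimated_stale_rate = min(1.0, write_rate * replication_delay)
    if estimated_stale_rate > tolerated_stale_rate:
        return "QUORUM"      # involve more replicas per operation to regain consistency
    if estimated_stale_rate < 0.5 * tolerated_stale_rate:
        return "ONE"         # plenty of slack: relax to gain throughput and latency
    return "TWO"             # middle ground

print(choose_consistency_level(write_rate=200, replication_delay=0.002,
                               tolerated_stale_rate=0.05))   # -> 'QUORUM'
```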
12

Chen, Xin. "Techniques of data prefetching, replication, and consistency in the Internet." W&M ScholarWorks, 2005. https://scholarworks.wm.edu/etd/1539623464.

Abstract:
The Internet has become a major infrastructure for information sharing in our daily life, and indispensable to critical and large applications in industry, government, business, and education. Internet bandwidth (or the network speed to transfer data) has increased dramatically; however, latency (or the delay to physically access data) has been reduced at a much slower pace. The rich bandwidth and lagging latency can be effectively coped with in Internet systems by three data management techniques: caching, replication, and prefetching. The focus of this dissertation is to address the latency problem in the Internet by utilizing the rich bandwidth and large storage capacity to efficiently prefetch data and significantly improve Web content caching performance, and by proposing and implementing scalable data consistency maintenance methods to handle Internet Web address caching in distributed name systems (DNS) and massive data replication in peer-to-peer systems. While the DNS service is critical in the Internet, peer-to-peer data sharing is being accepted as an important activity in the Internet. We have made three contributions in developing prefetching techniques. First, we have proposed an efficient data structure for maintaining Web access information, called popularity-based Prediction by Partial Matching (PB-PPM), where data are placed and replaced guided by popularity information of Web accesses, so that only important and useful information is stored. PB-PPM greatly reduces the required storage space and improves the prediction accuracy. Second, a major weakness in existing Web servers is that prefetching activities are scheduled independently of dynamically changing server workloads. Without proper control and coordination between the two kinds of activities, prefetching can negatively affect the Web services and degrade Web access performance. To address this problem, we have developed a queuing model to characterize the interactions. Guided by the model, we have designed a coordination scheme that dynamically adjusts the prefetching aggressiveness in Web servers. This scheme not only prevents the Web servers from being overloaded, but it can also minimize the average server response time. Finally, we have proposed a scheme that effectively coordinates the sharing of access information for both proxy and Web servers. With the support of this scheme, the accuracy of prefetching decisions is significantly improved. Regarding data consistency support for Internet caching and data replication, we have conducted three significant studies. First, we have developed a consistency support technique to maintain data consistency among the replicas in structured P2P networks. Based on Pastry, an existing and popular P2P system, we have implemented this scheme and shown that it can effectively maintain consistency while preventing hot-spot and node-failure problems. Second, we have designed and implemented a DNS cache update protocol, called DNScup, to provide strong consistency for domain/IP mappings. Finally, we have developed a dynamic lease scheme to timely update the replicas in the Internet.
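As a toy illustration of popularity-guided prefetching (my own, far simpler than PB-PPM), the predictor below counts how often one page follows another in the access stream and proposes the most popular successors as prefetch candidates.

```python
from collections import defaultdict, Counter

class PopularityPredictor:
    def __init__(self, k=2):
        self.successors = defaultdict(Counter)  # page -> how often each other page followed it
        self.k = k
        self._last = None

    def observe(self, page):                    # feed the access stream one request at a time
        if self._last is not None:
            self.successors[self._last][page] += 1
        self._last = page

    def prefetch_candidates(self, page):        # the k most popular successors of `page`
        return [nxt for nxt, _ in self.successors[page].most_common(self.k)]

predictor = PopularityPredictor()
for request in ["/", "/news", "/", "/news", "/", "/about"]:
    predictor.observe(request)
print(predictor.prefetch_candidates("/"))       # ['/news', '/about']
```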
13

Patil, Vivek. "Criteria for Data Consistency Evaluation Prior to Modal Parameter Estimation." University of Cincinnati / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1627667589352536.

14

Chen, Raymond C. "Consistency control and memory semantics for persistent objects." Diss., Georgia Institute of Technology, 1991. http://hdl.handle.net/1853/8149.

15

Torres-Rojas, Francisco Jose. "Scalable approximations to causality and consistency of distributed objects." Diss., Georgia Institute of Technology, 1999. http://hdl.handle.net/1853/9155.

16

Wu, Zhixue. "A new approach to implementing atomic data types." Thesis, University of Cambridge, 1993. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.319890.

17

Moncur, Robert Aaron. "Data Consistency and Conflict Avoidance in a Multi-User CAx Environment." BYU ScholarsArchive, 2012. https://scholarsarchive.byu.edu/etd/3675.

Abstract:
This research presents a new method to preserve data consistency in a multi-user CAx environment. The new method includes three types of constraints which work by constraining and controlling both features and users across an entire multi-user CAx platform. The first type of constraint includes locking or reserving features to enable only one user at a time to edit a given feature. The second type of constraint, collaborative feature constraints, allows flexible constraining of each individual feature in a model, and the data that defines it. The third type of constraint, collaborative user constraints, allows the constraining of user permissions and user actions individually or as a group while providing as much flexibility as possible. To present the method further, mock-ups and suggested implementation guidelines are given. To demonstrate the effectiveness of the method, a proof-of-concept implementation was built using the CATIA Connect multi-user CAD prototype developed at BYU. Using this implementation, usage examples are provided to show how this method provides important tools that increase the collaborative capabilities of a multi-user CAx system. By using the suggested method, design teams will be able to better control how their data is used and edited, maintaining better data consistency and preventing data conflict and data misuse.
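A minimal sketch of the feature-reservation idea follows (my own reduction of the first constraint type; class and method names are illustrative, not the prototype's API): a feature can be edited only by the user currently holding its reservation.

```python
class FeatureReservations:
    def __init__(self):
        self._owner = {}                          # feature id -> user currently holding it

    def reserve(self, feature_id, user):
        holder = self._owner.setdefault(feature_id, user)
        return holder == user                     # True if this user now holds the reservation

    def edit(self, feature_id, user, operation):
        if self._owner.get(feature_id) != user:
            raise PermissionError(f"{feature_id} is reserved by another user")
        return operation()                        # apply the modelling operation

    def release(self, feature_id, user):
        if self._owner.get(feature_id) == user:
            del self._owner[feature_id]

reservations = FeatureReservations()
print(reservations.reserve("hole_pattern_3", "alice"))   # True: alice may edit the feature
print(reservations.reserve("hole_pattern_3", "bob"))     # False: bob must wait for the release
```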
18

Lin, Pengpeng. "A Framework for Consistency Based Feature Selection." TopSCHOLAR®, 2009. http://digitalcommons.wku.edu/theses/62.

Abstract:
Feature selection is an effective technique for reducing the dimensionality of features in many applications where datasets involve hundreds or thousands of features. The objective of feature selection is to find an optimal subset of relevant features such that the feature size is reduced and the understandability of a learning process is improved without significantly decreasing the overall accuracy and applicability. This thesis focuses on the consistency measure, where a feature subset is inconsistent if there exist at least two instances with the same feature values but different class labels. This thesis introduces a new consistency-based algorithm, Automatic Hybrid Search (AHS), and reviews several existing feature selection algorithms (ES, PS and HS) which are based on the consistency rate. After that, we conclude this work by conducting an empirical study and a comparative analysis of different search algorithms.
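For concreteness, here is a small sketch of the inconsistency-rate computation that consistency-based selection relies on (my own code, following the commonly used definition rather than the thesis's implementation): instances sharing the same values on the selected features but carrying different class labels contribute to the inconsistency count.

```python
from collections import defaultdict, Counter

def inconsistency_rate(instances, labels, feature_subset):
    """Fraction of instances not explained by the majority label of their feature pattern."""
    patterns = defaultdict(Counter)
    for instance, label in zip(instances, labels):
        key = tuple(instance[f] for f in feature_subset)
        patterns[key][label] += 1
    # within each pattern, everything beyond the majority class counts as inconsistent
    inconsistent = sum(sum(c.values()) - max(c.values()) for c in patterns.values())
    return inconsistent / len(instances)

X = [{"a": 1, "b": 0}, {"a": 1, "b": 1}, {"a": 1, "b": 0}]
y = ["pos", "neg", "neg"]
print(inconsistency_rate(X, y, ["a"]))        # 1/3: "a" alone cannot separate the classes
print(inconsistency_rate(X, y, ["a", "b"]))   # still 1/3: rows 1 and 3 agree on (a, b) but not in label
```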
19

Wieweg, William. "Towards Arc Consistency in PLAS." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-232081.

Abstract:
The Planning And Scheduling (PLAS) module of ICE (Intelligent Control Environment) is responsible for planning and scheduling a large fleet of vehicles. This process involves the creation of tasks to be executed by the vehicles. Using this information, PLAS decides which vehicles should execute which tasks, decisions that are modelled as constraint satisfaction problems. Solving constraint satisfaction problems is slow. To improve efficiency, a number of different techniques exist. One of these is arc consistency, which entails taking a constraint satisfaction problem and evaluating its variables pairwise by applying the constraints between them. Using arc consistency, we can discern the candidate solutions to constraint satisfaction problems faster than by pure search. In addition, arc consistency allows us to detect and act early on inconsistencies in constraint satisfaction problems. The work in this master's thesis includes the implementation of a constraint solver for symbolic constraints, containing the arc consistency algorithm AC3. Furthermore, it encompasses the implementation of a constraint satisfaction problem generator, based on the Erdős-Rényi graph model and inspired by the quasigroup completion problem with holes, which allows the evaluation of the constraint solver on large problems. Using the constraint satisfaction problem generator, a set of experiments was performed to evaluate the constraint solver, complemented by a set of scenarios using manually created constraint satisfaction problems. The results show that the performance scales up well.
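A compact version of AC3 is shown below (standard textbook form, my own code rather than the PLAS solver): values are pruned from a variable's domain when they have no support in a neighbouring domain, and arcs are re-examined until a fixed point or an empty domain is reached.

```python
from collections import deque

def ac3(domains, constraints):
    """domains: var -> set of values; constraints: (x, y) -> predicate(value_x, value_y)."""
    arcs = deque(constraints)
    while arcs:
        x, y = arcs.popleft()
        revised = False
        for vx in set(domains[x]):
            if not any(constraints[(x, y)](vx, vy) for vy in domains[y]):
                domains[x].discard(vx)          # vx has no support in y's domain
                revised = True
        if revised:
            if not domains[x]:
                return False                    # empty domain: inconsistency detected early
            arcs.extend((z, w) for (z, w) in constraints if w == x and z != y)
    return True

domains = {"X": {1, 2, 3}, "Y": {1, 2, 3}}
constraints = {("X", "Y"): lambda a, b: a < b, ("Y", "X"): lambda a, b: a > b}
ac3(domains, constraints)
print(domains)   # X loses 3 and Y loses 1; every remaining value has support
```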
20

Shao, Cheng. "Multi-writer consistency conditions for shared memory objects." Texas A&M University, 2007. http://hdl.handle.net/1969.1/85806.

Abstract:
Regularity is a shared memory consistency condition that has received considerable attention, notably in connection with quorum-based shared memory. Lamport's original definition of regularity assumed a single-writer model, however, and is not well defined when each shared variable may have multiple writers. In this thesis, we address this need by formally extending the notion of regularity to a multi-writer model. We have shown that the extension is not trivial. While there exist various ways to extend the single-writer definition, the resulting definitions will have different strengths. Specifically, we give several possible definitions of regularity in the presence of multiple writers. We then present a quorum-based algorithm to implement each of the proposed definitions and prove them correct. We study the relationships between these definitions and a number of other well-known consistency conditions, and give a partial order describing the relative strengths of these consistency conditions. Finally, we provide a practical context for our results by studying the correctness of two well-known algorithms for mutual exclusion under each of our proposed consistency conditions.
21

Ben, Hafaiedh Khaled. "Studying the Properties of a Distributed Decentralized b+ Tree with Weak-Consistency." Thèse, Université d'Ottawa / University of Ottawa, 2012. http://hdl.handle.net/10393/20578.

Abstract:
Distributed computing is very popular in the field of computer science and is widely used in web applications. In such systems, tasks and resources are partitioned among several computers so that the workload can be shared among the different computers in the network, in contrast to systems using a single server computer. Distributed system designs are used for many practical reasons and are often found to be more scalable, robust and suitable for many applications. The aim of this thesis is to study the properties of a distributed tree data structure that allows searches, insertions and deletions of data elements. In particular, the b-tree structure [13] is considered, which is a generalization of a binary search tree. The study consists of analyzing the effect of distributing such a tree among several computers and investigates the behavior of the structure over a long period of time as the network of computers supporting the tree grows, while the state of the structure is instantly updated as insertion and deletion operations are performed. It also attempts to validate the necessary and sufficient invariants of the b-tree structure that guarantee the correctness of the search operations. A simulation study is also conducted to verify the validity of such a distributed data structure and the performance of the algorithm that implements it. Finally, a discussion is provided at the end of the thesis to compare the performance of the system design with other distributed tree structure designs.
22

Chihoub, Houssem-Eddine. "Managing Consistency for Big Data Applications on Clouds: Tradeoffs and Self Adaptiveness." Phd thesis, École normale supérieure de Cachan - ENS Cachan, 2013. http://tel.archives-ouvertes.fr/tel-00915091.

Abstract:
In the era of Big Data, data-intensive applications handle extremely large volumes of data and require very fast processing times. Many of these applications are deployed on clouds in order to benefit from the advantages of these infrastructures. In this context, replication is an essential means in the cloud to overcome Big Data challenges. However, replication introduces the important problem of data consistency. Consistency management is crucial. Strong consistency models incur significant performance costs and have difficulty scaling because of their synchronization requirements. Conversely, weak consistency models (eventual consistency, for example) provide better performance as well as better data availability. However, these models may tolerate, under certain conditions, too much temporary inconsistency. In this thesis, we address the problems related to the trade-offs raised by consistency management in Big Data systems. First, we propose a self-adaptive consistency model that automatically increases and decreases the consistency level, which makes it possible to provide better performance while satisfying the applications' needs. Second, we address the financial issues related to consistency management in the cloud and consequently propose a cost-efficient consistency management. The third contribution studies the effects of consistency management on the energy consumption of distributed storage systems; this study leads us to analyse the potential gains of adaptive reconfigurations of storage systems in terms of reduced consumption. To complete our system-level work, we address consistency management at the application level. We introduce an approach for modelling the behaviour of an application during its data accesses. The proposed model facilitates the understanding of consistency requirements and is used to manage consistency in an application-specific manner at runtime. Extensive evaluations on the Grid'5000 and Amazon EC2 platforms demonstrate the effectiveness of the proposed approaches.
23

Gustafsson, Thomas. "Management of Real-Time Data Consistency and Transient Overloads in Embedded Systems." Doctoral thesis, Linköping : Department of Computer and Information Science, Linköpings universitet, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-9782.

24

Bonds, August. "Hash-based Eventual Consistency to Scale the HDFS Block Report." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-222363.

Abstract:
The architecture of the distributed hierarchical file system HDFS imposes limitations on its scalability. All metadata is stored in memory on a single machine, and in practice this limits the cluster size to about 4000 servers. Larger HDFS clusters must resort to namespace federation, which divides the filesystem into isolated volumes and changes the semantics of cross-volume filesystem operations (for example, a file move becomes a non-atomic combination of copy and delete). Ideally, organizations want to consolidate their data in as few clusters and namespaces as possible to avoid such issues, increase operating efficiency and utility, and simplify maintenance. HopsFS, a new distribution of HDFS developed at KTH, uses an in-memory distributed database for storing metadata. It scales to 10k nodes and has shown that in principle it can support clusters at least 15 times the size of traditional non-federated HDFS clusters. However, an eventually consistent data-loss protection mechanism in HDFS, called the Block Report protocol, prevents HopsFS from reaching its full potential. This thesis provides a solution for scaling the Block Report protocol in HopsFS that uses an incremental, hash-based eventual consistency mechanism to avoid duplicated work. In the average case, our simulations indicate that the solution can reduce the load on the database by an order of magnitude at the cost of less than 10 percent overhead on file mutations, while performing similarly to the old solution in the worst case.
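The sketch below captures the hash-based idea (bucket layout, hash function and field names are my assumptions, not the HopsFS protocol): both sides keep per-bucket hashes over block metadata, and only buckets whose hashes disagree are reported in full.

```python
import hashlib

NUM_BUCKETS = 64

def bucket_of(block_id):
    return block_id % NUM_BUCKETS

def bucket_hash(blocks_in_bucket):
    digest = hashlib.sha1()
    for block_id, generation_stamp, length in sorted(blocks_in_bucket):
        digest.update(f"{block_id}:{generation_stamp}:{length}".encode())
    return digest.hexdigest()

def blocks_to_report(datanode_blocks, namenode_bucket_hashes):
    buckets = {b: [] for b in range(NUM_BUCKETS)}
    for block in datanode_blocks:
        buckets[bucket_of(block[0])].append(block)
    # only buckets whose hashes disagree are shipped in full; matching buckets are skipped
    return {b: blks for b, blks in buckets.items()
            if bucket_hash(blks) != namenode_bucket_hashes.get(b)}

blocks = [(101, 3, 1024), (165, 1, 2048), (230, 7, 4096)]          # (id, generation stamp, length)
namenode_view = {b: bucket_hash([blk for blk in blocks if bucket_of(blk[0]) == b])
                 for b in range(NUM_BUCKETS)}
namenode_view[bucket_of(230)] = "stale"                            # pretend one bucket is out of date
print(list(blocks_to_report(blocks, namenode_view)))               # only that bucket gets reported
```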
25

Weidlich, Matthias. "Behavioural profiles : a relational approach to behaviour consistency." Phd thesis, Universität Potsdam, 2011. http://opus.kobv.de/ubp/volltexte/2011/5559/.

Abstract:
Business Process Management (BPM) emerged as a means to control, analyse, and optimise business operations. Conceptual models are of central importance for BPM. Most prominently, process models define the behaviour that is performed to achieve a business value. In essence, a process model is a mapping of properties of the original business process to the model, created for a purpose. Different modelling purposes, therefore, result in different models of a business process. Against this background, the misalignment of process models often observed in the field of BPM is no surprise. Even if the same business scenario is considered, models created for strategic decision making differ significantly in content from models created for process automation. Despite their differences, process models that refer to the same business process should be consistent, i.e., free of contradictions. Apparently, there is a trade-off between the strictness of a notion of consistency and the appropriateness of process models serving different purposes. Existing work on consistency analysis builds upon behaviour equivalences and hierarchical refinements between process models. Hence, these approaches are computationally hard and do not offer the flexibility to gradually relax consistency requirements towards a certain setting. This thesis presents a framework for the analysis of behaviour consistency that takes a fundamentally different approach. As a first step, an alignment between corresponding elements of related process models is constructed. Then, this thesis conducts behavioural analysis grounded on a relational abstraction of the behaviour of a process model, its behavioural profile. Different variants of these profiles are proposed, along with efficient computation techniques for a broad class of process models. Using behavioural profiles, the consistency of an alignment between process models is judged by different notions and measures. The consistency measures are also adjusted to assess conformance of process logs that capture the observed execution of a process. Further, this thesis proposes various complementary techniques to support consistency management. It elaborates on how to implement consistent change propagation between process models, addresses the exploration of behavioural commonalities and differences, and proposes a model synthesis for behavioural profiles.
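As a rough illustration of a behavioural-profile-like relation (an approximation computed over observed traces, not the thesis's computation on process models), the snippet below classifies each pair of activities as strict order, interleaving, or exclusive depending on whether one is ever observed before the other.

```python
from itertools import combinations

def behavioural_profile(traces):
    before = set()                       # (a, b): a is observed before b in at least one trace
    activities = set()
    for trace in traces:
        activities.update(trace)
        for i, a in enumerate(trace):
            for b in trace[i + 1:]:
                before.add((a, b))
    profile = {}
    for a, b in combinations(sorted(activities), 2):
        if (a, b) in before and (b, a) in before:
            profile[(a, b)] = "interleaving"
        elif (a, b) in before:
            profile[(a, b)] = "strict order"
        elif (b, a) in before:
            profile[(a, b)] = "reverse strict order"
        else:
            profile[(a, b)] = "exclusive"
    return profile

profile = behavioural_profile([["register", "check", "ship"], ["register", "reject"]])
print(profile[("check", "ship")])     # 'strict order'
print(profile[("check", "reject")])   # 'exclusive': the two never occur in the same trace
```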
26

Hepworth, Ammon Ikaika. "Conflict Management and Model Consistency in Multi-user CAD." BYU ScholarsArchive, 2014. https://scholarsarchive.byu.edu/etd/5586.

Abstract:
The NSF Center for e-Design, Brigham Young University (BYU) site has re-architected Computer Aided Design (CAD) tools, enabling multiple users to concurrently create, modify and view the same CAD part or assembly. This technology allows engineers, designers and manufacturing personnel to simultaneously contribute to the design of a part or assembly in real time, enabling parallel work environments within the CAD system. Multi-user systems are only as robust and efficient as their methods for managing conflicts and preserving model consistency. Conflicts occur in multi-user CAD when multiple users interoperate with the same or dependent geometry. Some conflicts can lead to model inconsistencies, which means that the users' instances of the model are no longer identical. Other conflicts cause redundant work or waste in the design process. This dissertation presents methods to avoid and resolve conflicts which lead to model inconsistency and waste in the design process. The automated feature reservation method is presented, which prevents multiple users from simultaneously editing the same feature, thus avoiding conflicts. In addition, a method is also presented which ensures that copies of the model stay consistent between distributed CAD clients by enforcing modeling operations to occur in the same order on all the clients. In cases of conflict, the conflicting operations are preserved locally for manual resolution by the user. An efficient model consistency method is presented which provides consistent references to the topological entities in a CAD model, ensuring operations are applied consistently on all models. An integrated task management system is also presented which avoids conflicts related to varying user design intent. Implementations and results of each method are presented. Results show that the methods effectively manage conflicts and ensure model consistency, thus providing a solution for a robust multi-user CAD system.
27

Gupta, Bharat. "Efficient replication of large volumes of data and maintaining data consistency by using P2P techniques in Desktop Grid." Thesis, University of Westminster, 2014. https://westminsterresearch.westminster.ac.uk/item/99352/efficient-replication-of-large-volumes-of-data-and-maintaining-data-consistency-by-using-p2p-techniques-in-desktop-grid.

Abstract:
Desktop Grids are increasing in popularity in institutions because of their relatively low cost and good performance. Data-intensive applications require data management in scientific experiments conducted by researchers and scientists in a Desktop Grid-based Distributed Computing Infrastructure (DCI). Some of these data-intensive applications deal with large volumes of data. Several solutions for data-intensive applications have been proposed for Desktop Grids (DG), but they are not efficient in handling large volumes of data. Data management in this environment deals with data access and integration, maintaining the basic properties of databases, architectures for querying data, etc. Data in data-intensive applications has to be replicated on multiple nodes to improve data availability and reduce response time. Peer-to-Peer (P2P) is a well-established technique for handling large volumes of data and is widely used on the Internet, and its environment is similar to that of a DG. The performance of existing P2P-based solutions dealing with a generic architecture for replicating large volumes of data is not efficient in a DG-based DCI. Therefore, there is a need for a generic architecture for replicating large volumes of data efficiently using P2P in a BOINC-based Desktop Grid. Present solutions for data-intensive applications mainly deal with read-only data, but new types of applications are emerging which deal with large volumes of data and read/write access. In emerging scientific experiments, some nodes of the DG generate a new snapshot of scientific data at regular intervals. This new snapshot is generated by updating some of the values of existing data fields, and the updated data has to be synchronised across all DG nodes to maintain data consistency. The performance of data management in a DG can be improved by addressing efficient data replication and consistency. Therefore, there is a need for algorithms which deal with read/write data consistency along with replication of large volumes of data in a BOINC-based Desktop Grid. The research identifies efficient solutions for replicating large volumes of data and maintaining read/write data consistency using Peer-to-Peer techniques in a BOINC-based Desktop Grid. This thesis presents the work that has been carried out to complete this research.
28

Welmers, Laura Hazel. "The implementation of an input/output consistency checker for a requirements specification document." Thesis, Kansas State University, 1985. http://hdl.handle.net/2097/9889.

29

Zhan, Zhiyuan. "Meeting Data Sharing Needs of Heterogeneous Distributed Users." Diss., Georgia Institute of Technology, 2007. http://hdl.handle.net/1853/14598.

Abstract:
The fast growth of wireless networking and mobile computing devices has enabled us to access information from anywhere at any time. However, varying user needs and system resource constraints are two major heterogeneity factors that pose a challenge to information sharing systems. For instance, when a new information item is produced, different users may have different requirements for when the new value should become visible. The resources that each device can contribute to such information sharing applications also vary. Therefore, how to enable information sharing across computing platforms with varying resources to meet different user demands is an important problem for distributed systems research. In this thesis, we address the heterogeneity challenge faced by such systems. We assume that shared information is encapsulated in distributed objects, and we use object replication to increase system scalability and robustness, which introduces the consistency problem. Many consistency models have been proposed in recent years but they are either too strong and do not scale very well, or too weak to meet many users' requirements. We propose a Mixed Consistency (MC) model as a solution. We introduce an access constraints based approach to combine both strong and weak consistency models together. We also propose a MC protocol that combines existing implementations together with minimum modifications. It is designed to tolerate crash failures and slow processes/communication links in the system. We also explore how the heterogeneity challenge can be addressed in the transportation layer by developing an agile dissemination protocol. We implement our MC protocol on top of a distributed publisher-subscriber middleware, Echo. We finally measure the performance of our MC implementation. The results of the experiments are consistent with our expectations. Based on the functionality and performance of mixed consistency protocols, we believe that this model is effective in addressing the heterogeneity of user requirements and available resources in distributed systems.
30

Müller, Simon Peter [Verfasser], and Jürgen [Akademischer Betreuer] Dippon. "Consistency and bandwidth selection for dependent data in non-parametric functional data analysis / Simon Peter Müller. Betreuer: Jürgen Dippon." Stuttgart : Universitätsbibliothek der Universität Stuttgart, 2011. http://d-nb.info/1017485550/34.

31

Biswas, Swarnendu. "Practical Support for Strong, Serializability-Based Memory Consistency." The Ohio State University, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=osu1470957618.

32

Majuntke, Matthias [Verfasser], Neeraj [Akademischer Betreuer] Suri, and Christof [Akademischer Betreuer] Fetzer. "Data Consistency and Coordination for Untrusted Environments / Matthias Majuntke. Betreuer: Neeraj Suri ; Christof Fetzer." Darmstadt : Universitäts- und Landesbibliothek Darmstadt, 2012. http://d-nb.info/1106117956/34.

33

Tavares, Joao Alberto Vianna. "Eureka : a distributed shared memory system based on the Lazy Data Merging consistency model /." Thesis, Monterey, Calif. : Springfield, Va. : Naval Postgraduate School ; Available from National Technical Information Service, 1995. http://handle.dtic.mil/100.2/ADA304327.

34

Geraldo, Issa Cherif. "On the consistency of some constrained maximum likelihood estimator used in crash data modelling." Thesis, Lille 1, 2015. http://www.theses.fr/2015LIL10184/document.

Abstract:
L'ensemble des méthodes statistiques utilisées dans la modélisation de données nécessite la recherche de solutions optimales locales mais aussi l’estimation de la précision (écart-type) liée à ces solutions. Ces méthodes consistent à optimiser, par approximations itératives, la fonction de vraisemblance ou une version approchée. Classiquement, on utilise des versions adaptées de la méthode de Newton-Raphson ou des scores de Fisher. Du fait qu'elles nécessitent des inversions matricielles, ces méthodes peuvent être complexes à mettre en œuvre numériquement en grandes dimensions ou lorsque les matrices impliquées ne sont pas inversibles. Pour contourner ces difficultés, des procédures itératives ne nécessitant pas d’inversion matricielle telles que les algorithmes MM (Minorization-Maximization) ont été proposées et sont considérés comme pertinents pour les problèmes en grandes dimensions et pour certaines distributions discrètes multivariées. Parmi les nouvelles approches proposées dans le cadre de la modélisation en sécurité routière, figure un algorithme nommé algorithme cyclique itératif (CA). Cette thèse a un double objectif. Le premier est d'étudier l'algorithme CA des points de vue algorithmique et stochastique; le second est de généraliser l'algorithme cyclique itératif à des modèles plus complexes intégrant des distributions discrètes multivariées et de comparer la performance de l’algorithme CA généralisé à celle de ses compétiteurs
Most of the statistical methods used in data modeling require the search for locally optimal solutions as well as the estimation of the standard errors attached to these solutions. These methods maximize, by successive approximations, the likelihood function or an approximation of it. Generally, one uses numerical methods adapted from the Newton-Raphson method or Fisher scoring. Because they require matrix inversions, these methods can be complex to implement numerically in high dimensions or when the matrices involved are not invertible. To overcome these difficulties, iterative procedures requiring no matrix inversion, such as MM (Minorization-Maximization) algorithms, have been proposed and are considered efficient for high-dimensional problems and for certain multivariate discrete distributions. Among the new approaches proposed for data modeling in road safety is an algorithm called the iterative cyclic algorithm (CA). This thesis has two main objectives: (a) to study the convergence properties of the cyclic algorithm from both numerical and stochastic viewpoints, and (b) to generalize the CA to more general models integrating multivariate discrete distributions and to compare the performance of the generalized CA with that of its competitors.
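For readers unfamiliar with MM algorithms, the sketch below shows the flavour of an inversion-free iteration on a textbook example: the MLE of a Cauchy location parameter, where each MM step maximizes a quadratic minorizer of the log-likelihood and reduces to a weighted average. It is a generic illustration of the Minorization-Maximization principle, not the cyclic algorithm studied in the thesis.

    def mm_cauchy_location(x, theta0=0.0, iters=50):
        """Textbook MM iteration for the MLE of a Cauchy location parameter.

        The log-likelihood is l(theta) = -sum(log(1 + (x_i - theta)^2)); each MM
        step maximizes a quadratic minorizer, which amounts to a weighted average.
        No derivatives and no matrix inversion are needed.
        """
        theta = theta0
        for _ in range(iters):
            w = [1.0 / (1.0 + (xi - theta) ** 2) for xi in x]
            theta = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)
        return theta

    data = [-1.2, 0.4, 0.9, 1.5, 2.1, 7.0]   # small illustrative sample
    print(mm_cauchy_location(data))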
35

Rönnberg, Axel. "Semi-Supervised Deep Learning using Consistency-Based Methods for Segmentation of Medical Images." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-279579.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
Анотація:
In radiation therapy, a form of cancer treatment, accurately locating the anatomical structures is required in order to limit the impact on healthy cells. The automatic task of delineating these structures and organs is called segmentation, where each pixel in an image is classified and assigned a label. Recently, deep neural networks have proven to be efficient at automatic medical segmentation. However, deep learning requires large amounts of training data. This is a limitation, especially in the medical field, due to factors such as patient confidentiality. Nonetheless, the main challenge is not the image data itself but the lack of high-quality annotations. It is thus interesting to investigate methods for semi-supervised learning, where only a subset of the images requires annotations. This raises the question of whether these methods can be acceptable for organ segmentation, and whether they result in increased performance in comparison to supervised models. A category of semi-supervised methods applies the strategy of encouraging consistency between predictions. Consistency Training and Mean Teacher are two methods in which the network weights are updated in order to minimize the impact of input perturbations such as data augmentations. In addition, the Mean Teacher method trains two models, a Teacher and a Student; the Teacher is updated as an average of consecutive Student models, using Temporal Ensembling. To resolve the question of whether semi-supervised learning could be beneficial, the two mentioned techniques are investigated. They are used to train deep neural networks with a U-net architecture to segment the bladder and anorectum in 3D CT images. The results showed promise for Consistency Training and Mean Teacher, with nearly all model configurations yielding improved segmentations. The results also showed that the methods reduced performance variance, primarily by limiting poor delineations. With these results in hand, the use of semi-supervised learning should definitely be considered. However, since the segmentation improvement was not repeated in all experiment configurations, more research needs to be done.
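The two ingredients named in this abstract, a consistency loss under input perturbation and the exponential-moving-average Teacher update, can be sketched in a few lines. The snippet below is a framework-free toy in which model and augment are stand-ins for a real segmentation network and real augmentations; it illustrates the mechanism, not the thesis's training setup.

    import numpy as np

    rng = np.random.default_rng(0)

    def model(weights, x):
        # stand-in for a segmentation network: a linear map + sigmoid per "pixel"
        return 1.0 / (1.0 + np.exp(-x @ weights))

    def augment(x):
        # stand-in for data augmentation: a small Gaussian perturbation of the input
        return x + rng.normal(scale=0.05, size=x.shape)

    def consistency_loss(student_w, teacher_w, x_unlabeled):
        # encourage the Student to agree with the Teacher under different perturbations
        p_student = model(student_w, augment(x_unlabeled))
        p_teacher = model(teacher_w, augment(x_unlabeled))
        return np.mean((p_student - p_teacher) ** 2)

    def ema_update(teacher_w, student_w, alpha=0.99):
        # Mean Teacher: Teacher weights are an exponential moving average of the Student
        return alpha * teacher_w + (1.0 - alpha) * student_w

    # toy training step on an unlabeled batch
    student = rng.normal(size=4)
    teacher = student.copy()
    x_u = rng.normal(size=(8, 4))
    loss = consistency_loss(student, teacher, x_u)
    student -= 0.1 * rng.normal(size=4)      # placeholder for a real gradient step
    teacher = ema_update(teacher, student)
    print(round(float(loss), 4))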
36

Gonçalves, André Miguel Augusto. "Estimating data divergence in cloud computing storage systems." Master's thesis, Faculdade de Ciências e Tecnologia, 2013. http://hdl.handle.net/10362/10852.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
Анотація:
Dissertation submitted to obtain the Master's degree in Computer Engineering (Engenharia Informática)
Many internet services are provided through cloud computing infrastructures that are composed of multiple data centers. To provide high availability and low latency, data is replicated on machines in different data centers, which introduces the complexity of guaranteeing that clients view data consistently. Data stores often opt for a relaxed approach to replication, guaranteeing only eventual consistency, since it improves the latency of operations. However, this may lead to replicas having different values for the same data. One solution to control the divergence of data in eventually consistent systems is the usage of metrics that measure how stale the data of a replica is. In the past, several algorithms have been proposed to estimate the value of these metrics in a deterministic way. An alternative solution is to rely on probabilistic metrics that estimate divergence with a certain degree of certainty. This relaxes the need to contact all replicas while still providing a relatively accurate measurement. In this work we designed and implemented a solution to estimate the divergence of data in eventually consistent data stores that scales to many replicas by allowing client-side caching. Measuring the divergence when there is a large number of clients calls for the development of new algorithms that provide probabilistic guarantees. Additionally, unlike previous works, we focus on measuring the divergence relative to a state that can lead to the violation of application invariants.
Partially funded by project PTDC/EIA EIA/108963/2008 and by an ERC Starting Grant, Agreement Number 307732
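One way to picture the probabilistic metrics discussed above is to sample a subset of replicas (or cached client copies), estimate the fraction that lags behind a reference version, and attach a normal-approximation confidence interval, so that divergence can be bounded without contacting every replica. The sketch below is a generic illustration of that idea, not the algorithm developed in the dissertation.

    import math
    import random

    def estimate_staleness(replica_versions, reference_version, sample_size, z=1.96):
        """Estimate the fraction of stale replicas from a random sample.

        Returns (point_estimate, half_width) of an approximate 95% confidence
        interval, so divergence can be bounded without contacting every replica.
        """
        sample = random.sample(replica_versions, sample_size)
        stale = sum(1 for v in sample if v < reference_version)
        p = stale / sample_size
        half_width = z * math.sqrt(p * (1 - p) / sample_size)
        return p, half_width

    # toy usage: 1000 replicas/client caches, some lagging behind version 42
    versions = [42] * 900 + [41] * 80 + [40] * 20
    print(estimate_staleness(versions, reference_version=42, sample_size=100))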
37

Hyllienmark, Erik. "Evaluation of two vulnerability scanners accuracy and consistency in a cyber range." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-160092.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
Анотація:
One challenge when conducting exercises in a cyber range is to know which applications and vulnerabilities are present on the deployed computers. In this paper, the reliability of the application and vulnerability reporting of two vulnerability scanners, OpenVas and Nexpose, has been evaluated based on their accuracy and consistency. In a follow-up experiment, the configurations of two virtual computers were varied in order to identify where each scanner gathers its information. Accuracy was evaluated with the F1-score, which combines the precision and recall metrics into a single number. Precision and recall values were calculated by comparing the applications and vulnerabilities installed on the virtual computers with the scanning reports. Consistency was evaluated by quantifying, as a number between 0 and 1, how similar the reported applications and vulnerabilities were across multiple vulnerability scans. The vulnerabilities reported by the two scanners were also combined using their union and intersection to increase the accuracy. The evaluation reveals that neither Nexpose nor OpenVas accurately and consistently reports installed applications and vulnerabilities. Nexpose reported vulnerabilities better than OpenVas, with an accuracy of 0.78, and also reported applications more accurately, with an accuracy of 0.96. Neither of the scanners reported applications and vulnerabilities consistently over three vulnerability scans. Taking the union of the vulnerabilities reported by both scanners increased the accuracy by 8 percent compared with Nexpose alone. However, our conclusion is that the scanners' reporting does not perform well enough to be used for a reliable inventory of applications and vulnerabilities in a cyber range.
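The metrics named in this abstract are easy to state precisely: precision and recall compare a scan report with the ground-truth inventory, the F1-score is their harmonic mean, and a 0-to-1 consistency score can, for example, be the average pairwise Jaccard similarity of repeated reports. The sketch below uses invented placeholder identifiers (vuln-1, vuln-2, ...) and also shows the union of two scanners' reports; the exact consistency formula used in the thesis may differ.

    def precision_recall_f1(reported, installed):
        """Compare a scan report (set) with the ground-truth inventory (set)."""
        tp = len(reported & installed)
        precision = tp / len(reported) if reported else 0.0
        recall = tp / len(installed) if installed else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        return precision, recall, f1

    def consistency(scans):
        """Average pairwise Jaccard similarity of repeated scan reports (0..1)."""
        pairs = [(a, b) for i, a in enumerate(scans) for b in scans[i + 1:]]
        return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)

    installed = {"vuln-1", "vuln-2", "vuln-3"}          # placeholder identifiers
    nexpose_report = {"vuln-1", "vuln-2"}
    openvas_report = {"vuln-2", "vuln-4"}

    print(precision_recall_f1(nexpose_report, installed))
    print(precision_recall_f1(nexpose_report | openvas_report, installed))  # union of both scanners
    print(consistency([nexpose_report, nexpose_report, {"vuln-1"}]))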
38

Hedkvist, Pierre. "Collaborative Editing of Graphical Network using Eventual Consistency." Thesis, Linköpings universitet, Programvara och system, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-154856.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
Анотація:
This thesis compares different approaches to building a collaborative editing application, using methods such as OT, CRDT and locking. After a comparison of these methods, an implementation based on CRDT was carried out. The collaborative graphical network was implemented in such a way that consistency is guaranteed. The implementation uses the 2P2P-Graph, which was extended in order to support moving nodes, and relies on the client-server communication model. The implementation was evaluated through a time-complexity and a space-complexity analysis. The results of the thesis include a comparison between the different methods and an evaluation of the Extended 2P2P-Graph.
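As background for the data structure mentioned here, a 2P2P-Graph combines two two-phase sets (an add-set plus a tombstone set), one for vertices and one for edges, so that concurrent replicas converge when they merge their states. The sketch below illustrates that behaviour; the last-writer-wins position register used to support moving nodes is an assumption made for the example, not necessarily the extension designed in the thesis.

    class TwoPhaseSet:
        """Add-set plus tombstone set: once removed, an element stays removed."""
        def __init__(self):
            self.added, self.removed = set(), set()
        def add(self, e):
            self.added.add(e)
        def remove(self, e):
            if e in self.added:
                self.removed.add(e)
        def __contains__(self, e):
            return e in self.added and e not in self.removed
        def merge(self, other):
            self.added |= other.added
            self.removed |= other.removed

    class TwoPhaseGraph:
        """Minimal 2P2P-Graph: one two-phase set for vertices, one for edges."""
        def __init__(self):
            self.vertices, self.edges = TwoPhaseSet(), TwoPhaseSet()
            self.positions = {}              # vertex -> (timestamp, (x, y))
        def add_vertex(self, v):
            self.vertices.add(v)
        def add_edge(self, u, v):
            if u in self.vertices and v in self.vertices:
                self.edges.add((u, v))
        def move(self, v, ts, xy):
            # last-writer-wins position register per vertex -- an assumed way to
            # support moving nodes, not necessarily the thesis's extension
            if v in self.vertices and ts > self.positions.get(v, (float("-inf"), None))[0]:
                self.positions[v] = (ts, xy)
        def merge(self, other):
            self.vertices.merge(other.vertices)
            self.edges.merge(other.edges)
            for v, (ts, xy) in other.positions.items():
                self.move(v, ts, xy)

    # two replicas converge to the same graph after exchanging and merging states
    g1, g2 = TwoPhaseGraph(), TwoPhaseGraph()
    g1.add_vertex("a")
    g1.add_vertex("b")
    g1.add_edge("a", "b")
    g2.add_vertex("a")
    g2.move("a", ts=1, xy=(10, 20))
    g1.merge(g2)
    g2.merge(g1)
    print(("a", "b") in g1.edges, g2.positions["a"])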
39

Ulriksson, Jenny. "Consistency management in collaborative modelling and simulation." Licentiate thesis, KTH, Microelectronics and Information Technology, IMIT, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-571.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
Анотація:

The aim of this thesis is to exploit the technological capabilities of computer-supported collaborative work (CSCW) in the field of collaborative Modelling and Simulation (M&S). The thesis focuses on addressing two main problems: (i) providing flexible means of consistency management in collaborative M&S, and (ii) providing platform- and application-independent services for collaborative M&S.

This work studies several CSCW technologies and how some of their concepts can be incorporated into a distributed collaborative M&S environment. An environment for component-based simulation development and visualization, which provides support for collaborative M&S, has been designed. Consistency policies that can be used in conjunction with distributed simulation and the High Level Architecture (HLA) have been investigated. Furthermore, the combination of HLA and XML has been shown to be an efficient foundation for a CSCW infrastructure. Two consistency policies, a strict one and an optimistic one, were implemented on top of HLA in the distributed collaborative environment, and their performance was compared with that of a totally relaxed policy in various collaboration situations.
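The contrast between the strict and optimistic policies compared above can be pictured independently of HLA: a strict policy applies an update only after every collaborator has acknowledged it, while an optimistic policy applies it immediately and keeps enough information to roll it back on conflict. The classes below are a hypothetical toy illustration of that difference, not the policies implemented in the thesis.

    class Peer:
        """Stand-in collaborator that can accept or refuse an update."""
        def __init__(self, accepts=True):
            self.accepts = accepts
        def acknowledge(self, update):
            return self.accepts

    class StrictPolicy:
        """Apply an update only after every peer has acknowledged it."""
        def apply(self, update, peers, local_state):
            if all(p.acknowledge(update) for p in peers):
                local_state.append(update)
                return True
            return False                      # rejected: not everyone could accept it

    class OptimisticPolicy:
        """Apply immediately and keep a log so the update can be undone on conflict."""
        def __init__(self):
            self.log = []
        def apply(self, update, peers, local_state):
            local_state.append(update)
            self.log.append(update)           # kept so a conflicting update can be rolled back
            return True
        def rollback(self, update, local_state):
            if update in local_state:
                local_state.remove(update)

    state = []
    print(StrictPolicy().apply("move shape A", [Peer(), Peer(accepts=False)], state))  # False
    print(OptimisticPolicy().apply("move shape A", [], state))                          # True immediately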

40

Surajbali, Bholanathsingh, Paul Grace, and Geoff Coulson. "Preserving dynamic reconfiguration consistency in aspect oriented middleware." Universität Potsdam, 2010. http://opus.kobv.de/ubp/volltexte/2010/4137/.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
Анотація:
Aspect-oriented middleware is a promising technology for the realisation of dynamic reconfiguration in heterogeneous distributed systems. However, like other dynamic reconfiguration approaches, AO-middleware-based reconfiguration requires that the consistency of the system is maintained across reconfigurations. AO-middleware-based reconfiguration is an ongoing research topic and several consistency approaches have been proposed. However, most of these approaches tend to be targeted at specific contexts, whereas for distributed systems it is crucial to cover a wide range of operating conditions. In this paper we propose an approach that offers distributed, dynamic reconfiguration in a consistent manner and features flexible, framework-based consistency management to cover a wide range of operating conditions. We evaluate the approach by investigating its configurability and transparency, and we also quantify the performance overheads of the associated consistency mechanisms.
41

Mallur, Vikram. "A Model for Managing Data Integrity." Thesis, Université d'Ottawa / University of Ottawa, 2011. http://hdl.handle.net/10393/20233.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
Анотація:
Consistent, accurate and timely data are essential to the functioning of a modern organization. Managing the integrity of an organization’s data assets in a systematic manner is a challenging task in the face of continuous update, transformation and processing to support business operations. Classic approaches to constraint-based integrity focus on logical consistency within a database and reject any transaction that violates consistency, but leave unresolved how to fix or manage violations. More ad hoc approaches focus on the accuracy of the data and attempt to clean data assets after the fact, using queries to flag records with potential violations and using manual efforts to repair. Neither approach satisfactorily addresses the problem from an organizational point of view. In this thesis, we provide a conceptual model of constraint-based integrity management (CBIM) that flexibly combines both approaches in a systematic manner to provide improved integrity management. We perform a gap analysis that examines the criteria that are desirable for efficient management of data integrity. Our approach involves creating a Data Integrity Zone and an On Deck Zone in the database for separating the clean data from data that violates integrity constraints. We provide tool support for specifying constraints in a tabular form and generating triggers that flag violations of dependencies. We validate this by performing case studies on two systems used to manage healthcare data: PAL-IS and iMED-Learn. Our case studies show that using views to implement the zones does not cause any significant increase in the running time of a process.
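The zoning idea described in this abstract, keeping rows that satisfy all constraints in a Data Integrity Zone while moving violating rows to an On Deck Zone for later repair rather than rejecting them, can be sketched as a simple routing step. The constraint names and record fields below are hypothetical, and the actual system specifies constraints in tabular form and enforces them with database triggers rather than Python predicates.

    # illustrative constraints for a hypothetical patient-visit table
    constraints = {
        "age_non_negative": lambda r: r.get("age", 0) >= 0,
        "visit_has_patient_id": lambda r: bool(r.get("patient_id")),
    }

    def route_records(records, constraints):
        """Split records into a Data Integrity Zone (clean rows) and an On Deck Zone
        (rows violating at least one constraint, kept and flagged for later repair)."""
        integrity_zone, on_deck_zone = [], []
        for r in records:
            violated = [name for name, check in constraints.items() if not check(r)]
            if violated:
                on_deck_zone.append({**r, "_violations": violated})
            else:
                integrity_zone.append(r)
        return integrity_zone, on_deck_zone

    rows = [
        {"patient_id": "p1", "age": 42},
        {"patient_id": "", "age": -3},        # violates both constraints
    ]
    print(route_records(rows, constraints))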
42

Padawitz, Peter Verfasser], Hartmut [Gutachter] Ehrig, and Dirk [Gutachter] [Siefkes. "Correctness, completeness, and consistency of equational data type specifications / Peter Padawitz ; Gutachter: Hartmut Ehrig, Dirk Siefkes." Berlin : Technische Universität Berlin, 2016. http://d-nb.info/1156180457/34.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
43

Dietrich, Georg [Verfasser], and Frank [Gutachter] Puppe. "Ad Hoc Information Extraction in a Clinical Data Warehouse with Case Studies for Data Exploration and Consistency Checks / Georg Dietrich ; Gutachter: Frank Puppe." Würzburg : Universität Würzburg, 2019. http://d-nb.info/1191102610/34.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
44

Cuce, Simon. "GLOMAR : a component based framework for maintaining consistency of data objects within a heterogeneous distributed file system." Monash University, School of Computer Science and Software Engineering, 2003. http://arrow.monash.edu.au/hdl/1959.1/5743.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
45

Cakir, Fahrettin. "Data-centric solution methodologies for vehicle routing problems." Diss., University of Iowa, 2016. https://ir.uiowa.edu/etd/2052.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
Анотація:
Data-driven decision making has become more popular in today's businesses, including logistics and vehicle routing. Leveraging historical data, companies can achieve goals such as customer satisfaction management, scalable and efficient operation, and higher overall revenue. In the management of customer satisfaction, logistics companies rely on a consistent assignment of their drivers to customers over time. Creating this consistency takes time and depends on the history experienced between the company and the customer. While pursuing this goal, companies trade off the cost of capacity against consistency because demand is unknown on a daily basis. We propose concepts and methods that enable a parcel delivery company to balance the trade-off between cost and customer satisfaction. We use clustering methods that leverage cumulative historical service data to improve consistency, measured with information entropy. Parcel delivery companies route many vehicles to serve customer requests on a daily basis. While clustering was important to the development of early routing algorithms, modern solution methods rely on metaheuristics, which are not easily deployable and often do not have open source code bases. We propose a two-stage, shape-based clustering approach that efficiently obtains a clustering of delivery request locations. Our solution technique is based on creating clusters that form certain shapes with respect to the depot. We obtain a routing solution by ordering all locations in every cluster separately. Our results are competitive with a state-of-the-art vehicle routing solver in terms of quality. Moreover, the results show that the algorithm is more scalable and that its runtime is robust to problem parameters. Fish trawling can be considered a vehicle routing problem in which the main objective is to maximize the amount of fish caught (revenue) in the face of uncertain catch. This uncertainty creates an embedded prediction problem that must be solved before deciding where to harvest. Using previous catch data to train prediction models, we solve the routing problem a fish trawler faces with dynamically updated routing decisions that allow for spatiotemporal correlation in the random catch. We investigate the relationship between the quality of the predictions and the quality of the revenue generated as a result.
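The entropy-based consistency measure mentioned above can be read as follows: for each customer, take the historical distribution of drivers who served them and compute its Shannon entropy; zero means the same driver every time, and larger values mean less consistent service. The snippet below illustrates that calculation for a single customer; the aggregation across customers used in the dissertation is not shown.

    import math
    from collections import Counter

    def assignment_entropy(driver_history):
        """Shannon entropy (in bits) of the drivers who served one customer.

        0 bits means perfectly consistent service (always the same driver);
        larger values mean the customer keeps seeing different drivers.
        """
        counts = Counter(driver_history)
        total = sum(counts.values())
        return -sum((c / total) * math.log2(c / total) for c in counts.values())

    print(assignment_entropy(["d1", "d1", "d1", "d1"]))    # 0 bits: always the same driver
    print(assignment_entropy(["d1", "d2", "d1", "d3"]))    # higher: less consistent service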
46

Gustavsson, Sanny. "On recovery and consistency preservation in distributed real-time database systems." Thesis, University of Skövde, Department of Computer Science, 2000. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-492.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
Анотація:

In this dissertation, we consider the problem of recovering a crashed node in a distributed database. We especially focus on real-time recovery in eventually consistent databases, where the consistency of replicated data is traded off for increased predictability, availability and performance. To achieve this focus, we consider consistency preservation techniques as well as recovery mechanisms.

Our approach is to perform a thorough literature survey of these two fields. The literature survey considers not only recovery in real-time, distributed, eventually consistent databases, but also related techniques, such as recovery in main-memory resident or immediately consistent databases. We also examine different techniques for consistency preservation.

Based on this literature survey, we present a taxonomy and state-of-the-art report on recovery mechanisms and consistency preservation techniques. We contrast different recovery mechanisms, and highlight properties and aspects of these that make them more or less suitable for use in an eventually consistent database. We also identify unexplored areas and uninvestigated problems within the fields of database recovery and consistency preservation. We find that research on real-time recovery in distributed databases is lacking, and we also propose further investigation of how the choice of consistency preservation technique affects (or should affect) the design of a recovery mechanism for the system.

47

Berndt, Rüdiger [Verfasser], Reinhard [Akademischer Betreuer] German, and Richard [Akademischer Betreuer] Lenz. "Decision Diagrams for the Verification of Consistency in Automotive Product Data / Rüdiger Berndt. Gutachter: Reinhard German ; Richard Lenz." Erlangen : Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), 2016. http://d-nb.info/108242644X/34.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
48

Roman, Pierre-Louis. "Exploring heterogeneity in loosely consistent decentralized data replication." Thesis, Rennes 1, 2018. http://www.theses.fr/2018REN1S091/document.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
Анотація:
Decentralized systems are scalable by design but also difficult to coordinate due to their weak coupling. Replicating data in these geo-distributed systems is therefore a challenge inherent to their structure. The two contributions of this thesis exploit the heterogeneity of user requirements and enable a personalizable quality of service for data replication in decentralized systems. Our first contribution, Gossip Primary-Secondary, enables the consistency criterion Update consistency Primary-Secondary to offer differentiated guarantees in terms of consistency and message delivery latency for large-scale data replication. Our second contribution, Dietcoin, enriches Bitcoin with diet nodes that can (i) verify the correctness of entire subchains of blocks while avoiding the exorbitant cost of bootstrap verification and (ii) personalize their own security and resource consumption guarantees.
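The subchain verification performed by diet nodes can be pictured as checking that every block in a slice of the chain references the hash of its predecessor, starting from a hash the node already trusts, so that verification need not start from the genesis block. The sketch below is a toy hash-chain check with an invented block layout; Dietcoin's actual protocol (and Bitcoin's block format) involves considerably more than this.

    import hashlib
    import json

    def block_hash(block):
        # hash over the block's contents (toy serialization, not Bitcoin's format)
        return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

    def verify_subchain(blocks, trusted_prev_hash):
        """Check that each block references the hash of its predecessor,
        starting from a hash the diet node already trusts (e.g. a checkpoint)."""
        prev = trusted_prev_hash
        for block in blocks:
            if block["prev_hash"] != prev:
                return False
            prev = block_hash(block)
        return True

    # toy usage: a 2-block subchain built on top of a trusted checkpoint hash
    checkpoint = "00" * 32
    b1 = {"prev_hash": checkpoint, "txs": ["tx1"]}
    b2 = {"prev_hash": block_hash(b1), "txs": ["tx2"]}
    print(verify_subchain([b1, b2], checkpoint))           # True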
49

Lynch, O'Neil. "Mixture distributions with application to microarray data analysis." Scholar Commons, 2009. http://scholarcommons.usf.edu/etd/2075.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
Анотація:
The main goal in analyzing microarray data is to determine the genes that are differentially expressed across two types of tissue samples or samples obtained under two experimental conditions. In this dissertation we proposed two methods to determine differentially expressed genes. For the penalized normal mixture model (PMMM), we penalized both the variance and the mixing proportion parameters simultaneously: the variance parameter was penalized so that the log-likelihood is bounded, while the mixing proportion parameter was penalized so that its estimates do not lie on the boundary of the parameter space. The null distribution of the likelihood ratio test statistic (LRTS) was simulated so that we could perform a hypothesis test for the number of components of the penalized normal mixture model. In addition to simulating the null distribution of the LRTS for the penalized normal mixture model, we showed that the maximum likelihood estimates are asymptotically normal, a necessary first step towards establishing the asymptotic null distribution of the LRTS. This result is a significant contribution to the field of normal mixture models. The modified p-value approach for detecting differentially expressed genes was also discussed in this dissertation. It was implemented so that a hypothesis test for the number of components can be conducted using the modified likelihood ratio test; here the mixing proportion was penalized so that its estimates do not lie on the boundary of the parameter space. The null distribution of the LRTS was simulated so that the number of components of the uniform-beta mixture model could be determined. Finally, both modified methods, the penalized normal mixture model and the modified p-value approach, were applied to simulated and real data.
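To show the structure of such a penalized likelihood, the sketch below writes the log-likelihood of a two-component normal mixture plus two penalty terms: one that keeps the component variances away from zero (so the log-likelihood stays bounded) and one that keeps the mixing proportion away from 0 and 1. The specific penalty functions are common illustrative choices from the mixture literature, not necessarily the ones defined in the dissertation.

    import math

    def normal_pdf(x, mu, sigma):
        return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

    def penalized_loglik(data, pi, mu1, s1, mu2, s2, a=1.0, c=1.0):
        """Log-likelihood of a 2-component normal mixture plus two penalties:
        one discouraging sigma -> 0 (keeps the log-likelihood bounded) and one
        discouraging pi -> 0 or 1 (keeps the mixing proportion off the boundary).
        The penalty forms below are illustrative, not the dissertation's."""
        loglik = sum(math.log(pi * normal_pdf(x, mu1, s1) + (1 - pi) * normal_pdf(x, mu2, s2))
                     for x in data)
        var_penalty = -a * (1.0 / s1 ** 2 + 1.0 / s2 ** 2)   # penalizes tiny variances
        mix_penalty = c * math.log(4 * pi * (1 - pi))        # penalizes boundary mixing proportions
        return loglik + var_penalty + mix_penalty

    sample = [-1.1, -0.4, 0.2, 1.8, 2.3, 2.9]
    print(penalized_loglik(sample, pi=0.5, mu1=-0.5, s1=1.0, mu2=2.3, s2=0.8))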
50

Lladós, Segura Jordi. "Novel Consistency-based Approaches for Dealing with Large-scale Multiple Sequence Alignments." Doctoral thesis, Universitat de Lleida, 2018. http://hdl.handle.net/10803/663293.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
Анотація:
Multiple Sequence Alignment (MSA) has become fundamental for performing sequence analysis in modern biology. With the advent of new high-throughput Next Generation Sequencing (NGS) technologies, the volume of data generated by sequencers has increased significantly, so large-scale aligners are required. However, the heuristic nature of MSA methods, together with their NP-hard computational complexity, is slowing down the analysis of large-scale alignments with thousands of sequences or even whole genomes. Moreover, the accuracy of these methods is drastically reduced as more sequences are aligned. Consistency-based methods, which have been shown to mitigate such errors, add precomputed information (a consistency library) for each pair of sequences to the alignment stage and are capable of producing high-quality alignments. However, maintaining this whole collection of pairwise information in memory limits the maximum number of sequences that can be dealt with at once. The objective of this PhD is the study and proposal of new methods and tools to enable scalability for consistency-based MSA aligners, processing bigger datasets and improving their overall performance and alignment accuracy. The main obstacle to attaining scalability in such methods is the library's resource requirements (both memory and computing time), which grow quadratically with the number of sequences. Two methods are proposed to improve scalability: 1) reducing the library information in order to fit it into memory; and 2) storing the library data in secondary distributed storage, using new Big Data paradigms (MapReduce, NoSQL databases) and architectures (Hadoop) to calculate, store and access the library efficiently. In addition to the computational approaches, we propose an innovative secondary objective function to increase the accuracy of the final alignment. The results demonstrate the effectiveness of the proposals, which improve the scalability, performance and accuracy of T-Coffee, the tool used to validate the different proposals.
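A consistency library can be pictured as a map from pairs of (sequence, position) residues to alignment weights, and the first scalability strategy above amounts to keeping only the strongest entries per residue so that the map fits in memory. The sketch below illustrates that data structure and a naive reduction step; it is not T-Coffee's internal representation, and the thesis's actual reduction criterion may differ.

    from collections import defaultdict

    def build_library(pairwise_matches):
        """Consistency library: a weight for each aligned residue pair, keyed by
        ((seq, pos), (seq, pos)). Input is a list of ((seqA, posA), (seqB, posB), weight)
        triples coming from pairwise alignments."""
        library = defaultdict(float)
        for a, b, w in pairwise_matches:
            key = (a, b) if a <= b else (b, a)
            library[key] += w
        return library

    def reduce_library(library, keep_per_residue=2):
        """Keep only the strongest entries per residue to shrink memory use
        (a naive version of the first strategy described above)."""
        per_residue = defaultdict(list)
        for (a, b), w in library.items():
            per_residue[a].append(((a, b), w))
            per_residue[b].append(((a, b), w))
        kept = set()
        for entries in per_residue.values():
            entries.sort(key=lambda e: e[1], reverse=True)
            kept.update(key for key, _ in entries[:keep_per_residue])
        return {key: library[key] for key in kept}

    matches = [(("s1", 3), ("s2", 5), 40.0), (("s1", 3), ("s3", 2), 10.0),
               (("s2", 5), ("s3", 2), 25.0)]
    lib = build_library(matches)
    print(reduce_library(lib, keep_per_residue=1))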

До бібліографії