
Dissertations / Theses on the topic 'Distributed data system'



Consult the top 50 dissertations / theses for your research on the topic 'Distributed data system.'


You can also download the full text of each academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.

1

Mészáros, István. "Distributed P2P Data Backup System." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2013. http://www.nusl.cz/ntk/nusl-236378.

Full text
Abstract:
This master's thesis presents a model and prototype of a cooperative distributed data backup system based on a P2P communication network. The system design allows users to contribute their local free disk space to the system in exchange for reliable storage of their data at other users. The presented solution tries to meet users' data storage requirements while also addressing how to cope with users' unpredictability in providing free space. This is done in two ways: by using Reed-Solomon codes and by providing configurable availability parameters. One of these parameters is a time schedule indicating when a user can offer a predictable contribution to the system. The other parameter concerns the reliability of a particular user within the promised time slot. Based on these parameters, the system is able to synchronize the stored data. The thesis also focuses on securing the system against a wider range of possible attacks. The main goal is to publish the concept and the prototype. Since this is a relatively new solution, feedback from the general public that may use the product is also important; their comments and suggestions are the impetus for further development of the system.
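To illustrate the erasure-coding idea behind the thesis's use of Reed-Solomon codes, here is a minimal Python sketch using a single XOR parity block, a simplified stand-in that tolerates the loss of one peer's block; the function names and block count are illustrative assumptions, not taken from the thesis.

```python
def split_with_parity(data: bytes, k: int = 4) -> list[bytes]:
    """Split data into k equal-sized blocks plus one XOR parity block."""
    block_size = -(-len(data) // k)                  # ceiling division
    padded = data.ljust(k * block_size, b"\0")       # pad so all blocks have equal length
    blocks = [padded[i * block_size:(i + 1) * block_size] for i in range(k)]
    parity = bytearray(block_size)
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return blocks + [bytes(parity)]


def recover_missing(blocks: list[bytes | None]) -> list[bytes]:
    """Reconstruct at most one missing block (None) by XOR-ing all the others."""
    missing = [i for i, b in enumerate(blocks) if b is None]
    if len(missing) > 1:
        raise ValueError("single-parity scheme tolerates only one lost block")
    if missing:
        size = len(next(b for b in blocks if b is not None))
        rebuilt = bytearray(size)
        for b in blocks:
            if b is not None:
                for i, byte in enumerate(b):
                    rebuilt[i] ^= byte
        blocks[missing[0]] = bytes(rebuilt)
    return blocks  # data blocks are blocks[:-1]; padding is stripped at file level


# Example: distribute 5 blocks to 5 peers; any one peer may become unavailable.
pieces = split_with_parity(b"backup payload example", k=4)
pieces[2] = None                      # simulate an unavailable peer
restored = recover_missing(pieces)
```

A real Reed-Solomon code generalizes this construction so that several simultaneously unavailable peers can be tolerated, at the cost of more parity blocks.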
APA, Harvard, Vancouver, ISO, and other styles
2

Penharlow, David. "Microminiature Distributed Data Acquisition System." International Foundation for Telemetering, 1990. http://hdl.handle.net/10150/613485.

Full text
Abstract:
International Telemetering Conference Proceedings / October 29-November 02, 1990 / Riviera Hotel and Convention Center, Las Vegas, Nevada
The new generation of advanced tactical aircraft and missiles places unique demands on the electronic and mechanical designs for flight test instrumentation, high bit rates, operating temperature range and system interconnect wiring requirements. This paper describes a microminiature PCM distributed data acquisition system with integral signal conditioning (MMSC) which has been used in advanced aircraft and missile flight testing. The MMSC system is constructed from microminiature, stackable modules which allow the user to reconfigure the system as the requirements change. A second system is also described which uses the same circuitry in hermetic hybrid packages on plug-in circuit boards.
APA, Harvard, Vancouver, ISO, and other styles
3

Berdugo, Albert. "Advanced Distributed Wideband Data Acquisition System." International Foundation for Telemetering, 2005. http://hdl.handle.net/10150/604918.

Full text
Abstract:
ITC/USA 2005 Conference Proceedings / The Forty-First Annual International Telemetering Conference and Technical Exhibition / October 24-27, 2005 / Riviera Hotel & Convention Center, Las Vegas, Nevada
Wideband data acquisition units have been used as part of an instrumentation system for several decades. Historically, these units operated asynchronously from each other, and from the rest of the instrumentation system when installed on the same test vehicle. When many wideband units were required to slave their formats or sampling rates to a test vehicle event of interest, such as an external computer event clock, radar, or laser pulse train, few solutions were available. Additionally, a single test vehicle may use ten to thirty wideband units operating at up to 20 Mbps each. Such systems present a challenge to the instrumentation engineers to synchronize, transmit safety-of-flight information, and record. This paper will examine a distributed wideband data acquisition system in which each acquisition unit operates under its own data rate and format, yet remains fully synchronized to an external fixed or variable simultaneous sampling rate to provide total system coherency. The system aggregate rate can range from a few Mbps to as high as 1 Gbps. Data acquired from the acquisition units is further multiplexed per IRIG-106 Chapter 10 using distributed data multiplexers for recording.
APA, Harvard, Vancouver, ISO, and other styles
4

Aga, Svein. "System Recovery in Large-Scale Distributed Storage Systems." Thesis, Norwegian University of Science and Technology, Department of Computer and Information Science, 2008. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-9724.

Full text
Abstract:

This report aims to describe and improve a system recovery process in large-scale storage systems. Inevitably, a recovery process results in the system being loaded with internal replication of data, and will extensively utilize several storage nodes. Such internal load can be categorized and generalized into a maintenance workload class. Obviously, a storage system will have external clients which also introduce load into the system. This can be users altering their data, uploading new content, etc. Load generated by clients can be generalized into a production workload class. When both workload classes are actively present in a system, i.e. the system is recovering while users are simultaneously accessing their data, there will be competition for system resources between the different workload classes. The storage system must ensure Quality of Service (QoS) for each workload class so that both are guaranteed system resources. We have created Dynamic Tree with Observed Metrics (DTOM), an algorithm designed to gracefully throttle resources between multiple different workload classes. DTOM can be used to enforce and ensure QoS for the variety of workloads in a system. Experimental results demonstrate that DTOM outperforms another well-known scheduling algorithm. In addition, we have designed a recovery model which aims to improve handling of critical maintenance workload. Although the model is primarily intended for system recovery, it can also be applied to many other contexts.

APA, Harvard, Vancouver, ISO, and other styles
5

Gu, Xuan. "Selective Data Replication for Distributed Geographical Data Sets." Thesis, University of Canterbury. Computer Science and Software Engineering, 2008. http://hdl.handle.net/10092/2545.

Full text
Abstract:
The main purpose of this research is to incorporate additional higher-level semantics into the existing data replication strategies in such a way that their flexibility and performance can be improved in favour of both data providers and consumers. The resulting approach from this research is referred to as the selective data replication system. With this system, the data that has been updated by a data provider is captured and batched into messages known as update notifications. Once update notifications are received by data consumers, they are used to evaluate so-called update policies, which are specified by data consumers and contain details on when data replications need to occur and what data needs to be updated during the replications.
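The following is a minimal, hypothetical sketch of the update-notification and update-policy mechanism the abstract describes; the class names, fields, and the threshold-based policy are illustrative assumptions rather than the thesis's actual design.

```python
from dataclasses import dataclass, field


@dataclass
class UpdateNotification:
    """A provider-side batch describing which features of a dataset changed."""
    dataset: str
    changed_ids: set[str]


@dataclass
class UpdatePolicy:
    """Consumer-side rule: replicate `dataset` once enough changes accumulate."""
    dataset: str
    min_changes: int
    pending: set[str] = field(default_factory=set)

    def evaluate(self, note: UpdateNotification) -> set[str] | None:
        if note.dataset != self.dataset:
            return None
        self.pending |= note.changed_ids
        if len(self.pending) >= self.min_changes:
            due, self.pending = self.pending, set()
            return due                 # feature ids the consumer should pull now
        return None


# Example: replicate the 'roads' layer only after three features have changed.
policy = UpdatePolicy(dataset="roads", min_changes=3)
for batch in [{"r1"}, {"r2"}, {"r3", "r4"}]:
    due = policy.evaluate(UpdateNotification("roads", batch))
    if due:
        print("replicating features:", sorted(due))
```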
APA, Harvard, Vancouver, ISO, and other styles
6

Wong, León Kevin, and Valdivia Diego Eduardo Antonio Rodríguez. "Distributed Social Media System - Multimedia Data Linkage." Bachelor's thesis, Universidad Peruana de Ciencias Aplicadas (UPC), 2014. http://hdl.handle.net/10757/324525.

Full text
Abstract:
Online social networks are currently one of the main media through which large amounts of information are exchanged. In them, users try to reflect their daily activity in the form of posts on their own walls or those of other users. Likewise, images carry a large part of the information about a user's activity, for example a photo in which the user is tagged. These user interactions on the networks help build the user's digital identity. The information revealed by image metadata enriches this profile and helps improve the results of processes such as data mining, marketing, etc. The goal of this project is to generate a digital profile based on the information and activity a user contributes to a social network, collecting and explicitly showing several facts that are revealed by exploiting image metadata and the temporal factor of online activity. This includes the process of extracting, enriching, and encapsulating data in a proposed ontological model. Experimental results show that the information in the profile after enrichment is approximately four times the initial information, and the precision of the new information is above 75%. Future work leans toward detecting the type of relationship that exists between a person and one of their contacts. Another relevant topic to explore is the extraction of a wider range of entities, such as events or an individual's topics of interest, in order to improve the user's digital profile. Finally, data mining in the information-extraction process would help target marketing to social network users more effectively, since such advertising could be made more personalized. Keywords: linked data, multimedia information, digital profile, social networks, metadata
APA, Harvard, Vancouver, ISO, and other styles
7

Meth, Halli Elaine. "DecaFS: A Modular Distributed File System to Facilitate Distributed Systems Education." DigitalCommons@CalPoly, 2014. https://digitalcommons.calpoly.edu/theses/1206.

Full text
Abstract:
Data quantity, speed requirements, reliability constraints, and other factors encourage industry developers to build distributed systems and use distributed services. Software engineers are therefore exposed to distributed systems and services daily in the workplace. However, distributed computing is hard to teach in Computer Science courses due to the complexity distribution brings to all problem spaces. This presents a gap in education where students may not fully understand the challenges introduced with distributed systems. Teaching students distributed concepts would help better prepare them for industry development work. DecaFS, Distributed Educational Component Adaptable File System, is a modular distributed file system designed for educational use. The goal of the system is to teach distributed computing concepts to undergraduate and graduate level students by allowing them to develop small, digestible portions of the system. The system is broken up into layers, and each layer is broken up into modules so that students can build or modify different components in small, assignment-sized portions. Students can replace modules or entire layers by following the DecaFS APIs and recompiling the system. This allows the behavior of the DFS (Distributed File System) to change based on student implementation, while providing base functionality for students to work from. Our implementation includes a code base of core DecaFS Modules that students can work from and basic implementations of non-core DecaFS Modules. Our basic non-core modules can be modified to implement more complex distribution techniques without modifying core modules. We have shown the feasibility of developing a modular DFS, while adhering to requirements such as configurable sizes (file, stripe, chunk) and support of multiple data replication strategies.
APA, Harvard, Vancouver, ISO, and other styles
8

Coffey, Thomas. "A distributed global-wide security system." Thesis, University of Ulster, 1994. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.260989.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Mota, Gilberto Ferreira. "Radar data processing using a distributed computational system." Thesis, Monterey, California. Naval Postgraduate School, 1992. http://hdl.handle.net/10945/24022.

Full text
Abstract:
This research specifies and validates a new concurrent decomposition scheme, called Confined Space Search Decomposition (CSSD), to exploit parallelism of Radar Data Processing algorithms using a Distributed Computational System. To formalize the specification we propose and apply an object-oriented methodology called Decomposition Cost Evaluation Model (DCEM). To reduce the penalties of load imbalance we propose a distributed dynamic load balance heuristic called Object Reincarnation (OR). To validate the research we first compare our decomposition with an identified alternative using the proposed DCEM model and then develop a theoretical prediction of selected parameters. We also develop a simulation to check the Object Reincarnation concept.
APA, Harvard, Vancouver, ISO, and other styles
10

Abdul-Huda, Bilal Anas Hamed. "A system for managing distributed multi-media data." Thesis, University of Ulster, 1989. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.328195.

Full text
APA, Harvard, Vancouver, ISO, and other styles
11

Kolb, John. "Signal Processing About a Distributed Data Acquisition System." International Foundation for Telemetering, 2002. http://hdl.handle.net/10150/605610.

Full text
Abstract:
International Telemetering Conference Proceedings / October 21, 2002 / Town & Country Hotel and Conference Center, San Diego, California
Because modern data acquisition systems use digital backplanes, it is logical for more and more data processing to be done in each Data Acquisition Unit (DAU) or even in each module. The processing related to an analog acquisition module typically takes the form of digital signal conditioning for range adjust, linearization and filtering. Some of the advantages of this are discussed in this paper. The next stage is powerful processing boards within DAUs for data reduction and third-party algorithm development. Once data is being written to and from powerful processing modules an obvious next step is networking and decom-less access to data. This paper discusses some of the issues related to these types of processing.
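As a generic illustration of the digital signal conditioning steps mentioned here (linearization against a calibration table followed by filtering), the sketch below uses invented calibration values and function names; it is not drawn from the paper.

```python
import bisect

# Calibration table mapping raw ADC counts to engineering units (illustrative values).
CAL_POINTS = [(0, 0.0), (1024, 1.2), (2048, 2.7), (3072, 4.6), (4095, 7.0)]


def linearize(raw: int) -> float:
    """Piecewise-linear interpolation over the calibration table."""
    xs = [x for x, _ in CAL_POINTS]
    i = min(max(bisect.bisect_right(xs, raw), 1), len(CAL_POINTS) - 1)
    (x0, y0), (x1, y1) = CAL_POINTS[i - 1], CAL_POINTS[i]
    return y0 + (y1 - y0) * (raw - x0) / (x1 - x0)


def moving_average(samples: list[float], window: int = 4) -> list[float]:
    """Causal moving-average filter (the window grows at the start of the stream)."""
    out = []
    for i in range(len(samples)):
        lo = max(0, i - window + 1)
        out.append(sum(samples[lo:i + 1]) / (i + 1 - lo))
    return out


raw_stream = [100, 900, 1800, 2500, 3300, 4000]
engineering_units = moving_average([linearize(r) for r in raw_stream])
```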
APA, Harvard, Vancouver, ISO, and other styles
12

Jeon, Dae Kyung. "Methodologies for developing distributed systems in Ada with a simulation of a distributed Ada system." Virtual Press, 1989. http://liblink.bsu.edu/uhtbin/catkey/722459.

Full text
Abstract:
In recent years, the field of distributed processing, distributed systems, has undergone great change, and has been an area attracting tremendous research and development efforts. This thesis explores the various current methodologies for designing, developing and implementing distributed systems using the Ada programming language, and goes on to implement a simulation of a distributed store system using the "virtual node" design approach. After a brief introduction on distributed systems in general, an investigation of the basic issues and problems involved in distributing Ada programs coupled with an analysis and comparison of various approaches to developing distributed Ada systems is carried out. It is shown that one of the critical problems of Ada in a distributed environment is its implicit assumption of a single memory processor. A simulation of a distributed system (store system) is carried out using the virtual node method of developing distributed Ada systems. The various stages of this design method, including interface task specification, are stepped through. A sample run of the system is given, including the customer file, stock file data, and the monitored output of the system.
Department of Computer Science
APA, Harvard, Vancouver, ISO, and other styles
13

Pitts, David Vernon. "A storage management system for a reliable distributed operating system." Diss., Georgia Institute of Technology, 1986. http://hdl.handle.net/1853/16895.

Full text
APA, Harvard, Vancouver, ISO, and other styles
14

Andersson, Sara. "Data Processing and Collection in Distributed Systems." Thesis, Luleå tekniska universitet, Institutionen för system- och rymdteknik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-85313.

Full text
Abstract:
Distributed systems can be seen in a variety of applications that are in use today. Tritech provides several systems that to some extent consist of distributed systems of nodes. These nodes collect data, and the data has to be processed. A problem that often appears when designing these systems is deciding where the data should be processed, i.e., which architecture is the most suitable one for the system. Deciding on the architecture for these systems is not simple, especially since it changes rather quickly due to the development in these areas. The thesis aims to perform a study regarding which factors affect the choice of architecture in a distributed system and how these factors relate to each other. To be able to analyze which factors affect the choice of architecture, and to what extent, a simulator was implemented. The simulator received information about the factors as input and returned one or several architecture configurations as output. By performing qualitative interviews, the input factors to the simulator were chosen. The factors that were analyzed in the thesis were: security, storage, working memory, size of data, number of nodes, data processing per data set, robust communication, battery consumption, and cost. From the qualitative interviews as well as from the prestudy, five architecture configurations were chosen. The chosen architectures were: thin-client server, thick-client server, three-tier client-server, peer-to-peer, and cloud computing. The simulator was validated against the three given use cases: agriculture, the train industry, and industrial Internet of Things. The validation consisted of five existing projects from Tritech. From the results of the validation, the simulator produced correct results for three of the five projects. Using the simulator results, it could be seen which factors affect the choice of architecture more than others and which are hard to provide in the same architecture since they are conflicting factors. The conflicting factors were security together with working memory and robust communication. The factor working memory together with battery consumption also proved to be conflicting and hard to provide within the same architecture. Therefore, according to the simulator, the factors that affect the choice of architecture most were working memory, battery consumption, security, and robust communication. Using the results of the simulator, a decision matrix was designed whose purpose was to facilitate the choice of architecture. The evaluation of the decision matrix consisted of four projects from Tritech, including the three given use cases: agriculture, the train industry, and industrial Internet of Things. The evaluation of the decision matrix showed that, of the two architectures that received the most points, one was the architecture used in the validated project.
APA, Harvard, Vancouver, ISO, and other styles
15

Vykunta, Venkateswara Rao. "Class management in a distributed actor system." Master's thesis, This resource online, 1994. http://scholar.lib.vt.edu/theses/available/etd-02022010-020159/.

Full text
APA, Harvard, Vancouver, ISO, and other styles
16

Rao, Ananth K. "The DFS distributed file system : design and implementation." Online version of thesis, 1989. http://hdl.handle.net/1850/10500.

Full text
APA, Harvard, Vancouver, ISO, and other styles
17

Date, Amit Vinayak. "Implementation of distributed database and reliable multicast for Distributed Conferencing System version 2." [Gainesville, Fla.] : University of Florida, 2001. http://purl.fcla.edu/fcla/etd/UFE0000313.

Full text
Abstract:
Thesis (M.S.)--University of Florida, 2001.
Title from title page of source document. Document formatted into pages; contains viii, 57 p.; also contains graphics. Includes vita. Includes bibliographical references.
APA, Harvard, Vancouver, ISO, and other styles
18

Schnell, Felicia. "Multicast Communication for Increased Data Exchange in Data-Intensive Distributed Systems." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-232132.

Full text
Abstract:
Modern applications are required to handle and communicate an increasing amount of data. Meanwhile, distributed systems containing multiple computationally weak components become more common, resulting in a problematic situation. Choosing among communication strategies used for delivering messages between entities therefore becomes crucial in order to efficiently utilize available resources. Systems where identical data is delivered to many recipients are common nowadays, but they may apply an underlying communication strategy based on direct interaction between sender and receiver, which is insufficient. Multicasting refers to a technique for group communication where messages can be distributed to participating nodes in a single transmission. This technique is designed to circumvent the problems of high workload on the sender side and redundant traffic in the network, and it constitutes the focus of this thesis. Within the area of Electronic Warfare and self-protection systems, time constitutes a critical aspect in order to provide relevant information for decision making. Self-protection systems developed by Saab, used in military aircraft, must provide situational awareness to guarantee that correct decisions can be made at the right time. With more advanced systems, where the amount of data needed to be transmitted increases, fast communication is essential to achieve quality of service. This thesis investigates how the deployment of multicast in a distributed data-intensive system could prepare the system for increased data exchange. The result is a communication design which allows the system to distribute messages to a group of receivers with less effort from the sender and with less redundant traffic transferred over the same link. Comparative measurements are conducted between the new implementation and the old system. The result of the evaluation shows that the multicast solution can significantly decrease both the time for message handling and the workload on endpoints.
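For readers unfamiliar with the mechanism, a minimal IP-multicast sender and receiver using Python's standard socket API is sketched below; the group address and port are arbitrary example values, and the code is not part of the thesis implementation.

```python
import socket
import struct

GROUP, PORT = "239.1.2.3", 5000   # administratively scoped multicast group (example values)


def send(message: bytes) -> None:
    """Publish one datagram to every subscribed receiver with a single send call."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)  # stay on the local network
    sock.sendto(message, (GROUP, PORT))
    sock.close()


def receive() -> bytes:
    """Join the multicast group and block until one datagram arrives."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PORT))
    membership = struct.pack("4sl", socket.inet_aton(GROUP), socket.INADDR_ANY)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, membership)
    data, _addr = sock.recvfrom(65535)
    sock.close()
    return data
```

The point of contrast with unicast is that `send` transmits one datagram regardless of how many receivers have joined the group, so sender load and link traffic do not grow with the number of recipients.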
APA, Harvard, Vancouver, ISO, and other styles
19

Pu, Calton. "Replication and nested transactions in the Eden Distributed System /." Thesis, Connect to this title online; UW restricted, 1986. http://hdl.handle.net/1773/6881.

Full text
APA, Harvard, Vancouver, ISO, and other styles
20

Rasmussen, Arthur N. "An Intelligent Manager for a Distributed Telemetry System." International Foundation for Telemetering, 1993. http://hdl.handle.net/10150/608865.

Full text
Abstract:
International Telemetering Conference Proceedings / October 25-28, 1993 / Riviera Hotel and Convention Center, Las Vegas, Nevada
A number of efforts at NASA's Johnson Space Center are exploring ways of improving operational efficiency and effectiveness of telemetry data distribution. An important component of this is the Real-Time Data System project in the Shuttle Mission Control Center. This project's telemetry system is based on a network of engineering workstations that acquire, distribute, analyze, and display the data. Telemetry data is acquired and partially processed through a commercial programmable telemetry processor. The data is then transferred into workstations where the remaining decommutation, conversion and calibration steps are performed. The results are sent over the network to applications operating within end user workstations. This complex distributed environment is managed by PILOT, an intelligent system that monitors data flow and process integrity with the goal of providing a very high level of availability requiring minimal human involvement. PILOT is a rule-based expert system that oversees the operation of the system. It interacts with agents that operate in the local environment of each workstation and advises the local agents of system status and configuration. This enables each local agent to manage its local environment and provides a resource to which it can come with issues that need a global view for resolution. PILOT is implemented using a commercially available real-time expert system shell and operates in a heterogeneous set of hardware platforms.
APA, Harvard, Vancouver, ISO, and other styles
21

Poon, Shuk-yan (潘淑欣). "A decentralized multi-agent system for restructured power system operation." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1997. http://hub.hku.hk/bib/B31219810.

Full text
APA, Harvard, Vancouver, ISO, and other styles
22

Zhou, Dong. "JECho - An efficient, customizable, adaptive distributed event system." Diss., Georgia Institute of Technology, 2002. http://hdl.handle.net/1853/9180.

Full text
APA, Harvard, Vancouver, ISO, and other styles
23

Jayachandran, Prasanth. "A Distributed Interactive Cube Exploration System." The Ohio State University, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=osu1366369292.

Full text
APA, Harvard, Vancouver, ISO, and other styles
24

Ward, Paul. "A Scalable Partial-Order Data Structure for Distributed-System Observation." Thesis, University of Waterloo, 2001. http://hdl.handle.net/10012/1161.

Full text
Abstract:
Distributed-system observation is foundational to understanding and controlling distributed computations. Existing tools for distributed-system observation are constrained in the size of computation that they can observe by three fundamental problems. They lack scalable information collection, scalable data-structures for storing and querying the information collected, and scalable information-abstraction schemes. This dissertation addresses the second of these problems. Two core problems were identified in providing a scalable data structure. First, in spite of the existence of several distributed-system-observation tools, the requirements of such a structure were not well-defined. Rather, current tools appear to be built on the basis of events as the core data structure. Events were assigned logical timestamps, typically Fidge/Mattern, as needed to capture causality. Algorithms then took advantage of additional properties of these timestamps that are not explicit in the formal semantics. This dissertation defines the data-structure interface precisely, and goes some way toward reworking algorithms in terms of that interface. The second problem is providing an efficient, scalable implementation for the defined data structure. The key issue in solving this is to provide a scalable precedence-test operation. Current tools use the Fidge/Mattern timestamp for this. While this provides a constant-time test, it requires space per event equal to the number of processes. As the number of processes increases, the space consumption becomes sufficient to affect the precedence-test time because of caching effects. It also becomes problematic when the timestamps need to be copied between processes or written to a file. Worse, existing theory suggested that the space-consumption requirement of Fidge/Mattern timestamps was optimal. In this dissertation we present two alternate timestamp algorithms that require substantially less space than does the Fidge/Mattern algorithm.
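A minimal sketch of the standard Fidge/Mattern vector-timestamp rules, and of the constant-time precedence test that only inspects the component of the potentially earlier event's process, is given below; it illustrates the baseline the dissertation starts from, not the space-saving timestamp algorithms it proposes.

```python
def tick(clock: list[int], pid: int) -> tuple[int, list[int]]:
    """Record a local event: increment the local component and snapshot the timestamp."""
    clock[pid] += 1
    return pid, clock.copy()


def merge_on_receive(clock: list[int], pid: int, sender_ts: list[int]) -> tuple[int, list[int]]:
    """Record a receive event: component-wise max with the sender's timestamp, then tick."""
    for i, t in enumerate(sender_ts):
        clock[i] = max(clock[i], t)
    return tick(clock, pid)


def happened_before(a: tuple[int, list[int]], b: tuple[int, list[int]]) -> bool:
    """Constant-time Fidge/Mattern test: a -> b iff b's timestamp has seen a's local tick."""
    a_pid, a_ts = a
    _b_pid, b_ts = b
    return a != b and a_ts[a_pid] <= b_ts[a_pid]


# Two processes: p0's send event is causally before p1's receive event.
c0, c1 = [0, 0], [0, 0]
send_event = tick(c0, 0)                              # timestamp [1, 0] on p0
recv_event = merge_on_receive(c1, 1, send_event[1])   # timestamp [1, 1] on p1
assert happened_before(send_event, recv_event)
assert not happened_before(recv_event, send_event)
```

The space problem the dissertation targets is visible here: every event carries a vector whose length equals the number of processes.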
APA, Harvard, Vancouver, ISO, and other styles
25

Lin, Tsai S. (Tsai Shooumeei). "A Highly Fault-Tolerant Distributed Database System with Replicated Data." Thesis, University of North Texas, 1994. https://digital.library.unt.edu/ark:/67531/metadc278403/.

Full text
Abstract:
Because of the high cost and impracticality of a high connectivity network, most recent research in transaction processing has focused on a distributed replicated database system. In such a system, multiple copies of a data item are created and stored at several sites in the network, so that the system is able to tolerate more crash and communication failures and attain higher data availability. However, the multiple copies also introduce a global inconsistency problem, especially in a partitioned network. In this dissertation a tree quorum algorithm is proposed to solve this problem, imposing a logical tree structure along with dynamic system reconfiguration on all the copies of each data item. The proposed algorithm can be viewed as a dynamic voting technique which, with the help of an appropriate concurrency control algorithm, exhibits the major advantages of quorum-based replica control algorithms and of the available copies algorithm, so that a single copy is read for a read operation and a quorum of copies is written for a write operation. In addition, read and write quorums are computed dynamically and independently. As a result, expensive read operations, like those that require several copies of a data item to be read in most quorum schemes, are eliminated. Furthermore, the message costs of read and write operations are reduced by the use of smaller quorum sizes. Quorum sizes can be reduced to a constant in a lightly loaded system, and log n in a failure-free network, as well as ⌈(n+1)/2⌉ in a partitioned network in a heavily loaded system. On average, our algorithm requires fewer messages than the best known tree quorum algorithm, while still maintaining the same upper bound on quorum size. One-copy serializability is guaranteed with higher data availability and highest degree of fault tolerance (up to n - 1 site failures).
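The following sketch shows the classical tree-quorum construction (a logical tree imposed on the copies, where a read can complete at the root alone and failures escalate to majorities of children); it is a simplified illustration of the family of protocols this abstract builds on, not the dissertation's dynamic-reconfiguration algorithm, and the node names are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One replica in the logical tree imposed on the copies of a data item."""
    name: str
    up: bool = True
    children: list["Node"] = field(default_factory=list)


def read_quorum(node: Node) -> set[str] | None:
    """The root alone suffices if it is up; otherwise recurse into a majority of children."""
    if node.up:
        return {node.name}
    return _majority_of_children(node, read_quorum)


def write_quorum(node: Node) -> set[str] | None:
    """A write needs this node plus write quorums from a majority of its children."""
    if not node.up:
        return None
    if not node.children:
        return {node.name}
    sub = _majority_of_children(node, write_quorum)
    return None if sub is None else {node.name} | sub


def _majority_of_children(node: Node, quorum_fn) -> set[str] | None:
    if not node.children:
        return None
    needed = len(node.children) // 2 + 1
    collected: set[str] = set()
    hits = 0
    for child in node.children:
        q = quorum_fn(child)
        if q is not None:
            collected |= q
            hits += 1
            if hits == needed:
                return collected
    return None


# Example: 7 copies in a three-level tree; a root failure still lets reads proceed.
tree = Node("A", up=False, children=[
    Node("B", children=[Node("D"), Node("E")]),
    Node("C", children=[Node("F"), Node("G")]),
])
print(read_quorum(tree))    # {'B', 'C'}: a majority of A's children
print(write_quorum(tree))   # None: writes are blocked while the root is down
```

In the failure-free case a read quorum is a single copy (the root), which is the behaviour the abstract highlights.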
APA, Harvard, Vancouver, ISO, and other styles
26

Spafford, Eugene Howard. "Kernel structures for a distributed operating system." Diss., Georgia Institute of Technology, 1986. http://hdl.handle.net/1853/9144.

Full text
APA, Harvard, Vancouver, ISO, and other styles
27

Gherfal, Fawzi Fathi. "On the reliability of an object based distributed system /." The Ohio State University, 1985. http://rave.ohiolink.edu/etdc/view?acc_num=osu1487260531954481.

Full text
APA, Harvard, Vancouver, ISO, and other styles
28

Mirza, Ahmed Kamal. "Managing high data availability in dynamic distributed derived data management system (D4M) under Churn." Thesis, KTH, Kommunikationssystem, CoS, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-95220.

Full text
Abstract:
The popularity of decentralized systems is increasing day by day. These decentralized systems are preferable to centralized systems for many reasons; specifically, they are more reliable and more resource efficient. Decentralized systems are more effective in the area of information management in the case when the data is distributed across multiple peers and maintained in a synchronized manner. This data synchronization is the main requirement for information management systems deployed in a decentralized environment, especially when data/information is needed for monitoring purposes or some dependent data artifacts rely upon this data. In order to ensure a consistent and cohesive synchronization of dependent/derived data in a decentralized environment, a dependency management system is needed. In a dependency management system, when one chunk of data relies on another piece of data, the resulting derived data artifacts can use a decentralized systems approach but must consider several critical issues, such as how the system behaves if any peer goes down, how the dependent data can be recalculated, and how the data which was stored on a failed peer can be recovered. In case of churn (resulting from failing peers), how does the system adapt the transmission of data artifacts with respect to their access patterns, and how does the system provide consistency management? The major focus of this thesis was to address the churn behavior issues and to suggest and evaluate potential solutions while ensuring a load-balanced network, within the scope of a dependency information management system running in a decentralized network. Additionally, in peer-to-peer (P2P) algorithms, it is a very common assumption that all peers in the network have similar resources and capacities, which is not true in real-world networks. The peers' characteristics can be quite different in actual P2P systems, as the peers may differ in available bandwidth, CPU load, available storage space, stability, etc. As a consequence, peers having low capacities are forced to handle the same computational load which the high-capacity peers handle, resulting in poor overall system performance. In order to handle this situation, the concept of utility-based replication is introduced in this thesis to avoid the assumption of peer equality, enabling efficient operation even in heterogeneous environments where the peers have different configurations. In addition, the proposed protocol assures a load-balanced network while meeting the requirement for high data availability, thus keeping the distributed dependent data consistent and cohesive across the network. Furthermore, an integrated dependency management framework, D4M, was implemented and evaluated in the PeerfactSim.KOM P2P simulator. To benchmark the implementation of the proposed protocol, performance and fairness tests were carried out. A conclusion is that the proposed solution adds little overhead to the management of data availability in a distributed data management system, despite using a heterogeneous P2P environment. Additionally, the results show that various P2P clusters can be introduced in the network based on peers' capabilities.
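A toy illustration of utility-based replica placement on heterogeneous peers is sketched below; the metrics, weights, and normalization are invented for the example and do not reproduce the thesis's protocol.

```python
def peer_utility(peer: dict, weights: dict | None = None) -> float:
    """Score a peer's capacity from its advertised resources (generic illustration)."""
    weights = weights or {"bandwidth_mbps": 0.5, "uptime_ratio": 0.3, "free_gb": 0.2}
    maxima = {"bandwidth_mbps": 100.0, "uptime_ratio": 1.0, "free_gb": 500.0}
    # Normalize each metric to [0, 1] against a rough expected maximum, then weight it.
    return sum(w * min(peer[k] / maxima[k], 1.0) for k, w in weights.items())


def place_replicas(data_id: str, peers: list[dict], replication_factor: int = 3) -> list[str]:
    """Place replicas on the highest-utility peers instead of treating all peers as equal."""
    ranked = sorted(peers, key=peer_utility, reverse=True)
    return [p["id"] for p in ranked[:replication_factor]]


peers = [
    {"id": "p1", "bandwidth_mbps": 80, "uptime_ratio": 0.99, "free_gb": 120},
    {"id": "p2", "bandwidth_mbps": 10, "uptime_ratio": 0.60, "free_gb": 400},
    {"id": "p3", "bandwidth_mbps": 45, "uptime_ratio": 0.95, "free_gb": 60},
    {"id": "p4", "bandwidth_mbps": 95, "uptime_ratio": 0.70, "free_gb": 250},
]
print(place_replicas("artifact-17", peers))   # the highest-utility peers host the replicas
```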
APA, Harvard, Vancouver, ISO, and other styles
29

Silcock, Jackie. "Programmer friendly and efficient distributed shared memory integrated into a distributed operating system." Deakin University. School of Computing and Mathematics, 1998. http://tux.lib.deakin.edu.au./adt-VDU/public/adt-VDU20051114.110240.

Full text
Abstract:
Distributed Shared Memory (DSM) provides programmers with a shared memory environment in systems where memory is not physically shared. Clusters of Workstations (COWs), an often untapped source of computing power, are characterised by a very low cost/performance ratio. The combination of Clusters of Workstations (COWs) with DSM provides an environment in which the programmer can use the well known approaches and methods of programming for physically shared memory systems and parallel processing can be carried out to make full use of the computing power and cost advantages of the COW. The aim of this research is to synthesise and develop a distributed shared memory system as an integral part of an operating system in order to provide application programmers with a convenient environment in which the development and execution of parallel applications can be done easily and efficiently, and which does this in a transparent manner. Furthermore, in order to satisfy our challenging design requirements we want to demonstrate that the operating system into which the DSM system is integrated should be a distributed operating system. In this thesis a study into the synthesis of a DSM system within a microkernel and client-server based distributed operating system which uses both strict and weak consistency models, with a write-invalidate and write-update based approach for consistency maintenance is reported. Furthermore a unique automatic initialisation system which allows the programmer to start the parallel execution of a group of processes with a single library call is reported. The number and location of these processes are determined by the operating system based on system load information. The DSM system proposed has a novel approach in that it provides programmers with a complete programming environment in which they are easily able to develop and run their code or indeed run existing shared memory code. A set of demanding DSM system design requirements are presented and the incentives for the placement of the DSM system within a distributed operating system, and in particular in the memory management server, have been reported. The new DSM system concentrated on an event-driven set of cooperating and distributed entities, and a detailed description of the events and reactions to these events that make up the operation of the DSM system is then presented. This is followed by a pseudocode form of the detailed design of the main modules and activities of the primitives used in the proposed DSM system. Quantitative results of performance tests and qualitative results showing the ease of programming and use of the RHODOS DSM system are reported. A study of five different applications is given and the results of tests carried out on these applications together with a discussion of the results are given. A discussion of how RHODOS' DSM allows programmers to write shared memory code in an easy to use and familiar environment and a comparative evaluation of RHODOS DSM with other DSM systems is presented. In particular, the ease of use and transparency of the DSM system have been demonstrated through the description of the ease with which a moderately inexperienced undergraduate programmer was able to convert, write and run applications for the testing of the DSM system. Furthermore, the description of the tests performed using physically shared memory shows that the latter is indistinguishable from distributed shared memory; this is further evidence that the DSM system is fully transparent.
This study clearly demonstrates that the aim of the research has been achieved; it is possible to develop a programmer friendly and efficient DSM system fully integrated within a distributed operating system. It is clear from this research that client-server and microkernel based distributed operating system integrated DSM makes shared memory operations transparent and almost completely removes the involvement of the programmer beyond classical activities needed to deal with shared memory. The conclusion can be drawn that DSM, when implemented within a client-server and microkernel based distributed operating system, is one of the most encouraging approaches to parallel processing since it guarantees performance improvements with minimal programmer involvement.
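A minimal, generic sketch of the directory bookkeeping behind a write-invalidate DSM protocol (one of the two consistency-maintenance approaches the abstract mentions) is shown below; it is not RHODOS code, and the class and method names are illustrative.

```python
class WriteInvalidateDirectory:
    """Per-page bookkeeping for a write-invalidate DSM protocol (generic sketch)."""

    def __init__(self):
        self.owner: dict[int, str] = {}          # page -> node holding write access
        self.copies: dict[int, set[str]] = {}    # page -> nodes holding read copies

    def read_fault(self, page: int, node: str) -> str | None:
        """Grant a read copy; the current owner (if any) supplies the page contents."""
        self.copies.setdefault(page, set()).add(node)
        return self.owner.get(page)              # node to fetch the page from

    def write_fault(self, page: int, node: str) -> set[str]:
        """Grant exclusive write access and return the set of copies to invalidate."""
        stale = self.copies.get(page, set()) - {node}
        if page in self.owner and self.owner[page] != node:
            stale.add(self.owner[page])
        self.owner[page] = node
        self.copies[page] = {node}
        return stale                              # the DSM sends invalidations to these nodes


# Example: node n1 writes page 7 after n2 and n3 cached it for reading.
directory = WriteInvalidateDirectory()
directory.read_fault(7, "n2")
directory.read_fault(7, "n3")
print(directory.write_fault(7, "n1"))   # {'n2', 'n3'} must drop their copies
```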
APA, Harvard, Vancouver, ISO, and other styles
30

Sannellappanavar, Vijaya Laxmankumar. "Datawarehouse Approach to Decision Support System from Distributed, Heterogeneous Sources." University of Akron / OhioLINK, 2006. http://rave.ohiolink.edu/etdc/view?acc_num=akron1153506475.

Full text
APA, Harvard, Vancouver, ISO, and other styles
31

Tacic, Ivan. "Efficient Synchronized Data Distribution Management in Distributed Simulations." Diss., Georgia Institute of Technology, 2005. http://hdl.handle.net/1853/6822.

Full text
Abstract:
Data distribution management (DDM) is a mechanism to interconnect data producers and data consumers in a distributed application. Data producers provide useful data to consumers in the form of messages. For each message produced, DDM determines the set of data consumers interested in receiving the message and delivers it to those consumers. We are particularly interested in DDM techniques for parallel and distributed discrete event simulations. Thus far, researchers have treated synchronization of events (i.e. time management) and DDM independently of each other. This research focuses on how to realize time-managed DDM mechanisms. The main reason for time-managed DDM is to ensure that changes in the routing of messages from producers to consumers occur in a correct sequence. Time-managed DDM also avoids non-determinism in the federation execution, which may result in non-repeatable executions. An optimistic approach to time-managed DDM is proposed where one allows DDM events to be processed out of time stamp order, but a detection and recovery procedure is used to recover from such errors. These mechanisms are tailored to the semantics of the DDM operations to ensure an efficient realization. A correctness proof is presented to verify the algorithm correctly synchronizes DDM events. We have developed a fully distributed implementation of the algorithm within the framework of the Georgia Tech Federated Simulation Development Kit (FDK) software. A performance evaluation of the synchronized DDM mechanism has been completed in a loosely coupled distributed system consisting of a network of workstations connected over a local area network (LAN). We compare time-managed versus unsynchronized DDM for two applications that exercise different mobility patterns: one based on a military simulation and a second utilizing a synthetic workload. The experiments and analysis illustrate that synchronized DDM performance depends on several factors: the simulation model (e.g., lookahead), the application's mobility patterns, and the network hardware (e.g., size of network buffers). Under certain mobility patterns, time-managed DDM is as efficient as unsynchronized DDM. There are also mobility patterns where time-managed DDM overheads become significant, and we show how they can be reduced.
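For readers unfamiliar with DDM, the sketch below shows the core interest-matching step of a region-based DDM scheme: producers publish update regions, consumers subscribe with regions, and overlap determines delivery. The names and the two-dimensional routing space are illustrative, and the time-management machinery that is the dissertation's actual contribution is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """An axis-aligned extent in routing space (e.g., x/y position bounds)."""
    lo: tuple[float, float]
    hi: tuple[float, float]

    def overlaps(self, other: "Region") -> bool:
        return all(a_lo <= b_hi and b_lo <= a_hi
                   for a_lo, a_hi, b_lo, b_hi in zip(self.lo, self.hi, other.lo, other.hi))


def route(update_region: Region, subscriptions: dict[str, Region]) -> set[str]:
    """Return the consumers whose subscription regions overlap the update region."""
    return {name for name, region in subscriptions.items() if update_region.overlaps(region)}


subscriptions = {
    "radar_display": Region((0.0, 0.0), (50.0, 50.0)),
    "logger":        Region((40.0, 40.0), (100.0, 100.0)),
}
print(route(Region((45.0, 45.0), (47.0, 47.0)), subscriptions))  # both consumers match
```

Time-managed DDM, as studied in the dissertation, additionally requires that changes to `subscriptions` take effect in the correct time-stamp order relative to the messages being routed.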
APA, Harvard, Vancouver, ISO, and other styles
32

Al-Sinayyid, Ali. "Job Scheduling for Streaming Applications in Heterogeneous Distributed Processing Systems." OpenSIUC, 2020. https://opensiuc.lib.siu.edu/dissertations/1868.

Full text
Abstract:
The colossal amounts of data generated daily are increasing exponentially at a never-before-seen pace. A variety of applications—including stock trading, banking systems, health-care, Internet of Things (IoT), and social media networks, among others—have created an unprecedented volume of real-time stream data estimated to reach billions of terabytes in the near future. As a result, we are currently living in the so-called Big Data era and witnessing a transition to the so-called IoT era. Enterprises and organizations are tackling the challenge of interpreting the enormous amount of raw data streams to achieve an improved understanding of data, and thus make efficient and well-informed decisions (i.e., data-driven decisions). Researchers have designed distributed data stream processing systems that can directly process data in near real-time. To extract valuable information from raw data streams, analysts need to create and implement data stream processing applications structured as a directed acyclic graph (DAG). The infrastructure of distributed data stream processing systems, as well as the various requirements of stream applications, impose new challenges. Cluster heterogeneity in a distributed environment results in different cluster resources for task execution and data transmission, which makes optimal scheduling an NP-complete problem. Scheduling streaming applications plays a key role in optimizing system performance, particularly in maximizing the frame-rate, or how many instances of data sets can be processed per unit of time. The scheduling algorithm must consider data locality, resource heterogeneity, and communicational and computational latencies. The latencies associated with the bottleneck from computation or transmission need to be minimized when mapped to the heterogeneous and distributed cluster resources. Recent work on task scheduling for distributed data stream processing systems has a number of limitations. Most of the current schedulers are not designed to manage heterogeneous clusters. They also lack the ability to consider both task and machine characteristics in scheduling decisions. Furthermore, current default schedulers do not allow the user to control data locality aspects in application deployment. In this thesis, we investigate the problem of scheduling streaming applications on a heterogeneous cluster environment and develop the maximum throughput scheduler algorithm (MT-Scheduler) for streaming applications. The proposed algorithm uses a dynamic programming technique to efficiently map the application topology onto a heterogeneous distributed system based on computing and data transfer requirements, while also taking into account the capacity of underlying cluster resources. The proposed approach maximizes the system throughput by identifying and minimizing the time incurred at the computing/transfer bottleneck. The MT-Scheduler supports scheduling applications that are structured as a DAG, such as Amazon Timestream, Google Millwheel, and Twitter Heron. We conducted experiments using three Storm microbenchmark topologies in both simulated and real Apache Storm environments. To evaluate performance, we compared the proposed MT-Scheduler with the simulated round-robin and the default Storm scheduler algorithms. The results indicated that the MT-Scheduler outperforms the default round-robin approach in terms of both average system latency and throughput.
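As a simplified illustration of bottleneck-minimizing placement on a heterogeneous cluster, the sketch below runs a small dynamic program over a linear operator pipeline; the real MT-Scheduler targets general DAG topologies and measured cluster characteristics, so this is only an assumed toy model with invented cost numbers.

```python
def schedule_chain(compute, transfer):
    """
    compute[i][m]: processing time of stage i on machine m.
    transfer[m1][m2]: per-tuple transfer time from machine m1 to machine m2.
    Returns (bottleneck, assignment) minimizing the slowest step of the pipeline.
    """
    n_stages, n_machines = len(compute), len(compute[0])
    # dp[m] = (best bottleneck so far with the previous stage on machine m, assignment)
    dp = {m: (compute[0][m], [m]) for m in range(n_machines)}
    for i in range(1, n_stages):
        new_dp = {}
        for m in range(n_machines):
            best = None
            for prev_m, (bottleneck, placement) in dp.items():
                cost = max(bottleneck, transfer[prev_m][m], compute[i][m])
                if best is None or cost < best[0]:
                    best = (cost, placement + [m])
            new_dp[m] = best
        dp = new_dp
    return min(dp.values(), key=lambda entry: entry[0])


# Example: a 3-stage pipeline on 2 machines with asymmetric speeds and link costs.
compute = [[2.0, 4.0],      # stage 0: faster on machine 0
           [6.0, 3.0],      # stage 1: faster on machine 1
           [2.5, 2.5]]      # stage 2: indifferent
transfer = [[0.0, 1.0],
            [1.0, 0.0]]
print(schedule_chain(compute, transfer))   # bottleneck 3.0, e.g. stages placed on [0, 1, 0]
```

Minimizing the bottleneck step maximizes the sustainable frame-rate of the pipeline, which is the objective the abstract describes.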
APA, Harvard, Vancouver, ISO, and other styles
33

Ekman, Niklas. "Handling Big Data using a Distributed Search Engine: Preparing Log Data for On-Demand Analysis." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-222373.

Full text
Abstract:
Big data refers to datasets that are very large and computationally complex. With an increasing volume of data, even a trivial processing task can become time-consuming. Companies collect data at a fast rate, but knowing what to do with the data can be hard. A search engine is a system that indexes data, making it efficiently queryable by users. When a bug occurs in a computer system, log data is consulted in order to understand why, but processing large amounts of log data can take a long time. The purpose of this thesis is to investigate, compare and implement a distributed search engine that can prepare log data for analysis, which will make it easier for a developer to investigate bugs. There are three popular search engines: Apache Lucene, Elasticsearch and Apache Solr. Elasticsearch and Apache Solr are built as distributed systems, making them capable of handling big data. Requirements were established through interviews. Log data totalling 40 GB was provided to be indexed in the selected search engine. The log data was generated in a proprietary binary format and had to be decoded before indexing. The distributed search engines were evaluated based on distributed architecture, text analysis, indexing and querying. Elasticsearch was selected for implementation. A cluster was set up on Amazon Web Services and tests were executed in order to determine how different configurations performed. Indexing software was written to transfer data to the cluster. The results were verified through a case study with participants from the stakeholder.
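A minimal sketch of pushing decoded log records into Elasticsearch through its standard `_bulk` REST endpoint is given below; the endpoint, index name, and documents are example values, and older Elasticsearch versions may additionally require a document type in the action line.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"        # example endpoint; point this at a real cluster


def bulk_index(index: str, documents: list[dict]) -> dict:
    """Send documents to Elasticsearch's _bulk API as newline-delimited JSON."""
    lines = []
    for doc in documents:
        lines.append(json.dumps({"index": {"_index": index}}))   # action line
        lines.append(json.dumps(doc))                            # source line
    body = ("\n".join(lines) + "\n").encode("utf-8")
    request = urllib.request.Request(
        f"{ES_URL}/_bulk",
        data=body,
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


# Example: index two decoded log records so they become searchable on demand.
logs = [
    {"timestamp": "2017-05-01T12:00:00Z", "level": "ERROR", "message": "disk full"},
    {"timestamp": "2017-05-01T12:00:01Z", "level": "INFO", "message": "retrying write"},
]
result = bulk_index("logs", logs)
print(result.get("errors"))
```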
APA, Harvard, Vancouver, ISO, and other styles
34

Stenkvist, Joel. "S3-HopsFS: A Scalable Cloud-native Distributed File System." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-254664.

Full text
Abstract:
Data has been regarded as the new oil of the modern world. Data is generated everywhere, from how you shop online to where you travel. Companies rely on analyzing this data to make informed business decisions and improve their products and services. However, storing this massive amount of data can be very expensive. Current distributed file systems rely on commodity hardware to provide strongly consistent data storage for big data analytics applications such as Hadoop and Spark. Running these storage clusters can be very costly; it is estimated that storing 100 TB in an HDFS cluster on AWS EC2 costs $47,000 per month. On the other hand, using cloud storage such as Amazon's S3 to store 100 TB only costs about $3,000 per month; however, S3 alone is not sufficient due to eventual consistency and low performance. Therefore, combining these two solutions is optimal for a cheap, consistent and fast file system. This thesis outlines and builds a new class of distributed file system that utilizes cloud-native block storage, such as Amazon's S3, as the data layer. AWS recently increased the bandwidth from S3 to EC2 from 5 Gbps to 25 Gbps, sparking new interest in this area. The new system is built on top of HopsFS, a hierarchical, distributed file system with a scale-out metadata layer utilizing an in-memory, distributed database called NDB, which dramatically increases the scalability of the file system. In combination with native cloud storage, this new file system reduces the price of deployment by up to 15 times, but at 25% of the original HopsFS system's performance (four times slower). However, tests in this research show that S3-HopsFS can be improved towards 38% of the original performance, compared with using S3 by itself. In addition to the new HopsFS version, S3Guard was developed to use NDB instead of Amazon's DynamoDB to store the file tree hierarchy metadata. S3Guard is a tool that allows big data analytics applications such as Hive to utilize S3 as a direct input and output source for queries. The eventual consistency problems of S3 have been solved, and tests show a 36% performance boost when listing and deleting files and directories. S3Guard is sufficient to support some big data analytics applications like Hive, but we lose all the benefits of HopsFS such as performance, scalability and extended metadata; therefore we need a new file system combining both solutions.
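The following sketch illustrates the basic idea of keeping file block contents in S3 while metadata lives elsewhere, using the boto3 client. The bucket name and key scheme are assumptions for illustration only; the actual S3-HopsFS block layer and its NDB-backed metadata are not reproduced here.

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "hopsfs-blocks"   # hypothetical bucket for block data

def put_block(inode_id: int, block_index: int, data: bytes) -> None:
    """Write one block of a file; the key encodes inode and block number."""
    s3.put_object(Bucket=BUCKET,
                  Key=f"{inode_id}/{block_index}",
                  Body=data)

def get_block(inode_id: int, block_index: int) -> bytes:
    """Read one block back from the object store."""
    obj = s3.get_object(Bucket=BUCKET, Key=f"{inode_id}/{block_index}")
    return obj["Body"].read()

if __name__ == "__main__":
    put_block(42, 0, b"hello hopsfs")
    print(get_block(42, 0))
```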
APA, Harvard, Vancouver, ISO, and other styles
35

Rollerberg, Niklas. "A distributed navigation and guidance system for autonomous vessel." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-255011.

Full text
Abstract:
Using distributed microcontrollers in an autonomous vessel can result in higher robustness and lower energy consumption compared to a traditional centralized approach. However, adapting traditional software (e.g. navigation and guidance systems) to a distributed microcontroller system is a complex and challenging task. In this thesis a software solution for controlling autonomous vessels with on-board distributed microcontrollers is presented. It is capable of controlling the vehicle by itself, including routing around obstacles in moderately complex environments, or of acting as an interface to another computer. The routing capabilities come from a high-level path planner based on RRT and a low-level vehicle controller based on potential fields that follows the path from the planner. By using these methods together, the software is capable of maneuvering a sailboat between waypoints. In our experiments, distributed computing is investigated for the path planning and evaluated in terms of computation time for 1, 2 and 3 nodes. A parallelization technique called OR parallelization was tested; it reduced computation time by 27% with two nodes and 36% with three nodes compared to a single node. Nevertheless, this gain may not be significant enough to warrant the extra complexity.
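A minimal sketch of OR parallelization as described above: several independent randomized planner runs are launched and whichever finishes first wins, the others being cancelled. The planning function here is only a placeholder for one RRT attempt, not the thesis's planner, and the worker count is an assumption.

```python
import multiprocessing as mp
import random
import time

def rrt_attempt(seed: int):
    """Stand-in for one randomized RRT run; a real planner would grow a
    tree toward the goal and return the resulting waypoint path."""
    random.seed(seed)
    time.sleep(random.uniform(0.1, 1.0))   # placeholder for planning work
    return seed, ["start", "waypoint", "goal"]

def or_parallel_plan(n_workers: int = 3):
    """OR parallelization: run independent randomized attempts and keep the
    first path that comes back; the remaining workers are terminated when
    the pool context exits."""
    with mp.Pool(n_workers) as pool:
        for result in pool.imap_unordered(rrt_attempt, range(n_workers)):
            return result

if __name__ == "__main__":
    seed, path = or_parallel_plan(3)
    print(f"fastest run used seed {seed}: {path}")
```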
APA, Harvard, Vancouver, ISO, and other styles
36

Söderholm, Matilda, and Lisa Habbe. "Estimating Time to Repair Failures in a Distributed System." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-131847.

Full text
Abstract:
To ensure the quality of important services, high availability is critical. One aspect to be considered in availability is the downtime of the system, which can be measured as the time to recover from failures. In this report we investigate current research on the subject of repair time and the possibility of estimating this metric based on relevant parameters such as hardware, the type of fault and so on. We thoroughly analyze a data set containing 43 000 failure traces from Los Alamos National Laboratory covering 22 different cluster-organized systems. To enable the analysis we create and use a program which parses the raw data, sorts and categorizes it based on certain criteria, and formats the output for visualization. We analyze the data set with respect to type of fault, memory size, processor quantity and the times at which repairs were started and completed. We visualize our findings on the number of failures and average repair times depending on the different parameters. For different faults and times of day we also display the empirical cumulative distribution function to give an overview of the probability of different repair times. The failures are caused by a variety of faults, of which hardware and software faults occur most frequently. These two, along with network faults, have the highest average downtime. The time of failure proves important, since both day of week and hour of day show patterns that can be explained by, for example, work schedules. The hardware characteristics of nodes also seem to affect the repair time, although how this correlation works is difficult to conclude. Based on the extracted data we suggest two simple methods of formulating a mathematical model for estimating downtime, both of which prove insufficient; more research on the subject, and on how the parameters affect each other, is required.
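The kind of grouping and empirical CDF computation described above can be sketched as follows. The file name and column names are assumptions made for the example; the real LANL trace format and the thesis's parsing program are not reproduced here.

```python
import numpy as np
import pandas as pd

# Hypothetical schema; the real LANL failure traces use their own fields.
df = pd.read_csv("failures.csv", parse_dates=["fail_time", "repair_time"])
df["downtime_h"] = (df["repair_time"] - df["fail_time"]).dt.total_seconds() / 3600

# Number of failures and average downtime per fault category.
print(df.groupby("fault_type")["downtime_h"].agg(["count", "mean"]))

def ecdf(samples):
    """Empirical cumulative distribution function of repair times."""
    x = np.sort(np.asarray(samples))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y

x, y = ecdf(df.loc[df["fault_type"] == "hardware", "downtime_h"])
# (x, y) can be plotted to show the probability of repair within a given time.
```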
APA, Harvard, Vancouver, ISO, and other styles
37

Carlo, Gilles. "Dynamic loading and class management in a distributed actor system." Master's thesis, This resource online, 1993. http://scholar.lib.vt.edu/theses/available/etd-04272010-020040/.

Full text
APA, Harvard, Vancouver, ISO, and other styles
38

Alabsi, Mohammed. "A distributed system for integrating and sharing biology data and tools." [Ames, Iowa : Iowa State University], 2007.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
39

Ayoubi, Tarek. "Distributed Data Management Supporting Healthcare Workflow from Patients’ Point of View." Thesis, Blekinge Tekniska Högskola, Avdelningen för för interaktion och systemdesign, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-6030.

Full text
Abstract:
A patient’s mobility throughout his or her lifetime leaves a trail of information scattered across laboratories, clinical institutes, primary care units, and hospitals. Hence, the medical history of a patient is valuable when he or she is referred to specialized healthcare units or undergoes home care or personal care in old age. Despite the rhetoric about patient-centred care, few attempts have been made to measure and improve it. In this thesis, we describe and implement a high-level view of patient-centric information management, deploying, at a preliminary stage, agent technologies and grid computing. We thus develop and propose an infrastructure that allows us to monitor and survey the patient from the doctor’s point of view, and investigate a Persona, on the patient’s side, that functions and collaborates among different medical information structures. The Persona attempts to interconnect all the major agents (human and software) and realize a distributed grid info-structure that directly affects the patient, thereby providing an adequate and cost-effective solution for the most critical information needs. The results of the literature survey, which consolidates healthcare information management with emerging intelligent multi-agent system (MAS) technologies and grid computing, are intended to provide a solid basis for further advancements and assessments in this field by proposing a framework that bridges the home-care sector and a flexible agent architecture throughout the healthcare domain.
APA, Harvard, Vancouver, ISO, and other styles
40

Eckart, J. Dana. "Garbage collection for functional languages in a distributed system." Diss., Georgia Institute of Technology, 1987. http://hdl.handle.net/1853/8159.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Li, Xiaodong. "RDSS ; a reliable and efficient distributed storage system." Ohio University / OhioLINK, 2004. http://www.ohiolink.edu/etd/view.cgi?ohiou1103127547.

Full text
APA, Harvard, Vancouver, ISO, and other styles
42

Förster, Stefan. "A formal framework for modelling component extension and layers in distributed embedded systems /." Dresden : TUDpress, 2007. http://www.loc.gov/catdir/toc/fy0803/2007462554.html.

Full text
APA, Harvard, Vancouver, ISO, and other styles
43

Zhang, Gong. "Data and application migration in cloud based data centers --architectures and techniques." Diss., Georgia Institute of Technology, 2011. http://hdl.handle.net/1853/41078.

Full text
Abstract:
Computing and communication have continued to impact the way we run business, the way we learn, and the way we live. The rapid evolution of computing technology has also expedited the growth of digital data, the workload of services, and the complexity of applications. Today, the cost of managing storage hardware ranges from two to ten times the acquisition cost of the storage hardware. We see an increasing demand for technologies that transfer the management burden from humans to software. Data migration and application migration are popular technologies that enable computing and data storage management to become autonomic and self-managing. In this dissertation, we examine important issues in designing and developing scalable architectures and techniques for efficient and effective data migration and application migration. The first contribution we have made is to investigate the opportunity of automated data migration across multi-tier storage systems. The significant IO improvement of Solid State Disks (SSD) over traditional rotational hard disks (HDD) motivates the integration of SSD into the existing storage hierarchy for enhanced performance. We developed an adaptive look-ahead data migration approach to effectively integrate SSD into the multi-tiered storage architecture. When the fast and expensive SSD tier is used to store high-temperature data (hot data) while the relatively low-temperature data (cold data) is placed in the HDD tier, an important functionality is to manage the migration of data as access patterns change from hot to cold and vice versa. For example, workloads during day time in typical banking applications can be dramatically different from those during night time. We designed and implemented an adaptive look-ahead data migration model. A unique feature of our automated migration approach is its ability to dynamically adapt the data migration schedule to achieve optimal migration effectiveness by taking into account application-specific characteristics and I/O profiles as well as workload deadlines. Our experiments running over a real system trace show that the basic look-ahead data migration model is effective in improving system resource utilization and that the adaptive look-ahead migration model is more efficient for continuously improving and tuning the performance and scalability of multi-tier storage systems. The second main contribution we have made in this dissertation research is to address the challenge of ensuring reliability and balancing loads across a network of computing nodes managed in a decentralized service computing system. Consider providing location-based services for geographically distributed mobile users: the continuous and massive service request workloads pose significant technical challenges for the system to guarantee scalable and reliable service provision. We design and develop a decentralized service computing architecture, called Reliable GeoGrid, with two unique features. First, we develop a distributed workload migration scheme with controlled replication, which utilizes a shortcut-based optimization to increase the resilience of the system against various node failures and network partition failures. Second, we devise a dynamic load balancing technique to scale the system in anticipation of unexpected workload changes.
Our experimental results show that the Reliable GeoGrid architecture is highly scalable under changing service workloads with moving hotspots and highly reliable in the presence of massive node failures. The third research thrust in this dissertation is focused on studying the process of migrating applications from local physical data centers to the cloud. We design migration experiments, study the error types and build an error model. Based on the analysis and observations from the migration experiments, we propose the CloudMig system, which provides both configuration validation and installation automation, effectively reducing configuration errors and installation complexity. In this dissertation, I provide an in-depth discussion of the principles of migration and its applications in improving data storage performance, balancing service workloads and adapting to cloud platforms.
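A toy sketch of temperature-based tier migration is given below: blocks whose access counts cross a threshold are promoted to the SSD tier and the rest are demoted. It is not the adaptive look-ahead model itself, which additionally schedules migrations against I/O profiles and workload deadlines; the class name and threshold are assumptions for illustration.

```python
from collections import Counter

class TieredStore:
    """Toy two-tier store: hot blocks live on SSD, cold blocks on HDD."""

    def __init__(self, hot_threshold: int = 10):
        self.access_counts = Counter()
        self.ssd, self.hdd = set(), set()
        self.hot_threshold = hot_threshold

    def record_access(self, block: str) -> None:
        self.access_counts[block] += 1

    def migrate(self) -> None:
        """Promote blocks whose access count crossed the threshold in the
        last epoch, demote the rest, then reset counters for the next epoch."""
        for block, count in self.access_counts.items():
            if count >= self.hot_threshold:
                self.hdd.discard(block)
                self.ssd.add(block)
            else:
                self.ssd.discard(block)
                self.hdd.add(block)
        self.access_counts.clear()
```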
APA, Harvard, Vancouver, ISO, and other styles
44

Zhang, Junyao. "Researches on reverse lookup problem in distributed file system." Master's thesis, University of Central Florida, 2010. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/4638.

Full text
Abstract:
Recent years have witnessed an increasing demand for super data clusters. Super data clusters have reached petabyte scale and can consist of thousands or tens of thousands of storage nodes at a single site. For this architecture, reliability is becoming a great concern. In order to achieve high reliability, data recovery and node reconstruction are a must. Although extensive research has investigated how to sustain high performance and high reliability in the case of node failures at large scale, the reverse lookup problem, namely finding the list of objects stored on a failed node, remains open. This is especially true for storage systems with high requirements on data integrity and availability, such as scientific research data clusters. Existing solutions are either time-consuming or expensive. Meanwhile, replication-based block placement can be used to realize fast reverse lookup; however, such schemes are designed for centralized, small-scale storage architectures. In this thesis, we propose a fast and efficient reverse lookup scheme named Group-based Shifted Declustering (G-SD) that is able to locate the whole content of a failed node. G-SD extends our previous shifted declustering layout and applies to large-scale file systems. Our mathematical proofs and real-life experiments show that G-SD is a scalable reverse lookup scheme that is up to one order of magnitude faster than existing schemes.
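The idea behind fast reverse lookup can be illustrated with a toy deterministic placement function: when placement is a closed-form mapping, the objects hosted on a failed node can be computed by inverting the function rather than by scanning per-object metadata. The modular layout below is only a demonstration of that principle, not the actual G-SD scheme.

```python
def place(obj_id: int, replica: int, n_nodes: int) -> int:
    """Toy shifted-declustering-style placement: each replica of an object
    is offset by the replica index (not the actual G-SD layout)."""
    return (obj_id + replica) % n_nodes

def reverse_lookup(failed_node: int, n_objects: int,
                   n_replicas: int, n_nodes: int) -> list:
    """Invert the placement function: for each replica index, the object ids
    mapping to the failed node form an arithmetic progression, so the list
    is produced without scanning any per-object metadata."""
    hosted = []
    for r in range(n_replicas):
        first = (failed_node - r) % n_nodes
        hosted.extend((o, r) for o in range(first, n_objects, n_nodes))
    return hosted

if __name__ == "__main__":
    # Objects (and replica indices) to rebuild when node 3 of 8 fails.
    print(reverse_lookup(failed_node=3, n_objects=20, n_replicas=3, n_nodes=8))
```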
APA, Harvard, Vancouver, ISO, and other styles
45

Deorukhkar, Mayuresh. "Deadlock probability prediction and detection in distributed systems /." free to MU campus, to others for purchase, 2004. http://wwwlib.umi.com/cr/mo/fullcit?p1421130.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

Gandhi, Rajiv. "Communication infratructure for a distributed actor system /." This resource online, 1994. http://scholar.lib.vt.edu/theses/available/etd-03302010-020449/.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Vuković, Ognjen. "Data Integrity and Availability in Power System Communication Infrastructures." Licentiate thesis, KTH, Kommunikationsnät, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-122447.

Full text
Abstract:
Society is increasingly dependent on the proper functioning of electric power systems. Today's electric power systems rely heavily on information and networking technology in order to achieve efficient and secure operation. Recent initiatives to upgrade power systems into smart grids target an even tighter integration with information and communication technologies in order to enable the integration of renewable energy sources, local and bulk generation, and demand response. Therefore, for the proper functioning of smart grids, it is essential that the communication network is secure and reliable, both in the face of network failures and in the face of attacks. This thesis contributes to improving the security of power system applications against attacks on the communication infrastructure. The contributions lie in two areas. The first area is the interaction of network and transport layer protocols with power system application layer security. We consider single- and multi-area power system state estimation based on redundant telemetry measurements. The state estimation is a basis for a set of applications used for information support in the control center, and therefore its security is an important concern. For the case of single-area state estimation, we look at the security of measurement aggregation over a wide area communication network. Due to the size and complexity of power systems, it can be prohibitively expensive to introduce cryptographic security in every component of the communication infrastructure. Therefore, we investigate how the application layer logic can be leveraged to optimize the deployment of network, transport and application layer security solutions. We define security metrics that quantify the importance of particular components of the network infrastructure, and we provide efficient algorithms to calculate the metrics and to identify the weakest points in the infrastructure that have to be secured. For the case of multi-area state estimation, we look at the security of data exchange between the control centers of neighboring areas. Although the data exchange is typically cryptographically secure, the communication infrastructure of a control center may be compromised by a targeted trojan that could attack the data before the cryptographic protection is applied or after it is removed. We define multiple attack strategies and show that they can significantly disturb the state estimation. We also show a possible way to detect and mitigate the attack. The second area is a study of communication availability at the application layer. Communication availability in power systems has to be achieved in the case of network failures as well as in the case of attacks. Availability is not necessarily achieved by cryptography, since traffic analysis attacks combined with targeted denial-of-service attacks could significantly disturb the communication. Therefore, we study how anonymity networks can be used to improve availability, which comes at the price of increased communication overhead and delay. Because of the way anonymity networks operate, one would expect that availability would improve with more overhead and delay. We show that, surprisingly, this is not always the case. Moreover, we show that it is better to overestimate than to underestimate the attacker's capabilities when configuring anonymity networks.
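The state estimation referred to above is conventionally formulated as a weighted least-squares fit of the system state to redundant telemetry measurements. The sketch below shows that standard linearized formulation; the measurement matrix, noise covariance and measurement values are made-up illustrative numbers, not a real power network model or the thesis's attack and detection code.

```python
import numpy as np

# Minimal linear weighted least-squares state estimation: z = H x + e.
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, -1.0],
              [2.0, 1.0]])                 # measurement-to-state mapping
R = np.diag([0.01, 0.01, 0.02, 0.02])      # measurement noise covariance
z = np.array([1.02, 0.98, 0.05, 3.01])     # redundant telemetry measurements

W = np.linalg.inv(R)
G = H.T @ W @ H                            # gain matrix
x_hat = np.linalg.solve(G, H.T @ W @ z)    # x_hat = (H^T W H)^-1 H^T W z

residual = z - H @ x_hat                   # basis for bad-data detection
print("estimated state:", x_hat)
print("measurement residuals:", residual)
```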
APA, Harvard, Vancouver, ISO, and other styles
48

Mayott, Stewart W. "Implementation of a module implementor for an activity based distributed system /." Online version of thesis, 1988. http://hdl.handle.net/1850/10223.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

Chen, Changgui. "A Reactive system model for building fault-tolerant distributed applications." Deakin University. School of Computing and Mathematics, 2001. http://tux.lib.deakin.edu.au./adt-VDU/public/adt-VDU20050915.134208.

Full text
Abstract:
The development of fault-tolerant computing systems is a very difficult task. Two reasons contributing to this difficulty can be described as follows. The first is that, in normal practice, fault-tolerant computing policies and mechanisms are deeply embedded in most application programs, so that these application programs cannot cope with changes in environments, policies and mechanisms. These factors may change frequently in a distributed environment, especially in a heterogeneous environment. Therefore, in order to develop better fault-tolerant systems that can cope with constant changes in environments and user requirements, it is essential to separate fault-tolerant computing policies and mechanisms from application programs. The second is that, although a number of techniques have been proposed for the construction of reliable and fault-tolerant computing systems, and many computer systems are being developed to tolerate various hardware and software failures, most of these systems are intended for specific application areas, since it is extremely difficult to develop systems that can be used for general-purpose fault-tolerant computing. The motivation of this thesis is based on these two aspects. The focus of the thesis is on developing a model based on reactive system concepts for building better fault-tolerant computing applications. Reactive system concepts are an attractive paradigm for system design, development and maintenance because they separate policies from mechanisms. The emphasis of the model is on providing a flexible system architecture for general-purpose fault-tolerant application development, and the model can be applied in many specific applications. With this reactive system model, we can separate fault-tolerant computing policies and mechanisms from the applications, so that the development and maintenance of fault-tolerant computing systems can be made easier.
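One way to picture the policy/mechanism separation argued for above is a detection mechanism that raises fault events and interchangeable policy objects that decide how to react, kept outside the application logic. The class and policy names below are illustrative assumptions, not the thesis's reactive system model.

```python
from typing import Callable, Dict, List

class FaultDetector:
    """Mechanism: detects faults and notifies registered policies,
    without knowing how they will react."""
    def __init__(self):
        self._handlers: Dict[str, List[Callable[[str], None]]] = {}

    def subscribe(self, fault_type: str, handler: Callable[[str], None]) -> None:
        self._handlers.setdefault(fault_type, []).append(handler)

    def report(self, fault_type: str, detail: str) -> None:
        for handler in self._handlers.get(fault_type, []):
            handler(detail)

# Policies: interchangeable reactions, swapped without touching the mechanism.
def retry_policy(detail: str) -> None:
    print(f"retrying operation after transient fault: {detail}")

def failover_policy(detail: str) -> None:
    print(f"switching to replica after node fault: {detail}")

detector = FaultDetector()
detector.subscribe("transient", retry_policy)
detector.subscribe("node_crash", failover_policy)
detector.report("node_crash", "worker-7 stopped responding")
```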
APA, Harvard, Vancouver, ISO, and other styles
50

Kordale, Rammohan. "System support for scalable services." Diss., Georgia Institute of Technology, 1997. http://hdl.handle.net/1853/8246.

Full text
APA, Harvard, Vancouver, ISO, and other styles