Dissertations on the topic "Data management and data science"
Browse the top 50 dissertations for research on the topic "Data management and data science".
Yang, Ying. "Interactive Data Management and Data Analysis." Thesis, State University of New York at Buffalo, 2017. http://pqdtopen.proquest.com/#viewpdf?dispub=10288109.
Everyone today has a big data problem. Data is everywhere and in different formats; it may be referred to as data lakes, data streams, or data swamps. To extract knowledge or insights from the data or to support decision-making, we need to go through a process of collecting, cleaning, managing and analyzing the data. In this process, data cleaning and data analysis are two of the most important and time-consuming components.
One common challenge in these two components is a lack of interaction. The data cleaning and data analysis are typically done as a batch process, operating on the whole dataset without any feedback. This leads to long, frustrating delays during which users have no idea if the process is effective. Lacking interaction, human expert effort is needed to make decisions on which algorithms or parameters to use in the systems for these two components.
We should teach computers to talk to humans, not the other way around. This dissertation focuses on building two systems, Mimir and CIA, that help users conduct data cleaning and analysis through interaction. Mimir is a system that allows users to clean big data in a cost- and time-efficient way through interaction, a process I call on-demand ETL. Convergent inference algorithms (CIA) are a family of inference algorithms in probabilistic graphical models (PGM) that enjoy the benefits of both exact and approximate inference algorithms through interaction.
Mimir provides a general language for users to express different data cleaning needs. It acts as a shim layer that wraps around the database, making it possible for the bulk of the ETL process to remain within a classical deterministic system. Mimir also helps users to measure the quality of an analysis result and provides rankings for cleaning tasks to improve the result quality in a cost-efficient manner. CIA focuses on providing user interaction through the process of inference in PGMs. The goal of CIA is to free users from the upfront commitment to either approximate or exact inference, and to give users more control over time/accuracy trade-offs to direct decision-making and computation instance allocations. This dissertation describes the Mimir and CIA frameworks to demonstrate that it is feasible to build efficient interactive data management and data analysis systems.
Dedge, Parks Dana M. "Defining Data Science and Data Scientist." Scholar Commons, 2017. http://scholarcommons.usf.edu/etd/7014.
Wason, Jasmin Lesley. "Automating data management in science and engineering." Thesis, University of Southampton, 2001. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.396143.
Wang, Yi. "Data Management and Data Processing Support on Array-Based Scientific Data." The Ohio State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=osu1436157356.
Anumalla, Kalyani. "DATA PREPROCESSING MANAGEMENT SYSTEM." University of Akron / OhioLINK, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=akron1196650015.
Fernández, Moctezuma Rafael J. "A Data-Descriptive Feedback Framework for Data Stream Management Systems." PDXScholar, 2012. https://pdxscholar.library.pdx.edu/open_access_etds/116.
Nguyen, Benjamin. "Privacy-Centric Data Management." Habilitation à diriger des recherches, Université de Versailles-Saint Quentin en Yvelines, 2013. http://tel.archives-ouvertes.fr/tel-00936130.
Tran, Viet-Trung. "Scalable data-management systems for Big Data." Phd thesis, École normale supérieure de Cachan - ENS Cachan, 2013. http://tel.archives-ouvertes.fr/tel-00920432.
Nyström, Dag. "Data Management in Vehicle Control-Systems." Doctoral thesis, Mälardalen University, Department of Computer Science and Electronics, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-66.
As the complexity of vehicle control-systems increases, the amount of information that these systems are intended to handle also increases. This thesis provides concepts relating to real-time database management systems to be used in such control-systems. By integrating a real-time database management system into a vehicle control-system, data management on a higher level of abstraction can be achieved. Current database management concepts are not sufficient for use in vehicles, and new concepts are necessary. A case-study at Volvo Construction Equipment Components AB in Eskilstuna, Sweden, presented in this thesis, together with a survey of existing database platforms, confirms this. The thesis specifically addresses data access issues by introducing: (i) a data access method, denoted database pointers, which enables data in a real-time database management system to be accessed efficiently. Database pointers, which resemble regular pointer variables, permit individual data elements in the database to be directly pointed out, without risking a violation of the database integrity. (ii) two concurrency-control algorithms, denoted 2V-DBP and 2V-DBP-SNAP, which enable critical (hard real-time) and non-critical (soft real-time) data accesses to co-exist, without blocking of the hard real-time data accesses or risking unnecessary abortions of soft real-time data accesses. The thesis shows that 2V-DBP significantly outperforms a standard real-time concurrency control algorithm both with respect to lower response-times and minimized abortions. (iii) two concepts, denoted substitution and subscription queries, that enable service- and diagnostics-tools to stimulate and monitor a control-system during run-time. The concepts presented in this thesis form a basis on which a data management concept suitable for embedded real-time systems, such as vehicle control-systems, can be built.
A modern vehicle is today essentially controlled entirely by embedded computers. As the functionality of vehicles increases, the software in these computers becomes more and more complex. Complex software is difficult and costly to build. To handle this complexity and ease construction, industry is now investing in methods for building these systems at a higher level of abstraction. These methods aim to structure the software into its functional constituents, for example by using so-called component-based software development. However, these methods are not effective at handling the growing amount of information that accompanies the increasing functionality of the systems. Examples of information to be handled are data from sensors distributed throughout the car (temperatures, pressures, engine speeds, etc.), control data from the driver (e.g. steering-wheel angle and throttle input), parameter data, and log data used for service diagnostics. This information can be classified as safety-critical because it is used to control the behavior of the vehicle. Recently, however, the amount of non-safety-critical information has increased, for example in comfort systems such as multimedia, navigation, and passenger-ergonomics systems.
This thesis aims to show how a data management system for embedded systems, for example vehicle systems, can be constructed. By using a real-time database management system to raise data management to a higher level of abstraction, vehicle systems can handle large amounts of information in a much simpler way than in current systems. Such a data management system gives system architects the ability to structure and model the information in a logical and comprehensible way. The information can then be read and updated through standardized interfaces adapted to different types of functionality. The thesis specifically addresses the problem of how information in the database can, with the help of a concurrency-control algorithm, be shared by both safety-critical and non-safety-critical system functions in the vehicle. It further discusses how information can be distributed both between different computer systems in the vehicle and to diagnostic and service tools that can be connected to the vehicle.
Karras, Panagiotis. "Data structures and algorithms for data representation in constrained environments." Thesis, The University of Hong Kong, 2007. http://sunzi.lib.hku.hk/hkuto/record/B38897647.
Tatarinov, Igor. "Semantic data sharing with a peer data management system." Thesis, University of Washington, 2004. http://hdl.handle.net/1773/6942.
Matus, Castillejos Abel. "Management of Time Series Data." University of Canberra. Information Sciences & Engineering, 2006. http://erl.canberra.edu.au./public/adt-AUC20070111.095300.
Vijayakumar, Nithya Nirmal. "Data management in distributed stream processing systems." [Bloomington, Ind.] : Indiana University, 2007. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:3278228.
Source: Dissertation Abstracts International, Volume: 68-09, Section: B, page: 6093. Adviser: Beth Plale. Title from dissertation home page (viewed May 9, 2008).
Agbaw, Catherine E. (Catherine Ebenye). "Management data collection in a distributed environment." Thesis, McGill University, 1995. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=22713.
An approach for polling based on a variable polling frequency is proposed. A stateful model for a simple version of a CMIP proxy agent for SNMP, which requires management information collected from SNMP agents to be stored in the proxy agent's MIB, is also proposed. The proxy agent is implemented using the OSIMIS-3.0 software package, which implements CMIP, and an existing SNMP application. The proxy agent uses a policy of variable polling frequency based on the cost of polling, the cost of losing relevant management information, and the frequency with which new information is updated. The agent is tested on a distributed network consisting of a LAN at McGill University and another LAN at the University of Montreal.
The results from the test show that using the above model of a proxy agent between CMIP and SNMP yields a better response time than the stateless proxy agent model used by the Network Management Forum (NMF93), and gives a CMIS manager up-to-date information about the network during critical situations.
Zou, Beibei 1974. "Data mining with relational database management systems." Thesis, McGill University, 2005. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=82456.
Ma, Xuesong 1975. "Data mining using relational database management system." Thesis, McGill University, 2005. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=98757.
Tatikonda, Shirish. "Towards Efficient Data Analysis and Management of Semi-structured Data." The Ohio State University, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=osu1275414859.
Kumar, Aman. "Metadata-Driven Management of Scientific Data." The Ohio State University, 2009. http://rave.ohiolink.edu/etdc/view?acc_num=osu1243898671.
Quintero, Michael C. "Constructing a Clinical Research Data Management System." Thesis, University of South Florida, 2018. http://pqdtopen.proquest.com/#viewpdf?dispub=10640886.
Clinical study data is usually collected without knowing in advance what kind of data is going to be collected. In addition, the set of all possible data points that can apply to a patient in any given clinical study is almost always a superset of the data points that are actually recorded for a given patient. As a result, clinical data resembles a set of sparse data with an evolving data schema. To help researchers at the Moffitt Cancer Center better manage clinical data, a tool called GURU was developed that uses the Entity Attribute Value model to handle sparse data and allow users to manage a database entity’s attributes without any changes to the database table definition. The Entity Attribute Value model’s read performance gets faster as the data gets sparser, but it was observed to perform many times worse than a wide table if the attribute count is not sufficiently large. Ultimately, the design trades read performance for flexibility in the data schema.
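To illustrate the Entity Attribute Value idea mentioned in the abstract above, here is a minimal sketch using Python's standard `sqlite3` module. It is not GURU's actual schema, which the abstract does not describe; the table name `eav`, the helper functions, and the patient attributes are all hypothetical. The point it shows is that a new attribute is just a new row, so the table definition never changes.

```python
import sqlite3

# Hypothetical EAV table: one row per (entity, attribute) pair that was
# actually recorded. Attributes that were never recorded take no space.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE eav (
        entity_id INTEGER NOT NULL,
        attribute TEXT    NOT NULL,
        value     TEXT,
        PRIMARY KEY (entity_id, attribute)
    )
    """
)


def record(entity_id, attribute, value):
    """Store or overwrite one attribute of one entity (values kept as text)."""
    conn.execute(
        "INSERT OR REPLACE INTO eav (entity_id, attribute, value) VALUES (?, ?, ?)",
        (entity_id, attribute, str(value)),
    )


def read_entity(entity_id):
    """Reassemble an entity as a dict of the attributes recorded for it."""
    cursor = conn.execute(
        "SELECT attribute, value FROM eav WHERE entity_id = ?", (entity_id,)
    )
    return dict(cursor.fetchall())


# Two patients with different, non-overlapping attribute sets (made-up data).
record(1, "age", 54)
record(1, "tumor_stage", "II")
record(2, "smoker", "yes")

patient_1 = read_entity(1)
patient_2 = read_entity(2)
```

Reading a whole entity means collecting several rows rather than one wide row, which is one plausible source of the read-performance cost the abstract reports against a wide table.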
Busack, Nancy Long. "The intelligent data object and its data base interface." Thesis, Kansas State University, 1985. http://hdl.handle.net/2097/9825.
Ma, Yu. "A composable data management architecture for scientific applications." [Bloomington, Ind.] : Indiana University, 2006. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:3243773.
Title from PDF t.p. (viewed Nov. 18, 2008). Source: Dissertation Abstracts International, Volume: 67-12, Section: B, page: 7170. Adviser: Randall Bramley.
Onolaja, Olufunmilola Oladunni. "Dynamic data-driven framework for reputation management." Thesis, University of Birmingham, 2012. http://etheses.bham.ac.uk//id/eprint/3824/.
Kelley, Ian Robert. "Data management in dynamic distributed computing environments." Thesis, Cardiff University, 2012. http://orca.cf.ac.uk/44477/.
Branco, Miguel. "Distributed data management for large scale applications." Thesis, University of Southampton, 2009. https://eprints.soton.ac.uk/72283/.
Strand, Mattias. "External Data Incorporation into Data Warehouses." Doctoral thesis, Kista : Skövde : Dept. of computer and system sciences, Stockholm University : School of humanities and informatics, University of Skövde, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-660.
Kairouz, Joseph. "Patient data management system medical knowledge-base evaluation." Thesis, McGill University, 1996. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=24060.
Following a literature survey on evaluation techniques and architecture of existing expert systems, an overview of the Patient Data Management System hardware and software components is presented. The design of the Expert Monitoring System is elaborated. Following its installation in the Intensive Care Unit, the performance of the Expert Monitoring System was evaluated on real vital sign data, and corrections were formulated. A progressive evaluation technique, a new methodology for evaluating an expert system knowledge-base, is proposed for subsequent corrections and evaluations of the Expert Monitoring System.
Su, Yu. "Big Data Management Framework based on Virtualization and Bitmap Data Summarization." The Ohio State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=osu1420738636.
Li, Yujiang. "Development architecture for industrial data management." Licentiate thesis, KTH, Datorsystem för konstruktion och tillverkning, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-132244.
Tibbetts, Richard S. (Richard Singleton) 1979. "Linear Road : benchmarking stream-based data management systems." Thesis, Massachusetts Institute of Technology, 2003. http://hdl.handle.net/1721.1/18017.
Includes bibliographical references (p. 57-61).
This thesis describes the design, implementation, and execution of the Linear Road benchmark for stream-based data management systems. The motivation for benchmarking and the selection of the benchmark application are described. Test harness implementation is discussed, as are experiences using the benchmark to evaluate the Aurora engine. Effects of this work on the evolution of the Aurora engine are also discussed. Streams consist of continuous feeds of data from external data sources such as sensor networks or other monitoring systems. Stream data management systems execute continuous and historical queries over these streams, producing query results in real-time. This benchmark provides a means of comparing the functionality and performance of stream-based data management systems relative to each other and to relational systems. The benchmark presented is motivated by the increasing prevalence of "variable tolling" on highway systems throughout the world. Variable tolling uses dynamically determined factors such as congestion levels and accident proximity to calculate tolls. Linear Road specifies a variable tolling system for a fictional urban area, including such features as accident detection and alerts, traffic congestion measurements, toll calculations, and ad hoc requests for travel time predictions and account balances. This benchmark has already been adopted in the Aurora [ACC⁺03] and STREAM [MWA⁺03] streaming data management systems.
by Richard S. Tibbetts, III.
M.Eng.
Yip, Alexander Siumann 1979. "Improving web site security with data flow management." Thesis, Massachusetts Institute of Technology, 2009. http://hdl.handle.net/1721.1/54647.
Cataloged from PDF version of thesis.
Includes bibliographical references (p. 91-98).
This dissertation describes two systems, RESIN and BFlow, whose goal is to help Web developers build more secure Web sites. RESIN and BFlow use data flow management to help reduce the security risks of using buggy or malicious code. RESIN provides programmers with language-level mechanisms to track and manage the flow of data within the server. These mechanisms make it easy for programmers to catch server-side data flow bugs that result in security vulnerabilities, and prevent these bugs from being exploited. BFlow is a system that adds information flow control, a restrictive form of data flow management, both to the Web browser and to the interface between a browser and a server. BFlow makes it possible for a Web site to combine confidential data with untrusted JavaScript in its Web pages, without risking leaks of that data. This work makes a number of contributions. RESIN introduces the idea of a data flow assertion and demonstrates how to build such assertions using three language-level mechanisms: policy objects, data tracking, and filter objects. We built prototype implementations of RESIN in both the PHP and Python runtimes. We adapt seven real off-the-shelf applications and implement 11 different security policies in RESIN which thwart at least 27 real security vulnerabilities. BFlow introduces an information flow control model that fits the JavaScript communication mechanisms, and a system that maps that model to JavaScript's existing isolation system.
Together, these techniques allow untrusted JavaScript to read, compute with, and display confidential data without the risk of leaking that data, yet require only minor changes to existing software. We built a prototype of the BFlow system and three different applications including a social networking application, a novel shared-data Web platform, and BFlogger, a third-party JavaScript platform similar to that of Blogger.com. We ported several untrusted JavaScript extensions from Blogger.com to BFlogger, and show that the extensions cannot leak data as they can in Blogger.com.
by Alexander Siumann Yip.
Ph.D.
Johnston, Steven. "Encouraging collaboration through a new data management approach." Thesis, University of Southampton, 2006. https://eprints.soton.ac.uk/65549/.
Roger, Kathleen Mary Louise. "A nursing workload manager for a patient data management system." Thesis, McGill University, 1992. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=61047.
Vellanki, Vivekanand. "Extending caching for two applications : disseminating live data and accessing data from disks." Diss., Georgia Institute of Technology, 2001. http://hdl.handle.net/1853/9243.
Lee, Jong Sik. "Space-based data management for high-performance distributed simulation." Diss., The University of Arizona, 2001. http://hdl.handle.net/10150/279803.
Lofstead, Gerald Fredrick. "Extreme scale data management in high performance computing." Diss., Georgia Institute of Technology, 2010. http://hdl.handle.net/1853/37232.
Weigel, Tobias. "Persistent Identifiers for Earth Science Data Management." Advisor: Thomas Ludwig. Hamburg : Staats- und Universitätsbibliothek Hamburg, 2016. http://d-nb.info/1097561712/34.
Rosenfeld, Abraham M. "Data collection and management of a mobile sensor platform." Thesis, Massachusetts Institute of Technology, 2013. http://hdl.handle.net/1721.1/85486.
Cataloged from PDF version of thesis.
Includes bibliographical references (page 53).
This thesis explores the development of a platform to better collect and manage data from multiple sensor inputs mounted on a car sensor platform. Specifically, it focuses on the collection and synchronization of multiple forms of data across a single mobile sensor system. The project will be implemented for three versions of a light-sensing platform, and will cover the different methods of data collection and different types of sensor devices implemented in each version. It will also cover the different technical challenges faced when collecting and managing data across multiple mobile sensors.
by Abraham M. Rosenfeld.
M.Eng.
Lisanskiy, Ilya 1976. "A data model for the Haystack document management system." Thesis, Massachusetts Institute of Technology, 1999. http://hdl.handle.net/1721.1/80103.
Includes bibliographical references (p. 97-98).
by Ilya Lisanskiy.
S.B. and M.Eng.
Lu, Kaiyuan. "Data distribution management schemes for HLA-compliant distributed simulation systems." Thesis, University of Ottawa (Canada), 2006. http://hdl.handle.net/10393/27151.
Fumai, Nicola. "A database for an intensive care unit patient data management system." Thesis, McGill University, 1992. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=22500.
Computers can help by processing the data and displaying the information in easy to understand formats. Also, knowledge-based systems can provide advice in diagnosis and treatment of patients. If these systems are to be effective, they must be integrated into the total hospital information system and the separate computer data must be jointly integrated into a new database which will become the primary medical record.
This thesis presents the design and implementation of a computerized database for an intensive care unit patient data management system being developed for the Montreal Children's Hospital. The database integrates data from the various PDMS components into one logical information store. The patient data currently managed includes physiological parameter data, patient administrative data and fluid balance data.
A simulator design is also described, which allows for thorough validation and verification of the Patient Data Management System. This simulator can easily be extended for use as a teaching and training tool for PDMS users.
The database and simulator were developed in C and implemented under the OS/2 operating system environment. The database is based on the OS/2 Extended Edition relational Database Manager.
Yang, Haofan. "Reputation modelling in citizen science for environmental acoustic data analysis." Thesis, Queensland University of Technology, 2012. https://eprints.qut.edu.au/54657/1/Haofan_Yang_Thesis.pdf.
Повний текст джерелаWang, Yanchao. "Protein Structure Data Management System." Digital Archive @ GSU, 2007. http://digitalarchive.gsu.edu/cs_diss/20.
Nowak, Hans, II (Hans Antoon). "Strategic capacity planning using data science, optimization, and machine learning." Thesis, Massachusetts Institute of Technology, 2020. https://hdl.handle.net/1721.1/126914.
Thesis: S.M., Massachusetts Institute of Technology, Department of Mechanical Engineering, in conjunction with the Leaders for Global Operations Program at MIT, May, 2020
Cataloged from the official PDF of thesis.
Includes bibliographical references (pages 101-104).
Raytheon's Circuit Card Assembly (CCA) factory in Andover, MA is Raytheon's largest factory and the largest Department of Defense (DOD) CCA manufacturer in the world. With over 500 operations, it manufactures over 7000 unique parts with a high degree of complexity and varying levels of demand. Recently, the factory has seen an increase in demand, making the ability to continuously analyze factory capacity and strategically plan for future operations much needed. This study seeks to develop a sustainable strategic capacity optimization model and capacity visualization tool that integrates demand data with historical manufacturing data. Through automated data mining algorithms of factory data sources, capacity utilization and overall equipment effectiveness (OEE) for factory operations are evaluated. Machine learning methods are then assessed to gain an accurate estimate of cycle time (CT) throughout the factory. Finally, a mixed-integer nonlinear program (MINLP) integrates the capacity utilization framework and machine learning predictions to compute the optimal strategic capacity planning decisions. Capacity utilization and OEE models are shown to be able to be generated through automated data mining algorithms. Machine learning models are shown to have a mean average error (MAE) of 1.55 on predictions for new data, which is 76.3% lower than the current CT prediction error. Finally, the MINLP is solved to optimality within a tolerance of 1.00e-04 and generates resource and production decisions that can be acted upon.
by Hans Nowak II.
M.B.A. Massachusetts Institute of Technology, Sloan School of Management
S.M. Massachusetts Institute of Technology, Department of Mechanical Engineering
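As a reading aid for the figures quoted in the Nowak abstract above, the following is a hedged back-of-the-envelope check, not a result from the thesis. It assumes that MAE denotes the usual mean absolute error (the abstract spells it "mean average error") and that "76.3% lower" is measured relative to the current cycle-time prediction error. The symbols $y_i$, $\hat{y}_i$, $n$ and $E_{\text{current}}$ are introduced here for illustration only.

```latex
% Mean absolute error over n predictions (assumed definition)
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|

% If the reported MAE of 1.55 is 76.3% lower than the current error:
1.55 = (1 - 0.763)\, E_{\text{current}}
\quad\Longrightarrow\quad
E_{\text{current}} = \frac{1.55}{0.237} \approx 6.54
```

Under these assumptions the current prediction error would be roughly 6.5 in the same units as the reported MAE; the abstract does not state those units.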
Ahmad, Yasmeen. "Management, visualisation & mining of quantitative proteomics data." Thesis, University of Dundee, 2012. https://discovery.dundee.ac.uk/en/studentTheses/6ed071fc-e43b-410c-898d-50529dc298ce.
Ahmad, Yasmeen. "Management, visualisation & mining of quantitative proteomics data." Thesis, University of Dundee, 2012. https://discovery.dundee.ac.uk/en/studentTheses/6ed071fc-e43b-410c-898d-50529dc298ce.
Sridharan, Vaikunth. "Sensor Data Streams Correlation Platform for Asthma Management." Wright State University / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=wright1527546937956439.
Ousterhout, Amy (Amy Elizabeth). "Flexplane : a programmable data plane for resource management in datacenters." Thesis, Massachusetts Institute of Technology, 2015. http://hdl.handle.net/1721.1/101584.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 47-51).
Network resource management schemes can significantly improve the performance of datacenter applications. However, it is difficult to experiment with and evaluate these schemes today because they require modifications to hardware routers. To address this we introduce Flexplane, a programmable network data plane for datacenters. Flexplane enables users to express their schemes in a high-level language (C++) and then run real datacenter applications over them at hardware rates. We demonstrate that Flexplane can accurately reproduce the behavior of schemes already supported in hardware (e.g. RED, DCTCP) and can be used to experiment with new schemes not yet supported in hardware, such as HULL. We also show that Flexplane is scalable and has the potential to support large networks.
by Amy Ousterhout.
S.M.
Cates, Josh 1977. "Robust and efficient data management for a distributed hash table." Thesis, Massachusetts Institute of Technology, 2003. http://hdl.handle.net/1721.1/87381.
Cates, Josh 1977. "Robust and efficient data management for a distributed hash table." Thesis, Massachusetts Institute of Technology, 2003. http://hdl.handle.net/1721.1/87381.
Tsai, Eva Y. (Eva Yi-hua). "Inter-database data quality management : a relational-model based approach." Thesis, Massachusetts Institute of Technology, 1996. http://hdl.handle.net/1721.1/40202.
Mukkara, Anurag. "Techniques to improve dynamic cache management with static data classification." Thesis, Massachusetts Institute of Technology, 2016. http://hdl.handle.net/1721.1/105962.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 55-59).
Cache hierarchies are increasingly non-uniform and difficult to manage. Several techniques, such as scratchpads or reuse hints, use static information about how programs access data to manage the memory hierarchy. Static techniques are effective on regular programs, but because they set fixed policies, they are vulnerable to changes in program behavior or available cache space. Instead, most systems rely on dynamic caching policies that adapt to observed program behavior. Unfortunately, dynamic policies spend significant resources trying to learn how programs use memory, and yet they often perform worse than a static policy. This thesis presents Whirlpool, a novel approach that combines static information with dynamic policies to reap the benefits of each. Whirlpool statically classifies data into pools based on how the program uses memory. Whirlpool then uses dynamic policies to tune the cache to each pool. Hence, rather than setting policies statically, Whirlpool uses static analysis to guide dynamic policies. Whirlpool provides both an API that lets programmers specify pools manually and a profiling tool that discovers pools automatically in unmodified binaries. On a state-of-the-art NUCA cache, Whirlpool significantly outperforms prior approaches: on sequential programs, Whirlpool improves performance by up to 38% and reduces data movement energy by up to 53%; on parallel programs, Whirlpool improves performance by up to 67% and reduces data movement energy by up to 2.6x.
by Anurag Mukkara.
S.M.