Dissertations / Theses on the topic 'Data management and data science'
Consult the top 50 dissertations / theses for your research on the topic 'Data management and data science.'
Yang, Ying. "Interactive Data Management and Data Analysis." Thesis, State University of New York at Buffalo, 2017. http://pqdtopen.proquest.com/#viewpdf?dispub=10288109.
Everyone today has a big data problem. Data is everywhere and comes in different formats, variously referred to as data lakes, data streams, or data swamps. To extract knowledge or insights from the data or to support decision-making, we need to go through a process of collecting, cleaning, managing and analyzing the data. In this process, data cleaning and data analysis are two of the most important and time-consuming components.
One common challenge in these two components is a lack of interaction. Data cleaning and data analysis are typically done as a batch process, operating on the whole dataset without any feedback. This leads to long, frustrating delays during which users have no idea whether the process is effective. Without interaction, expert human effort is needed to decide which algorithms or parameters to use in the systems for these two components.
We should teach computers to talk to humans, not the other way around. This dissertation focuses on building two systems, Mimir and CIA, that help users conduct data cleaning and analysis through interaction. Mimir is a system that allows users to clean big data in a cost- and time-efficient way through interaction, a process I call on-demand ETL. Convergent inference algorithms (CIA) are a family of inference algorithms in probabilistic graphical models (PGM) that enjoy the benefits of both exact and approximate inference algorithms through interaction.
Mimir provides a general language for users to express different data cleaning needs. It acts as a shim layer that wraps around the database, making it possible for the bulk of the ETL process to remain within a classical deterministic system. Mimir also helps users to measure the quality of an analysis result and provides rankings for cleaning tasks to improve the result quality in a cost-efficient manner. CIA focuses on providing user interaction through the process of inference in PGMs. The goal of CIA is to free users from the upfront commitment to either approximate or exact inference, and to give users more control over time/accuracy trade-offs to direct decision-making and computation instance allocations. This dissertation describes the Mimir and CIA frameworks to demonstrate that it is feasible to build efficient interactive data management and data analysis systems.
Dedge, Parks Dana M. "Defining Data Science and Data Scientist." Scholar Commons, 2017. http://scholarcommons.usf.edu/etd/7014.
Wason, Jasmin Lesley. "Automating data management in science and engineering." Thesis, University of Southampton, 2001. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.396143.
Wang, Yi. "Data Management and Data Processing Support on Array-Based Scientific Data." The Ohio State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=osu1436157356.
Anumalla, Kalyani. "DATA PREPROCESSING MANAGEMENT SYSTEM." University of Akron / OhioLINK, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=akron1196650015.
Fernández, Moctezuma Rafael J. "A Data-Descriptive Feedback Framework for Data Stream Management Systems." PDXScholar, 2012. https://pdxscholar.library.pdx.edu/open_access_etds/116.
Nguyen, Benjamin. "Privacy-Centric Data Management." Habilitation à diriger des recherches, Université de Versailles-Saint Quentin en Yvelines, 2013. http://tel.archives-ouvertes.fr/tel-00936130.
Tran, Viet-Trung. "Scalable data-management systems for Big Data." PhD thesis, École normale supérieure de Cachan - ENS Cachan, 2013. http://tel.archives-ouvertes.fr/tel-00920432.
Nyström, Dag. "Data Management in Vehicle Control-Systems." Doctoral thesis, Mälardalen University, Department of Computer Science and Electronics, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-66.
As the complexity of vehicle control-systems increases, the amount of information that these systems are intended to handle also increases. This thesis provides concepts relating to real-time database management systems to be used in such control-systems. By integrating a real-time database management system into a vehicle control-system, data management on a higher level of abstraction can be achieved. Current database management concepts are not sufficient for use in vehicles, and new concepts are necessary. A case study at Volvo Construction Equipment Components AB in Eskilstuna, Sweden, presented in this thesis, together with a survey of existing database platforms, confirms this. The thesis specifically addresses data access issues by introducing: (i) a data access method, denoted database pointers, which enables data in a real-time database management system to be accessed efficiently. Database pointers, which resemble regular pointer variables, permit individual data elements in the database to be directly pointed out, without risking a violation of the database integrity. (ii) two concurrency-control algorithms, denoted 2V-DBP and 2V-DBP-SNAP, which enable critical (hard real-time) and non-critical (soft real-time) data accesses to co-exist, without blocking of the hard real-time data accesses or risking unnecessary abortions of soft real-time data accesses. The thesis shows that 2V-DBP significantly outperforms a standard real-time concurrency control algorithm both with respect to lower response-times and minimized abortions. (iii) two concepts, denoted substitution and subscription queries, that enable service- and diagnostics-tools to stimulate and monitor a control-system during run-time. The concepts presented in this thesis form a basis on which a data management concept suitable for embedded real-time systems, such as vehicle control-systems, can be built.
A modern vehicle is today, in principle, controlled entirely by embedded computers. As the functionality of vehicles grows, the software in these computers becomes more and more complex. Complex software is difficult and costly to build. To handle this complexity and ease construction, industry is now investing in finding methods for building these systems at a higher level of abstraction. These methods aim to structure the software into its different functional constituents, for example by using so-called component-based software development. However, these methods are not effective at handling the growing amount of information that accompanies the increasing functionality of the systems. Examples of information to be handled are data from sensors spread throughout the car (temperatures, pressures, engine speeds, etc.), control data from the driver (e.g. steering-wheel angle and throttle input), parameter data, and log data used for service diagnostics. This information can be classified as safety-critical because it is used to control the behaviour of the vehicle. Recently, however, the amount of non-safety-critical information has increased, for example in comfort systems such as multimedia, navigation and passenger-ergonomics systems.
This thesis aims to show how a data management system for embedded systems, such as vehicle systems, can be constructed. By using a real-time database management system to raise data management to a higher level of abstraction, vehicle systems can be allowed to handle large amounts of information in a much simpler way than in current systems. Such a data management system gives system architects the ability to structure and model the information in a logical and comprehensible way. The information can then be read and updated through standardized interfaces adapted to different types of functionality. The thesis specifically addresses the problem of how information in the database can, with the help of a concurrency-control algorithm, be shared by both safety-critical and non-safety-critical system functions in the vehicle. It further discusses how information can be distributed both between different computer systems in the vehicle and to diagnostic and service tools that can be connected to the vehicle.
Karras, Panagiotis. "Data structures and algorithms for data representation in constrained environments." Thesis, The University of Hong Kong, 2007. http://sunzi.lib.hku.hk/hkuto/record/B38897647.
Tatarinov, Igor. "Semantic data sharing with a peer data management system." Thesis, University of Washington, 2004. http://hdl.handle.net/1773/6942.
Matus, Castillejos Abel. "Management of Time Series Data." University of Canberra. Information Sciences & Engineering, 2006. http://erl.canberra.edu.au./public/adt-AUC20070111.095300.
Vijayakumar, Nithya Nirmal. "Data management in distributed stream processing systems." [Bloomington, Ind.] : Indiana University, 2007. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:3278228.
Source: Dissertation Abstracts International, Volume: 68-09, Section: B, page: 6093. Adviser: Beth Plale. Title from dissertation home page (viewed May 9, 2008).
Agbaw, Catherine E. (Catherine Ebenye). "Management data collection in a distributed environment." Thesis, McGill University, 1995. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=22713.
An approach for polling based on a variable polling frequency is proposed. A stateful model for a simple version of a CMIP proxy agent for SNMP, which requires management information collected from SNMP agents to be stored in the proxy agent's MIB, is also proposed. The proxy agent is implemented using the so-called OSIMIS-3.0 software package which implements CMIP, and an existing SNMP application. A policy of variable polling frequency which is based on the cost of polling, the cost of loss of relevant management information and the frequency of update of new information is used by the proxy agent. The agent is tested on a distributed network consisting of a LAN at McGill University and another LAN at the University of Montreal.
The results from the test show that using the above model of a proxy agent between CMIP and SNMP yields a better response time than the stateless proxy agent model used by the Network Management Forum (NMF93), and provides up-to-date information about the network to a CMIS manager during critical situations.
Zou, Beibei 1974. "Data mining with relational database management systems." Thesis, McGill University, 2005. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=82456.
Ma, Xuesong 1975. "Data mining using relational database management system." Thesis, McGill University, 2005. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=98757.
Tatikonda, Shirish. "Towards Efficient Data Analysis and Management of Semi-structured Data." The Ohio State University, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=osu1275414859.
Kumar, Aman. "Metadata-Driven Management of Scientific Data." The Ohio State University, 2009. http://rave.ohiolink.edu/etdc/view?acc_num=osu1243898671.
Quintero, Michael C. "Constructing a Clinical Research Data Management System." Thesis, University of South Florida, 2018. http://pqdtopen.proquest.com/#viewpdf?dispub=10640886.
Clinical study data is usually collected without knowing in advance what kind of data is going to be collected. In addition, the set of all possible data points that can apply to a patient in any given clinical study is almost always a superset of the data points that are actually recorded for a given patient. As a result, clinical data resembles a set of sparse data with an evolving data schema. To help researchers at the Moffitt Cancer Center better manage clinical data, a tool called GURU was developed that uses the Entity Attribute Value model to handle sparse data and allow users to manage a database entity’s attributes without any changes to the database table definition. The Entity Attribute Value model’s read performance gets faster as the data gets sparser, but it was observed to perform many times worse than a wide table if the attribute count is not sufficiently large. Ultimately, the design trades read performance for flexibility in the data schema.
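The Entity Attribute Value idea in the abstract above can be illustrated with a minimal sketch. This is not GURU's schema or code, which the abstract does not show; the table name `eav`, the helper functions, and the sample attributes are all assumptions made for illustration, using Python's standard `sqlite3` module.

```python
import sqlite3

def make_store():
    # One narrow table holds every attribute of every entity (hypothetical schema).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE eav ("
        " entity_id INTEGER NOT NULL,"
        " attribute TEXT NOT NULL,"
        " value TEXT,"
        " PRIMARY KEY (entity_id, attribute))"
    )
    return conn

def set_attribute(conn, entity_id, attribute, value):
    # Adding a brand-new attribute is just another row; no ALTER TABLE is needed.
    conn.execute(
        "INSERT OR REPLACE INTO eav (entity_id, attribute, value) VALUES (?, ?, ?)",
        (entity_id, attribute, str(value)),
    )

def get_entity(conn, entity_id):
    # Reading one entity means gathering one row per recorded attribute.
    rows = conn.execute(
        "SELECT attribute, value FROM eav WHERE entity_id = ?", (entity_id,)
    ).fetchall()
    return dict(rows)

conn = make_store()
set_attribute(conn, 1, "age", 54)
set_attribute(conn, 1, "diagnosis", "melanoma")
set_attribute(conn, 2, "age", 61)
set_attribute(conn, 2, "tumor_size_mm", 12)  # attribute recorded only for patient 2
```

Only recorded values occupy rows, so sparse data costs little storage, while reassembling a full record requires collecting many rows. That is one plausible reading of the flexibility-versus-read-performance trade-off the abstract reports.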
Busack, Nancy Long. "The intelligent data object and its data base interface." Thesis, Kansas State University, 1985. http://hdl.handle.net/2097/9825.
Ma, Yu. "A composable data management architecture for scientific applications." [Bloomington, Ind.] : Indiana University, 2006. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:3243773.
Title from PDF t.p. (viewed Nov. 18, 2008). Source: Dissertation Abstracts International, Volume: 67-12, Section: B, page: 7170. Adviser: Randall Bramley.
Onolaja, Olufunmilola Oladunni. "Dynamic data-driven framework for reputation management." Thesis, University of Birmingham, 2012. http://etheses.bham.ac.uk//id/eprint/3824/.
Kelley, Ian Robert. "Data management in dynamic distributed computing environments." Thesis, Cardiff University, 2012. http://orca.cf.ac.uk/44477/.
Branco, Miguel. "Distributed data management for large scale applications." Thesis, University of Southampton, 2009. https://eprints.soton.ac.uk/72283/.
Strand, Mattias. "External Data Incorporation into Data Warehouses." Doctoral thesis, Kista : Skövde : Dept. of computer and system sciences, Stockholm University : School of humanities and informatics, University of Skövde, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-660.
Kairouz, Joseph. "Patient data management system medical knowledge-base evaluation." Thesis, McGill University, 1996. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=24060.
Following a literature survey on evaluation techniques and the architecture of existing expert systems, an overview of the Patient Data Management System hardware and software components is presented. The design of the Expert Monitoring System is elaborated. Following its installation in the Intensive Care Unit, the performance of the Expert Monitoring System is evaluated on real vital sign data, and corrections are formulated. A progressive evaluation technique, a new methodology for evaluating an expert system knowledge-base, is proposed for subsequent corrections and evaluations of the Expert Monitoring System.
Su, Yu. "Big Data Management Framework based on Virtualization and Bitmap Data Summarization." The Ohio State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=osu1420738636.
Li, Yujiang. "Development architecture for industrial data management." Licentiate thesis, KTH, Datorsystem för konstruktion och tillverkning, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-132244.
Tibbetts, Richard S. (Richard Singleton) 1979. "Linear Road : benchmarking stream-based data management systems." Thesis, Massachusetts Institute of Technology, 2003. http://hdl.handle.net/1721.1/18017.
Includes bibliographical references (p. 57-61).
This thesis describes the design, implementation, and execution of the Linear Road benchmark for stream-based data management systems. The motivation for benchmarking and the selection of the benchmark application are described. Test harness implementation is discussed, as are experiences using the benchmark to evaluate the Aurora engine. Effects of this work on the evolution of the Aurora engine are also discussed. Streams consist of continuous feeds of data from external data sources such as sensor networks or other monitoring systems. Stream data management systems execute continuous and historical queries over these streams, producing query results in real-time. This benchmark provides a means of comparing the functionality and performance of stream-based data management systems relative to each other and to relational systems. The benchmark presented is motivated by the increasing prevalence of "variable tolling" on highway systems throughout the world. Variable tolling uses dynamically determined factors such as congestion levels and accident proximity to calculate tolls. Linear Road specifies a variable tolling system for a fictional urban area, including such features as accident detection and alerts, traffic congestion measurements, toll calculations, and ad hoc requests for travel time predictions and account balances. This benchmark has already been adopted in the Aurora [ACC⁺03] and STREAM [MWA⁺03] streaming data management systems.
by Richard S. Tibbetts, III.
M.Eng.
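The variable tolling idea described in the Linear Road abstract above (tolls driven by congestion levels and accident proximity) can be sketched as follows. This is a hypothetical illustration, not the benchmark's specification: the `SegmentStats` type, the thresholds, and the quadratic rate are placeholder assumptions chosen only to show the shape of such a rule.

```python
from dataclasses import dataclass

@dataclass
class SegmentStats:
    avg_speed_mph: float    # recent average speed on the road segment
    vehicle_count: int      # vehicles observed on the segment
    accident_nearby: bool   # whether an accident was detected close by

# Placeholder constants (assumptions for illustration only).
SPEED_THRESHOLD_MPH = 40.0
FREE_VEHICLES = 50
RATE = 2

def compute_toll(stats: SegmentStats) -> int:
    """Charge a toll only when the segment is congested and accident-free."""
    if stats.accident_nearby:
        return 0  # no toll near an accident
    if stats.avg_speed_mph >= SPEED_THRESHOLD_MPH:
        return 0  # traffic is flowing freely
    excess = stats.vehicle_count - FREE_VEHICLES
    if excess <= 0:
        return 0  # too few vehicles to count as congestion
    return RATE * excess ** 2  # toll grows with congestion
```

A stream system would evaluate a rule like this continuously as position reports arrive, which is what makes the workload useful for comparing stream engines with relational systems.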
Yip, Alexander Siumann 1979. "Improving web site security with data flow management." Thesis, Massachusetts Institute of Technology, 2009. http://hdl.handle.net/1721.1/54647.
Cataloged from PDF version of thesis.
Includes bibliographical references (p. 91-98).
This dissertation describes two systems, RESIN and BFlow, whose goal is to help Web developers build more secure Web sites. RESIN and BFlow use data flow management to help reduce the security risks of using buggy or malicious code. RESIN provides programmers with language-level mechanisms to track and manage the flow of data within the server. These mechanisms make it easy for programmers to catch server-side data flow bugs that result in security vulnerabilities, and prevent these bugs from being exploited. BFlow is a system that adds information flow control, a restrictive form of data flow management, both to the Web browser and to the interface between a browser and a server. BFlow makes it possible for a Web site to combine confidential data with untrusted JavaScript in its Web pages, without risking leaks of that data. This work makes a number of contributions. RESIN introduces the idea of a data flow assertion and demonstrates how to build such assertions using three language-level mechanisms: policy objects, data tracking, and filter objects. We built prototype implementations of RESIN in both the PHP and Python runtimes. We adapt seven real off-the-shelf applications and implement 11 different security policies in RESIN which thwart at least 27 real security vulnerabilities. BFlow introduces an information flow control model that fits the JavaScript communication mechanisms, and a system that maps that model to JavaScript's existing isolation system.
Together, these techniques allow untrusted JavaScript to read, compute with, and display confidential data without the risk of leaking that data, yet require only minor changes to existing software. We built a prototype of the BFlow system and three different applications including a social networking application, a novel shared-data Web platform, and BFlogger, a third-party JavaScript platform similar to that of Blogger.com. We ported several untrusted JavaScript extensions from Blogger.com to BFlogger, and show that the extensions cannot leak data as they can in Blogger.com.
by Alexander Siumann Yip.
Ph.D.
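The three mechanisms named in the RESIN abstract above (policy objects, data tracking, and filter objects) can be illustrated with a toy sketch. This is not RESIN's API, which works inside the PHP and Python runtimes; every class and method name here (`PasswordPolicy`, `Tracked`, `OutputFilter`, `export_check`) is an assumption made for illustration.

```python
class PasswordPolicy:
    """Hypothetical policy object: the data may only be emailed to its owner."""
    def __init__(self, owner_email):
        self.owner_email = owner_email

    def export_check(self, context):
        if context.get("type") == "email" and context.get("to") == self.owner_email:
            return
        raise PermissionError("policy violation: password leaving the system")


class Tracked:
    """Data tracking: a string value that carries its policy objects along."""
    def __init__(self, text, policies=()):
        self.text = text
        self.policies = tuple(policies)

    def __add__(self, other):
        # Concatenation keeps the policies of both operands.
        if isinstance(other, Tracked):
            return Tracked(self.text + other.text, self.policies + other.policies)
        return Tracked(self.text + str(other), self.policies)

    def __radd__(self, other):
        # Handles plain_string + tracked_value.
        return Tracked(str(other) + self.text, self.policies)


class OutputFilter:
    """Filter object: guards an output channel and checks policies on write."""
    def __init__(self, context):
        self.context = context
        self.sent = []

    def write(self, value):
        if isinstance(value, Tracked):
            for policy in value.policies:
                policy.export_check(self.context)  # raises on a violation
            value = value.text
        self.sent.append(value)
```

In this sketch, a password built as `Tracked("hunter2", [PasswordPolicy("alice@example.com")])` can be written to an `OutputFilter` whose context is an email to that address, while writing it to a filter with an HTTP context raises `PermissionError`. The assertion is stated once, where the data is created, rather than at every place the data might escape.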
Johnston, Steven. "Encouraging collaboration through a new data management approach." Thesis, University of Southampton, 2006. https://eprints.soton.ac.uk/65549/.
Roger, Kathleen Mary Louise. "A nursing workload manager for a patient data management system." Thesis, McGill University, 1992. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=61047.
Vellanki, Vivekanand. "Extending caching for two applications : disseminating live data and accessing data from disks." Diss., Georgia Institute of Technology, 2001. http://hdl.handle.net/1853/9243.
Lee, Jong Sik. "Space-based data management for high-performance distributed simulation." Diss., The University of Arizona, 2001. http://hdl.handle.net/10150/279803.
Lofstead, Gerald Fredrick. "Extreme scale data management in high performance computing." Diss., Georgia Institute of Technology, 2010. http://hdl.handle.net/1853/37232.
Weigel, Tobias [author], and Thomas [academic supervisor] Ludwig. "Persistent Identifiers for Earth Science Data Management / Tobias Weigel. Supervisor: Thomas Ludwig." Hamburg : Staats- und Universitätsbibliothek Hamburg, 2016. http://d-nb.info/1097561712/34.
Rosenfeld, Abraham M. "Data collection and management of a mobile sensor platform." Thesis, Massachusetts Institute of Technology, 2013. http://hdl.handle.net/1721.1/85486.
Cataloged from PDF version of thesis.
Includes bibliographical references (page 53).
This thesis explores the development of a platform to better collect and manage data from multiple sensor inputs mounted on a car sensor platform. Specifically, it focuses on the collection and synchronization of multiple forms of data across a single mobile sensor system. The project will be implemented for three versions of a light-sensing platform, and will cover the different methods of data collection and different types of sensor devices implemented in each version. It will also cover the different technical challenges faced when collecting and managing data across multiple mobile sensors.
by Abraham M. Rosenfeld.
M.Eng.
Lisanskiy, Ilya 1976. "A data model for the Haystack document management system." Thesis, Massachusetts Institute of Technology, 1999. http://hdl.handle.net/1721.1/80103.
Includes bibliographical references (p. 97-98).
by Ilya Lisanskiy.
S.B. and M.Eng.
Lu, Kaiyuan. "Data distribution management schemes for HLA-compliant distributed simulation systems." Thesis, University of Ottawa (Canada), 2006. http://hdl.handle.net/10393/27151.
Fumai, Nicola. "A database for an intensive care unit patient data management system." Thesis, McGill University, 1992. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=22500.
Computers can help by processing the data and displaying the information in easy to understand formats. Also, knowledge-based systems can provide advice in diagnosis and treatment of patients. If these systems are to be effective, they must be integrated into the total hospital information system and the separate computer data must be jointly integrated into a new database which will become the primary medical record.
This thesis presents the design and implementation of a computerized database for an intensive care unit patient data management system being developed for the Montreal Children's Hospital. The database integrates data from the various PDMS components into one logical information store. The patient data currently managed includes physiological parameter data, patient administrative data and fluid balance data.
A simulator design is also described, which allows for thorough validation and verification of the Patient Data Management System. This simulator can easily be extended for use as a teaching and training tool for PDMS users.
The database and simulator were developed in C and implemented under the OS/2 operating system environment. The database is based on the OS/2 Extended Edition relational Database Manager.
Yang, Haofan. "Reputation modelling in citizen science for environmental acoustic data analysis." Thesis, Queensland University of Technology, 2012. https://eprints.qut.edu.au/54657/1/Haofan_Yang_Thesis.pdf.
Wang, Yanchao. "Protein Structure Data Management System." Digital Archive @ GSU, 2007. http://digitalarchive.gsu.edu/cs_diss/20.
Nowak, Hans II (Hans Antoon). "Strategic capacity planning using data science, optimization, and machine learning." Thesis, Massachusetts Institute of Technology, 2020. https://hdl.handle.net/1721.1/126914.
Thesis: S.M., Massachusetts Institute of Technology, Department of Mechanical Engineering, in conjunction with the Leaders for Global Operations Program at MIT, May, 2020
Cataloged from the official PDF of thesis.
Includes bibliographical references (pages 101-104).
Raytheon's Circuit Card Assembly (CCA) factory in Andover, MA is Raytheon's largest factory and the largest Department of Defense (DOD) CCA manufacturer in the world. With over 500 operations, it manufactures over 7000 unique parts with a high degree of complexity and varying levels of demand. Recently, the factory has seen an increase in demand, making the ability to continuously analyze factory capacity and strategically plan for future operations much needed. This study seeks to develop a sustainable strategic capacity optimization model and capacity visualization tool that integrates demand data with historical manufacturing data. Through automated data mining algorithms of factory data sources, capacity utilization and overall equipment effectiveness (OEE) for factory operations are evaluated. Machine learning methods are then assessed to gain an accurate estimate of cycle time (CT) throughout the factory. Finally, a mixed-integer nonlinear program (MINLP) integrates the capacity utilization framework and machine learning predictions to compute the optimal strategic capacity planning decisions. Capacity utilization and OEE models are shown to be able to be generated through automated data mining algorithms. Machine learning models are shown to have a mean absolute error (MAE) of 1.55 on predictions for new data, which is 76.3% lower than the current CT prediction error. Finally, the MINLP is solved to optimality within a tolerance of 1.00e-04 and generates resource and production decisions that can be acted upon.
by Hans Nowak II.
M.B.A.
S.M.
M.B.A. Massachusetts Institute of Technology, Sloan School of Management
S.M. Massachusetts Institute of Technology, Department of Mechanical Engineering
Ahmad, Yasmeen. "Management, visualisation & mining of quantitative proteomics data." Thesis, University of Dundee, 2012. https://discovery.dundee.ac.uk/en/studentTheses/6ed071fc-e43b-410c-898d-50529dc298ce.
Sridharan, Vaikunth. "Sensor Data Streams Correlation Platform for Asthma Management." Wright State University / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=wright1527546937956439.
Ousterhout, Amy (Amy Elizabeth). "Flexplane : a programmable data plane for resource management in datacenters." Thesis, Massachusetts Institute of Technology, 2015. http://hdl.handle.net/1721.1/101584.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 47-51).
Network resource management schemes can significantly improve the performance of datacenter applications. However, it is difficult to experiment with and evaluate these schemes today because they require modifications to hardware routers. To address this we introduce Flexplane, a programmable network data plane for datacenters. Flexplane enables users to express their schemes in a high-level language (C++) and then run real datacenter applications over them at hardware rates. We demonstrate that Flexplane can accurately reproduce the behavior of schemes already supported in hardware (e.g. RED, DCTCP) and can be used to experiment with new schemes not yet supported in hardware, such as HULL. We also show that Flexplane is scalable and has the potential to support large networks.
by Amy Ousterhout.
S.M.
Cates, Josh 1977. "Robust and efficient data management for a distributed hash table." Thesis, Massachusetts Institute of Technology, 2003. http://hdl.handle.net/1721.1/87381.
Tsai, Eva Y. (Eva Yi-hua). "Inter-database data quality management : a relational-model based approach." Thesis, Massachusetts Institute of Technology, 1996. http://hdl.handle.net/1721.1/40202.
Mukkara, Anurag. "Techniques to improve dynamic cache management with static data classification." Thesis, Massachusetts Institute of Technology, 2016. http://hdl.handle.net/1721.1/105962.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 55-59).
Cache hierarchies are increasingly non-uniform and difficult to manage. Several techniques, such as scratchpads or reuse hints, use static information about how programs access data to manage the memory hierarchy. Static techniques are effective on regular programs, but because they set fixed policies, they are vulnerable to changes in program behavior or available cache space. Instead, most systems rely on dynamic caching policies that adapt to observed program behavior. Unfortunately, dynamic policies spend significant resources trying to learn how programs use memory, and yet they often perform worse than a static policy. This thesis presents Whirlpool, a novel approach that combines static information with dynamic policies to reap the benefits of each. Whirlpool statically classifies data into pools based on how the program uses memory. Whirlpool then uses dynamic policies to tune the cache to each pool. Hence, rather than setting policies statically, Whirlpool uses static analysis to guide dynamic policies. Whirlpool provides both an API that lets programmers specify pools manually and a profiling tool that discovers pools automatically in unmodified binaries. On a state-of-the-art NUCA cache, Whirlpool significantly outperforms prior approaches: on sequential programs, Whirlpool improves performance by up to 38% and reduces data movement energy by up to 53%; on parallel programs, Whirlpool improves performance by up to 67% and reduces data movement energy by up to 2.6x.
by Anurag Mukkara.
S.M.