Dissertations / Theses on the topic 'Data center'
Consult the top 50 dissertations / theses for your research on the topic 'Data center.'
Wiswell, Shane. "Data center migration." [Denver, Colo.] : Regis University, 2007. http://165.236.235.140/lib/SWiswell2007.pdf.
Full text
Sehery, Wile Ali. "OneSwitch Data Center Architecture." Diss., Virginia Tech, 2018. http://hdl.handle.net/10919/94376.
Full text
Ph.D.
Sergejev, Ivan. "Exposing the Data Center." Thesis, Virginia Tech, 2014. http://hdl.handle.net/10919/51838.
Full text
Master of Architecture
Wang, Qinjin. "Multi Data center Transaction Chain : Achieving ACID for cross data center multi-key transactions." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-198664.
Full text
Talarico, Gui. "Urban Data Center: A Architectural Celebration of Data." Thesis, Virginia Tech, 2011. http://hdl.handle.net/10919/42855.
Full text
Master of Architecture
Müller, Thomas. "Innovative Technologien im Data Center." Universitätsbibliothek Chemnitz, 2009. http://nbn-resolving.de/urn:nbn:de:bsz:ch1-200900947.
Full text
Bjarnadóttir, Margrét Vilborg. "Data-driven approach to health care : applications using claims data." Thesis, Massachusetts Institute of Technology, 2008. http://hdl.handle.net/1721.1/45946.
Full text
Includes bibliographical references (p. 123-130).
Large population health insurance claims databases together with operations research and data mining methods have the potential of significantly impacting health care management. In this thesis we research how claims data can be utilized in three important areas of health care and medicine and apply our methods to a real claims database containing information of over two million health plan members. First, we develop forecasting models for health care costs that outperform previous results. Secondly, through examples we demonstrate how large-scale databases and advanced clustering algorithms can lead to discovery of medical knowledge. Lastly, we build a mathematical framework for a real-time drug surveillance system, and demonstrate with real data that side effects can be discovered faster than with the current post-marketing surveillance system.
by Margrét Vilborg Bjarnadóttir.
Ph.D.
Le Guen, Thibault. "Data-driven pricing." Thesis, Massachusetts Institute of Technology, 2008. http://hdl.handle.net/1721.1/45627.
Full text
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Includes bibliographical references (p. 143-146).
In this thesis, we develop a pricing strategy that enables a firm to learn the behavior of its customers as well as optimize its profit in a monopolistic setting. The single product case as well as the multi product case are considered under different parametric forms of demand, whose parameters are unknown to the manager. For the linear demand case in the single product setting, our main contribution is an algorithm that guarantees almost sure convergence of the estimated demand parameters to the true parameters. Moreover, the pricing strategy is also asymptotically optimal. Simulations are run to study the sensitivity to different parameters. Using our results on the single product case, we extend the approach to the multi product case with linear demand. The pricing strategy we introduce is easy to implement and guarantees not only learning of the demand parameters but also maximization of the profit. Finally, other parametric forms of the demand are considered. A heuristic that can be used for many parametric forms of the demand is introduced, and is shown to have good performance in practice.
by Thibault Le Guen.
S.M.
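The single-product learning-and-pricing loop described in the abstract above can be sketched in a few lines. This is an illustrative toy, not the thesis's algorithm: the function name, noise level, exploration jitter, and clamping bounds are all invented for the example. It fits the linear demand line by ordinary least squares after each sale and reprices near the estimated myopic optimum a/(2b).

```python
import random

def learn_and_price(a=100.0, b=2.0, rounds=200, seed=0):
    """Toy sketch of pricing under unknown linear demand d = a - b*p.

    Each round: post a price, observe noisy demand, refit (a, b) by
    ordinary least squares, and reprice at the estimated optimum a/(2b).
    Gaussian price jitter keeps the parameters identifiable.
    """
    rng = random.Random(seed)
    prices, demands = [], []
    price = a / (4 * b)  # arbitrary starting price, away from the optimum
    a_hat = b_hat = None
    for t in range(rounds):
        demand = a - b * price + rng.gauss(0, 1.0)  # true demand, unknown to seller
        prices.append(price)
        demands.append(demand)
        if t >= 2:
            n = len(prices)
            mp = sum(prices) / n
            md = sum(demands) / n
            var = sum((p - mp) ** 2 for p in prices)
            cov = sum((p - mp) * (d - md) for p, d in zip(prices, demands))
            b_hat = -cov / var        # estimated demand slope
            a_hat = md + b_hat * mp   # estimated demand intercept
            if b_hat > 0.1:           # reprice only when the fit is sane
                price = a_hat / (2 * b_hat)
        # exploration jitter plus a clamp to a plausible price range
        price = min(max(price + rng.gauss(0, 0.5), 1.0), 60.0)
    return a_hat, b_hat
```

With enough rounds the estimates settle close to the true parameters, mirroring the almost-sure convergence the abstract describes (the toy gives no such guarantee, only the flavor).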
Javanshir, Marjan. "DC distribution system for data center." Thesis, Click to view the E-thesis via HKUTO, 2007. http://sunzi.lib.hku.hk/hkuto/record/B39344952.
Full text
Bennion, Laird. "Identifying data center supply and demand." Thesis, Massachusetts Institute of Technology, 2016. http://hdl.handle.net/1721.1/103457.
Full text
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 66-69).
This thesis documents new methods for gauging supply and demand of data center capacity and addresses issues surrounding potential threats to data center demand. This document is divided between a primer on the composition and engineering of a current data center, discussion of issues surrounding data center demand, Moore's Law and cloud computing, and then transitions to presentation of research on data center demand and supply.
by Laird Bennion.
S.M. in Real Estate Development
Mahood, Christian. "Data center design & enterprise networking /." Online version of thesis, 2009. http://hdl.handle.net/1850/8699.
Full text
Soares, Maria José. "Data center - a importância de uma arquitectura." Master's thesis, Universidade de Évora, 2011. http://hdl.handle.net/10174/11604.
Full text
Pipkin, Everest R. "It Was Raining in the Data Center." Research Showcase @ CMU, 2018. http://repository.cmu.edu/theses/138.
Full text
Johansson, Jennifer. "Cooling storage for 5G EDGE data center." Thesis, Luleå tekniska universitet, Institutionen för teknikvetenskap och matematik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-79126.
Full text
Li, Yi. "Speaker Diarization System for Call-center data." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-286677.
Full text
To answer the question of who spoke when, speaker diarization (SD) is a critical step for many practical speech applications. The goal of our project is to build an MFCC-vector-based speaker diarization system on top of a speaker verification (SV) system, an existing call-center application that verifies a customer's identity from a phone call. Our diarization system uses 13-dimensional MFCCs as features and performs voice activity detection (VAD), segmentation, linear clustering, and hierarchical clustering based on GMMs and BIC scores. By applying it, we reduce the equal error rate (EER) from 18.1% in the baseline experiment to 3.26% on the general call-center data. To better analyze and evaluate the system, we also simulated a set of call-center data based on the public ICSI corpus audio database.
LeBlanc, Robert-Lee Daniel. "Analysis of Data Center Network Convergence Technologies." BYU ScholarsArchive, 2014. https://scholarsarchive.byu.edu/etd/4150.
Full text
Desmouceaux, Yoann. "Network-Layer Protocols for Data Center Scalability." Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLX011/document.
Full text
With the development of demand for computing resources, data center architectures are growing both in scale and in complexity. In this context, this thesis takes a step back as compared to traditional network approaches, and shows that providing generic primitives directly within the network layer is a great way to improve efficiency of resource usage, and decrease network traffic and management overhead. Using recently-introduced network architectures, Segment Routing (SR) and Bit-Indexed Explicit Replication (BIER), network layer protocols are designed and analyzed to provide three high-level functions: (1) task mobility, (2) reliable content distribution and (3) load-balancing. First, task mobility is achieved by using SR to provide a zero-loss virtual machine migration service. This then opens the opportunity for studying how to orchestrate task placement and migration while aiming at (i) maximizing the inter-task throughput, while (ii) maximizing the number of newly-placed tasks, but (iii) minimizing the number of tasks to be migrated. Second, reliable content distribution is achieved by using BIER to provide a reliable multicast protocol, in which retransmissions of lost packets are targeted towards the precise set of destinations having missed that packet, thus incurring a minimal traffic overhead. To decrease the load on the source link, this is then extended to enable retransmissions by local peers from the same group, with SR as a helper to find a suitable retransmission candidate. Third, load-balancing is achieved by way of using SR to distribute queries through several application candidates, each of which taking local decisions as to whether to accept those, thus achieving better fairness as compared to centralized approaches. The feasibility of hardware implementation of this approach is investigated, and a solution using covert channels to transparently convey information to the load-balancer is implemented for a state-of-the-art programmable network card. Finally, the possibility of providing autoscaling as a network service is investigated: by letting queries go through a fixed chain of applications using SR, autoscaling is triggered by the last instance, depending on its local state.
Ruiu, Pietro. "Energy Management in Large Data Center Networks." Doctoral thesis, Politecnico di Torino, 2018. http://hdl.handle.net/11583/2706336.
Full text
Shioda, Romy 1977. "Integer optimization in data mining." Thesis, Massachusetts Institute of Technology, 2003. http://hdl.handle.net/1721.1/17579.
Full text
Includes bibliographical references (p. 103-107).
While continuous optimization methods have been widely used in statistics and data mining over the last thirty years, integer optimization has had very limited impact in statistical computation. Thus, our objective is to develop a methodology utilizing state-of-the-art integer optimization methods to exploit the discrete character of data mining problems. The thesis consists of two parts: The first part illustrates a mixed-integer optimization method for classification and regression that we call Classification and Regression via Integer Optimization (CRIO). CRIO separates data points in different polyhedral regions. In classification each region is assigned a class, while in regression each region has its own distinct regression coefficients. Computational experimentation with real data sets shows that CRIO is comparable to and often outperforms the current leading methods in classification and regression. The second part describes our cardinality-constrained quadratic mixed-integer optimization algorithm, used to solve subset selection in regression and portfolio selection in asset allocation. We take advantage of the special structures of these problems by implementing a combination of implicit branch-and-bound, Lemke's pivoting method, variable deletion and problem reformulation. Testing against popular heuristic methods and CPLEX 8.0's quadratic mixed-integer solver, we see that our tailored approach to these quadratic variable selection problems has significant advantages over simple heuristics and generalized solvers.
by Romy Shioda.
Ph.D.
Ehret, Anna. "Entwicklung und Evaluation eines Förder-Assessment-Centers für Mitarbeiter der internationalen Jugendarbeit (FAIJU)." Berlin wvb, Wiss. Verl, 2006. http://www.wvberlin.de/data/inhalt/ehret.htm.
Full text
Green, George Michael. "Reducing Peak Power Consumption in Data Centers." The Ohio State University, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=osu1386068818.
Full text
Cheung, Wang Chi. "Data-driven algorithms for operational problems." Thesis, Massachusetts Institute of Technology, 2017. http://hdl.handle.net/1721.1/108916.
Full text
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 173-180).
In this thesis, we propose algorithms for solving revenue maximization and inventory control problems in data-driven settings. First, we study the choice-based network revenue management problem. We propose the Approximate Column Generation heuristic (ACG) and Potential Based algorithm (PB) for solving the Choice-based Deterministic Linear Program, an LP relaxation to the problem, to near-optimality. Both algorithms only assume the ability to approximate the underlying single period problem. ACG inherits the empirical efficiency from the Column Generation heuristic, while PB enjoys provable efficiency guarantee. Building on these tractability results, we design an earning-while-learning policy for the online problem under a Multinomial Logit choice model with unknown parameters. The policy is efficient, and achieves a regret sublinear in the length of the sales horizon. Next, we consider the online dynamic pricing problem, where the underlying demand function is not known to the monopolist. The monopolist is only allowed to make a limited number of price changes during the sales horizon, due to administrative constraints. For any integer m, we provide an information theoretic lower bound on the regret incurred by any pricing policy with at most m price changes. The bound is the best possible, as it matches the regret upper bound incurred by our proposed policy, up to a constant factor. Finally, we study the data-driven capacitated stochastic inventory control problem, where the demand distributions can only be accessed through sampling from offline data. We apply the Sample Average Approximation (SAA) method, and establish a polynomial size upper bound on the number of samples needed to achieve a near-optimal expected cost. Nevertheless, the underlying SAA problem is shown to be #P hard. Motivated by the SAA analysis, we propose a randomized polynomial time approximation scheme which also uses polynomially many samples. 
To complement our results, we establish an information theoretic lower bound on the number of samples needed to achieve near optimality.
by Wang Chi Cheung.
Ph. D.
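The Sample Average Approximation (SAA) idea from the abstract above is easiest to see in the simplest single-period (newsvendor) setting: replace the unknown demand distribution by offline samples and minimize the empirical cost. A minimal sketch, with an invented function name and cost parameters (the thesis treats the much harder capacitated multi-period problem):

```python
def saa_newsvendor(samples, h=1.0, b=3.0):
    """SAA for a single-period inventory problem.

    Pick the order quantity q minimizing the average of holding costs
    h*(q - d)+ and backorder costs b*(d - q)+ over offline demand
    samples. The empirical cost is piecewise-linear and convex, so a
    minimizer always lies at one of the sample points.
    """
    def empirical_cost(q):
        return sum(h * max(q - d, 0) + b * max(d - q, 0)
                   for d in samples) / len(samples)
    # search the sample points; sorting makes tie-breaking deterministic
    return min(sorted(samples), key=empirical_cost)
```

As expected from newsvendor theory, the minimizer is the empirical b/(b+h) quantile of the sampled demand, so with these parameters it sits at the 75th percentile of the data.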
Snyder, Ashley M. (Ashley Marie). "Data mining and visualization : real time predictions and pattern discovery in hospital emergency rooms and immigration data." Thesis, Massachusetts Institute of Technology, 2010. http://hdl.handle.net/1721.1/61199.
Full text
Cataloged from PDF version of thesis.
Includes bibliographical references (p. 163-166).
Data mining is a versatile and expanding field of study. We show the applications and uses of a variety of techniques in two very different realms: Emergency department (ED) length of stay prediction and visual analytics. For the ED, we investigate three data mining techniques to predict a patient's length of stay based solely on the information available at the patient's arrival. We achieve good predictive power using Decision Tree Analysis. Our results show that by using main characteristics about the patient, such as chief complaint, age, time of day of the arrival, and the condition of the ED, we can predict overall patient length of stay to specific hourly ranges with an accuracy of 80%. For visual analytics, we demonstrate how to mathematically determine the optimal number of clusters for a geospatial dataset containing both numeric and categorical data and then how to compare each cluster to the entire dataset as well as consider pairwise differences. We then incorporate our analytical methodology in visual display. Our results show that we can quickly and effectively measure differences between clusters and we can accurately find the optimal number of clusters in non-noisy datasets.
by Ashley M. Snyder.
S.M.
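A toy illustration of the kind of arrival-feature prediction described in the abstract above (not the thesis's model or data): a majority-vote lookup over two categorical arrival features, which is what a shallow categorical decision tree reduces to. All names and the sample records are invented for the example.

```python
from collections import Counter, defaultdict

def train_los_table(records):
    """Fit a majority-vote table from (chief_complaint, arrival_period,
    los_range) triples: one vote counter per feature bucket, plus a
    global fallback class for unseen buckets."""
    buckets = defaultdict(Counter)
    overall = Counter()
    for complaint, period, los in records:
        buckets[(complaint, period)][los] += 1
        overall[los] += 1
    table = {key: c.most_common(1)[0][0] for key, c in buckets.items()}
    default = overall.most_common(1)[0][0]
    return table, default

def predict_los(model, complaint, period):
    """Predict an hourly length-of-stay range for a new arrival."""
    table, default = model
    return table.get((complaint, period), default)
```

A real decision tree would also learn which features to split on and handle numeric inputs such as age and ED load; this sketch only shows the prediction structure.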
König, Ralf. "HP UDC - Standardizing and Automizing Data Center Operations." Universitätsbibliothek Chemnitz, 2004. http://nbn-resolving.de/urn:nbn:de:swb:ch1-200400394.
Full text
Workshop "Netz- und Service-Infrastrukturen". The presentation contains some general facts about HP's Utility Data Center as well as the results of my practical work at HP Labs in preparation for the diploma thesis.
Zhuang, Hao. "Performance Evaluation of Virtualization in Cloud Data Center." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-104206.
Full text
Amazon Elastic Compute Cloud (EC2) has been adopted by a large number of small and medium-sized businesses (SMBs), e.g. Foursquare, Monster World, and Netflix, to provide various kinds of services. Some previous work in the literature investigates the variation and unpredictability of cloud services. These studies report interesting observations about cloud offerings, but they fail to reveal the underlying causes of the varied behavior of cloud services. In this thesis, we examined the underlying scheduling mechanisms and hardware configurations in Amazon EC2 and investigated their impact on the performance of the virtual machine instances running on top. Specifically, several standard and high-CPU instance types are covered to shed light on hardware upgrades and replacement in Amazon EC2. Large instances from the standard family are selected for a focused analysis. To better understand the behavior of the different instances, local cluster environments consisting of two Intel Xeon servers were set up with different scheduling algorithms. Through a series of benchmark measurements we made the following observations: (1) Amazon uses highly diversified hardware to provision different instances. From the perspective of the instance sub-types, hardware diversity leads to significant performance variation, which can reach up to 30%. (2) Two different scheduling mechanisms were observed, one resembling the Simple Earliest Deadline First (SEDF) scheduler, the other closer to the Credit scheduler of the Xen hypervisor. These two scheduling mechanisms also give rise to performance variation. (3) By applying a simple "trial-and-failure" instance selection strategy, the cost saving is surprisingly large. Given the distribution of fast and slow instances, the cost saving can reach up to 30%, which is attractive to SMBs using the Amazon EC2 platform.
Mohammadnezhad, Mahdi. "Evaluating Stream Protocol for a Data Stream Center." Thesis, Linnéuniversitetet, Institutionen för datavetenskap (DV), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-55761.
Full text
He, Chunzhi, and 何春志. "Load-balanced switch design and data center networking." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2014. http://hdl.handle.net/10722/198826.
Full text
published_or_final_version
Electrical and Electronic Engineering
Doctoral
Doctor of Philosophy
Mitteff, Eric. "AUTOMATED ADAPTIVE DATA CENTER GENERATION FOR MESHLESS METHODS." Master's thesis, University of Central Florida, 2006. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/2635.
Full text
M.S.M.E.
Department of Mechanical, Materials and Aerospace Engineering;
Engineering and Computer Science
Mechanical Engineering
Humr, Scott A. "Understanding return on investment for data center consolidation." Thesis, Monterey, California: Naval Postgraduate School, 2013. http://hdl.handle.net/10945/37641.
Full text
The federal government has mandated that agencies consolidate data centers in order to gain efficiencies and cost savings. It is a well-established fact that both public and private organizations have reported considerable cost savings from consolidating data centers; however, in the case of federal agencies, no established methodology for valuing the benefits has been delineated. Nevertheless, numerous federal policies mandate that investments in IT demonstrate a positive return on investment (ROI). The problem is that the Department of Defense does not have clear instructions on how to measure ROI in order to evaluate an opportunity to consolidate data centers. While calculating ROI for IT can be very challenging, most private and public firms have methods for demonstrating a return ratio and not only cost savings. Therefore, choosing metrics and methodologies for calculating ROI is an important step in the decision-making process. This complexity complicates estimating a data center's utility and the true value generated by merging data centers. This thesis will explore the challenges that the Marine Corps faces in calculating ROI for data center consolidation.
Eriksson, Martin. "Monitoring, Modelling and Identification of Data Center Servers." Thesis, Luleå tekniska universitet, Institutionen för system- och rymdteknik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-69342.
Full text
Hassen, Fadoua. "Multistage packet-switching fabrics for data center networks." Thesis, University of Leeds, 2017. http://etheses.whiterose.ac.uk/17620/.
Full text
Pfeiffer, Jessica. "Datascapes: Envisioning a New Kind of Data Center." University of Cincinnati / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=ucin158399900447231.
Full text
Rudén, Philip. "FLUID SIMULATIONS FOR A AIRRECIRULATED DATA CENTER-GREENHOUSE." Thesis, Luleå tekniska universitet, Institutionen för teknikvetenskap och matematik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-85514.
Full text
Lundin, Lowe. "Artificial Intelligence for Data Center Power Consumption Optimisation." Thesis, Uppsala universitet, Avdelningen för systemteknik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-447627.
Full text
Pamboris, Andreas. "LDP location discovery protocol for data center networks /." Diss., [La Jolla] : University of California, San Diego, 2009. http://wwwlib.umi.com/cr/ucsd/fullcit?p1467934.
Full text
Title from first page of PDF file (viewed September 17, 2009). Available via ProQuest Digital Dissertations. Includes bibliographical references (p. 47).
Full text
Pulice, Alessandro. "Il problema del risparmio energetico nei data center." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2011. http://amslaurea.unibo.it/2388/.
Full text
Gupta, Vishal Ph D. Massachusetts Institute of Technology. "Data-driven models for uncertainty and behavior." Thesis, Massachusetts Institute of Technology, 2014. http://hdl.handle.net/1721.1/91301.
Full text
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 173-180).
The last decade has seen an explosion in the availability of data. In this thesis, we propose new techniques to leverage these data to tractably model uncertainty and behavior. Specifically, this thesis consists of three parts: In the first part, we propose a novel schema for utilizing data to design uncertainty sets for robust optimization using hypothesis testing. The approach is flexible and widely applicable, and robust optimization problems built from our new data-driven sets are computationally tractable, both theoretically and practically. Optimal solutions to these problems enjoy a strong, finite-sample probabilistic guarantee. Computational evidence from classical applications of robust optimization (queuing and portfolio management) confirms that our new data-driven sets significantly outperform traditional robust optimization techniques whenever data is available. In the second part, we examine in detail an application of the above technique to the unit commitment problem. Unit commitment is a large-scale, multistage optimization problem under uncertainty that is critical to power system operations. Using real data from the New England market, we illustrate how our proposed data-driven uncertainty sets can be used to build high-fidelity models of the demand for electricity, and that the resulting large-scale, mixed-integer adaptive optimization problems can be solved efficiently. With respect to this second contribution, we propose new data-driven solution techniques for this class of problems inspired by ideas from machine learning. Extensive historical back-testing confirms that our proposed approach generates high quality solutions that compare with state-of-the-art methods. In the third part, we focus on behavioral modeling. Utility maximization (single agent case) and equilibrium modeling (multi-agent case) are by far the most common behavioral models in operations research.
By combining ideas from inverse optimization with the theory of variational inequalities, we develop an efficient, data-driven technique for estimating the primitives of these models. Our approach supports both parametric and nonparametric estimation through kernel learning. We prove that our estimators enjoy a strong generalization guarantee even when the model is misspecified. Finally, we present computational evidence from applications in economics and transportation science illustrating the effectiveness of our approach and its scalability to large-scale instances.
by Vishal Gupta.
Ph. D.
McCord, Christopher George. "Data-driven dynamic optimization with auxiliary covariates." Thesis, Massachusetts Institute of Technology, 2019. https://hdl.handle.net/1721.1/122098.
Full text
Thesis: Ph. D., Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, 2019
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 183-190).
Optimization under uncertainty forms the foundation for many of the fundamental problems the operations research community seeks to solve. In this thesis, we develop and analyze algorithms that incorporate ideas from machine learning to optimize uncertain objectives directly from data. In the first chapter, we consider problems in which the decision affects the observed outcome, such as in personalized medicine and pricing. We present a framework for using observational data to learn to optimize an uncertain objective over a continuous and multi-dimensional decision space. Our approach accounts for the uncertainty in predictions, and we provide theoretical results that show this adds value. In addition, we test our approach on a Warfarin dosing example, and it outperforms the leading alternative methods.
In the second chapter, we develop an approach for solving dynamic optimization problems with covariates that uses machine learning to approximate the unknown stochastic process of the uncertainty. We provide theoretical guarantees on the effectiveness of our method and validate the guarantees with computational experiments. In the third chapter, we introduce a distributionally robust approach for incorporating covariates in large-scale, data-driven dynamic optimization. We prove that it is asymptotically optimal and provide a tractable general-purpose approximation scheme that scales to problems with many temporal stages. Across examples in shipment planning, inventory management, and finance, our method achieves improvements of up to 15% over alternatives. In the final chapter, we apply the techniques developed in previous chapters to the problem of optimizing the operating room schedule at a major US hospital.
Our partner institution faces significant census variability throughout the week, which limits the amount of patients it can accept due to resource constraints at peak times. We introduce a data-driven approach for this problem that combines machine learning with mixed integer optimization and demonstrate that it can reliably reduce the maximal weekly census.
by Christopher George McCord.
Ph. D.
Ph.D. Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center
Blanks, Zachary D. "A generalized hierarchical approach for data labeling." Thesis, Massachusetts Institute of Technology, 2019. https://hdl.handle.net/1721.1/122386.
Full text
Thesis: S.M., Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, 2019
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 85-90).
The goal of this thesis was to develop a data type agnostic classification algorithm best suited for problems where there are a large number of similar labels (e.g., classifying a port versus a shipyard). The most common approach to this issue is to simply ignore it, and attempt to fit a classifier against all targets at once (a "flat" classifier). The problem with this technique is that it tends to do poorly due to label similarity. Conversely, there are other existing approaches, known as hierarchical classifiers (HCs), which propose clustering heuristics to group the labels. However, the most common HCs require that a "flat" model be trained a-priori before the label hierarchy can be learned. The primary issue with this approach is that if the initial estimator performs poorly then the resulting HC will have a similar rate of error.
To solve these challenges, we propose three new approaches which learn the label hierarchy without training a model beforehand and one which generalizes the standard HC. The first technique employs a k-means clustering heuristic which groups classes into a specified number of partitions. The second method takes the previously developed heuristic and formulates it as a mixed integer program (MIP). Employing a MIP allows the user to have greater control over the resulting label hierarchy by imposing meaningful constraints. The third approach learns meta-classes by using community detection algorithms on graphs which simplifies the hyper-parameter space when training an HC. Finally, the standard HC methodology is generalized by relaxing the requirement that the original model must be a "flat" classifier; instead, one can provide any of the HC approaches detailed previously as the initializer.
By giving the model a better starting point, the final estimator has a greater chance of yielding a lower error rate. To evaluate the performance of our methods, we tested them on a variety of data sets which contain a large number of similar labels. We observed the k-means clustering heuristic or community detection algorithm gave statistically significant improvements in out-of-sample performance against a flat and standard hierarchical classifier. Consequently our approach offers a solution to overcome problems for labeling data with similar classes.
by Zachary D. Blanks.
S.M.
S.M. Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center
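The k-means label-grouping heuristic described in the abstract above can be sketched as follows. This is an assumption-laden illustration, not the thesis's code: it represents each class by one feature centroid and clusters those centroids with plain Lloyd's algorithm, so each resulting cluster becomes one meta-class of the hierarchy.

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two feature tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(pts):
    """Coordinate-wise mean of a non-empty list of tuples."""
    n = len(pts)
    return tuple(sum(c) / n for c in zip(*pts))

def group_labels(centroids, k, iters=20, seed=1):
    """Cluster per-class centroids into k meta-classes.

    centroids: dict mapping label -> feature tuple (one centroid per class).
    Returns a list of label groups (each group sorted for determinism).
    """
    rng = random.Random(seed)
    labels = sorted(centroids)
    centers = [centroids[l] for l in rng.sample(labels, k)]
    for _ in range(iters):
        # assign each class centroid to its nearest cluster center
        assign = {l: min(range(k), key=lambda i: dist2(centroids[l], centers[i]))
                  for l in labels}
        # recompute centers from the assigned members
        for i in range(k):
            members = [centroids[l] for l in labels if assign[l] == i]
            if members:
                centers[i] = mean(members)
    groups = [[] for _ in range(k)]
    for l, i in assign.items():
        groups[i].append(l)
    return [sorted(g) for g in groups if g]
```

On a toy set where "port" and "shipyard" centroids sit near each other and far from "forest" and "jungle", the heuristic recovers the two intuitive meta-classes, after which a separate classifier per meta-class can be trained as the HC approach prescribes.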
Ng, Yee Sian. "Advances in data-driven models for transportation." Thesis, Massachusetts Institute of Technology, 2019. https://hdl.handle.net/1721.1/122100.
Full text
Thesis: Ph. D., Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, 2019
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 163-176).
With the rising popularity of ride-sharing and alternative modes of transportation, there has been a renewed interest in transit planning to improve service quality and stem declining ridership. However, it often takes months of manual planning for operators to redesign and reschedule services in response to changing needs. To this end, we provide four models of transportation planning that are based on data and driven by optimization. A key aspect is the ability to provide certificates of optimality, while being practical in generating high-quality solutions in a short amount of time. We provide approaches to combinatorial problems in transit planning that scale up to city-sized networks. In transit network design, current tractable approaches only consider edges that exist, resulting in proposals that are closely tethered to the original network. We allow new transit links to be proposed and account for commuters transferring between different services. In integrated transit scheduling, we provide a way for transit providers to synchronize the timing of services in multimodal networks while ensuring regularity in the timetables of the individual services. This is made possible by taking the characteristics of transit demand patterns into account when designing tractable formulations. We also advance the state of the art in demand models for transportation optimization. In emergency medical services, we provide data-driven formulations that outperform their probabilistic counterparts in ensuring coverage. This is achieved by replacing independence assumptions in probabilistic models and capturing the interactions of services in overlapping regions. In transit planning, we provide a unified framework that allows us to optimize frequencies and prices jointly in transit networks for minimizing total waiting time.
by Yee Sian Ng.
Ph. D.
Ph.D. Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center
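The joint frequency-setting objective in the abstract above (choosing service frequencies to minimize total waiting time) can be illustrated with a classical toy model. Assuming each rider's expected wait on a route is half the headway, i.e. demand_i / (2 * f_i) in aggregate, and a fixed fleet budget, the optimal frequencies follow the square-root rule (f_i proportional to the square root of demand). This is a simplified textbook sketch, not the thesis's formulation; the demands and fleet size below are invented.

```python
import numpy as np

def allocate_frequencies(demand, total_freq):
    """Split a total frequency budget across routes to minimize total
    expected waiting time sum_i demand_i / (2 * f_i). The minimizer puts
    f_i proportional to sqrt(demand_i): the square-root rule."""
    d = np.asarray(demand, dtype=float)
    w = np.sqrt(d)
    return total_freq * w / w.sum()

def total_waiting_time(demand, freqs):
    """Aggregate waiting time under the half-headway model."""
    d = np.asarray(demand, dtype=float)
    f = np.asarray(freqs, dtype=float)
    return float(np.sum(d / (2.0 * f)))

demand = [100, 400, 900]   # hypothetical hourly riders on three routes
f_opt = allocate_frequencies(demand, total_freq=60)  # 60 vehicles/hour
```

Under these numbers the rule assigns 10, 20, and 30 vehicles per hour, beating an equal split.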
Lindberg, Therese. "Modelling and Evaluation of Distributed Airflow Control in Data Centers." Thesis, Karlstads universitet, Institutionen för ingenjörsvetenskap och fysik, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kau:diva-36479.
Full text
Ma, Wei (Will Wei). "Dynamic, data-driven decision-making in revenue management." Thesis, Massachusetts Institute of Technology, 2018. http://hdl.handle.net/1721.1/120224.
Full text
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 233-241).
Motivated by applications in Revenue Management (RM), this thesis studies various problems in sequential decision-making and demand learning. In the first module, we consider a personalized RM setting, where items with limited inventories are recommended to heterogeneous customers sequentially visiting an e-commerce platform. We take the perspective of worst-case competitive ratio analysis, and aim to develop algorithms whose performance guarantees do not depend on the customer arrival process. We provide the first solution to this problem when there are both multiple items and multiple prices at which they could be sold, framing it as a general online resource allocation problem and developing a system of forecast-independent bid prices (Chapter 2). Second, we study a related assortment planning problem faced by Walmart Online Grocery, where before checkout, customers are recommended "add-on" items that are complementary to their current shopping cart (Chapter 3). Third, we derive inventory-dependent price-skimming policies for the single-leg RM problem, which extends existing competitive ratio results to non-independent demand (Chapter 4). In this module, we test our algorithms using a publicly available data set from a major hotel chain. In the second module, we study bundling, which is the practice of selling different items together, and show how to learn and price using bundles. First, we introduce bundling as a new, alternate method for learning the price elasticities of items, which does not require any changing of prices; we validate our method on data from a large online retailer (Chapter 5). Second, we show how to sell bundles of goods profitably even when the goods have high production costs, and derive both distribution-dependent and distribution-free guarantees on the profitability (Chapter 6). In the final module, we study the Markovian multi-armed bandit problem under an undiscounted finite time horizon (Chapter 7). We improve existing approximation algorithms using LP rounding and random sampling techniques, which result in a (1/2 - eps)-approximation for the correlated stochastic knapsack problem that is tight relative to the LP. In this work, we introduce a framework for designing self-sampling algorithms, which is also used in our later work on add-on recommendation and single-leg RM.
by Will (Wei) Ma.
Ph. D.
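The bid-price idea in the abstract above can be sketched in a few lines: an arriving customer is sold the in-stock item whose offered price most exceeds the item's bid price (a per-unit opportunity cost of inventory). The policy, prices, and inventories below are invented for illustration; the thesis's contribution, constructing forecast-independent bid prices with worst-case guarantees, is precisely the part this sketch takes as given.

```python
def online_allocate(arrivals, inventory, bid_price):
    """Greedy bid-price control for online resource allocation.

    arrivals  -- list of offers, each a dict mapping item -> offered price
    inventory -- dict mapping item -> units in stock
    bid_price -- dict mapping item -> opportunity cost per unit

    Sell to each arrival the feasible item maximizing price - bid_price,
    provided that margin is positive; otherwise reject the arrival."""
    stock = dict(inventory)
    revenue = 0.0
    for offer in arrivals:
        best = None  # (item, price, margin)
        for item, price in offer.items():
            margin = price - bid_price[item]
            if stock.get(item, 0) > 0 and margin > 0:
                if best is None or margin > best[2]:
                    best = (item, price, margin)
        if best is not None:
            item, price, _ = best
            stock[item] -= 1
            revenue += price
    return revenue, stock

inventory = {'A': 1, 'B': 2}
bid_price = {'A': 5.0, 'B': 1.0}   # scarce item A carries a high bid price
arrivals = [{'A': 6.0, 'B': 3.0}, {'A': 10.0}, {'B': 2.0}]
revenue, leftover = online_allocate(arrivals, inventory, bid_price)
```

Note that the high bid price on A steers the first customer to B, reserving the last unit of A for the later, higher-paying arrival.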
Papush, Anna. "Data-driven methods for personalized product recommendation systems." Thesis, Massachusetts Institute of Technology, 2018. http://hdl.handle.net/1721.1/115655.
Full text
Cataloged from PDF version of thesis.
Includes bibliographical references.
The online market has expanded tremendously over the past two decades across all industries ranging from retail to travel. This trend has resulted in the growing availability of information regarding consumer preferences and purchase behavior, sparking the development of increasingly more sophisticated product recommendation systems. Thus, a competitive edge in this rapidly growing sector could be worth up to millions of dollars in revenue for an online seller. Motivated by this increasingly prevalent problem, we propose an innovative model that selects, prices and recommends a personalized bundle of products to an online consumer. This model captures the trade-off between myopic profit maximization and inventory management, while selecting relevant products from consumer preferences. We develop two classes of approximation algorithms that run efficiently in real-time and provide analytical guarantees on their performance. We present practical applications through two case studies using: (i) point-of-sale transaction data from a large U.S. e-tailer, and (ii) ticket transaction data from a premier global airline. The results demonstrate that our approaches result in significant improvements on the order of 3-7% lifts in expected revenue over current industry practices. We then extend this model to the setting in which consumer demand is subject to uncertainty. We address this challenge using dynamic learning and then improve upon it with robust optimization. We first frame our learning model as a contextual nonlinear multi-armed bandit problem and develop an approximation algorithm to solve it in real-time. We provide analytical guarantees on the asymptotic behavior of this algorithm's regret, showing that with high probability it is on the order of O(√T). Our computational studies demonstrate this algorithm's tractability across various numbers of products, consumer features, and demand functions, and illustrate how it significantly outperforms benchmark strategies. Given that demand estimates inherently contain error, we next consider a robust optimization approach under row-wise demand uncertainty. We define the robust counterparts under both polyhedral and ellipsoidal uncertainty sets. Computational analysis shows that robust optimization is critical in highly constrained inventory settings; however, the price of robustness grows drastically as a result of pricing strategies if the level of conservatism is too high.
by Anna Papush.
Ph. D.
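The explore/exploit loop behind the bandit formulation in the abstract above can be illustrated with a deliberately simple stand-in: epsilon-greedy price experimentation, where each arm is a candidate price and the reward is price times a Bernoulli purchase. The thesis's algorithm is contextual and nonlinear with an O(√T) regret guarantee; the prices and purchase probabilities below are invented.

```python
import numpy as np

def epsilon_greedy_pricing(prices, demand_prob, T, eps=0.1, seed=0):
    """Toy multi-armed bandit over candidate prices: pull each arm once,
    then explore uniformly with probability eps and otherwise exploit the
    arm with the best empirical mean revenue. Returns the arm judged best
    after T rounds."""
    rng = np.random.default_rng(seed)
    n = len(prices)
    counts = np.zeros(n)
    rev_sum = np.zeros(n)
    for t in range(T):
        if t < n:
            a = t                                  # play each arm once first
        elif rng.random() < eps:
            a = int(rng.integers(n))               # explore
        else:
            a = int(np.argmax(rev_sum / counts))   # exploit
        sale = rng.random() < demand_prob[a]       # simulated purchase
        counts[a] += 1
        rev_sum[a] += prices[a] * sale
    return int(np.argmax(rev_sum / counts))

prices = [5.0, 10.0, 20.0]
demand_prob = [0.9, 0.6, 0.1]   # expected revenues: 4.5, 6.0, 2.0
best = epsilon_greedy_pricing(prices, demand_prob, T=5000)
```

With these parameters the middle price has the highest expected revenue, and after 5000 rounds the algorithm identifies it.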
Sturt, Bradley Eli. "Dynamic optimization in the age of big data." Thesis, Massachusetts Institute of Technology, 2020. https://hdl.handle.net/1721.1/127292.
Full text
Cataloged from the official PDF of thesis.
Includes bibliographical references (pages 241-249).
This thesis revisits a fundamental class of dynamic optimization problems introduced by Dantzig (1955). These decision problems remain widely studied in many application domains (e.g., inventory management, finance, energy planning) but require access to probability distributions that are rarely known in practice. First, we propose a new data-driven approach for addressing multi-stage stochastic linear optimization problems with unknown probability distributions. The approach consists of solving a robust optimization problem that is constructed from sample paths of the underlying stochastic process. As more sample paths are obtained, we prove that the optimal cost of the robust problem converges to that of the underlying stochastic problem. To the best of our knowledge, this is the first data-driven approach for multi-stage stochastic linear optimization problems which is asymptotically optimal when uncertainty is arbitrarily correlated across time.
Next, we develop approximation algorithms for the proposed data-driven approach by extending techniques from the field of robust optimization. In particular, we present a simple approximation algorithm, based on overlapping linear decision rules, which can be reformulated as a tractable linear optimization problem with size that scales linearly in the number of data points. For two-stage problems, we show the approximation algorithm is also asymptotically optimal, meaning that the optimal cost of the approximation algorithm converges to that of the underlying stochastic problem as the number of data points tends to infinity. Finally, we extend the proposed data-driven approach to address multi-stage stochastic linear optimization problems with side information. The approach combines predictive machine learning methods (such as K-nearest neighbors, kernel regression, and random forests) with the proposed robust optimization framework.
We prove that this machine learning-based approach is asymptotically optimal, and demonstrate the value of the proposed methodology in numerical experiments in the context of inventory management, scheduling, and finance.
by Bradley Eli Sturt.
Ph. D.
Ph.D. Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center
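The construction in the abstract above, a robust problem built from uncertainty sets around observed sample paths, can be illustrated in a single-stage setting. Below, a newsvendor's robust cost averages, over demand samples, the worst case within an eps-interval around each sample; because the cost is convex in demand, the inner maximum is attained at an interval endpoint. This is a one-dimensional illustration with invented numbers, not the thesis's multi-stage formulation or its decision-rule approximations.

```python
import numpy as np

def newsvendor_cost(x, d, h=1.0, b=3.0):
    """Overage cost h per leftover unit, underage cost b per unit short."""
    return h * np.maximum(x - d, 0.0) + b * np.maximum(d - x, 0.0)

def sample_robust_order(samples, eps, grid):
    """Minimize over the order grid the sample-robust objective:
    (1/N) * sum_i  max_{d in [d_i - eps, d_i + eps]} cost(x, d).
    Convexity of the cost in d lets us check only the two endpoints.
    With eps = 0 this reduces to the sample average approximation."""
    samples = np.asarray(samples, dtype=float)
    best_x, best_val = None, np.inf
    for x in grid:
        worst = np.maximum(newsvendor_cost(x, samples - eps),
                           newsvendor_cost(x, samples + eps))
        val = float(worst.mean())
        if val < best_val:
            best_x, best_val = x, val
    return best_x, best_val

samples = [8.0, 10.0, 12.0]
grid = np.arange(0, 21)
x_star, v_star = sample_robust_order(samples, eps=1.0, grid=grid)
x_saa, v_saa = sample_robust_order(samples, eps=0.0, grid=grid)
```

Here both the robust and the plain sample-average problem order 12 units, but the robust objective prices in the worst case around each observation.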
Deselaers, Johannes. "Deep Learning Pupil Center Localization." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-287538.
Full text
This project aims to achieve high-performance object localization with deep Convolutional Neural Networks (CNNs), in particular for pupil centers in the context of eye tracking. Three different network architectures suited to the task are developed, evaluated, and compared: one based on regression with fully connected layers, a Fully Convolutional Network, and a Deconvolutional Network. The best-performing model achieves a mean error of only 0.52 pixels and a median error of 0.42 pixels relative to the ground-truth labels. The 95th percentile of the error lies at 1.12 pixels. This exceeds the performance of current state-of-the-art pupil center detection algorithms by an order of magnitude, a result attributable both to the algorithm and to a dataset that surpasses those used for this purpose in earlier publications in suitability, quality, and size. Possibilities for further reductions in computational cost based on recent compression research are suggested.
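Sub-pixel errors like the 0.42-pixel median above require decoding a continuous center estimate from a network's dense output. One common readout for heatmap-producing architectures (such as the fully convolutional and deconvolutional networks mentioned) is the soft-argmax: a probability-weighted average of pixel coordinates. The decoding below is a generic illustration, not necessarily the readout used in the thesis.

```python
import numpy as np

def soft_argmax(heatmap):
    """Decode a sub-pixel (x, y) center from a 2-D score map by taking the
    softmax-weighted average of pixel coordinates."""
    h = np.asarray(heatmap, dtype=float)
    p = np.exp(h - h.max())      # softmax, shifted for numerical stability
    p /= p.sum()
    ys, xs = np.mgrid[0:h.shape[0], 0:h.shape[1]]
    return float((p * xs).sum()), float((p * ys).sum())

# Synthetic score map: a Gaussian-shaped bump centered at (x=3, y=2).
ys, xs = np.mgrid[0:5, 0:7]
heat = -((xs - 3) ** 2 + (ys - 2) ** 2)
cx, cy = soft_argmax(heat)
```

By symmetry of the bump within the grid, the decoded center recovers (3.0, 2.0) exactly, with no quantization to the pixel grid.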
Tudoran, Radu-Marius. "High-Performance Big Data Management Across Cloud Data Centers." Electronic Thesis or Diss., Rennes, École normale supérieure, 2014. http://www.theses.fr/2014ENSR0004.
Full text
The easily accessible computing power offered by cloud infrastructures, coupled with the "Big Data" revolution, is increasing the scale and speed at which data analysis is performed. Cloud computing resources for compute and storage are spread across multiple data centers around the world. Enabling fast data transfers becomes especially important in scientific applications where moving the processing close to data is expensive or even impossible. The main objectives of this thesis are to analyze how clouds can become "Big Data friendly" and to determine the best options for providing data management services able to meet the needs of applications. In this thesis, we present our contributions to improving the performance of data management for applications running on several geographically distributed data centers. We start with aspects concerning the scale of data processing on a single site, and continue with the development of MapReduce-type solutions allowing the distribution of computation between several centers. Then, we present a transfer service architecture that optimizes the cost-performance ratio of transfers. This service is operated in the context of real-time data streaming between cloud data centers. Finally, we study the viability, for a cloud provider, of integrating this architecture as a service based on a flexible pricing paradigm, termed "Transfer-as-a-Service".
Anderson, Ross Michael. "Stochastic models and data driven simulations for healthcare operations." Thesis, Massachusetts Institute of Technology, 2014. http://hdl.handle.net/1721.1/92055.
Full text
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 251-257).
This thesis considers problems in two areas of healthcare operations: Kidney Paired Donation (KPD) and scheduling medical residents in hospitals. In both areas, we explore the implications of policy change through high-fidelity simulations. We then build stochastic models to provide strategic insight into how policy decisions affect the operations of these healthcare systems. KPD programs enable patients with living but incompatible donors (referred to as patient-donor pairs) to exchange kidneys with other such pairs in a centrally organized clearing house. Exchanges involving two or more pairs are performed by arranging the pairs in a cycle, where the donor from each pair gives to the patient from the next pair. Alternatively, a so-called altruistic donor can be used to initiate a chain of transplants through many pairs, ending on a patient without a willing donor. In recent years, the use of chains has become pervasive in KPD, with chains now accounting for the majority of KPD transplants performed in the United States. A major focus of our work is to understand why long chains have become the dominant method of exchange in KPD, and how to best integrate their use into exchange programs. In particular, we are interested in the policies that KPD programs use to determine which exchanges to perform, which we refer to as matching policies. First, we devise a new algorithm using integer programming to maximize the number of transplants performed on a fixed pool of patients, demonstrating that matching policies which must solve this problem are implementable. Second, we evaluate the long-run implications of various matching policies, both through high-fidelity simulations and analytic models. Most importantly, we find that: (1) using long chains results in more transplants and reduced waiting time, and (2) the policy of maximizing the number of transplants performed each day is as good as any batching policy.
Our theoretical results are based on introducing a novel model of a dynamically evolving random graph. The analysis of this model uses classical techniques from Erdős–Rényi random graph theory as well as tools from queueing theory, including Lyapunov functions and Little's Law. In the second half of this thesis, we consider the problem of how hospitals should design schedules for their medical residents. These schedules must have the capacity to treat all incoming patients, provide quality care, and comply with regulations restricting shift lengths. In 2011, the Accreditation Council for Graduate Medical Education (ACGME) instituted a new set of regulations on duty hours that restrict shift lengths for medical residents. We consider two operational questions for hospitals in light of these new regulations: will there be sufficient staff to admit all incoming patients, and how will the continuity of patient care be affected, particularly on the first day of a patient's hospital stay, when such continuity is critical? To address these questions, we built a discrete event simulation tool using historical data from a major academic hospital, and compared several policies relying on both long and short shifts. The simulation tool was used to inform staffing-level decisions at the hospital, which was transitioning away from long shifts. Use of the tool led to the following strategic insights. We found that schedules based on shorter, more frequent shifts actually led to a larger admitting capacity. At the same time, such schedules generally reduce the continuity of care by most metrics when the departments operate at normal loads. However, in departments which operate in the critical capacity regime, we found that the continuity of care even improved in some metrics for schedules based on shorter shifts, due to a reduction in the use of overtime doctors. We develop an analytically tractable queueing model to capture these insights. The analysis of this model requires characterizing the steady-state behavior of the fluid limit of a queueing system, and proving a so-called "interchange of limits" result.
by Ross Michael Anderson.
Ph. D.
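The simplest building block of the matching policies in the abstract above is the two-way exchange: pairs (i, j) can swap kidneys when donor i suits patient j and donor j suits patient i. A myopic "match whatever is available today" rule over 2-cycles can be sketched as below. The compatibility matrix is invented, and the thesis's actual algorithm solves the full problem, including long chains, by integer programming rather than greedily.

```python
def greedy_two_cycles(compat):
    """Greedily select disjoint two-way exchanges from a compatibility
    matrix, where compat[i][j] is True when the donor of pair i can give
    to the patient of pair j. A 2-cycle (i, j) requires compatibility in
    both directions. Returns the list of selected exchanges."""
    n = len(compat)
    matched = [False] * n
    exchanges = []
    for i in range(n):
        if matched[i]:
            continue
        for j in range(i + 1, n):
            if not matched[j] and compat[i][j] and compat[j][i]:
                matched[i] = matched[j] = True
                exchanges.append((i, j))
                break
    return exchanges

# Four hypothetical pairs: (0, 2) and (1, 3) are mutually compatible;
# 0 -> 1 is compatible in one direction only, so no 2-cycle there.
compat = [[False, True,  True,  False],
          [False, False, False, True],
          [True,  False, False, False],
          [False, True,  False, False]]
exchanges = greedy_two_cycles(compat)
```

Each selected 2-cycle yields two transplants, so this instance produces four.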
Harris, William Ray. "Anomaly detection methods for unmanned underwater vehicle performance data." Thesis, Massachusetts Institute of Technology, 2015. http://hdl.handle.net/1721.1/98718.
Full text
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 101-102).
This thesis considers the problem of detecting anomalies in performance data for unmanned underwater vehicles (UUVs). UUVs collect a tremendous amount of data, which operators are required to analyze between missions to determine if vehicle systems are functioning properly. Operators are typically under heavy time constraints when performing this data analysis. The goal of this research is to provide operators with a post-mission data analysis tool that automatically identifies anomalous features of performance data. Such anomalies are of interest because they are often the result of an abnormal condition that may prevent the vehicle from performing its programmed mission. In this thesis, we consider existing one-class classification anomaly detection techniques, since labeled training data from the anomalous class is not readily available. Specifically, we focus on two anomaly detection techniques: (1) Kernel Density Estimation (KDE) anomaly detection and (2) Local Outlier Factor. Results are presented for selected UUV systems and data features, and initial findings provide insight into the effectiveness of these algorithms. Lastly, we explore ways to extend our KDE anomaly detection algorithm for various tasks, such as finding anomalies in discrete data and identifying anomalous trends in time-series data.
by William Ray Harris.
S.M.
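The KDE detector named in the abstract above fits a density to normal-only training data and flags test points whose estimated density is low. A one-dimensional Gaussian-kernel sketch is below; the bandwidth and the quantile-based threshold are illustrative choices, not the thesis's tuning, and the data are synthetic.

```python
import numpy as np

def kde_scores(train, test, bandwidth=1.0):
    """Gaussian kernel density estimate of each test point, fit on
    normal-only 1-D training data. Lower density = more anomalous."""
    train = np.asarray(train, dtype=float)[None, :]   # shape (1, n)
    test = np.asarray(test, dtype=float)[:, None]     # shape (m, 1)
    z = (test - train) / bandwidth
    dens = np.exp(-0.5 * z ** 2).sum(axis=1)
    dens /= train.shape[1] * bandwidth * np.sqrt(2.0 * np.pi)
    return dens

def flag_anomalies(train, test, bandwidth=1.0, quantile=0.05):
    """Flag test points whose density falls below the `quantile` level of
    the training points' own densities (a one-class decision rule)."""
    thresh = np.quantile(kde_scores(train, train, bandwidth), quantile)
    return kde_scores(train, test, bandwidth) < thresh

train = np.linspace(-1.0, 1.0, 50)     # 'normal' operating data
flags = flag_anomalies(train, [0.1, 8.0], bandwidth=0.5)
```

The in-distribution point 0.1 passes, while the far-out point 8.0 is flagged.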
Uichanco, Joline Ann Villaranda. "Data-driven optimization and analytics for operations management applications." Thesis, Massachusetts Institute of Technology, 2013. http://hdl.handle.net/1721.1/85695.
Full text
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 163-166).
In this thesis, we study data-driven decision making in operations management contexts, with a focus on both theoretical and practical aspects. The first part of the thesis analyzes the well-known newsvendor model but under the assumption that, even though demand is stochastic, its probability distribution is not part of the input. Instead, the only information available is a set of independent samples drawn from the demand distribution. We analyze the well-known sample average approximation (SAA) approach, and obtain new tight analytical bounds on the accuracy of the SAA solution. Unlike previous work, these bounds match the empirical performance of SAA observed in extensive computational experiments. Our analysis reveals that a distribution's weighted mean spread (WMS) impacts SAA accuracy. Furthermore, we are able to derive a distribution-parameter-free bound on SAA accuracy for log-concave distributions through an innovative optimization-based analysis which minimizes WMS over the distribution family. In the second part of the thesis, we use spread information to introduce new families of demand distributions under the minimax regret framework. We propose order policies that require only a distribution's mean and spread information. These policies have several attractive properties. First, they take the form of simple closed-form expressions. Second, we can quantify an upper bound on the resulting regret. Third, under an environment of high profit margins, they are provably near-optimal under mild technical assumptions on the failure rate of the demand distribution. And finally, the information that they require is easy to estimate with data. We show in extensive numerical simulations that when profit margins are high, even if the information in our policy is estimated from (sometimes few) samples, they often manage to capture at least 99% of the optimal expected profit.
The third part of the thesis describes both applied and analytical work in collaboration with a large multi-state gas utility. We address a major operational resource allocation problem in which some of the jobs are scheduled and known in advance, and some are unpredictable and have to be addressed as they appear. We employ a novel decomposition approach that solves the problem in two phases. The first is a job scheduling phase, where regular jobs are scheduled over a time horizon. The second is a crew assignment phase, which assigns jobs to maintenance crews under a stochastic number of future emergencies. We propose heuristics for both phases using linear programming relaxation and list scheduling. Using our models, we develop a decision support tool for the utility which is currently being piloted in one of the company's sites. Based on the utility's data, we project that the tool will result in a 55% reduction in overtime hours.
by Joline Ann Villaranda Uichanco.
Ph. D.
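The SAA estimator analyzed in the abstract above has a well-known closed form for the newsvendor: the optimal order is the empirical demand quantile at the critical ratio (price - cost) / price. The sketch below shows that estimator only, not the thesis's accuracy bounds; the samples and prices are invented.

```python
import math

def saa_newsvendor(samples, price, cost):
    """Sample average approximation for the newsvendor problem: return
    the empirical quantile of the demand samples at the critical ratio
    (price - cost) / price, which maximizes the empirical expected
    profit price * min(x, D) - cost * x."""
    s = sorted(samples)
    ratio = (price - cost) / price          # critical fractile
    k = math.ceil(ratio * len(s)) - 1       # index of the empirical quantile
    return s[max(k, 0)]

# Four demand observations; critical ratio 0.7 picks the 3rd order statistic.
q = saa_newsvendor([4, 8, 6, 10], price=10.0, cost=3.0)
```

As the margin rises, the critical ratio and therefore the ordered quantity increase.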
Menjoge, Rajiv (Rajiv Shailendra). "New procedures for visualizing data and diagnosing regression models." Thesis, Massachusetts Institute of Technology, 2010. http://hdl.handle.net/1721.1/61190.
Full text
Cataloged from PDF version of thesis.
Includes bibliographical references (p. 97-103).
This thesis presents new methods for exploring data using visualization techniques. The first part of the thesis develops a procedure for visualizing the sampling variability of a plot. The motivation behind this development is that reporting a single plot of a sample of data without a description of its sampling variability can be uninformative and misleading in the same way that reporting a sample mean without a confidence interval can be. Next, the thesis develops a method for simplifying large scatter plot matrices, using similar techniques as the above procedure. The second part of the thesis introduces a new diagnostic method for regression called backward selection search. Backward selection search identifies a relevant feature set and a set of influential observations with good accuracy, given the difficulty of the problem, and additionally provides a description, in the form of a set of plots, of how the regression inferences would be affected with other model choices which are close to optimal. This description is useful because an observation that one analyst identifies as an outlier could be identified as the most important observation in the data set by another analyst. The key idea behind backward selection search has implications for methodology improvements beyond the realm of visualization. This is described following the presentation of backward selection search. Real and simulated examples, provided throughout the thesis, demonstrate that the methods developed in the first part of the thesis will improve the effectiveness and validity of data visualization, while the methods developed in the second half of the thesis will improve analysts' abilities to select robust models.
by Rajiv Menjoge.
Ph.D.
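The feature-deletion skeleton underlying backward selection can be sketched as follows: starting from all columns, repeatedly drop the feature whose removal increases the residual sum of squares the least, recording the visited feature sets. The thesis's backward selection search also removes influential observations and renders the path as diagnostic plots; this sketch, on synthetic data, shows only the feature side.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

def backward_selection_path(X, y):
    """Backward elimination over the columns of X: at each step, delete
    the feature whose removal least increases the RSS, and record every
    visited feature set (as tuples of column indices)."""
    active = list(range(X.shape[1]))
    path = [tuple(active)]
    while len(active) > 1:
        drop = min(active,
                   key=lambda j: rss(X[:, [k for k in active if k != j]], y))
        active.remove(drop)
        path.append(tuple(active))
    return path

# Synthetic data: y depends strongly on x0, weakly on x1, not at all on x2.
n = 20
x0 = np.arange(1.0, n + 1)                         # strong signal
x1 = np.where(np.arange(n) % 2 == 0, 1.0, -1.0)    # weak signal
x2 = np.sin(np.arange(n))                          # irrelevant column
X = np.column_stack([x0, x1, x2])
y = 3.0 * x0 + 0.5 * x1
path = backward_selection_path(X, y)
```

The recorded path drops the irrelevant column first and the weak signal second, ending at the dominant feature.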