Dissertations / Theses on the topic 'Warehouse performance'
Consult the top 50 dissertations / theses for your research on the topic 'Warehouse performance.'
Schefczyk, Michael. "Warehouse performance analysis: techniques and applications." Thesis, Georgia Institute of Technology, 1990. http://hdl.handle.net/1853/25125.
Mathews, Reena. "Simple Strategies to Improve Data Warehouse Performance." NCSU, 2004. http://www.lib.ncsu.edu/theses/available/etd-05172004-213304/.
Francis, Jimmy E. "A data warehouse architecture for DoD healthcare performance measurements." Thesis, Monterey, Calif. : Springfield, Va. : Naval Postgraduate School ; Available from National Technical Information Service, 1999. http://handle.dtic.mil/100.2/ADA369640.
"September 1999". Thesis advisor(s): Daniel R. Dolk, Gregory Hildebrandt. Includes bibliographical references (p. 119). Also available online.
Agrawal, Vikas R. "Data warehouse operational design : view selection and performance simulation." Toledo, Ohio : University of Toledo, 2005. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=toledo1104773641.
Typescript. "Submitted as partial fulfillment of the requirements for the Doctor of Philosophy degree in Manufacturing Management and Engineering." "A dissertation entitled"--at head of title. Title from title page of PDF document. Bibliography: p. 113-118.
CARVALHO, ELAINE ALVES DE. "HEURISTICS FOR DATA WAREHOUSE REQUIREMENTS ELICITATION USING PERFORMANCE INDICATORS." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2009. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=15136@1.
Organizations need to change and evolve, and to do so they must make the right decisions. Companies increasingly rely on Information Technology (IT) as a fundamental part of their decision support. An essential IT component for improving decision making is the data warehouse, and to fulfill its role well it must be well defined. Although various approaches try to improve the identification of data warehouse requirements, few explore the contributions of Business Process Engineering (BPE) to requirements gathering. This dissertation studies how to improve data warehouse requirements elicitation by combining performance indicators with business processes. A set of heuristics is proposed to guide the identification of performance measures and the discovery of data warehouse requirements, and the heuristics are applied to a case to illustrate the suggested approach.
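As a rough illustration of how a performance indicator can be turned into candidate data warehouse requirements, the sketch below decomposes one indicator into a fact table, measures and dimensions. The indicator, the three heuristics and all names are invented for illustration; they are not the heuristics proposed in the dissertation.

```python
# Illustrative sketch only: a simplified decomposition of a performance
# indicator into candidate data warehouse elements, in the spirit of
# indicator-driven requirements elicitation. The heuristics below are
# hypothetical examples, not the rules proposed in the dissertation.

from dataclasses import dataclass, field

@dataclass
class Indicator:
    name: str
    formula: str          # how the business computes it
    grain: list           # the levels at which it is reported
    source_events: list   # business events that feed it

@dataclass
class DwRequirement:
    fact_table: str
    measures: list
    dimensions: list = field(default_factory=list)

def derive_requirement(ind: Indicator) -> DwRequirement:
    """Map an indicator to a candidate fact table, measures and dimensions."""
    # Heuristic 1: each source event suggests a fact table.
    fact = ind.source_events[0] + "_fact"
    # Heuristic 2: terms in the formula suggest numeric measures.
    measures = [t.strip() for t in ind.formula.split("/")]
    # Heuristic 3: each reporting level suggests a dimension.
    dims = [g + "_dim" for g in ind.grain]
    return DwRequirement(fact, measures, dims)

otif = Indicator(
    name="On-time in-full rate",
    formula="orders_delivered_on_time_in_full / orders_delivered",
    grain=["warehouse", "customer", "month"],
    source_events=["order_delivery"],
)
print(derive_requirement(otif))
```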
Hedler, Francielly. "Global warehouse management : a methodology to determine an integrated performance measurement." Thesis, Université Grenoble Alpes (ComUE), 2015. http://www.theses.fr/2015GREAI082/document.
The growing complexity of warehouse operations has led companies to adopt a large number of indicators, making their management increasingly difficult. It can be hard for managers to evaluate the overall performance of logistic systems, including the warehouse, because assessing the interdependence of indicators with distinct objectives is complex (e.g. a cost indicator should decrease while a quality indicator should be maximized). This can bias the manager's analysis of global warehouse performance. In this context, this thesis develops a methodology for integrated warehouse performance measurement. It encompasses four main steps: (i) the development of an analytical model of the performance indicators usually used for warehouse management; (ii) the definition of indicator relationships, both analytically and statistically; (iii) the aggregation of these indicators into an integrated model; (iv) the proposition of a scale to assess the evolution of warehouse performance over time according to the integrated model results. The methodology is applied to a theoretical warehouse to demonstrate its use. The indicators come from the literature, and a database is generated so that the mathematical tools can be applied. The Jacobian matrix is used to define indicator relationships analytically, and principal component analysis is used to aggregate the indicators statistically. The final aggregated model comprises 33 indicators assigned to six components, which are combined into a global performance indicator through a weighted average of the components. A scale for the global performance indicator is developed with an optimization approach that determines its upper and lower bounds. The usability of the integrated model is tested on two different warehouse performance situations, and insights about the resulting warehouse performance are discussed. We conclude that the proposed methodology reaches its objective, providing a decision support tool that lets managers handle global warehouse performance more efficiently without neglecting important information from individual indicators.
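As a hedged sketch of the kind of statistical aggregation this abstract describes, the snippet below standardizes a set of warehouse indicators, extracts principal components and combines them into a single global score. The synthetic data, the variance-based weights and the resulting scale are illustrative assumptions, not the model developed in the thesis.

```python
# Hedged sketch: aggregating correlated warehouse indicators into a single
# global score with principal component analysis, loosely following the idea
# of grouping indicators into components and taking a weighted average.
# The data, the choice of six components and the variance-based weights are
# illustrative assumptions, not the thesis' actual model.

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
# 200 monthly observations of 33 (synthetic) warehouse indicators.
X = rng.normal(size=(200, 33))

Z = StandardScaler().fit_transform(X)        # indicators on a common scale
pca = PCA(n_components=6).fit(Z)             # six aggregated components
components = pca.transform(Z)                # component scores per period

weights = pca.explained_variance_ratio_      # weight components by variance
weights = weights / weights.sum()
global_indicator = components @ weights      # weighted average per period

print(global_indicator[:5])
```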
Tashakor, Ghazal. "Delivering Business Intelligence Performance by Data Warehouse and ETL Tuning." Thesis, Mittuniversitetet, Institutionen för informationsteknologi och medier, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-20062.
Abid, Abbas Syhood. "Job satisfaction and job performance of warehouse employees in Iraqi industry." Thesis, University of Glasgow, 1990. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.309615.
Daraei, Maryam. "Warehouse Redesign Process: A case study at Enics Sweden AB." Thesis, Mälardalens högskola, Akademin för innovation, design och teknik, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-19508.
Breslas, Grigore. "Riqualificazione del Data Warehouse UniBO: un dashboard per il Piano Strategico d'Ateneo." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2020. http://amslaurea.unibo.it/20756/.
Makaci, Mourad. "La gestion des entrepôts mutualisés et leurs impacts dans les chaînes logistiques." Thesis, Université Grenoble Alpes (ComUE), 2018. http://www.theses.fr/2018GREAG002.
Warehouse pooling is a research field within collaborative logistics, recently introduced by various actors to improve supply chain performance. Mass-market retailing requirements, together with rising storage and transport costs, oblige companies to review their distribution strategies in more detail. Grounded in a post-positivist paradigm, this thesis answers two main research questions: What are the characteristics of pooled warehouses? What is the impact of a pooled warehouse on the supply chain? We developed an approach combining a qualitative exploratory study of seven cases located in France with a quantitative method based on flow simulation. The qualitative study identified the main specificities of pooled warehouses and proposed a typology based on two dimensions: degree of collaboration and degree of dynamics. It also identified new performance indicators, key success factors, the main sources of uncertainty, and the risks related to pooled warehouse implementation. The impact of a pooled warehouse on supply chain performance was analyzed in more detail in one of the seven cases, comparing four flow configurations with two replenishment policies, for which we proposed a hybridization, and two demand profiles. The simulation results show that a pooled warehouse delivers its full benefit when it is associated with transport pooling. Furthermore, the hybrid replenishment policy appears more advantageous than the classical reorder-point and calendar replenishment policies. Finally, this thesis shows that the shared warehouse context offers interesting research perspectives on the link between practice and research, the creation of knowledge in operations management, and the impact of pooling on the performance of logistics chains.
SILVA, LIVIA FONSECA DE MEDEIROS. "THE IMPACT OF WAREHOUSE MANAGEMENT SYSTEM (WMS) ON THE LOGISTICS PERFORMANCE INDICATORS: IMPLEMENTATION IN PHARMACEUTICAL DISTRIBUTION CENTER." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2012. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=21339@1.
Full textCOORDENAÇÃO DE APERFEIÇOAMENTO DO PESSOAL DE ENSINO SUPERIOR
PROGRAMA DE SUPORTE À PÓS-GRADUAÇÃO DE INSTS. DE ENSINO
Economic competitiveness has led companies to seek new solutions through Information Technology (IT) to achieve advantages over their competitors. IT enables companies to gain greater control over their processes and thereby achieve productivity gains, reduce operating costs and improve customer satisfaction. The present study aims to present and analyze the implementation of a Warehouse Management System (WMS) for logistics activities in a distribution center (DC). A case study is conducted, consisting of a post-implementation review of the system in a pharmaceutical distribution center through the application of a logistics performance measurement system, in order to identify improvements and gains in warehouse activities. Within this context, the risks of and resistance to the WMS implementation are also identified. The methodology of this thesis is qualitative and exploratory in nature, consisting of a literature review, field research, visits to the company under study and interviews. The result is an analysis of performance improvement after applying the WMS, showing advantages of IT for distribution logistics such as a higher level of accuracy and reduced processing time.
Ali, Raman. "Root Cause Analysis for In-Transit Time Performance : Time Series Analysis for Inbound Quantity Received into Warehouse." Thesis, Umeå universitet, Institutionen för matematik och matematisk statistik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-184062.
GUPTA, ASHIMA. "PERFORMANCE COMPARISON OF PROPERTY MAP INDEXING AND BITMAP INDEXING FOR DATA WAREHOUSING." University of Cincinnati / OhioLINK, 2002. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1037976189.
Perkins, Charles F. "Investigating the Perceived Influence of Data Warehousing and Business Intelligence Maturity on Organizational Performance: A Mixed Methods Study." NSUWorks, 2017. http://nsuworks.nova.edu/gscis_etd/1023.
D'Angela, Martina. "Layout design e performance per sistemi di stoccaggio non convenzionali: il caso “Diamond”." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021.
Ricca, Rosellini Fabio. "Calcolo di indicatori di performance aziendale in contesto Big Data." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2018. http://amslaurea.unibo.it/15950/.
Cyrus, Sam. "Fast Computation on Processing Data Warehousing Queries on GPU Devices." Scholar Commons, 2016. http://scholarcommons.usf.edu/etd/6214.
Salin, Gustafsson Martin, and Carl Frost. "Operational management through key performance Indicators : A case study performed at the warehouses at Fresenius Kabi." Thesis, Uppsala universitet, Industriell teknik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-357294.
Ronovský, Jan. "Tvorba metodiky pro výkonové srovnání databázových systémů datových skladů." Master's thesis, Vysoká škola ekonomická v Praze, 2017. http://www.nusl.cz/ntk/nusl-359103.
Yang, Aaron. "Mandate of Heaven: An Analysis of China's Government Disaster Response and CCP Performance Legitimacy." Scholarship @ Claremont, 2017. http://scholarship.claremont.edu/cmc_theses/1614.
Traikova, Aneta. "A Systematic Approach for Tool-Supported Performance Management of Engineering Education." Thesis, Université d'Ottawa / University of Ottawa, 2019. http://hdl.handle.net/10393/39879.
Corfiati, Matteo. "Progettazione di un sistema proattivo per l'identificazione di aggregati." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2015. http://amslaurea.unibo.it/8478/.
Rahman, Shahbaaz. "The Impact of Adopting “Business Intelligence (BI)” in Organizations." Thesis, Uppsala universitet, Informationssystem, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-153204.
Gavioli, Giulio. "Sales & Operations Planning in Bonfiglioli Riduttori." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2018.
Janata, Pavel. "Možnosti CPM řešení v bankovnictví." Master's thesis, Vysoká škola ekonomická v Praze, 2009. http://www.nusl.cz/ntk/nusl-15494.
Ranarifidy, Harison. "La gestion de la diversité culturelle des équipes dans les entrepôts logistiques : lien entre diversité culturelle et performance." Thesis, Aix-Marseille, 2019. http://www.theses.fr/2019AIXM0062.
Our idea is to elaborate a conceptual research model of the diversity/performance link derived from the literature, which concerns top management. This model is then transposed, through a GIOIA-method analysis of the verbatim transcripts of our interviews with warehouse managers, into a conceptual model concerning low-skilled warehouse employment. To ensure that the cultural diversity of warehouse teams becomes a source of performance, we recommend a global approach: implementing, within the warehouse's parent company, managerial systems composed of a cultural diversity policy designed to sustain change over time, cohesive management attentive to cultural dimensions, and leadership attentive to cultural diversity. Within the warehouse, HRM practices specific to the logistics warehouse are implemented to strengthen organizational involvement and to take into account the aforementioned diversity management levers.
Pavlovic, Anica, and Sara Johnsson. "Kalkylmodellering : En studie om hur en kalkylmodell kan konstrueras för att göra ett lagerkostnadsindex användbart i företag med geografiskt spridda lager." Thesis, Linnéuniversitetet, Institutionen för ekonomistyrning och logistik (ELO), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-45211.
Background: Globalization has created a growing need for companies to remain competitive, and it is becoming more important to manage costs effectively. Based on a case company, problems were identified regarding the measurement and comparability of warehouse cost management across geographically dispersed countries. Currently there is no method for comparing the cost efficiency of warehouses in different countries, because country-specific characteristics create differences; achieving comparability requires adapted financial control. Purpose: The purpose is to develop a model that enables a homogeneous measurement for the management of warehouse costs in different countries, with the intention of making geographically dispersed warehouses comparable. The case study develops a complement to existing decision material, with the aim of supporting the organization's decision-making process for warehouse optimization. Method: The methodological choices made during the study are motivated, the existing instrument is evaluated, and the model is then implemented against criteria for content and process characteristics in order to achieve the aims of the study. Material was collected through the multinational company and through semi-structured interviews with three of its employees. The choice of theory and empirical content is justified, and the analysis discusses whether the development of the model leads to the study's conclusion. Conclusion: Organizations have previously used performance measurements for the financial control of warehouse costs, but increased globalization has complicated comparability because country-specific variables affect costs. A warehouse cost index was therefore developed through a model that uses a standardized basket of goods relatable to warehouse costs. The capital structure is neutralized and internal benchmarking is enabled, so that warehouse costs can be compared in terms of how cost efficient each warehouse is.
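A minimal sketch of the basket idea described in the conclusion above: an identical basket of cost drivers is priced with local unit costs and expressed as an index against a base country. The basket items, quantities and unit costs are invented; the study's actual model is not reproduced.

```python
# Hedged sketch: a warehouse cost index built from a standardized "basket"
# of cost drivers, so that geographically dispersed warehouses can be
# compared on cost efficiency. Basket contents, quantities and unit costs
# are invented for illustration; the study's actual model is not reproduced.

# Standardized basket: identical quantities of each cost driver per warehouse.
basket = {"labour_hours": 10_000, "sqm_rented": 5_000, "pallet_moves": 50_000}

# Local unit costs (e.g. in EUR) per country.
unit_costs = {
    "SE": {"labour_hours": 32.0, "sqm_rented": 9.5, "pallet_moves": 0.80},
    "PL": {"labour_hours": 11.0, "sqm_rented": 4.2, "pallet_moves": 0.35},
}

def basket_cost(country: str) -> float:
    return sum(qty * unit_costs[country][item] for item, qty in basket.items())

base = "SE"  # index = 100 for the base country
for country in unit_costs:
    index = 100 * basket_cost(country) / basket_cost(base)
    print(f"{country}: warehouse cost index {index:.1f}")
```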
Custodio, Flavio Augusto. "Uso do data mining no estabelecimento de relacionamentos entre medidas de desempenho." Universidade Federal de São Carlos, 2004. https://repositorio.ufscar.br/handle/ufscar/3763.
Full textUniversidade Federal de Sao Carlos
This work aims to propose a method to analyze the relationships between performance measures in a Performance Measurement System using historical performance data storaged in a datawarehouse or operational data store. There is a problem in the performance measurement area that it doesn t have methods to create relationships models between performance measures. The present methods that we have in academic researches don t help to build the relationships concerning historical performance data. Therefore, there is a trend to build the relationship between performance measures to reflect the desirable future, but it is also true that we have to learn about the past actions. Nowadays, with the increasing complexity in the organizations environment it is very difficulty to handle historical data about performance to identify relationship patterns without using concepts, techniques and tools of the Information Technology (IT) field. The variables contained in the performance measurement models are increasing continually so it is important to understand the complex net of relationships between performance measures in an organization. The stakeholders in the organization see the relationships between performance measures as trivial, but this doesn t help because the relationships are partial and subjective and the stakeholders that articulate the variables in most of the cases are accountable by the performance. It s expected that decision makers participate and share their models of relationships between performance measures and that it be the most comprehensive as possible. This work is important because it proposes to use the data mining philosophy to help building a method to understand relationship between performance measures with performance historical data. Hence, it will be possible to define and communicate the relationships between performance measures to the users of the organization and increase the use of performance measurement models. The proposed method presents a process to build and find relationships between performance measures data using data mining techniques. The IDEF0 procedure was used to present our approach.
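As one hedged illustration of mining relationships between performance measures from historical data, the snippet below screens a synthetic performance history for strongly correlated measure pairs. This is only one possible data-mining step; the dissertation's full IDEF0-documented method is not reproduced here.

```python
# Hedged sketch: screening historical performance data for candidate
# relationships between measures, as one simple data-mining step.
# The synthetic data and the correlation threshold are illustrative; the
# dissertation's full IDEF0-modelled process is not reproduced here.

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 104  # two years of weekly snapshots
picking_errors = rng.poisson(5, n)
on_time_delivery = 0.98 - 0.004 * picking_errors + rng.normal(0, 0.01, n)
inventory_accuracy = rng.normal(0.97, 0.01, n)

history = pd.DataFrame({
    "picking_errors": picking_errors,
    "on_time_delivery": on_time_delivery,
    "inventory_accuracy": inventory_accuracy,
})

corr = history.corr()
threshold = 0.5
pairs = [
    (a, b, round(corr.loc[a, b], 2))
    for i, a in enumerate(corr.columns)
    for b in corr.columns[i + 1:]
    if abs(corr.loc[a, b]) >= threshold
]
print(pairs)   # candidate measure-to-measure relationships to investigate
```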
Lepori, Elvia. "Conception d'un système de mesure de la performance pour la réorganisation des activités d'entrepôt : quelle cohérence avec le système de contrôle de gestion ?" Thesis, Strasbourg, 2016. http://www.theses.fr/2016STRAB003.
Third-party logistics providers (3PL) seek performance by regularly reorganizing their warehouse operations. Few researchers study performance measurement systems (PMS) dedicated to 3PLs, and researchers in warehouse design tend to study the different operations one by one even though these operations are linked. As far as we know, the literature does not propose any PMS for warehouse operations reorganization. Designing such a PMS also leads to analyzing the consequences for the management control system, studied here through Simons' levers of control. An intervention research is conducted in a French 3PL, FM Logistic. Our contribution is the design of a performance measurement system in the form of a problem graph that links knowledge put forward by the 3PL with knowledge reported in the literature. The PMS is designed using semantics and syntax inspired by the TRIZ problem graph. Its design makes it possible to analyze the development of interactivity, and the results show diagnostic control systems evolving towards interactive use.
Scholz, Martin. "Řízení podnikové výkonnosti a její implementace v rámci personálních informačních systémů." Master's thesis, Vysoké učení technické v Brně. Fakulta podnikatelská, 2014. http://www.nusl.cz/ntk/nusl-224333.
Full textPark, Jae Young. "Performance measures for carousel storage/retrieval system." Diss., Georgia Institute of Technology, 1992. http://hdl.handle.net/1853/24528.
Full textBozer, Yavuz Ahmet. "Optimizing throughput performance in designing order picking systems." Diss., Georgia Institute of Technology, 1985. http://hdl.handle.net/1853/25588.
Full textColace, Alessandro. "Progettazione e Sviluppo di un Sistema di Proactive Alerting all'interno di una piattaforma di Big Data Analytics." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2016.
Find full textScharfstein, Daniel Oscar. "Analytical performance measures for the miniload automated storage/retrieval system." Thesis, Georgia Institute of Technology, 1990. http://hdl.handle.net/1853/25192.
Full textXu, Xia. "Impact of Integrating Zone Bypass Conveyor on the Performance of a Pick-To-Light Order Picking System." Thesis, North Dakota State University, 2012. http://hdl.handle.net/10365/19304.
Full textPudleiner, David Burl. "Using uncertainty and sensitivity analysis to inform the design of net-zero energy vaccine warehouses." Thesis, Georgia Institute of Technology, 2014. http://hdl.handle.net/1853/52232.
Full textPavlová, Petra. "Měření výkonnosti podniku." Master's thesis, Vysoká škola ekonomická v Praze, 2012. http://www.nusl.cz/ntk/nusl-165086.
Full textHabibulla, Murtuza. "Analyzing the performance of an order accumulation and sortation system using simulation a design of experiments approach." Ohio : Ohio University, 2001. http://www.ohiolink.edu/etd/view.cgi?ohiou1173895842.
Full textMatusz, Karen L. "Implementing a Grand Strategy system—by what method: a case/field study of National Grocers' Peterborough distribution warehouse's Grand Strategy System effort." Thesis, Virginia Tech, 1995. http://hdl.handle.net/10919/40631.
Full textBoussahoua, Mohamed. "Optimisation de performances dans les entrepôts de données distribués NoSQL en colonnes." Thesis, Lyon, 2020. http://www.theses.fr/2020LYSE2007.
The work presented in this thesis proposes approaches for building data warehouses (DWs) with the columnar NoSQL model. The use of NoSQL models is motivated by the advent of big data and the inability of the relational model, usually used to implement DWs, to scale with the data. NoSQL models are suited to storing and managing massive data; they were designed for databases whose storage model is the key/value pair, and other models then appeared to account for data variability: column-oriented, document-oriented and graph-oriented. We use the column-oriented NoSQL model for building massive DWs because it is well suited to decisional queries, which are defined over a set of warehouse columns (measures and dimensions). Column-family NoSQL databases offer storage techniques that are well adapted to DWs, and several implementation scenarios are possible. This thesis presents new solutions for the logical and physical modeling of columnar NoSQL data warehouses. We propose a logical model called NLM (Naive Logical Model) to represent a column-oriented NoSQL DW and enable better management by columnar NoSQL DBMSs. We propose a method for building a distributed DW on a column-family NoSQL database, based on a strategy that groups the attributes of the fact and dimension tables into column families; for this purpose we use two algorithms, a meta-heuristic (Particle Swarm Optimization, PSO) and k-means. Furthermore, we propose a method for building an efficient distributed DW inside column-family NoSQL DBMSs based on association rules, which yield groups of attributes that are frequently used together in the workload; the partition keys (RowKey) needed to distribute data over the cluster nodes are then composed from these attribute groups. To validate our contributions we developed a software tool called RDW2CNoSQL (Relational Data Warehouse to Columnar NoSQL) for building a distributed data warehouse on a column-family NoSQL database, and we conducted several tests that show the effectiveness of the proposed methods. Our experiments suggest that defining good data partitioning and placement schemes during the implementation of the data warehouse with NoSQL HBase significantly increases computation and querying performance.
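A simplified sketch of the attribute-grouping idea: attributes are clustered into candidate column families according to how often they are used together in the query workload, here with k-means over a query/attribute usage matrix. The workload, attribute names and number of families are assumptions, and the PSO and association-rule variants described in the thesis are not shown.

```python
# Hedged sketch: grouping warehouse attributes into column families from a
# query workload, using k-means over a query/attribute usage matrix.
# The workload, attribute names and number of families are illustrative
# assumptions; the thesis also explores PSO and association-rule variants.

import numpy as np
from sklearn.cluster import KMeans

attributes = ["sale_amount", "quantity", "product_name", "product_category",
              "store_city", "store_country", "date_month", "date_year"]
workload = [  # which attributes each query touches (1 = used)
    [1, 1, 0, 1, 0, 0, 1, 0],
    [1, 0, 0, 1, 0, 0, 1, 1],
    [0, 1, 1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 1, 1, 0, 1],
    [1, 0, 0, 0, 1, 1, 0, 0],
]

# Cluster attributes by their usage pattern across queries (columns -> rows).
usage = np.array(workload).T
labels = KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(usage)

families = {}
for attr, family in zip(attributes, labels):
    families.setdefault(f"cf{family}", []).append(attr)
print(families)   # candidate column families for an HBase-style schema
```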
Arres, Billel. "Optimisation des performances dans les entrepôts distribués avec Mapreduce : traitement des problèmes de partionnement et de distribution des données." Thesis, Lyon, 2016. http://www.theses.fr/2016LYSE2012.
In this manuscript, we address the problems of data partitioning and distribution for large-scale data warehouses distributed with MapReduce. First, for data distribution, we propose a strategy to optimize data placement on distributed systems based on the collocation principle. The objective is to improve query performance by defining an intentional data distribution schema that reduces the amount of data transferred between nodes during processing, specifically during MapReduce's shuffle phase. Second, we propose a new approach to improve data partitioning and placement in distributed file systems, especially Hadoop-based systems, the standard implementation of the MapReduce paradigm. The aim is to overcome the default partitioning and placement policies, which do not take any characteristics of relational data into account. Our proposal proceeds in two steps: based on the query workload, it defines an efficient partitioning schema, and the system then defines a data distribution schema that best meets users' needs by collocating related data blocks on the same or nearby nodes. The objective is to optimize query execution and parallel processing performance by improving data access. Our third proposal addresses workload dynamicity, since users' analytical needs evolve over time. We propose the use of multi-agent systems (MAS) as an extension of our partitioning and placement approach. Exploiting the autonomy and self-control that characterize MAS, we developed a platform that automatically defines new distribution schemas as new queries arrive and rebalances the data accordingly. This relieves the system administrator of the burden of managing load balance, in addition to improving query performance through careful partitioning and placement policies. Finally, to validate our contributions, we conducted a set of experiments evaluating the different approaches proposed in this manuscript. We study the impact of intentional data partitioning and distribution on the data warehouse loading phase, the execution of analytical queries, OLAP cube construction, and load balancing. We also define a cost model that allowed us to evaluate and validate the partitioning strategy proposed in this work.
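A hedged sketch of the collocation principle mentioned above: fact and dimension rows are hash-partitioned on the same join key so that rows that join together land on the same node, reducing the data moved during the shuffle. The node count, tables and hash function are illustrative assumptions, not the placement algorithm of the thesis.

```python
# Hedged sketch: hash co-partitioning of fact and dimension rows on the join
# key, so rows that join together are placed on the same node and less data
# moves during the MapReduce shuffle. Node count and records are illustrative;
# this is a simplification of the placement strategies studied in the thesis.

from collections import defaultdict

NUM_NODES = 4
node_of = lambda key: sum(key.encode()) % NUM_NODES   # same function for both tables

facts = [  # (order_id, customer_key, amount)
    (1, "C10", 120.0), (2, "C11", 75.5), (3, "C10", 310.0), (4, "C12", 42.0)]
dimension = [  # (customer_key, country)
    ("C10", "PT"), ("C11", "FR"), ("C12", "DZ")]

placement = defaultdict(lambda: {"facts": [], "dims": []})
for row in facts:
    placement[node_of(row[1])]["facts"].append(row)
for row in dimension:
    placement[node_of(row[0])]["dims"].append(row)

for node, data in sorted(placement.items()):
    print(node, data)   # joins on customer_key can now run node-locally
```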
Rust, Julie. "Data warehouse query performance." 2002. http://emp3.hbg.psu.edu/theses/available/etd-01142003-231325/.
Full textChen, Ming Chen, and 陳明辰. "Sprinkler Performance Design in Warehouse." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/94445462536840414828.
Full text長榮大學
職業安全與衛生研究所(在職專班)
96
Warehouse fires cause serious damage when the sprinkler system does not provide appropriate protection. In Taiwan, a flow of 80 lpm per standard sprinkler head was required for warehouse installations in 1995, and the requirement was updated to a water density of 10 lpm/m2 in 2004 by the National Fire Agency. These sprinkler design requirements were not validated with test or simulation data. In this research, FDS (Fire Dynamics Simulator) is used to simulate the required water density for wholesale stores in Tainan. The simulations indicate that a water density of 25 lpm/m2 is required to control a fire in a vertical wood configuration, while 15.7 lpm/m2 is necessary for a plastic commodity fire.
Chen, Po-chia, and 陳柏嘉. "Performance of Evaluation on Warehouse with Semi-Computer-Aided." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/45086316031313744762.
Full text義守大學
工業工程與管理學系碩士班
93
In today's vigorous logistics environment, the importance of warehouse management is rising day by day. Inventory management, storage location assignment, space planning and stock quality have all become matters that every warehouse administrator must attend to. Good warehouse management speeds up shipping, reduces a company's inventory cost, prevents product quality from deteriorating due to environmental conditions during storage, improves customer satisfaction, and increases the company's profit and reputation. At present, small and medium-sized enterprises still rely on semi-automated warehousing for shelving, picking, packing and shipping, while computers only record product quantities and storage positions. In this semi-computer-aided setting, product types and quantities are taken from the central computer and a PC is then used to evaluate the performance of the storage space. This research uses historical warehouse-entry quantities and simulates future incoming quantities as normally distributed random values in Microsoft Excel. By simulating the quantity in storage over time, it estimates the maximum number of pallets the storage space must hold, the number of storage locations required for the indicated quantities, the preferred number of shelf levels, and the maximum pallet count that can be assigned to the shelf locations. Through numerical simulation, a best option is found among the feasible solutions. The aim is to provide an effective way to predict and assess storage requirements and to improve the performance of warehouse management.
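To make the simulation idea concrete, the sketch below runs a spreadsheet-style Monte Carlo simulation of stock levels with normally distributed inbound and outbound quantities and reports a high-percentile peak stock and the implied number of storage locations. All parameters are invented for illustration and do not come from the thesis.

```python
# Hedged sketch: a spreadsheet-style Monte Carlo simulation of warehouse
# stock levels with normally distributed incoming quantities, used to
# estimate the peak storage requirement. All parameters (means, deviations,
# outbound rate, pallets per location) are illustrative assumptions.

import random

random.seed(7)
PALLETS_PER_LOCATION = 2
N_RUNS, N_DAYS = 1000, 60
peaks = []

for _ in range(N_RUNS):
    stock, peak = 400.0, 400.0
    for _ in range(N_DAYS):
        inbound = max(0.0, random.gauss(mu=120, sigma=30))   # daily receipts
        outbound = max(0.0, random.gauss(mu=118, sigma=25))  # daily shipments
        stock = max(0.0, stock + inbound - outbound)
        peak = max(peak, stock)
    peaks.append(peak)

peaks.sort()
p95 = peaks[int(0.95 * N_RUNS) - 1]       # 95th-percentile peak stock
print(f"95% of runs stay below {p95:.0f} pallets "
      f"(~{p95 / PALLETS_PER_LOCATION:.0f} storage locations)")
```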
黃俊鴻. "Performance Measure of Supply Chain Model with Dedicated Warehouse." Thesis, 1999. http://ndltd.ncl.edu.tw/handle/93132069697771505079.
Full text國立臺灣科技大學
管理研究所資訊管理學程
87
Traditionally, enterprises seek to improve their profit by reducing production costs and increasing the value added to their products, but the results are often not significant enough to meet their goals. There is therefore a growing demand for supply chain management, in the sense that the normally separated parts of a supply chain share common resources and goals; many studies indicate that companies can save millions of dollars by implementing such strategies. There is also a growing effort to study global logistics management in the face of international competition, especially in the electronics industry. We have observed that many electronics companies have a few important customers who account for a very high percentage of their orders, so satisfying customer demand while reducing inventory and lead time has become a new challenge. This research constructs an analytical model to study this supply chain problem, focusing on the management of inventory and lead time. In the proposed model, the company (supplier) builds a "Hub", a dedicated customer distribution center, in order to respond to the customers' JIT policy. The analytical model evaluates the benefits to each partner in the supply chain; performance measures such as the total expected holding and shortage costs per time period over the review cycle are studied, and the average amount of inventory in each configuration of the model is compared. Finally, we provide managerial insights into such systems and some strategic guidance for their application.
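As a hedged illustration of the performance measures mentioned above, the snippet below estimates the expected holding and shortage costs per period for a hub operated under a periodic-review, order-up-to policy. The demand distribution, cost rates and order-up-to level are assumptions; the thesis derives these measures analytically rather than by simulation.

```python
# Hedged sketch: estimating the expected holding and shortage cost per
# period for a dedicated customer hub under a periodic-review, order-up-to
# policy. Demand distribution, costs and the order-up-to level are
# illustrative assumptions; the thesis derives these measures analytically.

import random

random.seed(1)
ORDER_UP_TO, HOLD_COST, SHORT_COST = 500, 0.2, 5.0   # per unit per period
N_PERIODS = 100_000

holding = shortage = 0.0
for _ in range(N_PERIODS):
    inventory = ORDER_UP_TO                      # replenished every review
    demand = max(0.0, random.gauss(mu=450, sigma=60))
    end = inventory - demand
    holding += HOLD_COST * max(end, 0.0)
    shortage += SHORT_COST * max(-end, 0.0)

print(f"expected holding cost/period : {holding / N_PERIODS:.2f}")
print(f"expected shortage cost/period: {shortage / N_PERIODS:.2f}")
```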
Pan, Chih-hsien, and 潘志賢. "A Study on the Performance of Barcode System in Warehouse." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/45962912555062166229.
Full text國立高雄第一科技大學
企業管理研究所
100
Warehouse management is crucial for an enterprise to achieve successful production management. Poor warehouse management can result in either inventory shortages or inventory build-up: in practice, shortages can cause lost sales, lower customer satisfaction and production difficulties, while excessive stock ties up the company's capital and increases the likelihood of stock damage. With the rapid development of new technology and the urgent need for enterprise information systems, adopting a barcode system in the enterprise's supporting systems is considered practical and versatile. There is an increasing trend of companies introducing barcodes into their warehouse management systems, because the barcode system can transmit accurate, real-time product information on supply and demand. With fast data transmission, companies can smooth their operations, lower inventory costs and cut down idle stock waste. This study is based on case studies comparing the barcode technology used in warehouse management systems. In addition, in-depth interviews were conducted to investigate the effectiveness of warehouse management in several flexible printed circuit manufacturers, focusing on the decrease in idle raw materials, the rate at which inventory records correspond to physical stock, and the consistent supply of raw materials. The results show that the barcode system indeed helps enterprises obtain real-time, precise stock information and track raw material inventory, and consequently reduces uncertainty for managers. The barcode system not only assists stock management but also makes the data flow in the logistics chain more transparent; the decrease in human error saves wasted human resources and cuts costs. Key words: warehouse management system, barcode system, in-depth interview, KPI (Key Performance Indicator)
PAI, CHIA-CHIAO, and 白嘉喬. "Knowledge Management And System Design On Maintaining Data Warehouse Performance." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/78303526611389080212.
Full text國立中央大學
資訊管理學系碩士在職專班
96
Data warehouse performance tuning has always been an important and challenging task for MIS (Management Information System) departments; because of their scale, enterprise-level data warehouses are especially difficult to tune. This research aims to provide a solution based on knowledge management techniques to improve data warehouse tuning. It collects verified performance tuning knowledge and expert experience, organizes it into a suitable knowledge structure, and builds a knowledge management system. The performance tuning knowledge is divided into several high-level subjects according to the sources of the performance problems, including at least "Operating System," "Database Management System," "Dynamic Report," "Ad Hoc Query" and "Batch Job." Organizing and sharing the knowledge allows users and system administrators to easily find suitable knowledge or experience for the cases at hand, and encourages them to contribute their own expertise after benefiting from the system. The research also quantifies some of the knowledge cases so that raw data from a real-time performance monitor system can drive a real-time knowledge recommendation system for potential problems in the enterprise data warehouse. When a potential performance issue occurs, the quantified conditions associated with the knowledge cases are automatically matched against the monitor data; this identifies the problem and warns the relevant people with suggested actions based on the knowledge in the system. This design feature is rarely seen in knowledge management systems, and it is feasible because about 80% of the jobs in the data warehouse were found to be repetitive.
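A minimal sketch of the matching mechanism described above: quantified conditions attached to knowledge cases are evaluated against real-time monitor readings, and matching cases are recommended. The metrics, thresholds, subjects and actions below are hypothetical examples, not the rules of the actual system.

```python
# Hedged sketch: matching real-time performance-monitor readings against
# quantified conditions attached to knowledge cases, so that the matching
# case (and its suggested action) can be recommended automatically.
# Metrics, thresholds and cases are hypothetical examples.

knowledge_cases = [
    {"subject": "Database Management System",
     "condition": lambda m: m["buffer_cache_hit"] < 0.90,
     "action": "Review buffer pool sizing and heavy ad hoc queries."},
    {"subject": "Batch Job",
     "condition": lambda m: m["batch_runtime_min"] > 240,
     "action": "Check for lock contention and stale optimizer statistics."},
    {"subject": "Operating System",
     "condition": lambda m: m["cpu_util"] > 0.95,
     "action": "Identify runaway sessions or reschedule concurrent loads."},
]

def recommend(metrics: dict) -> list:
    """Return the knowledge cases whose conditions match current metrics."""
    return [c for c in knowledge_cases if c["condition"](metrics)]

sample = {"buffer_cache_hit": 0.87, "batch_runtime_min": 310, "cpu_util": 0.62}
for case in recommend(sample):
    print(f"[{case['subject']}] {case['action']}")
```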
Costa, João Pedro Matos da. "Massively Scalable Data Warehouses with Performance Predictability." Doctoral thesis, 2015. http://hdl.handle.net/10316/27097.
Full textData Warehouses (DW) são ferramentas fundamentais no apoio ao processo de tomada de decisão, e que lidam com grandes volumes de dados cada vez maiores, que normalmente são armazenados usando o modelo em estrela (star schema). No entanto, o resultado das pesquisas e análises deve estar disponível em tempo útil. Contudo, como a complexidade das pesquisas que são submetidas é cada vez maior, com padrões de pesquisa imprevisíveis (ad-hoc), e devido ao aumento do número de pesquisas que são submetidas e executadas simultaneamente, provoca que o tempo de execução das pesquisas seja imprevisível. Mercados concorrenciais requerem que os resultados sejam disponibilizados em tempo útil para ajudar o processo de tomada de decisão. Isto não é apenas uma questão de obter resultados rápidos, mas de garantir que os resultados estarão disponíveis antes das decisões serem tomadas. Estratégias de pré-computação de pesquisas podem ajudar na obtenção de resultados mais rápidos, no entanto a sua utilização é limitada apenas a pesquisas com padrões conhecidos (planeados). Contudo, as consultas com padrões de pesquisa imprevisíveis (ad-hoc) são executadas sem quaisquer garantias de execução de tempo. São vários os fatores que influenciam a capacidade da DW fornecer resultados às pesquisas em tempo útil, tais como a complexidade da pesquisa (seletividade, número de tabelas que necessitam ser relacionadas, os algoritmos de junção e o tamanho das tabelas), a heterogeneidade e a capacidade da infraestrutura de processamento, incluindo a velocidade de leitura de disco, e à memória disponível para efetuar a junção das tabelas. O aumento do volume de dados e do número de pesquisas que estão a ser simultaneamente executadas também influenciam a capacidade do sistema em fornecer tempos de execução previsíveis. Apesar do tempo e esforço despendido para definir infraestruturas de processamento paralelo com capacidade para lidar com o aumento do volume de dados, e melhorar o tempo de execução das pesquisas, estas não permitem garantir a disponibilização atempada dos resultados, particularmente para as pesquisas ad-hoc. O tempo de execução de pesquisas com padrões conhecidos pode ser otimizado através de um conjunto de estratégias e mecanismos auxiliares, tais como a utilização de vistas materializadas e indexes. No entanto, para consultas ad-hoc, tais mecanismos não são uma solução. A imprevisibilidade do padrão de pesquisas origina tempos de execução imprevisíveis, que podem ser incompatíveis com os requisitos de negócio. Além disso, para muitos negócios, o crescente volume de dados condiciona ainda mais a capacidade da infraestrutura de processamento de fornecer resultados em tempo útil. Como consequência, os departamentos de TI estão constantemente atualizando a infraestrutura de processamento com a espectativa de que esta seja capaz de processar atempadamente as pesquisas, mas sem nenhuma garantia de que o consiga fazer. Não existe um método concreto que permita definir os requisitos mínimos de hardware que permita a execução atempada das pesquisas. Esta dissertação propõe uma arquitetura de Data Warehouse escalável com capacidade de lidar com grandes volumes de dados e de fornecer resultados em tempo útil, mesmo quando um grande número de pesquisas estão a ser simultaneamente executadas. 
A capacidade de fornecer resultados em tempo útil não é apenas uma questão de desempenho, mas uma questão de ser capaz de retornar atempadamente os resultados às pesquisas, quando esperado, de acordo com a natureza da análise e das decisões do negócio. O conceito de execução atempada (obtenção de resultados em tempo útil) é introduzido, e são propostos mecanismos que permitem fornecer garantias de execução atempada, sem no entanto descurar os requisitos de previsibilidade do tempo de execução das pesquisas e de latência mínima (frescura dos dados - freshness). A complexidade da execução de uma pesquisa é influenciada por diversos fatores, tais como a seletividade da pesquisa, o tamanho das tabelas, o número de junções e os algoritmos de junção. O volume de dados e memória disponível para junções, influenciam tanto a ordem de junção bem como o algoritmo de junção utilizado, resultando em custos de execução imprevisíveis. A necessidade de juntar as tabelas de dimensão com a tabela de factos advém do modelo em estrela (star-schema). O volume de dados é outro fator de imprevisibilidade, não sendo possível determinar com precisão o impacto do aumento do volume de dados no tempo de execução das pesquisas. Para lidar com estes fatores de imprevisibilidade relacionados com a junção de tabelas, propusemos o modelo de dados desnormalizado, chamado ONE. Neste modelo, os dados da tabela de factos, assim como os correspondentes dados das tabelas de dimensão, são fisicamente guardados numa única tabela desnormalizada, contendo todos os atributos das tabelas. O modelo de dados ONE requer mais espaço para guardar os dados, no entanto o modelo de processamento é mais simples e com tempos de execução previsíveis. Com o modelo de dados ONE, a tabela desnormalizada é particionada em fragmentos de dados mais pequenos e distribuídos pelos nós da infraestrutura para processamento paralelo, obtendo-se um aumento de desempenho. ONE possibilita uma escalabilidade quase ilimitada, uma vez que a totalidade dos dados (dos factos e das dimensões), e não apenas da tabela de factos, é linearmente dividida pelos nós da infraestrutura de processamento (com η nós homogéneos, cada nó conterá 1/η dos dados). Portanto, e uma vez que a adição de novos nós à infraestrutura de processamento não requer a replicação das dimensões, o modelo ONE oferece escalabilidade massiva de dados. Ao garantir uma distribuição linear de todos os dados, e não apenas os dados da tabela de fatos, o tempo de execução das pesquisas é melhorado proporcionalmente à redução do volume de dados em cada nó. Além disso, e porque os dados estão desnormalizados, o processamento das pesquisas é bastante simplificado e previsível, pois fica reduzido às operações de filtragem e de agregação dos dados. Como consequência, são reduzidos os requisitos da infraestrutura de processamento. Por norma, quando uma pesquisa é submetida não existe uma noção clara de quanto tempo irá demorar e se o resultado será obtido antes da tomada de decisão. Definimos o conceito de execução em tempo útil (right-time) como a capacidade de executar pesquisas de modo que os resultados estejam disponíveis antes da tomada de decisão (execução atempada), antes dum determinado objetivo temporal. O objetivo não é obter execuções mais rápidas, mas sim garantir que os resultados estarão disponíveis quando esperado. São propostos mecanismos que permitem fornecer previsibilidade de tempo de execução e garantias de execução atempada de pesquisas que tenham objetivos temporais. 
Como as pesquisas podem ter objetivos temporais diferentes do oferecido pela atual infraestrutura de processamento, propusemos um modelo de processamento chamado TEEPA (Timely Execution with Elastic Parallel Architecture), que toma em consideração os objetivos temporais das pesquisas para ajustar e rebalancear a infraestrutura de processamento de modo a que estes sejam garantidos. Quando a infraestrutura atual não consegue executar atempadamente as pesquisas, são adicionados mais nós de processamento e o volume de dados é redistribuído entre eles. Em cada nó, TEEPA monitora continuamente a execução da pesquisa, o volume de dados alocado, e a taxa de transferência IO, para determinar se as pesquisas podem ser atempadamente executadas. Como os nós de processamento podem ser heterogéneos, TEEPA toma em conta as suas capacidades de IO para determinar quantos nós são necessários e como deve ser efetuada a redistribuição dos dados. O volume de dados alocado em cada nó é ajustado em função do volume total (número total de registos), do tamanho do registo e da taxa de transferência de cada nó. Deste modo, a nós mais rápidos são atribuídos maiores volumes de dados. O processo de seleção e integração de novos nós de processamento e posterior rebalanceamento e reequilíbrio dos dados é executado até que os objetivos temporais sejam atingidos. Por outro lado, cada vez mais há a necessidade de analisar dados obtidos quase em tempo real, com mínima latência e frescura (freshness), o que requer que os dados sejam carregados mais frequentemente, à medida que são registados. Contudo, tipicamente as DW são refrescadas periodicamente com conjuntos de registos (batch), de modo a reduzir os custos de carregamento e os custos relacionados com o refrescamento de estruturas auxiliares, como índices e vistas materializadas. Sistemas de base de dados em memória minimizam estes custos, e possibilitam que os dados sejam carregados mais frequentemente. Contudo, a memória é finita e é insuficiente para conter a totalidades dos dados. De modo a oferecer latência mínima, definimos um modelo de processamento paralelo em que os dados são divididos em duas partes distintas: os dados antigos são guardados no modelo de dados ONE, ao qual chamámos Od, e os dados mais recentes são guardados em memória num modelo em estrela, designado de Os. Os dados podem ser carregados com maior frequência para Os, reduzindo assim a sua latência, e são aí mantidos enquanto existir memória disponível. Quando for necessário, por exemplo quando for necessário libertar memória para guardar novos dados, os dados mais antigos existentes em Os são movidos para Od. A utilização dum modelo hibrido, composto por Od e Os, permite que as DW existentes, que utilizam o modelo em estrela, possam ser migradas diretamente para este modelo com mínimo impacto ao nível dos processos de extração, transformação e carregamento dos dados (ETL). Na perspetiva do utilizador e das aplicações, este modelo hibrido oferece uma visão lógica dos dados num modelo em estrela, por forma a permitir uma fácil integração com aplicações e processos de carregamentos existentes, e a oferecer as vantagens do modelo em estrela, nomeadamente ao nível de usabilidade e facilidade de utilização. Uma camada de abstração gere a consistência de dados e processamento entre as duas componentes (Os e Od), incluindo a reescrita das pesquisas de modo a processar os dados que se encontram em cada uma das componentes. 
São também propostos mecanismos que oferecem garantias de execução atempada de pesquisas, mesmo quando um grande número de pesquisas está sendo processado simultaneamente. Infraestruturas paralelas podem minimizar esta questão, no entanto a sua escalabilidade é limitada pelo modelo de execução dos sistemas de bases de dados relacionais, onde cada pesquisa é processada individualmente e compete com as outras pelos recursos (IO, CPU, memória, …). É proposto um modelo de processamento de pesquisas, chamado SPIN, que analisa as pesquisas submetidas e, sempre que possível, efetua a partilha de dados e processamento entre elas, e assim consegue oferecer tempos de execução mais rápidos e previsíveis. SPIN utiliza o modelo de dados ONE, mas considera a tabela como sendo circular, isto é, uma tabela que é lida continuamente de uma forma circular. Enquanto existirem pesquisas a serem executadas, os dados são lidos sequencialmente e quando chega ao fim da tabela, recomeça a ler os dados desde o início da tabela. À medida que os dados são lidos, estes são colocados sequencialmente numa janela deslizante em memória (base pipeline), para serem partilhados pelas várias pesquisas. Cada pesquisa processa todos os registos da tabela, no entanto a leitura e o processamento não começa no registo número 1 da tabela, mas sim no primeiro registo da janela deslizante (início lógico). Os restantes registos são processados à medida que forem lidos e colocados na janela deslizante, até que o próximo registo a ser processado seja o do início lógico, isto é, após um ciclo completo. O custo da leitura dos dados é constante e partilhado por todas as pesquisas. Deste modo, a submissão de novas pesquisas não introduz custos adicionais ao nível da leitura de dados. O tempo de execução das pesquisas é influenciado apenas pela complexidade e número dos filtros (restrições) das pesquisas e pelo custo das agregações e ordenações dos dados. SPIN partilha dados e processamento entre pesquisas, combinando filtros e computações comuns a várias pesquisas num único fluxo (ramo) de processamento. Os vários ramos (branches) são sequencialmente conectados, formando uma estrutura em árvore que denominámos de WPtree (Workload Processing Tree), que tem como raiz o base pipeline. Quando uma pesquisa é submetida, se existir um ramo de processamento com predicados comuns aos da pesquisa, a pesquisa é encadeada como um novo ramo desse ramo comum, e são removidos os respetivos predicados da pesquisa. Se não existir um ramo com predicados comuns, a pesquisa é encadeada como um novo ramo do base pipeline. Deste modo, reduz-se o volume de dados que está em memória para processamento, bem como o custo de processamento dos predicados. A árvore de processamento é continuamente monitorizada, e quando necessário, um optimizador reorganiza dinamicamente o número e a ordem dos ramos. Sempre que possível, uma pesquisa é processada através da combinação dos resultados que estão a ser processados por outros ramos, e deste modo simplificando e reduzindo o volume de dados que a pesquisa tem que processar. Como os registos são lidos e processados pela mesma ordem, enquanto os dados não forem alterados, o resultado da avaliação dos predicados de cada registo é o igual ao da última vez que foi avaliado. 
To avoid the cost of re-evaluating records that were previously evaluated and have not changed, an extension to the SPIN processing model is proposed that uses a processing approach based on bitsets (structures similar to bitmap indexes). A bitset is built for each branch with the result of evaluating its predicates, the result for each record being stored in the corresponding position of the bitset. Once the bitset is complete, subsequent evaluation of those predicates can be replaced by a simple lookup in the bitset. Bitsets are small and are kept in memory, to avoid introducing additional IO costs. They are particularly relevant for complex predicates with high processing costs, and are created and removed dynamically according to a retention policy that takes several aspects into account, such as the available memory, the cardinality and the cost of evaluating the predicates. By analysing the sequential set of branches (path) of a query and the processing cost of each branch, it is possible to estimate the query's execution time with high accuracy, even when a large number of queries is being executed simultaneously. To satisfy queries with stricter time targets, a processing mechanism called CARROUSEL is proposed which, in addition to redistributing and/or replicating data fragments among the processing nodes, also redistributes the processing of queries and branches among the nodes. Taking the existing bitsets into account, it is possible to determine which data fragments each query needs to process and thus reduce processing costs by dynamically activating and deactivating branches, depending on the fragments currently in memory. It is also possible to finish the execution of a query early, before the end of the cycle. CARROUSEL is a flexible fragment processor that uses a set of idle nodes, or nodes executing queries with less strict time targets, to process in parallel some of the data fragments required by queries with stricter time targets. By reducing the data volume processed by each node, faster execution times are achieved. Alternatively, some of the processing branches can be redistributed to other nodes holding replicas of the fragments. Query execution ends when all records have been processed. However, since the data is continuously being read, relevant information about the data in each fragment is collected as it is processed. This information is useful for deciding how fragments and processing branches should be rebalanced and redistributed among the nodes in order to reduce processing costs and execution times.
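A minimal sketch of the bitset idea, assuming a plain Python list of booleans stands in for the bitmap structure: the first complete cycle evaluates the (possibly expensive) predicate for every record and memoizes the outcome, and later cycles replace the evaluation with a positional lookup. The retention policy and dynamic creation/removal of bitsets are not modelled.

    # Sketch only: one boolean per record position for a given branch predicate.
    def build_bitset(records, predicate):
        """True at position i if record i satisfies the branch predicate."""
        return [predicate(r) for r in records]

    def scan_with_bitset(records, predicate, bitset=None):
        if bitset is None:                          # first cycle: evaluate and memoize
            bitset = build_bitset(records, predicate)
        # subsequent cycles: a positional lookup replaces predicate evaluation
        selected = [r for i, r in enumerate(records) if bitset[i]]
        return selected, bitset

    rows = [{"price": p} for p in (5, 20, 7, 42)]
    expensive_pred = lambda r: r["price"] > 10      # stands in for a costly predicate
    hits, bs = scan_with_bitset(rows, expensive_pred)            # builds the bitset
    hits, _ = scan_with_bitset(rows, expensive_pred, bitset=bs)  # reuses it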
Data Warehouse (DW) systems are a fundamental tool for the decision-making process and have to deal with increasingly large data volumes, typically stored in a star-schema model. The query workload is also more demanding, involving more complex, ad-hoc and unpredictable query patterns, with more queries being submitted and executed concurrently. Modern competitive markets require decisions to be taken in a timely fashion: it is not just a matter of delivering fast analysis, but of guaranteeing that the analysis will be available before business decisions are made. Moreover, the data volumes produced by data-intensive industries are continuously increasing, further stressing the ability of the processing infrastructure to meet such timing requirements. As a consequence, IT departments keep upgrading the processing infrastructure in the hope that the newer architecture will deliver query results within the required time frame, but without any guarantee that it will do so; there is no concrete method to define the minimal hardware requirements needed to deliver timely query results. Several factors influence the ability of the DW infrastructure to provide timely results, such as query execution complexity (query selectivity, the number of relations that have to be joined, the join algorithms and the relations' sizes), the heterogeneity and capabilities of the processing infrastructure (including IO throughput and the memory available to process joins), and the concurrent query load, since queries executing simultaneously also affect the system's ability to provide predictable execution times. Despite all the time and effort invested in building a parallel infrastructure to handle the increase in data volume and to improve query execution time, it may still be insufficient to provide timely query execution, particularly for ad-hoc queries. The performance of well-known queries can be tuned through auxiliary strategies and mechanisms, such as materialized views and index tuning; for ad-hoc queries, however, such mechanisms are not an alternative, and the unpredictability of query patterns results in unpredictable execution times, which may be incompatible with business requirements. This dissertation proposes a data warehousing architecture that provides scalability and timely results for massive data volumes, even in the presence of a large number of concurrent queries, and that is able to meet near real-time requirements. The ability to provide timely results is not just a performance issue (high throughput), but also a matter of returning query results when expected, according to the nature of the analysis and the business decisions.
Query execution complexity is highly influenced by the number of relations that have to be joined, the relations' sizes and the query selection predicates (selectivity), which determine the data volume that has to be read from storage and joined. This data volume and the memory available for joins influence both the join order and the join algorithms used. These unpredictable costs of joining the fact table with the dimension relations arise from the star-schema organization. The data volume itself is another factor of unpredictability, since there is no simple and accurate method to determine the impact of larger data volumes on query execution time. To handle the unpredictability related to joining relations, we propose the ONE data model, in which the fact table and the data from the corresponding dimensions are physically stored in a single de-normalized relation, without primary and foreign keys, containing all the attributes of both the fact and dimension tables. ONE trades storage space for a simpler and more predictable processing model. To provide horizontal scalability, we partition the de-normalized ONE relation into data fragments and distribute them among a set of processing nodes for parallel processing, yielding improved speedup. ONE delivers unlimited data scalability, since the whole data set (facts and dimensions), and not just the fact table, is linearly partitioned among the nodes (with η nodes, each holds 1/η of the ONE relation). Because adding nodes to the processing infrastructure does not require replicating the dimensions, ONE provides massive data scalability. By ensuring a linear distribution of the whole data set, query execution time improves proportionally to the data volume in each node; moreover, since the data in each node is already joined, query processing does not involve costly join algorithms, and the speedup in each node grows (almost) linearly with the data volume it has to process. De-normalizing the data also lowers each node's requirements in terms of physical memory (otherwise needed for processing joins) and processing, since join tasks that were repeatedly executed are removed; of the remaining tasks, such as filtering and aggregation, only group-by aggregations and sorting have significant memory requirements. The concept of timely results (right-time execution) is introduced, and we propose mechanisms that provide right-time guarantees while meeting runtime predictability and freshness requirements. The ability to provide right-time data analysis is gaining importance, with more and more operational decisions being made using data analysis from the DW, and the predictability of query execution tasks is particularly relevant for providing right-time or real-time analysis. We define right-time as the ability to deliver query results in a timely manner, before they are required: the aim is not to provide the fastest answers, but to guarantee that the answers will be available when expected and needed. We propose a Timely Execution with Elastic Parallel Architecture (TEEPA), which takes the query time targets into consideration to adjust and rebalance the processing infrastructure, thus providing right-time guarantees.
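The following toy Python sketch (data and names are illustrative, not the thesis' code) shows the two steps implied by the ONE model: the fact table is joined once with its dimensions into a single wide, de-normalized relation, and that relation is then split into roughly equal fragments across η processing nodes, so each node can answer queries over its fragment without runtime joins.

    # Sketch: build the ONE relation from a star schema and partition it over nodes.
    def denormalize(fact_rows, dimensions):
        """dimensions: {fk_column: {key: dimension_row_dict}} -> list of wide rows."""
        one = []
        for fact in fact_rows:
            row = dict(fact)
            for fk, table in dimensions.items():
                row.update(table[fact[fk]])   # copy dimension attributes into the wide row
                del row[fk]                   # primary/foreign keys are no longer needed
            one.append(row)
        return one

    def partition(one_relation, n_nodes):
        """Round-robin split of the ONE relation into n roughly equal fragments."""
        return [one_relation[i::n_nodes] for i in range(n_nodes)]

    fact = [{"date_id": 1, "prod_id": 10, "amount": 3.0}]
    dims = {"date_id": {1: {"year": 2013}}, "prod_id": {10: {"product": "X"}}}
    fragments = partition(denormalize(fact, dims), n_nodes=2)
    # each fragment row looks like {"amount": 3.0, "year": 2013, "product": "X"}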
When the current deployment is unable to meet the time targets, TEEPA adds more processing nodes and redistributes the data volume among them. TEEPA continuously monitors the local query execution, the IO throughput and the data volume allocated to each processing node, to determine whether the system can satisfy the user-specified time targets. It was designed to handle heterogeneous nodes and therefore takes their IO capabilities into account when performing the data rebalancing tasks. The data volume allocated to each node is adjusted as a function of the whole data load (total number of tuples), the tuple size and the node's sequential scan throughput, with larger data volumes allocated to faster processing nodes. The node allocation (selection and integration of new nodes) and data rebalancing tasks are repeated until the time targets can be assured.

There is also an increasing demand for analysis over near real-time data, with low latency and high freshness, which requires data to be loaded more frequently or in a row-by-row fashion. Traditionally, however, DWs are refreshed periodically in batches, to reduce IO loading costs and the costs of refreshing indexes and pre-computed aggregation structures. Main-memory DBMSs eliminate IO costs and can therefore handle higher loading frequencies, but physical memory is limited in size and typically cannot hold all tables and structures. To provide freshness guarantees, the proposed architecture combines a parallel ONE deployment with an in-memory star-schema model holding the most recent data. The in-memory part (Os) keeps the recently loaded data to allow the execution of real-time analyses: data is loaded into Os and remains there for real-time processing while memory is available, and when physical memory is exhausted the older data in Os is moved to Od, the part stored in the ONE data model. By using a star-schema model in Os, existing DW applications can be integrated with the architecture without recreating the existing ETL tasks. From the user and data presentation perspective, the architecture offers a logical star-schema view of the data, providing easy integration with existing applications and preserving the model's advantages in terms of user understanding and usability. A logical-to-physical layer manages data and processing consistency between the two models, including the query rewriting needed to query the data stored in each part and the merging of results.

Finally, we present the mechanisms that allow the architecture to guarantee right-time execution even under huge concurrent query loads. Modern DWs also suffer from workload scalability limitations, with more and more queries (in particular ad-hoc ones) being submitted concurrently. Larger parallel infrastructures can reduce this limitation, but their scalability is constrained by the query-at-a-time execution model of conventional RDBMSs, where each query is processed individually, competing for resources (IO, CPU, memory, …) and accessing the common base data without any data or processing sharing. We propose SPIN, a data and processing sharing model that delivers predictable execution times for concurrent queries and overcomes the memory and scalability limitations of existing approaches. SPIN views the ONE relation in a node as a logical circular relation, i.e.
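A back-of-the-envelope Python sketch of the allocation rule described for TEEPA, under the assumption that sequential scan throughput dominates execution time: each node receives a tuple share proportional to its scan throughput, which equalizes per-node scan times, and nodes are added and the data rebalanced until the estimated time fits the target. Node names and throughput figures below are invented.

    # Sketch: throughput-proportional data allocation and a simple time estimate.
    def allocate(total_tuples, tuple_size_bytes, nodes):
        """nodes: {name: scan throughput in bytes/s} -> (tuples per node, est. scan seconds)."""
        total_throughput = sum(nodes.values())
        shares = {n: int(total_tuples * tp / total_throughput) for n, tp in nodes.items()}
        # all nodes scan in parallel; the slowest share bounds the scan time
        est_time = max(shares[n] * tuple_size_bytes / tp for n, tp in nodes.items())
        return shares, est_time

    nodes = {"node1": 200e6, "node2": 100e6}   # hypothetical heterogeneous nodes
    shares, t = allocate(total_tuples=30_000_000, tuple_size_bytes=200, nodes=nodes)
    # node1 gets ~2/3 of the tuples; if t exceeds the time target, add a node and repeat.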
a relation that is constantly scanned in a circular fashion: when the end is reached, scanning continues from the beginning, as long as there are queries running. Each query processes all the required tuples of the ONE relation, but scanning and query processing do not start at the first physical row; since the relation is read in a circular fashion, each query's first logical row is the one currently cached in memory, and the remaining tuples are processed as they are read from storage until that first logical row is reached again. Data is read from storage and placed into an in-memory pipeline shared by all running queries, so the IO reading cost is constant and shared among them; submitting additional queries therefore incurs no extra IO or join costs. The execution times of concurrent queries are influenced only by the number and complexity of the query constraints (filters) and by the cost of aggregations. To provide massive workload scalability, SPIN shares data and processing among queries by combining the running queries into logical branches for their filtering clauses and by extensively reusing common computations and aggregation operations. It analyses the query predicates: if a logical branch with common predicates exists in the current workload processing tree, the query is registered in that branch and the corresponding predicates are removed from the query; otherwise, the query is registered as a new logical branch of the base data pipeline. This enhances processing sharing and reduces the number of filtering conditions to evaluate. A branch optimizer continuously adjusts the number and order of the existing branches, reorganizing them as required. Whenever possible, a query merges and combines the results being produced by other branches, thus simplifying and reducing the data volume that its branch has to filter and process. Since tuples flow in the same reading order, if the data does not change, evaluating the branch predicates against a tuple yields the same result as the last time it was evaluated. To avoid re-evaluating unchanged tuples, we extend SPIN with a bitset-based processing approach: a branch bitset (bitmap) is built according to the branch's predicates, where each bit stores the boolean result of the predicate evaluation for the corresponding tuple position. Future evaluations of that tuple can then replace the selection operator with a fast lookup of the corresponding position in the bitset. Bitsets are small and reside in memory, avoiding any IO overhead, and are particularly relevant for predicates with high evaluation costs. By analysing the data path (the sequence of branches) of each query and the computational cost of each branch, it is possible to obtain highly accurate estimates of query execution times, and therefore to offer predictable execution times under massive workloads.
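The circular-scan sharing can be pictured with the following simplified Python sketch (illustrative only; all queries are attached at the same moment for brevity): a single sequential reader pays the read cost once per tuple, every active query evaluates its own filter on the shared tuple, and each query completes after exactly one full cycle over the relation.

    # Sketch: one shared circular scan feeding several concurrent queries.
    def circular_scan(relation, queries, start_pos=0):
        """queries: {name: predicate}; returns {name: matching tuples}."""
        n = len(relation)
        progress = {q: 0 for q in queries}            # tuples each query has seen so far
        results = {q: [] for q in queries}
        pos = start_pos
        while any(p < n for p in progress.values()):  # spin until every query did a full cycle
            tup = relation[pos]                       # single shared read, cost paid once
            for q, pred in queries.items():
                if progress[q] < n:
                    if pred(tup):
                        results[q].append(tup)
                    progress[q] += 1
            pos = (pos + 1) % n                       # wrap around: the relation is circular
        return results

    rows = list(range(10))
    out = circular_scan(rows, {"evens": lambda x: x % 2 == 0, "big": lambda x: x > 6})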
Tighter right-time guarantees can be provided by extending the parallel infrastructure and redistributing data among processing nodes, but also by redistributing queries, query processing and data branches among nodes holding replicated fragments. This is achieved with two complementary approaches: a parallel, fine-tuned fragment-level processor, named CARROUSEL, and an early-end query processing mechanism. CARROUSEL is a flexible fragment processor that uses idle nodes, or nodes currently running queries with less strict time targets, to process some of the fragments required by time-critical queries on behalf of the fragment's owner node. By reducing the data volume a node has to process, it provides faster execution times. Alternatively, it may distribute some logical data branches among nodes holding replicated fragments, thus reducing per-node query processing; this is only possible for nodes with replicated data fragments. The execution of a query ends when all tuples of its data fragments have been processed and the circular logical loop is completed. As the system continuously spins, reading and processing the same data over and over, it collects insightful information about the data stored in each fragment. For some logical data branches, this information can be used to reduce memory and computational usage through a postponed start (delaying query execution until the first relevant fragment is loaded) and an early end (detaching the query from the pipeline once all the fragments relevant to it have been processed). The same information is also useful when the architecture needs to rebalance data, with the rebalanced data being clustered according to logical branch predicates and stored as new data fragments.
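A coarse Python sketch of the CARROUSEL idea (data structures and placement policy below are hypothetical): the fragments a time-critical query still needs are spread over its owner node and idle nodes holding replicas, and the early-end test lets the query detach from the pipeline as soon as all of its relevant fragments have been processed.

    # Sketch: assign needed fragments to replica-holding helpers, keep the rest local.
    def plan_fragments(needed_fragments, owner, idle_nodes, replicas):
        """replicas: {fragment_id: set of nodes holding a copy} -> {node: [fragments]}."""
        plan = {owner: []}
        helpers = [n for n in idle_nodes if n != owner]
        for frag in sorted(needed_fragments):
            candidates = [n for n in helpers if n in replicas.get(frag, set())]
            # give the fragment to the least-loaded helper holding a replica, else keep it local
            node = min(candidates, key=lambda n: len(plan.get(n, []))) if candidates else owner
            plan.setdefault(node, []).append(frag)
        return plan

    def early_end(processed_fragments, needed_fragments):
        """The query can leave the pipeline once all of its fragments were processed."""
        return needed_fragments <= processed_fragments

    plan = plan_fragments({"f1", "f2", "f3"}, owner="node1",
                          idle_nodes=["node2", "node3"],
                          replicas={"f2": {"node2"}, "f3": {"node2", "node3"}})
    # -> {'node1': ['f1'], 'node2': ['f2'], 'node3': ['f3']}
    early_end({"f1", "f2"}, {"f1", "f2", "f3"})   # False: one relevant fragment still pending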