Dissertations / Theses on the topic 'Document warehouse'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the top 33 dissertations / theses for your research on the topic 'Document warehouse.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.
Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.
Kanna, Rajesh. "Managing XML data in a relational warehouse on query translation, warehouse maintenance, and data staleness /." [Gainesville, Fla.] : University of Florida, 2001. http://etd.fcla.edu/etd/uf/2001/anp4011/Thesis.PDF.
Title from first page of PDF file. Document formatted into pages; contains x, 75 p.; also contains graphics. Vita. Includes bibliographical references (p. 71-74).
Bange, Carsten. "Business intelligence aus Kennzahlen und Dokumenten : Integration strukturierter und unstrukturierter Daten in entscheidungsunterstützenden Informationssystemen /." Hamburg : Kovac, 2004. http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&doc_number=012863212&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA.
Hedler, Francielly. "Global warehouse management : a methodology to determine an integrated performance measurement." Thesis, Université Grenoble Alpes (ComUE), 2015. http://www.theses.fr/2015GREAI082/document.
The growing complexity of warehouse operations has led companies to adopt a large number of indicators, making their management increasingly difficult. It may be hard for managers to evaluate the overall performance of logistic systems, including the warehouse, because assessing the interdependence of indicators with distinct objectives is rather complex (e.g. the level of a cost indicator should decrease, whereas a quality indicator level should be maximized). This can bias the manager's analysis of global warehouse performance. In this context, this thesis develops a methodology to achieve an integrated warehouse performance measurement. It encompasses four main steps: (i) the development of an analytical model of the performance indicators usually used for warehouse management; (ii) the definition of indicator relationships, analytically and statistically; (iii) the aggregation of these indicators in an integrated model; (iv) the proposition of a scale to assess the evolution of warehouse performance over time according to the integrated model results. The methodology is applied to a theoretical warehouse to demonstrate its use. The indicators used to evaluate the warehouse come from the literature, and a database is generated to feed the mathematical tools. The Jacobian matrix is used to define indicator relationships analytically, and principal component analysis to aggregate the indicators statistically. The final aggregated model comprises 33 indicators assigned to six different components, which compose the global performance indicator equation by means of the components' weighted average. A scale is developed for the global performance indicator using an optimization approach to obtain its upper and lower boundaries. The usability of the integrated model is tested on two different warehouse performance situations, and insights about the final warehouse performance are discussed. We conclude that the proposed methodology reaches its objective, providing a decision support tool that helps managers be more efficient in global warehouse performance management without neglecting important information from indicators.
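To make the aggregation step above concrete, here is a minimal Python sketch of PCA-based indicator aggregation: correlated indicators are compressed into components, and a weighted average of the components (weighted by explained variance) yields a single global score. The data, the number of indicators and components, and the weighting scheme are illustrative assumptions, not the thesis's calibrated model (which uses 33 indicators and six components).

```python
# Hedged sketch: PCA aggregation of warehouse indicators into one score.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Toy data: 24 monthly observations of 6 indicators (cost, picking time, ...).
indicators = rng.normal(size=(24, 6))

scaled = StandardScaler().fit_transform(indicators)   # indicators on one scale
pca = PCA(n_components=2).fit(scaled)
components = pca.transform(scaled)                    # correlated KPIs grouped

# Weight each component by its share of explained variance, then average.
weights = pca.explained_variance_ratio_ / pca.explained_variance_ratio_.sum()
global_performance = components @ weights             # one score per month
print(global_performance[:3])
```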
Garcelon, Nicolas. "Problématique des entrepôts de données textuelles : Dr Warehouse et la recherche translationnelle sur les maladies rares." Thesis, Sorbonne Paris Cité, 2017. http://www.theses.fr/2017USPCB257/document.
The repurposing of clinical data for research has become widespread with the development of clinical data warehouses. These warehouses are modeled to integrate and explore structured data linked to thesauri. The data come mainly from automated sources (biology, genetics, cardiology, etc.) but also from manual data-input forms. Care production also generates a large volume of textual data, from hospital reports (hospitalization, surgery, imaging, anatomopathology, etc.) and from free-text areas in electronic forms. This mass of data, little used by conventional warehouses, is an indispensable source of information in the context of rare diseases. Indeed, free text makes it possible to describe a patient's clinical picture with more precision, expressing the absence of signs as well as uncertainty. Particularly for patients still undiagnosed, the doctor describes the patient's medical history outside any nosological framework. This wealth of information makes clinical text a valuable source for translational research. It nevertheless requires appropriate algorithms and tools to enable optimized reuse by doctors and researchers. We present in this thesis a data warehouse centered on the clinical document, which we have modeled, implemented and evaluated. In three use cases for translational research in the context of rare diseases, we addressed the problems inherent in textual data: (i) recruitment of patients through a search engine adapted to text (handling negation and detecting family history), (ii) automated phenotyping from textual data, and (iii) diagnosis by similarity between patients based on phenotyping. We evaluated these methods on the data warehouse of Necker-Enfants Malades, created and fed during this thesis, integrating about 490,000 patients and 4 million reports. These methods and algorithms were integrated into the software Dr Warehouse, developed during the thesis and distributed as open source since September 2017.
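The negation handling mentioned in point (i) can be illustrated with a crude cue-word window, in the spirit of NegEx-style algorithms. This is a toy sketch with an invented cue list, not the detection method actually implemented in Dr Warehouse.

```python
# Hedged sketch: window-based negation detection for clinical text search.
NEGATION_CUES = ("no ", "not ", "without ", "denies ", "absence of ")

def sign_status(report: str, sign: str) -> str:
    """Return 'affirmed', 'negated' or 'absent' for a clinical sign."""
    text = report.lower()
    idx = text.find(sign.lower())
    if idx == -1:
        return "absent"
    window = text[max(0, idx - 30):idx]          # look a few words back
    return "negated" if any(c in window for c in NEGATION_CUES) else "affirmed"

print(sign_status("Patient denies fever, reports cough.", "fever"))  # negated
print(sign_status("Patient presents with fever.", "fever"))          # affirmed
```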
Samuel, John. "Feeding a data warehouse with data coming from web services. A mediation approach for the DaWeS prototype." Thesis, Clermont-Ferrand 2, 2014. http://www.theses.fr/2014CLF22493/document.
The role of the data warehouse in business analytics cannot be overstated for any enterprise, irrespective of its size. But the growing dependence on web services has resulted in a situation where enterprise data is managed by multiple autonomous and heterogeneous service providers. We present our approach and its associated prototype DaWeS [Samuel, 2014; Samuel and Rey, 2014; Samuel et al., 2014], a DAta warehouse fed with data coming from WEb Services, designed to extract, transform and store enterprise data from web services and to build performance indicators from the stored enterprise data, hiding from end users the heterogeneity of the numerous underlying web services. Its ETL process is grounded in a mediation approach usually used in data integration. This enables DaWeS (i) to be fully configurable in a purely declarative manner (XML, XSLT, SQL, Datalog) and (ii) to make part of the warehouse schema dynamic so it can be easily updated. (i) and (ii) allow DaWeS managers to shift from development to administration when they want to connect to new web services or to update the APIs (application programming interfaces) of already connected ones. The aim is to make DaWeS scalable and adaptable, to smoothly face the ever-changing and growing offer of web services. We point out that this also enables DaWeS to be used with the vast majority of actual web service interfaces, which are defined with basic technologies only (HTTP, REST, XML and JSON) rather than with more advanced standards (WSDL, WADL, hRESTS or SAWSDL), since these advanced standards are not yet widely used to describe real web services. In terms of applications, the aim is to allow a DaWeS administrator to provide small and medium companies with a service to store and query their business data coming from their usage of third-party services, without having to manage their own warehouse. In particular, DaWeS enables the easy design (as SQL queries) of personalized performance indicators. We present in detail this mediation approach for ETL and the architecture of DaWeS. Besides its industrial purpose, building DaWeS brought forth further scientific challenges, such as the need to optimize the number of web service API operation calls and to handle incomplete information. We propose a bound on the number of calls to web services; this bound is a tool to compare future optimization techniques. We also present a heuristic to handle incomplete information.
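The extract-transform-load-then-query pipeline can be pictured with a small, self-contained Python sketch: a simulated web-service JSON payload is loaded into a warehouse table, and a performance indicator is then defined declaratively as SQL, as DaWeS does. The payload, table and field names are invented for illustration; a real run would fetch the JSON from a service API over HTTP.

```python
# Hedged sketch: load web-service data, then compute a KPI as plain SQL.
import json
import sqlite3

# Simulated response from a (hypothetical) ticketing web service.
payload = json.loads('[{"ticket": 1, "status": "closed", "hours": 2.5},'
                     ' {"ticket": 2, "status": "open",   "hours": 1.0}]')

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_ticket (ticket INT, status TEXT, hours REAL)")
conn.executemany("INSERT INTO fact_ticket VALUES (?, ?, ?)",
                 [(r["ticket"], r["status"], r["hours"]) for r in payload])

# A personalized performance indicator, defined declaratively as SQL.
kpi = conn.execute("SELECT AVG(hours) FROM fact_ticket WHERE status='closed'")
print(kpi.fetchone()[0])  # mean handling time of closed tickets: 2.5
```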
Khemiri, Rym. "Vers l'OLAP collaboratif pour la recommandation des analyses en ligne personnalisées." Thesis, Lyon 2, 2015. http://www.theses.fr/2015LYO22015/document.
The objective of this thesis is to provide a collaborative approach to OLAP involving several users, driven by a personalization process integrated into decision-making systems, in order to help end users in their analysis process. Whether personalizing the warehouse model, recommending decision queries or recommending navigation paths within the data cubes, users need an efficient decision-making system that assists them. We were interested in three issues within data warehouse and OLAP personalization, offering three major contributions. Our contributions are based on a combination of data mining techniques with data warehouse and OLAP technology. Our first contribution is an approach to personalizing dimension hierarchies to obtain new, semantically richer analysis axes that can help the user carry out new analyses not provided by the original data warehouse model. Indeed, we relax the constraint of the fixed data warehouse model, which allows the user to create new relevant analysis axes taking into account both his/her constraints and his/her requirements. Our approach is based on an unsupervised learning method, constrained k-means. Our goal is then to recommend these new hierarchy levels to other users of the same user community, in the spirit of a collaborative system to which each individual brings a contribution. The second contribution is an interactive approach that helps the user formulate new decision queries to build relevant OLAP cubes based on his past decision queries, allowing him to anticipate his future analysis needs. This approach is based on the extraction of frequent itemsets from a query load associated with one user or a set of users belonging to the same community of organizational actors. Our intuition is that the relevance of a decision query is strongly correlated with the usage frequency of the corresponding attributes within a given workload of a user (or group of users). Indeed, our approach to decision query formulation is collaborative because it allows the user to formulate relevant queries, step by step, from the attributes most commonly used by all actors of the user community. Our third contribution is an approach for recommending navigation paths within OLAP cubes. Users are often left to themselves and are not guided in their navigation process. To overcome this problem, we develop a user-centered approach that offers the user navigation guidance. Indeed, we guide the user toward the most interesting facts in OLAP cubes by indicating the most relevant navigation paths for him. This approach is based on Markov chains that predict the next analysis query from the current query alone. This work is part of a collaborative approach because the transition probabilities from one query to another in the cuboid lattice (OLAP cube) are calculated by taking into account the analysis queries of all users belonging to the same community. To validate our proposals, we present a user-centered decision support system comprising two subsystems: (1) content personalization and (2) recommendation of decision queries and navigation paths. We also conducted experiments that showed the effectiveness of our user-centered online analysis approaches, using quality measures such as recall and precision.
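The third contribution's Markov-chain idea can be sketched in a few lines: transition counts are accumulated from the whole community's query sessions, and the most probable next query is recommended. The session data and query names here are invented.

```python
# Hedged sketch: first-order Markov recommendation of the next OLAP query.
from collections import Counter, defaultdict

# Toy query log: each inner list is one user's session of analysis queries.
sessions = [["q1", "q2", "q3"], ["q1", "q2", "q4"],
            ["q2", "q3"], ["q1", "q2", "q3"]]

transitions = defaultdict(Counter)        # current query -> next-query counts
for session in sessions:
    for current, nxt in zip(session, session[1:]):
        transitions[current][nxt] += 1

def recommend(current_query):
    """Most frequent continuation observed across the whole community."""
    counts = transitions.get(current_query)
    return counts.most_common(1)[0][0] if counts else None

print(recommend("q2"))  # 'q3', seen 3 times vs. 1 for 'q4'
```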
Tournier, Ronan. "Analyse en ligne (OLAP) de documents." Phd thesis, Université Paul Sabatier - Toulouse III, 2007. http://tel.archives-ouvertes.fr/tel-00348094.
Roatis, Alexandra. "Efficient Querying and Analytics of Semantic Web Data." Thesis, Paris 11, 2014. http://www.theses.fr/2014PA112218/document.
The utility and relevance of data lie in the information that can be extracted from it. The high rate of data publication and its increased complexity, for instance the heterogeneous, self-describing Semantic Web data, motivate the interest in efficient techniques for data manipulation. In this thesis we leverage mature relational data management technology for querying Semantic Web data. The first part focuses on query answering over data subject to RDFS constraints, stored in relational data management systems. The implicit information resulting from RDF reasoning is required to correctly answer such queries. We introduce the database fragment of RDF, going beyond the expressive power of previously studied fragments. We devise novel techniques for answering Basic Graph Pattern queries within this fragment, exploring the two established approaches for handling RDF semantics, namely graph saturation and query reformulation. In particular, we consider graph updates within each approach and propose a method for incrementally maintaining the saturation. We experimentally study the performance trade-offs of our techniques, which can be deployed on top of any relational data management engine. The second part of this thesis considers the new requirements for data analytics tools and methods emerging from the development of the Semantic Web. We fully redesign, from the bottom up, core data analytics concepts and tools in the context of RDF data. We propose the first complete formal framework for warehouse-style RDF analytics. Notably, we define analytical schemas tailored to heterogeneous, semantically rich RDF graphs, and analytical queries which (beyond relational cubes) allow flexible querying of the data and the schema as well as powerful aggregation and OLAP-style operations. Experiments on a fully implemented platform demonstrate the practical interest of our approach.
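Graph saturation, one of the two approaches compared above, can be illustrated for a single RDFS entailment rule: if (s rdf:type C) and (C rdfs:subClassOf D), infer (s rdf:type D), repeated to a fixpoint. Real engines apply the full RDFS rule set; the triples below are invented.

```python
# Hedged sketch: saturating an RDF graph under the subclass-typing rule.
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"

triples = {("alice", TYPE, "Student"),
           ("Student", SUBCLASS, "Person"),
           ("Person", SUBCLASS, "Agent")}

changed = True
while changed:                             # iterate until the fixpoint
    inferred = {(s, TYPE, d)
                for (s, p1, c) in triples if p1 == TYPE
                for (c2, p2, d) in triples if p2 == SUBCLASS and c2 == c}
    changed = not inferred <= triples      # any genuinely new triples?
    triples |= inferred

print(sorted(t for t in triples if t[1] == TYPE))
# alice is now typed Student, Person and Agent: the implicit information
# needed to answer queries correctly.
```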
Pérez, Martínez Juan Manuel. "Contextualizing a Data Warehouse with Documents." Doctoral thesis, Universitat Jaume I, 2007. http://hdl.handle.net/10803/10482.
Full textEl, Malki Mohammed. "Modélisation NoSQL des entrepôts de données multidimensionnelles massives." Thesis, Toulouse 2, 2016. http://www.theses.fr/2016TOU20139/document.
Decision support systems occupy a large space in companies and large organizations in order to enable analyses dedicated to decision-making. With the advent of big data, the volume of analyzed data reaches critical sizes, challenging conventional approaches to data warehousing, for which current solutions are mainly based on R-OLAP databases. With the emergence of major Web platforms such as Google, Facebook, Twitter, Amazon, etc., many solutions to process big data have been developed, known as "Not Only SQL". These new approaches are an interesting attempt to build multidimensional data warehouses capable of handling large volumes of data. This questioning of the R-OLAP approach requires revisiting the principles of modeling multidimensional data warehouses. In this manuscript, we propose implementation processes for multidimensional data warehouses with NoSQL models. We define four processes for each of two models, a column-oriented NoSQL model and a document-oriented model; each of these processes favors a specific treatment. Moreover, the NoSQL context adds complexity to the computation of the effective pre-aggregates that are typically set up within the R-OLAP context (the lattice). We have therefore enlarged our implementation processes to take into account the construction of the lattice in both retained models. As it is difficult to choose a single NoSQL implementation that effectively supports all the applicable treatments, we propose two translation processes. While the first one concerns intra-model processes, i.e., transformation rules from one implementation to another within the same NoSQL logical model, the second defines the transformation rules from an implementation of one logical model to an implementation of another logical model.
N'damas, Henri-Blaise. "Dans quelle mesure une démarche d’intelligence économique permettrait-elle une réduction du risque de crédit bancaire ?" Thesis, Université de Lorraine, 2017. http://www.theses.fr/2017LORR0081/document.
Bank information systems, key tools in banking strategies, have become comprehensive and complex, and decision-making (strategic) information systems are playing an increasingly important role. Nevertheless, some inefficiencies in the conception of information systems persist, due to the uncontrolled design, or rather construction, of strategic information systems, which systematically alienates the end user. One solution seems to be to rely on economic intelligence to tackle the construction of those strategic information systems and, consequently, to improve decision-making: the strategic information system, the core of decision-making systems, is the very heart of the economic intelligence system. Our theory is that an economic intelligence approach applied to the conception of information systems in banking would allow a reduction of credit risk, specifically in the retail banking sector and for individual, professional and contractor customers. Risk for the customer, who should not take on loan payments he cannot cover, or commit to loan projects that do not match the goals he has set himself. Risk, obviously, for the bank, which is not willing to accumulate uncreditworthy customers and whose decision-makers' goals would not be met either. After putting emphasis on the distinctive features of the bank and the complexity of its environment, we show that the current approach to risk management inside banks seems "incomplete" and fragmented, and, consequently, where there is room for improvement, particularly for individual and professional customers. Then, we suggest some methodological rules for the conception of strategic information systems in banking, as well as a business model of such a system taking into account the needs of the end user, who will be, as shown in this thesis, the decision-maker on a credit file, the bank adviser, or even the credit risk analyst. Finally, after drawing up this model of strategic information systems, we examine how it could improve on the existing one. Our thesis is situated at a crossroads between management sciences, more particularly bank finance, and information systems and computer science, and it leans largely on our professional experience in the banking sector in France. With the banking sector, we thus explore a new field of application of research in economic intelligence, particularly linked to the results of the work of the SITE research team at LORIA on the conception of information systems for economic intelligence. After introducing the concept of economic intelligence and the decision-support process (chapter 1), we outline the specificities of the banking sector and its information systems (chapter 2). Then we clarify the difficulties of credit risk management within banks (chapter 3) before submitting our proposals for the implementation of a strategic information system enabling the improvement of credit risk management in banking (chapter 4).
Bessouat, Jeanne. "Un modèle de référence pour l'application de l'ABC dans le cadre de la réorganisation des activités de l'entrepôt : une recherche-intervention chez FM Logistic." Thesis, Strasbourg, 2019. http://www.theses.fr/2019STRAB010/document.
A logistics service provider (3PL) reorganizes the activities of its warehouse to remain efficient (C.-L. Liu and Lyons 2011). To do so, the 3PL needs detailed knowledge of the cost of its warehouses' activities. The theoretical framework of this thesis lies at the intersection of warehouses, design and costs. Concerning warehouse design, the selection, and especially the identification, of all resources has received little study (S. S. Heragu et al. 2005; Gu, Goetschalckx, and McGinnis 2010). At the same time, activity-based costing (ABC) is a cost calculation method that is rarely applied to obtaining warehouse costs (Pirttilä and Hautaniemi 1995). Several obstacles hinder its application, including the lack of formalization in the definition of activities (Waeytens and Bruggeman 1994). Using qualitative research within the company FM Logistic, a classification of warehouse resources and a reference model of warehouse activities are proposed. The warehouse resource classification allows the identification of all warehouse resources; it is then mobilized within the reference model of warehouse activities. The reference model facilitates the application of ABC by standardizing the vocabulary used to define the activities of the warehouse. The reference model is then employed for different applications, as part of the design of the warehouse activities of a logistics service provider.
Lamer, Antoine. "Contribution à la prévention des risques liés à l’anesthésie par la valorisation des informations hospitalières au sein d’un entrepôt de données." Thesis, Lille 2, 2015. http://www.theses.fr/2015LIL2S021/document.
Introduction: Hospital information systems (HIS) manage and register, every day, millions of data items related to patient care: biological results, vital signs, drug administrations, care processes, etc. These data are stored in operational applications that provide remote access and a comprehensive picture of the electronic health record. They may also be used for other purposes, such as clinical research or public health, particularly when integrated into a data warehouse. Some studies have highlighted a statistical link between compliance with quality indicators related to the anesthesia procedure and patient outcome during the hospital stay. In the University Hospital of Lille, these quality indicators, as well as patient comorbidities during the postoperative period, can be assessed from data collected by applications of the HIS. The main objective of this work is to integrate the data collected by operational applications in order to carry out clinical research studies. Methods: First, the quality of the data registered by the operational applications is evaluated, with methods proposed in the literature or developed in this work. Then, the data quality problems highlighted by the evaluation are managed during the integration step of the ETL process. New data are computed and aggregated in order to provide indicators of quality of care. Finally, two studies demonstrate the usability of the system. Results: Relevant data from the HIS have been integrated into an anesthesia data warehouse. This system has stored data about hospital stays and interventions (drug administrations, vital signs, etc.) since 2010. Aggregated data have been developed and used in two clinical research studies. The first study highlighted a statistical link between induction and patient outcome; the second evaluated compliance with quality indicators of ventilation and the impact on comorbidity. Discussion: The data warehouse and the cleaning and integration methods developed as part of this work allow statistical analysis to be performed on more than 200,000 interventions. This system can be implemented with other applications used in the CHRU of Lille, but also with the anesthesia information management systems used by other hospitals.
Jouhet, Vianney. "Automated adaptation of Electronic Health Record for secondary use in oncology." Thesis, Bordeaux, 2016. http://www.theses.fr/2016BORD0373/document.
With the increasing adoption of electronic health records (EHR), the amount of data produced at the patient bedside is rapidly increasing. Secondary use is thereby an important field to investigate in order to facilitate research and evaluation. In this work we discuss issues related to data representation and semantics within the EHR that need to be addressed in order to facilitate the secondary use of structured data in oncology. We propose and evaluate ontology-based methods for the integration of heterogeneous diagnosis terminologies in oncology. We then extend the obtained model to enable the representation of tumoral diseases and their links with diagnoses as recorded in the EHR. We then propose and implement a complete architecture combining a clinical data warehouse, a metadata registry, and semantic web technologies and standards. This architecture enables the syntactic and semantic integration of a broad range of hospital information system observations. Our approach links data with external knowledge (an ontology) in order to provide a knowledge resource for an algorithm that identifies tumoral diseases based on the diagnoses recorded within EHRs. As it is based on the ontology classes, the identification algorithm uses an integrated view of diagnoses (avoiding semantic heterogeneity). The proposed architecture, leading to an algorithm on top of an ontology, offers a flexible solution: adapting the ontology, for instance by modifying its granularity, provides a way to adapt aggregation to specific needs.
Boulil, Kamal. "Une approche automatisée basée sur des contraintes d’intégrité définies en UML et OCL pour la vérification de la cohérence logique dans les systèmes SOLAP : applications dans le domaine agri-environnemental." Thesis, Clermont-Ferrand 2, 2012. http://www.theses.fr/2012CLF22285/document.
Spatial Data Warehouse (SDW) and Spatial OLAP (SOLAP) systems are Business Intelligence (BI) systems allowing interactive multidimensional analysis of huge volumes of spatial data. In such systems, the quality of analysis mainly depends on three components: the quality of warehoused data, the quality of data aggregation, and the quality of data exploration. Warehoused data quality depends on elements such as accuracy, completeness and logical consistency. Data aggregation quality is affected by structural problems (e.g., non-strict dimension hierarchies that may cause double counting of measure values) and semantic problems (e.g., summing temperature values does not make sense in many applications). Data exploration quality is mainly affected by inconsistent user queries (e.g., what were the temperature values in the USSR in 2010?) leading to possibly meaningless interpretations of query results. This thesis addresses the problems of logical inconsistency that may affect data, aggregation and exploration quality in SOLAP. Logical inconsistency is usually defined as the presence of incoherencies (contradictions) in data; it is typically controlled by means of Integrity Constraints (IC). In this thesis, we extend the notion of IC (in the SOLAP domain) in order to take into account aggregation and query incoherencies. To overcome the limitations of existing approaches to the definition of SOLAP IC, we propose a framework based on the standard languages UML and OCL. Our framework permits platform-independent conceptual design and automatic implementation of SOLAP IC; it consists of three parts: (1) a SOLAP IC classification; (2) a UML profile, implemented in the CASE tool MagicDraw, allowing for conceptual design of SOLAP models and their IC; (3) an automatic implementation based on the code generators Spatial OCLSQL and UML2MDX, which transforms the conceptual specifications into code. Finally, the contributions of this thesis have been experimented and validated in the context of French national projects aiming at developing (S)OLAP applications for agriculture and environment.
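One of the structural constraints such a classification covers, hierarchy strictness, whose violation causes the double counting mentioned above, can be checked with a trivial sketch. The member names are invented; the thesis expresses such constraints in UML/OCL and generates the checking code automatically.

```python
# Hedged sketch: a strict hierarchy requires exactly one parent per member,
# otherwise roll-ups may aggregate the same measure value twice.
parent_of = {"Lyon": ["Rhône"],
             "Paris": ["Île-de-France"],
             "GrandTerritoire": ["Rhône", "Isère"]}   # violates strictness

violations = [m for m, parents in parent_of.items() if len(parents) != 1]
print(violations)  # ['GrandTerritoire'], flagged before any aggregation
```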
Younsi, Fatima-Zohra. "Mise en place d'un Système d'Information Décisionnel pour le suivi et la prévention des épidémies." Thesis, Lyon, 2016. http://www.theses.fr/2016LYSE2005/document.
Today, infectious diseases represent a major public health problem. With the increase of bacterial resistance, the emergence of new pathogens and the rapid spread of epidemics, monitoring and surveillance of disease transmission become important. In the face of such a threat, society must prepare in advance to respond quickly and effectively if an outbreak is declared. This requires setting up monitoring and prevention mechanisms. In this context, we are particularly interested in developing a spatiotemporal decision support system for monitoring and preventing the spread of the seasonal influenza epidemic in the population of Oran (a city in Algeria). The objective of this system is twofold: on the one hand, to understand how the epidemic spreads through the social network by using the SEIR (Susceptible-Exposed-Infected-Removed) compartmental model within a small-world network, and on the other hand, to store multiple data in a data warehouse and analyze them with a dedicated online analysis tool, Spatial OLAP (Spatial On-Line Analytical Processing).
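The SEIR dynamics at the heart of the system can be sketched as a discrete-time simulation. The rates below are assumed round numbers for illustration, not the thesis's calibrated parameters, and the small-world contact network is abstracted away into homogeneous mixing.

```python
# Hedged sketch: discrete-time SEIR epidemic simulation (homogeneous mixing).
beta, sigma, gamma = 0.4, 1 / 5, 1 / 7   # transmission, incubation, recovery
S, E, I, R = 0.999, 0.0, 0.001, 0.0      # population fractions

for day in range(120):
    new_exposed = beta * S * I           # susceptibles meeting infectious
    new_infectious = sigma * E           # end of incubation period
    new_recovered = gamma * I
    S -= new_exposed
    E += new_exposed - new_infectious
    I += new_infectious - new_recovered
    R += new_recovered

print(f"after 120 days, {R:.1%} of the population has been infected")
```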
Windhouwer, Menzo Aart. "Feature grammar systems incremental maintenance of indexes to digital media warehouses /." [S.l. : Amsterdam : s.n.] ; Universiteit van Amsterdam [Host], 2003. http://dare.uva.nl/document/86747.
Full textBerbel, Talita dos Reis Lopes. "Recomendação semântica de documentos de texto mediante a personalização de agregações OLAP." Universidade Federal de São Carlos, 2015. https://repositorio.ufscar.br/handle/ufscar/632.
With the rapid growth of unstructured data, such as text documents, it becomes increasingly interesting and necessary to extract such information to support decision-making in business intelligence systems. Recommendations can be used in the OLAP process, because they give users a tailored experience in exploring data. The recommendation process, together with the possibility of query personalisation, allows recommendations to be increasingly relevant. The main contribution of this work is to propose an effective solution for the semantic recommendation of documents through the personalisation of OLAP aggregation queries in a data warehousing environment. In order to aggregate and recommend documents, we propose the use of semantic similarity. A domain ontology and the statistical measure of frequency are used to verify the similarity between documents. The similarity threshold between documents in the recommendation process is adjustable, and this personalisation gives the user an interactive way to improve the relevance of the results. The proposed case study is based on articles from PubMed and its domain ontology in order to create a prototype using real data. The results of the experiments are presented and discussed, showing that good recommendations and aggregations are possible with the suggested approach. The results are discussed on the basis of the evaluation measures precision, recall and F1-measure.
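The adjustable-threshold mechanism is easy to picture. The thesis combines a domain ontology with term frequencies; the sketch below substitutes plain TF-IDF cosine similarity so that the personalisation knob, the similarity threshold, is visible in a few runnable lines. The documents are invented.

```python
# Hedged sketch: threshold-based document recommendation by similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["gene expression in tumor cells",
        "tumor cell gene expression profiling",
        "warehouse logistics performance"]

tfidf = TfidfVectorizer().fit_transform(docs)
sims = cosine_similarity(tfidf[0], tfidf[1:])[0]   # similarity to docs[0]

threshold = 0.3                    # the user-adjustable personalisation knob
print([docs[i + 1] for i, s in enumerate(sims) if s >= threshold])
# only the related oncology document is recommended; raising the threshold
# prunes the recommendation list further
```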
Aknouche, Rachid. "Entrepôt de textes : de l'intégration à la modélisation multidimensionnelle de données textuelles." Thesis, Lyon 2, 2014. http://www.theses.fr/2014LYO20025.
The work presented in this thesis aims to propose solutions to the problems of textual data warehousing. The interest in textual data is motivated by the fact that it cannot be integrated and warehoused using traditional applications and the current techniques of decision-making systems. In order to overcome this problem, we propose a text warehouse approach which covers the main phases of a data warehousing process adapted to textual data. We focus specifically on the integration of textual data and its multidimensional modeling. For textual data integration, we use information retrieval (IR) techniques and automatic natural language processing (NLP). Thus, we propose an integration framework called ETL-Text, an ETL (Extract-Transform-Load) process suitable for textual data. ETL-Text performs the extraction, filtering and transformation of the original textual data into a form that allows it to be warehoused. Some of these tasks are performed by our RICSH approach (contextual information retrieval by topic segmentation of documents) for the pretreatment and search of textual data. The organization of textual data for analysis is carried out by our proposed TWM (Text Warehouse Modelling), a new multidimensional model suitable for textual data. It extends the classical constellation model to support the representation of textual data in a multidimensional environment. TWM includes a semantic dimension defined for structuring documents and topics by organizing semantic concepts into a hierarchy. We also rely on Wikipedia, as an external semantic source, to populate the semantic part of the model. Furthermore, we developed WikiCat, a tool that feeds the TWM semantic dimension with semantic descriptors from Wikipedia. These last two contributions complement the ETL-Text framework to establish the text warehousing system. To validate the different contributions, we performed, besides the implementation work, an experimental study for each model. To handle the emergence of large data volumes, we also developed, as part of a case study, parallel processing algorithms using the MapReduce paradigm, tested in the Apache Hadoop environment.
Heitz, Adeline. "La Métropole Logistique : structure métropolitaine et enjeux d'aménagement." Thesis, Paris Est, 2017. http://www.theses.fr/2017PESC1098/document.
Among other activities, metropolitan areas have become prime locations for logistics activities. As a consequence of the concentration of warehouses in metropolitan areas, logistics facilities are mainly located in suburban areas, inducing logistics metropolization. This logistics suburbanization amplifies the negative externalities of transport and challenges public policies. However, suburban areas are not the only location choice for logistics facilities. Analysis of logistics sprawl should not overlook logistics facilities located in the dense parts of metropolitan areas, which, moreover, draw the focus of public authorities. The apparent contradiction between logistics that contribute to urban sprawl and the new sustainability issues has led to refocusing the debate on the "last mile" rather than on logistics planning in the fringes of the metropolitan area. Through the development of "urban logistics" policies, public stakeholders intend to offer a service complementary to those offered by the logistics real estate market, while complying with environmental objectives. The main challenge of analyzing this logistics metropolization lies in the double contribution of logistics to metropolitan morphology and to the political agenda.
Kuchmann-Beauger, Nicolas. "Question Answering System in a Business Intelligence Context." Thesis, Châtenay-Malabry, Ecole centrale de Paris, 2013. http://www.theses.fr/2013ECAP0021/document.
The amount and complexity of data generated by information systems keep increasing in warehouses. The domain of Business Intelligence (BI) aims at providing methods and tools to better help users retrieve those data. Data sources are distributed over distinct locations and are usually accessible through various applications. Looking for new information can be a tedious task, because business users try to reduce their work overload. To tackle this problem, Enterprise Search has emerged in the last few years as a field that takes into consideration the different corporate data sources as well as sources available to the public (e.g. World Wide Web pages). However, corporate retrieval systems nowadays still suffer from information overload. We believe that such systems would benefit from natural language (NL) approaches combined with Q&A techniques. Indeed, NL interfaces allow users to search for new information in their own terms and thus obtain precise answers instead of turning to a plethora of documents. In this way, users do not have to employ exact keywords or appropriate syntax, and can have faster access to new information. Major challenges in designing such a system are, on the one hand, to interface different applications and their underlying query languages and, on the other hand, to support users' vocabulary and to be easily configurable for new application domains. This thesis outlines an end-to-end Q&A framework for corporate use cases that can be configured for different settings. In traditional BI systems, user preferences are usually not taken into account, nor are users' specific contextual situations. State-of-the-art systems in this field, SODA and SAFE, do not compute search results on the basis of users' situations. This thesis introduces a more personalized approach, which better reflects end users' situations. Our main experiment works as a search interface that displays search results on a dashboard, usually in the form of charts, fact tables and thumbnails of unstructured documents. Depending on users' initial queries, recommendations for alternatives are also displayed, so as to reduce the response time of the overall system. This process is often seen as a kind of prediction model. Our work contributes the following: first, an architecture, implemented with parallel algorithms, that leverages different data sources, namely structured and unstructured document repositories, through an extensible Q&A framework that can be easily configured for distinct corporate settings; second, a constraint-matching-based translation approach, which replaces a pivot language with a conceptual model and leads to more personalized multidimensional queries; third, a set of NL patterns for translating BI questions into structured queries that can be easily configured for specific settings. In addition, we have implemented an iPhone/iPad™ application and an HTML front end that demonstrate the feasibility of the various approaches developed, through a series of evaluation metrics for the core component and scenarios of the Q&A framework. To this end, we elaborate a range of gold-standard queries that can be used as a basis for evaluating retrieval systems in this area, and show that our system behaves similarly to the well-known WolframAlpha™ system, depending on the evaluation settings.
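The third contribution, NL patterns that map BI questions to structured queries, can be caricatured with a single regular-expression pattern producing a (measure, group-by, filter) triple. The pattern and vocabulary are invented; the thesis uses a richer, configurable pattern set.

```python
# Hedged sketch: one natural-language pattern for BI question translation.
import re

PATTERN = re.compile(r"(?P<measure>\w+) by (?P<dim>\w+)(?: in (?P<year>\d{4}))?")

def parse(question):
    m = PATTERN.fullmatch(question.strip().lower())
    if not m:
        raise ValueError("no pattern matched")
    return {"measure": m["measure"], "group_by": m["dim"], "filter": m["year"]}

print(parse("revenue by country in 2012"))
# {'measure': 'revenue', 'group_by': 'country', 'filter': '2012'}, ready to
# be rendered as SQL or MDX against the warehouse schema
```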
Megdiche, Bousarsar Imen. "Intégration holistique et entreposage automatique des données ouvertes." Thesis, Toulouse 3, 2015. http://www.theses.fr/2015TOU30214/document.
Statistical open data provide useful information to feed a decision-making system. Their integration and storage within these systems is achieved through ETL processes. It is necessary to automate these processes in order to make them accessible to non-experts. These processes also need to face the lack of schemas and the structural and semantic heterogeneity that characterize open data. To meet these issues, we propose a new ETL approach based on graphs. For extraction, we propose automatic activities performing detection and annotation based on a table model. For transformation, we propose a linear program fulfilling the holistic integration of several graphs; this model supplies an optimal and unique solution. For loading, we propose a progressive process for the definition of the multidimensional schema and the augmentation of the integrated graph. Finally, we present a prototype and the experimental evaluations.
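The role of the linear program in the transformation step can be illustrated on the smallest possible instance: matching the nodes of two schema graphs one-to-one so that total similarity is maximal. The similarity values are invented, and the real formulation handles hierarchies and structural constraints; this sketch only shows the LP flavor.

```python
# Hedged sketch: LP-based one-to-one matching of two tiny graphs' nodes.
import numpy as np
from scipy.optimize import linprog

sim = np.array([[0.9, 0.1],      # similarity of node A0 to B0, B1
                [0.2, 0.8]])     # similarity of node A1 to B0, B1

c = -sim.flatten()               # linprog minimizes, so negate to maximize
A_eq = np.array([[1, 1, 0, 0],   # A0 matched exactly once
                 [0, 0, 1, 1],   # A1 matched exactly once
                 [1, 0, 1, 0],   # B0 matched exactly once
                 [0, 1, 0, 1]])  # B1 matched exactly once
res = linprog(c, A_eq=A_eq, b_eq=np.ones(4), bounds=[(0, 1)] * 4)
print(res.x.reshape(2, 2).round())  # [[1. 0.], [0. 1.]]: A0-B0, A1-B1
```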
Hachicha, Marouane. "Modélisation de hiérarchies complexes dans les entrepôts de données XML et traitement des problèmes d'additivité dans l'analyse en ligne XOLAP." Thesis, Lyon 2, 2012. http://www.theses.fr/2012LYO22016/document.
Since its inception in 1998, the eXtensible Markup Language (XML) has emerged as a standard for data representation and exchange over the Internet. XML provides an opportunity for modeling data structures that are not easily represented in relational systems. In this context, XML data warehouses nowadays form the basis of several decision-support applications exploiting heterogeneous data (loosely structured and coming from various sources) bearing complex structures, such as complex hierarchies. In this thesis, we propose a novel XOLAP (XML-OLAP) approach that automatically detects and processes summarizability issues at query time, without requiring any particular expertise from the user. At the logical level, we choose XML data trees, so-called multidimensional data trees, to model the multidimensional structures (facts, dimensions, measures and complex hierarchies) of XML data warehouses. In order to query multidimensional data trees, we model user queries as XML pattern trees. Then, we introduce a new aggregation algorithm to address summarizability issues in complex hierarchies. On the basis of this algorithm, we propose a novel XOLAP roll-up operator. Finally, we experimentally validate our proposal and compare our approach with the reference approach for addressing summarizability issues in complex hierarchies. To this end, we extend the XML warehouse benchmark XWeB with complex hierarchies in order to generate XML data warehouses with scalable complex hierarchies. The results of our experiments show that the overhead induced by managing hierarchy complexity at run time is entirely acceptable and that our approach can be expected to scale well.
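A roll-up over a multidimensional data tree can be sketched with the standard library alone: measures stored at the leaves are aggregated bottom-up along a geography hierarchy. The XML document is invented and deliberately simple; the thesis's operator additionally detects summarizability problems (e.g. double counting in non-strict hierarchies) during this traversal.

```python
# Hedged sketch: bottom-up roll-up of a measure over an XML data tree.
import xml.etree.ElementTree as ET

xml = """<region name="Europe">
  <country name="France"><sale amount="10"/><sale amount="5"/></country>
  <country name="Italy"><sale amount="7"/></country>
</region>"""

def roll_up(node):
    """Sum the 'amount' measure over this node and its sub-hierarchy."""
    total = sum(float(s.get("amount")) for s in node.findall("sale"))
    return total + sum(roll_up(child) for child in node.findall("country"))

root = ET.fromstring(xml)
print(roll_up(root))  # 22.0, sales rolled up from countries to the region
```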
Attasena, Varunya. "Secret sharing approaches for secure data warehousing and on-line analysis in the cloud." Thesis, Lyon 2, 2015. http://www.theses.fr/2015LYO22014/document.
Cloud business intelligence is an increasingly popular solution to deliver decision-support capabilities via elastic, pay-per-use resources. However, data security is one of the top concerns when dealing with sensitive data. Many security issues are raised by data storage in a public cloud, including data privacy, availability, integrity, backup and recovery, and transfer safety. Moreover, security risks may come both from cloud service providers and from intruders, while cloud data warehouses should be both highly protected and effectively refreshed and analyzed through online analytical processing. Hence, users seek secure data warehouses at the lowest possible storage and access costs within the pay-as-you-go paradigm. In this thesis, we propose two novel approaches for securing cloud data warehouses: base-p verifiable secret sharing (bpVSS) and flexible verifiable secret sharing (fVSS). Secret sharing encrypts and distributes data over several cloud service providers, thus enforcing data privacy and availability. bpVSS and fVSS address five shortcomings of existing secret-sharing-based approaches. First, they allow online analytical processing. Second, they enforce data integrity with the help of both inner and outer signatures. Third, they help users minimize the cost of cloud warehousing by limiting global share volume; moreover, fVSS balances the load among service providers with respect to their pricing policies. Fourth, fVSS improves secret-sharing security by imposing a new constraint: no group of cloud service providers can hold enough shares to reconstruct or break the secret. Fifth, fVSS allows refreshing the data warehouse even when some service providers fail. To evaluate the efficiency of bpVSS and fVSS, we theoretically study the factors that impact our approaches with respect to security, complexity and monetary cost in the pay-as-you-go paradigm. We also validate the relevance of our approaches experimentally with the Star Schema Benchmark and demonstrate their superiority to related, existing methods.
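Both approaches build on classic Shamir secret sharing, which is compact enough to show in full: the secret becomes the constant term of a random polynomial over a prime field, each provider receives one point of the polynomial, and any k points reconstruct the secret by Lagrange interpolation. This is plain Shamir only; the verification signatures and cost balancing of bpVSS/fVSS are not shown, and the prime is sized for the demo.

```python
# Hedged sketch: Shamir (k, n) secret sharing over a prime field.
import random

PRIME = 2_147_483_647  # Mersenne prime, large enough for this demo

def make_shares(secret, k, n):
    """Split `secret` into n shares; any k of them reconstruct it."""
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(k - 1)]
    return [(x, sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME)
            for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation of the hidden polynomial at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * -xj % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret

shares = make_shares(secret=42, k=3, n=5)   # one share per cloud provider
print(reconstruct(shares[:3]))              # 42; any 3 of the 5 suffice
```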
Ozturk, Aybuke. "Design, Implementation and Analysis of a Description Model for Complex Archaeological Objects." Thesis, Lyon, 2018. http://www.theses.fr/2018LYSE2048/document.
Ceramics are one of the most important archaeological materials for helping to reconstruct past civilizations. Information about complex ceramic objects is composed of textual, numerical and multimedia data, which induce several research challenges addressed in this thesis. From a technical perspective, ceramic databases have different file formats, access protocols and query languages. From a data perspective, ceramic data are heterogeneous and experts have different ways of representing and storing data. There is no standardized content and terminology, especially for the description of ceramics. Moreover, data navigation and observation are difficult. Data integration is also difficult due to the presence of various dimensions from distant databases, which describe the same categories of objects in different ways. Therefore, the research project presented in this thesis aims to provide archaeologists and archaeological scientists with tools for enriching their knowledge by combining different information on ceramics. We divide our work into two complementary parts: (1) modeling of complex archaeological data and (2) clustering analysis of complex archaeological data. The first part of this thesis is dedicated to the design of a complex archaeological database model for the storage of ceramic data. This database is also used to source a data warehouse for online analytical processing (OLAP). The second part of the thesis is dedicated to an in-depth clustering (categorization) analysis of ceramic objects. To do this, we propose a fuzzy approach in which ceramic objects may belong to more than one cluster (category). Such a fuzzy approach is well suited for collaborating with experts, opening new discussions based on clustering results. We contribute to fuzzy clustering in three sub-tasks: (i) a novel fuzzy clustering initialization method that keeps the fuzzy approach linear; (ii) an innovative quality index that allows finding the optimal number of clusters; and (iii) a multiple clustering analysis approach that builds smart links between visual, textual and numerical data, which assists in combining all types of ceramic information. Moreover, the proposed methods could also be adapted to other application domains such as economics or medicine.
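Fuzzy c-means, the base algorithm behind such a fuzzy approach, fits in a short function: memberships and centers are updated alternately until they stabilize, and each object ends up with a degree of membership in every cluster. The feature vectors are invented stand-ins for ceramic descriptors, and this is textbook FCM, not the thesis's initialization method or quality index.

```python
# Hedged sketch: textbook fuzzy c-means returning a membership matrix U.
import numpy as np

def fuzzy_cmeans(X, c=2, m=2.0, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=len(X))          # random memberships
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]    # weighted centroids
        d = np.linalg.norm(X[:, None, :] - centers[None], axis=2) + 1e-12
        U = 1.0 / d ** (2 / (m - 1))                    # inverse-distance
        U /= U.sum(axis=1, keepdims=True)               # rows sum to one
    return U

# Toy "ceramic" descriptors (say, rim diameter and wall thickness).
X = np.array([[1.0, 1.1], [0.9, 1.0], [4.0, 4.2], [4.1, 3.9], [2.5, 2.5]])
print(fuzzy_cmeans(X).round(2))  # the last object belongs partly to both
```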
Ayadi, Abdessalem. "Vers une organisation globale durable de l’approvisionnement des ménages : bilans économiques et environnementaux de différentes chaînes de distribution classiques et émergentes depuis l’entrepôt du fournisseur jusqu’au domicile du ménage." Thesis, Lyon 2, 2014. http://www.theses.fr/2014LYO22010/document.
Urban logistics, and the last mile in particular, is a major concern for cities today. To address this concern, we establish in the introductory chapter a history of the problem of urban logistics. This allows a better understanding of its development over the years and leads to the conclusion that it is essential to study the supply chain in its entirety in order to better solve the problem of urban logistics. However, we were faced with a daunting task: the lack of comprehensive and reliable data. In addition, distribution channels have multiplied in recent years, including delivery from warehouses to stores and onward from the retail space to households. We therefore set out to identify all existing and emerging logistics organizations in France and beyond (a one-year exchange stay in England and Switzerland for research purposes). To do this, we establish in the second chapter the parameters that differentiate the logistics modes of the various organizations upstream (from manufacturers to retail stores) and downstream (from retail stores to households). Unfortunately, no economic and environmental assessment exists to decide between the different forms of traditional and modern electronic distribution that takes into account the characteristics of the different product families (non-food, dry, fresh, frozen) and the diversity of their delivery modes. Faced with constraints of such scale, we conducted surveys with the different actors of the distribution channels, which provided the opportunity to make contacts and thus collect first-hand, so far unpublished, technical and economic data. In addition to resolving this empirical inadequacy in the third chapter, this research also develops a methodological approach for the reconstruction and evaluation of logistics costs and emissions (in warehouses, transit platforms, retail stores and shared platforms) as well as the costs and emissions of vehicles (trucks, delivery vans, cars, public transport, bikes, motorbikes and walking). Finally, this research has led to the construction of a database and the development of a decision support tool to derive, in the fourth chapter, the economic and environmental appraisal of the entire supply chain, from the supplier's warehouse to the final customer. This tool can be useful for public policy and for the future strategies of retailers and third-party logistics providers, to focus on efficient and sustainable modes of organization; it can even help customers estimate the costs and emissions of their purchases in classic and e-grocery shopping.
Chen, Hui-Shu, and 陳慧書. "Multidimensional Document Warehouse with Knowledge Management Application." Thesis, 2002. http://ndltd.ncl.edu.tw/handle/06718942359200395115.
Full text國立臺灣大學
資訊管理研究所
90
The growth of the World-Wide Web has brought an improved work environment for many people, but it has introduced the problem of "information overload" at the same time. Search engines help one find what one wants, but keyword-based search often returns too much information to manage. Numerous research approaches have been reported in the literature, covering such important issues as efficient search algorithms and improved search results. In this paper, we explore the idea of supporting multi-perspective views of document collections based on the concept of data warehousing, as a complementary approach to search engines. First, we build a document warehouse system to provide multidimensional processing capability. Unlike a data warehouse for quantitative data, the document warehouse is meant to deal with document collections such as newsgroup articles, e-mails and so on. Based on the one-dimensional search structure of search engines and the data processing capability of the data warehouse, we can provide aggregated multidimensional views of a document set. Second, we implement a set of OLAP operators, including roll-up, drill-down and slice-and-dice, which allow users to view document information at different levels of detail. From the different perspectives of a document set, users can obtain a great deal of implicit information that is not provided in search engine results. The document warehouse system thus also plays a decision-support role for organizations.
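The roll-up and slice-and-dice operators described above behave like standard OLAP aggregation, which a few pandas lines can illustrate on an invented document fact table (one row per document, with year, topic and source dimensions).

```python
# Hedged sketch: OLAP-style roll-up and slice-and-dice over document facts.
import pandas as pd

docs = pd.DataFrame({
    "year":   [2001, 2001, 2002, 2002, 2002],
    "topic":  ["OLAP", "XML", "OLAP", "OLAP", "XML"],
    "source": ["news", "email", "news", "email", "news"],
    "count":  [1, 1, 1, 1, 1],
})

# Roll-up: aggregate counts from (year, topic, source) up to (year, topic).
print(docs.groupby(["year", "topic"])["count"].sum())

# Slice-and-dice: fix the topic dimension, then cross-tabulate the rest.
print(docs[docs.topic == "OLAP"]
      .pivot_table(index="year", columns="source", values="count", aggfunc="sum"))
```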
Chun-Feng, Hung, and 洪春鳳. "A MetaModel-based XML Document Warehouse Architecture." Thesis, 2002. http://ndltd.ncl.edu.tw/handle/37283445123556763763.
Full text大同大學
資訊工程研究所
90
In the last decade of the 20th century, because of the popularity of the Internet, the trend has been towards e-solutions for businesses: not only electronic commerce, but also information exchange that decreases the time from raw material at the manufacturer to products bought by customers. However, the problem businesses confront today is that they are flooded with e-documents. This paper provides an overview of the technologies and design issues that we have explored to meet the needs of an enterprise information-integration infrastructure. We propose modeling metadata to support intelligent document warehouse management, enabling enterprises to achieve overall document management.
Wu, Shu-Fu, and 吳書福. "Applying Text Classification Techniques in Multidimensional Document Warehouse System." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/50720588439124337816.
Full text國立臺灣大學
資訊管理學研究所
93
The development and growth of information technologies have caused a situation called "information overload". Therefore, we look for new tools that allow us to query from multidimensional perspectives rather than use traditional keyword-based search engines. Data warehouse systems provide the capability to store and analyze numerical data, but they lack the ability to deal with document collections. In order to solve these problems, we build a whole new system. In this paper, we describe an automatic metadata extraction algorithm and build a document warehouse system. We define 15 kinds of metadata as 15 classes. Using support vector machines, we create 15 classifiers to extract metadata from a new document. Sentences in the document, with their corresponding metadata, are saved in XML format. Next, we use a star schema to build a multidimensional document warehouse system. Metadata are used to support the process of loading documents into the document warehouse. We also provide client-side tools such as OLAP, a cube browser, and an MDX query interface. Our experiments show that support vector machines can achieve high classification performance: we can extract most metadata from a document with the SVM classifiers. The prototype system built in this paper also demonstrates the fundamental components and processes of a document warehouse system. The OLAP tools and multidimensional query tools provide ways to search and analyze documents from multiple user perspectives.
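The sentence-level SVM classification step can be sketched with scikit-learn. The two class labels and the training sentences below are invented (the thesis defines 15 metadata classes); the point is only the TF-IDF plus linear SVM pipeline.

```python
# Hedged sketch: SVM classification of sentences into metadata classes.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

sentences = ["this paper proposes a new indexing method",
             "we propose an algorithm for cube computation",
             "experiments show 92% accuracy on the test set",
             "evaluation results demonstrate high precision"]
labels = ["contribution", "contribution", "result", "result"]

clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(sentences, labels)
print(clf.predict(["experiments demonstrate high recall"]))  # ['result']
```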
Wei-Lin, Yang. "Developing a Virtual Document Warehouse with Dynamic Hierarchical Clustering Techniques." 2006. http://www.cetd.com.tw/ec/thesisdetail.aspx?etdun=U0001-1007200621032000.
Full textYang, Wei-Lin, and 楊瑋琳. "Developing a Virtual Document Warehouse with Dynamic Hierarchical Clustering Techniques." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/85157078361120147542.
Full text國立臺灣大學
資訊管理學研究所
94
Searching for information through keyword-based retrieval with search engines has limited ability to surface the most important and relevant knowledge: the retrieved search results are disorganized and lack dimensions. In the information retrieval (IR) field, text categorization, comprising classification and clustering, has been investigated for many years as a way to organize search results automatically into corresponding categories. In this thesis, we propose and describe the Virtual Document Warehouse System, which provides an integrated interface for multidimensional analysis for knowledge management and decision-making. The system retrieves relevant documents using search engines, and we utilize clustering algorithms to dynamically and automatically organize information retrieved from heterogeneous sources into hierarchical structures and to combine different concept hierarchies. Finally, we propose an approach that makes searching more convenient and multidimensional, and present an application to personalized conceptual knowledge maps.
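Organizing retrieved snippets into a hierarchy is the job of agglomerative clustering, sketched below with SciPy over TF-IDF vectors of invented search-result snippets; cutting the resulting dendrogram yields the category tree.

```python
# Hedged sketch: hierarchical clustering of search-result snippets.
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.feature_extraction.text import TfidfVectorizer

snippets = ["data warehouse design", "warehouse schema modeling",
            "text clustering methods", "document clustering algorithms"]

X = TfidfVectorizer().fit_transform(snippets).toarray()
Z = linkage(X, method="average", metric="cosine")   # dendrogram of snippets
print(fcluster(Z, t=2, criterion="maxclust"))       # e.g. [1 1 2 2]
```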
Lin, Wen-Ping, and 林文平. "A Study on Indexing Structure and Its Properties for Constructing Document Warehouse." Thesis, 2003. http://ndltd.ncl.edu.tw/handle/03257412650051325009.
Full text國立高雄第一科技大學
資訊管理所
91
Data warehousing and data mining techniques are gaining popularity as organizations realize the benefits of performing multidimensional analyses of accumulated historical business data to support contemporary administrative decision-making. However, according to a survey by survey.com, only about 20% of the information relevant to the business intelligence of an enterprise can be extracted from formatted data stored in relational databases; the remaining 80% is hidden in unstructured or semi-structured documents. For instance, market survey reports, project status reports, meeting records, customer complaint e-mails, patent application sheets, and competitors' advertisements are all recorded in documents. Therefore, the next challenge is the study of document warehousing and text mining to help enterprises obtain complete business intelligence. Since a document is multidimensional in nature, traditional indexing methods are not really suitable for a document warehouse. Although a multidimensional array can be employed to represent the index of a document warehouse, it usually costs too much, as document cubes are usually sparse: if we use a multidimensional array to index a document cube, space utilization will be poor. In this thesis, based on the concept of the R-tree, we propose an index structure called the D-tree to fit the requirements of a document cube, and we study the related properties of the D-tree to make indexing more efficient. We hope this infrastructure will let us extend our work, in combination with text processing technologies, to make data warehousing and document warehousing a key kernel of knowledge management and customer relationship management applications.
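The sparsity argument for the D-tree can be made concrete: storing a document cube as a dense multidimensional array allocates every cell, while almost all cells are empty. The tiny sketch below uses a coordinate dictionary to make the contrast visible; it illustrates the problem, not the D-tree itself, and all numbers are invented.

```python
# Hedged sketch: a sparse document cube as a coordinate -> documents map.
cube = {("OLAP", 2002, "news"): ["d1", "d7"],
        ("XML", 2001, "email"): ["d3"]}

dense_cells = 100 * 50 * 10            # a dense array allocates every cell
print(f"{len(cube)} occupied of {dense_cells} cells "
      f"({len(cube) / dense_cells:.4%} utilization)")
```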
"Artefatos da semiotica organizacional na elicitação de requisitos para soluções de data warehouse." Tese, Biblioteca Digital da Unicamp, 2006. http://libdigi.unicamp.br/document/?code=vtls000393671.