Dissertations / Theses on the topic 'Optimization of SQL queries'

Consult the top 50 dissertations / theses for your research on the topic 'Optimization of SQL queries.'

1

Hasan, Waqar. "Optimization of SQL queries for parallel machines /." Berlin [u.a.] : Springer, 1996. http://www.loc.gov/catdir/enhancements/fy0815/96039704-d.html.

2

Muller, Leslie. "'n Ondersoek na en bydraes tot navraaghantering en -optimering deur databasisbestuurstelsels / L. Muller." Thesis, North-West University, 2006. http://hdl.handle.net/10394/1181.

Abstract:
The problems associated with the effective design and use of databases are increasing. The information contained in a database is becoming more complex and the size of the data is causing space problems. Technology must continually develop to accommodate this growing need. An inquiry was conducted in order to find effective guidelines that could support queries in general in terms of performance and productivity. Two database management systems were researched to compare the theoretical aspects with the techniques implemented in practice. Microsoft SQL Server and MySQL were chosen as the candidates and both were put under close scrutiny. The systems were researched to uncover the methods employed by each to manage queries. The query optimizer forms the basis of each of these systems and manages the parsing and execution of any query. The methods employed by each system for storing data were researched, as were the ways each system manages table joins, uses indexes and chooses optimal execution plans. Adjusted algorithms were introduced for various index processes like B+ trees and hash indexes. Guidelines were compiled that are independent of the database management systems and help to optimize relational databases. Practical implementations of queries were used to acquire and analyse the execution plan for both MySQL and SQL Server. This plan, along with a few other variables such as execution time, is discussed for each system. A model is used for both database management systems in this experiment.
Thesis (M.Sc. (Computer Science))--North-West University, Potchefstroom Campus, 2007.
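Since the execution-plan comparison is the heart of this study, a small illustration of how the two systems expose their plans may help. The statements below are standard MySQL and SQL Server commands; the query, table and column names are invented:

```python
# Hypothetical query. EXPLAIN FORMAT=JSON is standard MySQL syntax for
# retrieving the optimizer's chosen plan in machine-readable form.
mysql_plan = """
EXPLAIN FORMAT=JSON
SELECT c.name, SUM(o.total)
FROM customers c JOIN orders o ON o.customer_id = c.id
GROUP BY c.name;
"""

# SQL Server returns the estimated plan as XML while SHOWPLAN_XML is on.
sqlserver_plan = """
SET SHOWPLAN_XML ON;
GO
SELECT c.name, SUM(o.total)
FROM customers c JOIN orders o ON o.customer_id = c.id
GROUP BY c.name;
GO
SET SHOWPLAN_XML OFF;
"""
```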
3

Janeček, Jiří. "Optimalizace strukturovaných dotazů nad rozsáhlými databázemi." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2012. http://www.nusl.cz/ntk/nusl-412868.

Abstract:
This master's thesis deals with the optimization of structured queries over large databases. These optimization principles are applied in the creation of an application that searches over one specific large database. The thesis also compares the efficiency of the newly designed SQL constructs against the unoptimized ones.
4

Ferreira, Mônica Ribeiro Porto. "Suporte a consultas por similaridade unárias em SQL." Universidade de São Paulo, 2008. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-01042008-101843/.

Abstract:
Conventional operators for data comparison based on exact matching and total order relations are not appropriate to manage complex data, such as multimedia data (e.g. images, audio and large texts), time series and genetic sequences. In fact, the most important aspect when comparing complex data is usually the similarity degree between instances, leading to the use of similarity operators to perform search and retrieval operations. Similarity operators can be classified as unary or binary, respectively used to implement selection operations and joins. However, the Relational Algebra, employed in Relational Database Management Systems (DBMSs), does not provide resources to express similarity search criteria. In order to fulfill this lack of support, an extension to the Relational Algebra is under development at GBdI-ICMC-USP (Grupo de Bases de Dados e Imagens), aiming to represent similarity queries in algebraic expressions. This work contributes to such an effort by dealing with unary similarity operators in Relational Algebra and by developing a similarity query optimizer for SIREN (Similarity Retrieval Engine), therefore allowing similarity queries to be answered by Relational DBMSs.
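To make the two kinds of unary operator concrete, here is a minimal sketch of range and k-nearest-neighbor selections over feature vectors. This is illustrative only, not SIREN's implementation; the function and field names are invented:

```python
import heapq
import math

def euclidean(x, y):
    """One possible metric; any distance function satisfying the
    metric-space axioms could be plugged in here."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def range_select(table, center, radius, dist=euclidean):
    """Unary range selection (Rq): rows whose feature vector lies
    within `radius` of the query center."""
    return [row for row in table if dist(row["features"], center) <= radius]

def knn_select(table, center, k, dist=euclidean):
    """Unary k-nearest-neighbor selection (kNNq): the k rows closest
    to the query center."""
    return heapq.nsmallest(k, table, key=lambda r: dist(r["features"], center))
```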
5

Mounagurusamy, Purani. "Parsing AQL Queries into SQL Queries using ANTLR." Thesis, Linköpings universitet, Databas och informationsteknik, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-124151.

Abstract:
An Electronic Health Record (EHR) is a collection of a patient's health information stored electronically, in digital format. openEHR is an open standard specification for electronic health record data, and it provides a method for querying a set of clinical data using the Archetype Query Language (AQL). The EHR data is in XML format, which has a tree-like structure. Since XML databases proved considerably slower, AQL needs to be translated to another query language. Researchers have already investigated translating AQL to XQuery and tested the performance. Since the performance was not satisfactory, we now investigate translating AQL to SQL. AQL queries are translated to SQL queries using the ANTLR tool, and the translation is implemented in Java. The translated SQL queries are also tested in this thesis work. Finally, the result is the corresponding SQL query for any given AQL query.
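As a rough illustration of the translation idea (real AQL is far richer, and the thesis uses an ANTLR-generated parser rather than a regular expression; the relational mapping below is invented):

```python
import re

# Toy pattern covering one AQL shape: SELECT ... FROM EHR e WHERE ...
AQL_PATTERN = re.compile(
    r"SELECT\s+(?P<cols>.+?)\s+FROM\s+EHR\s+(?P<alias>\w+)\s+WHERE\s+(?P<cond>.+)",
    re.IGNORECASE,
)

def toy_aql_to_sql(aql: str) -> str:
    m = AQL_PATTERN.match(aql.strip())
    if not m:
        raise ValueError("unsupported toy AQL")
    # Flatten AQL path expressions (a/b/c) into hypothetical column names
    # of a hypothetical relational table `ehr`.
    cols = m.group("cols").replace("/", "_")
    cond = m.group("cond").replace("/", "_")
    return f"SELECT {cols} FROM ehr {m.group('alias')} WHERE {cond};"

print(toy_aql_to_sql("SELECT e/ehr_id FROM EHR e WHERE e/ehr_id = '42'"))
# SELECT e_ehr_id FROM ehr e WHERE e_ehr_id = '42';
```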
6

Venkatamuniyappa, Vijay Kumar. "Towards automatic grading of SQL queries." Kansas State University, 2018. http://hdl.handle.net/2097/38819.

Abstract:
Master of Science
Department of Computer Science
Doina Caragea
An Introduction to Databases course involves learning the concepts of data storage, manipulation, and retrieval. Relational databases provide an ideal learning path for understanding database concepts. The Structured Query Language (SQL) is a standard language for interacting with relational databases. Each database vendor implements a variation of the SQL standard. Furthermore, a particular question that asks for some data can be written in many ways, using somewhat similar or structurally different SQL queries. Evaluation of SQL queries for correctness involves the verification of the SQL syntax and semantics, as well as verification of the output of queries and the usage of correct clauses. An evaluation tool should be independent of the specific database queried and of the nature of the queries, and should allow multiple ways of providing input and retrieving the output. In this report, we have developed an evaluation tool for SQL queries, which checks the correctness of MySQL and PostgreSQL queries with the help of a parser that can identify SQL clauses. The tool developed acts as a portal for students to test and improve their queries, and finally to submit the queries for grading. The tool minimizes the manual effort required while grading by taking advantage of the SQL parser to check queries for correctness, provide feedback, and allow submission.
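One core step of such a grader is comparing a student query's output against the reference solution's output, independent of row order. A minimal sketch of that step (sqlite3 is used only to keep the example self-contained; the actual tool targets MySQL and PostgreSQL and additionally inspects clause usage with a parser):

```python
import sqlite3

def same_output(student_sql: str, reference_sql: str, db_path: str) -> bool:
    """Run both queries and compare their results as sorted multisets,
    since row order is irrelevant unless the exercise demands ORDER BY."""
    with sqlite3.connect(db_path) as conn:
        student = sorted(map(tuple, conn.execute(student_sql)))
        reference = sorted(map(tuple, conn.execute(reference_sql)))
    return student == reference
```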
7

Jain, Ritika. "Validation of SQL queries over streaming warehouses." Thesis, University of British Columbia, 2017. http://hdl.handle.net/2429/62867.

Abstract:
There is often a need to recover the "missing" query that produced a particular output from a data stream. As an example, since a data stream is constantly evolving, the analyst may be curious about taking a query from the past and evaluating it on the current state of the data stream, for further analysis. Previous research has studied the problem of reverse engineering a query that would produce a given result at a particular database state. We study the following problem. Given a streaming database D = (D1, D2, ...), a result Rout, and a set of candidate queries Q, efficiently find all queries Qi ∈ Q such that for some state Dj of the stream, Qi(Dj) = Rout, and report the pair (Qi, wit_Qi), where wit_Qi is the witness of (in)validity. A witness for a valid query Qval is a state Di such that Qval(Di) = Rout. For an invalid query Qinval, a witness is a pair of consecutive states (Di, Di+1) such that Rout \ Qinval(Di) ≠ ∅ ≠ Qinval(Di+1) \ Rout. We allow any PTIME-computable monotone query to be included in Q. While techniques developed in previous research can be used to generate the candidate query set Q, we focus on developing a scalable strategy for quickly determining the witness. We establish theoretical worst-case performance guarantees for our proposed approach and show that it is within a factor of O(log |D_RDS|) of the optimal "lucky guess" strategy, where Q(D_RDS) = Rout. We empirically evaluate our technique and compare it with natural baselines inspired by previous research. We show that the baselines either fail to scale or incur an inordinate amount of overhead by failing to take advantage of natural properties of a data stream. By contrast, our strategy scales effortlessly for very large data streams. Moreover, it never performs more than a small constant times the optimal amount of work, regardless of the state of the data stream that may have led to Rout.
Science, Faculty of
Computer Science, Department of
Graduate
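The monotonicity assumption is what makes a logarithmic-factor guarantee plausible: as the stream grows, a monotone query's answer can only grow, so the sequence of states can be searched much like a sorted array. A simplified sketch of that idea (not the thesis's exact algorithm, which also bounds the work done per state):

```python
def find_witness(states, query, r_out):
    """Binary search over stream states for a monotone query whose answers
    (sets of tuples) only grow with the stream. Returns ('valid', i) with
    query(states[i]) == r_out, or ('invalid', (i, i + 1)) for a pair of
    consecutive states straddling r_out (i == -1 denotes the empty stream)."""
    lo, hi = 0, len(states) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        result = query(states[mid])
        if result == r_out:
            return ("valid", mid)
        if result - r_out:       # extra tuples never disappear later,
            hi = mid - 1         # so every later state overshoots too
        else:                    # proper subset: the answer must still grow
            lo = mid + 1
    if lo >= len(states):        # never overshot within the observed states
        return ("unknown", None)
    return ("invalid", (lo - 1, lo))
```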
8

Manzi, Eric R. "SQL-ACT : content-based and history-aware input prediction for non-trivial SQL queries." Thesis, Massachusetts Institute of Technology, 2017. http://hdl.handle.net/1721.1/119534.

Abstract:
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 79-81).
This thesis presents SqlAct, a SQL auto-completion system that uses content-based and history-aware input prediction to assist in the process of composing non-trivial queries. By offering the most relevant suggestions to complete the partially typed query, first at the word level and then at the statement level, SqlAct aims to help both novice and expert SQL developers increase their productivity. Two approaches are explored: word-level suggestions optimized based on the database's schema and content statistics, and statement-level suggestions that rely on Long Short-Term Memory (LSTM) Recurrent Neural Network language models trained on historical queries. The word-level model is integrated into a responsive command-line database client, which is evaluated quantitatively and qualitatively. Results show that SqlAct provides a highly responsive interface that makes high-quality suggestions to complete the currently typed query. Possible directions for integrating the statement-level suggestions with the word-level model in the command-line tool are explored, as well as the planned evaluation techniques.
by Eric R. Manzi.
M. Eng.
9

Escalante, Osuna Carlos. "Estimating the cost of GraphLog queries." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1997. http://www.collectionscanada.ca/obj/s4/f2/dsk2/tape16/PQDD_0002/NQ32743.pdf.

10

Trigoni, Agathoniki. "Semantic optimization of OQL queries." Thesis, University of Cambridge, 2002. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.620163.

11

Gureev, Nikita. "Hive, Spark, Presto for Interactive Queries on Big Data." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-234927.

Abstract:
Traditional relational database systems cannot be efficiently used to analyze data with large volume and different formats, i.e. big data. Apache Hadoop is one of the first open-source tools that provides a distributed data storage system and resource manager. The space of big data processing has been growing fast over the past years, and many technologies have been introduced in the big data ecosystem to address the problem of processing large volumes of data; some of the early tools have become widely adopted, with Apache Hive being one of them. However, with the recent advances in technology, there are other tools better suited for interactive analytics of big data, such as Apache Spark and Presto. In this thesis these technologies are examined and benchmarked in order to determine their performance for the task of interactive business intelligence queries. The benchmark is representative of interactive business intelligence queries and uses a star-shaped schema. The performance of Hive on Tez, Hive LLAP, Spark SQL, and Presto is examined with text, ORC, and Parquet data at different volumes and concurrency levels. A short analysis and conclusions are presented with reasoning about the choice of framework and data format for a system that would run interactive queries on big data.
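A flavor of the star-shaped workload such benchmarks run (the fact and dimension tables here are illustrative, not the thesis's exact schema):

```python
# One representative star-join aggregation: a fact table joined to two
# dimensions, filtered, grouped and ordered.
star_query = """
SELECT d.year, c.region, SUM(s.revenue) AS revenue
FROM sales s
JOIN dim_date d     ON s.date_key = d.date_key
JOIN dim_customer c ON s.customer_key = c.customer_key
WHERE d.year BETWEEN 2016 AND 2017
GROUP BY d.year, c.region
ORDER BY revenue DESC;
"""
```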
12

Murray, Paul Timothy. "Semantic correctness in the specification, translation, and parallel implementation of SQL queries." Thesis, University of Sheffield, 1994. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.555319.

13

Makiyama, Vitor Hirota. "Text mining applied to SQL queries: a case study for SDSS SkyServer." Instituto Nacional de Pesquisas Espaciais (INPE), 2015. http://urlib.net/sid.inpe.br/mtc-m21b/2015/08.31.17.43.

Abstract:
SkyServer, the Internet portal for the Sloan Digital Sky Survey (SDSS) catalog, provides a set of tools that allow data access for astronomers and scientific education. One of the available interfaces allows users to enter ad-hoc SQL statements to query the catalog, and it has logged over 280 million queries since 2001. To assess and investigate usage behavior, log analyses were performed after the 5th and 10th years of the portal being in production. Such analyses, however, focused on HTTP access and only on basic information about database usage. This work aims to apply text mining techniques over the SQL logs to define a methodology to parse, clean and tokenize statements into an intermediate numerical representation for data mining and knowledge discovery, which can provide deeper analysis of SQL usage and also has a number of foreseen applications in database optimization and the improvement of user experience.
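A minimal sketch of the parse/clean/tokenize step described above (a deliberate simplification of the methodology; collapsing literals is the essential idea):

```python
import re

def tokenize_sql(statement: str) -> list:
    """Normalize a logged SQL statement into a token stream: lowercase
    identifiers and keywords, collapse string and numeric literals so that
    structurally similar queries map to the same representation."""
    s = re.sub(r"'[^']*'", "<str>", statement)   # collapse string literals
    s = re.sub(r"\b\d+(\.\d+)?\b", "<num>", s)   # collapse numeric literals
    return re.findall(r"[a-zA-Z_][a-zA-Z_0-9.]*|<\w+>|[(),*=<>]", s.lower())

print(tokenize_sql("SELECT ra, dec FROM PhotoObj WHERE r < 17.5"))
# ['select', 'ra', ',', 'dec', 'from', 'photoobj', 'where', 'r', '<', '<num>']
```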
14

Höggren, Carl, and Carl Johan Widman. "Txt2SQL : SQL-queries from Natural Language Questions and its Practical Business Applications." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-281533.

Abstract:
The area of research regarding how to translate Natural Language Questions into SQL queries has evolved in recent years with developments in Machine Learning algorithms as well as the emergence of new datasets. This report aims to give insights into the complexities of the task and attempts to evaluate the possible areas of use in the practical context of modern-day corporations. The implementation suggested in this report gave inferior results to the state of the art but illuminated several aspects that, together with a qualitative study, showed that the implementation and usefulness in a practical context are not obvious and require further research.
15

Fomkin, Ruslan. "Optimization and Execution of Complex Scientific Queries." Doctoral thesis, Uppsala : Acta Universitatis Upsaliensis, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-9514.

16

Galpin, Ixent. "Quality of service aware optimization of sensor network queries." Thesis, University of Manchester, 2010. http://www.manchester.ac.uk/escholar/uk-ac-man-scw:136326.

Abstract:
Sensor networks comprise resource-constrained wireless nodes with the capability of gathering information about their surroundings, and have recently risen to prominence with the promise of being an effective computing platform for diverse applications, ranging from event detection to environmental monitoring. The database community proposed the use of sensor network query processors (SNQPs) as a means to meet data collection requirements using a declarative query language. Declarative queries posed against a sensor network constitute an effective means to repurpose sensor networks and reduce the high software development costs associated with them. The range of sensor network applications is very broad. Such applications have diverse, and often conflicting, QoS expectations in terms of the delivery time of results, the acquisition interval at which data is collected, the total energy consumption of the deployment, or the network lifetime. The conflicting nature of these desiderata is aggravated by the resource-constrained nature of sensor networks as a computing fabric, making it particularly challenging to reconcile the trade-offs that arise. Previously, SNQPs have focussed on evaluating queries as energy-efficiently as possible. There has been comparatively less work on attempting to meet a broad range of optimization goals and constraints that capture these QoS expectations. In this respect, previous work in SNQP has not aimed at being general purpose across the breadth of applications to which sensor networks have been applied. This PhD dissertation presents an approach for enabling QoS-awareness in SNQPs so that query evaluation plans are generated that exhibit good performance for a broader range of sensor network applications in terms of their QoS expectations. The research contributions reported here include (a) a functional decomposition of the decision-making steps required to compile a declarative query into a query evaluation plan in a sensor network setting; (b) algorithms to implement these decision-making steps; and (c) an empirical evaluation to show the benefits of QoS-awareness compared to a representative fixed-goal SNQP.
17

Guarino, Rodrigo Silva. "Experimental study of conjunctive queries optimization with expensive predicates." Pontifícia Universidade Católica do Rio de Janeiro, 2004. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=5170@1.

Abstract:
COORDENAÇÃO DE APERFEIÇOAMENTO DO PESSOAL DE ENSINO SUPERIOR
Traditional database query optimization techniques have as their main heuristic the organization of predicates into two main types: selection predicates and join predicates. Join predicates are considered much more expensive than selection predicates. In addition, it is assumed that there is no significant difference among the costs of different selection predicates, so the optimizer executes them first, in any order, reducing the number of tuples needed to execute the join predicates. This assumption, which works well for traditional database applications, becomes invalid for recent database applications that execute complex functions over complex data in selection predicates. In such cases, selection predicates can be more expensive than join predicates, and their costs can no longer be considered equivalent. This invalidates the main heuristic of pushing selections down and calls for new optimization techniques for this new kind of selection predicate, named the expensive predicate. This work has two main objectives: to present a software framework that enables the development, testing and integrated analysis of different algorithms for evaluating expensive predicates, and to analyse the performance of four algorithm implementations based on the Cherry Picking strategy, which aims at exploiting the data dependencies between input values to expensive predicates. The experiments considered conjunctive (AND) queries, and the general idea is to evaluate the attributes in an order that minimizes the overall evaluation cost of the tuples.
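For context, the classic ordering heuristic from the expensive-predicate literature (which Cherry Picking goes beyond by exploiting data dependencies) ranks conjunctive predicates by balancing per-tuple cost against selectivity; a small sketch:

```python
def order_predicates(preds):
    """Order conjunctive predicates by the classic rank metric
    (selectivity - 1) / cost_per_tuple, ascending: cheap, selective
    predicates run first so that expensive ones see fewer tuples.
    `preds` holds (label, selectivity, cost_per_tuple) triples."""
    return sorted(preds, key=lambda p: (p[1] - 1.0) / p[2])

preds = [("contains_tumor(image)", 0.10, 50.00),  # expensive UDF
         ("patient.age > 60",      0.30,  0.01)]  # cheap comparison
print([label for label, _, _ in order_predicates(preds)])
# ['patient.age > 60', 'contains_tumor(image)']
```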
18

Cheng, Sijin. "Relevance feedback-based optimization of search queries for Patents." Thesis, Linköpings universitet, Interaktiva och kognitiva system, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-154173.

Abstract:
In this project, we design a search query optimization system based on the user's relevance feedback, generating customized query strings for existing patent alerts. First, the Rocchio algorithm is used to generate a search string by analyzing the characteristics of related and unrelated patents. Then a collaborative filtering recommendation algorithm is used to rank the query results, considering previous relevance feedback and patent features instead of only the similarity between the query and patents, as the traditional method does. To further explore the performance of the optimization system, we design and conduct a series of evaluation experiments with TF-IDF as a baseline method. Experiments show that, with the use of generated search strings, the proportion of unrelated patents in the search results is significantly reduced over time. In 4 months, the precision of the retrieved results improved from 53.5% to 72%. Moreover, the ranking performance of the proposed method is better than the baseline's: in terms of precision, its top-10 is about 5 percentage points higher than the baseline method's, and its top-20 about 7.5% higher. It can be concluded that the proposed approach can effectively optimize patent search results by learning from relevance feedback.
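The Rocchio step is the standard vector-space update, moving the query vector toward the centroid of relevant patents and away from the non-relevant ones. A minimal sketch over TF-IDF vectors (the weights are the conventional defaults, not necessarily those used in the thesis):

```python
import numpy as np

def rocchio(query_vec, relevant, non_relevant,
            alpha=1.0, beta=0.75, gamma=0.15):
    """Return the updated query vector. `relevant` and `non_relevant`
    are lists of TF-IDF vectors from the user's feedback."""
    q = alpha * np.asarray(query_vec, dtype=float)
    if len(relevant):
        q = q + beta * np.mean(relevant, axis=0)
    if len(non_relevant):
        q = q - gamma * np.mean(non_relevant, axis=0)
    return np.clip(q, 0.0, None)   # negative term weights are usually dropped
```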
19

Wei, Mingrui. "Multi-Mode Stream Processing For Hopping Window Queries." Digital WPI, 2008. https://digitalcommons.wpi.edu/etd-theses/769.

Abstract:
Window constraints are mechanisms to bound the tuples processed by continuous queries specified over unbounded data streams. While sliding window queries move the constraint window upon the arrival of each individual tuple, hopping window queries instead move the window by a fixed amount after some period, thus periodically refreshing their results. We observe that for large hops, techniques like delta result updating may not be efficient, as large portions of the tuples in the current window will differ from the previous window and thus must be maintained. On the other hand, the complete result updating technique, which computes the next result from the complete current window and has been found less suitable for sliding window queries, can be shown to be superior in performance for some hopping window queries. A trade-off emerges between the complete result method, which has a lower per-tuple processing cost but potentially computes redundant results, and the delta result method, which has no redundant processing but pays a higher per-tuple processing cost. On top of that, strictly non-monotonic operators, such as the difference operator, cause premature expiration due to operator semantics. Negative tuples are needed for this kind of special expiration, and such negative tuples add an extra burden to the stream engine. Thus, in stream processing, the difference operator is typically suggested to be placed at the top of the query plan despite its potential ability to reduce the cardinality of the stream. With this thesis, we introduce a complete solution for hopping window query processing, which includes an optimizer for generalized hopping window query optimization that exploits both processing techniques within one integrated query plan, along with query plan rewriting. First, we design the query operators to be multi-mode, that is, able to take either a delta or a complete result as input and produce either a delta or a complete result as output. Then we design a cost model to choose the optimal mode for each operator. Thirdly, our optimizer configures each operator within a query plan to work in the suitable mode to achieve minimum overall processing cost. Last but not least, two query optimization techniques have been adopted: one explores all possibilities of pushing the difference operator down past joins using dynamic programming while assigning the optimal mode at the same time; the other applies a heuristic difference push-down rule. The proposed techniques have been implemented within the WPI stream query engine, called CAPE. Finally, we show the benefit of our solution with a large number of experimental results.
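The central trade-off can be caricatured with a two-line cost comparison; this is a toy stand-in for the thesis's cost model, and all parameter names are illustrative:

```python
def pick_mode(window_size, hop_size, delta_unit_cost, complete_unit_cost):
    """Delta mode touches the expired and the newly inserted tuples at a
    higher per-tuple price; complete mode recomputes the whole window at a
    lower per-tuple price. Large hops therefore favor complete mode."""
    delta_total = 2 * min(hop_size, window_size) * delta_unit_cost
    complete_total = window_size * complete_unit_cost
    return "delta" if delta_total <= complete_total else "complete"
```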
20

Grade, Nuno Daniel Gouveia de Sousa. "Data queries over heterogeneous sources." Master's thesis, Faculdade de Ciências e Tecnologia, 2013. http://hdl.handle.net/10362/10053.

Abstract:
Dissertation submitted to obtain the Master's degree in Informatics Engineering
Enterprises typically have their data spread over many software systems, such as custom-made applications, CRM systems like Salesforce, CMS systems, or ERP systems like SAP. In this setting, it is often desirable to integrate information from many data sources to accomplish some business goal in an application. Data may be stored locally or in the cloud in a wide variety of ways, demanding explicit transformation processes to be defined, which is why it is hard for developers to integrate it. Moreover, the amount of external data can be large, and the difference in efficiency between a smart and a naive way of retrieving and filtering data from different locations can be great. Hence, it is clear that developers would benefit greatly from language abstractions that help them build queries over heterogeneous data sources, and from an optimization process that avoids large and unnecessary data transfers during the execution of queries. This project was developed at OutSystems and aims at extending a real product, which makes it even more challenging. We followed a generic approach that can be implemented in any framework, not one focused solely on the OutSystems product.
21

Ritsch, Roland. "Optimization and evaluation of array queries in database management systems." [S.l. : s.n.], 1999. http://deposit.ddb.de/cgi-bin/dokserv?idn=959772502.

22

Zhu, Yali. "Dynamic optimization and migration of continuous queries over data streams." Link to electronic dissertation, 2006. http://www.wpi.edu/Pubs/ETD/Available/etd-082306-133807/.

Abstract:
Dissertation (Ph.D.)--Worcester Polytechnic Institute.
Keywords: Query optimization, data streams, runtime query adaptations, continuous queries, plan migration, distributed query processing, window constraints. Includes bibliographical references (p. 313-319).
23

Yuasa, Mashiho. "Effect of feedback and prompts on initial learning and transfer in learning to write SQL database queries." Thesis, Georgia Institute of Technology, 1990. http://hdl.handle.net/1853/29883.

24

Andrejev, Andrej. "Semantic Web Queries over Scientific Data." Doctoral thesis, Uppsala universitet, Datalogi, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-274856.

Abstract:
Semantic Web and Linked Open Data provide a potential platform for interoperability of scientific data, offering a flexible model for providing machine-readable and queryable metadata. However, RDF and SPARQL have gained limited adoption within the scientific community, mainly due to the lack of support for managing massive numeric data, along with certain other important features – such as extensibility with user-defined functions, query modularity, and integration with existing environments and workflows. We present the design, implementation and evaluation of Scientific SPARQL – a language for querying data and metadata combined, represented using the RDF graph model extended with numeric multidimensional arrays as node values – RDF with Arrays. The techniques used to store RDF with Arrays in a scalable way and to process Scientific SPARQL queries and updates are implemented in our prototype software – Scientific SPARQL Database Manager, SSDM – and its integrations with data storage systems and computational frameworks. This includes scalable storage solutions for numeric multidimensional arrays and an efficient implementation of array operations. The arrays can be physically stored in a variety of external storage systems, including files, relational databases, and specialized array data stores, using our Array Storage Extensibility Interface. Whenever possible, SSDM accumulates array operations and accesses array contents in a lazy fashion. In scientific applications, numeric computations are often used for filtering or post-processing the retrieved data, which can be expressed in a functional way. Scientific SPARQL allows expressing common query sub-tasks with functions defined as parameterized queries. This becomes especially useful along with functional language abstractions such as lexical closures and second-order functions, e.g. array mappers. Existing computational libraries can be interfaced and invoked from Scientific SPARQL queries as foreign functions. Cost estimates and alternative evaluation directions may be specified, aiding the construction of better execution plans. Costly array processing, e.g. filtering and aggregation, is thus performed on the server, reducing the amount of communication. Furthermore, common supported operations are delegated to the array storage back-ends, according to their capabilities. Both the expressivity and performance of Scientific SPARQL are evaluated on a real-world example, and further performance tests are run using our mini-benchmark for array queries.
25

Trissl, Silke. "Cost-based optimization of graph queries in relational database management systems." Doctoral thesis, Humboldt-Universität zu Berlin, Mathematisch-Naturwissenschaftliche Fakultät II, 2012. http://dx.doi.org/10.18452/16544.

Abstract:
Graphs occur in many areas of life. We are interested in graphs in biology, where nodes are chemical compounds, enzymes, reactions, or interactions that are connected by edges. Efficiently querying these graphs is a challenging task. In this thesis we present GRIcano, a system that efficiently executes graph queries. For GRIcano we assume that graphs are stored and queried using relational database management systems (RDBMS). We propose an extended version of the Pathway Query Language PQL to express graph queries. The core of GRIcano is a cost-based query optimizer. This thesis makes contributions to all three required components of the optimizer: the relational algebra, implementations, and the cost model. Relational algebra operators alone are not sufficient to express graph queries. Thus, we first present new operators to rewrite PQL queries as algebra expressions. We propose the reachability, distance, path length, and path operators. In addition, we provide rewrite rules for the newly proposed operators in combination with standard relational algebra operators. Secondly, we present implementations for each proposed operator. The main contribution is GRIPP, an index structure that allows us to answer reachability queries on very large graphs. GRIPP has advantages over other existing index structures, which we review in this work. In addition, we show how to employ GRIPP and the recursive query strategy as implementations for all four proposed operators. The third component of GRIcano is the cost model, which requires cardinality estimates for operators and cost functions for implementations. Based on an extensive experimental evaluation of our proposed algorithms, we present functions to estimate the cardinality of operators and the cost of executing a query. The novelty of our approach is that these functions only use key figures of the graph. We finally demonstrate the effectiveness of GRIcano using exemplary graph queries on real biological networks.
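The labelling idea behind GRIPP can be illustrated on the special case of a tree, where a pre/post-order interval per node answers reachability in constant time; GRIPP's contribution is making this style of index work on arbitrary, very large graphs, which the sketch below does not attempt:

```python
def label_tree(children, root):
    """Assign each node a (pre, post) interval via depth-first traversal.
    In a tree, u reaches v exactly when u's interval contains v's."""
    label, clock = {}, [0]

    def dfs(u):
        clock[0] += 1
        pre = clock[0]
        for v in children.get(u, []):
            dfs(v)
        clock[0] += 1
        label[u] = (pre, clock[0])

    dfs(root)
    return label

def reaches(label, u, v):
    return label[u][0] <= label[v][0] and label[v][1] <= label[u][1]
```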
26

Valerián, Martin. "Optimalizace SQL kódu v oblasti reportingu bankovního informačního systemu." Master's thesis, Vysoká škola ekonomická v Praze, 2011. http://www.nusl.cz/ntk/nusl-165099.

Abstract:
The thesis analyses the tools that Oracle Database 11g R2 offers for tuning SQL statements. These tools are then used in the reporting section of a real banking system to optimize SQL statements. Test cases are then created and the results analysed and categorized to verify that the optimization succeeded. Finally, the benefits of the optimization are evaluated and recommendations are given for further optimization work.
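The kind of Oracle tooling involved includes plan capture and display; a minimal example (EXPLAIN PLAN and the DBMS_XPLAN package are standard Oracle features, while the statement, table and index names are hypothetical):

```python
# Capture the optimizer's plan for a candidate reporting statement.
explain_stmt = """
EXPLAIN PLAN FOR
SELECT /*+ INDEX(r report_date_ix) */ *
FROM report_lines r
WHERE report_date = DATE '2011-06-30';
"""

# Render the captured plan from the plan table.
show_plan = "SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);"
```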
27

Gadiraju, Krishna Karthik. "Benchmarking Performance for Migrating a Relational Application to a Parallel Implementation." University of Cincinnati / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1409065914.

28

Ibragimov, Dilshod. "Optimizing Analytical Queries over Semantic Web Sources." Doctoral thesis, Université Libre de Bruxelles, 2017. https://dipot.ulb.ac.be/dspace/bitstream/2013/282819/5/contratDI.pdf.

Abstract:
Data has always been a key asset for many industries and companies; lately, however, data owners enjoy a real competitive advantage over others. Nowadays, companies collect large volumes of data and store them in large multidimensional databases called data warehouses. A data warehouse presents aggregated data in the form of a cube whose cells contain facts and contextual information such as dates, locations, customer and supplier information, etc. Data warehousing solutions successfully use OLAP (Online Analytical Processing) to analyze these large data sets; for example, sales information can be aggregated along the location and/or time dimension. Recent trends in technology and the Web now pose new challenges. A good deal of the information available on the Web is in a machine-processable form (the Semantic Web); Business Intelligence (BI) tools must be able to discover and retrieve the relevant information and present it to users to assist them in properly analyzing the situation. Many governments and organizations make their data publicly accessible, identifiable with URIs (Uniform Resource Identifiers), and link it to other data. This collection of interconnected data sets on the Web is called Linked Data [1]. These data sets are based on the RDF (Resource Description Framework) model – a standard format for data exchange on the Web [2]. SPARQL, a protocol and query language for RDF [4], is used to query and manipulate RDF data sets stored in SPARQL triplestores. SPARQL 1.1 Federated Query [6] also defines an extension for executing distributed queries over several triplestores. The current standard therefore allows complex analytical queries over multiple data sources, and the integration of this data into the analysis process becomes a necessity for BI tools. However, due to the quantity and complexity of the data available on the Web, incorporating and using it is not always straightforward. Consequently, an efficient OLAP solution over Semantic Web sources is needed to improve BI tools. This PhD thesis focuses on the challenges of optimizing analytical queries that use data from multiple SPARQL triplestores. First, this thesis proposes a framework for the discovery, integration, and analytical querying of Linked Data – this type of OLAP has been named Exploratory OLAP [21]. The framework is designed to use a multidimensional schema of the OLAP cube expressed in RDF vocabularies, in order to query data sources, extract and aggregate data, and build a data cube. We also propose a computer-assisted process for discovering previously unknown data sources and constructing a multidimensional schema of the cube.
Second, given the current inefficiency of SPARQL triplestores in executing federated analytical queries, this thesis proposes a set of strategies for processing such queries, as well as a module (called Cost-based Optimizer for Distributed Aggregate, or CoDA) to optimize their execution. Third, to overcome the challenges of processing aggregate SPARQL queries over a single triplestore, we propose MARVEL (MAterialized Rdf Views with Entailment and incompLeteness) – an approach that uses RDF-specific materialized view techniques to process complex aggregate queries. Our approach consists of a view selection algorithm based on an associated RDF-specific cost model, a syntax for view definition, and an algorithm for rewriting SPARQL queries using the RDF materialized views. Finally, we focus on techniques for supporting analytical SPARQL queries over linked data spread across multiple triplestores, which lead to interesting large-scale analyses and findings. In particular, the proposed technique is able to integrate the diverse schemas of SPARQL endpoints, giving access to the data through OLAP-style hierarchies to enable uniform, efficient, and powerful analyses. Overall, this thesis advocates greater attention to analytical query processing within distributed RDF systems.
Doctorate in Engineering Sciences and Technology
29

Jandhyala, Sandeep. "An automated XPATH to SQL transformation methodology for XML data." unrestricted, 2006. http://etd.gsu.edu/theses/available/etd-04012006-121218/.

Abstract:
Thesis (M.S.)--Georgia State University, 2006.
Rajshekhar Sunderraman, committee chair; Sushil Prasad, Alex Zelikovsky, committee members. Electronic text (58 p.) : digital, PDF file. Description based on contents viewed Aug. 13, 2007. Includes bibliographical references (p. 58).
30

Jäcksch, Bernhard [Verfasser]. "A Plan For OLAP: Optimization Of Financial Planning Queries In Data Warehouse Systems / Bernhard Jäcksch." München : Verlag Dr. Hut, 2011. http://d-nb.info/1017353700/34.

31

Ebenstein, Roee A. "Supporting Advanced Queries on Scientific Array Data." The Ohio State University, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=osu1531322027770129.

32

Qian, Xiaoyan. "Design, implementation and performance tests for predicate introduction, a semantic query optimization technique for database queries." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1999. http://www.collectionscanada.ca/obj/s4/f2/dsk3/ftp04/mq43398.pdf.

33

Bêdo, Marcos Vinícius Naves. "Incluindo funções de distância e extratores de características para suporte a consultas por similaridade." Universidade de São Paulo, 2013. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-08112013-160506/.

Abstract:
Database Management Systems (DBMS) can deal with large amounts of data. The queries on those systems obey the total order relation (TOR), the domain where simple data such as numbers or strings are defined. In the case of complex data (e.g. medical images, audio or financial time series), which does not obey the TOR properties, an approach is required that can retrieve complex data by content in reasonable time and with proper semantics. To do so, the literature presents us, as a consolidated paradigm, the similarity queries. This paradigm is the basis of many computer-aided applications (e.g. Content-Based Medical Image Retrieval (CBMIR) and Content-Based Audio Retrieval (CBAR)) and includes several research areas such as feature extraction, distance functions and metric access methods (MAM). Developing new feature extractor methods and new distance functions (and combining them) is crucial to reduce the semantic gap between content-based applications and their users. The MAM are responsible for providing fast and scalable answers to the systems. Integrating all those functionalities into one framework that can support similarity queries inside a DBMS remains a huge challenge. The main objective of this work is to extend the initial resources of the SIREN system, inserting new feature extractor methods and distance functions for medical images, audio and financial time series, turning it into a framework. All components may be used via extended Structured Query Language (SQL) commands, which can be directly used by computer-aided decision-support applications.
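A toy example of the extractor-plus-distance pairing such a framework registers (illustrative only; SIREN's real extractors and metrics are considerably more elaborate):

```python
import math

def gray_histogram(pixels, bins=16):
    """Toy feature extractor: normalized gray-level histogram of an image
    given as a flat sequence of 0-255 intensities."""
    hist = [0.0] * bins
    for p in pixels:
        hist[min(p * bins // 256, bins - 1)] += 1.0
    total = float(len(pixels)) or 1.0
    return [h / total for h in hist]

def l2_distance(u, v):
    """Euclidean distance between two feature vectors, usable as the
    metric behind range and k-NN similarity predicates."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```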
34

Ferreira, Mônica Ribeiro Porto. "Optimizing similarity queries in metric spaces meeting user\'s expectation." Universidade de São Paulo, 2012. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-24012013-091242/.

Abstract:
The complexity of data stored in large databases has increased at a very fast pace. Hence, operations more elaborate than traditional queries are essential in order to extract all the required information from the database. Therefore, the interest of the database community in similarity search has increased significantly. Two well-known types of similarity search are the Range (R_q) and the k-Nearest Neighbor (kNN_q) queries, which, like any of the traditional ones, can be sped up by the indexing structures of the Database Management System (DBMS). Another way of speeding up queries is to perform query optimization. In this process, metrics about the data are collected and employed to adjust the parameters of the search algorithms in each query execution. However, although the integration of similarity search into DBMSs has begun to be deeply studied more recently, query optimization has been developed and employed just to answer traditional queries. The execution of similarity queries, even using efficient indexing structures, tends to present higher computational cost than the execution of traditional ones. Two strategies can be applied to speed up the execution of any query, and thus they are also worth employing to answer similarity queries. The first strategy is query rewriting based on algebraic properties and cost functions. The second is to apply factors external to the query, such as the semantics expected by the user, to prune the answer space. This thesis aims at contributing to the development of novel techniques to improve similarity-based query optimization, exploiting both algebraic properties and semantic restrictions as query refinements.
35

Alves, André Filipe Pereira. "DICOOGLE: No-SQL for supporting Big Data environments." Master's thesis, Universidade de Aveiro, 2016. http://hdl.handle.net/10773/17218.

Abstract:
Master's degree in Computer and Telematics Engineering
The last few years have been characterized by a proliferation of different types of medical imaging modalities in healthcare institutions, while services are migrating to infrastructures in the Cloud. Thus, in addition to a scenario where tremendous amounts of data are produced, we have moved to a reality where processes are increasingly distributed. This reality has created new technological challenges regarding the storage, management and handling of this data, in order to guarantee high availability and performance of the information systems dealing with the images. An open-source Picture Archive and Communication System (PACS) named Dicoogle has been developed by the bioinformatics research group at the University of Aveiro. This system replaces the traditional relational database engine with an agile mechanism that indexes and retrieves data. It is thus possible to extract, index and store all of the images' metadata, including any private information, without any re-engineering or reconfiguration process. Among other use cases, this system has already indexed more than 22 million images in 3 hospitals in the region of Aveiro. Currently, Dicoogle provides a solution based on the Apache Lucene library. However, it has performance issues in environments where large amounts of data must be handled and searched, particularly in data analytics scenarios. In the context of this work, different technologies capable of supporting the database of an image repository were studied. Four solutions were then fully implemented, based on relational databases, NoSQL and two distinct text engines. A test platform was also developed to evaluate the performance and scalability of these solutions, which allowed a comparative analysis of them. Finally, a hybrid architecture for a medical image database is proposed, implemented and validated. This proposal demonstrated significant gains in query and index times and in scenarios requiring broad data analysis.
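Dicoogle's approach, indexing every metadata field of an image as a document field instead of forcing it into a fixed relational schema, can be illustrated with the standard Apache Lucene API that the abstract names. The sketch below is not Dicoogle code: the field names and values are invented and the API shown is recent Lucene (8.x or later), but the calls themselves are the library's ordinary ones.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.*;
import org.apache.lucene.index.*;
import org.apache.lucene.search.*;
import org.apache.lucene.store.*;

public class MetadataIndexSketch {
    public static void main(String[] args) throws Exception {
        Directory dir = new ByteBuffersDirectory(); // in-memory index for the sketch
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            // Each image's metadata becomes one schema-less document:
            // unknown or private tags can simply be added as extra fields.
            Document doc = new Document();
            doc.add(new StringField("Modality", "CT", Field.Store.YES));
            doc.add(new TextField("StudyDescription", "thorax routine", Field.Store.YES));
            writer.addDocument(doc);
        }
        try (IndexReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            TopDocs hits = searcher.search(new TermQuery(new Term("Modality", "CT")), 10);
            System.out.println("matches: " + hits.totalHits);
        }
    }
}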
APA, Harvard, Vancouver, ISO, and other styles
36

Ribeiro Porto Ferreira, Monica. "Optimizing similarity queries in metric spaces meeting user's expectation." Thesis, Dijon, 2012. http://www.theses.fr/2012DIJOS040/document.

Full text
Abstract:
The complexity of data stored in large databases has increased at a very fast pace. Hence, operations more elaborate than traditional queries are essential in order to extract all required information from the database, and the interest of the database community in similarity search has increased significantly. Two of the well-known types of similarity search are the Range (Rq) and the k-Nearest Neighbor (kNNq) queries, which, like any of the traditional ones, can be sped up by the indexing structures of the Database Management System (DBMS). Another way of speeding up queries is to perform query optimization. In this process, metrics about the data are collected and employed to adjust the parameters of the search algorithms in each query execution. However, although the integration of similarity search into DBMSs has begun to be studied in depth more recently, query optimization has so far been developed and employed only to answer traditional queries. The execution of similarity queries, even using efficient indexing structures, tends to have a higher computational cost than the execution of traditional ones. Two strategies can be applied to speed up the execution of any query, and they are therefore also worth employing to answer similarity queries. The first strategy is query rewriting based on algebraic properties and cost functions. The second is applying factors external to the query, such as the semantics expected by the user, to prune the answer space. This thesis aims at contributing to the development of novel techniques to improve similarity-based query optimization, exploiting both algebraic properties and semantic restrictions as query refinements.
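For the first strategy, rewriting based on algebraic properties and cost functions, the essence is to enumerate algebraically equivalent plans and keep the one a cost model scores lowest. The sketch below is a toy version under invented cost formulas and cardinalities; the thesis's actual algebra and cost functions are what it develops, not what is shown here.

public class CostBasedChoice {
    public static void main(String[] args) {
        double n = 1_000_000;  // outer cardinality (assumed)
        double m = 10_000;     // inner cardinality (assumed)
        double sel = 0.01;     // selectivity of the similarity predicate (assumed)
        double log2m = Math.log(m) / Math.log(2);

        // Plan A: apply the selective predicate first, then index-join the survivors.
        double planA = n + (n * sel) * log2m;
        // Plan B: the algebraically equivalent order, join first and filter after.
        double planB = n * log2m + n * sel;

        System.out.println(planA < planB ? "rewrite: push predicate below the join"
                                         : "keep: filter after the join");
    }
}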
APA, Harvard, Vancouver, ISO, and other styles
37

Granbohm, Martin, and Marcus Nordin. "The optimization of Database queries by using a dynamic caching policy on the application side of a system." Thesis, Malmö universitet, Fakulteten för teknik och samhälle (TS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-20296.

Full text
Abstract:
With IP traffic and data sets continuously growing, and IT companies becoming more and more dependent on large data sets, it is more important than ever to optimize the load time of queries. IT companies have also become more aware of the importance of delivering content quickly to the end user, because slower response times can affect quality perception, which in turn can have a negative impact on revenue. In this paper, we develop and implement a new dynamic cache management system, with the cache on the application side of the system, and test it against well-established caching policies. By looking at known caching strategies and research that takes the current database load into account, with attributes such as a query's frequency, and incorporating this into our algorithm, we developed a dynamic caching policy that uses a logarithmic calculation involving historical query frequency together with query response time to compute a weight for a specific query. The weight gives priority in relation to the other queries residing in the cache, which yields a performance increase over existing caching policies. The results show an 11-12% performance increase over LRU, a 15% performance increase over FIFO, and a substantial performance increase over using the database directly, with MySQL caching both enabled and disabled.
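The weighting idea lends itself to a compact sketch. The exact formula is not given in the abstract, so the log-of-frequency times response-time weight below is one plausible reading, and the class around it (names, capacity handling, eviction of the minimum-weight entry) is invented for illustration:

import java.util.*;

public class DynamicQueryCache {
    record Entry(String result, long frequency, double responseMs) {
        // Plausible reading of the abstract's "logarithmic calculation":
        // frequent queries count more, but with diminishing returns,
        // scaled by how expensive the query is to recompute.
        double weight() { return Math.log1p(frequency) * responseMs; }
    }

    private final int capacity;
    private final Map<String, Entry> cache = new HashMap<>();

    DynamicQueryCache(int capacity) { this.capacity = capacity; }

    void put(String sql, String result, long frequency, double responseMs) {
        if (!cache.containsKey(sql) && cache.size() >= capacity) {
            // Evict the entry with the lowest weight (lowest priority).
            String victim = cache.entrySet().stream()
                .min(Comparator.comparingDouble(e -> e.getValue().weight()))
                .get().getKey();
            cache.remove(victim);
        }
        cache.put(sql, new Entry(result, frequency, responseMs));
    }

    Optional<String> get(String sql) {
        return Optional.ofNullable(cache.get(sql)).map(Entry::result);
    }
}

Under this reading, a slow, frequently issued query is the last to be evicted, which matches the policy's goal of keeping the entries whose recomputation would cost the most.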
APA, Harvard, Vancouver, ISO, and other styles
38

Kumar, Hara. "Dynamic First Match : Reducing Resource Consumption of First Match Queries in MySQL NDB Cluster." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-289594.

Full text
Abstract:
Dynamic First Match is a learned heuristic that reduces the resource consumption of first match queries in a multi-threaded, distributed relational database, while having a minimal effect on latency. Traditional first match range scans run in parallel across all data fragments simultaneously, which can return many redundant results. Dynamic First Match reduces this redundancy by learning to scan only a portion of the data fragments first, before scanning the remaining fragments with a pruned data set. Benchmark tests show that Dynamic First Match can reduce the resource consumption of first match queries containing first match range scans by over 40% while having a minimal effect on latency.
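The two-phase idea can be sketched independently of NDB internals. Everything below is illustrative: the learnedFraction parameter stands in for the heuristic the thesis learns, and the fragments are plain lists scanned sequentially, whereas the real system scans in parallel across data nodes.

import java.util.*;
import java.util.function.Predicate;

public class DynamicFirstMatch {
    static <T> Optional<T> firstMatch(List<List<T>> fragments, Predicate<T> match, double learnedFraction) {
        int firstWave = Math.max(1, (int) Math.round(fragments.size() * learnedFraction));
        // Phase 1: probe only the learned share of the fragments.
        Optional<T> hit = scan(fragments.subList(0, firstWave), match);
        if (hit.isPresent()) return hit; // no redundant work on the other fragments
        // Phase 2: scan the remaining fragments only if phase 1 came up empty.
        return scan(fragments.subList(firstWave, fragments.size()), match);
    }

    static <T> Optional<T> scan(List<List<T>> frags, Predicate<T> match) {
        // Stands in for the parallel range scans across data nodes.
        return frags.stream().flatMap(List::stream).filter(match).findFirst();
    }
}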
APA, Harvard, Vancouver, ISO, and other styles
39

Šístek, Petr. "Optimalizace informačního systému firmy a jeho rozšíření." Master's thesis, Vysoké učení technické v Brně. Fakulta podnikatelská, 2010. http://www.nusl.cz/ntk/nusl-222593.

Full text
Abstract:
The purpose of this thesis is the optimization of an information system and its extension with new functions. The optimization is based on user experience and on revising the processes in use. The upgrade will introduce new capabilities and functions that will help the company in its decision processes and contribute to more effective work with the firm's data.
APA, Harvard, Vancouver, ISO, and other styles
40

Waite, Edwin Richard. "Web Based Query Optimization Simulator." CSUSB ScholarWorks, 2004. https://scholarworks.lib.csusb.edu/etd-project/2519.

Full text
Abstract:
The Web Based Query Optimization Simulator (WBQOS) is a software tool designed to enhance understanding of query optimization within a Relational Database Management System (RDBMS). WBQOS allows the user to visualize and participate in query optimization, which strengthens the learning process.
APA, Harvard, Vancouver, ISO, and other styles
41

Trißl, Silke [Verfasser], Ulf [Akademischer Betreuer] Leser, Johann-Christoph [Akademischer Betreuer] Freytag, and Thorsten [Akademischer Betreuer] Grust. "Cost-based optimization of graph queries in relational database management systems / Silke Trißl. Gutachter: Ulf Leser ; Johann-Christoph Freytag ; Thorsten Grust." Berlin : Humboldt Universität zu Berlin, Mathematisch-Naturwissenschaftliche Fakultät II, 2012. http://d-nb.info/1024311309/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
42

Pinarer, Ozgun. "Sustainable Declarative Monitoring Architecture : Energy optimization of interactions between application service oriented queries and wireless sensor devices : Application to Smart Buildings." Thesis, Lyon, 2017. http://www.theses.fr/2017LYSEI126/document.

Full text
Abstract:
Recent research and analysis reports state that the high energy consumption of buildings is a major problem in developed countries. They show concretely that building energy management systems (BEMS) and deployed wireless sensor network environments are important for the energy efficiency of building operations. In the literature, existing smart building management systems focus on the energy consumption of the building, the hardware deployed inside/outside the building, and network communication issues. They adopt static configurations for wireless sensor devices, and the proposed models are fitted to a single application. In this study, we propose a sustainable declarative monitoring architecture that focuses on the energy optimization of interactions between application service oriented queries and wireless sensor devices. We consider the monitoring system as a set of applications that exploit sensor measures in real time, such as HVAC automation and control systems, real-time supervision, and security. These applications can be configured dynamically by the users or by the supervisor. In our approach, we take a data point of view: applications are declaratively expressed as a set of continuous queries on the sensor data stream. To achieve our objective of energy-aware optimization of the monitoring architecture, we formalize sensor device configuration and fit data acquisition and data transmission to the actual application requirements. We present a complete monitoring architecture and an algorithm that handles dynamic sensor configuration. We introduce a platform that covers physical as well as simulated wireless sensor devices.
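The dynamic-configuration idea can be reduced to a small sketch: each registered continuous query demands a sampling period from a sensor, and the sensor is (re)configured to the tightest demand whenever the set of queries changes. All names are invented, and the aggregation shown (a simple minimum) is only a plausible simplification of what the proposed approach actually does.

import java.util.*;

public class SensorConfigSketch {
    // sensor -> (application -> demanded sampling period in seconds)
    private final Map<String, Map<String, Double>> demands = new HashMap<>();

    void register(String sensor, String app, double periodSec) {
        demands.computeIfAbsent(sensor, s -> new HashMap<>()).put(app, periodSec);
        push(sensor);
    }

    void unregister(String sensor, String app) {
        Map<String, Double> m = demands.get(sensor);
        if (m != null) { m.remove(app); push(sensor); }
    }

    private void push(String sensor) {
        // Acquire just fast enough for the most demanding query: anything
        // faster wastes energy, anything slower breaks a requirement.
        double period = demands.getOrDefault(sensor, Map.of()).values().stream()
            .mapToDouble(Double::doubleValue).min().orElse(Double.POSITIVE_INFINITY);
        System.out.printf("reconfigure %s: acquisition period = %.1fs%n", sensor, period);
    }
}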
APA, Harvard, Vancouver, ISO, and other styles
43

Murphy, Brian R. "Order-sensitive XML query processing over relational sources." Link to electronic thesis, 2003. http://www.wpi.edu/Pubs/ETD/Available/etd-0505103-123753.

Full text
Abstract:
Thesis (M.S.)--Worcester Polytechnic Institute.
Keywords: computation pushdown; XML; order-based Xquery processing; relational database; ordered SQL queries; data model mapping; XQuery; XML data mapping; SQL; XML algebra rewrite rules; XML document order. Includes bibliographical references (p. 64-67).
APA, Harvard, Vancouver, ISO, and other styles
44

Kanchev, Kancho. "Employee Management System." Thesis, Växjö University, School of Mathematics and Systems Engineering, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:vxu:diva-1048.

Full text
Abstract:

This report presents the development of an information system for managing staff data within a small company or organization. The system, called Employee Management System, consists of a functionally related GUI (application program) and a database. The choice of programming tools was individual and particular to this project.

APA, Harvard, Vancouver, ISO, and other styles
45

Brander, Thomas, and Christian Dakermandji. "En jämförelse mellan databashanterare med prestandatester och stora datamängder." Thesis, KTH, Data- och elektroteknik, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-188199.

Full text
Abstract:
The company Nordicstation handles large amounts of data for Swedbank, where data is stored using the relational database Microsoft SQL Server 2012 (SQL Server). The existence of other databases designed for handling large amounts of data makes it unclear whether SQL Server is the best solution for this situation. This degree project describes a comparison between databases using performance testing, with regard to the execution time of database queries. The chosen databases were SQL Server, Cassandra and NuoDB. Cassandra is a column-oriented database designed for handling large amounts of data; NuoDB is a database that uses main memory for data storage and is designed for scalability. The performance tests were executed in a virtual server environment with Windows Server 2012 R2, using an application written in Java. SQL Server was the database best suited for grouping, sorting and arithmetic operations. Cassandra had the shortest execution time for write operations, while NuoDB performed best in read operations. This degree project concludes that minimizing disk operations leads to shorter execution times, but that the scalable solution, NuoDB, suffers severe performance losses when configured as a single node. Nordicstation is recommended to upgrade to Microsoft SQL Server 2014, or later, because of the possibility to store tables in main memory.
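The thesis's test platform was written in Java, so a measurement loop of the kind it describes can be sketched with plain JDBC. The connection string, credentials, query, and run count below are placeholders, the appropriate driver must be on the classpath, and a real harness would also discard warm-up runs and control caching effects.

import java.sql.*;

public class QueryBenchmark {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:sqlserver://localhost;databaseName=test"; // placeholder
        String sql = "SELECT account, SUM(amount) FROM tx GROUP BY account"; // placeholder
        try (Connection con = DriverManager.getConnection(url, "user", "pass");
             PreparedStatement ps = con.prepareStatement(sql)) {
            for (int run = 0; run < 10; run++) {
                long t0 = System.nanoTime();
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) { /* drain the result set so it is fully fetched */ }
                }
                System.out.printf("run %d: %.1f ms%n", run, (System.nanoTime() - t0) / 1e6);
            }
        }
    }
}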
APA, Harvard, Vancouver, ISO, and other styles
46

Sabesan, Manivasakan. "Querying Data Providing Web Services." Doctoral thesis, Uppsala universitet, Avdelningen för datalogi, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-128928.

Full text
Abstract:
Web services are often used for search computing, where data is retrieved from servers providing information of different kinds. Such data providing web services return a set of objects for a given set of parameters without any side effects. There is a need to enable general and scalable search capabilities over data from data providing web services, which is the topic of this Thesis. The Web Service MEDiator (WSMED) system automatically provides relational views of any data providing web service operations by reading the WSDL documents describing them. These views can be queried with SQL. Without any knowledge of the costs of executing specific web service operations, the WSMED query processor automatically and adaptively finds an optimized parallel execution plan calling the queried data providing web services. For scalable execution of queries to data providing web services, an algebra operator PAP adaptively parallelizes calls in execution plans to web service operations until no significant performance improvement is measured, based on monitoring the flow from the web service operations, without any cost knowledge or extensive memory usage. To comply with the Everything as a Service (XaaS) paradigm, WSMED is itself implemented as a web service that provides web service operations to query and combine data from data providing web services. A web based demonstration of the WSMED web service provides general SQL queries to any data providing web service operations from a browser. WSMED assumes that all queried data sources are available as web services. To make any data providing system into a data providing web service, WSMED includes a subsystem, the web service generator, which generates and deploys the web service operations needed to access a data source. The WSMED web service itself is generated by the web service generator.
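The behaviour attributed to PAP, raising the parallelism of calls to a web service operation until no significant improvement is measured, can be sketched as follows. This is not WSMED code: the doubling schedule, the 5% significance threshold, and the cap are invented stand-ins for its monitoring logic.

import java.util.*;
import java.util.concurrent.*;

public class AdaptiveParallelCalls {
    // Raise the number of parallel web-service calls until the measured
    // throughput stops improving, then keep the last good level.
    static int tunedFanout(List<Callable<Void>> calls) throws Exception {
        double best = 0;
        for (int fanout = 1; fanout <= 64; fanout *= 2) {
            ExecutorService pool = Executors.newFixedThreadPool(fanout);
            long t0 = System.nanoTime();
            pool.invokeAll(calls); // block until all parallel calls finish
            pool.shutdown();
            double throughput = calls.size() / ((System.nanoTime() - t0) / 1e9);
            if (throughput <= best * 1.05) return Math.max(1, fanout / 2); // no significant gain
            best = throughput;
        }
        return 64;
    }
}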
eSSENCE
APA, Harvard, Vancouver, ISO, and other styles
47

Lu, Qifeng. "Bivariate Best First Searches to Process Category Based Queries in a Graph for Trip Planning Applications in Transportation." Diss., Virginia Tech, 2009. http://hdl.handle.net/10919/26444.

Full text
Abstract:
With the technological advancement in computer science, Geographic Information Science (GIScience), and transportation, more and more complex path finding queries including category based queries are proposed and studied across diverse disciplines. A category based query, such as Optimal Sequenced Routing (OSR) queries and Trip Planning Queries (TPQ), asks for a minimum-cost path that traverses a set of categories with or without a predefined order in a graph. Due to the extensive computing time required to process these complex queries in a large scale environment, efficient algorithms are highly desirable whenever processing time is a consideration. In Artificial Intelligence (AI), a best first search is an informed heuristic path finding algorithm that uses domain knowledge as heuristics to expedite the search process. Traditional best first searches are single-variate in terms of the number of variables to describe a state, and thus not appropriate to process these queries in a graph. In this dissertation, 1) two new types of category based queries, Category Sequence Traversal Query (CSTQ) and Optimal Sequence Traversal Query (OSTQ), are proposed; 2) the existing single-variate best first searches are extended to multivariate best first searches in terms of the state specified, and a class of new concepts--state graph, sub state graph, sub state graph space, local heuristic, local admissibility, local consistency, global heuristic, global admissibility, and global consistency--is introduced into best first searches; 3) two bivariate best first search algorithms, C* and O*, are developed to process CSTQ and OSTQ in a graph, respectively; 4) for each of C* and O*, theorems on optimality and optimal efficiency in a sub state graph space are developed and identified; 5) a family of algorithms including C*-P, C-Dijkstra, O*-MST, O*-SCDMST, O*- Dijkstra, and O*-Greedy is identified, and case studies are performed on path finding in transportation networks, and/or fully connected graphs, either directed or undirected; and 6) O*- SCDMST is adopted to efficiently retrieve optimal solutions for OSTQ using network distance metric in a large transportation network.
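The bivariate state idea is the part that translates most directly into code: a search state pairs a graph node with the set of required categories covered so far, and a best-first expansion over those states answers a traversal query. The sketch below is a plain Dijkstra-style search over that product space, without the local/global heuristics that give C* and O* their efficiency; all names are invented.

import java.util.*;

public class CategorySearchSketch {
    record State(int node, Set<String> covered) {}

    static double search(Map<Integer, Map<Integer, Double>> adj,
                         Map<Integer, String> category,
                         int start, Set<String> required) {
        record Item(State s, double cost) {}
        PriorityQueue<Item> open = new PriorityQueue<>(Comparator.comparingDouble(Item::cost));
        Set<State> closed = new HashSet<>();
        open.add(new Item(new State(start, covers(category, start, required)), 0));
        while (!open.isEmpty()) {
            Item cur = open.poll();
            // Goal: the path has touched every required category.
            if (cur.s().covered().containsAll(required)) return cur.cost();
            if (!closed.add(cur.s())) continue;
            for (var e : adj.getOrDefault(cur.s().node(), Map.of()).entrySet()) {
                Set<String> cov = new HashSet<>(cur.s().covered());
                cov.addAll(covers(category, e.getKey(), required));
                open.add(new Item(new State(e.getKey(), cov), cur.cost() + e.getValue()));
            }
        }
        return Double.POSITIVE_INFINITY;
    }

    static Set<String> covers(Map<Integer, String> category, int node, Set<String> required) {
        String c = category.get(node);
        return c != null && required.contains(c) ? Set.of(c) : Set.of();
    }
}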
Ph. D.
APA, Harvard, Vancouver, ISO, and other styles
48

Komaragiri, Vivek Chakravarthy. "Application of decision diagrams for information storage and retrieval." Master's thesis, Mississippi State : Mississippi State University, 2002. http://library.msstate.edu/etd/show.asp?etd=etd-04082002-144345.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

Řeháček, Tomáš. "Analýza efektivnosti BI systémů s použitím databáze Oracle 10g." Master's thesis, Vysoká škola ekonomická v Praze, 2011. http://www.nusl.cz/ntk/nusl-164047.

Full text
Abstract:
The main goal of this diploma thesis is to provide advice and recommendations to increase the effectiveness of Business Intelligence using an Oracle database. More specifically, the focus of the thesis is the optimization of ETL processes and objects in a data warehouse. Tests comparing different optimization techniques are performed to meet the defined goal. These tests are based on common tasks solved in data warehouses on a daily basis, and are supported by publications of leading authors in the Oracle environment. The second goal of this work is the application of appropriate optimization techniques to a real ETL process in order to increase its effectiveness. Performance changes are compared and measured using metrics defined for the ETL process and requirements from the client. The optimization techniques presented for the main objective are used to achieve the second objective. The contribution of the thesis is to show the most important optimization techniques and tips that every developer should know and be able to use effectively. Another added value is the author's practical experience in a real data warehouse environment.
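As one concrete example of the kind of technique such tests compare (chosen here for illustration, not taken from the thesis), Oracle's direct-path parallel insert is a standard ETL optimization; the table names and connection details below are placeholders, while the APPEND and PARALLEL hints are ordinary Oracle SQL.

import java.sql.*;

public class DirectPathLoad {
    public static void main(String[] args) throws SQLException {
        try (Connection con = DriverManager.getConnection(
                 "jdbc:oracle:thin:@//localhost:1521/DWH", "etl", "secret"); // placeholder
             Statement st = con.createStatement()) {
            // Direct-path, parallel insert: bypasses the buffer cache and
            // conventional redo path when loading a staging table into a fact table.
            st.execute("ALTER SESSION ENABLE PARALLEL DML");
            st.executeUpdate(
                "INSERT /*+ APPEND PARALLEL(f 4) */ INTO sales_fact f " +
                "SELECT /*+ PARALLEL(s 4) */ * FROM staging_sales s");
            con.commit();
        }
    }
}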
APA, Harvard, Vancouver, ISO, and other styles
50

Håkansson, Gunnar. "Applikation för sökning i databaslogg samt design av databas." Thesis, Örebro universitet, Institutionen för naturvetenskap och teknik, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:oru:diva-23462.

Full text
Abstract:
This report considers a system where a database is used as the back-end storage for logging. A suitable method for extracting information from the logs was missing, and the database design needed improvement for log searching. An application for extracting and filtering the logs was created, and an evaluation of how the database could be improved was also performed. Both parts were done in one project since they were heavily connected: the application would use the database. Since I couldn't make arbitrary changes to the database, only relatively limited changes were made in practice; larger changes were evaluated theoretically. The application was made against the existing database, with one exception: a view was added. The report examines indexes and other methods for speeding up database searches. A method for fetching data inside an interval in a database was developed and is described in the report. The method searches for all data where the value of a column is inside an interval, given that the database is ordered, or almost ordered, on that column. The method gives inexact answers if the database is only almost ordered on that column, but it is faster than a corresponding exact search.
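The report's interval method can be sketched as follows. The binary search, the slack parameter, and the row layout are illustrative guesses at the details: on almost-ordered data the lower bound found by binary search may be off, and matching rows may sit past the first values above the interval, which is exactly why the method trades exactness for speed.

import java.util.*;

public class AlmostOrderedIntervalScan {
    record Row(long key, String payload) {}

    // Fetch rows whose key lies in [lo, hi] from a list that is ordered,
    // or almost ordered, on key. The answer may be inexact on almost-ordered data.
    static List<Row> scan(List<Row> rows, long lo, long hi, int slack) {
        int i = Math.max(0, lowerBound(rows, lo) - slack); // back up a little for local disorder
        List<Row> out = new ArrayList<>();
        int overshoot = 0;
        for (; i < rows.size() && overshoot <= slack; i++) {
            long k = rows.get(i).key();
            if (k > hi) { overshoot++; continue; } // tolerate a few rows past the interval
            if (k >= lo) out.add(rows.get(i));
        }
        return out;
    }

    // Standard binary search for the first position with key >= lo.
    static int lowerBound(List<Row> rows, long lo) {
        int a = 0, b = rows.size();
        while (a < b) {
            int m = (a + b) >>> 1;
            if (rows.get(m).key() < lo) a = m + 1; else b = m;
        }
        return a;
    }
}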
APA, Harvard, Vancouver, ISO, and other styles