To see the other types of publications on this topic, follow the link: Approximate database.

Dissertations / Theses on the topic 'Approximate database'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 22 dissertations / theses for your research on the topic 'Approximate database.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online, whenever these are available in the metadata.

Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.

1

Jermaine, Christopher. "Approximate answering of aggregate queries in relational databases." Diss., Georgia Institute of Technology, 2002. http://hdl.handle.net/1853/9221.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Cheng, Lok-lam, and 鄭樂霖. "Approximate string matching in DNA sequences." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2003. http://hub.hku.hk/bib/B29350591.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Sjö, Kristoffer. "Semantics and Implementation of Knowledge Operators in Approximate Databases." Thesis, Linköping University, Department of Computer and Information Science, 2004. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-2438.

Full text
Abstract:

In order that epistemic formulas might be coupled with approximate databases, it is necessary to have a well-defined semantics for the knowledge operator and a method of reducing epistemic formulas to approximate formulas. In this thesis, two possible definitions of a semantics for the knowledge operator are proposed for use together with an approximate relational database:

* One based upon logical entailment (the dominant notion of knowledge in the literature); sound and complete rules for reduction to approximate formulas are explored and found not to be applicable to all formulas.

* One based upon algorithmic computability (in order to be practically feasible); the correspondence to the above operator on the one hand, and to the deductive capability of the agent on the other hand, is explored.

Also, an inductively defined semantics for a "know whether" operator is proposed and tested. Finally, an algorithm implementing the above is proposed, implemented in Java, and tested.

APA, Harvard, Vancouver, ISO, and other styles
4

Geum, Seong. "An approximate load balancing parallel hash join algorithm to handle data skew in a parallel data base system." Thesis, Georgia Institute of Technology, 1995. http://hdl.handle.net/1853/9222.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Linari, Alessandro <1977>. "Models and techniques for approximate similarity search in large databases." Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2007. http://amsdottorato.unibo.it/398/.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Calender, Christopher R. "Approximate N-Nearest Neighbor Clustering on Distributed Databases Using Iterative Refinement." University of Cincinnati / OhioLINK, 2004. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1092929952.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Monge, Alvaro Edmundo. "Adaptive detection of approximately duplicate database records and the database integration approach to information discovery /." Diss., Connect to a 24 p. preview or request complete full text in PDF format. Access restricted to UC campuses, 1997. http://wwwlib.umi.com/cr/ucsd/fullcit?p9804033.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Bechchi, Mounir. "Clustering-based Approximate Answering of Query Result in Large and Distributed Databases." Phd thesis, Université de Nantes, 2009. http://tel.archives-ouvertes.fr/tel-00475917.

Full text
Abstract:
Database users face the problem of information overload when querying their data: exploratory queries return far too many answers. To address this problem, we propose a fast and efficient algorithm, called ESRA (Explore-Select-Rearrange Algorithm), which uses SAINTETIQ summaries precomputed over the whole dataset to group the answers to a user query into a set of hierarchically organised classes (or summaries). Each class describes a subset of results with similar properties. The user can then explore the hierarchy to locate the data of interest and discard the rest. Experimental results show that the ESRA algorithm is efficient and produces well-formed classes (i.e., they are few in number and well separated). However, the SAINTETIQ model used by the ESRA algorithm requires the data to be available on the summary server. This assumption makes the ESRA algorithm inapplicable in distributed environments, where it is often impossible or undesirable to gather all the data on a single site. To address this problem, we propose a collection of algorithms that combine two summaries generated locally and autonomously at two distinct sites into a single summary of the whole distributed dataset, without accessing the original data. Experimental results show that these algorithms perform as well as the centralised approach (i.e., SAINTETIQ applied to the data after gathering it on a single site) and produce hierarchies very similar in structure and quality to those produced by the centralised approach.
APA, Harvard, Vancouver, ISO, and other styles
9

Brodsky, Lloyd. "A knowledge-based preprocessor for approximate joins in improperly designed transaction databases." Thesis, Massachusetts Institute of Technology, 1991. http://hdl.handle.net/1721.1/13744.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Ozturk, Ozgur. "Feature extraction and similarity-based analysis for proteome and genome databases." The Ohio State University, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=osu1190138805.

Full text
APA, Harvard, Vancouver, ISO, and other styles
11

Sanz, Blasco Ismael. "Flexible techniques for heterogeneous XML data retrieval." Doctoral thesis, Universitat Jaume I, 2007. http://hdl.handle.net/10803/10373.

Full text
Abstract:
The progressive adoption of XML by new communities of users has motivated the appearance of applications that require the management of large and complex collections, which present a large amount of heterogeneity. Some relevant examples are present in the fields of bioinformatics, cultural heritage, ontology management and geographic information systems, where heterogeneity is not only reflected in the textual content of documents, but also in the presence of rich structures which cannot be properly accounted for using fixed schema definitions. Current approaches for dealing with heterogeneous XML data are, however, mainly focused at the content level, whereas at the structural level only a limited amount of heterogeneity is tolerated; for instance, weakening the parent-child relationship between nodes into the ancestor-descendant relationship.
The main objective of this thesis is devising new approaches for querying heterogeneous XML collections. This general objective has several implications: First, a collection can present different levels of heterogeneity in different granularity levels; this fact has a significant impact in the selection of specific approaches for handling, indexing and querying the collection. Therefore, several metrics are proposed for evaluating the level of heterogeneity at different levels, based on information-theoretical considerations. These metrics can be employed for characterizing collections, and clustering together those collections which present similar characteristics.
Second, the high structural variability implies that query techniques based on exact tree matching, such as the standard XPath and XQuery languages, are not suitable for heterogeneous XML collections. As a consequence, approximate querying techniques based on similarity measures must be adopted. Within the thesis, we present a formal framework for the creation of similarity measures which is based on a study of the literature that shows that most approaches for approximate XML retrieval (i) are highly tailored to very specific problems and (ii) use similarity measures for ranking that can be expressed as ad-hoc combinations of a set of 'basic' measures. Some examples of these widely used measures are tf-idf for textual information and several variations of edit distances. Our approach wraps these basic measures into generic, parametrizable components that can be combined into complex measures by exploiting the composite pattern, commonly used in Software Engineering. This approach also allows us to integrate seamlessly highly specific measures, such as protein-oriented matching functions.
Finally, these measures are employed for the approximate retrieval of data in a context of high structural heterogeneity, using a new approach based on the concepts of pattern and fragment. In our context, a pattern is a concise representation of the information needs of a user, and a fragment is a match of a pattern found in the database. A pattern consists of a set of tree-structured elements: basically an XML subtree that is intended to be found in the database, but with a flexible semantics that is strongly dependent on a particular similarity measure. For example, depending on the particular measure, the hierarchy of elements, or the ordering of siblings, may or may not be deemed relevant when searching for occurrences in the database.
Fragment matching, as a query primitive, can deal with a much higher degree of flexibility than existing approaches. In this thesis we provide exhaustive and top-k query algorithms. In the latter case, we adopt an approach that does not require the similarity measure to be monotonic, as all previous XML top-k algorithms (usually based on Fagin's algorithm) do. We also present two extensions which are important in practical settings: a specification for the integration of the aforementioned techniques into XQuery, and a clustering algorithm that is useful for managing complex result sets.
All of the algorithms have been implemented as part of ArHeX, a toolkit for the development of multi-similarity XML applications, which supports fragment-based queries through an extension of the XQuery language, and includes graphical tools for designing similarity measures and querying collections. We have used ArHeX to demonstrate the effectiveness of our approach using both synthetic and real data sets, in the context of a biomedical research project.
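The composite construction of similarity measures described above can be sketched as follows. This is a minimal illustration of the composite pattern applied to similarity measures, not ArHeX's actual API; the weighted-sum combiner, class names, and field keys are assumptions for illustration:

```python
from abc import ABC, abstractmethod

class Measure(ABC):
    """A similarity measure mapping two values to [0, 1]."""
    @abstractmethod
    def sim(self, a, b) -> float: ...

class EditSimilarity(Measure):
    """Basic measure: normalised Levenshtein similarity on strings."""
    def sim(self, a, b):
        m, n = len(a), len(b)
        prev = list(range(n + 1))
        for i in range(1, m + 1):
            cur = [i] + [0] * n
            for j in range(1, n + 1):
                cur[j] = min(prev[j] + 1, cur[j - 1] + 1,
                             prev[j - 1] + (a[i - 1] != b[j - 1]))
            prev = cur
        return 1.0 - prev[n] / max(m, n, 1)

class WeightedSum(Measure):
    """Composite node: combines child measures with weights, so complex
    measures are built by nesting components (composite pattern)."""
    def __init__(self, children):
        self.children = children  # list of (weight, measure, field key)
    def sim(self, a, b):
        total = sum(w for w, _, _ in self.children)
        return sum(w * m.sim(a[k], b[k]) for w, m, k in self.children) / total

# Compare two XML-like element descriptions field by field.
measure = WeightedSum([(2.0, EditSimilarity(), "tag"),
                       (1.0, EditSimilarity(), "text")])
x = {"tag": "author", "text": "J. Smith"}
y = {"tag": "authors", "text": "J Smith"}
print(round(measure.sim(x, y), 3))
```

Because `WeightedSum` is itself a `Measure`, composites can be nested arbitrarily, which mirrors how ad-hoc combinations of basic measures are assembled in the framework.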
APA, Harvard, Vancouver, ISO, and other styles
12

Chang, Chih-Kai, and 張智凱. "Approximate Cost Estimating Model and Database for Green Building." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/3akf2p.

Full text
Abstract:
Master's thesis
National Taiwan University of Science and Technology
Department of Construction Engineering
99 (ROC academic year)
The concept of green building has been in wide use for several years; however, most papers use case studies to discuss existing buildings and their costs, and few discuss green building and value engineering at the planning phase. Consequently, few papers address databases for green building or cost analysis, even though most investigate the "Greenery Index" or "Ground Water Retention Index". This study therefore uses a green-building-aided design program to obtain green building solutions. Designers not only want to know the green building solutions but are also concerned with cost and profit, and a review of the current literature found no appropriate cost model for green building. This thesis establishes an approximate cost estimating model and uses MySQL to build a database of cost ratios, floor unit prices, and materials for green building, with which the approximate cost is estimated. The study selects housing and office buildings at the Silver, Gold, and Diamond certification levels and estimates their approximate green building cost. The results show that the total cost error is acceptable; moreover, the cost increase depends on the solutions adopted for green building, so building green does not necessarily require a high additional cost.
APA, Harvard, Vancouver, ISO, and other styles
13

LIANG, WEI-YU, and 梁緯宇. "Study of Approximate Cost Estimating Model and Database System for Green Building." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/2gkhh7.

Full text
Abstract:
Master's thesis
National Taiwan University of Science and Technology
Department of Construction Engineering
100 (ROC academic year)
In recent years, environmental problems caused by human over-exploitation and the energy crisis have made "sustainable development" a subject of common international concern. To comply with the international trend toward green building, Taiwan introduced a green building rating system in 1999. From 2000 to the end of April 2012, a total of 822 cases were certified, and the number of cases is increasing year by year. Designers not only want to know the green building solutions but are also concerned with cost and profit, and this study found that no appropriate cost model for green building exists to date. This thesis establishes an approximate cost estimating model and uses MySQL to build a database of cost ratios, floor unit prices, and materials for green building. The study uses uniform sampling to select housing and office buildings at the Silver, Gold, and Diamond certification levels from 170 real green building cases and estimates their approximate green building cost. The results show that the total cost error is acceptable.
APA, Harvard, Vancouver, ISO, and other styles
14

Abdella, Mussa Ismael. "The use of genetic algorithms and neural networks to approximate missing data in database." Thesis, 2006. http://hdl.handle.net/10539/105.

Full text
Abstract:
Missing data creates various problems in the analysis and processing of data in databases, and for this reason it has long been an area of research in various disciplines. This report introduces a new method for approximating missing data in a database using a combination of genetic algorithms and neural networks. The proposed method uses a genetic algorithm to minimise an error function derived from an auto-associative neural network. The error function is expressed as the square of the difference between the actual observations and the values predicted by the auto-associative neural network. In the event of missing data, not all of the actual observations are known; hence the error function is decomposed to depend on the known and unknown (missing) values. Multi-Layer Perceptron (MLP) and Radial Basis Function (RBF) neural networks are employed to train the networks. The research also investigates how accurately the proposed method approximates missing data as the number of missing cases within a single record increases, and the impact of different neural network architectures on training and on the approximations found for the missing values. The approximations of missing data obtained using the proposed model are observed to be highly accurate, with a 95% correlation coefficient between the actual missing values and the corresponding approximated values. Results obtained using RBF are better than those using MLP, and results obtained using a combination of both MLP and RBF are better than those using either alone. No significant reduction in accuracy is observed as the number of missing cases in a single record increases.
Approximations found for missing data are also found to depend on the particular neural network architecture employed in training the data set.
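The error-minimisation idea described above can be sketched roughly as follows. This is a toy illustration, not the thesis's method: the trained auto-associative network is replaced by an assumed linear stand-in, and the genetic algorithm is reduced to elitist selection with averaging crossover and Gaussian mutation:

```python
import random

def reconstruct(x):
    # Stand-in for a trained auto-associative network: a fixed linear map
    # that reconstructs each feature from the mean of the other features.
    n = len(x)
    return [(sum(x) - xi) / (n - 1) for xi in x]

def fitness(record, missing_idx, guess):
    # Error function: squared difference between the completed record and
    # its reconstruction, decomposed over known and guessed values.
    x = list(record)
    for i, g in zip(missing_idx, guess):
        x[i] = g
    r = reconstruct(x)
    return sum((a - b) ** 2 for a, b in zip(x, r))

def ga_impute(record, missing_idx, lo=0.0, hi=10.0,
              pop_size=40, gens=200, seed=1):
    rng = random.Random(seed)
    # Each individual is one candidate assignment of the missing values.
    pop = [[rng.uniform(lo, hi) for _ in missing_idx] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=lambda g: fitness(record, missing_idx, g))
        elite = pop[:pop_size // 2]          # elitist selection
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # Averaging crossover plus small Gaussian mutation.
            children.append([(u + v) / 2 + rng.gauss(0, 0.1)
                             for u, v in zip(a, b)])
        pop = elite + children
    return min(pop, key=lambda g: fitness(record, missing_idx, g))

# Record whose known values are all 5.0; position 2 is missing.
best = ga_impute([5.0, 5.0, None, 5.0], missing_idx=[2])
print(best)
```

With this stand-in reconstruction, the error is minimised when the guessed value matches the mean of the known values, so the search should converge near 5.0.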
APA, Harvard, Vancouver, ISO, and other styles
15

Tsai, Jenn-Shing, and 蔡振興. "Discovery of Approximate Dependencies from Fuzzy Relational Databases." Thesis, 2000. http://ndltd.ncl.edu.tw/handle/70108554410275536947.

Full text
Abstract:
碩士
義守大學
資訊工程學系
88
We present here data mining techniques for discovering approximate dependencies, based on equivalence classes, from similarity-based fuzzy relational databases, and fuzzy functional dependencies from possibility-based fuzzy relational databases. The similarity-based and possibility-based fuzzy data models are the two major data models of fuzzy relational databases that have been proposed to represent imprecise, uncertain, and incomplete information. The similarity-based fuzzy data model extends the traditional relational model by allowing attribute values to be a subset of an attribute domain; in addition, a similarity relation may exist between attribute values of a domain. The model has been recognized as most suitable for describing imprecise data that are analogical over discrete domains. The possibility-based fuzzy data model is another extension of the traditional relational model in which attribute values may contain fuzzy sets. An approximate dependency can be considered a functional dependency that almost holds; it describes approximate relationships between attributes of a relation in a database. Research on generalizing the notion of functional dependencies to that of approximate dependencies on fuzzy relational databases has been undertaken in recent years, and various forms of approximate dependencies have been proposed. However, their emphases are on the conceptual viewpoint, and no mining algorithms are given. In this thesis, the problems of validity testing of approximate and fuzzy functional dependencies are studied. In addition, data mining techniques based on top-down levelwise searching are proposed to discover all possible minimal non-trivial approximate and fuzzy functional dependencies on similarity-based and possibility-based fuzzy relational databases, respectively. Experimental results showing the behaviors of these approximate dependencies are discussed.
The dependencies discovered contain not only the conventional functional dependencies when similarity relations are reduced to identity relations but also semantic dependencies that describe the conceptual structures between attributes. The results developed here can be applied to the areas of fuzzy database design, query optimization and database reverse engineering.
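An approximate dependency, as a functional dependency that almost holds, is commonly quantified by the g3 error measure: the fraction of tuples that must be removed for the dependency to hold exactly. A minimal sketch over a crisp relation (leaving aside the fuzzy similarity machinery of the thesis):

```python
from collections import defaultdict

def dependency_error(rows, lhs, rhs):
    """g3 error of the dependency lhs -> rhs: the fraction of tuples that
    must be removed for it to hold exactly. 0.0 means an exact functional
    dependency; small positive values mean an approximate dependency."""
    groups = defaultdict(lambda: defaultdict(int))
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        groups[key][val] += 1
    # For each lhs value, keep only the most frequent rhs value.
    keep = sum(max(vals.values()) for vals in groups.values())
    return 1.0 - keep / len(rows)

rows = [
    {"city": "Kaohsiung", "zip": "800"},
    {"city": "Kaohsiung", "zip": "800"},
    {"city": "Kaohsiung", "zip": "801"},   # conflicting tuple
    {"city": "Taipei",    "zip": "100"},
]
print(dependency_error(rows, ["city"], ["zip"]))  # 0.25
```

A levelwise (Apriori-style) miner would evaluate this error for candidate dependencies of growing left-hand-side size and report those below a chosen threshold as approximate dependencies.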
APA, Harvard, Vancouver, ISO, and other styles
16

Zilles, Craig B. "Master/slave speculative parallelization and approximate code /." 2002. http://www.library.wisc.edu/databases/connect/dissertations.html.

Full text
APA, Harvard, Vancouver, ISO, and other styles
17

Hwang, Seok. "Kinetic decomposition of approximate solutions to conservation laws." 2002. http://www.library.wisc.edu/databases/connect/dissertations.html.

Full text
APA, Harvard, Vancouver, ISO, and other styles
18

Cui, Bin, Heng Tao Shen, Jialie Shen, and Kian Lee Tan. "Exploring Bit-Difference for Approximate KNN Search in High-dimensional Databases." 2004. http://hdl.handle.net/1721.1/7416.

Full text
Abstract:
In this paper, we develop a novel index structure to support efficient approximate k-nearest neighbor (KNN) queries in high-dimensional databases. In high-dimensional spaces, the computational cost of the distance (e.g., Euclidean distance) between two points contributes a dominant portion of the overall query response time for in-memory processing. To reduce the distance computation, we first propose a structure (BID) using BIt-Difference to answer approximate KNN queries. BID employs one bit to represent each feature of a point, and the number of differing bits is used to prune distant points. To handle real datasets, which are typically skewed, we enhance the BID mechanism with clustering, a cluster-adapted bit coder, and dimensional weights, named BID⁺. Extensive experiments show that our proposed method yields significant performance advantages over existing index structures on both real-life and synthetic high-dimensional datasets.
Singapore-MIT Alliance (SMA)
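The bit-difference pruning idea can be sketched as follows; the median-threshold bit coder and the parameter choices here are assumptions for illustration, not the BID/BID⁺ design itself:

```python
import heapq, math

def build_signatures(points):
    # One bit per feature: 1 if the coordinate lies above that dimension's
    # median, 0 otherwise (a simple stand-in bit coder).
    dims = len(points[0])
    medians = [sorted(p[d] for p in points)[len(points) // 2]
               for d in range(dims)]
    sigs = [sum((p[d] > medians[d]) << d for d in range(dims))
            for p in points]
    return sigs, medians

def approx_knn(points, sigs, medians, query, k, max_bit_diff):
    qsig = sum((query[d] > medians[d]) << d for d in range(len(query)))
    best = []  # max-heap of (-distance, point), size capped at k
    for p, s in zip(points, sigs):
        # Cheap prune: count differing bits before any distance computation.
        if bin(qsig ^ s).count("1") > max_bit_diff:
            continue
        dist = math.dist(query, p)
        heapq.heappush(best, (-dist, p))
        if len(best) > k:
            heapq.heappop(best)
    return sorted((-d, p) for d, p in best)

points = [(1.0, 1.0), (1.2, 0.9), (9.0, 9.0), (8.5, 9.5), (5.0, 5.0)]
sigs, medians = build_signatures(points)
print(approx_knn(points, sigs, medians, (1.1, 1.0), k=2, max_bit_diff=0))
```

Points whose signatures differ from the query's in more than `max_bit_diff` bits are skipped entirely, so the expensive Euclidean distance is computed only for a small candidate set; the answer is approximate because a true neighbor can be pruned.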
APA, Harvard, Vancouver, ISO, and other styles
19

Hsu, Min-tze, and 徐敏哲. "A Hash Trie Filter Approach to Approximate String Match for Genomic Databases." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/78957433919505789307.

Full text
Abstract:
碩士
國立中山大學
資訊工程學系研究所
93
Genomic sequence databases, like GenBank and EMBL, are widely used by molecular biologists for homology searching. Because of the long length of each genomic sequence and the growing size of genomic sequence databases, efficient searching methods for fast queries are increasingly important. DNA sequences are composed of four kinds of nucleotides, and these genomic sequences can be regarded as text strings. However, there is no concept of words in a genomic sequence, which makes searching genomic databases much more difficult. Approximate String Matching (ASM) with k errors is considered for genomic sequences, where the k errors may be caused by insertion, deletion, and replacement operations. Filtration of the DNA sequence is a widely adopted technique to reduce the number of text areas (i.e., candidates) passed on for further verification. Most filter methods first split the database sequence into q-grams; a sequence of grams (subpatterns) that matches some part of the text is passed as a candidate, and the matching of grams against parts of the text can be sped up by using an index structure for exact matching. Candidates are then examined by dynamic programming to obtain the final result. However, most previous ASM methods consider only the local order within each gram; only the (k + s) h-samples filter considers the global order of the sequence of matched grams. Although the (k + s) h-samples filter keeps the global order of the sequence of grams, it still has some disadvantages. First, the number of ordered matched grams, s, required to form a candidate is always fixed at 2, which results in low precision. Second, the (k + s) h-samples filter builds the index for query patterns at query time. In this thesis, we propose a new approximate string matching method, the hash trie filter, for efficient searching in genomic databases.
We build a hash trie at precomputation time for the genomic sequence stored in the database. Although the size q of the split grams is decided by the same formula used in the (k + s) h-samples filter, we propose a different way to find the ordered subpatterns in the text T. Moreover, we reduce the number of candidates by pruning unreasonable matched positions. Furthermore, unlike the (k + s) h-samples filter, which always uses s = 2 to decide whether s matched subpatterns form a candidate, our method decides s dynamically, which increases precision. Simulation results show that our hash trie filter outperforms the (k + s) h-samples filter in terms of response time, number of verified candidates, and precision, under different query pattern lengths and error levels.
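The filtration step common to q-gram methods like those discussed above can be sketched with the classical q-gram lemma; this illustrates generic q-gram filtering, not the hash trie filter itself:

```python
def qgrams(s, q):
    return [s[i:i + q] for i in range(len(s) - q + 1)]

def candidate_windows(text, pattern, k, q):
    """Filtration step of q-gram based ASM: a text window survives only if
    it shares at least t = m - q + 1 - k*q q-grams with the pattern (the
    classical q-gram lemma for k errors). Survivors are candidates that
    must still be verified by dynamic programming."""
    m = len(pattern)
    t = m - q + 1 - k * q
    pat_grams = set(qgrams(pattern, q))
    wins = []
    for start in range(len(text) - m + 1):
        # Window slightly longer than the pattern to allow k insertions.
        window = text[start:start + m + k]
        shared = sum(g in pat_grams for g in qgrams(window, q))
        if shared >= t:
            wins.append(start)
    return wins

text = "ACGTACGTTGCAACGTACGA"
print(candidate_windows(text, "ACGTACGT", k=1, q=3))
```

Any window with fewer than `t` shared q-grams cannot contain a match with at most `k` errors, so the expensive dynamic-programming verification runs only on the surviving positions.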
APA, Harvard, Vancouver, ISO, and other styles
20

Dobiášovský, Jan. "Přibližná shoda znakových řetězců a její aplikace na ztotožňování metadat vědeckých publikací." Master's thesis, 2020. http://www.nusl.cz/ntk/nusl-415121.

Full text
Abstract:
The thesis explores the application of approximate string matching in the scientific publication record linkage process. An introduction to record matching along with five commonly used metrics for string distance (Levenshtein, Jaro, Jaro-Winkler, Cosine distances and Jaccard coefficient) are provided. These metrics are applied on publication metadata from the V3S current research information system of the Czech Technical University in Prague. Based on the findings, optimal thresholds in the F1, F2 and F3-measures are determined for each metric.
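Two of the metrics named above, assuming their textbook definitions (Levenshtein edit distance and the Jaccard coefficient on character bigrams), can be sketched as:

```python
def levenshtein(a, b):
    """Edit distance: minimum number of insertions, deletions and
    substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,        # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def jaccard(a, b, n=2):
    """Jaccard coefficient on character n-grams (bigrams by default):
    |A ∩ B| / |A ∪ B| over the sets of n-grams of each string."""
    A = {a[i:i + n] for i in range(len(a) - n + 1)}
    B = {b[i:i + n] for i in range(len(b) - n + 1)}
    return len(A & B) / len(A | B) if A | B else 1.0

print(levenshtein("Dobiášovský", "Dobiasovsky"))        # 3
print(round(jaccard("approximate", "aproximate"), 3))   # 0.9
```

In record linkage, each metric is turned into a match/non-match decision by a threshold, which is exactly what the thesis tunes against the F1, F2 and F3-measures.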
APA, Harvard, Vancouver, ISO, and other styles
21

Sahu, Sushanta. "Development of an approximate method for analyzing closed queuing networks with multiple-server stations." 2007. http://www.library.wisc.edu/databases/connect/dissertations.html.

Full text
APA, Harvard, Vancouver, ISO, and other styles
22

Moško, Juraj. "Explorácia multimediálnych kolekcií." Doctoral thesis, 2016. http://www.nusl.cz/ntk/nusl-348096.

Full text
Abstract:
Multimedia retrieval systems provide the methods and the interface for users to retrieve particular multimedia data from multimedia collections. Although many different retrieval techniques have evolved since search in multimedia collections first appeared as a research task, not all of them can fulfil the specific requirements of multimedia exploration. Multimedia exploration is designed to reveal the content of a whole multimedia collection, which is quite often totally unknown to the users retrieving the data. Because of this, a multimedia exploration system has to solve problems such as how to visualise (usually multidimensional) multimedia data, how to scale data retrieval to arbitrarily large collections, and how to design an interface that users can use intuitively for exploration. Taking these problems into consideration, we proposed and evaluated ideas for building a system well suited to multimedia exploration. We outlined the overall architecture of a multimedia exploration system, created the Multi-Layer Exploration Structure (MLES) as an underlying index structure that should solve the problems of efficient and intuitive data retrieval, and proposed definitions of exploration operations as an interactive and...
APA, Harvard, Vancouver, ISO, and other styles
