Dissertations / Theses on the topic 'Information storage and retrieval systems Data structures (Computer science)'

Consult the top 32 dissertations / theses for your research on the topic 'Information storage and retrieval systems Data structures (Computer science).'

Next to every source in the list of references is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Daoud, Amjad M. "Efficient data structures for information retrieval." Diss., Virginia Tech, 1993. http://hdl.handle.net/10919/40031.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Basu, Nandini. "Heuristics for searching chemical structures." Diss., Columbia, Mo. : University of Missouri-Columbia, 2007. http://hdl.handle.net/10355/5000.

Full text
Abstract:
Thesis (M.S.)--University of Missouri-Columbia, 2007.
The entire dissertation/thesis text is included in the research.pdf file; the official abstract appears in the short.pdf file (which also appears in the research.pdf); a non-technical general description, or public abstract, appears in the public.pdf file. Title from title screen of research.pdf file (viewed Apr. 9, 2009). Includes bibliographical references.
APA, Harvard, Vancouver, ISO, and other styles
3

Higa, Kunihiko. "End user logical database design: The structured entity model approach." Diss., The University of Arizona, 1988. http://hdl.handle.net/10150/184539.

Full text
Abstract:
We live in the Information Age. The effective use of information to manage organizational resources is the key to an organization's competitive power. Thus, a database plays a major role in the Information Age. A well designed database contains relevant, nonredundant, and consistent data. However, a well designed database is rarely achieved in practice. One major reason for this problem is the lack of effective support for logical database design. Since the late 1980s, various methodologies for database design have been introduced, based on the relational model, the functional model, the semantic database model, and the entity structure model. They all have, however, a common drawback: the successful design of database systems requires the experience, skills, and competence of a database analyst/designer. Unfortunately, such database analyst/designers are a scarce resource in organizations. The Structured Entity Model (SEM) method, as an alternative diagrammatic method developed by this research, facilitates the logical design phases of database system development. Because of the hierarchical structure and decomposition constructs of SEM, it can help a novice designer in performing top-down structured analysis and design of databases. SEM also achieves high semantic expressiveness by using a frame representation for entities and three general association categories (aspect, specialization, and multiple decomposition) for relationships. This also enables SEM to have high potential as a knowledge representation scheme for an integrated heterogeneous database system. Unlike most methods, the SEM method does not require designers to have knowledge of normalization theory in order to design a logical database. Thus, an end-user will be able to complete logical database design successfully using this approach. In this research, existing data models used for a logical database design were first studied. Second, the framework of SEM and the design approach using SEM were described and then compared with other data models and their use. Third, the effectiveness of the SEM method was validated in two experiments using novice designers and by a case analysis. In the final chapter of this dissertation, future research directions, such as the design of a logical database design expert system based on the SEM method and applications of this approach to other problems, such as the problems in integration of multiple databases and in an intelligent mail system, are discussed.
APA, Harvard, Vancouver, ISO, and other styles
4

Choquette, Stephen Michael. "An experimental disk-resident spatial information system." Thesis, Virginia Tech, 1985. http://hdl.handle.net/10919/45717.

Full text
Abstract:
In Chapter 2, several other relational database systems will be reviewed. The primary goal of each analysis will be to identify the logical and physical approaches to data representation, as well as the user interface. Chapters 3 and 4 identify the physical and logical structure of the experimental disk-resident spatial information system developed for this project. As will be seen, this experimental system combines features of existing relational databases with new approaches to represent and manipulate information in a database. The Query Language Interpreter will be presented in Chapter 5. Through the interpreter, a database user can issue a variety of commands to perform database operations, as well as create sophisticated program control structures. Chapters 6-8 discuss the security, integrity, and recovery aspects of the disk-resident system. During the development of this project, various implementation problems were encountered. These problems and solutions are presented in Chapter 9. Chapter 10 discusses the performance of the disk-resident system. Statistics are presented comparing how the database commands performed when run under a variety of test environments. Chapter 11 uses the results from earlier chapters to draw conclusions concerning the disk-resident system and presents some directions for future work. Following the bibliography are related appendices that illustrate the various types of files recognized by the query language interpreter.
Master of Science
APA, Harvard, Vancouver, ISO, and other styles
5

Kjerne, Daniel. "Modeling cadastral spatial relationships using an object-oriented information structure." PDXScholar, 1987. https://pdxscholar.library.pdx.edu/open_access_etds/3721.

Full text
Abstract:
This thesis identifies a problem in the current practice for storage of locational data of entities in the cadastral layer of a land information system (LIS), and presents as a solution an information model that uses an object-oriented paradigm.
APA, Harvard, Vancouver, ISO, and other styles
6

Douieb, Karim. "Hotlinks and dictionaries." Doctoral thesis, Universite Libre de Bruxelles, 2008. http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/210471.

Full text
Abstract:
Knowledge has always been a decisive factor in humankind's social evolution. Collecting the world's knowledge is one of the greatest challenges of our civilization. Knowledge involves the use of information, but information is not knowledge; it is a way of acquiring and understanding information. Improving the visibility and accessibility of information requires organizing it efficiently. This thesis focuses on that general purpose.

A fundamental objective of computer science is to store and retrieve information efficiently. This is known as the dictionary problem. A dictionary asks for a data structure which essentially supports the search operation. In general, information that is important and popular at a given time has to be accessed faster than less relevant information. This can be achieved by periodically reorganizing the data structure so that relevant information is located closer to the search starting point. The second part of this thesis is devoted to the development and understanding of self-adjusting dictionaries in various models of computation. In particular, we focus our attention on dictionaries which do not have any knowledge of future accesses. Those dictionaries have to adapt themselves to be competitive with dictionaries specifically tuned for a given access sequence.

This approach, which transforms the information structure, is not always feasible, for instance when the structure is based on the semantics of the information, such as a categorization. In this context, the search procedure is linked to the structure itself, and modifying the structure affects how a search is performed. A solution developed to improve search in static structures is hotlink assignment: a way to enhance a structure without altering its original design. This approach speeds up search by creating shortcuts in the structure. The first part of this thesis is devoted to this approach.
Doctorat en Sciences
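A minimal illustrative sketch of the self-adjusting idea (our toy example, not code from the thesis): a move-to-front list reorganizes itself on every successful search, so currently popular keys migrate toward the search starting point, with no knowledge of future accesses.

    class MoveToFrontDictionary:
        """Toy self-adjusting dictionary: each successful search promotes
        the accessed key to the front, so popular keys are found sooner."""

        def __init__(self):
            self._items = []  # (key, value) pairs, most recently used first

        def insert(self, key, value):
            self._items.insert(0, (key, value))

        def search(self, key):
            for i, (k, v) in enumerate(self._items):
                if k == key:
                    self._items.insert(0, self._items.pop(i))  # self-adjust
                    return v
            return None

Hotlink assignment takes the complementary route: the structure stays fixed and shortcut pointers are added toward popular regions instead.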
APA, Harvard, Vancouver, ISO, and other styles
7

Senn, Erich 1957. "An expert system approach to data communication failure diagnosis and information retrieval." Thesis, The University of Arizona, 1986. http://hdl.handle.net/10150/275550.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Radley, Johannes Jurgens. "Pseudo-random access compressed archive for security log data." Thesis, Rhodes University, 2015. http://hdl.handle.net/10962/d1020019.

Full text
Abstract:
We are surrounded by an increasing number of devices and applications that produce a huge quantity of machine-generated data. Almost all machine data contains some element of security information that can be used to discover, monitor and investigate security events. The work proposes a pseudo-random access compressed storage method for log data, to be used with an information retrieval system that in turn provides the ability to search and correlate log data and the corresponding events. We explain the method for converting log files into distinct events and storing the events in a compressed file. This yields an entry identifier for each log entry that provides a pointer that can be used by indexing methods. The research also evaluates the compression performance penalties incurred by this storage system, including decreased compression ratio as well as increased compression and decompression times.
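A toy sketch of the general idea (our illustration; Radley's format and identifiers will differ): events are compressed in fixed-size blocks, so an entry identifier maps to its block arithmetically and a single entry can be read by decompressing only that block.

    import zlib

    BLOCK_SIZE = 1000  # events per compressed block (a tuning choice)

    def build_archive(events):
        """events: list of single-line event strings. Compress them in
        fixed-size blocks; fixed block size makes entry ids addressable."""
        return [zlib.compress("\n".join(events[i:i + BLOCK_SIZE]).encode())
                for i in range(0, len(events), BLOCK_SIZE)]

    def read_entry(blocks, entry_id):
        """Pseudo-random access: decompress one block, pick one line."""
        block_no, offset = divmod(entry_id, BLOCK_SIZE)
        return zlib.decompress(blocks[block_no]).decode().split("\n")[offset]

The trade-off the thesis measures falls out directly: smaller blocks mean faster single-entry reads but a worse compression ratio.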
APA, Harvard, Vancouver, ISO, and other styles
9

Lofstead, Gerald Fredrick. "Extreme scale data management in high performance computing." Diss., Georgia Institute of Technology, 2010. http://hdl.handle.net/1853/37232.

Full text
Abstract:
Extreme scale data management in high performance computing requires consideration of the end-to-end scientific workflow process. Of particular importance for runtime performance, the write-read cycle must be addressed as a complete unit. Any optimization made to enhance writing performance must consider the subsequent impact on reading performance. Only by addressing the full write-read cycle can scientific productivity be enhanced. The ADIOS middleware developed as part of this thesis provides an API nearly as simple as the standard POSIX interface, but with the flexibilty to choose what transport mechanism(s) to employ at or during runtime. The accompanying BP file format is designed for high performance parallel output with limited coordination overheads while incorporating features to accelerate subsequent use of the output for reading operations. This pair of optimizations of the output mechanism and the output format are done such that they either do not negatively impact or greatly improve subsequent reading performance when compared to popular self-describing file formats. This end-to-end advantage of the ADIOS architecture is further enhanced through techniques to better enable asychronous data transports affording the incorporation of 'in flight' data processing operations and pseudo-transport mechanisms that can trigger workflows or other operations.
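A hypothetical sketch of the architectural idea only (not the real ADIOS API): the writing code talks to one small interface, and which transport actually runs is a runtime configuration choice.

    class PosixFileTransport:
        """Synchronous file output (stand-in for one transport choice)."""
        def __init__(self, path):
            self._f = open(path, "w")
        def write(self, name, value):
            self._f.write(f"{name} = {value!r}\n")
        def close(self):
            self._f.close()

    class StagingTransport:
        """Stand-in for an asynchronous/staging transport: here it only
        buffers, but the caller's code is identical either way."""
        def __init__(self, path):
            self._buffer = []
        def write(self, name, value):
            self._buffer.append((name, value))
        def close(self):
            pass  # a real transport would drain the buffer asynchronously

    TRANSPORTS = {"posix": PosixFileTransport, "staging": StagingTransport}

    def open_output(method, path):
        # 'method' comes from configuration, not from the science code.
        return TRANSPORTS[method](path)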
APA, Harvard, Vancouver, ISO, and other styles
10

Tatarinov, Igor. "Semantic data sharing with a peer data management system /." Thesis, Connect to this title online; UW restricted, 2004. http://hdl.handle.net/1773/6942.

Full text
APA, Harvard, Vancouver, ISO, and other styles
11

Bae, Soo Hyun. "Information retrieval via universal source coding." Diss., Atlanta, Ga. : Georgia Institute of Technology, 2008. http://hdl.handle.net/1853/26573.

Full text
Abstract:
Thesis (Ph.D)--Electrical and Computer Engineering, Georgia Institute of Technology, 2009.
Committee Chair: Juang, Biing-Hwang; Committee Member: Al-Regib, Ghassan; Committee Member: Wills, Linda; Committee Member: Mersereau, Russell; Committee Member: Pappas, Thrasyvoulos. Part of the SMARTech Electronic Thesis and Dissertation Collection.
APA, Harvard, Vancouver, ISO, and other styles
12

Skovronski, John. "An ontology-based publish-subscribe framework." Diss., Online access via UMI:, 2006.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
13

Dawson, Linda Louise 1954. "An investigation of the use of object-oriented models in requirements engineering practice." Monash University, School of Information Management and Systems, 2001. http://arrow.monash.edu.au/hdl/1959.1/8031.

Full text
APA, Harvard, Vancouver, ISO, and other styles
14

Conte, Simone Ivan. "The Sea of Stuff : a model to manage shared mutable data in a distributed environment." Thesis, University of St Andrews, 2019. http://hdl.handle.net/10023/16827.

Full text
Abstract:
Managing data is one of the main challenges in distributed systems and computer science in general. Data is created, shared, and managed across heterogeneous distributed systems of users, services, applications, and devices without a clear and comprehensive data model. This technological fragmentation and lack of a common data model result in a poor understanding of what data is, how it evolves over time, how it should be managed in a distributed system, and how it should be protected and shared. From a user perspective, for example, backing up data over multiple devices is a hard and error-prone process, or synchronising data with a cloud storage service can result in conflicts and unpredictable behaviours. This thesis identifies three challenges in data management: (1) how to extend the current data abstractions so that content, for example, is accessible irrespective of its location, versionable, and easy to distribute; (2) how to enable transparent data storage relative to locations, users, applications, and services; and (3) how to allow data owners to protect data against malicious users and automatically control content over a distributed system. These challenges are studied in detail in relation to the current state of the art and addressed throughout the rest of the thesis. The artefact of this work is the Sea of Stuff (SOS), a generic data model of immutable self-describing location-independent entities that allow the construction of a distributed system where data is accessible and organised irrespective of its location, easy to protect, and can be automatically managed according to a set of user-defined rules. The evaluation of this thesis demonstrates the viability of the SOS model for managing data in a distributed system and using user-defined rules to automatically manage data across multiple nodes.
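A hypothetical sketch of the location-independence ingredient (our simplification of the model): an entity's identifier is derived from its content, so it resolves the same on any node, and "mutation" is a new immutable version that links to its predecessor.

    import hashlib, json

    def make_version(content, previous_guid=None):
        """Immutable, self-describing, location-independent entity: the
        GUID is a hash of the content bytes, so any node holding the
        bytes can serve it; updates create a new version that points
        back to the old one instead of overwriting it."""
        guid = hashlib.sha256(content).hexdigest()
        manifest = {"guid": guid, "previous": previous_guid}
        return guid, json.dumps(manifest)

    v1, m1 = make_version(b"draft")
    v2, m2 = make_version(b"draft, revised", previous_guid=v1)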
APA, Harvard, Vancouver, ISO, and other styles
15

Honniger, Werner. "Networking the enterprise : a solution for HBR personnel." Thesis, Stellenbosch : Stellenbosch University, 2004. http://hdl.handle.net/10019.1/16481.

Full text
Abstract:
Thesis (MPhil)--Stellenbosch University, 2004.<br>ENGLISH ABSTRACT: This Extended Research Assignment discusses the information systems found in HBR Personnel. The discussion, based on the research problems, proposes steps in which the systems of HBR can be integrated so that they add the most value. Furthermore, a review of Corporate Portals is undertaken to show the potential impact it may have on organisational efficiencies and knowledge. The Assignment, according to the methodologies given, analyses the HBR information system for system incompatibilities and bottlenecks and proposes solutions for these problems. The solutions include changing core system databases and computer systems, together with a portal to fully integrate HBR Personnel’s information systems.<br>AFRIKAANSE OPSOMMING: Hierdie Uitgebreide Navorsingsopdrag bespreek die informasiestelsels gevind in HBR Personnel. Die bespreking, gebaseer op die navorsingsprobleme, stel stappe voor waardeur die stelsels van HBR geïntegreer kan word om die meeste waarde toe te voeg. Verder word ‘n oorsig gedoen van Korporatiewe Portale om te wys watter potensiële impak dit kan hê op organisatoriese doeltreffendheid en kennis. Na aanleiding van die gegewe metodologieë analiseer die opdrag HBR se informasiestelsel vir sistemiese probleme en bottelnekke en stel oplossings voor vir hierdie probleme. Die oplossings sluit in ‘n verandering van kern-sisteem databasisse en rekenaarstelsels, tesame met ‘n portaal om HBR Personnel se informasiestelsels ten volle te integreer.
APA, Harvard, Vancouver, ISO, and other styles
16

Mendes, Guilherme Firmino. "Filtragem de percepções em agentes baseada em objetivos e no modelo de revisão de crenças data-oriented belief revision (DBR)." Universidade Tecnológica Federal do Paraná, 2015. http://repositorio.utfpr.edu.br/jspui/handle/1/1896.

Full text
Abstract:
In scenarios where intelligent agents act on and perceive data from environments with ever more information, identifying only the perceptions relevant to goals can be crucial for the agent's reasoning cycle to be performed in time. As a solution to this problem, this work creates a model for filtering perceptions based on the DBR (Data-oriented Belief Revision) model, to be applied to BDI (Belief Desire Intention) agents. To do so, the work extended and formalized some of the DBR model's concepts, making them applicable in computer programs. Among this work's contributions are the extension and definition of the Focus (selection of perceived data) and Oblivion (forgetting of inactive data) processes; the definition and formalization of perception Relevance models, calculations that allow data to be filtered or discarded based on the agent's plans and their importance values; and the definition of Inactive Data storage models able to support different usage scenarios of BDI agents. The result was a generic, automated perception filter oriented to the goals of BDI agents. To operationalize the model, it was implemented on the Jason agent development platform. Empirical analyses were performed to assess the correctness of the model and the impact on processing time after its application. The experiments indicate that the perception filtering model proposed in this work improves the computational performance of agents exposed to noisy environments.
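A minimal sketch of goal-directed perception filtering (our toy relevance function and plan format; the thesis defines richer importance-based Relevance calculations): a percept survives only if it is sufficiently relevant to some active plan.

    def keyword_relevance(percept, plan):
        """Toy relevance: fraction of the plan's trigger terms that the
        percept mentions (a stand-in for the thesis's Relevance models)."""
        terms = plan["trigger_terms"]
        return sum(term in percept for term in terms) / len(terms)

    def filter_percepts(percepts, active_plans, threshold=0.5):
        """Keep a percept only if some active plan finds it relevant
        enough; everything else is dropped before the reasoning cycle."""
        return [p for p in percepts
                if any(keyword_relevance(p, plan) >= threshold
                       for plan in active_plans)]

    plans = [{"trigger_terms": ["fire", "alarm"]}]
    filter_percepts(["alarm in room 3", "coffee ready"], plans)  # keeps one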
APA, Harvard, Vancouver, ISO, and other styles
17

Peng, Xiaobo. "Mediation on XQuery Views." Thesis, University of North Texas, 2006. https://digital.library.unt.edu/ark:/67531/metadc5442/.

Full text
Abstract:
The major goal of information integration is to provide efficient and easy-to-use access to multiple heterogeneous data sources with a single query. At the same time, one of the current trends is to use standard technologies for implementing solutions to complex software problems. In this dissertation, I used XML and XQuery as the standard technologies and have developed an extended projection algorithm to provide a solution to the information integration problem. In order to demonstrate my solution, I implemented a prototype mediation system called Omphalos based on XML related technologies. The dissertation describes the architecture of the system, its metadata, and the process it uses to answer queries. The system uses XQuery expressions (termed metaqueries) to capture complex mappings between global schemas and data source schemas. The system then applies these metaqueries in order to rewrite a user query on a virtual global database (representing the integrated view of the heterogeneous data sources) to a query (termed an outsourced query) on the real data sources. An extended XML document projection algorithm was developed to increase the efficiency of selecting the relevant subset of data from an individual data source to answer the user query. The system applies the projection algorithm to decompose an outsourced query into atomic queries which are each executed on a single data source. I also developed an algorithm to generate integrating queries, which the system uses to compose the answers from the atomic queries into a single answer to the original user query. I present a proof of both the extended XML document projection algorithm and the query integration algorithm. An analysis of the efficiency of the new extended algorithm is also presented. Finally I describe a collaborative schema-matching tool that was implemented to facilitate maintaining metadata.
APA, Harvard, Vancouver, ISO, and other styles
18

Katchaounov, Timour. "Query Processing for Peer Mediator Databases." Doctoral thesis, Uppsala : Acta Universitatis Upsaliensis : Univ.-bibl. [distributör], 2003. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-3687.

Full text
APA, Harvard, Vancouver, ISO, and other styles
19

Sandrock, Trudie. "Multi-label feature selection with application to musical instrument recognition." Thesis, Stellenbosch : Stellenbosch University, 2013. http://hdl.handle.net/10019/11071.

Full text
Abstract:
Thesis (PhD)--Stellenbosch University, 2013.
An area of data mining and statistics that is currently receiving considerable attention is the field of multi-label learning. Problems in this field are concerned with scenarios where each data case can be associated with a set of labels instead of only one. In this thesis, we review the field of multi-label learning and discuss the lack of suitable benchmark data available for evaluating multi-label algorithms. We propose a technique for simulating multi-label data, which allows good control over different data characteristics and which could be useful for conducting comparative studies in the multi-label field. We also discuss the explosion in data in recent years, and highlight the need for some form of dimension reduction in order to alleviate some of the challenges presented by working with large datasets. Feature (or variable) selection is one way of achieving dimension reduction, and after a brief discussion of different feature selection techniques, we propose a new technique for feature selection in a multi-label context, based on the concept of independent probes. This technique is empirically evaluated using simulated multi-label data and is shown to achieve, with a reduced set of features, classification accuracy similar to that achieved with the full set of features. The proposed technique for feature selection is then also applied to the field of music information retrieval (MIR), specifically the problem of musical instrument recognition. An overview of the field of MIR is given, with particular emphasis on the instrument recognition problem. The particular goal of (polyphonic) musical instrument recognition is to automatically identify the instruments playing simultaneously in an audio clip, which is not a simple task. We specifically consider the case of duets – in other words, where two instruments are playing simultaneously – and approach the problem as a multi-label classification one. In our empirical study, we illustrate the complexity of musical instrument data and again show that our proposed feature selection technique is effective in identifying relevant features, thereby reducing the complexity of the dataset without negatively impacting performance.
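A sketch of the independent-probes idea under simplifying assumptions (a single importance score from a random forest; the thesis's multi-label criterion differs): random probe columns are independent of the labels by construction, so any real feature that cannot outrank the best probe is discarded.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def probe_feature_selection(X, y, n_probes=10, random_state=0):
        """Append random 'probe' columns that are independent of y by
        construction, rank all columns by the model's importance score,
        and keep only the real features that beat the best probe."""
        rng = np.random.default_rng(random_state)
        probes = rng.standard_normal((X.shape[0], n_probes))
        X_aug = np.hstack([X, probes])
        forest = RandomForestClassifier(random_state=random_state)
        importances = forest.fit(X_aug, y).feature_importances_
        best_probe = importances[X.shape[1]:].max()
        return np.where(importances[:X.shape[1]] > best_probe)[0]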
APA, Harvard, Vancouver, ISO, and other styles
20

Gill, Martin L. "Combining MAS and P2P systems : the Agent Trees Multi-Agent System (ATMAS)." Thesis, University of Stirling, 2005. http://hdl.handle.net/1893/108.

Full text
Abstract:
The seamless retrieval of information distributed across networks has been one of the key goals of many systems. Early solutions involved the use of single static agents which would retrieve the unfiltered data and then process it. However, this was deemed costly and inefficient in terms of bandwidth, since complete files need to be downloaded when a single value is often all that is required. As a result, mobile agents were developed to filter the data in situ before returning it to the user. However, mobile agents have their own associated problems, namely security and control. The Agent Trees Multi-Agent System (ATMAS) has been developed to provide the remote processing and filtering capabilities but without the need for mobile code. It is implemented as a Peer to Peer (P2P) network of static intelligent cooperating agents, each of which controls one or more data sources. This dissertation describes the two key technologies that have directly influenced the design of ATMAS: Peer-to-Peer (P2P) systems and Multi-Agent Systems (MAS). P2P systems are conceptually simple but limited in power, whereas MAS are significantly more complex but correspondingly more powerful. The resulting system exhibits the power of traditional MAS systems while retaining the simplicity of P2P systems. The dissertation describes the system in detail and analyses its performance.
APA, Harvard, Vancouver, ISO, and other styles
21

Fan, Yang, Hidehiko Masuhara, Tomoyuki Aotani, Flemming Nielson, and Hanne Riis Nielson. "AspectKE*: Security aspects with program analysis for distributed systems." Universität Potsdam, 2010. http://opus.kobv.de/ubp/volltexte/2010/4136/.

Full text
Abstract:
Enforcing security policies in distributed systems is difficult, in particular when a system contains untrusted components. We designed AspectKE*, a distributed AOP language based on a tuple space, to tackle this issue. In AspectKE*, aspects can enforce access control policies that depend on the future behavior of running processes. One of the key language features is the predicates and functions that extract results of static program analysis, which are useful for defining security aspects that have to know about the future behavior of a program. AspectKE* also provides a novel variable binding mechanism for pointcuts, so that pointcuts can uniformly specify join points based on both static and dynamic information about the program. Our implementation strategy performs fundamental static analysis at load time, so as to keep runtime overheads minimal. We implemented a compiler for AspectKE*, and demonstrate the usefulness of AspectKE* through a security aspect for a distributed chat system.
APA, Harvard, Vancouver, ISO, and other styles
22

Sanga, Dione Aparecido de Oliveira. "Mineração de textos para o tratamento automático em sistemas de atendimento ao usuário." Universidade Tecnológica Federal do Paraná, 2017. http://repositorio.utfpr.edu.br/jspui/handle/1/2850.

Full text
Abstract:
The explosion of new forms of communication between companies and customers provides new opportunities and means for companies to take advantage of this interaction. The way customers interact with companies has evolved in recent years due to the growth of mobile devices and Internet access: customers who traditionally requested service by telephone have migrated to electronic channels, whether via smartphone apps or customer service portals. As a result of this technological transformation of the communication medium, text mining has become an attractive way for companies to extract new knowledge from the record of interactions carried out by customers. Within this context, the telecommunications environment provides the inputs for conducting experiments, owing to the large volume of data generated daily in customer service systems. This work analyzes whether the use of text mining increases the accuracy of data mining models in applications involving free text. To this end, an application is developed that aims to identify clients likely to leave internal service environments (CRM) and migrate to regulatory agencies in the telecommunications sector [Baeza, Ricardo e Berthier, 1999]. The main problems encountered in text mining applications are also addressed. Finally, the results of applying classification algorithms to different data sets are presented, to evaluate the improvement obtained by including text mining for this type of application. The results show a consolidated improvement in accuracy on the order of 32%, making text mining a useful tool for this type of problem.
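A minimal sketch of the kind of comparison the study runs (hypothetical column names; the thesis uses its own algorithms and CRM data): free-text ticket notes become TF-IDF features feeding an ordinary classifier, to be compared against a model trained without the text features.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    def build_text_model(ticket_texts, escalated_to_regulator):
        """ticket_texts: free-text field of each CRM record (hypothetical);
        escalated_to_regulator: 1 if the client later complained to the
        regulator, else 0. Returns a fitted text-based classifier."""
        model = make_pipeline(TfidfVectorizer(max_features=5000),
                              LogisticRegression(max_iter=1000))
        return model.fit(ticket_texts, escalated_to_regulator)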
APA, Harvard, Vancouver, ISO, and other styles
23

"Redundancy on content-based indexing." 1997. http://library.cuhk.edu.hk/record=b5889125.

Full text
Abstract:
by Cheung King Lum Kingly. Thesis (M.Phil.)--Chinese University of Hong Kong, 1997. Includes bibliographical references (leaves 108-110).
Contents: 1. Introduction (Motivation; Problems in Content-Based Indexing; Contributions; Thesis Organization) -- 2. Content-Based Indexing Structures (R-Tree; R+-Tree; R*-Tree) -- 3. Searching in Both R-Tree and R*-Tree (Exact Search; Nearest Neighbor Search: Definition of Searching Metrics, Pruning Heuristics, Nearest Neighbor Search Algorithm, Generalization to N-Nearest Neighbor Search) -- 4. An Improved Nearest Neighbor Search Algorithm for R-Tree (New Pruning Heuristics; Replacing Heuristics; N-Nearest Neighbor Search; Performance Evaluation) -- 5. Overlapping Nodes in R-Tree and R*-Tree (Overlapping Nodes; Problems Induced by Overlapping Nodes: Backtracking, Inefficient Exact Search, Inefficient Nearest Neighbor Search) -- 6. Redundancy on R-Tree (Motivation; Adding Redundancy on Index Tree; R-Tree with Redundancy: Previous Models, Redundant R-Tree, Level List, Inserting Redundancy, Properties) -- 7. Searching in Redundant R-Tree (Exact Search; Nearest Neighbor Search; Avoidance of Multiple Accesses) -- 8. Experiment (Exact Search and Nearest Neighbor Search on Clustered, Uniform, and Real Data; Discussion) -- 9. Conclusions and Future Research.
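Since the record preserves only the thesis outline, a generic sketch of the technique it builds on (standard MINDIST-based best-first nearest-neighbour search over an R-tree, not the thesis's improved pruning heuristics) is given here for orientation; all names are ours.

    import heapq, itertools

    class Node:
        def __init__(self, rect, children=(), entry=None):
            self.rect = rect          # ((xmin, ymin), (xmax, ymax))
            self.children = list(children)
            self.entry = entry        # payload for leaf entries

    def mindist(p, rect):
        """Squared distance from point p to a rectangle (0 if inside):
        a lower bound on the distance to anything stored below it."""
        (xmin, ymin), (xmax, ymax) = rect
        dx = max(xmin - p[0], 0.0, p[0] - xmax)
        dy = max(ymin - p[1], 0.0, p[1] - ymax)
        return dx * dx + dy * dy

    def nearest(root, query):
        """Best-first search: always expand the pending node with the
        smallest MINDIST; the first leaf popped is the nearest entry."""
        tie = itertools.count()  # tiebreaker so the heap never compares Nodes
        heap = [(0.0, next(tie), root)]
        while heap:
            _, _, node = heapq.heappop(heap)
            if not node.children:
                return node.entry
            for child in node.children:
                heapq.heappush(heap,
                               (mindist(query, child.rect), next(tie), child))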
APA, Harvard, Vancouver, ISO, and other styles
24

Newsom, Eric Tyner. "An exploratory study using the predicate-argument structure to develop methodology for measuring semantic similarity of radiology sentences." Thesis, 2013. http://hdl.handle.net/1805/3666.

Full text
Abstract:
Indiana University-Purdue University Indianapolis (IUPUI)
The amount of information produced in the form of electronic free text in healthcare is increasing to levels that humans cannot process in the course of advancing their professional practice. Information extraction (IE) is a sub-field of natural language processing with the goal of data reduction of unstructured free text. Pertinent to IE is an annotated corpus that frames how IE methods should create a logical expression necessary for processing the meaning of text. Most annotation approaches seek to maximize meaning and knowledge by chunking sentences into phrases and mapping these phrases to a knowledge source to create a logical expression. However, these studies consistently have problems addressing semantics, and none have addressed the issue of semantic similarity (or synonymy) to achieve data reduction. A successful methodology for data reduction depends on a framework that can represent currently popular phrasal methods of IE but can also fully represent the sentence. This study explores and reports on the benefits, problems, and requirements of using the predicate-argument statement (PAS) as that framework. A convenience sample from a prior study, with ten synsets of 100 unique sentences from radiology reports deemed by domain experts to mean the same thing, is the text from which PAS structures are formed.
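A sketch of what a PAS-based comparison could look like (our hypothetical types; the study's framework is richer): each sentence becomes a predicate with labelled arguments, and similarity is judged over those role-filler pairs rather than over surface phrases.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class PAS:
        """A predicate-argument statement: a predicate plus labelled
        arguments, e.g. (('ARG0', 'radiologist'), ('ARG1', 'nodule'))."""
        predicate: str
        args: tuple

    def pas_overlap(a, b):
        """Crude similarity: same predicate, Jaccard overlap of the
        role-filler pairs. Real synonymy handling would first map
        predicates and fillers to a knowledge source."""
        if a.predicate != b.predicate:
            return 0.0
        sa, sb = set(a.args), set(b.args)
        return len(sa & sb) / len(sa | sb) if sa | sb else 1.0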
APA, Harvard, Vancouver, ISO, and other styles
25

He, Weimin. "Searching and ranking XML data in a distributed environment." 2008. http://hdl.handle.net/10106/1810.

Full text
APA, Harvard, Vancouver, ISO, and other styles
26

Hansraj, Sanjith. "Knowledge-directed intelligent information retrieval for research funding." Thesis, 2001. http://hdl.handle.net/10413/3087.

Full text
Abstract:
Researchers have always found difficulty in attaining funding from the National Research Foundation (NRF) for new research interests. The field of Artificial Intelligence (AI) holds the promise of improving the matching of research proposals to funding sources through Intelligent Information Retrieval (IIR). IIR is a fairly new AI technique that has evolved from traditional IR systems to solve real-world problems. Typically, an IIR system contains three main components, namely, a knowledge base, an inference engine and a user interface. Due to its inferential capabilities, IIR has been found to be applicable to domains for which traditional techniques, such as the use of databases, have not been well suited. This applicability has led it to become a viable AI technique from both a research and an application perspective. This dissertation concentrates on researching and implementing an IIR system in LPA Prolog, which we call FUND, to assist in matching the research proposals of prospective researchers to funding sources within the National Research Foundation (NRF). FUND's reasoning strategy for its inference engine is backward chaining that carries out a depth-first search over its knowledge representation structure, namely, a semantic network. The distance constraint of the Constrained Spreading Activation (CSA) technique is incorporated within the search strategy to help prune non-relevant returns by FUND. The evolution of IIR from IR is covered in detail. Various reasoning strategies and knowledge representation schemes were reviewed to find the combination that best suited the problem domain and the programming language chosen. FUND accommodates depth-4, depth-5 and exhaustive search algorithms. FUND's effectiveness was tested in relation to the different searches, with respect to their precision and recall, and in comparison to other similar systems. FUND's performance in providing researchers with better funding advice in the South African situation proved favourably comparable to other similar systems elsewhere.
Thesis (M.Sc.)--University of Natal, Pietermaritzburg, 2001.
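A minimal sketch of the search idea as the abstract describes it (our toy representation of the semantic network as an adjacency dict; FUND itself is written in Prolog): depth-first traversal with CSA's distance constraint, so activation never spreads more than max_depth links from the starting concept.

    def constrained_search(network, start, goal_test, max_depth):
        """Depth-first search over a semantic network with the CSA
        distance constraint: paths longer than max_depth links are
        pruned as unlikely to be relevant.
        network: dict mapping a node to its neighbouring nodes."""
        hits, visited = [], set()

        def dfs(node, depth):
            if depth > max_depth or node in visited:
                return
            visited.add(node)
            if goal_test(node):
                hits.append(node)
            for neighbour in network.get(node, []):
                dfs(neighbour, depth + 1)

        dfs(start, 0)
        return hits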
APA, Harvard, Vancouver, ISO, and other styles
27

Deedman, Galvin Charles. "Developing conceptual frameworks for structuring legal knowledge to build knowledge-based systems." Thesis, 1994. http://hdl.handle.net/2429/6998.

Full text
Abstract:
This dissertation adopts an interdisciplinary approach to the field of law and artificial intelligence. It argues that the conceptual structuring of legal knowledge within an appropriate theoretical framework is of primary importance when building knowledge-based systems. While technical considerations also play a role, they must take second place to an in-depth understanding of the law. Two alternative methods of structuring legal knowledge in very different domains are used to explore the thesis. A deep-structure approach is used on nervous shock, a rather obscure area of the law of negligence. A script-based method is applied to impaired driving, a well-known part of the criminal law. A knowledge-based system is implemented in each area. The two systems, Nervous Shock Advisor (NSA) and Impaired Driving Advisor (IDA), and the methodologies they embody, are described and contrasted. In light of the work undertaken, consideration is given to the feasibility of lawyers without much technical knowledge using general-purpose tools to build knowledge-based systems for themselves.
APA, Harvard, Vancouver, ISO, and other styles
28

Behrends, Erik. "Evaluation of Queries on Linked Distributed XML Data." Doctoral thesis, 2006. http://hdl.handle.net/11858/00-1735-0000-0006-B38A-9.

Full text
APA, Harvard, Vancouver, ISO, and other styles
29

"Statistical modeling for lexical chains for automatic Chinese news story segmentation." 2010. http://library.cuhk.edu.hk/record=b5894500.

Full text
Abstract:
Chan, Shing Kai. Thesis (M.Phil.)--Chinese University of Hong Kong, 2010. Includes bibliographical references (leaves 106-114). Abstracts in English and Chinese.
Contents: 1. Introduction (Problem Statement; Motivation for Story Segmentation; Terminologies; Thesis Goals; Thesis Organization) -- 2. Background Study (Coherence-based Approaches: Defining Coherence, Lexical Chaining, Cosine Similarity, Language Modeling; Feature-based Approaches: Lexical Cues, Audio Cues, Video Cues; Pros and Cons and Hybrid Approaches) -- 3. Experimental Corpora (The TDT2 and TDT3 Multi-language Text Corpus; Data Preprocessing: Challenges of Lexical Chain Formation on Chinese Text, Word Segmentation for Word Units Extraction, Part-of-speech Tagging for Candidate Words Extraction) -- 4. Indication of Lexical Cohesiveness by Lexical Chains (Lexical Chain as a Representation of Cohesiveness; Lexical Chain as an Indicator of Story Segments) -- 5. Indication of Story Boundaries by Lexical Chains (Formal Definition of the Classification Procedures; Theoretical Framework for Segmentation Based on Lexical Chaining; Comparing Segmentation Models) -- 6. Analysis of Lexical Chain Features as Boundary Indicators (Error Analysis; Window Length in the LRT Model; The Relative Importance of Each Set of Features; The Effect of Removing Timing Information) -- 7. Conclusions and Future Work (Contributions; Future Works).
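The outline mentions forming lexical chains by connecting repeated lexical elements and scoring boundaries from them. A simplified sketch of that idea (plain counts of chain starts and ends; the thesis instead develops a statistical likelihood-ratio framework):

    from collections import defaultdict

    def lexical_chains(sentences, max_gap=3):
        """Chain repeated words: occurrences of the same word at most
        max_gap sentences apart join one chain. Returns
        {word: [(first_sentence, last_sentence), ...]}. In practice
        single-occurrence chains would be filtered out as noise."""
        chains = defaultdict(list)
        for i, sent in enumerate(sentences):
            for word in set(sent.split()):
                spans = chains[word]
                if spans and i - spans[-1][1] <= max_gap:
                    spans[-1] = (spans[-1][0], i)   # extend current chain
                else:
                    spans.append((i, i))            # start a new chain
        return chains

    def boundary_scores(chains, n_sentences):
        """Score each inter-sentence gap by how many chains end just
        before it plus how many start just after it; high scores
        suggest story boundaries."""
        score = [0] * (n_sentences - 1)
        for spans in chains.values():
            for start, end in spans:
                if end < n_sentences - 1:
                    score[end] += 1
                if start > 0:
                    score[start - 1] += 1
        return score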
APA, Harvard, Vancouver, ISO, and other styles
30

Krishnan, Anand. "Mining causal associations from geriatric literature." 2013. http://hdl.handle.net/1805/3416.

Full text
Abstract:
Indiana University-Purdue University Indianapolis (IUPUI)
Literature pertaining to geriatric care contains rich information regarding best practices related to geriatric health care issues. The publication domain of geriatric care is small compared to other health-related areas; however, there are over a million articles pertaining to different cases and case interventions capturing best-practice outcomes. If the data found in these articles could be harvested and processed effectively, such knowledge could be translated from research to practice more quickly and efficiently. Geriatric literature covers multiple domains or practice areas, and within these domains is a wealth of information such as interventions, information on care for the elderly, case studies, and real-life scenarios. These articles comprise a variety of causal relationships, such as the relationship between interventions and disorders. The goal of this study is to identify these causal relations in published abstracts. Natural language processing and statistical methods were adopted to identify and extract these causal relations. Using the developed methods, causal relations were extracted with a precision of 79.54% and a recall of 81%, with a false positive rate of only 8%.
APA, Harvard, Vancouver, ISO, and other styles
31

Pandit, Yogesh. "Context specific text mining for annotating protein interactions with experimental evidence." Thesis, 2014. http://hdl.handle.net/1805/3809.

Full text
Abstract:
Indiana University-Purdue University Indianapolis (IUPUI)
Proteins are the building blocks of a biological system. They interact with other proteins to produce unique biological phenomena. Protein-protein interactions play a valuable role in understanding the molecular mechanisms occurring in any biological system. Protein interaction databases are a rich source of protein interaction related information. They gather large amounts of information from published literature to enrich their data. Expert curators put in most of these efforts manually. The amount of accessible and publicly available literature is growing very rapidly. Manual annotation is a time-consuming process, and with the rate at which available information is growing, it cannot be handled by manual curation alone. There need to be tools to process these huge amounts of data and bring out the valuable gist that can help curators proceed faster. In the case of extracting protein-protein interaction evidence from literature, a mere mention of a certain protein found by look-up approaches cannot validate the interaction. Supporting protein interaction information with experimental evidence can help this cause. In this study, we apply machine learning based classification techniques to classify a given protein interaction related document into an interaction detection method. We use biological attributes and experimental factors, different combinations of which define any particular interaction detection method. Then, using the predicted detection methods, proteins identified using named entity recognition techniques, and a decomposition of the parts-of-speech composition, we search for sentences with experimental evidence for a protein-protein interaction. We report an accuracy of 75.1% with an F-score of 47.6% on a dataset containing 2035 training documents and 300 test documents.
APA, Harvard, Vancouver, ISO, and other styles
32

Mehrabi, Saeed. "Advanced natural language processing and temporal mining for clinical discovery." 2015. http://hdl.handle.net/1805/8895.

Full text
Abstract:
Indiana University-Purdue University Indianapolis (IUPUI)
There has been a vast and growing amount of healthcare data, especially with the rapid adoption of electronic health records (EHRs) as a result of the HITECH Act of 2009. It is estimated that around 80% of clinical information resides in the unstructured narrative of an EHR. Recently, natural language processing (NLP) techniques have offered opportunities to extract information from unstructured clinical texts needed for various clinical applications. A popular method for enabling secondary uses of EHRs is information or concept extraction, a subtask of NLP that seeks to locate and classify elements within text based on the context. Extraction of clinical concepts without considering the context has many complications, including inaccurate diagnosis of patients and contamination of study cohorts. Identifying the negation status of a concept, and whether a clinical concept belongs to the patient or to family members, are two of the challenges faced in context detection. A negation algorithm called Dependency Parser Negation (DEEPEN) was developed in this research study by taking into account the dependency relationship between negation words and concepts within a sentence, using the Stanford Dependency Parser. The study results demonstrate that DEEPEN can reduce the number of incorrect negation assignments for patients with positive findings, and therefore improve the identification of patients with the target clinical findings in EHRs. Additionally, an NLP system consisting of section segmentation and relation discovery was developed to identify patients' family history. To assess the generalizability of the negation and family history algorithms, data from a different clinical institution was used in both algorithm evaluations.
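A rough sketch of dependency-based negation checking, using spaCy's parser as a stand-in for the Stanford Dependency Parser that DEEPEN actually uses (the model name and the one-rule heuristic are ours, far simpler than DEEPEN's rules):

    import spacy  # spaCy used here as a stand-in for the Stanford parser

    nlp = spacy.load("en_core_web_sm")

    def is_negated(sentence, concept):
        """Concept counts as negated if a 'neg' dependency attaches to
        the concept token or to one of its syntactic ancestors -- the
        dependency-path intuition behind DEEPEN, much simplified."""
        for token in nlp(sentence):
            if token.text.lower() == concept.lower():
                heads = [token, *token.ancestors]
                return any(child.dep_ == "neg"
                           for head in heads for child in head.children)
        return False

    is_negated("The patient does not have pneumonia.", "pneumonia")  # True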
APA, Harvard, Vancouver, ISO, and other styles