To see the other types of publications on this topic, follow the link: Semantic search engine.

Dissertations / Theses on the topic 'Semantic search engine'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Select a source type:

Consult the top 41 dissertations / theses for your research on the topic 'Semantic search engine.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Narayan, Nitesh. "Advanced Intranet Search Engine." Thesis, Mälardalen University, School of Innovation, Design and Engineering, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-9408.

Full text
Abstract:

Information retrieval has been a pervasive part of human society since its existence. With the advent of the Internet and the World Wide Web, it became an extensive area of research and a major focus, which led to the development of various search engines to locate desired information, mostly for globally connected computer networks, i.e. the Internet. But there is another major part of computer networking, the intranet, which has not seen much advancement in information retrieval approaches, in spite of being a major source of information within a large number of organizations. The most common technique for intranet-based search engines is still merely database-centric. Thus, in practice, intranets are unable to benefit from the sophisticated techniques that have been developed for Internet-based search engines without exposing their data to commercial search engines. In this Master's thesis we propose a state-of-the-art architecture for an advanced intranet search engine which is capable of dealing with the continuously growing size of an intranet's knowledge base. This search engine employs lexical processing of documents, where documents are indexed and searched based on standalone terms or keywords, along with semantic processing of the documents, where the context of the words and the relationships among them are given more importance. Combining lexical and semantic processing of documents gives an effective approach to handling navigational queries along with research queries, in contrast to modern search engines, which use either lexical processing or semantic processing (or one as the major approach) of documents. We give equal importance to both approaches in our design, taking the best of both worlds. This work also takes into account various widely acclaimed concepts like inference rules, ontologies and active feedback from the user community to continuously enhance and improve the quality of search results, along with the possibility to infer and deduce new knowledge from the existing knowledge, while preparing for the advent of the Semantic Web.
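The hybrid ranking this abstract describes can be pictured as a weighted blend of a lexical (keyword) score and a semantic (concept-overlap) score. A minimal sketch follows; the scoring functions and the 0.5/0.5 weights are illustrative assumptions, not the thesis's actual design.

    # Illustrative blend of a lexical score with a semantic score, in the
    # spirit of the hybrid architecture described above. The weighting and
    # the scoring functions are assumptions for demonstration only.

    def lexical_score(query_terms, doc_terms):
        """Fraction of query keywords that literally occur in the document."""
        if not query_terms:
            return 0.0
        return sum(t in doc_terms for t in query_terms) / len(query_terms)

    def semantic_score(query_concepts, doc_concepts):
        """Jaccard overlap between concept sets (e.g., from an ontology)."""
        union = query_concepts | doc_concepts
        return len(query_concepts & doc_concepts) / len(union) if union else 0.0

    def combined_score(query_terms, query_concepts, doc_terms, doc_concepts,
                       w_lex=0.5, w_sem=0.5):
        return (w_lex * lexical_score(query_terms, doc_terms)
                + w_sem * semantic_score(query_concepts, doc_concepts))

    doc = ({"quarterly", "report", "sales"}, {"FinancialDocument", "Sales"})
    query = ({"sales", "figures"}, {"Sales", "Metric"})
    print(combined_score(query[0], query[1], doc[0], doc[1]))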

APA, Harvard, Vancouver, ISO, and other styles
2

Xian, Yikun, and Liu Zhang. "Semantic Search with Information Integration." Thesis, Linnéuniversitetet, Institutionen för datavetenskap, fysik och matematik, DFM, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-13832.

Full text
Abstract:
Since the first search engine was released in 1993, development has never slowed down, and various search engines have emerged to vie for popularity. However, current traditional search engines like Google and Yahoo! are based on keywords, which leads to imprecise results and information redundancy. A new search engine with semantic analysis can be the alternative solution in the future. It is more intelligent and informative, and provides better interaction with users. This thesis discusses semantic search in detail, explains the advantages of semantic search over keyword-based search, and introduces how to integrate semantic analysis with common search engines. At the end of the thesis there is an example implementation of a simple semantic search engine.
APA, Harvard, Vancouver, ISO, and other styles
3

Wieser, Christoph. "Building a semantic search engine with games and crowdsourcing." Diss., Ludwig-Maximilians-Universität München, 2014. http://nbn-resolving.de/urn:nbn:de:bvb:19-169754.

Full text
Abstract:
Semantic search engines aim at improving conventional search with semantic information, or meta-data, on the data searched for and/or on the searchers. So far, approaches to semantic search exploit characteristics of the searchers like age, education, or spoken language for selecting and/or ranking search results. Such data allow a semantic search engine to be built as an extension of a conventional search engine. The crawlers of well-established search engines like Google, Yahoo! or Bing can index documents, but, so far, their capabilities to recognize the intentions of searchers are still rather limited. Indeed, taking into account characteristics of the searchers considerably extends both the quantity of data to analyse and the dimensionality of the search problem. Well-established search engines therefore still focus on general search, that is, "search for all", not on specialized search, that is, "search for a few". This thesis reports on techniques that have been adapted or conceived, deployed, and tested for building a semantic search engine for the very specific context of artworks. In contrast to, for example, the interpretation of X-ray images, the interpretation of artworks is far from being fully automatable. Therefore artwork interpretation has been based on Human Computation, that is, a software-based gathering of contributions by many humans. The approach reported on in this thesis first relies on so-called Games With A Purpose, or GWAPs, for this gathering: casual games provide an incentive for a potentially unlimited community of humans to contribute their appreciations of artworks. Designing convenient incentives is less trivial than it might seem at first. An ecosystem of games is needed so as to collect the intended meta-data on artworks. One game generates the data that can serve as input to another game. This results in semantically rich meta-data that can be used for building up a successful semantic search engine. Thus, a first part of this thesis reports on a "game ecosystem" specifically designed from one known game and including several novel games belonging to the following game classes: (1) Description Games for collecting obvious and trivial meta-data, basically the well-known ESP (for extra-sensorial perception) game of Luis von Ahn, (2) the Dissemination Game Eligo generating translations, (3) the Diversification Game Karido aiming at sharpening differences between the objects, that is, the artworks, interpreted and (4) the Integration Games Combino, Sentiment and TagATag that generate structured meta-data. Secondly, the approach to building a semantic search engine reported on in this thesis relies on Higher-Order Singular Value Decomposition (SVD). More precisely, the data and meta-data on artworks gathered with the aforementioned GWAPs are collected in a tensor, that is, a mathematical structure generalising matrices to more than only two dimensions, columns and rows. The dimensions considered are the artwork descriptions, the players, and the artworks themselves. A Higher-Order SVD of this tensor is first used for noise reduction, following the method of so-called Latent Semantic Analysis (LSA). This thesis also reports on deploying a Higher-Order LSA. The parallel Higher-Order SVD algorithm applied for the Higher-Order LSA and its implementation have been validated on an application related to, but independent from, the semantic search engine for artworks striven for: image compression. This thesis reports on the surprisingly good image compression which can be achieved with Higher-Order SVD.
While conventional compression methods apply a matrix SVD to each colour channel separately, the approach reported on in this thesis relies on one single (higher-order) SVD of the whole tensor. This results both in better quality of the compressed image and in a significant reduction of the memory space needed. Higher-Order SVD is extremely time-consuming, which calls for parallel computation. Thus, a step towards automating the construction of a semantic search engine for artworks was parallelizing the higher-order SVD method used and running the resulting parallel algorithm on a super-computer. This thesis reports on using Hestenes' method and R-SVD for parallelising the higher-order SVD. This method is an unconventional choice which is explained and motivated. As for the super-computer needed, this thesis reports on turning the web browsers of the players or searchers into a distributed parallel computer. This is done by a novel specific system and a novel implementation of the MapReduce data framework for data parallelism. Harnessing the web browsers of the players or searchers saves computational power on the server side. It also scales extremely well with the number of players or searchers because both playing with and searching for artworks require human reflection and therefore result in idle local processors that can be brought together into a distributed super-computer.
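As a rough illustration of the higher-order SVD at the core of this approach, the sketch below computes a truncated HOSVD of a small random tensor with numpy. The mode order (descriptions x players x artworks) follows the abstract, while the sizes and truncation ranks are invented toy values.

    import numpy as np

    def unfold(tensor, mode):
        """Matricize a 3-way tensor along the given mode."""
        return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

    def truncated_hosvd(tensor, ranks):
        """Truncated HOSVD: one SVD per mode, keeping ranks[k] singular vectors.
        Truncation discards small singular values, i.e. acts as noise reduction
        (the higher-order analogue of LSA's low-rank approximation)."""
        factors = []
        for mode, r in enumerate(ranks):
            u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
            factors.append(u[:, :r])
        core = tensor
        for mode, u in enumerate(factors):  # project onto each factor basis
            core = np.moveaxis(np.tensordot(u.T, np.moveaxis(core, mode, 0), axes=1), 0, mode)
        return core, factors

    # Toy tensor: descriptions x players x artworks (values are random stand-ins).
    t = np.random.rand(30, 20, 10)
    core, factors = truncated_hosvd(t, ranks=(5, 4, 3))
    approx = core
    for mode, u in enumerate(factors):  # reconstruct from the truncated core
        approx = np.moveaxis(np.tensordot(u, np.moveaxis(approx, mode, 0), axes=1), 0, mode)
    print(np.linalg.norm(t - approx) / np.linalg.norm(t))  # relative reconstruction error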
APA, Harvard, Vancouver, ISO, and other styles
4

Hawkins, Brian M. "Developing a modular framework for implementing a semantic search engine." Thesis, Monterey, California : Naval Postgraduate School, 2009. http://edocs.nps.edu/npspubs/scholarly/theses/2009/Sep/09Sep%5FHawkins.pdf.

Full text
Abstract:
Thesis (M.S. in Computer Science)--Naval Postgraduate School, September 2009.
Thesis Advisor(s): Martell, Craig. "September 2009." Description based on title screen as viewed on November 6, 2009. Author(s) subject terms: Semantic Search, Modular Search Engine, object-oriented programming, Java, UML. Includes bibliographical references (p. 77-78). Also available in print.
APA, Harvard, Vancouver, ISO, and other styles
5

Gkoutzis, Konstantinos. "A Semantic Web based search engine with X3D visualisation of queries and results." Thesis, University of Plymouth, 2013. http://hdl.handle.net/10026.1/1595.

Full text
Abstract:
The Semantic Web project has introduced new techniques for managing information. Data can now be organised more efficiently and in such a way that computers can take advantage of the relationships that characterise the given input to present more relevant output. Semantic Web based search engines can quickly educe exactly what needs to be found and retrieve it while avoiding information overload. Up until now, search engines have interacted with their users by asking them to look for words and phrases. We propose the creation of a new-generation Semantic Web search engine that will offer a visual interface for queries and results. To create such an engine, information input must be viewed not merely as keywords, but as specific concepts and objects which are all part of the same universal system. To make the manipulation of the interconnected visual objects simpler and more natural, 3D graphics are utilised, based on the X3D Web standard, allowing users to semantically synthesise their queries faster and in a more logical way, both for them and for the computer.
APA, Harvard, Vancouver, ISO, and other styles
6

Wieser, Christoph [Verfasser], and François [Akademischer Betreuer] Bry. "Building a semantic search engine with games and crowdsourcing / Christoph Wieser. Betreuer: François Bry." München : Universitätsbibliothek der Ludwig-Maximilians-Universität, 2014. http://d-nb.info/1051777127/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Chatra, Raveesh Sandeep. "Using the Architectural Tradeoff Analysis Method to Evaluate the Software Architecture of a Semantic Search Engine: A Case Study." The Ohio State University, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=osu1376916217.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Wächter, Thomas. "Semi-automated Ontology Generation for Biocuration and Semantic Search." Doctoral thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2011. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-64838.

Full text
Abstract:
Background: In the life sciences, the amount of literature and experimental data grows at a tremendous rate. In order to effectively access and integrate these data, biomedical ontologies – controlled, hierarchical vocabularies – are being developed. Creating and maintaining such ontologies is a difficult, labour-intensive, manual process. Many computational methods which can support ontology construction have been proposed in the past. However, good, validated systems are largely missing. Motivation: The biocuration community plays a central role in the development of ontologies. Any method that can support their efforts has the potential to have a huge impact in the life sciences. Recently, a number of semantic search engines were created that make use of biomedical ontologies for document retrieval. To transfer the technology to other knowledge domains, suitable ontologies need to be created. One area where ontologies may prove particularly useful is the search for alternative methods to animal testing, an area where comprehensive search is of special interest to determine the availability or unavailability of alternative methods. Results: The Dresden Ontology Generator for Directed Acyclic Graphs (DOG4DAG) developed in this thesis is a system which supports the creation and extension of ontologies by semi-automatically generating terms, definitions, and parent-child relations from text in PubMed, the web, and PDF repositories. The system is seamlessly integrated into OBO-Edit and Protégé, two widely used ontology editors in the life sciences. DOG4DAG generates terms by identifying statistically significant noun phrases in text. For definitions and parent-child relations it employs pattern-based web searches. Each generation step has been systematically evaluated using manually validated benchmarks. The term generation leads to high-quality terms also found in manually created ontologies. Definitions can be retrieved for up to 78% of terms, parent-child relations for up to 54%. No other validated system exists that achieves comparable results. To improve the search for information on alternative methods to animal testing, an ontology has been developed that contains 17,151 terms, of which 10% were newly created and 90% were re-used from existing resources. This ontology is the core of Go3R, the first semantic search engine in this field. When a user performs a search query with Go3R, the search engine expands this request using the structure and terminology of the ontology. The machine classification employed in Go3R is capable of distinguishing documents related to alternative methods from those which are not, with an F-measure of 90% on a manual benchmark. Approximately 200,000 of the 19 million documents listed in PubMed were identified as relevant, either because a specific term was contained or due to the automatic classification. The Go3R search engine is available online at www.Go3R.org.
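The query expansion Go3R performs can be pictured as walking the ontology from a query term to its synonyms and descendants before retrieval. The miniature ontology below is invented for illustration and is unrelated to the actual Go3R or DOG4DAG vocabularies.

    # Sketch of ontology-based query expansion as described above: a query
    # term is expanded with its synonyms and all descendant terms.
    # The tiny ontology is a made-up example.

    ONTOLOGY = {
        # term: (synonyms, children)
        "alternative method": ([], ["in vitro test", "computer simulation"]),
        "in vitro test": (["cell-based assay"], []),
        "computer simulation": (["in silico model"], []),
    }

    def expand(term, onto=ONTOLOGY):
        """Return the term plus its synonyms plus its full subtree of descendants."""
        syns, children = onto.get(term, ([], []))
        expanded = {term, *syns}
        for child in children:
            expanded |= expand(child, onto)
        return expanded

    print(expand("alternative method"))
    # A document index can then be queried with OR over all expanded terms.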
APA, Harvard, Vancouver, ISO, and other styles
9

Aluc, Gunes. "Design And Implementation Of An Ontology Extraction Framework And A Semantic Search Engine Over Jsr-170 Compliant Content Repositories." Master's thesis, METU, 2009. http://etd.lib.metu.edu.tr/upload/12610665/index.pdf.

Full text
Abstract:
A Content Management System (CMS) is a software application for creating, publishing, editing and managing content. The next step in content management system development is building intelligence over existing content resources that are heterogeneous in nature. Intelligence collected in the knowledge base can later be used for executing semantic queries. Expressing the relations among content resources with ontological formalisms is therefore the key to implementing such semantic features. In this work, a methodology for the semantic lifting of JSR-170 compliant content repositories to ontologies is devised. The fact that in the worst case JSR-170 enforces no particular structural restrictions on the content model poses a technical challenge both for the initial build-up and the further synchronization of the knowledge base. To address this problem, some recurring structural patterns in JSR-170 compliant content repositories are exploited. The value of the ontology extraction framework is assessed through a semantic search mechanism that is built on top of the extracted ontologies. The work in this thesis is complementary to the "Interactive Knowledge Stack for small to medium CMS/KMS providers (IKS)" project funded by the EC (FP7-ICT-2007-3).
APA, Harvard, Vancouver, ISO, and other styles
10

Arlitsch, Kenning [Verfasser], Michael [Gutachter] Seadle, and Vivien [Gutachter] Petras. "Semantic Web Identity of academic organizations : search engine entity recognition and the sources that influence Knowledge Graph Cards in search results / Kenning Arlitsch ; Gutachter: Michael Seadle, Vivien Petras." Berlin : Humboldt Universität zu Berlin, Philosophische Fakultät I, 2017. http://d-nb.info/1124893482/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
11

Yu, Liyang. "An Indexation and Discovery Architecture for Semantic Web Services and its Application in Bioinformatics." Digital Archive @ GSU, 2006. http://digitalarchive.gsu.edu/cs_theses/20.

Full text
Abstract:
Recently much research effort has been devoted to the discovery of relevant Web services. It is widely recognized that adding semantics to service descriptions is the solution to this challenge. Web services with explicit semantic annotation are called Semantic Web Services (SWS). This research proposes an indexation and discovery architecture for SWS, together with a prototype application in the area of bioinformatics. In this approach, an SWS repository is created and maintained by crawling both ontology-oriented UDDI registries and Web sites hosting SWS. For a given service request, the proposed system invokes the matching algorithm and returns a candidate set, with different degrees of matching considered. This approach can add more flexibility to the current industry standards by offering more choices to both service requesters and publishers. The prototype developed in this research also shows the value that can be added by using SWS in application areas such as bioinformatics.
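The "different degrees of matching" mentioned above are commonly ranked along the lines of exact, plug-in, subsumes, or fail. The sketch below shows one such scheme over a toy concept hierarchy; both the hierarchy and the ranking are assumptions for illustration, not necessarily the matching algorithm of this thesis.

    # Degree-of-match sketch for semantic service discovery: a requested
    # output concept is compared with an advertised one in a concept
    # hierarchy. Hierarchy and ranking names are illustrative assumptions.

    PARENT = {"Protein": "Biomolecule", "Gene": "Biomolecule", "Biomolecule": "Thing"}

    def ancestors(concept):
        while concept in PARENT:
            concept = PARENT[concept]
            yield concept

    def degree_of_match(requested, advertised):
        if requested == advertised:
            return "exact"
        if requested in ancestors(advertised):
            return "plug-in"    # advertised is more specific than requested
        if advertised in ancestors(requested):
            return "subsumes"   # advertised is more general than requested
        return "fail"

    print(degree_of_match("Biomolecule", "Protein"))  # plug-in
    print(degree_of_match("Protein", "Gene"))         # fail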
APA, Harvard, Vancouver, ISO, and other styles
12

Angelini, Marco. "Un approccio per la concettualizzazione di insiemi di documenti." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2013. http://amslaurea.unibo.it/5604/.

Full text
Abstract:
An introduction to Semantic Web techniques and the implementation of an approach able to recreate the familiar environment of an ordinary search engine with semantic-lexical functionality, together with the ability to extract, from the search results, the key concepts and terms that form collection groups for the various documents sharing common topics.
APA, Harvard, Vancouver, ISO, and other styles
13

Doms, Andreas. "GoPubMed: Ontology-based literature search for the life sciences." Doctoral thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2009. http://nbn-resolving.de/urn:nbn:de:bsz:14-ds-1232454035091-47450.

Full text
Abstract:
Background: Most of our biomedical knowledge is only accessible through texts. The biomedical literature grows exponentially and PubMed comprises over 18,000,000 literature abstracts. Recently much effort has been put into the creation of biomedical ontologies which capture biomedical facts. The exploitation of ontologies to explore the scientific literature is a new area of research. Motivation: When people search, they have questions in mind. Answering questions in a domain requires knowledge of the terminology of that domain. Classical search engines do not provide background knowledge for the presentation of search results. Ontology-annotated structured databases allow for data mining. The hypothesis is that ontology-annotated literature databases allow for text mining. The central problem is to associate scientific publications with ontological concepts. This is a prerequisite for ontology-based literature search. The question then is how to answer biomedical questions using ontologies and a literature corpus. Finally, the task is to automate bibliometric analyses on a corpus of scientific publications. Approach: Recent joint efforts on automatically extracting information from free text showed that the applied methods are complementary. The idea is to employ the rich terminological and relational information stored in biomedical ontologies to mark up biomedical text documents. Based on established semantic links between documents and ontology concepts, the goal is to answer biomedical questions on a corpus of documents. The entirely annotated literature corpus allows, for the first time, the automatic generation of bibliometric analyses for ontological concepts, authors and institutions. Results: This work includes a novel annotation framework for free texts with ontological concepts. The framework allows recognition pattern rules to be generated from the terminological and relational information in an ontology. Maximum entropy models can be trained to distinguish the meanings of ambiguous concept labels. The framework was used to develop an annotation pipeline for PubMed abstracts with 27,863 Gene Ontology concepts. The evaluation of the recognition performance yielded a precision of 79.9% and a recall of 72.7%, improving the previously used algorithm by 25.7% F-measure. The evaluation was done on a manually created (by the original authors) curation corpus of 689 PubMed abstracts with 18,356 curations of concepts. Methods to reason over large amounts of documents with ontologies were developed. The ability to answer questions with the online system was shown on a set of biomedical questions from the TREC Genomics Track 2006 benchmark. This work includes the first ontology-based, large-scale, online available, up-to-date bibliometric analysis for topics in molecular biology represented by GO concepts. The automatic bibliometric analysis is in line with existing, but often outdated, manual analyses. Outlook: A number of promising continuations starting from this work have been spun off. A freely available online search engine has a growing user community. A spin-off company, funded by the High-Tech Gründerfonds, commercializes the new ontology-based search paradigm. Several offshoots of GoPubMed, including GoWeb (general web search), Go3R (search for replacement, reduction and refinement methods for animal experiments) and GoGene (search in gene/protein databases), are being developed.
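For reference, the precision, recall and F-measure figures quoted above combine as shown below; the raw counts are invented so that the numbers come out near the quoted 79.9% and 72.7%.

    # Precision/recall/F1 as used in the evaluation quoted above.
    # The tp/fp/fn counts below are invented for illustration.
    tp, fp, fn = 799, 201, 300   # true positives, false positives, false negatives

    precision = tp / (tp + fp)   # 0.799, i.e. 79.9%
    recall = tp / (tp + fn)      # ~0.727, i.e. ~72.7%
    f1 = 2 * precision * recall / (precision + recall)
    print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")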
APA, Harvard, Vancouver, ISO, and other styles
14

Doms, Andreas. "GoPubMed: Ontology-based literature search for the life sciences." Doctoral thesis, Technische Universität Dresden, 2008. https://tud.qucosa.de/id/qucosa%3A23835.

Full text
APA, Harvard, Vancouver, ISO, and other styles
15

Lully, Vincent. "Vers un meilleur accès aux informations pertinentes à l’aide du Web sémantique : application au domaine du e-tourisme." Thesis, Sorbonne université, 2018. http://www.theses.fr/2018SORUL196.

Full text
Abstract:
This thesis starts from the observation that there is increasing infobesity on the Web. The two main types of tools, namely the search engine and the recommender system, which are designed to help us explore Web data, have several problems: (1) in helping users express their explicit information needs, (2) in selecting relevant documents, and (3) in valuing the selected documents. We propose several approaches using Semantic Web technologies to remedy these problems and to improve access to relevant information. In particular, we propose: (1) a semantic auto-completion approach which helps users formulate longer and richer search queries, (2) several recommendation approaches using the hierarchical and transversal links in knowledge graphs to improve the relevance of the recommendations, (3) a semantic affinity framework to integrate semantic and social data to yield qualitatively balanced recommendations in terms of relevance, diversity and novelty, (4) several recommendation explanation approaches aiming at improving relevance, intelligibility and user-friendliness, (5) two image-based user profiling approaches and (6) an approach which selects the best images to accompany the recommended documents in recommendation banners. We implemented and applied our approaches in the e-tourism domain. They have been properly evaluated quantitatively with ground-truth datasets and qualitatively through user studies.
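A toy sketch of the semantic auto-completion idea described above: a prefix is completed against typed entity labels from a knowledge graph rather than against past query strings. The entity list and types below are invented for illustration.

    # Toy semantic auto-completion: prefixes are completed against entity
    # labels from a knowledge graph (a hard-coded list standing in for,
    # e.g., tourism entities), so suggestions are typed entities rather
    # than raw query strings.

    ENTITIES = [
        ("Eiffel Tower", "Landmark"),
        ("Eiffel Bridge", "Bridge"),
        ("Louvre Museum", "Museum"),
    ]

    def autocomplete(prefix, entities=ENTITIES, limit=5):
        p = prefix.lower()
        hits = [(label, etype) for label, etype in entities
                if label.lower().startswith(p)]
        return hits[:limit]

    print(autocomplete("eiff"))  # [('Eiffel Tower', 'Landmark'), ('Eiffel Bridge', 'Bridge')]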
APA, Harvard, Vancouver, ISO, and other styles
16

Kozák, David. "Indexace rozsáhlých textových dat a vyhledávání v zaindexovaných datech." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2020. http://www.nusl.cz/ntk/nusl-417263.

Full text
Abstract:
The topic of this thesis is semantic search over large textual data. The goal is to design and implement a search engine that can efficiently query semantically enriched documents and present the results in a user-friendly way. The thesis first analyzes current semantic search engines, together with their strengths and weaknesses. It then presents the design of a new search engine with its own query language. The system consists of components for indexing and querying documents, a management server, a compiler for the query language, and two client applications, one for the web and one for the console. The search engine was successfully designed, implemented and deployed, and is publicly available on the Internet. The results of this work make semantic search available to the general public.
APA, Harvard, Vancouver, ISO, and other styles
17

Noronha, Norman. "ReQuest - Validating Semantic Searches." Master's thesis, Department of Informatics, University of Lisbon, 2004. http://hdl.handle.net/10451/13849.

Full text
Abstract:
ReQuest is a semantic search engine for specialized domains. It offers searches based on ontologies and resource description files (RDF). ReQuest was built to evaluate semantic searches against classic Information Retrieval searches. The results of a user survey in the news domain showed that the Semantic Web can improve searches.
APA, Harvard, Vancouver, ISO, and other styles
18

Sharan, Ajitabh. "Exploiting semantic locality to improve peer-to-peer search mechanisms." Online version of thesis, 2006. https://ritdml.rit.edu/dspace/handle/1850/2891.

Full text
APA, Harvard, Vancouver, ISO, and other styles
19

Kulkarni, Swarnim. "Capturing semantics using a link analysis based concept extractor approach." Thesis, Manhattan, Kan. : Kansas State University, 2009. http://hdl.handle.net/2097/1526.

Full text
APA, Harvard, Vancouver, ISO, and other styles
20

Tekli, Joe, Richard Chbeir, Agma J. M. Traina, Caetano Traina, Kokou Yetongnon, Carlos Raymundo Ibanez, Marc Al Assad, and Christian Kallas. "Full-fledged semantic indexing and querying model designed for seamless integration in legacy RDBMS." Elsevier B.V, 2018. http://hdl.handle.net/10757/624626.

Full text
Abstract:
The full text of this work is not available in the UPC Academic Repository due to restrictions imposed by the publisher where it has been published.
In the past decade, there has been an increasing need for semantic-aware data search and indexing in textual (structured and NoSQL) databases, as full-text search systems became available to non-experts, where users have no knowledge about the data being searched and often formulate query keywords which are different from those used by the authors in indexing relevant documents, thus producing noisy and sometimes irrelevant results. In this paper, we address the problem of semantic-aware querying and provide a general framework for modeling and processing semantic-based keyword queries in textual databases, i.e., considering the lexical and semantic similarities/disparities when matching user query and data index terms. To do so, we design and construct a semantic-aware inverted index structure called SemIndex, extending the standard inverted index by constructing a tightly coupled inverted index graph that combines two main resources: a semantic network and a standard inverted index on a collection of textual data. We then provide a general keyword query model with specially tailored query processing algorithms built on top of SemIndex, in order to produce semantic-aware results, allowing the user to choose the results' semantic coverage and expressiveness based on her needs. To investigate the practicality and effectiveness of SemIndex, we discuss its physical design within a standard commercial RDBMS, allowing its graph structure to be created, stored, and queried, thus enabling the system to easily scale up and handle large volumes of data. We have conducted a battery of experiments to test the performance of SemIndex, evaluating its construction time, storage size, query processing time, and result quality, in comparison with a legacy inverted index. Results highlight both the effectiveness and scalability of our approach.
This study is partly funded by the National Council for Scientific Research - Lebanon (CNRS-L), by the Lebanese American University (LAU), and by the Research Support Foundation of the State of Sao Paulo (FAPESP).
Peer reviewed
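A minimal sketch of the SemIndex idea of coupling an inverted index with a semantic network: a lookup may follow term-graph edges to widen its semantic coverage. The index contents, edges and hop-based traversal below are illustrative assumptions, not SemIndex's actual storage model or query processing algorithms.

    # Sketch of a semantic-aware inverted index: a plain inverted index is
    # coupled with a term graph (synonym/hypernym edges), and a lookup may
    # follow graph edges to widen semantic coverage. All data is invented.

    INVERTED = {"car": {1, 3}, "automobile": {2}, "vehicle": {4}}
    TERM_GRAPH = {"car": ["automobile", "vehicle"]}  # semantic neighbours

    def lookup(term, hops=0):
        """Return doc ids for term; hops > 0 also follows semantic edges."""
        docs = set(INVERTED.get(term, set()))
        if hops > 0:
            for neighbour in TERM_GRAPH.get(term, []):
                docs |= lookup(neighbour, hops - 1)
        return docs

    print(lookup("car"))          # {1, 3}: exact lexical match only
    print(lookup("car", hops=1))  # {1, 2, 3, 4}: semantically expanded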
APA, Harvard, Vancouver, ISO, and other styles
21

Rahuma, Awatef. "Semantically-enhanced image tagging system." Thesis, De Montfort University, 2013. http://hdl.handle.net/2086/9494.

Full text
Abstract:
In multimedia databases, data are images, audio, video, texts, etc. Research interest in these types of databases has increased in the last decade or so, especially with the advent of the Internet and the Semantic Web. Fundamental research issues vary from unified data modelling and retrieval of data items to the dynamic nature of updates. The thesis builds on findings in Semantic Web and retrieval techniques and explores novel tagging methods for identifying data items. Tagging systems have become popular because they enable users to add tags to Internet resources such as images, video and audio to make them more manageable. Collaborative tagging is concerned with the relationship between people and resources. Most of these resources have metadata in machine-processable format and enable users to use free-text keywords (so-called tags) as a search technique. This research references some tagging systems, e.g. Flickr, Delicious and MyWeb 2.0. The limitations of such techniques include polysemy (one word, different meanings), synonymy (different words, one meaning), different lexical forms (singular, plural, and conjugated words) and misspelling errors or alternate spellings. The work presented in this thesis introduces a semantic characterization of web resources that describes the structure and organization of tagging, aiming to extend existing multimedia querying using similarity measures to cater for collaborative tagging. In addition, we discuss the semantic difficulties of tagging systems, suggesting improvements to their accuracy. The scope of our work is classified as follows: (i) increase the accuracy and confidence of multimedia tagging systems; (ii) increase the similarity measures of images by integrating a variety of measures. To address the first shortcoming, we use WordNet as a semantic lingual ontology resource for a tagging system for social sharing and retrieval of images. For the second shortcoming we use similarity measures in different ways to characterize the multimedia tagging system. Fundamental to our work is the novel information model that we have constructed for our computation. This is based on the fact that an image is a rich object that can be characterised and formulated in n dimensions, each of which contains valuable information that will help in increasing the accuracy of the search. For example, an image of a tree in a forest contains more information than an image of the same tree in a different environment. In this thesis we characterise a data item (an image) by a primary description, followed by n secondary descriptions. As n increases, the accuracy of the search improves. We give various techniques to analyse data and its associated queries. To increase the accuracy of the tagging system we have performed different experiments on many images using similarity measures and various techniques from VoI (Value of Information). The findings have shown the linkage/integration between similarity measures and VoI, which improves searches and helps/guides a tagger in choosing the most adequate tags.
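One way to attack the synonymy problem mentioned above is WordNet-based tag normalization; a minimal sketch with NLTK follows, assuming the WordNet corpus is installed (nltk.download("wordnet")). The canonicalization rule (first lemma of the first noun synset) is an assumption for illustration, not the thesis's method.

    # Sketch of WordNet-based tag normalization: tags that share a synset
    # (e.g. "car" and "automobile") are mapped to one canonical form,
    # mitigating the synonymy problem. Requires the NLTK WordNet corpus.

    from nltk.corpus import wordnet as wn

    def canonical_tag(tag):
        """Map a tag to the first lemma of its first noun synset, if any."""
        synsets = wn.synsets(tag, pos=wn.NOUN)
        return synsets[0].lemma_names()[0].lower() if synsets else tag

    print(canonical_tag("automobile"))  # 'car' (both share the synset car.n.01)
    print(canonical_tag("auto"))        # 'car'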
APA, Harvard, Vancouver, ISO, and other styles
22

Morales, Vidal Jorge Arturo. "Research on proposals and trends in the architectures of semantic search engines: a systematic literature review." Master's thesis, Pontificia Universidad Católica del Perú, 2018. http://tesis.pucp.edu.pe/repositorio/handle/123456789/11974.

Full text
Abstract:
Semantic Web technologies have gained attention in recent years, largely explained by the proliferation of mobile devices and broadband Internet access. As Tim Berners-Lee, creator of the World Wide Web, envisioned at the beginning of the century, Semantic Web technologies have fostered the development of standards that, in turn, enable the emergence of semantic search engines that give users the information they are looking for. This research study presents the results of a systematic literature review focused on understanding the proposals and trends in semantic search engines from the point of view of software architecture. From the results, it is possible to say that most studies propose an end-to-end solution for their users, where the requirements, the context and the modules that make up the search engine play a major role. Ontologies and knowledge also play an important role in these architectures as they evolve, enabling a great number of solutions that respond better to users' expectations. This thesis is an extension of the article "Research on proposals and trends in the architectures of semantic search engines: A systematic literature review", published in the Proceedings of the 2017 Federated Conference on Computer Science and Information Systems. The thesis provides more detail than the published article; both share the development and the results of the systematic literature review.
Thesis
APA, Harvard, Vancouver, ISO, and other styles
23

Arlitsch, Kenning. "Semantic Web Identity of academic organizations." Doctoral thesis, Humboldt-Universität zu Berlin, Philosophische Fakultät I, 2017. http://dx.doi.org/10.18452/17671.

Full text
Abstract:
Semantic Web Identity (SWI) characterizes an entity that has been recognized as such by search engines. The display of a Knowledge Graph Card in Google search results for an academic organization is proposed as an indicator of SWI, as it demonstrates that Google has gathered enough verifiable facts to establish the organization as an entity. This recognition may in turn improve the accuracy and relevancy of its referrals to that organization. This dissertation presents findings from an in-depth survey of the 125 member libraries of the Association of Research Libraries (ARL). The findings show that these academic libraries are poorly represented in the structured data records that are a crucial underpinning of the Semantic Web and a significant factor in achieving SWI. Lack of SWI extends to other academic organizations, particularly those at the lower hierarchical levels of academic institutions, including colleges, departments, centers, and research institutes. A lack of SWI may affect other factors of interest to academic organizations, including ability to attract research funding, increase student enrollment, and improve institutional reputation and ranking. This study hypothesizes that the poor state of SWI is in part the result of a failure by these organizations to populate appropriate Linked Open Data (LOD) and proprietary Semantic Web knowledge bases. The situation represents an opportunity for academic libraries to develop skills and knowledge to establish and maintain their own SWI, and to offer SWI service to other academic organizations in their institutions. The research examines the current state of SWI for ARL libraries and some other academic organizations, and describes case studies that validate the effectiveness of proposed techniques to correct the situation. It also explains new services that are being developed at the Montana State University Library to address SWI needs on its campus, which could be adapted by other academic libraries.
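The structured-data records discussed above are typically published as schema.org JSON-LD embedded in an organization's pages. A minimal sketch follows; all organization details are placeholders, not data from the dissertation.

    # Minimal example of the schema.org structured data whose absence the
    # dissertation identifies as a cause of poor Semantic Web Identity.
    # All details are placeholders; embed the output in a page inside a
    # <script type="application/ld+json"> element.

    import json

    org = {
        "@context": "https://schema.org",
        "@type": "Library",
        "name": "Example University Library",        # placeholder
        "url": "https://library.example.edu",        # placeholder
        "parentOrganization": {
            "@type": "CollegeOrUniversity",
            "name": "Example University",            # placeholder
        },
        "sameAs": [
            "https://www.wikidata.org/wiki/Q0",      # placeholder identifier
        ],
    }

    print(json.dumps(org, indent=2))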
APA, Harvard, Vancouver, ISO, and other styles
24

Kidambi, Phani Nandan. "A HUMAN-COMPUTER INTEGRATED APPROACH TOWARDS CONTENT BASED IMAGE RETRIEVAL." Wright State University / OhioLINK, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=wright1292647701.

Full text
APA, Harvard, Vancouver, ISO, and other styles
25

Haj-Bolouri, Amir. "Semantiska webben och sökmotorer." Thesis, University West, Department of Economics and IT, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:hv:diva-2591.

Full text
Abstract:

This report deals with the definitions and terms that relate to the Semantic Web. The main purpose has been to investigate how the Semantic Web affects search engines on the web. This has been done through an investigation of ten different search engines. Nine of these search engines are considered to be semantic search engines, and the last one is the most used search engine on the web today. The study is conducted as a descriptive and quantitative study. A literature review of relevant sources about the Semantic Web and search engines has also been carried out. The conclusions drawn were that the Semantic Web is multifaceted in its definitions, and that how concrete search engines implement Semantic Web principles can vary depending on which search engine one interacts with. Keywords: Semantic Web, semantics, informatics, Web 2.0, Internet, search engines.

APA, Harvard, Vancouver, ISO, and other styles
26

Lopes, Rodrigo Arthur de Souza Pereira. "Proposta de sistema de busca de jogos eletrônicos pautada em ontologia e semântica." Universidade Presbiteriana Mackenzie, 2011. http://tede.mackenzie.br/jspui/handle/tede/1410.

Full text
Abstract:
Universidade Presbiteriana Mackenzie
With the constant growth in the number of websites, and consequently the increase in content available throughout the Internet, developing search mechanisms that enable access to reliable information has become a complex activity. In this sense, this work presents a review of the behavior of search mechanisms, as well as the manner in which they map information, including the study of ontologies and knowledge bases and of forms of knowledge representation on the Internet. These models integrate the Semantic Web, which constitutes a proposal for the organization of information. Based on these elements, a search mechanism was developed for a specific domain: video games. The mechanism is based on the classification of electronic games by specialized review websites, from which information about selected titles can be extracted. The work is divided into four stages. First, data for previously selected titles is extracted from the aforementioned websites through the use of a web crawler. Second, the obtained data is analyzed on two fronts, utilizing natural computing as well as power-law concepts. Next, an ontology for video games is constructed and subsequently published in a knowledge base accessible to the software. Lastly, the actual search mechanism is implemented; it makes use of the knowledge base and presents the user with suggestions pertinent to the search, such as titles or related characteristics intrinsic to the games. This work also presents a model that may be applied to other domains, such as movies, travel destinations, electronic appliances and software, among others.
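Querying a knowledge base like the one described above is typically done with SPARQL. The sketch below builds a toy RDF graph with rdflib and asks for games sharing a genre; the vocabulary and facts are invented for illustration, not the thesis's actual ontology.

    # Sketch of querying a game knowledge base: a tiny RDF graph is built
    # in memory with rdflib and queried with SPARQL for titles that share
    # a genre with a given game. All data is invented.

    from rdflib import Graph, Namespace

    EX = Namespace("http://example.org/games/")
    g = Graph()
    g.add((EX.Portal, EX.hasGenre, EX.Puzzle))
    g.add((EX.Braid, EX.hasGenre, EX.Puzzle))
    g.add((EX.Doom, EX.hasGenre, EX.Shooter))

    query = """
    PREFIX ex: <http://example.org/games/>
    SELECT ?other WHERE {
        ex:Portal ex:hasGenre ?genre .
        ?other    ex:hasGenre ?genre .
        FILTER (?other != ex:Portal)
    }"""

    for row in g.query(query):
        print(row.other)  # http://example.org/games/Braid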
APA, Harvard, Vancouver, ISO, and other styles
28

Garcia, Léo Manoel Lopes da Silva [UNESP]. "Investigação e implementação de ferramentas computacionais para otimização de websites com ênfase na descrição de conteúdo." Universidade Estadual Paulista (UNESP), 2011. http://hdl.handle.net/11449/98701.

Full text
Abstract:
When we speak of the evolution of the Web, it might actually be more appropriate to speak of intelligent design. With the Web becoming the primary choice for those who produce and disseminate digital content, more and more people turn their attention to this valuable repository of knowledge. In this environment, search engines have become popular applications, acting as intermediaries between users and the myriad of information, services and resources available on the World Wide Web. Here the Web designer can act decisively, securing a better position in the rankings of search engines. The correct representation of knowledge is the key to the retrieval and effective dissemination of data, information and knowledge. This work presents a study that can bring relevant progress to the users of this large network, by offering a public-domain tool that supports the application of techniques for the semantic description of information on the Web. In the course of the research we investigated techniques and methodologies capable of optimizing the indexing of websites by search engines, emphasizing the description of their content, improving their ranking and thus contributing to the quality of information retrieval carried out through search engines. These techniques were tested on several websites with satisfactory results; the tool was then implemented and submitted to users for validation. The result of this validation is presented, demonstrating the feasibility of the tool, together with a list of new features for future work.
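To make the kind of on-page optimization discussed here concrete, below is a minimal, hypothetical sketch of one such technique: injecting a description meta tag derived from page content so search engines can index it better. The function and tag choice are illustrative assumptions, not the tool described in the thesis.

# Hypothetical sketch: derive a description meta tag from page text.
import re

def add_meta_description(html, max_len=155):
    text = re.sub(r"<[^>]+>", " ", html)       # strip tags
    text = re.sub(r"\s+", " ", text).strip()   # collapse whitespace
    description = text[:max_len].rsplit(" ", 1)[0]
    meta = '<meta name="description" content="%s">' % description
    return html.replace("<head>", "<head>\n  " + meta, 1)

page = "<html><head><title>Demo</title></head><body><p>A short page about semantic search optimization.</p></body></html>"
print(add_meta_description(page))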
APA, Harvard, Vancouver, ISO, and other styles
29

Garcia, Léo Manoel Lopes da Silva. "Investigação e implementação de ferramentas computacionais para otimização de websites com ênfase na descrição de conteúdo /." São José do Rio Preto : [s.n.], 2011. http://hdl.handle.net/11449/98701.

Full text
Abstract:
When we speak of the evolution of the Web, it might actually be more appropriate to speak of intelligent design. With the Web becoming the primary choice for those who produce and disseminate digital content, more and more people turn their attention to this valuable repository of knowledge. In this environment, search engines have become popular applications, acting as intermediaries between users and the myriad of information, services and resources available on the World Wide Web. Here the Web designer can act decisively, securing a better position in the rankings of search engines. The correct representation of knowledge is the key to the retrieval and effective dissemination of data, information and knowledge. This work presents a study that can bring relevant progress to the users of this large network, by offering a public-domain tool that supports the application of techniques for the semantic description of information on the Web. In the course of the research we investigated techniques and methodologies capable of optimizing the indexing of websites by search engines, emphasizing the description of their content, improving their ranking and thus contributing to the quality of information retrieval carried out through search engines. These techniques were tested on several websites with satisfactory results; the tool was then implemented and submitted to users for validation. The result of this validation is presented, demonstrating the feasibility of the tool, together with a list of new features for future work.
Advisor: João Fernando Marar
Co-advisor: Ivan Rizzo Guilherme
Committee member: Edson Costa de Barros Carvalho Filho
Committee member: Antonio Carlos Sementille
Master's degree
APA, Harvard, Vancouver, ISO, and other styles
30

Huang, Fu-Ming, and 黃福銘. "Intelligent Search Engine with Semantic Web Technology." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/23738510988590966036.

Full text
Abstract:
Master's thesis
National Central University
Graduate Institute of Network Learning Technology
93
In the experience of retrieving content via a search engine, people often receive a great deal of information in response, part of which is irrelevant to the user's intention. The main cause of this problem is the lack of sufficient semantic description of digital content during the analyzing, searching and matching processes. The purpose of our study is to improve search efficiency, user satisfaction and practicability. We apply ontology theory to content description, which is used in the matching process. For the digital library application, we propose a Digital Library Ontology to establish descriptions of contents, domain knowledge and user profiles. Utilizing reasoning techniques, we developed an inference-based intelligent search engine to assist literature retrieval based on users' background knowledge. Experiments show that the proposed intelligent search engine can efficiently improve search performance. In contrast to traditional keyword search, our approach provides better search results based on needs deduced from the user, and the returned literature is verified to be better understood by users.
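A minimal sketch of what inference-based retrieval over an ontology can look like: a query term is expanded with its ontology descendants before matching documents. The tiny is-a hierarchy, documents and function names are invented for illustration and are not the system described above.

# Toy sketch: ontology-driven query expansion before matching.
ONTOLOGY = {  # parent -> children (is-a relations)
    "machine learning": ["neural networks", "decision trees"],
    "neural networks": ["perceptron"],
}

def expand(term):
    terms = [term]
    for child in ONTOLOGY.get(term, []):
        terms.extend(expand(child))
    return terms

def search(query, documents):
    wanted = set(expand(query))
    return [d for d in documents if any(t in d.lower() for t in wanted)]

docs = ["A perceptron learns weights.", "Decision trees split features.", "A cooking recipe."]
print(search("machine learning", docs))  # matches the first two documents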
APA, Harvard, Vancouver, ISO, and other styles
31

侯巧玲. "Developing a Fuzzy Search Engine Based on Fuzzy Logic and Semantic Search." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/04233187003638416723.

Full text
Abstract:
Master's thesis
National Changhua University of Education
Department of Computer Science and Information Engineering
99
Recently, with the development of network technology, more and more users have come to rely on search engines as their entry point for acquiring information. As the amount of data grows larger and larger, information explosion has become a serious problem. Most search engines offered online are keyword-based, and one of their drawbacks concerns semantics: a keyword-based search engine cannot judge which meaning of a term is intended, because the same term may have different meanings in different domains. Other drawbacks concern term importance and user opinions: most online search engines treat every term as equally important, which may conflict with the user's view. To overcome these problems, this thesis proposes a new type of search engine based on semantic techniques and fuzzy theory. First, we construct a fuzzy ontology as our knowledge base, using fuzzy logic to represent the relationships between terms. Second, we develop a web crawler to fetch webpages automatically. Finally, we allow the user to specify multiple terms, the importance of each term, and six parameters. When this query is submitted, the system first finds terms related to those the user defined; the user-defined and system-found terms together are then used as the input query to search webpages. After the search terminates, the system performs fuzzy aggregation over the retrieved pages and the user-defined parameters to obtain the final ranking and display the results.
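The fuzzy aggregation step described above can be pictured with a short sketch: each query term carries a user-assigned importance, each page gets a fuzzy membership degree per term, and the score is a weighted aggregation. Everything here — the membership function, the saturation constant and the weighted-average operator — is an illustrative assumption, not the thesis implementation.

# Toy sketch: rank pages by weighted fuzzy aggregation of term memberships.
def membership(term, page_words):
    return min(1.0, page_words.count(term) / 3.0)  # saturating fuzzy degree

def fuzzy_score(query, page):
    words = page.lower().split()
    total = sum(w for _, w in query)
    return sum(w * membership(t, words) for t, w in query) / total

query = [("fuzzy", 0.9), ("ontology", 0.6)]  # (term, user importance)
pages = ["fuzzy logic and fuzzy ontology methods", "crisp databases"]
for p in sorted(pages, key=lambda p: fuzzy_score(query, p), reverse=True):
    print(round(fuzzy_score(query, p), 2), p)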
APA, Harvard, Vancouver, ISO, and other styles
32

黃淑華. "Designed hierarchical semantic categorizaiton for the knowledge management via search engine." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/45589468236681650182.

Full text
Abstract:
Master's thesis
National Chung Cheng University
Graduate Institute of Electrical Engineering
92
With the huge amount of information available on the World Wide Web, Web servers provide fertile ground for information searches. The problem that knowledge workers face today is no longer a lack of information; instead, they are in a situation of information overload, unable to find the wanted information quickly and efficiently among such huge amounts of data. Therefore, many information technologies are still under development. In the traditional approach, experts who understand a document assign specific categories to it; however, this wastes considerable resources and has little economic benefit, so a new automatic text classifier that can support the classification process is in demand. Information retrieval aims at retrieving information that might be useful or relevant to the user. In this work we study the mutual semantic relationships between terms via term concepts. We collect Chinese synonyms to build a synonym thesaurus and make use of an automatic text classification subsystem. Keywords construct a conceptual space, or knowledge space, by means of semantic matrices. Through the ideas of conceptual space and semantic network, we expect traditional information retrieval to evolve into knowledge retrieval. We apply structural information from XML structures in the database.
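As a small illustration of the semantic-matrix idea — deriving term relatedness from co-occurrence — here is a toy sketch; the corpus, representation and function name are invented, not the thesis data or code.

# Toy sketch: build a term co-occurrence ("semantic") matrix from documents.
from collections import defaultdict
from itertools import combinations

def semantic_matrix(documents):
    cooc = defaultdict(int)
    for doc in documents:
        terms = set(doc.lower().split())
        for a, b in combinations(sorted(terms), 2):
            cooc[(a, b)] += 1   # higher count = stronger relatedness
    return cooc

docs = ["knowledge retrieval systems", "knowledge management systems"]
for pair, count in semantic_matrix(docs).items():
    print(pair, count)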
APA, Harvard, Vancouver, ISO, and other styles
33

Hung-Chien, Chien, and 簡宏傑. "Study and Implementation of a Learning Content Management System Search Engine for Special Education Based on Semantic Web." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/80627884641375096197.

Full text
Abstract:
Master's thesis
Minghsin University of Science and Technology
Graduate Institute of Information Management
94
Computer-assisted teaching and learning has been a growing research trend due to the accelerated evolution of information technologies. However, most such systems do not address the needs of special education. Individuals receiving special education diversify in every aspect of their learning and development process; as a result, individualized instruction is one of the major characteristics of special education. Since there is no common teaching material that fits all special education students, teachers usually have to develop courseware specific to each student (at various grades) on their own, which imposes an extra workload on most special education teachers. Accordingly, the idea of having a common repository (a learning content management system, LCMS) for self-developed courseware, to help teachers share such courseware, is appealing, especially to special education teachers. In this research, we propose and implement an intelligent learning content management system that incorporates semantic web and ontology mechanisms. In addition, the LCMS provides an interface that accepts output from the DALE computerized IEP (Individualized Educational Program) system. Through these mechanisms, special education teachers can more accurately find courseware that is suitable for their students. At the time of a recent survey, the LCMS we implemented contained more than 1000 units of courseware and had become the most accessed LCMS in Taiwan's special education community.
APA, Harvard, Vancouver, ISO, and other styles
34

Wächter, Thomas. "Semi-automated Ontology Generation for Biocuration and Semantic Search." Doctoral thesis, 2010. https://tud.qucosa.de/id/qucosa%3A25496.

Full text
Abstract:
Background: In the life sciences, the amount of literature and experimental data grows at a tremendous rate. In order to effectively access and integrate these data, biomedical ontologies – controlled, hierarchical vocabularies – are being developed. Creating and maintaining such ontologies is a difficult, labour-intensive, manual process. Many computational methods which can support ontology construction have been proposed in the past. However, good, validated systems are largely missing. Motivation: The biocuration community plays a central role in the development of ontologies. Any method that can support their efforts has the potential to have a huge impact in the life sciences. Recently, a number of semantic search engines were created that make use of biomedical ontologies for document retrieval. To transfer the technology to other knowledge domains, suitable ontologies need to be created. One area where ontologies may prove particularly useful is the search for alternative methods to animal testing, an area where comprehensive search is of special interest to determine the availability or unavailability of alternative methods. Results: The Dresden Ontology Generator for Directed Acyclic Graphs (DOG4DAG) developed in this thesis is a system which supports the creation and extension of ontologies by semi-automatically generating terms, definitions, and parent-child relations from text in PubMed, the web, and PDF repositories. The system is seamlessly integrated into OBO-Edit and Protégé, two widely used ontology editors in the life sciences. DOG4DAG generates terms by identifying statistically significant noun phrases in text. For definitions and parent-child relations it employs pattern-based web searches. Each generation step has been systematically evaluated using manually validated benchmarks. The term generation leads to high-quality terms also found in manually created ontologies. Definitions can be retrieved for up to 78% of terms, child–ancestor relations for up to 54%. No other validated system exists that achieves comparable results. To improve the search for information on alternative methods to animal testing, an ontology has been developed that contains 17,151 terms, of which 10% were newly created and 90% were re-used from existing resources. This ontology is the core of Go3R, the first semantic search engine in this field. When a user performs a search query with Go3R, the search engine expands the request using the structure and terminology of the ontology. The machine classification employed in Go3R is capable of distinguishing documents related to alternative methods from those which are not with an F-measure of 90% on a manual benchmark. Approximately 200,000 of the 19 million documents listed in PubMed were identified as relevant, either because a specific term was contained or due to the automatic classification. The Go3R search engine is available online at www.Go3R.org.
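DOG4DAG's statistics and noun-phrase detection are far richer, but the core intuition of term generation — ranking candidates that are over-represented in a domain corpus relative to a background corpus — can be sketched as follows; the corpora, smoothing constant and function names are invented for illustration.

# Simplified sketch: rank candidate terms by smoothed over-representation
# in a domain corpus versus a background corpus.
from collections import Counter

def candidate_terms(domain_docs, background_docs, k=1.0):
    dom = Counter(w for d in domain_docs for w in d.lower().split())
    bg = Counter(w for d in background_docs for w in d.lower().split())
    def score(word):
        return dom[word] / (bg[word] + k)   # smoothed domain/background ratio
    return sorted(dom, key=score, reverse=True)

domain = ["in vitro assay replaces animal testing", "in vitro toxicity assay"]
background = ["the weather is nice", "animal pictures are nice"]
print(candidate_terms(domain, background)[:5])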
APA, Harvard, Vancouver, ISO, and other styles
35

Pereira, Tiago Filipe Roque. "Improving the Reliability of Web Search Results." Master's thesis, 2016. http://hdl.handle.net/10362/20511.

Full text
Abstract:
Over the last years it has been possible to observe the exponential growth of the internet. Every day new websites are created, new technologies are developed, and new data is added to the web. Searching for available online data has become a common practice for everyone, because the regular user wants to know more: for any question or doubt, the user wants the answer as fast as possible. It is in this field that search engines are an exceptional tool for helping their users. Whether one searches for a certain website, for some specific information, or simply for knowledge, search engines help the user reach that goal. Without them, it would be much more difficult and frustrating to find the needed information, leading to a tremendous loss of time and resources, and in most cases the user would probably not reach the results being sought. Thus, the development of web search engines has made life more comfortable for the user. However, despite being a really effective tool, search can lead to unintended results: a search engine may suggest a website that does not correspond to the user's expectation. This is because search engines show only part of the content related to each hyperlink, so users often think the answer to what they are looking for is on some website, only to discover when analysing it that the intended information is not there. Entering and leaving different websites can be a big inconvenience, even more so if the internet connection is slow (as can happen outside big cities or in less developed areas), making the user lose more time and patience. This dissertation intends to explore the possibility, and prove the concept, that with the help and combination of different technologies such as parsing, web crawling, web mining and the semantic web, it is possible to improve the reliability of search engine results, so that the user loses as little time and as few resources as possible.
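The dissertation's reliability idea — verifying that a result page actually covers the query rather than trusting the snippet — can be sketched roughly as below; the coverage rule is an invented simplification, not the proposed system.

# Conceptual sketch: score how well a crawled page covers the query terms.
import re

def coverage(query, page_html):
    text = re.sub(r"<[^>]+>", " ", page_html).lower()  # strip tags
    terms = query.lower().split()
    hits = sum(1 for t in terms if t in text)
    return hits / len(terms)    # 1.0 = every query term found in the page

html = "<html><body><h1>Semantic web</h1><p>Search results and crawling.</p></body></html>"
print(coverage("semantic web crawling", html))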
APA, Harvard, Vancouver, ISO, and other styles
36

Biswas, Amitava. "Semantic Routed Network for Distributed Search Engines." Thesis, 2010. http://hdl.handle.net/1969.1/ETD-TAMU-2010-05-7942.

Full text
Abstract:
Searching for textual information has become an important activity on the web. To satisfy the rising demand and user expectations, search systems should be fast, scalable and deliver relevant results. To decide which objects should be retrieved, search systems should compare holistic meanings of queries and text document objects, as perceived by humans. Existing techniques do not enable correct comparison of composite holistic meanings like "evidences on role of DR2 gene in development of diabetes in Caucasian population", which is composed of multiple elementary meanings: "evidence", "DR2 gene", etc. Thus these techniques cannot discern objects that have a common set of keywords but convey different meanings. Hence we need new methods to compare composite meanings for superior search quality. In distributed search engines, for scalability, speed and efficiency, index entries should be systematically distributed across multiple index-server nodes based on the meaning of the objects. Furthermore, queries should be selectively sent to those index nodes which have relevant entries. This requires an overlay Semantic Routed Network which will route messages based on meaning. This network will consist of fast-response networking appliances called semantic routers. These appliances need to: (a) carry out sophisticated meaning comparison computations at high speed; and (b) have the right kind of behavior to automatically organize an optimal index system. This dissertation presents the following artifacts that enable the above requirements: (1) An algebraic theory, a design of a data structure and related techniques to efficiently compare composite meanings. (2) Algorithms and accelerator architectures for high-speed meaning comparisons inside semantic routers and index-server nodes. (3) An overlay network to deliver search queries to the index nodes based on meanings. (4) Algorithms to construct a self-organizing, distributed meaning-based index system. The proposed techniques can compare composite meanings ~10^5 times faster than equivalent software code and existing hardware designs, while the proposed index organization approach can lead to 33% savings in the number of servers and in power consumption in a model search engine having 700,000 servers. Therefore, using all these techniques, it is possible to design a Semantic Routed Network which has the potential to improve search results and response time, while saving resources.
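A toy sketch of meaning-based routing follows, under the simplifying assumption that a composite meaning is a set of elementary meaning identifiers compared with Jaccard similarity; the thesis develops its own algebraic measure and hardware acceleration, so none of this is the actual design.

# Toy sketch: route a query to the index node with the most similar meanings.
def jaccard(a, b):
    return len(a & b) / len(a | b)

INDEX_NODES = {
    "node-genetics": {"gene", "DR2", "diabetes"},
    "node-weather": {"rain", "forecast", "humidity"},
}

def route(query_meaning):
    return max(INDEX_NODES, key=lambda n: jaccard(query_meaning, INDEX_NODES[n]))

print(route({"DR2", "gene", "evidence"}))   # -> node-genetics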
APA, Harvard, Vancouver, ISO, and other styles
37

Χάιδος, Γεώργιος. "Σχεδιασμός και υλοποίηση δημοσιογραφικού RDF portal με μηχανή αναζήτησης άρθρων." Thesis, 2013. http://hdl.handle.net/10889/6117.

Full text
Abstract:
The Resource Description Framework (RDF) is an appropriate framework for describing resources as metadata in the Semantic Web. The aim of the semantic web is the development and expansion of the existing web, so users can acquire better integrated information. Today's Web is human-oriented; in order to facilitate complex queries and the combination of the acquired data, the web is changing orientation so that it can be interpreted by machines, relieving the user of the extra burden. The most ambitious form of incorporating appropriate metadata on the web is the description of data with RDF triples stored as XML. The RDF framework describes resources, using Uniform Resource Identifiers (URIs) or literals, as subject-predicate-object triples. The use of existing RDF vocabularies to describe classes and properties is encouraged by the W3C. In this work an information/news RDF portal has been developed. The RDF/XML is created using vocabularies and schemas recommended by the W3C, as well as the well-known DCMI and PRISM. The metadata is created automatically from the data supplied when a new article is published. To facilitate the journalist's job, a Rich Text Editor, which enables formatting text and inserting images and media, has been incorporated and extended; it automatically generates HTML code from text in a graphical environment. The capabilities of the editor were extended to support image and media uploading, with encoding changes for better compatibility with the HTML5 standards. Apart from articles uploaded with the editor, the portal integrates articles published by external sources, in a process that is fully automatic and repetitive. The user of the portal is presented with a front page and articles categorized by theme. The portal includes a search engine with fields for filtering by time, category, journalist or source, and keywords. Keywords can be supplied by the publisher or selected automatically; when articles are integrated from external sources, the process is necessarily automatic. For the automatic selection of keywords, the frequency of each word in the article is used, with extra weight given to words stressed in the HTML (e.g., title, bold, underlined), normalized by the size of the article, together with the stem frequency of the word in a continually refreshed set of articles. For the retrieval of articles, the search engine uses an inverted-file index over all keywords. To reduce the data volume and accelerate query processing, words that have high frequency but low information-retrieval value ("stop words") are removed. A representative list of stop words was chosen using a corpus of newspaper articles, measuring word frequencies and comparing them with Google's list of stop words. To further reduce the volume of data and increase recall, the portal stems the keywords; for the stemming, the rule-based algorithm presented in the thesis of George Ntais at Stockholm University, which is based on the Modern Greek Grammar of Manolis Triantafyllidis, was used. The articles returned for keyword queries are ranked by the proximity of the article's indexed keywords to those of the query, using the keyword frequencies and the frequencies of the same words in a refreshed set of articles. To enhance the search, a thesaurus of synonyms is also used.
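The keyword-selection scheme described above — term frequency boosted by HTML emphasis, normalized by article length, with stop words removed and keywords stemmed — can be condensed into a sketch like the following; the weights, stop list and one-suffix "stemmer" are placeholders, not the portal's actual Greek stemmer or tuning.

# Condensed sketch: score keywords with HTML-emphasis boosts.
import re
from collections import Counter

STOP_WORDS = {"the", "and", "a", "of"}
TAG_WEIGHT = {"title": 3.0, "b": 2.0}   # invented boost factors

def stem(word):
    return word[:-1] if word.endswith("s") else word  # crude placeholder

def keywords(html, top=5):
    scores = Counter()
    for tag, body in re.findall(r"<(title|b)>(.*?)</\1>", html):
        for w in body.lower().split():
            scores[stem(w)] += TAG_WEIGHT[tag]
    plain = re.sub(r"<[^>]+>", " ", html).lower().split()
    for w in plain:
        if w not in STOP_WORDS:
            scores[stem(w)] += 1.0
    n = max(len(plain), 1)                 # normalize by article length
    return [(w, s / n) for w, s in scores.most_common(top)]

print(keywords("<title>Election results</title><p>The results and the votes</p>"))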
APA, Harvard, Vancouver, ISO, and other styles
38

Hung-Yu Chen and 陳弘宇. "A Search Engine-based Mutually Reinforcing Approach on Measuring Semantics Relatedness of Biomedical Terms." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/75984864695029397930.

Full text
APA, Harvard, Vancouver, ISO, and other styles
39

Πλέγας, Ιωάννης. "Αλγόριθμοι και τεχνικές εξατομικευμένης αναζήτησης σε διαδικτυακά περιβάλλοντα με χρήση υποκείμενων σημασιολογιών." Thesis, 2013. http://hdl.handle.net/10889/6465.

Full text
Abstract:
The tremendous growth of the Web in recent decades has made searching for information one of the most important research issues in computer technology. Today, modern search engines respond quite well to user queries, but the results are not always relevant to the data the user is looking for. Therefore, search engines make significant efforts to place the results most relevant to the user at the top of the ranking list. This work mainly deals with this problem: ranking the results most relevant to the user at the top of the list, even when the queries contain terms with multiple meanings. In the context of this research, algorithms and techniques were constructed based on relevance feedback to improve the results returned by a search engine. The main source of feedback is the results the user selects during navigation: the user extends the original information (search keywords) with new information derived from the results chosen. Given this new set of information about the user's preferences, its relevance is compared with the remaining results (those returned before the result was chosen), and the order of the results is changed by promoting and suggesting the results that are more relevant to the new information set. Another problem that must be addressed is that the queries submitted to search engines are usually short and ambiguous; therefore, there must be ways to disambiguate the different senses of the query terms and to find the sense that interests the user. Disambiguation of search terms has been studied in the literature in several different ways. This work proposes new strategies for disambiguating the senses of search terms and explores their efficiency in search engines; their innovation is the use of PageRank as an indicator of the importance of a sense for a query term. Another technique that exploits semantics in our work is text annotation, which assigns to the words of a text extra information, such as the meaning of each word given the semantic content of the text. Assigning additional semantic information to a text helps both users and search engines to seek or describe its information better. This thesis presents techniques for improving the automatic annotation of small texts with entities from Wikipedia, a process referred to in the literature as Wikification. It is also widely known that the Web contains documents with the same information and documents with almost identical information. Despite the search engines' algorithmic efforts to detect results containing repeated information, there are still cases where the retrieved results do. This work presents effective techniques that find and cut the repeated information from search engine results: results containing the same information are removed, while results containing overlapping information are merged into new texts (SuperTexts) that carry the information of the initial results without repetition.
Another part of this work tries to exploit the semantic information of search engine results using tools of the Semantic Web, whose goal is to make the resources of the Web understandable to both humans and machines. In its first steps the Semantic Web functioned as a detailed description of the body of Web documents, and the development of tools for querying it is still in its infancy; current search techniques, with few exceptions, are not adapted to indexing and retrieving semantic information. In our research, efficient techniques and tools for using the Semantic Web were created. Specifically, an algorithm was constructed that converts search engine results into an ontology, integrating their semantic and syntactic information in order to answer natural-language questions. This work also presents XML filtering techniques that use semantic information; specifically, an efficient distributed system for the semantic filtering of XML documents is proposed that gives better results than existing approaches. Finally, this thesis includes additional research that improves the performance of search engines from a different angle: a technique is presented for pruning the inverted lists of inverted files, and a combination of the proposed technique with existing compression techniques is achieved, leading to better compression results than the existing ones.
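The relevance-feedback re-ranking described above can be pictured with a bare-bones sketch: when a result is clicked, the remaining results are re-ordered by their similarity to it. The word-overlap measure used here is an illustrative stand-in for the thesis's semantic comparison.

# Bare-bones sketch: re-rank remaining results after a user click.
def overlap(a, b):
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def rerank(remaining, clicked):
    return sorted(remaining, key=lambda r: overlap(r, clicked), reverse=True)

results = ["jaguar car dealership prices", "jaguar habitat in the amazon", "jaguar speed facts animal"]
clicked = "jaguar the animal lives in the amazon rainforest"
print(rerank(results, clicked))  # animal-related results move up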
APA, Harvard, Vancouver, ISO, and other styles
40

Wang, Yuanyong Computer Science &amp Engineering Faculty of Engineering UNSW. "Using web texts for word sense disambiguation." 2007. http://handle.unsw.edu.au/1959.4/40530.

Full text
Abstract:
In all natural languages, ambiguity is a universal phenomenon. When a word has multiple meanings depending on its context, it is called an ambiguous word. The process of determining the correct meaning of a word (formally, its word sense) in a given context is word sense disambiguation (WSD). WSD is one of the most fundamental problems in natural language processing. If properly addressed, it could lead to revolutionary advancement in many other technologies such as text search engines, automatic text summarization and classification, automatic lexicon construction, machine translation and automatic learning agents. One difficulty that has always confronted WSD researchers is the lack of high-quality sense-specific information. For example, if the word "power" immediately precedes the word "plant", it strongly constrains the meaning of "plant" to be "an industrial facility"; if "power" is replaced by the phrase "root of a", then the sense of "plant" is dictated to be "an organism" of the kingdom Plantae. It is obvious that manually building a comprehensive sense-specific information base for each sense of each word is impractical. Researchers have also tried to extract such information from large dictionaries as well as manually sense-tagged corpora. Most of the dictionaries used for WSD were not built for this purpose and have many inherited peculiarities; manual tagging is slow and costly, while automatic tagging has not achieved reliable performance. Furthermore, it is often the case that for a randomly chosen word (to be disambiguated), the sense-specific context corpora that can be collected from dictionaries are not large enough. Therefore, manually building sense-specific information bases and extracting such information from dictionaries are not effective approaches to obtaining sense-specific information. Web text, due to its vast quantity and wide diversity, is an ideal source for extracting large quantities of sense-specific information. In this thesis, the impact of Web texts on various aspects of WSD has been investigated. New measures and models are proposed to tame the enormous amount of Web text for the purpose of WSD. They are formally evaluated by testing their disambiguation performance on about 70 ambiguous nouns. The results are very encouraging and have helped reveal the great potential of using Web texts for WSD. The results are published in three papers at the Australian national and international level (Wang & Hoffmann, 2004, 2005, 2006) [42][43][44].
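A minimal sketch of the underlying idea: score each sense of an ambiguous word by how many context words also occur in sense-specific texts collected from the Web. The two one-line "corpora" below stand in for large crawled collections and are invented for illustration.

# Toy sketch: pick the sense whose web-derived corpus best matches the context.
SENSE_CORPORA = {
    "plant/factory":  "power industrial facility energy production station",
    "plant/organism": "root leaf grow soil seed botany kingdom",
}

def disambiguate(context):
    ctx = set(context.lower().split())
    def score(sense):
        return len(ctx & set(SENSE_CORPORA[sense].split()))
    return max(SENSE_CORPORA, key=score)

print(disambiguate("the root of a plant needs soil to grow"))  # plant/organism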
APA, Harvard, Vancouver, ISO, and other styles
41

Mooman, Abdelniser. "Multi-Agent User-Centric Specialization and Collaboration for Information Retrieval." Thesis, 2012. http://hdl.handle.net/10012/6991.

Full text
Abstract:
The amount of information on the World Wide Web (WWW) is rapidly growing in pace and topic diversity. This has made it increasingly difficult, and often frustrating, for information seekers to retrieve the content they are looking for, as information retrieval systems (e.g., search engines) are unable to decipher the relevance of the retrieved information as it pertains to what they are searching for. This issue can be decomposed into two aspects: 1) variability of information relevance as it pertains to an information seeker. In other words, different information seekers may enter the same search text, or keywords, but expect completely different results. It is therefore imperative that information retrieval systems possess the ability to incorporate a model of the information seeker in order to estimate the relevance and context of use of information before presenting results. In this context, by a model we mean the capture of trends in the information seeker's search behaviour; this is what many researchers refer to as personalized search. 2) Information diversity. Information available on the World Wide Web today spans multitudes of inherently overlapping topics, and it is difficult for any information retrieval system to decide effectively on the relevance of the information retrieved in response to an information seeker's query. For example, an information seeker who wishes to use the WWW to learn about a cure for a certain illness would receive a more relevant answer if the search engine were specialized in such topic domains; this is what is referred to in WWW nomenclature as specialized search. This thesis maintains that the information seeker's search is not completely random and therefore tends to portray itself as consistent patterns of behaviour. Nonetheless, this behaviour, despite being consistent, can be quite complex to capture. To accomplish this goal the thesis proposes Multi-Agent Personalized Information Retrieval with Specialization Ontology (MAPIRSO). MAPIRSO offers a complete learning framework that is able to model the end user's search behaviour and interests and to organize information into categorized domains so as to ensure maximum relevance of its responses to end user queries. Specialization and personalization are accomplished using a group of collaborative agents. Each agent employs a Reinforcement Learning (RL) strategy to capture the end user's behaviour and interests; reinforcement learning allows the agents to evolve their knowledge of the end user's behaviour and interests as they serve him or her, and to adapt to changes in that behaviour and those interests. Specialization is the process by which new information domains are created based on existing information topics, allowing new kinds of content to be built exclusively for information seekers. One of the key characteristics of specialization domains is that they are seeker-centric, which allows intelligent agents to create new information based on the information seekers' feedback and behaviours. Specialized domains are created by intelligent agents that collect information from a specific domain topic; the task of these specialized agents is to map the user's query to a repository of specific domains in order to present users with relevant information.
As a result, mapping users' queries to only relevant information is one of the fundamental challenges in Artificial Intelligence (AI) and machine learning research. Our approach employs intelligent cooperative agents that specialize in building personalized ontology information domains that pertain to each information seeker's specific needs. Specializing and categorizing information into unique domains is one of the challenge areas that have been addressed, and various proposed solutions were evaluated and adopted to cope with the growth of information. However, categorizing information into unique domains does not satisfy each individual information seeker: seekers might search for similar topics, but each has different interests. For example, medical information in a specific medical domain has different importance to a doctor and to patients. The thesis presents a novel solution that addresses this growing and diverse information by building seeker-centric specialized information domains that are personalized through the information seekers' feedback and behaviours. To address this challenge, the research examines the fundamental components that constitute the specialized agent: an intelligent machine learning system, user input queries, an intelligent agent, and information resources constructed through specialized domains. Experimental work is reported to demonstrate the efficiency of the proposed solution in addressing overlapping information growth. The experimental work utilizes extensive user-centric specialized domain topics, employing personalized and collaborative multi-learning agents and ontology techniques, thereby enriching the user's queries and domains. Experiments and results have shown that building specialized ontology domains pertinent to the information seekers' needs is more precise and efficient than other information retrieval applications and existing search engines.
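A greatly simplified sketch of the reinforcement-learning loop described above: an agent keeps one weight per specialized domain and nudges it from click feedback. The update rule, learning rate and domain names are illustrative assumptions, not MAPIRSO's actual design.

# Toy sketch: per-domain weights updated from click feedback.
def update(weights, domain, clicked, lr=0.1):
    reward = 1.0 if clicked else -1.0
    weights[domain] += lr * (reward - weights[domain])  # move toward reward
    return weights

def best_domain(weights):
    return max(weights, key=weights.get)

w = {"medicine": 0.0, "sports": 0.0}
for domain, clicked in [("medicine", True), ("sports", False), ("medicine", True)]:
    update(w, domain, clicked)
print(best_domain(w), w)   # medicine gains weight from positive feedback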
APA, Harvard, Vancouver, ISO, and other styles
