
Dissertations / Theses on the topic 'Semantic web based search engines'


Consult the top 43 dissertations / theses for your research on the topic 'Semantic web based search engines.'


1

Kulkarni, Swarnim. "Capturing semantics using a link analysis based concept extractor approach." Thesis, Manhattan, Kan. : Kansas State University, 2009. http://hdl.handle.net/2097/1526.

2

Gkoutzis, Konstantinos. "A Semantic Web based search engine with X3D visualisation of queries and results." Thesis, University of Plymouth, 2013. http://hdl.handle.net/10026.1/1595.

Abstract:
The Semantic Web project has introduced new techniques for managing information. Data can now be organised more efficiently, and in such a way that computers can take advantage of the relationships that characterise the given input to present more relevant output. Semantic Web based search engines can quickly identify exactly what needs to be found and retrieve it while avoiding information overload. Up until now, search engines have interacted with their users by asking them to look for words and phrases. We propose the creation of a new-generation Semantic Web search engine that will offer a visual interface for queries and results. To create such an engine, information input must be viewed not merely as keywords, but as specific concepts and objects which are all part of the same universal system. To make the manipulation of the interconnected visual objects simpler and more natural, 3D graphics are utilised, based on the X3D Web standard, allowing users to semantically synthesise their queries faster and in a more logical way, both for them and the computer.
3

Noronha, Norman. "ReQuest - Validating Semantic Searches." Master's thesis, Department of Informatics, University of Lisbon, 2004. http://hdl.handle.net/10451/13849.

Abstract:
ReQuest is a semantic search engine for specialized domains. It offers searches based on ontologies and resource description (RDF) files. ReQuest was built to evaluate semantic searches against classic information retrieval searches. The results of a user survey in the news domain showed that the Semantic Web can improve searches.
4

Martins, Flávio Nuno Fernandes. "Improving search engines with open Web-based SKOS vocabularies." Master's thesis, Faculdade de Ciências e Tecnologia, 2012. http://hdl.handle.net/10362/8745.

Abstract:
Dissertation for the degree of Master in Computer Engineering. The volume of digital information is increasingly large, and even though organizations are making more of this information available, without the proper tools users have great difficulty retrieving documents about subjects of interest. Good information retrieval mechanisms are crucial for answering user information needs. Nowadays, search engines are unavoidable: they are an essential feature in document management systems. However, achieving good relevancy is a difficult problem, particularly when dealing with specific technical domains where vocabulary mismatch problems can be prejudicial. Numerous research works have found that exploiting the lexical or semantic relations of terms in a collection attenuates this problem. In this dissertation, we aim to improve search results and user experience by investigating the use of potentially connected Web vocabularies in information retrieval engines. In the context of open Web-based SKOS vocabularies, we propose a query expansion framework implemented in a widely used IR system (Lucene/Solr) and evaluated using standard IR evaluation datasets. The components described in this thesis were applied in the development of a new search system that was integrated with a rapid application development tool in the context of an internship at Quidgest S.A. Funding: Fundação para a Ciência e Tecnologia - ImTV research project, in the context of the UTAustin-Portugal collaboration (UTA-Est/MAI/0010/2009); QSearch project (FCT/Quidgest).
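The query expansion described in this abstract can be illustrated with a short sketch. The thesis implements the idea in Lucene/Solr; the standalone Python stand-in below and its toy vocabulary are assumptions for illustration, not the author's code.

```python
# Illustrative sketch of SKOS-style query expansion. A toy SKOS-like
# vocabulary maps a preferred label to alternative (altLabel) and broader
# labels, as might be harvested from an open Web vocabulary; the entries
# here are invented for the example.
SKOS_VOCAB = {
    "neoplasm": {"altLabel": ["tumor", "tumour"], "broader": ["disease"]},
    "myocardial infarction": {"altLabel": ["heart attack"], "broader": []},
}

def expand_query(query: str) -> str:
    """Replace each known concept in the query with a boolean OR group
    over its preferred, alternative and broader labels."""
    q = query.lower()
    for pref, labels in SKOS_VOCAB.items():
        if pref in q:
            variants = [pref] + labels["altLabel"] + labels["broader"]
            group = " OR ".join(f'"{v}"' for v in variants)
            q = q.replace(pref, f"({group})")
    return q

print(expand_query("neoplasm treatment"))
# → ("neoplasm" OR "tumor" OR "tumour" OR "disease") treatment
```

Expanding into an OR group mitigates the vocabulary mismatch the abstract mentions: a document that says "tumour" still matches a query for "neoplasm".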
5

Adya, Kaushik. "An implicit-feedback based ranking methodology for Web search engines /." Available to subscribers only, 2005. http://proquest.umi.com/pqdweb?did=1079672381&sid=1&Fmt=2&clientId=1509&RQT=309&VName=PQD.

6

Lakshmi, Shriram. "Web-based search engine for Radiology Teaching File." [Gainesville, Fla.] : University of Florida, 2002. http://purl.fcla.edu/fcla/etd/UFE0000559.

7

Miotto, Riccardo. "Content-based Music Access: Combining Audio Features and Semantic Information for Music Search Engines." Doctoral thesis, Università degli studi di Padova, 2011. http://hdl.handle.net/11577/3421582.

Abstract:
During the last decade, the Internet has reinvented the music industry. Physical media have evolved towards online products and services. As a consequence of this transition, online music corpora have reached a massive scale and are constantly being enriched with new documents. At the same time, a great quantity of cultural heritage content remains undisclosed because of the lack of metadata to describe and contextualize it. This has created a need for music retrieval and discovery technologies that allow users to interact with all these music repositories efficiently and effectively. Music Information Retrieval (MIR) is the research field that studies methods and tools for improving such interaction as well as access to music documents. Most research work in MIR focuses on content-based approaches, which exploit the analysis of the audio signal of a song to extract significant descriptors of the music content. These content descriptors may be processed and used in different application scenarios, such as retrieval, recommendation, dissemination, musicology analysis, and so on. The thesis explores novel automatic (content-based) methodologies for music retrieval which are based on semantic textual descriptors, acoustic similarity, and a combination of the two; we show empirically how the proposed approaches lead to efficient and competitive solutions with respect to other alternative state-of-the-art strategies. Part of the thesis focuses on music discovery systems, that is, search engines where users do not look for a specific song or artist, but may have some general criteria they wish to satisfy. These criteria are commonly expressed in the form of tags, that is, short phrases that capture relevant characteristics of the songs, such as genre, instrumentation, emotions, and so on.
Because of the scale of current collections, manually assigning tags to songs is becoming an infeasible task; for this reason the automatic tagging of music content is now considered a core challenge in the design of fully functional music retrieval systems. State-of-the-art content-based systems for music annotation (which are usually called auto-taggers) model the acoustic patterns of the songs associated with each tag in a vocabulary through machine learning approaches. Based on these tag models, auto-taggers generate a vector of tag weights when annotating a new song. This vector may be interpreted as a semantic multinomial (SMN), that is a distribution characterizing the relevance of each tag to a song, which can be used for music annotation and retrieval. A first original contribution reported in the thesis aims at improving state-of-the-art auto-taggers by considering tag co-occurrences. While a listener may derive semantic associations for audio clips from direct auditory cues (e.g. hearing “bass guitar”) as well as from context (e.g. inferring “bass guitar” in the context of a “rock” song), auto-taggers ignore this context. Indeed, although contextual relationships correlate tags, many state-of-the-art auto-taggers model tags independently. We present a novel approach for improving automatic music annotation by modeling contextual relationships between tags. A Dirichlet mixture model (DMM) is proposed as a second, additional stage in the modeling process to supplement any auto-tagging system that generates a semantic multinomial over a vocabulary of tags. For each tag in the vocabulary, a DMM captures the broader context defined by the tag by modeling tag co-occurrence patterns in the SMNs of songs associated with the tag. When annotating songs, the DMMs refine SMN annotations by leveraging contextual evidence. 
Experimental results demonstrate the benefits of combining a variety of auto-taggers with this generative context model; it generally outperforms other approaches to context modeling as well. The use of tags alone allows for efficient and effective music retrieval mechanisms; however, automatic tagging strategies may lead to noisy representations that may negatively affect the effectiveness of retrieval algorithms. Yet, search and discovery operations across music collections can also be carried out by matching users' interests or exploiting acoustic similarity. One major issue in music information retrieval is how to combine such noisy and heterogeneous information sources in order to improve retrieval effectiveness. To this end, the thesis explores a statistical retrieval framework based on combining tags and acoustic similarity through a hidden Markov model. The retrieval mechanism relies on an application of the Viterbi algorithm, which highlights the sequence of songs that best represents a user query. The model is presented for improving state-of-the-art music search and discovery engines by delivering more relevant ranking lists. In fact, through an empirical evaluation we show how the proposed model leads to better performance than retrieval approaches which rank songs according to individual information sources alone or which use a combination of them. Additionally, the high generality of the framework makes it suitable for other media as well, such as images and videos. Besides music discovery, the thesis also addresses the problem of music identification, the goal of which is to match different recordings of the same song (i.e. finding covers of a given query). To this end we present two novel music descriptors based on the harmonic content of the audio signals. Their main purpose is to provide a compact representation which is likely to be shared by different performances of the same music score.
At the same time, they also aim at reducing the storage requirements of the music representation as well as enabling efficient retrieval over large music corpora. The effectiveness of these two descriptors, combined in a single scalable system, has been tested for classical music identification, which is probably the applicative scenario that most needs automatic strategies for labeling unknown recordings. Scalability is guaranteed by an index-based pre-retrieval step which handles music features as textual words; in addition, precision in the identification is brought by alignment carried out through an application of hidden Markov models. Results with a collection of more than ten thousand recordings have been satisfying in terms of efficiency and effectiveness.
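The semantic multinomials (SMNs) described in this abstract can be sketched concretely. The toy SMNs below are invented for illustration; the thesis's auto-taggers and DMM context model are not reproduced here.

```python
# Minimal sketch of tag-based retrieval over semantic multinomials (SMNs):
# each song is annotated with a probability distribution over a tag
# vocabulary, and a tag query ranks songs by the mass placed on its tags.
VOCAB = ["rock", "jazz", "guitar", "piano", "calm"]

# One SMN per song (values are invented for the example).
SMNS = {
    "song_a": [0.50, 0.05, 0.35, 0.05, 0.05],
    "song_b": [0.05, 0.55, 0.05, 0.30, 0.05],
    "song_c": [0.30, 0.10, 0.40, 0.10, 0.10],
}

def rank_by_tags(query_tags):
    """Score each song by the SMN mass placed on the query tags and
    rank descending -- a simple query-likelihood style retrieval."""
    idx = [VOCAB.index(t) for t in query_tags]
    scores = {s: sum(smn[i] for i in idx) for s, smn in SMNS.items()}
    return sorted(scores, key=scores.get, reverse=True)

print(rank_by_tags(["rock", "guitar"]))  # → ['song_a', 'song_c', 'song_b']
```

The thesis's DMM stage would refine these SMN vectors using tag co-occurrence patterns before ranking; that step is omitted here for brevity.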
8

Arlitsch, Kenning. "Semantic Web Identity of academic organizations." Doctoral thesis, Humboldt-Universität zu Berlin, Philosophische Fakultät I, 2017. http://dx.doi.org/10.18452/17671.

Abstract:
Semantic Web Identity (SWI) characterizes an entity that has been recognized as such by search engines. The display of a Knowledge Graph Card in Google search results for an academic organization is proposed as an indicator of SWI, as it demonstrates that Google has gathered enough verifiable facts to establish the organization as an entity. This recognition may in turn improve the accuracy and relevancy of its referrals to that organization. This dissertation presents findings from an in-depth survey of the 125 member libraries of the Association of Research Libraries (ARL). The findings show that these academic libraries are poorly represented in the structured data records that are a crucial underpinning of the Semantic Web and a significant factor in achieving SWI. Lack of SWI extends to other academic organizations, particularly those at the lower hierarchical levels of academic institutions, including colleges, departments, centers, and research institutes. A lack of SWI may affect other factors of interest to academic organizations, including the ability to attract research funding, increase student enrollment, and improve institutional reputation and ranking. This study hypothesizes that the poor state of SWI is in part the result of a failure by these organizations to populate appropriate Linked Open Data (LOD) and proprietary Semantic Web knowledge bases. The situation represents an opportunity for academic libraries to develop the skills and knowledge to establish and maintain their own SWI, and to offer SWI services to other academic organizations in their institutions. The research examines the current state of SWI for ARL libraries and some other academic organizations, and describes case studies that validate the effectiveness of proposed techniques to correct the situation.
It also explains new services that are being developed at the Montana State University Library to address SWI needs on its campus, which could be adapted by other academic libraries.
9

Gopinathan-Leela, Ligon. "Personalisation of web information search: an agent based approach." University of Canberra. Information Sciences & Engineering, 2005. http://erl.canberra.edu.au./public/adt-AUC20060728.120849.

Abstract:
The main purpose of this research is to find an effective way to personalise information searching on the Internet using middleware search agents, namely Personalised Search Agents (PSA). The PSA acts between users and search engines, and applies new and existing techniques to mine and exploit relevant and personalised information for users. Much research has already been done on personalising filters, middleware that acts between user and search engines to deliver more personalised results. These personalising filters apply one or more of the popular techniques for search result personalisation, such as the category concept, learning from user actions and using metasearch engines. In developing the PSA, these techniques have been investigated and incorporated to create an effective middleware agent for web search personalisation. In this thesis, a conceptual model for the Personalised Search Agent is developed, implemented as a prototype, and benchmarked against existing web search practices. A system development methodology with flexible, iterative procedures that switch between conceptual design and prototype development was adopted. In the conceptual model of the PSA, a multi-layer client-server architecture is used, applying generalisation-specialisation features. The client and the server are structurally the same, but differ in their level of generalisation and interface. The client handles personalising information regarding one user, whereas the server combines the personalising information of all the clients (i.e. its users) to generate a global profile. Both client and server apply the category concept, where user-selected URLs are mapped against categories. The PSA learns the user-relevant URLs both by requesting explicit feedback and by implicitly capturing user actions (for instance the active time spent by the user on a URL).
The PSA also employs a keyword-generating algorithm, and tries different combinations of words in a user search string by effectively combining them with the relevant category values. The core functionalities of the conceptual model for the PSA were implemented in a prototype, used to test the ideas in the real world. The results were benchmarked against the results from existing search engines to determine the efficiency of the PSA over conventional searching. A comparison of the test results revealed that the PSA is more effective and efficient in finding relevant and personalised results for individual users, and possesses a unique sense of each user rather than the general user sense of traditional search engines. The PSA is a novel architecture and contributes to the domain of web information searching by delivering new ideas such as active-time-based user relevancy calculations, automatic generation of sensible search keyword combinations and the implementation of a multi-layer agent architecture. Moreover, the PSA has high potential for future extensions. Because it captures highly personalised data, data mining techniques which employ case-based reasoning could make the PSA a more responsive, more accurate and more effective tool for personalised information searching.
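The active-time relevancy idea described in this abstract can be sketched as follows. The function name, constants and weighting formula are illustrative assumptions, not the PSA's actual implementation.

```python
# Hedged sketch: a URL's relevance weight grows with the time the user
# actively spends on it, with diminishing returns, and explicit positive
# feedback adds a fixed boost.
import math

def relevance_weight(active_seconds, explicit_feedback=False):
    """Diminishing-returns weight from active viewing time, saturating
    around ten minutes; explicit feedback adds a fixed boost."""
    w = math.log1p(active_seconds) / math.log1p(600)
    if explicit_feedback:
        w += 0.5
    return round(min(w, 1.5), 3)

# A long, explicitly endorsed visit outweighs a brief glance.
print(relevance_weight(300, True) > relevance_weight(5))  # → True
```

The logarithmic curve reflects the intuition that the difference between a 5-second and a 5-minute visit matters far more than the difference between 10 and 15 minutes.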
10

Ngindana, Mongezi. "Visibility of e-commerce websites to search engines : a comparison between text-based and graphic-based hyperlinks /." Thesis, Click here for online access, 2006. http://dk.cput.ac.za/cgi/viewcontent.cgi?article=1081&context=td_cput.

11

Bicer, Veli. "Search Relevance based on the Semantic Web." Supervised by R. Studer. Karlsruhe: KIT-Bibliothek, 2012. http://d-nb.info/1028567200/34.

12

Blacoe, Ian. "SERSE: An agent-based system for scalable search on the semantic web." Thesis, University of Liverpool, 2009. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.511093.

13

Morales, Vidal Jorge Arturo. "Research on proposals and trends in the architectures of semantic search engines: a systematic literature review." Master's thesis, Pontificia Universidad Católica del Perú, 2018. http://tesis.pucp.edu.pe/repositorio/handle/123456789/11974.

Abstract:
Semantic web technologies have gained attention in recent years, largely owing to the proliferation of mobile devices and broadband Internet access. As Tim Berners-Lee, creator of the World Wide Web, envisioned at the beginning of the century, semantic web technologies have fostered the development of standards that, in turn, enable semantic search engines that give users the information they are looking for. This research study presents the results of a systematic literature review focused on understanding the proposals and trends in semantic search engines from the point of view of software architecture. From the results, it is possible to say that most studies propose a comprehensive solution for their users, in which the requirements, the context and the modules that make up the search engine play a major role. Ontologies and knowledge also play an important part in these architectures as they evolve, enabling a great number of solutions that respond better to users' expectations. This thesis is an extension of the article "Research on proposals and trends in the architectures of semantic search engines: A systematic literature review", published in the Proceedings of the 2017 Federated Conference on Computer Science and Information Systems. The thesis presents greater detail than the published article; the two share the development and results of the systematic literature review.
14

Zamir, Oren Eli. "Clustering web documents : a phrase-based method for grouping search engine results /." Thesis, Connect to this title online; UW restricted, 1999. http://hdl.handle.net/1773/6884.

15

Rahuma, Awatef. "Semantically-enhanced image tagging system." Thesis, De Montfort University, 2013. http://hdl.handle.net/2086/9494.

Abstract:
In multimedia databases, data are images, audio, video, texts, etc. Research interest in these types of databases has increased in the last decade or so, especially with the advent of the Internet and the Semantic Web. Fundamental research issues range from unified data modelling and retrieval of data items to the dynamic nature of updates. The thesis builds on findings in Semantic Web and retrieval techniques and explores novel tagging methods for identifying data items. Tagging systems, which enable users to add tags to Internet resources such as images, video and audio to make them more manageable, have become popular. Collaborative tagging is concerned with the relationship between people and resources. Most of these resources have metadata in machine-processable format and enable users to use free-text keywords (so-called tags) as search techniques. This research references some tagging systems, e.g. Flickr, Delicious and MyWeb 2.0. The limitations of such techniques include polysemy (one word with different meanings), synonymy (different words with one meaning), different lexical forms (singular, plural and conjugated words) and misspelling errors or alternate spellings. The work presented in this thesis introduces semantic characterization of web resources that describes the structure and organization of tagging, aiming to extend the existing Multimedia Query using similarity measures to cater for collaborative tagging. In addition, we discuss the semantic difficulties of tagging systems, suggesting improvements in their accuracy. The scope of our work is classified as follows: (i) Increase the accuracy and confidence of multimedia tagging systems. (ii) Increase the similarity measures of images by integrating a variety of measures. To address the first shortcoming, we use WordNet, as a semantic lingual ontology resource, in a tagging system for social sharing and retrieval of images.
To address the second shortcoming, we apply similarity measures in different ways within the multimedia tagging system. Fundamental to our work is the novel information model that we have constructed for our computation. It is based on the fact that an image is a rich object that can be characterised and formulated in n dimensions, each dimension containing valuable information that helps increase the accuracy of the search. For example, an image of a tree in a forest contains more information than an image of the same tree in a different environment. In this thesis we characterise a data item (an image) by a primary description, followed by n secondary descriptions. As n increases, the accuracy of the search improves. We give various techniques to analyse data and its associated query. To increase the accuracy of the tagging system we have performed different experiments on many images using similarity measures and various techniques from VoI (Value of Information). The findings show the linkage between similarity measures and that VoI improves searches and helps guide a tagger in choosing the most adequate tags.
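The WordNet-based disambiguation the abstract describes rests on taxonomic similarity between tags. As a toy illustration only, the sketch below computes Wu-Palmer similarity over a tiny hand-built hypernym taxonomy; the terms, taxonomy and depths are invented for the example, and a real system would query WordNet itself rather than this stand-in.

```python
# Toy Wu-Palmer similarity over an invented hypernym taxonomy
# (a stand-in for WordNet; not the thesis's actual implementation).

# child -> parent (hypernym) links; "entity" is the root
HYPERNYM = {
    "entity": None,
    "plant": "entity",
    "animal": "entity",
    "tree": "plant",
    "oak": "tree",
    "pine": "tree",
    "dog": "animal",
}

def path_to_root(term):
    """Return the list [term, parent, ..., root]."""
    path = []
    while term is not None:
        path.append(term)
        term = HYPERNYM[term]
    return path

def depth(term):
    return len(path_to_root(term)) - 1  # root has depth 0

def lcs(a, b):
    """Least common subsumer: deepest shared ancestor of a and b."""
    ancestors_a = set(path_to_root(a))
    for term in path_to_root(b):   # walks upward from b, so the
        if term in ancestors_a:    # first hit is the deepest one
            return term

def wu_palmer(a, b):
    """Wu-Palmer similarity: 2*depth(lcs) / (depth(a) + depth(b))."""
    return 2 * depth(lcs(a, b)) / (depth(a) + depth(b))

print(wu_palmer("oak", "pine"))  # siblings under "tree": high similarity
print(wu_palmer("oak", "dog"))   # only share the root: zero here
```

Under this measure, tags that share a deep common ancestor ("oak"/"pine" under "tree") score much higher than tags related only at the root, which is the intuition behind using WordNet to resolve polysemy and synonymy among tags.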
APA, Harvard, Vancouver, ISO, and other styles
16

Ayvaz, Serkan. "NEAR NEIGHBOR EXPLORATIONS FOR KEYWORD-BASED SEMANTIC SEARCHES USING RDF SUMMARY GRAPH." Kent State University / OhioLINK, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=kent1447710652.

Full text
APA, Harvard, Vancouver, ISO, and other styles
17

Zhang, Limin. "Contextual Web Search Based on Semantic Relationships: A Theoretical Framework, Evaluation and a Medical Application Prototype." Diss., Tucson, Arizona : University of Arizona, 2006. http://etd.library.arizona.edu/etd/GetFileServlet?file=file:///data1/pdf/etd/azu%5Fetd%5F1602%5F1%5Fm.pdf&type=application/pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
18

Zhu, Dengya. "Improving the relevance of search results via search-term disambiguation and ontological filtering." Thesis, Curtin University, 2007. http://hdl.handle.net/20.500.11937/2486.

Full text
Abstract:
With the exponential growth of the Web and the inherent polysemy and synonymy problems of natural languages, search engines face many challenges, such as information overload, mismatch of search results, missing relevant documents, poorly organized search results, and a mismatch with users' mental models of clustering engines. To address these issues, much effort, including employing different information retrieval (IR) models, information categorization/clustering, personalization, the Semantic Web, ontology-based IR, and so on, has been devoted to improving the relevance of search results. The major focus of this study is to dynamically re-organize Web search results under a socially constructed hierarchical knowledge structure, to facilitate information seekers' access to and manipulation of the retrieved search results, and consequently to improve the relevance of search results. To achieve the above research goal, a special search-browser is developed, and its retrieval effectiveness is evaluated. The hierarchical structure of the Open Directory Project (ODP) is employed as the socially constructed knowledge structure, represented by the Tree component of Java. The Yahoo! Search Web Services API is utilized to obtain search results directly from Yahoo! search engine databases. The Lucene text search engine calculates similarities between each returned search result and the semantic characteristics of each category in the ODP, and the search results are then assigned to the corresponding ODP categories by a Majority Voting algorithm. When an interesting category is selected by a user, only the search results categorized under that category are presented, and the quality of the search results is consequently improved. Experiments demonstrate that the proposed approach can improve the precision of Yahoo! search results at the 11 standard recall levels from an average of 41.7 per cent to 65.2 per cent; the improvement is as high as 23.5 per cent.
This conclusion is verified by comparing the improvements in the P@5 and P@10 of Yahoo! search results and of the categorized search results of the special search-browser. The improvements in P@5 and P@10 are 38.3 per cent (85 per cent - 46.7 per cent) and 28 per cent (70 per cent - 42 per cent) respectively. The experiment of this research is well designed and controlled. To minimize the subjectivity of relevance judgments, five judges (experts) are asked to make their relevance judgments independently, and the final relevance judgment is a combination of the five judges' judgments. The judges are presented with only the search-terms, the information needs, and the 50 search results of the Yahoo! Search Web Services API; they are asked to make relevance judgments based only on this information, with no categorization information provided. The first contribution of this research is the use of an extracted category-document to represent the semantic characteristics of each ODP category. A category-document is composed of the topic of the category, the description of the category, and the titles and brief descriptions of the Web pages submitted under the category. Experimental results demonstrate that the category-documents can represent the semantic characteristics of the ODP in most cases. Furthermore, for machine learning algorithms, the extracted category-documents can be utilized as training data that would otherwise demand much human labor to create in order for the learning algorithm to be properly trained. The second contribution of this research is the suggestion of two new concepts, relevance judgment convergent degree and relevance judgment divergent degree, which measure how well different judges agree with each other when asked to judge the relevance of a list of search results. When the relevance judgment convergent degree of a search-term is high, an IR algorithm should obtain a higher precision as well.
On the other hand, if the relevance judgment convergent degree is low, or the relevance judgment divergent degree is high, it is questionable whether the data should be used to evaluate the IR algorithm. This intuition is borne out by the experiment of this research. The last contribution of this research is that the developed search-browser is, to the best of my knowledge, the first IR system (IRS) to utilize the ODP hierarchical structure to categorize and filter search results.
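The core of the categorization step described above, similarity between each search result and per-category "category-documents", can be sketched in a few lines. This is only an illustrative reduction: the categories and snippets below are invented stand-ins for ODP data, the thesis uses Lucene for the similarity computation, and its Majority Voting assignment is simplified here to picking the single best-scoring category.

```python
# Toy sketch: score a result snippet against invented "category-documents"
# with TF cosine similarity and assign it to the best-scoring category.
from collections import Counter
from math import sqrt

CATEGORY_DOCS = {
    "Computers/Java": "java programming language virtual machine class",
    "Science/Biology": "java island species habitat biology wildlife",
}

def tf_vector(text):
    """Bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def categorize(snippet):
    """Return the ODP-style category whose category-document scores highest."""
    vec = tf_vector(snippet)
    scores = {cat: cosine(vec, tf_vector(doc)) for cat, doc in CATEGORY_DOCS.items()}
    return max(scores, key=scores.get)

print(categorize("tutorial on the java programming language"))
print(categorize("wildlife of the island of java"))
```

The ambiguous term "java" lands in different categories depending on its co-occurring terms, which is exactly the disambiguation effect the thesis exploits to filter search results.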
APA, Harvard, Vancouver, ISO, and other styles
19

Zhu, Dengya. "Improving the relevance of search results via search-term disambiguation and ontological filtering." Curtin University of Technology, School of Information Systems, 2007. http://espace.library.curtin.edu.au:80/R/?func=dbin-jump-full&object_id=9348.

Full text
Abstract:
With the exponential growth of the Web and the inherent polysemy and synonymy problems of natural languages, search engines face many challenges, such as information overload, mismatch of search results, missing relevant documents, poorly organized search results, and a mismatch with users' mental models of clustering engines. To address these issues, much effort, including employing different information retrieval (IR) models, information categorization/clustering, personalization, the Semantic Web, ontology-based IR, and so on, has been devoted to improving the relevance of search results. The major focus of this study is to dynamically re-organize Web search results under a socially constructed hierarchical knowledge structure, to facilitate information seekers' access to and manipulation of the retrieved search results, and consequently to improve the relevance of search results.

To achieve the above research goal, a special search-browser is developed, and its retrieval effectiveness is evaluated. The hierarchical structure of the Open Directory Project (ODP) is employed as the socially constructed knowledge structure, represented by the Tree component of Java. The Yahoo! Search Web Services API is utilized to obtain search results directly from Yahoo! search engine databases. The Lucene text search engine calculates similarities between each returned search result and the semantic characteristics of each category in the ODP, and the search results are then assigned to the corresponding ODP categories by a Majority Voting algorithm. When an interesting category is selected by a user, only the search results categorized under that category are presented, and the quality of the search results is consequently improved.

Experiments demonstrate that the proposed approach can improve the precision of Yahoo! search results at the 11 standard recall levels from an average of 41.7 per cent to 65.2 per cent; the improvement is as high as 23.5 per cent. This conclusion is verified by comparing the improvements in the P@5 and P@10 of Yahoo! search results and of the categorized search results of the special search-browser. The improvements in P@5 and P@10 are 38.3 per cent (85 per cent - 46.7 per cent) and 28 per cent (70 per cent - 42 per cent) respectively. The experiment of this research is well designed and controlled. To minimize the subjectivity of relevance judgments, five judges (experts) are asked to make their relevance judgments independently, and the final relevance judgment is a combination of the five judges' judgments. The judges are presented with only the search-terms, the information needs, and the 50 search results of the Yahoo! Search Web Services API; they are asked to make relevance judgments based only on this information, with no categorization information provided.

The first contribution of this research is the use of an extracted category-document to represent the semantic characteristics of each ODP category. A category-document is composed of the topic of the category, the description of the category, and the titles and brief descriptions of the Web pages submitted under the category. Experimental results demonstrate that the category-documents can represent the semantic characteristics of the ODP in most cases. Furthermore, for machine learning algorithms, the extracted category-documents can be utilized as training data that would otherwise demand much human labor to create in order for the learning algorithm to be properly trained. The second contribution of this research is the suggestion of two new concepts, relevance judgment convergent degree and relevance judgment divergent degree, which measure how well different judges agree with each other when asked to judge the relevance of a list of search results. When the relevance judgment convergent degree of a search-term is high, an IR algorithm should obtain a higher precision as well. On the other hand, if the relevance judgment convergent degree is low, or the relevance judgment divergent degree is high, it is questionable whether the data should be used to evaluate the IR algorithm. This intuition is borne out by the experiment of this research. The last contribution of this research is that the developed search-browser is, to the best of my knowledge, the first IR system (IRS) to utilize the ODP hierarchical structure to categorize and filter search results.
APA, Harvard, Vancouver, ISO, and other styles
20

Lisena, Pasquale. "Knowledge-based music recommendation : models, algorithms and exploratory search." Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS614.

Full text
Abstract:
Representing the information about music is a complex activity that involves different sub-tasks. This thesis manuscript mostly focuses on classical music, researching how to represent and exploit its information. The main goal is the investigation of strategies of knowledge representation and discovery applied to classical music, involving subjects such as knowledge-base population, metadata prediction, and recommender systems. We propose a complete workflow for the management of music metadata using Semantic Web technologies. We introduce a specialised ontology and a set of controlled vocabularies for the different concepts specific to music. Then, we present an approach for converting data, in order to go beyond the librarian practice currently in use, relying on mapping rules and interlinking with controlled vocabularies. Finally, we show how these data can be exploited. In particular, we study approaches based on embeddings computed on structured metadata, titles, and symbolic music for ranking and recommending music. Several demo applications have been realised for testing the previous approaches and resources.
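The embedding-based recommendation idea in this abstract can be reduced to nearest-neighbour ranking in a vector space. The sketch below is only a minimal stand-in: the work titles and three-dimensional vectors are invented for illustration, whereas the thesis computes real embeddings from structured metadata, titles, and symbolic music.

```python
# Minimal sketch: recommend works by cosine similarity between
# (here, hand-made) embedding vectors; all names/vectors are invented.
from math import sqrt

EMBEDDINGS = {
    "symphony_a": [0.9, 0.1, 0.0],
    "symphony_b": [0.8, 0.2, 0.1],
    "nocturne_c": [0.1, 0.9, 0.3],
}

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v)))

def recommend(seed, k=1):
    """Rank all other works by cosine similarity to the seed work."""
    scores = sorted(
        ((cosine(EMBEDDINGS[seed], vec), name)
         for name, vec in EMBEDDINGS.items() if name != seed),
        reverse=True,
    )
    return [name for _, name in scores[:k]]

print(recommend("symphony_a"))
```

Works whose vectors point in similar directions are recommended together; with learned embeddings, that direction encodes shared composers, genres, instrumentation, and so on.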
APA, Harvard, Vancouver, ISO, and other styles
21

Sahay, Saurav. "Socio-semantic conversational information access." Diss., Georgia Institute of Technology, 2011. http://hdl.handle.net/1853/42855.

Full text
Abstract:
The main contributions of this thesis revolve around the development of an integrated conversational recommendation system, combining data and information models with community networks and interactions to leverage multi-modal information access. We have developed a real-time conversational information access community agent that leverages community knowledge by pushing relevant recommendations to users of the community. The recommendations are delivered in the form of web resources, past conversations, and people to connect to. The information agent (cobot, for community/collaborative bot) monitors the community conversations, and is 'aware' of users' preferences by implicitly capturing their short-term and long-term knowledge models from conversations. The agent draws on health and medical domain knowledge to extract concepts, associations and relationships between concepts; formulates queries for semantic search; and provides socio-semantic recommendations in the conversation after applying various relevance filters to the candidate results. The agent also takes into account users' verbal intentions in conversations while making recommendation decisions. One of the goals of this thesis is to develop an innovative approach to delivering relevant information using a combination of social networking, information aggregation, semantic search and recommendation techniques. The idea is to facilitate timely and relevant social information access by mixing past community-specific conversational knowledge and web information access to recommend and connect users with relevant information. Language and interaction create usable memories, useful for making decisions about what actions to take and what information to retain. Cobot leverages these interactions to maintain users' episodic and long-term semantic models. The agent analyzes these memory structures to match and recommend users in conversations by matching them with the contextual information need.
The social feedback on the recommendations is registered in the system so that the algorithms can promote community-preferred, contextually relevant resources. The nodes of the semantic memory are frequent concepts extracted from a user's interactions. The concepts are connected with associations that develop when concepts co-occur frequently. Over time, as the user participates in more interactions, new concepts are added to the semantic memory. Different conversational facets are matched with episodic memories, and a spreading activation search on the semantic net is performed to generate the top candidate user recommendations for the conversation. The unifying themes of this thesis revolve around the informational and social aspects of a unified information access architecture that integrates semantic extraction and indexing with user modeling and recommendations.
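The spreading activation search mentioned above can be sketched as energy propagating from the conversation's seed concepts over weighted associations with a decay factor. This is a generic illustration of the technique, not the thesis's system: the concept graph, weights, decay value, and hop count below are all invented.

```python
# Rough sketch of spreading activation over an invented concept graph:
# activation starts at seed concepts and decays as it propagates.

# concept -> list of (associated concept, association weight)
GRAPH = {
    "diabetes": [("insulin", 0.9), ("diet", 0.6)],
    "insulin": [("injection", 0.8)],
    "diet": [("exercise", 0.7)],
    "injection": [],
    "exercise": [],
}

def spread_activation(seeds, decay=0.5, steps=2):
    """Propagate activation from seed concepts for a fixed number of hops."""
    activation = {concept: 1.0 for concept in seeds}
    frontier = dict(activation)
    for _ in range(steps):
        next_frontier = {}
        for concept, energy in frontier.items():
            for neighbour, weight in GRAPH[concept]:
                gain = energy * weight * decay
                activation[neighbour] = activation.get(neighbour, 0.0) + gain
                next_frontier[neighbour] = next_frontier.get(neighbour, 0.0) + gain
        frontier = next_frontier
    return sorted(activation.items(), key=lambda kv: -kv[1])

for concept, energy in spread_activation(["diabetes"]):
    print(f"{concept}: {energy:.3f}")
```

Directly associated concepts end up with more activation than those two hops away, so ranking nodes by activation yields candidates that are semantically close to the conversation context.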
APA, Harvard, Vancouver, ISO, and other styles
22

Linckels, Serge. "An e-librarian service : supporting explorative learning by a description logics based semantic retrieval tool." Phd thesis, Universität Potsdam, 2008. http://opus.kobv.de/ubp/volltexte/2008/1745/.

Full text
Abstract:
Although educational content in electronic form is increasing dramatically, its usage in an educational environment is poor, mainly because there is too much unreliable, redundant, and irrelevant information. Finding appropriate answers is a rather difficult task, reliant on the user to filter the pertinent information from the noise. Turning knowledge bases like the online tele-TASK archive into useful educational resources requires identifying correct, reliable, and "machine-understandable" information, as well as developing simple but efficient search tools with the ability to reason over this information. Our vision is to create an E-Librarian Service which is able to retrieve multimedia resources from a knowledge base in a more efficient way than by browsing through an index or by using a simple keyword search. In our E-Librarian Service, the user can enter his question in a very simple and human way: in natural language (NL). Our premise is that more pertinent results would be retrieved if the search engine understood the sense of the user's query. The returned results are then logical consequences of an inference rather than of keyword matching. Our E-Librarian Service does not return the answer to the user's question; it retrieves the most pertinent document(s), in which the user finds the answer to his/her question. Among all the documents that have some information in common with the user query, our E-Librarian Service identifies the most pertinent match(es), keeping in mind that the user expects an exhaustive answer while preferring a concise answer with little or no information overhead. Also, our E-Librarian Service always proposes a solution to the user, even if the system concludes that there is no exhaustive answer. Our E-Librarian Service was implemented prototypically in three different educational tools.
A first prototype is CHESt (Computer History Expert System); it has a knowledge base with 300 multimedia clips that cover the main events in computer history. A second prototype is MatES (Mathematics Expert System); it has a knowledge base with 115 clips that cover the topic of fractions in mathematics for secondary school w.r.t. the official school programme. All clips were recorded mainly by pupils. The third and most advanced prototype is the "Lecture Butler's E-Librarian Service"; it has a Web service interface to respect a service-oriented architecture (SOA), and was developed in the context of the Web-University project at the Hasso-Plattner-Institute (HPI). Two major experiments in an educational environment, at the Lycée Technique Esch/Alzette in Luxembourg, were made to test the pertinence and reliability of our E-Librarian Service as a complement to traditional courses. The first experiment (in 2005) was made with CHESt in different classes and covered a single lesson. The second experiment (in 2006) covered a period of 6 weeks of intensive use of MatES in one class. There was no classical mathematics lesson where the teacher gave explanations; the students had to learn in an autonomous and exploratory way. They had to ask questions to the E-Librarian Service just the way they would if there were a human teacher.

Although the availability of educational content in electronic form is steadily increasing, its usefulness in a school environment is rather low, chiefly because there is too much unreliable, redundant and irrelevant information. Finding suitable learning objects is a difficult task that depends on the user filtering out the pertinent information. For knowledge bases such as the online tele-TASK archive to become useful educational resources, learning objects must be identified correctly, reliably and in machine-understandable form, and efficient search tools must be developed. Our goal is to create an E-Librarian Service that finds multimedia resources in a knowledge base more efficiently than by navigating through a table of contents or using a simple keyword search. Our premise is that more suitable results could be found if the semantic search engine understood the sense of the user's query; in that case the delivered answers would be the logical consequences of an inference rather than of a keyword search. Tests have shown that, among all the documents in a given knowledge base, our E-Librarian Service finds those that semantically best match the user's query, bearing in mind that the user expects a complete and precise answer containing little or no additional information. Moreover, our system is able to quantify and illustrate for the user the quality and pertinence of the delivered answers. Finally, our E-Librarian Service always delivers an answer, even when the system determines that there is no complete answer to the question. It allows the user to express questions in a very simple and human way, namely in natural language; linguistic information and a given context in the form of an ontology are used for the semantic translation of the user input into a logical form. Our E-Librarian Service was implemented prototypically in three different educational tools.

In two experiments, the appropriateness and reliability of these tools as a complement to classical teaching were tested in an educational environment. The main results are as follows. First, pupils generally accept entering whole questions, instead of keywords, when this helps them obtain better search results. Second, and most importantly, the experiments show that school results can be improved when pupils use our E-Librarian Service: we measured a general improvement of 5% in school results; 50% of the pupils improved their grades, 41% of them even substantially. One of the main reasons for these positive results is that the pupils were more motivated and consequently prepared to invest more effort and diligence in learning and in acquiring new knowledge.
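The retrieval idea behind the E-Librarian Service, mapping a natural-language question to ontology concepts and returning the documents that best cover them, can be caricatured in a few lines. Everything below is invented for illustration (the lexicon, concept names, and clip annotations); the actual system uses Description Logics inference rather than this simple overlap count.

```python
# Very small sketch: reduce a question to a set of concepts via an
# invented lexicon, then return the clips whose concept annotations
# cover the most query concepts (a stand-in for logical inference).
LEXICON = {"invented": "Creation", "transistor": "Transistor", "computer": "Computer"}
CLIPS = {
    "clip_transistor": {"Creation", "Transistor"},
    "clip_eniac": {"Creation", "Computer"},
}

def query_concepts(question):
    """Concepts recognised in the question; unknown words are ignored."""
    return {LEXICON[w] for w in question.lower().split() if w in LEXICON}

def best_clips(question):
    """Clips with the largest overlap with the query's concept set."""
    concepts = query_concepts(question)
    best = max(len(annotated & concepts) for annotated in CLIPS.values())
    return sorted(name for name, annotated in CLIPS.items()
                  if len(annotated & concepts) == best)

print(best_clips("who invented the transistor"))
```

The matching happens at the concept level, not the keyword level, which is the distinction the abstract draws between inference-based retrieval and keyword search.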
APA, Harvard, Vancouver, ISO, and other styles
23

Anderson, James D. "Interactive Visualization of Search Results of Large Document Sets." Wright State University / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=wright1547048073451373.

Full text
APA, Harvard, Vancouver, ISO, and other styles
24

Kidambi, Phani Nandan. "A HUMAN-COMPUTER INTEGRATED APPROACH TOWARDS CONTENT BASED IMAGE RETRIEVAL." Wright State University / OhioLINK, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=wright1292647701.

Full text
APA, Harvard, Vancouver, ISO, and other styles
25

Khan, Arshad Ali. "Exploiting Linked Open Data (LoD) and Crowdsourcing-based semantic annotation & tagging in web repositories to improve and sustain relevance in search results." Thesis, University of Southampton, 2018. https://eprints.soton.ac.uk/428046/.

Full text
Abstract:
Online searching of multi-disciplinary web repositories is a topic of increasing importance as the number of repositories increases and the diversity of skills and backgrounds of their users widens. Earlier term-frequency-based approaches have been improved by ontology-based semantic annotation, but such approaches are predominantly driven by "domain ontologies engineering first" and lack dynamicity, whereas the information is dynamic: the meaning of things changes with time, and new concepts are constantly being introduced. Further, no sustainable framework or method has so far been discovered that could automatically enrich the content of heterogeneous online resources for information retrieval over time. Furthermore, the methods and techniques being applied are fast becoming inadequate due to increasing data volume, concept obsolescence, and the complexity and heterogeneity of content types in web repositories. In the face of such complexities, term matching alone between a query and the indexed documents will no longer fulfil complex user needs. The ever-growing gap between syntax and semantics needs to be continually bridged in order to address the above issues and ensure the accurate retrieval of search results against natural-language queries, despite such challenges. This thesis investigates whether, through domain-specific expert crowd-annotation of content on top of automatic semantic annotation (using Linked Open Data sources), the contemporary value of content in scientific repositories can be continually enriched and sustained. A purpose-built annotation, indexing and searching environment has been developed and deployed to a web repository which hosts more than 3,400 heterogeneous web documents. Based on expert crowd annotations, automatic LoD-based named entity extraction and search results evaluations, this research finds that search results retrieval having the crowd-sourced element performs better than retrieval having no crowd-sourced element.
This thesis also shows that a consensus can be reached between the expert and non-expert crowd-sourced annotators on annotating and tagging the content of web repositories, using the controlled vocabulary (typology) and free-text terms and keywords.
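The consensus among crowd annotators described above can be sketched as a simple agreement threshold over proposed tags. This is a hypothetical reduction for illustration only: the annotator names, tags, and threshold are invented, and the thesis combines such crowd tags with automatic LoD-based entity annotations rather than using this rule alone.

```python
# Toy sketch of crowd-annotation consensus: accept a tag when more
# than a given fraction of annotators proposed it. All data invented.
from collections import Counter

def consensus_tags(annotations, threshold=0.5):
    """Keep tags proposed by more than `threshold` of the annotators."""
    counts = Counter(tag for tags in annotations.values() for tag in set(tags))
    quorum = threshold * len(annotations)
    return sorted(tag for tag, n in counts.items() if n > quorum)

annotations = {
    "expert_1": ["ontology", "semantic-web"],
    "expert_2": ["ontology", "linked-data"],
    "expert_3": ["ontology", "semantic-web"],
}
print(consensus_tags(annotations))
```

Tags on which expert and non-expert annotators converge survive the quorum, while idiosyncratic tags are filtered out; this mirrors the consensus finding reported in the abstract.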
APA, Harvard, Vancouver, ISO, and other styles
26

Garcia, Léo Manoel Lopes da Silva [UNESP]. "Investigação e implementação de ferramentas computacionais para otimização de websites com ênfase na descrição de conteúdo." Universidade Estadual Paulista (UNESP), 2011. http://hdl.handle.net/11449/98701.

Full text
Abstract:
When we speak of the evolution of the Web, it might actually be more appropriate to speak of intelligent design. With the Web becoming the primary choice for those who produce and disseminate digital content, more and more people turn their attention to this valuable repository of knowledge. In this environment, search engines have become popular applications, acting as intermediaries between users and the myriad of information, services and resources available on the World Wide Web. In this sense, the Web designer can act decisively, providing a better response in the ranking of search engines. The correct representation of knowledge is the key to the retrieval and effective dissemination of data, information and knowledge. This work presents a study that can bring relevant progress to the users of this large network, seeking to provide a public-domain tool that supports the application of techniques for the semantic description of information on the Web. In the course of the research, we investigated techniques and methodologies capable of optimizing the indexing of Websites by search engines, emphasizing the description of their content, improving their ranking and consequently contributing to the quality of information retrieval conducted through search engines. These techniques were tested on several Websites with satisfactory results; the tool was then implemented and submitted to users for validation. The result of this validation is presented, demonstrating the feasibility of the tool and enumerating new features for future work.
APA, Harvard, Vancouver, ISO, and other styles
27

Lopes, Rodrigo Arthur de Souza Pereira. "Proposta de sistema de busca de jogos eletrônicos pautada em ontologia e semântica." Universidade Presbiteriana Mackenzie, 2011. http://tede.mackenzie.br/jspui/handle/tede/1410.

Full text
Abstract:
Universidade Presbiteriana Mackenzie

With the constant growth in the number of websites, and consequently the increase in content available throughout the Internet, developing search mechanisms that enable access to reliable information has become a complex activity. In this sense, this work presents a review of the behavior of search mechanisms, as well as the manner in which they map information, including the study of ontologies and knowledge bases and of forms of knowledge representation on the Internet. These models integrate the Semantic Web, which constitutes a proposal for the organization of information. Based on these elements, a search mechanism was developed for a specific domain: videogames. This mechanism is based on the classification of electronic games by specialized review websites, from which one may extract information about selected titles. The work is divided into four stages. First, data are extracted from the aforementioned websites for previously selected titles through the use of a webcrawler. Second, the obtained data are analyzed on two fronts, utilizing natural computing as well as power-law concepts. Next, an ontology for videogames is constructed and subsequently published in a knowledge base accessible to the software. Lastly, the actual search mechanism is implemented; it makes use of the knowledge base and brings the user suggestions pertinent to the search, such as related titles or characteristics intrinsic to the games. This work also hopes to present a useful model that may be applied in different domains, such as movies, travel destinations, electronic appliances and software, among others.
APA, Harvard, Vancouver, ISO, and other styles
28

Garcia, Léo Manoel Lopes da Silva. "Investigação e implementação de ferramentas computacionais para otimização de websites com ênfase na descrição de conteúdo /." São José do Rio Preto : [s.n.], 2011. http://hdl.handle.net/11449/98701.

Full text
Abstract:
When we speak of the evolution of the Web, it might actually be more appropriate to speak of intelligent design. With the Web becoming the primary choice for those who produce and disseminate digital content, more and more people turn their attention to this valuable repository of knowledge. In this environment, search engines have become popular applications, acting as intermediaries between users and the myriad of information, services and resources available on the World Wide Web. Here the web designer can act decisively, obtaining a better response in the ranking of search engines. The correct representation of knowledge is the key to the recovery and effective dissemination of data, information and knowledge. This work presents a study that can bring relevant progress to the users of this great network, offering a public-domain tool that supports the application of techniques for the semantic description of information on the Web. In the course of the research we investigated techniques and methodologies capable of optimizing the indexing of websites by search engines, emphasizing the description of their content, improving their ranking and thus contributing to the quality of information retrieval performed through search engines. These techniques were tested on several websites with satisfactory results; the tool was then implemented and submitted to users for validation. The validation results are presented, demonstrating the feasibility of the tool, together with a list of new features for future work. Advisor: João Fernando Marar. Co-advisor: Ivan Rizzo Guilherme. Committee: Edson Costa de Barros Carvalho Filho, Antonio Carlos Sementille. Master's thesis.
APA, Harvard, Vancouver, ISO, and other styles
29

Raje, Satyajeet. "ResearchIQ: An End-To-End Semantic Knowledge Platform For Resource Discovery in Biomedical Research." The Ohio State University, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=osu1354657305.

Full text
APA, Harvard, Vancouver, ISO, and other styles
30

Drivas, Ioannis C. "Improving the Visibility and the Accessibility of Web Services. A User-Centric Approach." Thesis, Linnéuniversitetet, Institutionen för informatik (IK), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-66893.

Full text
Abstract:
The World Wide Web provides a well-established environment in which organizations of any kind can expose their products and services online. However, nothing ensures that the web products or services provided by an organization or enterprise will receive proper visibility and accessibility among internet users. The process of Search Engine Optimization examines the usability of an internet-based system's design, architecture and content in order to improve its visibility and accessibility on the web. A successful SEO process for an internet-based system run by an organization ensures higher recognition, visibility and accessibility for the web services the system provides to internet users. The aim of this study is characterized by three axes. In the first axis, an internet-based system and the web services it provides are examined in order to understand its initial situation regarding visibility and accessibility on the web. In the second axis, the study follows a user-centric approach to how, and in what way, the examined system could be improved based on its users' needs and desires. After capturing the needs and desires that users expressed regarding the usability of the system's design, architecture and content, the third axis takes place: the extracted needs and desires are implemented in the system under examination, in order to understand whether its visibility and accessibility on the World Wide Web have improved. For the completion of these three axes, the Soft Systems Methodology approach is adopted. SSM is an action-oriented process of inquiry which deals with a problematic situation, from finding out about the situation through to taking action to improve it.
Following an interpretative research approach, ten semi-structured interviews took place in order to capture the participants' perceptions and different worldviews regarding the changes they need and desire from the examined system. Moreover, the conduct of three workshops constituted a cornerstone for implementing systemically desirable and culturally feasible changes that all participants could live with, in order to improve the system's visibility and accessibility in the internet world. The results indicate that adopting the participants' needs and desires improved the usability, visibility and accessibility of the internet-based system under examination. Overall, this study firstly contributes to expanding knowledge about the process of improving the visibility and accessibility of internet-based systems and their web services, based on a user-centric approach. Secondly, it works as a practical toolbox for any kind of organization that intends to improve the visibility and accessibility of its current or potential web services on the World Wide Web.
APA, Harvard, Vancouver, ISO, and other styles
31

Andrade, Julietti de. "Interoperabilidade e mapeamentos entre sistemas de organização do conhecimento na busca e recuperação de informações em saúde: estudo de caso em ortopedia e traumatologia." Universidade de São Paulo, 2015. http://www.teses.usp.br/teses/disponiveis/27/27151/tde-29062015-121813/.

Full text
Abstract:
This research presents the development of a method for search and information retrieval in specialized databases, aiming at the production of scientific knowledge in healthcare, with emphasis on Evidence-Based Health. Different techniques were used according to the specificities of each stage: exploratory research, the hypothetical-deductive method, and a qualitative empirical case study. The work mobilizes the theoretical and methodological foundations of Information Science and Health, applying them to areas such as knowledge organization and information retrieval, the Semantic Web, Evidence-Based Health, and scientific methodology. Two experiments were performed: a case study in Orthopedics and Traumatology, in order to identify and establish criteria for the search, retrieval, organization and selection of information, so that these criteria can become part of the methodology of scientific work in healthcare; and an analysis of the kinds of search and retrieval and of the mappings between Knowledge Organization Systems (KOS) available in the Metathesaurus, within the scope of the Unified Medical Language System (UMLS) of the US National Library of Medicine, and in the BioPortal of the National Center for Biomedical Ontology, both in the biomedical field. The UMLS provides access to 151 KOS, and BioPortal provides a set of 302 ontologies. We present proposals for the construction of search strategies using mapped and interoperated Knowledge Organization Systems, as well as for conducting literature searches in the preparation of scientific papers in healthcare.
APA, Harvard, Vancouver, ISO, and other styles
32

Hung-Chien, Chien, and 簡宏傑. "Study and Implementation of a Learning Content Management System Search Engine for Special Education Based on Semantic Web." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/80627884641375096197.

Full text
Abstract:
Master's thesis, Minghsin University of Science and Technology, Institute of Information Management, ROC academic year 94. Computer-assisted teaching and learning has been a growing research trend, driven by the accelerated evolution of information technologies. However, most such work does not address the needs of special education. Individuals receiving special education differ in every aspect of their learning and development; as a result, individualized instruction has been one of the major characteristics of special education. Since there is no common teaching material that fits all special education students, teachers usually have to develop courseware specific to each student (at various grades) on their own. This imposes an extra workload on most special education teachers. Accordingly, the idea of a common repository (a learning content management system, LCMS) for such self-developed courseware, one that helps teachers share it, is appealing, especially to special education teachers. In this research, we propose and implement an intelligent learning content management system that incorporates Semantic Web and ontology mechanisms. In addition, the LCMS provides an interface that accepts output from the DALE computerized IEP (Individualized Educational Program) system. Through these mechanisms, special education teachers can more accurately find courseware that is suitable for their students. At the time of a recent survey, the LCMS we implemented contained more than 1000 units of courseware and had become the most accessed LCMS in Taiwan's special education community.
APA, Harvard, Vancouver, ISO, and other styles
33

Biswas, Amitava. "Semantic Routed Network for Distributed Search Engines." Thesis, 2010. http://hdl.handle.net/1969.1/ETD-TAMU-2010-05-7942.

Full text
Abstract:
Searching for textual information has become an important activity on the web. To satisfy the rising demand and user expectations, search systems should be fast, scalable and deliver relevant results. To decide which objects should be retrieved, search systems should compare holistic meanings of queries and text document objects, as perceived by humans. Existing techniques do not enable correct comparison of composite holistic meanings like: "evidences on role of DR2 gene in development of diabetes in Caucasian population", which is composed of multiple elementary meanings: "evidence", "DR2 gene", etc. Thus these techniques can not discern objects that have a common set of keywords but convey different meanings. Hence we need new methods to compare composite meanings for superior search quality. In distributed search engines, for scalability, speed and efficiency, index entries should be systematically distributed across multiple index-server nodes based on the meaning of the objects. Furthermore, queries should be selectively sent to those index nodes which have relevant entries. This requires an overlay Semantic Routed Network which will route messages, based on meaning. This network will consist of fast response networking appliances called semantic routers. These appliances need to: (a) carry out sophisticated meaning comparison computations at high speed; and (b) have the right kind of behavior to automatically organize an optimal index system. This dissertation presents the following artifacts that enable the above requirements: (1) An algebraic theory, a design of a data structure and related techniques to efficiently compare composite meanings. (2) Algorithms and accelerator architectures for high speed meaning comparisons inside semantic routers and index-server nodes. (3) An overlay network to deliver search queries to the index nodes based on meanings. (4) Algorithms to construct a self-organizing, distributed meaning based index system. 
The proposed techniques can compare composite meanings roughly 10^5 times faster than an equivalent software implementation and existing hardware designs, while the proposed index organization approach can lead to 33% savings in the number of servers and in power consumption for a model search engine with 700,000 servers. Using all these techniques, it is therefore possible to design a Semantic Routed Network with the potential to improve search results and response time while saving resources.
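The distinction the abstract draws between keyword overlap and composite meaning can be illustrated with a toy example. This is not the dissertation's algebraic theory; the structured tuples below are a deliberately simplified stand-in for its meaning representation:

```python
def keyword_sim(a: set, b: set) -> float:
    """Jaccard similarity over bags of keywords."""
    return len(a & b) / len(a | b)

# Two statements built from the same keywords but conveying different
# composite meanings: "role of the DR2 gene in the development of diabetes"
# vs. the reversed (nonsensical, but keyword-identical) statement.
s1_keywords = {"role", "DR2", "gene", "diabetes"}
s2_keywords = {"role", "DR2", "gene", "diabetes"}
s1_meaning = ("role", "DR2 gene", "development of diabetes")
s2_meaning = ("role", "diabetes", "development of DR2 gene")

print(keyword_sim(s1_keywords, s2_keywords))  # 1.0 -- keywords cannot discern them
print(s1_meaning == s2_meaning)               # False -- structured meanings can
```

A keyword index scores the two statements as identical, while any representation that preserves which elementary meaning fills which role immediately separates them, which is the motivation for meaning-based routing.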
APA, Harvard, Vancouver, ISO, and other styles
34

Hsu, Cheng-Jui, and 許承睿. "Assisting Intention Oriented Web Search Based on Semantic Web." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/16036152077072884594.

Full text
Abstract:
Master's thesis, Tamkang University, Department of Information Management, ROC academic year 96. The inaccuracy of internet search based solely on keywords usually results from the enormous scope of the search. To address this problem, this research provides an intention-oriented system to assist the search engine: possible intentions are attached to web pages through an ontology that includes search intentions collected cooperatively from search-engine users. Users obtain intention-oriented search results by adding their search intention in the search interface. In existing search systems, the ability to specify search conditions is one way of expressing the user's intention; however, it is not easy for a computer to process search conditions that may be oversimplified or overly complicated. In summary, by providing an intention-oriented web search system, we expect to help users express their search intention more accurately and obtain the information they require.
APA, Harvard, Vancouver, ISO, and other styles
35

Vassallo, Salvatore. "Frammenti semantici. Riflessioni su descrizioni archivistiche e web semantico: Il caso dell’archivio Giovanni Testori." Thesis, 2010. http://eprints.rclis.org/17365/1/tesisolofronte.pdf.

Full text
Abstract:
This doctoral thesis is about the possibility of putting archival data (for example archival descriptions, authority records and so on) directly into the Semantic Web, in particular using a technology called Topic Maps. Topic Maps are an ISO standard quite similar to RDF. To show that it is possible to express archival data directly in the Semantic Web, I translated all the archival standards (ISAD, ISAAR, ISDIAH, ISDF) into Topic Maps Constraint Language schemas. TMCL is a standard of the Topic Maps family; what is important is that you can declare constraints and inference rules to ensure that a topic map holding archival data is compliant with the archival standards. So, once it is shown that archival data can be expressed in the Semantic Web, what is the main advantage? I think there are three advantages to this approach: you can build flexible and extensible information systems; you can import and export data as linked data; and you can merge data from different fields (for example, archival authority records and library authority records). To that end I also created crosswalks between different standards; for example, I mapped FRAD to ISAAR. For import and export I created XSL-T stylesheets to convert EAC and EAD to Topic Maps, and I released them as open source on Google Code. Lastly, I proposed a strong idea for developing flexible information systems (for example, software to manage a digital library, or an archival information system such as the one used by national archives). In those systems data would not be stored in a database but directly in RDF triples or, why not, in a topic map. This allows the system to be extended or changed without a database change (so you can add or change a descriptive field without touching the database).
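The TMCL idea of declaring constraints over archival topics can be sketched in miniature. The schema and field names below are illustrative assumptions, not actual TMCL syntax or the thesis's schemas:

```python
# Hypothetical miniature of a TMCL-style constraint: a schema declares which
# occurrence types a topic of a given type must carry, and instances are
# validated against it before entering the topic map.
SCHEMA = {
    "archival-description": {"required": {"identifier", "title", "level-of-description"}},
}

def validate(topic_type: str, occurrences: set) -> list:
    """Return the required occurrence types missing from a topic, sorted."""
    required = SCHEMA.get(topic_type, {}).get("required", set())
    return sorted(required - occurrences)

fonds = {"identifier", "title", "level-of-description", "extent"}
draft = {"title"}
print(validate("archival-description", fonds))  # [] -- compliant
print(validate("archival-description", draft))  # ['identifier', 'level-of-description']
```

Real TMCL expresses such rules declaratively inside the topic map itself, so that any conforming engine can enforce them; the sketch only conveys the validation idea.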
APA, Harvard, Vancouver, ISO, and other styles
36

Nien, Tsung-Kai, and 粘琮凱. "A Multilayer-Based Fully Skyline Computation Algorithm for Web Search Engines." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/55564410041444620277.

Full text
Abstract:
Master's thesis, Fu Jen Catholic University, Department of Information Management, ROC academic year 101. Web page data keeps growing without any structure to organize it, so we need web search engines to retrieve and sift out the information we really want. However, current web search engine algorithms each rely on a limited set of subjective weighting factors, and the keywords users enter may not be specific or precise enough to express their preferences and needs, which leads to divergent search results and unwanted information. Skyline is the only multi-dimensional measuring method available, so this research applies the idea of a multilayer complete skyline to develop a new web search algorithm. We demonstrate the feasibility and meaning of the multilayer complete skyline on real data sets. Because the computation involves no weighting functions, its objectivity is assured; moreover, the proposed multilayer complete skyline algorithm guides users to select web items layer by layer, adapting to the fuzzy and drifting preference patterns of user search and helping them find their preferences and needs gradually.
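The layer-by-layer idea can be sketched as repeated skyline peeling: layer 1 is the set of non-dominated items, layer 2 is the skyline of what remains, and so on. This is a minimal illustration under the assumption that every dimension is larger-is-better, not the thesis's actual algorithm:

```python
from typing import List, Tuple

Point = Tuple[float, ...]

def dominates(a: Point, b: Point) -> bool:
    """a dominates b if a is at least as good in every dimension
    and strictly better in at least one (larger is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def skyline_layers(points: List[Point], layers: int) -> List[List[Point]]:
    """Peel off successive skylines from the point set."""
    remaining = list(points)
    result = []
    for _ in range(layers):
        if not remaining:
            break
        layer = [p for p in remaining
                 if not any(dominates(q, p) for q in remaining if q != p)]
        result.append(layer)
        remaining = [p for p in remaining if p not in layer]
    return result

# Toy web items scored on two dimensions, e.g. (relevance, freshness):
pages = [(0.9, 0.2), (0.7, 0.7), (0.3, 0.9), (0.5, 0.5), (0.2, 0.1)]
print(skyline_layers(pages, 2))
# → [[(0.9, 0.2), (0.7, 0.7), (0.3, 0.9)], [(0.5, 0.5)]]
```

No weights appear anywhere in the computation, which is the objectivity property the abstract emphasizes; the user then drills from one layer into the next instead of tuning a ranking formula.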
APA, Harvard, Vancouver, ISO, and other styles
37

(14030507), Deepani B. Guruge. "Effective document clustering system for search engines." Thesis, 2008. https://figshare.com/articles/thesis/Effective_document_clustering_system_for_search_engines/21433218.

Full text
Abstract:
People use web search engines to fill a wide variety of navigational, informational and transactional needs. However, current major search engines on the web retrieve a large number of documents of which only a small fraction are relevant to the user query. The user then has to manually search for relevant documents by traversing a topic hierarchy into which a collection is categorised. As more information becomes available, searching for the required relevant information becomes a time-consuming task.
This research develops an effective tool, the web document clustering (WDC) system, to cluster, and then rank, the output data obtained from queries submitted to a search engine into three pre-defined fuzzy clusters, namely closely related, related and not related. Documents in the closely related and related clusters are ranked based on their context.
The WDC output has been compared against document clustering results from the Google, Vivisimo and Dogpile systems, as these were considered the best at the fourth Search Engine Awards [24]. Test data came from standard document sets, such as the TREC-8 [118] data files and the Iris database [38], or from three test text-retrieval tasks: "Latex", "Genetic Algorithms" and "Evolutionary Algorithms". Our proposed system had results as good as, or better than, those obtained by these other systems. We have shown that the proposed system can effectively and efficiently locate closely related, related and not related documents among the retrieved document set for queries submitted to a search engine.
We developed a methodology to supply the user with a list of keywords filtered from the initial search result set to further refine the search. Again we tested our clustering results against the Google, Vivisimo and Dogpile systems. In all cases we have found that our WDC performs as well as, or better than, these systems.
The contributions of this research are:
1. A post-retrieval fuzzy document clustering algorithm that groups documents into closely related, related and not related clusters. This algorithm uses a modified fuzzy c-means (FCM) algorithm to cluster documents into predefined intelligent fuzzy clusters, an approach that has not been used before.
2. The fuzzy WDC system satisfies the user's information need as far as possible by allowing the user to reformulate the initial query. The system prepares an initial word list by selecting a few characteristic high-frequency terms from the first twenty documents in the initial search engine output. The user is then able to use these terms to input a secondary query. The WDC system then creates a second word list, the context of the user query (COQ), from the closely related documents to provide training data to refine the search. Documents containing high-frequency words from the training list, based on a pre-defined threshold value, are then presented to the user to refine the search by reformulating the query. In this way the context of the user query is built, enabling the user to learn from the keyword list. This approach is not available in current search engine technology.
3. A number of modifications were made to the FCM algorithm to improve its performance in web document clustering. A factor sw_kq is introduced into the membership function as a measure of the amount of overlap between the components of the feature vector and the cluster prototype. As the FCM algorithm is greatly affected by the values used to initialise the components of the cluster prototypes, a machine learning approach using an evolutionary algorithm was employed to resolve the initialisation problem.
4. Experimental results indicate that the WDC system outperformed the Google, Dogpile and Vivisimo search engines. The post-retrieval fuzzy web document clustering algorithm designed in this research improves the precision of web searches, and it also contributes to the knowledge of document retrieval using fuzzy logic.
5. A relational data model was used to automatically store data output from the search engine off-line. This takes the processing of Internet data off-line, saving resources and making better use of the local CPU.
6. The algorithm uses Latent Semantic Indexing (LSI) to rank documents in the closely related and related clusters. Using LSI to rank documents is well known; however, we are the first to apply it in the context of ranking closely related documents by using the COQ to form the term-by-document matrix in LSI, obtaining better ranking results.
7. Adjustments based on document size are proposed for dealing with problems associated with varying document sizes in the retrieved documents and the effect this has on cluster analysis.
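As background, the textbook FCM update that contribution 3 modifies looks roughly as follows. This is a sketch of the standard algorithm only; the sw_kq overlap factor and the evolutionary initialisation described above are not included:

```python
import numpy as np

def fcm_memberships(X: np.ndarray, centers: np.ndarray, m: float = 2.0) -> np.ndarray:
    """Standard fuzzy c-means membership update:
    u[i, k] = 1 / sum_j (d_ik / d_jk)^(2/(m-1)),
    where d_ik is the distance from point k to prototype i."""
    # Pairwise distances, shape (clusters, points).
    d = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2)
    d = np.fmax(d, 1e-12)  # guard against division by zero
    ratio = (d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))  # (c, c, n)
    return 1.0 / ratio.sum(axis=1)

def fcm_step(X: np.ndarray, centers: np.ndarray, m: float = 2.0):
    """One FCM iteration: update memberships, then cluster prototypes."""
    u = fcm_memberships(X, centers, m)
    w = u ** m
    new_centers = (w @ X) / w.sum(axis=1, keepdims=True)
    return u, new_centers
```

Because the outcome depends heavily on the initial prototypes (the sensitivity contribution 3 addresses with an evolutionary algorithm), in practice the step is iterated from several starting points until the memberships stabilise.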
APA, Harvard, Vancouver, ISO, and other styles
38

Liew, Ji Seok. "Web-based distributed applications for cytosensor." Thesis, 2003. http://hdl.handle.net/1957/31587.

Full text
Abstract:
To protect the environment and save human lives, the detection of various hazardous toxins of biological or chemical origin has been a major challenge for researchers at Oregon State University. Living fish cells can indicate the presence of a wide range of toxins through reactions such as changes in color and shape. A research team in the Electrical and Computer Engineering Department is developing a hybrid detection device (Cytosensor) that combines a biological reaction with digital technology. The functions of the Cytosensor can be divided into three parts: real-time image acquisition, data processing, and statistical data analysis. User-friendly Web-Based Distributed Applications (WBDA) for the Cytosensor offer various utilities. WBDA allow users to control and observe the local Cytosensor, search and retrieve data acquired by the sensor network, and process the acquired images remotely using only a web browser. Additionally, these applications minimize the user's exposure to dangerous chemicals or biological products. This thesis describes the design of a remote controller, system observer, remote processor and search engine using Java applets, XML, Perl, MATLAB and peer-to-peer models. Furthermore, the implementation of an image segmentation technique in MATLAB and of the machine vision algorithm in Java for independent web-based processing is investigated. Graduation date: 2003.
APA, Harvard, Vancouver, ISO, and other styles
39

"M&A2: a complete associative word network based Chinese document search engine." 2001. http://library.cuhk.edu.hk/record=b5890824.

Full text
APA, Harvard, Vancouver, ISO, and other styles
40

Πλέγας, Ιωάννης. "Αλγόριθμοι και τεχνικές εξατομικευμένης αναζήτησης σε διαδικτυακά περιβάλλοντα με χρήση υποκείμενων σημασιολογιών". Thesis, 2013. http://hdl.handle.net/10889/6465.

Full text
Abstract:
The tremendous growth of the Web in recent decades has made searching for information one of the most important research issues in information technology. Today, modern search engines respond quite well to user queries, but the top results returned are not always relevant to the data the user is looking for. Search engines therefore make significant efforts to place the results most relevant to the user at the top of the ranked result list. This thesis mainly deals with this problem: ranking the results most relevant to the user in the top positions of the list, especially for queries whose terms have multiple meanings. In the context of this research, algorithms and techniques were developed based on relevance feedback to improve the results returned by a search engine. The main source of feedback is the results the user selects during navigation: the user extends the original search information (keywords) with new information derived from the chosen results. Given this new set of information about the user's preferences, its semantic content is compared with the remaining results (those returned before the selection was made), and the order of the results is changed by promoting and suggesting the results most relevant to the new information set. Another problem that must be addressed when users submit queries to a search engine is that the queries are usually short and ambiguous.
Therefore, there must be ways to disambiguate the different concepts/senses and ways to find the concept/sense that interests the user. Disambiguation of the search terms is a process that has been studied in the literature in several different ways. This work proposes new strategies to disambiguate the senses/concepts of the search terms and explore their efficiency in search engines. Their innovation is the use of PageRank as an indicator of the importance of a sense/concept for a query term. Another technique that exploits semantics in our work is the use of text annotation. The use of text annotation is a technique that assigns to the words of the text extra information such as the meaning assigned to each word based on the semantic content of the text. Assigning additional semantic information in a text helps users and search engines to seek or describe better the text information. In my thesis, techniques for improving the automatic annotation of small texts with entities from Wikipedia are presented, a process that referred in the literature as Wikification. It is widely known that the Web contain documents with the same information and documents with almost identical information. Despite the efforts of the search engine’s algorithms to find the results that contain repeated information; there are still cases where the results retrieved by a search engine contain repeated information. In this work effective techniques are presented that find and cut the repeated information from the results of the search engines. Specifically, the results that contain the same information are removed, and the results that contain repeated information are merged into new texts (SuperTexts) that contain the information of the initial results without the repeated information. Another part of this work tries to exploit the semantic information of search engine’s results using tools of the Semantic Web. 
The goal of the Semantic Web is to make the resources of the Web understandable to humans and machines. The Semantic Web in their first steps functioned as a detailed description of the body of the Web documents. The development of tools for querying Semantic Web is still in its infancy. The current search techniques are not adapted to the indexing and retrieval of semantic information with a few exceptions. In our research we have created efficient techniques and tools for using the Semantic Web. Specifically an algorithm was constructed that converts to ontology the search engine’s results integrating semantic and syntactic information in order to answer natural language questions. Also this paper contains XML filtering techniques that use semantic information. Specifically, an efficient distributed system is proposed for the semantic filtering of XML documents that gives better results than the existing approaches. Finally as part of this thesis is additional research that improves the performance of the search engines from a different angle. It is presented a technique for cutting the inverted lists of the inverted files. Specifically a combination of the proposed technique with existing compression techniques is achieved, leading to better compression results than the existing ones.
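The relevance-feedback re-ranking this abstract describes can be sketched roughly as follows. This is an illustrative bag-of-words approximation under stated assumptions, not the thesis's actual algorithm: all function names and the example snippets are invented for the sketch.

```python
# Hypothetical sketch: results the user clicks extend the query profile,
# and the remaining results are re-ordered by similarity to that profile.
from collections import Counter
from math import sqrt

def vectorize(text):
    """Bag-of-words term-frequency vector for a result snippet."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rerank(query, clicked_snippets, remaining_snippets):
    """Promote remaining results most similar to the profile built
    from the query plus the snippets the user has clicked."""
    profile = vectorize(query)
    for snippet in clicked_snippets:
        profile += vectorize(snippet)          # feedback extends the query
    return sorted(remaining_snippets,
                  key=lambda s: cosine(profile, vectorize(s)),
                  reverse=True)

results = rerank("jaguar speed",
                 ["the jaguar cat can reach high speed when hunting"],
                 ["jaguar car top speed and engine specs",
                  "jaguar animal speed hunting in the wild"])
print(results[0])  # the animal sense is promoted after the click
```

After one click on an animal-related result, the ambiguous query "jaguar speed" is biased toward the animal sense, which is the behaviour the abstract attributes to its feedback loop.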
APA, Harvard, Vancouver, ISO, and other styles
41

Trzmielewski, Marcin. "Valorisation de l’information et des services du learning center à travers un portail de ressource." Thesis, 2018. http://eprints.rclis.org/33538/1/Marcin%20Trzmielewski_M%C3%A9moire%20professionnel_Master%202%20GIMD_2017-2018.pdf.

Full text
Abstract:
This report focuses on the relevance of a documentary portal for promoting documentary and educational resources as well as innovative learning center services. Through the present work, we assess different digital tools available on the market (OPACs, discovery tools, protocols for bibliographic metadata exchange, Content Management Systems) that deal with specialized information and with the new dimensions exploited by augmented libraries. We also discuss future challenges for specialists in information and documentation, brought about by the migration of online catalogues into the cloud and by general search engines.
APA, Harvard, Vancouver, ISO, and other styles
42

Mooman, Abdelniser. "Multi-Agent User-Centric Specialization and Collaboration for Information Retrieval." Thesis, 2012. http://hdl.handle.net/10012/6991.

Full text
Abstract:
The amount of information on the World Wide Web (WWW) is growing rapidly in pace and topic diversity. This has made it increasingly difficult, and often frustrating, for information seekers to retrieve the content they are looking for, as information retrieval systems (e.g., search engines) are unable to decipher the relevance of retrieved information as it pertains to what they are searching for. This issue can be decomposed into two aspects: 1) Variability of information relevance as it pertains to an information seeker. Different information seekers may enter the same search text, or keywords, but expect completely different results. It is therefore imperative that information retrieval systems possess the ability to incorporate a model of the information seeker in order to estimate the relevance and context of use of information before presenting results. In this context, by a model we mean the capture of trends in the information seeker's search behaviour. This is what many researchers refer to as personalized search. 2) Information diversity. Information available on the World Wide Web today spans multitudes of inherently overlapping topics, and it is difficult for any information retrieval system to decide effectively on the relevance of the information retrieved in response to an information seeker's query. For example, an information seeker who wishes to use the WWW to learn about a cure for a certain illness would receive a more relevant answer if the search engine were optimized for such topic domains. This is what is referred to in WWW nomenclature as 'specialized search'. This thesis maintains that the information seeker's search is not completely random and therefore tends to show consistent patterns of behaviour. Nonetheless, this behaviour, despite being consistent, can be quite complex to capture.
To accomplish this goal the thesis proposes a Multi-Agent Personalized Information Retrieval with Specialization Ontology (MAPIRSO). MAPIRSO offers a complete learning framework that is able to model the end user's search behaviour and interests and to organize information into categorized domains so as to ensure maximum relevance of its responses to the end user's queries. Specialization and personalization are accomplished using a group of collaborative agents. Each agent employs a Reinforcement Learning (RL) strategy to capture the end user's behaviour and interests. Reinforcement learning allows the agents to evolve their knowledge of the end user's behaviour and interests as they serve him or her, and to adapt to changes in that behaviour and those interests. Specialization is the process by which new information domains are created based on existing information topics, allowing new kinds of content to be built exclusively for information seekers. One of the key characteristics of specialization domains is that they are seeker-centric, which allows intelligent agents to create new information based on information seekers' feedback and behaviours. Specialized domains are created by intelligent agents that collect information from a specific domain topic. The task of these specialized agents is to map the user's query to a repository of specific domains in order to present users with relevant information. Mapping users' queries to only relevant information is one of the fundamental challenges in Artificial Intelligence (AI) and machine learning research. Our approach employs intelligent cooperative agents that specialize in building personalized ontology information domains that pertain to each information seeker's specific needs.
Specializing and categorizing information into unique domains is one of the challenge areas that has been addressed, and various proposed solutions have been evaluated and adopted to cope with growing information. However, categorizing information into unique domains does not satisfy each individual information seeker. Information seekers might search for similar topics, but each has different interests. For example, medical information from a specific medical domain has different importance to a doctor and to patients. The thesis presents a novel solution to growing and diverse information: building seeker-centric specialized information domains that are personalized through information seekers' feedback and behaviours. To address this challenge, the research examines the fundamental components that constitute the specialized agent: an intelligent machine learning system, user input queries, an intelligent agent, and information resources constructed through specialized domains. Experimental work is reported to demonstrate the efficiency of the proposed solution in addressing overlapping information growth. The experimental work utilizes extensive user-centric specialized domain topics, and employs personalized and collaborative multiple learning agents and ontology techniques, thereby enriching the user's queries and domains. Experiments and results have shown that building specialized ontology domains pertinent to information seekers' needs yields more precise and efficient retrieval than other information retrieval applications and existing search engines.
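A minimal sketch of the reinforcement-learning idea this abstract describes, with each agent nudging per-topic interest estimates toward observed user feedback, might look like the following. MAPIRSO's actual agents are far richer; every class name, topic label, and constant here is an assumption made for illustration.

```python
# Illustrative sketch (not MAPIRSO itself): an agent keeps per-topic
# interest estimates for a user and updates them from click feedback.
class InterestAgent:
    def __init__(self, topics, alpha=0.2):
        self.alpha = alpha                     # learning rate (assumed)
        self.q = {t: 0.0 for t in topics}      # estimated interest per topic

    def update(self, topic, reward):
        """Move the interest estimate toward the observed reward
        (1.0 = result clicked, 0.0 = result ignored)."""
        self.q[topic] += self.alpha * (reward - self.q[topic])

    def ranked_topics(self):
        """Topics ordered by current estimated interest."""
        return sorted(self.q, key=self.q.get, reverse=True)

agent = InterestAgent(["medicine", "sports", "finance"])
for _ in range(5):
    agent.update("medicine", 1.0)   # user keeps clicking medical results
agent.update("sports", 0.0)         # user ignores a sports result
print(agent.ranked_topics()[0])     # medicine now ranks first
```

The exponential-moving-average update lets the estimate track a user whose interests drift over time, which is the adaptation property the abstract attributes to the RL strategy.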
APA, Harvard, Vancouver, ISO, and other styles
43

Woodcock-Reynolds, Hilary Julian. "The use of browser based resources for literature searches in the postgraduate cohort of the Faculty of Humanities, Development and Social Sciences (HDSS) at the Howard College Campus of the University of KwaZulu-Natal." Thesis, 2011. http://hdl.handle.net/10413/7784.

Full text
Abstract:
The research reflected here examined in depth how one cohort of learners viewed and engaged in literature searches using web browser-based resources. Action research was employed using a mixed-methods approach. The research started with a survey, followed by interviews and a screencast examining practice based on a series of search-related exercises. These were analysed and used as data to establish what deficits existed in the target group's use of the web to search for literature. Based on the analysis of these instruments, the problem was redefined and a workshop intended to help remediate the deficiencies uncovered was run. Based on this, a recommendation is made that a credit-bearing course teaching digital research literacy, including information literacy as a component, be made available.<br>Thesis (M.A.)-University of KwaZulu-Natal, Durban, 2011.
APA, Harvard, Vancouver, ISO, and other styles