
Dissertations / Theses on the topic 'Latent semantic analysis (LSA)'


Consult the top 50 dissertations / theses for your research on the topic 'Latent semantic analysis (LSA).'


1

Makovoz, Gennadiy. "Latent Semantic Analysis as a Method of Content-Based Image Retrieval in Medical Applications." NSUWorks, 2010. http://nsuworks.nova.edu/gscis_etd/227.

Abstract:
The research investigated whether a Latent Semantic Analysis (LSA)-based approach to image retrieval can map pixel intensity into a smaller concept space with good accuracy and reasonable computational cost. From a large set of computed tomography (CT) images, a retrieval query found all images for a particular patient based on semantic similarity. The effectiveness of the LSA retrieval was evaluated based on precision, recall, and F-score. This work extended the application of LSA to high-resolution CT radiology images. The images were chosen for their unique characteristics and their importance in medicine. Because CT images are intensity-only, they carry less information than color images. They typically have greater noise, higher intensity, greater contrast, and fewer colors than a raw RGB image. The study targeted intensity levels for image feature extraction. The focus of this work was a formal evaluation of the LSA method in the context of a large number of high-resolution radiology images. The study reported on preprocessing and retrieval time and discussed how reduction of the feature set size affected the results. LSA is an information retrieval technique that is based on the vector-space model. It works by reducing the dimensionality of the vector space, bringing similar terms and documents closer together. Matlab software was used to report on retrieval and preprocessing time. In determining the minimum size of the concept space, it was found that the best combination of precision, recall, and F-score was achieved with 250 concepts (k = 250). This research reported precision of 100% on all queries and recall close to 90% on all queries with k = 250. Selecting a higher number of concepts did not improve recall and resulted in significantly increased computational cost.
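For readers who want to experiment with the idea, a minimal sketch of this kind of retrieval pipeline follows, in Python with NumPy and scikit-learn. The intensity-histogram features, the bin count, and k = 2 concepts are illustrative assumptions; the thesis itself used Matlab and found k = 250 optimal on its CT collection.

import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

def intensity_features(images, n_bins=64):
    # Each image becomes a histogram of pixel intensities ("terms" are intensity bins).
    return np.stack([np.histogram(img, bins=n_bins, range=(0, 256))[0]
                     for img in images]).astype(float)

def lsa_retrieve(images, query_idx, k=2, top_n=5):
    # Map intensity features into a k-dimensional concept space, rank by cosine similarity.
    concepts = TruncatedSVD(n_components=k, random_state=0).fit_transform(intensity_features(images))
    sims = cosine_similarity(concepts[query_idx:query_idx + 1], concepts).ravel()
    return [i for i in np.argsort(-sims) if i != query_idx][:top_n]

def precision_recall_f(retrieved, relevant):
    hits = len(set(retrieved) & set(relevant))
    p = hits / len(retrieved) if retrieved else 0.0
    r = hits / len(relevant) if relevant else 0.0
    return p, r, (2 * p * r / (p + r) if p + r else 0.0)

# Toy demonstration with synthetic "slices"; real use would load DICOM pixel data.
rng = np.random.default_rng(0)
images = [rng.integers(0, 256, size=(64, 64)) for _ in range(20)]
print(lsa_retrieve(images, query_idx=0))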
2

Natividad, Beltrán del Río Gloria Ofelia. "An Analysis of Educational Technology Publications: Who, What and Where in the Last 20 Years." Thesis, University of North Texas, 2016. https://digital.library.unt.edu/ark:/67531/metadc849761/.

Abstract:
This exploratory and descriptive study examines research articles published in ten of the top journals in the broad area of educational technology during the last 20 years: 1) Educational Technology Research and Development (ETR&D); 2) Instructional Science; 3) Journal of the Learning Sciences; 4) TechTrends; 5) Educational Technology: The Magazine for Managers of Change in Education; 6) Journal of Educational Technology & Society; 7) Computers and Education; 8) British Journal of Educational Technology (BJET); 9) Journal of Educational Computing Research; and 10) Journal of Research on Technology in Education. To discover research trends in the articles published from 1995-2014, abstracts from all contributing articles published in those ten prominent journals were analyzed to extract a latent semantic space of broad research areas, top authors, and top-cited publications. Concepts that have emerged, grown, or diminished in the field were noted in order to identify the most dominant in the last two decades; and the most frequent contributors to each journal as well as those who contributed to more than one of the journals studied were identified.
3

Hossain, Muhammad Muazzem. "Investigating the relationship between the business performance management framework and the Malcolm Baldrige National Quality Award framework." Thesis, University of North Texas, 2009. https://digital.library.unt.edu/ark:/67531/metadc11034/.

Abstract:
The business performance management (BPM) framework helps an organization continuously adjust and successfully execute its strategies. BPM helps increase flexibility by providing managers with an early alert about changes and, as a result, allows faster response to such changes. The Malcolm Baldrige National Quality Award (MBNQA) framework provides a basis for self-assessment and a systems perspective for managing an organization's key processes for achieving business results. The MBNQA framework is a more comprehensive framework and encapsulates the underlying constructs in the BPM framework. The objectives of this dissertation are fourfold: (1) to validate the underlying relationships presented in the 2008 MBNQA framework, (2) to explore the MBNQA framework at the dimension level, and develop and test constructs measured at that level in a causal model, (3) to validate and create a common general framework for the business performance model by integrating the practitioner literature with basic theory including existing MBNQA theory, and (4) to integrate the BPM framework and the MBNQA framework into a new framework (BPM-MBNQA framework) that can guide organizations in their journey toward achieving and sustaining competitive and strategic advantages. The purpose of this study is to achieve these objectives by means of a combination of methodologies including literature reviews, expert opinions, interviews, presentation feedback, content analysis, and latent semantic analysis. An initial BPM framework was developed based on the reviews of literature and expert opinions. There is a paucity of academic research on business performance management. Therefore, this study reviewed the practitioner literature on BPM and, from the numerous organization-specific BPM models, developed a generic, conceptual BPM framework. With the intent of obtaining valuable feedback, this initial BPM framework was presented to Baldrige Award recipients (BARs) and selected academicians from across the United States who participated in the Fall Summit 2007 held at Caterpillar Financial Headquarters in Nashville, TN, on October 1 and 2, 2007. Incorporating the feedback from that group allowed the proposed BPM framework to be refined and improved. This study developed a variant of traditional latent semantic analysis (LSA) called causal latent semantic analysis (cLSA) that enables us to test causal models using textual data. This method was used to validate the 2008 MBNQA framework based on article abstracts on the Baldrige Award and program published in both practitioner and academic journals from 1987 to 2009. The cLSA was also used to validate the BPM framework using the full body text data from all articles published in the practitioner journal entitled Business Performance Management Magazine since its inception in 2003. The results provide the first cLSA study of these frameworks. This is also the first study to examine all the causal relationships within the MBNQA and BPM frameworks.
4

SANTOS, João Carlos Alves dos. "Avaliação automática de questões discursivas usando LSA." Universidade Federal do Pará, 2016. http://repositorio.ufpa.br/jspui/handle/2011/7485.

Abstract:
This work investigates the use of a Latent Semantic Analysis (LSA) model for the automatic assessment of short answers, averaging 25 to 70 words, to open-ended (discursive) questions. With the emergence of virtual learning environments, research on automatic grading has become more relevant, since it allows low-cost mechanical grading of open questions. In addition, automatic grading provides instant feedback and eliminates manual grading work, which makes it possible to run virtual classes with large numbers of students (hundreds or thousands). Research on the automatic assessment of texts has been carried out since the 1960s, but only in the current decade has it reached the accuracy needed for practical use in educational institutions. For end users to have confidence in such systems, the research challenge is to develop assessment systems that are robust and whose accuracy approaches that of human graders. Although some studies point in this direction, many aspects remain to be explored. One is the use of bigrams with LSA: even if it contributes little to accuracy, it contributes to robustness, which we can define as reliability, because it takes the order of words within the text into account. Seeking to refine an LSA model so as to improve accuracy and increase robustness, we worked in four directions: first, we included word bigrams in the LSA model; second, we combined unigram and bigram co-occurrence models using multiple linear regression; third, we added an adjustment step that corrects the LSA model's score based on the number of words in the assessed answers; fourth, we analyzed the distribution of the scores assigned by the LSA model against human graders. To evaluate the results, we compared the accuracy of the system against the accuracy of human graders, checking how closely the system approaches a human grader. We used an LSA model with five stages: 1) preprocessing, 2) weighting, 3) singular value decomposition, 4) classification, and 5) model adjustments. For each stage, alternative strategies that influenced the final accuracy were explored. In the experiments we obtained an accuracy of 84.94% in a comparative evaluation against human experts, where the accuracy correlation between human experts was 84.93%. In the domain studied, the automatic assessment technology produced results close to those of the human graders, showing that it is reaching a degree of maturity suitable for use in automatic assessment systems in virtual learning environments.
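To make the five-stage pipeline concrete, here is a minimal sketch in Python with scikit-learn. The unigram+bigram vectorizer, TF-IDF weighting, k = 2 concepts and the maximum-similarity scoring rule are assumptions for illustration; the thesis additionally adjusts scores by answer length and combines the unigram and bigram models with multiple linear regression, which is omitted here.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

def lsa_grade(reference_answers, student_answers, k=2, max_score=10.0):
    # Stages: (1) preprocessing/vectorization with unigrams and bigrams, (2) weighting,
    # (3) truncated SVD, (4) scoring by cosine similarity to the closest reference answer.
    corpus = list(reference_answers) + list(student_answers)
    X = TfidfVectorizer(ngram_range=(1, 2)).fit_transform(corpus)
    space = TruncatedSVD(n_components=k, random_state=0).fit_transform(X)
    refs, students = space[:len(reference_answers)], space[len(reference_answers):]
    sims = cosine_similarity(students, refs).max(axis=1)
    return np.clip(sims, 0.0, 1.0) * max_score   # stage (5), model adjustments, is omitted

refs = ["LSA reduces the dimensionality of the term space using singular value decomposition.",
        "Singular value decomposition maps terms and documents into a smaller concept space."]
answers = ["The method applies SVD to reduce the term space to a few latent concepts.",
           "Photosynthesis converts sunlight into chemical energy in plants."]
print(lsa_grade(refs, answers))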
5

Kučaidze, Artiom. "Tinklalapio navigavimo asociacijų analizės ir prognozavimo modelis." Master's thesis, Lithuanian Academic Libraries Network (LABT), 2009. http://vddb.library.lt/obj/LT-eLABa-0001:E.02~2008~D_20090908_201802-74260.

Abstract:
Based on information foraging theory, this work develops a model for analyzing and predicting the navigation associations (information scent) of a website. The goal of the model is to simulate potential website users and their information foraging paths given a defined information goal. The model is built by combining the LSA and SVD algorithms with correlation coefficient calculations: the LSA algorithm is used to create semantic spaces, and the correlation coefficients are used for the statistical comparison; together they allow the model to analyze the semantic similarity of words. The work identifies the main problems website visitors may face when forming navigation associations: competition between links, misleading links, and unfamiliar links. It is demonstrated how the developed model recognizes and analyzes these problems.
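A rough illustration of the underlying computation, in Python with scikit-learn: the information goal and the link labels are projected into an LSA space built from the site's page texts, and links are ranked by similarity to the goal. The toy corpus, k = 2 dimensions and the plain ranking are assumptions; the thesis additionally uses correlation coefficients and distinguishes competing, misleading and unfamiliar links.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

def rank_links_by_scent(goal, link_labels, page_texts, k=2):
    # Build a semantic space from the site's pages, then compare the goal with every link label.
    corpus = list(page_texts) + list(link_labels) + [goal]
    X = TfidfVectorizer().fit_transform(corpus)
    space = TruncatedSVD(n_components=k, random_state=0).fit_transform(X)
    links, goal_vec = space[len(page_texts):-1], space[-1:]
    sims = cosine_similarity(goal_vec, links).ravel()
    return sorted(zip(link_labels, sims), key=lambda pair: -pair[1])

pages = ["shipping rates and delivery times for orders",
         "customer account settings and saved payment methods",
         "product catalogue with prices and availability"]
print(rank_links_by_scent("how long does delivery take", ["Shipping", "My account", "Products"], pages))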
6

Belica, Michal. "Metody sumarizace dokumentů na webu." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2013. http://www.nusl.cz/ntk/nusl-236386.

Abstract:
The work deals with automatic summarization of documents in HTML format; Czech was chosen as the language of the web documents. The project is focused on algorithms for text summarization. The work also includes document preprocessing for summarization and conversion of text into a representation suitable for summarization algorithms. General text mining is also briefly discussed, but the project is mainly focused on automatic document summarization. Two simple summarization algorithms are introduced. Then, the main attention is paid to an advanced algorithm that uses latent semantic analysis. The result of the work is a design and implementation of a summarization module for the Python language. The final part of the work contains an evaluation of the summaries generated by the implemented summarization methods and a subjective comparison of them by the author.
7

Ozsoy, Makbule Gulcin. "Text Summarization Using Latent Semantic Analysis." Master's thesis, METU, 2011. http://etd.lib.metu.edu.tr/upload/12612988/index.pdf.

Abstract:
Text summarization solves the problem of presenting the information needed by a user in a compact form. There are different approaches to creating well-formed summaries in the literature. One of the newest methods in text summarization is the Latent Semantic Analysis (LSA) method. In this thesis, different LSA-based summarization algorithms are explained and two new LSA-based summarization algorithms are proposed. The algorithms are evaluated on Turkish and English documents, and their performances are compared using their ROUGE scores.
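The classical LSA extraction idea that this line of work builds on (Gong and Liu's approach: SVD of a term-sentence matrix, then picking the sentence that loads most strongly on each leading latent topic) can be sketched in a few lines of Python. The TF-IDF weighting and the one-sentence-per-topic rule are simplifying assumptions, not the thesis's proposed algorithms.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def lsa_summary(sentences, n_sentences=2):
    # Term-sentence matrix A (terms x sentences); A = U S V^T, rows of V^T are latent topics.
    A = TfidfVectorizer().fit_transform(sentences).toarray().T
    _, _, vt = np.linalg.svd(A, full_matrices=False)
    chosen = []
    for topic in vt:                      # topics in order of decreasing singular value
        idx = int(np.argmax(np.abs(topic)))
        if idx not in chosen:
            chosen.append(idx)
        if len(chosen) == n_sentences:
            break
    return [sentences[i] for i in sorted(chosen)]

doc = ["Latent semantic analysis maps sentences into a low-dimensional topic space.",
       "The topic space is obtained from a singular value decomposition of the term matrix.",
       "Summarization then selects the sentences that best represent the main topics.",
       "The selected sentences are concatenated to form the summary."]
print(lsa_summary(doc))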
8

Anaya, Leticia H. "Comparing Latent Dirichlet Allocation and Latent Semantic Analysis as Classifiers." Thesis, University of North Texas, 2011. https://digital.library.unt.edu/ark:/67531/metadc103284/.

Abstract:
In the Information Age, a proliferation of unstructured electronic text documents exists. Processing these documents by humans is a daunting task, as humans have limited cognitive abilities for processing large volumes of documents that can often be extremely lengthy. To address this problem, text data computer algorithms are being developed. Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) are two such algorithms that have received much attention individually in the text data literature for topic extraction studies, but not for document classification or for comparison studies. Since classification is considered an important human function and has been studied in the areas of cognitive science and information science, in this dissertation a research study was performed to compare LDA, LSA and humans as document classifiers. The research questions posed in this study are: R1: How accurate are LDA and LSA in classifying documents in a corpus of textual data over a known set of topics? R2: How accurate are humans in performing the same classification task? R3: How does LDA classification performance compare to LSA classification performance? To address these questions, a classification study involving human subjects was designed in which humans were asked to generate and classify documents (customer comments) at two levels of abstraction for a quality assurance setting. Then two computer algorithms, LSA and LDA, were used to perform classification on these documents. The results indicate that humans outperformed both computer algorithms, with an accuracy rate of 94% at the higher level of abstraction and 76% at the lower level of abstraction. At the high level of abstraction, the accuracy rates were 84% for both LSA and LDA; at the lower level, the accuracy rates were 67% for LSA and 64% for LDA. The findings of this research have strong implications for the improvement of information systems that process unstructured text. Document classifiers have many potential applications in many fields (e.g., fraud detection, information retrieval, national security, and customer management). Development and refinement of algorithms that classify text is a fruitful area of ongoing research, and this dissertation contributes to this area.
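One conventional way to set up such a comparison with current tooling is sketched below in Python with scikit-learn: the same labelled comments are represented either by LSA (TF-IDF plus truncated SVD) or by LDA topic proportions and fed to a common classifier. The pipeline components, topic count and classifier are assumptions for illustration, not the exact procedure used in the dissertation.

from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.decomposition import TruncatedSVD, LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def lsa_classifier(n_topics=10):
    return make_pipeline(TfidfVectorizer(stop_words="english"),
                         TruncatedSVD(n_components=n_topics, random_state=0),
                         LogisticRegression(max_iter=1000))

def lda_classifier(n_topics=10):
    return make_pipeline(CountVectorizer(stop_words="english"),
                         LatentDirichletAllocation(n_components=n_topics, random_state=0),
                         LogisticRegression(max_iter=1000))

# comments, labels = ...  # placeholders: customer comments and their human-assigned categories
# print("LSA:", cross_val_score(lsa_classifier(), comments, labels, cv=5).mean())
# print("LDA:", cross_val_score(lda_classifier(), comments, labels, cv=5).mean())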
9

Huang, Fang. "Multi-document summarization with latent semantic analysis." Thesis, University of Sheffield, 2004. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.419255.

10

Buys, Stephanus. "Log analysis aided by latent semantic mapping." Thesis, Rhodes University, 2013. http://hdl.handle.net/10962/d1002963.

Abstract:
In an age of zero-day exploits and increased on-line attacks on computing infrastructure, operational security practitioners are becoming increasingly aware of the value of the information captured in log events. Analysis of these events is critical during incident response, forensic investigations related to network breaches, hacking attacks and data leaks. Such analysis has led to the discipline of Security Event Analysis, also known as Log Analysis. There are several challenges when dealing with events, foremost being the increased volumes at which events are often generated and stored. Furthermore, events are often captured as unstructured data, with very little consistency in the formats or contents of the events. In this environment, security analysts and implementers of Log Management (LM) or Security Information and Event Management (SIEM) systems face the daunting task of identifying, classifying and disambiguating massive volumes of events in order for security analysis and automation to proceed. Latent Semantic Mapping (LSM) is a proven paradigm shown to be an effective method of, among other things, enabling word clustering, document clustering, topic clustering and semantic inference. This research is an investigation into the practical application of LSM in the discipline of Security Event Analysis, showing the value of using LSM to assist practitioners in identifying types of events, classifying events as belonging to certain sources or technologies and disambiguating different events from each other. The culmination of this research presents adaptations to traditional natural language processing techniques that resulted in improved efficacy of LSM when dealing with Security Event Analysis. This research provides strong evidence supporting the wider adoption and use of LSM, as well as further investigation into Security Event Analysis assisted by LSM and other natural language or computer-learning processing techniques.
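A toy version of the core idea (grouping raw log events by latent meaning rather than exact wording) can be written in Python with scikit-learn as below. The token pattern, the two latent dimensions and the k-means step are illustrative assumptions; the thesis works with Latent Semantic Mapping and adapts the natural language preprocessing specifically for event data.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans
from sklearn.preprocessing import Normalizer

def cluster_log_events(events, n_concepts=2, n_clusters=3):
    # token_pattern keeps path/identifier-like tokens that matter in log data
    vec = TfidfVectorizer(token_pattern=r"[A-Za-z0-9_\-\./:]+")
    X = vec.fit_transform(events)
    space = TruncatedSVD(n_components=n_concepts, random_state=0).fit_transform(X)
    space = Normalizer(copy=False).fit_transform(space)   # cosine-like geometry for k-means
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(space)

logs = [
    "sshd[812]: Failed password for root from 10.0.0.5 port 22",
    "sshd[812]: Failed password for admin from 10.0.0.9 port 22",
    "kernel: eth0 link up 1000Mbps full duplex",
    "kernel: eth1 link down",
    "nginx: GET /index.html 200",
    "nginx: GET /admin 403",
]
print(cluster_log_events(logs))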
11

Favre, Benoit. "Résumé automatique de parole pour un accès efficace aux bases de données audio." Phd thesis, Université d'Avignon, 2007. http://tel.archives-ouvertes.fr/tel-00444105.

Abstract:
The advent of digital technology makes it possible to store large quantities of speech at low cost. Despite recent advances in audio document retrieval, it remains difficult to exploit such documents because of the time needed to listen to them. We try to mitigate this drawback by producing an automatic spoken summary from the most important information. To achieve this, an extractive summarization method is applied to the spoken content, which is transcribed and structured automatically. The enriched transcription is produced with the Speeral and Alize tools developed at the LIA. We complete this structuring chain with sentence segmentation and named entity detection, two features that are important for extractive summarization. The proposed summarization method takes into account the constraints imposed by audio data and by interactions with the user. In addition, the method integrates a projection of the sentences into a pseudo-semantic space. The various modules that were implemented result in a complete demonstrator that facilitates the study of interactions with the user. In the absence of evaluation data for speech, the summarization method is evaluated on text in the DUC 2006 campaign. We simulate the impact of spoken content by artificially degrading the data of that campaign. Finally, the whole processing chain is implemented within a demonstrator that facilitates access to the radio broadcasts of the ESTER campaign. Within this demonstrator, we propose an interactive timeline that complements the spoken summary.
12

Lin, Sheng-Ting. "Latent semantic analysis for retrieving related biomedical articles." Thesis, University of British Columbia, 2017. http://hdl.handle.net/2429/61273.

Abstract:
Retrieving relevant scientific papers in a scalable way is increasingly important, as more and more studies are published. PubMed's relevant article recommendation is based on MeSH assignments by indexers, which requires significant human resources and can become a limitation in making papers searchable. Many recommendation systems use singular value decomposition (SVD) to pre-compute related products. In this study, we look at using latent semantic analysis (LSA), an application of SVD to determine relationships in a set of documents and terms, to find related biomedical papers. We focused on determining the best parameters for SVD in retrieving relevant biomedical articles given a paper of interest. Using PubMed's recommendations as guidance, we found that using cosine distance to measure document similarity leads to better results than using Euclidean distance. We re-evaluated other parameters, including the weighting scheme and the number of singular values and using a larger abstract corpus. Finally, we asked people to compare the relevant abstracts retrieved with our method against those retrieved by PubMed. Our method retrieved sensible articles that were chosen over PubMed's relevant papers one-third of the time. We looked into the abstracts retrieved by either method and discuss possible areas for experimentation and improvement.
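The cosine-versus-Euclidean comparison at the heart of the study can be reproduced on any small corpus with a sketch like the following (Python, scikit-learn). The TF-IDF weighting and k = 2 singular values are placeholders; the thesis tunes both the weighting scheme and the number of singular values on a large abstract corpus.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity, euclidean_distances

def related(abstracts, query_idx, k=2, top_n=3):
    # Returns the nearest abstracts under cosine and under Euclidean distance, for comparison.
    X = TfidfVectorizer(stop_words="english").fit_transform(abstracts)
    space = TruncatedSVD(n_components=k, random_state=0).fit_transform(X)
    q = space[query_idx:query_idx + 1]
    by_cos = np.argsort(-cosine_similarity(q, space).ravel())
    by_euc = np.argsort(euclidean_distances(q, space).ravel())
    strip_self = lambda order: [i for i in order if i != query_idx][:top_n]
    return strip_self(by_cos), strip_self(by_euc)

# cos_neighbours, euc_neighbours = related(abstracts, query_idx=0)  # `abstracts` is a list of texts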
13

Stone, Andrew John William. "Performing pre-requirements tracing using latent semantic analysis." Thesis, Lancaster University, 2007. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.497169.

Abstract:
Requirements tracing is a universally neglected practice in industry, despite clear acknowledgement that it supports high-end practices such as change management and control and impact analysis. Requirements tracing comprises two principal types: pre- and post-requirement specification tracing. Post-requirement specification tracing, or post-RST, is concerned with tracing requirements after they have already been included in the specification. Pre-RST is concerned with life before inclusion, and therefore represents the origin of each requirement. Pre-RST is less often implemented in practice than post-RST. The principal reason for this is that requirements tracing of any kind doesn't appear to offer enough tangible benefits to the organisation that is developing the software to make it worthwhile, especially given the high cost of tracing by hand. Automating the process of pre-RST is therefore likely to increase its appeal to practitioners by significantly reducing the cost of performing pre-RST.
14

MENDONCA, DIOGO SILVEIRA. "PROBABILISTIC LATENT SEMANTIC ANALYSIS APPLIED TO RECOMMENDER SYSTEMS." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2008. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=13073@1.

Abstract:
Recommender systems are a constant research topic because of their large number of practical applications. There are many approaches to these problems, one of the most widely used being collaborative filtering, in which, in order to recommend an item to a user, data on other users' behaviour are employed. However, collaborative filtering algorithms do not always reach the levels of precision required for use in real applications. In this context, the present work aims to evaluate the performance of probabilistic latent semantic analysis (PLSA) applied to recommender systems. This model identifies groups of users with similar behaviour through latent attributes, allowing these group behaviours to be used in the recommendation. To check the effectiveness of the method, experiments with PLSA are presented on two problems: recommending ads on the web and recommending films. An improvement of 18.7% was found in the accuracy of web ad recommendation, and a 3.7% improvement in root mean square error over the Mean of Means baseline was found for the Netflix corpus. Apart from these experiments, the algorithm was implemented in a flexible and reusable way, allowing its adaptation to other problems with reduced effort. This implementation has also been incorporated as a module of LearnAds, a framework for recommending ads on the web.
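For readers unfamiliar with PLSA itself, a compact EM fit over a user-item count matrix looks roughly like the following NumPy sketch. The initialisation, iteration count and toy matrix are assumptions, and the dissertation's actual model and its LearnAds integration are considerably richer.

import numpy as np

def plsa(counts, n_topics=2, n_iter=100, seed=0):
    # counts: (users x items) matrix of observed interactions n(d, w).
    rng = np.random.default_rng(seed)
    n_docs, n_items = counts.shape
    p_z_d = rng.random((n_docs, n_topics)); p_z_d /= p_z_d.sum(1, keepdims=True)   # P(z|d)
    p_w_z = rng.random((n_topics, n_items)); p_w_z /= p_w_z.sum(1, keepdims=True)  # P(w|z)
    for _ in range(n_iter):
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]            # E-step: P(z|d,w) up to a constant
        resp = joint / (joint.sum(axis=1, keepdims=True) + 1e-12)
        weighted = counts[:, None, :] * resp                     # expected counts n(d,w) * P(z|d,w)
        p_w_z = weighted.sum(axis=0)                             # M-step
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_z_d, p_w_z

ratings = np.array([[5, 4, 0, 0],     # toy user-item counts; 0 means "not seen"
                    [4, 5, 1, 0],
                    [0, 0, 5, 4],
                    [1, 0, 4, 5]], dtype=float)
p_z_d, p_w_z = plsa(ratings)
print(p_z_d @ p_w_z)                  # P(w|d): rank a user's unseen items by this score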
15

Akther, Aysha. "Social Tag-based Community Recommendation Using Latent Semantic Analysis." Thèse, Université d'Ottawa / University of Ottawa, 2012. http://hdl.handle.net/10393/23238.

Abstract:
Collaboration and sharing of information are the basis of modern social web systems. Users in social web systems establish and join online communities in order to collectively share their content with a group of people having a common topic of interest. Group and community activities have increased exponentially in modern social web systems. With the explosive growth of social communities, users of social web systems have experienced considerable difficulty in discovering communities relevant to their interests. In this study, we address the problem of recommending communities to individual users. Recommender techniques that are based solely on community affiliation may fail to find a wide range of proper communities for users when their available data are insufficient. We regard this problem as a tag-based personalized search. Based on the social tags used by members of communities, we first represent communities in a low-dimensional space, the so-called latent semantic space, by using Latent Semantic Analysis. Then, for recommending communities to a given user, we capture how relevant each community is to both the user's personal tag usage and other community members' tagging patterns in the latent space. We focus especially on the challenging problem of recommending communities to users who have joined very few communities or have no prior community membership. Our evaluation on two heterogeneous datasets shows that our approach can significantly improve the recommendation quality.
16

Sheikha, Hassan. "Text mining Twitter social media for Covid-19 : Comparing latent semantic analysis and latent Dirichlet allocation." Thesis, Högskolan i Gävle, Avdelningen för datavetenskap och samhällsbyggnad, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:hig:diva-32567.

Abstract:
In this thesis, the Twitter social media platform is data mined for information about the covid-19 outbreak during the month of March 2020, starting from the 3rd and ending on the 31st. 100,000 tweets were collected from Harvard's open-source data and recreated using Hydrate. These data are analyzed further using different Natural Language Processing (NLP) methodologies, such as term frequency-inverse document frequency (TF-IDF), lemmatization, tokenization, Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA). The results of the LSA and LDA algorithms are dimensionally reduced data that are then clustered using the clustering algorithms HDBSCAN and K-Means for later comparison. Different methodologies are used to determine the optimal parameters for the algorithms. This is all done in the Python programming language, as there are libraries supporting this research, the most important being scikit-learn. The frequent words of each cluster are then displayed and compared with factual data regarding the outbreak to discover whether there are any correlations. The factual data are collected by the World Health Organization (WHO) and visualized in graphs at ourworldindata.org. Correlations with the results are also looked for in news articles, to find any significant moments and to see whether they affected the top words in the clustered data. The news articles with good timelines used for correlating incidents are those of NBC News and the New York Times. The results show no direct correlations with the data reported by WHO; however, looking into the timelines reported by news sources, some correlation can be seen with the clustered data. Also, the combination of LDA and HDBSCAN yielded the most desirable results in comparison to the other combinations of dimension reduction and clustering. This was largely due to the use of GridSearchCV on LDA to determine the ideal parameters for the LDA models on each dataset, as well as how well HDBSCAN clusters its data in comparison to K-Means.
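The LSA branch of the pipeline described above (TF-IDF, dimensionality reduction, clustering, then inspecting frequent words per cluster) can be sketched in Python with scikit-learn as follows. The parameter values are placeholders; the LDA/HDBSCAN branch would swap in CountVectorizer, LatentDirichletAllocation and the separate hdbscan package.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans

def lsa_clusters(tweets, n_components=5, n_clusters=5, top_words=10):
    vec = TfidfVectorizer(stop_words="english")
    X = vec.fit_transform(tweets)
    reduced = TruncatedSVD(n_components=n_components, random_state=0).fit_transform(X)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(reduced)
    vocab = np.array(vec.get_feature_names_out())
    frequent = {}
    for c in range(n_clusters):
        mask = labels == c
        if mask.any():
            counts = np.asarray(X[mask].sum(axis=0)).ravel()   # most frequent terms per cluster
            frequent[c] = vocab[np.argsort(-counts)[:top_words]].tolist()
    return labels, frequent

# labels, words = lsa_clusters(tweets)   # `tweets` would be the rehydrated March 2020 texts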
17

Arroniz, Inigo. "Extracting Quantitative Information from Nonnumeric Marketing Data: An Augmented Latent Semantic Analysis Approach." Doctoral diss., University of Central Florida, 2007. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/3083.

Abstract:
Despite the widespread availability and importance of nonnumeric data, marketers do not have the tools to extract information from large amounts of nonnumeric data. This dissertation attempts to fill this void: I developed a scalable methodology that is capable of extracting information from extremely large volumes of nonnumeric data. The proposed methodology integrates concepts from information retrieval and content analysis to analyze textual information. This approach avoids a pervasive difficulty of traditional content analysis, namely the classification of terms into predetermined categories, by creating a linear composite of all terms in the document and then weighting the terms according to their inferred meaning. In the proposed approach, meaning is inferred by the collocation of the term across all the texts in the corpus. It is assumed that there is a lower dimensional space of concepts that underlies word usage. The semantics of each word are inferred by identifying its various contexts in a document and across documents (i.e., in the corpus). After the semantic similarity space is inferred from the corpus, the words in each document are weighted to obtain their representation in the lower dimensional semantic similarity space, effectively mapping the terms to the concept space and ultimately creating a score that measures the concept of interest. I propose an empirical application of the outlined methodology. For this empirical illustration, I revisit an important marketing problem, the effect of movie critics on the performance of movies. In the extant literature, researchers have used an overall numerical rating of the review to capture the content of the movie reviews. I contend that valuable information present in the textual materials remains uncovered. I use the proposed methodology to extract this information from the nonnumeric text contained in a movie review. The proposed setting is particularly attractive for validating the methodology because it allows for a simple test of the text-derived metrics by comparing them to the numeric ratings provided by the reviewers. I empirically show the application of this methodology and traditional computer-aided content analytic methods to study an important marketing topic, the effect of movie critics on movie performance. In the empirical application of the proposed methodology, I use two datasets that combined contain more than 9,000 movie reviews nested in more than 250 movies. I am restudying this marketing problem in the light of directly obtaining information from the reviews instead of following the usual practice of using an overall rating or a classification of the review as either positive or negative. I find that the addition of the direct content and structure of the review adds a significant amount of explanatory power as a determinant of movie performance, even in the presence of actual reviewer overall ratings (stars) and other controls. This effect is robust across distinct operationalizations of both the review content and the movie performance metrics. In fact, my findings suggest that as we move from sales to profitability to financial return measures, the role of the content of the review, and therefore the critic's role, becomes increasingly important.
18

Eryol, Erkin. "Probabilistic Latent Semantic Analysis Based Framework For Hybrid Social Recommender Systems." Master's thesis, METU, 2010. http://etd.lib.metu.edu.tr/upload/2/12611921/index.pdf.

Abstract:
Today, there are user-annotated internet sites, user interaction logs, and online user communities, which are valuable sources of information for the personalized recommendation problem. In the literature, hybrid social recommender systems have been proposed to reduce the sparsity of the usage data by integrating the user-related information sources together. In this thesis, a method based on probabilistic latent semantic analysis is used as a framework for a hybrid social recommendation system. Different data hybridization approaches based on probabilistic latent semantic analysis are investigated experimentally. Building on this flexible probabilistic model, network regularization and model blending approaches are applied to the probabilistic latent semantic analysis model as a solution for social trust network usage throughout the collaborative filtering process. The proposed model outperformed the baseline methods in our experiments. As a result of the research, it is shown that the proposed methods successfully model the rating and social trust data together in a theoretically principled way.
19

Mrva, David. "The use of probabilistic latent semantic analysis in adaptive language models." Thesis, University of Cambridge, 2009. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.611121.

20

Coccaro, Noah B. "Latent semantic analysis as a tool to improve automatic speech recognition performance." Diss., Connect to online resource, 2005. http://wwwlib.umi.com/cr/colorado/fullcit?p3190360.

21

Cosma, Georgina. "An approach to source-code plagiarism detection investigation using latent semantic analysis." Thesis, University of Warwick, 2008. http://wrap.warwick.ac.uk/3575/.

Abstract:
This thesis looks at three aspects of source-code plagiarism. The first aspect of the thesis is concerned with creating a definition of source-code plagiarism; the second aspect is concerned with describing the findings gathered from investigating the Latent Semantic Analysis information retrieval algorithm for source-code similarity detection; and the final aspect of the thesis is concerned with the proposal and evaluation of a new algorithm that combines Latent Semantic Analysis with plagiarism detection tools. A recent review of the literature revealed that there is no commonly agreed definition of what constitutes source-code plagiarism in the context of student assignments. This thesis first analyses the findings from a survey carried out to gather an insight into the perspectives of UK Higher Education academics who teach programming on computing courses. Based on the survey findings, a detailed definition of source-code plagiarism is proposed. Secondly, the thesis investigates the application of an information retrieval technique, Latent Semantic Analysis, to derive semantic information from source-code files. Various parameters drive the effectiveness of Latent Semantic Analysis. The performance of Latent Semantic Analysis using various parameter settings and its effectiveness in retrieving similar source-code files when optimising those parameters are evaluated. Finally, an algorithm for combining Latent Semantic Analysis with plagiarism detection tools is proposed and a tool is created and evaluated. The proposed tool, PlaGate, is a hybrid model that allows for the integration of Latent Semantic Analysis with plagiarism detection tools in order to enhance plagiarism detection. In addition, PlaGate has a facility for investigating the importance of source-code fragments with regards to their contribution towards proving plagiarism. PlaGate provides graphical output that indicates the clusters of suspicious files and source-code fragments.
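A toy version of the LSA similarity-detection step (not the PlaGate tool itself) might look like the following in Python: source files are reduced to identifier and keyword token streams, mapped into an LSA space, and pairs with near-identical vectors are flagged. The tokenisation, k = 2 dimensions and the 0.95 threshold are assumptions.

import re
from itertools import combinations
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

def code_tokens(source):
    # Crude lexing: keep identifier-like tokens, drop operators and punctuation.
    return " ".join(re.findall(r"[A-Za-z_][A-Za-z_0-9]*", source))

def suspicious_pairs(files, k=2, threshold=0.95):
    # files: dict mapping file name -> source text.
    names = list(files)
    X = TfidfVectorizer(lowercase=False).fit_transform([code_tokens(files[n]) for n in names])
    space = TruncatedSVD(n_components=k, random_state=0).fit_transform(X)
    sims = cosine_similarity(space)
    return [(names[i], names[j], float(sims[i, j]))
            for i, j in combinations(range(len(names)), 2) if sims[i, j] >= threshold]

# suspicious_pairs({"a.java": src_a, "b.java": src_b, ...}) returns highly similar file pairs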
22

Spomer, Judith E. "Latent semantic analysis and classification modeling in applications for social movement theory /." Abstract Full Text (HTML) Full Text (PDF), 2008. http://eprints.ccsu.edu/archive/00000552/02/1996FT.htm.

Abstract:
Thesis (M.S.) -- Central Connecticut State University, 2008. Thesis advisor: Roger Bilisoly. "... in partial fulfillment of the requirements for the degree of Master of Science in Data Mining." Includes bibliographical references (leaves 122-127). Also available via the World Wide Web.
23

Scalzo, Gabriella C. "Using Latent Semantic Analysis to Evaluate the Coherence of Traumatic Event Narratives." VCU Scholars Compass, 2019. https://scholarscompass.vcu.edu/etd/5802.

Abstract:
While a growing evidence base suggests that expressive writing about a traumatic event may be an effective intervention which results in a variety of health benefits, there are still multiple competing theories that seek to explain expressive writing’s mechanism(s) of action. Two of the theories with stronger evidence bases are exposure theory and cognitive processing theory. The state of this field is complicated by methodological limitations; operationalizing and measuring the relative constructs of trauma narratives, such as coherence, traditionally requires time- and labor-intensive methods such as using a narrative coding scheme. This study used a computer-based methodology, latent semantic analysis (LSA), to quantify narrative coherence and analyze the relationship between narrative coherence and both short- and long-term outcomes of expressive writing. A subsample of unscreened undergraduates (N=113) who had been randomly assigned to the expressive writing group of a larger study wrote about the most traumatic event that had happened to them for three twenty-minute sessions; their narratives were analyzed using LSA. There were three main hypotheses, informed by cognitive processing theory: 1) That higher coherence in a given session would be associated with a more positive reported valence at the conclusion of that session, 2) that increasing narrative coherence across writing sessions would be associated with increasing reported valence at the conclusion of each session, and 3) that increasing narrative coherence over time would be associated with a decrease in post-traumatic stress symptoms. Overall, initial hypotheses were not supported, but higher coherence in the third writing session was associated with more negative valence at the conclusion of the session. Furthermore, relationships between pre- and post-session valence strengthened over time, and coherence, pre-session valence, and post-session valence all trended over time. These results suggest a collection of temporal effects, the implications of which are discussed in terms of future directions.
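As a rough illustration of how coherence can be operationalised with LSA, the sketch below scores a narrative by the average cosine similarity of adjacent sentences in a reduced semantic space (Python, scikit-learn). The adjacent-sentence definition, the TF-IDF weighting and k = 2 dimensions are assumptions; the study's actual coherence measure may be computed differently.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

def narrative_coherence(sentences, k=2):
    # Project sentences into an LSA space and average similarity between neighbouring sentences.
    X = TfidfVectorizer().fit_transform(sentences)
    space = TruncatedSVD(n_components=k, random_state=0).fit_transform(X)
    pairs = [cosine_similarity(space[i:i + 1], space[i + 1:i + 2])[0, 0]
             for i in range(len(sentences) - 1)]
    return float(np.mean(pairs))

# narrative_coherence(session_sentences)  # higher values = semantically smoother narrative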
24

Elsas, Jonathan L. "An Evaluation of Projection Techniques for Document Clustering: Latent Semantic Analysis and Independent Component Analysis." Thesis, School of Information and Library Science, 2005. http://hdl.handle.net/1901/208.

Abstract:
Dimensionality reduction in the bag-of-words vector space document representation model has been widely studied for the purposes of improving accuracy and reducing computational load of document retrieval tasks. These techniques, however, have not been studied to the same degree with regard to document clustering tasks. This study evaluates the effectiveness of two popular dimensionality reduction techniques for clustering, and their effect on discovering accurate and understandable topical groupings of documents. The two techniques studied are Latent Semantic Analysis and Independent Component Analysis, each of which have been shown to be effective in the past for retrieval purposes.
25

Hassan, Samer. "Measuring Semantic Relatedness Using Salient Encyclopedic Concepts." Thesis, University of North Texas, 2011. https://digital.library.unt.edu/ark:/67531/metadc84212/.

Abstract:
While pragmatics, through its integration of situational awareness and real world relevant knowledge, offers a high level of analysis that is suitable for real interpretation of natural dialogue, semantics, on the other end, represents a lower yet more tractable and affordable linguistic level of analysis using current technologies. Generally, the understanding of semantic meaning in literature has revolved around the famous quote ``You shall know a word by the company it keeps''. In this thesis we investigate the role of context constituents in decoding the semantic meaning of the engulfing context; specifically we probe the role of salient concepts, defined as content-bearing expressions which afford encyclopedic definitions, as a suitable source of semantic clues to an unambiguous interpretation of context. Furthermore, we integrate this world knowledge in building a new and robust unsupervised semantic model and apply it to entail semantic relatedness between textual pairs, whether they are words, sentences or paragraphs. Moreover, we explore the abstraction of semantics across languages and utilize our findings into building a novel multi-lingual semantic relatedness model exploiting information acquired from various languages. We demonstrate the effectiveness and the superiority of our mono-lingual and multi-lingual models through a comprehensive set of evaluations on specialized synthetic datasets for semantic relatedness as well as real world applications such as paraphrase detection and short answer grading. Our work represents a novel approach to integrate world-knowledge into current semantic models and a means to cross the language boundary for a better and more robust semantic relatedness representation, thus opening the door for an improved abstraction of meaning that carries the potential of ultimately imparting understanding of natural language to machines.
26

Dietl, Reinhard. "A Reference Architecture for Providing Latent Semantic Analysis Applications in Distributed Systems. Diploma Thesis." WU Vienna University of Economics and Business, 2010. http://epub.wu.ac.at/3016/1/EPUB_Thesis_Dietl.pdf.

Abstract:
With the increasing availability of storage and computing power, Latent Semantic Analysis (LSA) has gained more and more significance in practice over the last decade. This diploma thesis aims to develop a reference architecture which can be utilised to provide LSA based applications in a distributed system. It outlines the underlying problems of generation, processing and storage of large data objects resulting from LSA operations, the problems arising from bringing LSA into a distributed context, suggests an architecture for the software components necessary to perform the tasks, and evaluates the applicability to real world scenarios, including the implementation of a classroom scenario as a proof-of-concept. (author's abstract). Series: Theses / Institute for Statistics and Mathematics.
27

Polyakov, Serhiy. "Enhancing User Search Experience in Digital Libraries with Rotated Latent Semantic Indexing." Thesis, University of North Texas, 2015. https://digital.library.unt.edu/ark:/67531/metadc804881/.

Abstract:
This study investigates a semi-automatic method for the creation of topical labels representing the topical concepts in information objects. The method is called rotated latent semantic indexing (rLSI). rLSI has found application in text mining but has not been used for topical label generation in digital libraries (DLs). The present study proposes a theoretical model and an evaluation framework which are based on the LSA theory of meaning and investigates rLSI in a DL environment. The proposed evaluation framework for rLSI topical labels is focused on human information-search behavior and satisfaction measures. Experimental systems that utilize those topical labels were built for the purpose of evaluating user satisfaction with the search process. A new instrument was developed for this study, and the experiment showed high reliability of the measurement scales and confirmed the construct validity. Data were collected through information search tasks performed by 122 participants using two experimental systems. A quantitative method of analysis, partial least squares structural equation modeling (PLS-SEM), was used to test a set of research hypotheses and to answer research questions. The results showed a non-significant, indirect effect of topical label type on both guidance and satisfaction. The conclusion of the study is that topical labels generated using rLSI provide the same levels of alignment, guidance, and satisfaction with the search process as topical labels created by professional indexers using best practices.
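Although the dissertation's own rLSI procedure is more involved, the basic mechanics (an LSA factorisation followed by a varimax rotation of the term loadings, with the top-loading terms of each rotated dimension serving as a topical label) can be sketched as follows in Python. The rotation routine and parameter choices here are generic assumptions, not the study's exact implementation.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    # Standard varimax rotation of a (terms x factors) loading matrix.
    p, k = loadings.shape
    R = np.eye(k)
    var = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        u, s, vt = np.linalg.svd(
            loadings.T @ (L ** 3 - (gamma / p) * L @ np.diag((L ** 2).sum(axis=0))))
        R = u @ vt
        new_var = s.sum()
        if new_var < var * (1 + tol):
            break
        var = new_var
    return loadings @ R

def topical_labels(docs, n_topics=3, n_terms=5):
    # Rotate the LSA term loadings and read off the top terms of each rotated dimension.
    vec = TfidfVectorizer(stop_words="english")
    X = vec.fit_transform(docs)
    svd = TruncatedSVD(n_components=n_topics, random_state=0).fit(X)
    rotated = varimax(svd.components_.T)            # terms x topics
    terms = np.array(vec.get_feature_names_out())
    return [terms[np.argsort(-np.abs(rotated[:, t]))[:n_terms]].tolist()
            for t in range(n_topics)]

# topical_labels(collection_texts) returns one candidate label (a short term list) per topic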
28

Ashton, Triss A. "Accuracy and Interpretability Testing of Text Mining Methods." Thesis, University of North Texas, 2013. https://digital.library.unt.edu/ark:/67531/metadc283791/.

Abstract:
Extracting meaningful information from large collections of text data is problematic because of the sheer size of the database. However, automated analytic methods capable of processing such data have emerged. These methods, collectively called text mining, first began to appear in 1988. A number of additional text mining methods quickly developed in independent research silos, each based on unique mathematical algorithms. How good each of these methods is at analyzing text is unclear. Method development typically evolves from some research-silo-centric requirement, with the success of the method measured by a custom requirement-based metric. Results of the new method are then compared to another method that was similarly developed. The proposed research introduces an experimentally designed testing method for text mining that eliminates research silo bias and simultaneously evaluates methods from all of the major context-region text mining method families. The proposed research method follows a random block factorial design with two treatments consisting of three and five levels (RBF-35) with repeated measures. The contribution of the research is threefold. First, the users perceived a difference in the effectiveness of the various methods. Second, while still not clear, there are characteristics within the text collection that affect the algorithms' ability to extract meaningful results. Third, this research develops an experimental design process for testing the algorithms that is adaptable to other areas of software development and algorithm testing. This design eliminates the biased practices historically employed by algorithm developers.
29

Shen, Yao. "Scene Analysis Using Scale Invariant Feature Extraction and Probabilistic Modeling." Thesis, University of North Texas, 2011. https://digital.library.unt.edu/ark:/67531/metadc84275/.

Abstract:
Conventional pattern recognition systems have two components: feature analysis and pattern classification. For any object in an image, features can be considered the major characteristics of the object, either for object recognition or for object tracking purposes. Features extracted from a training image can be used to identify the object when attempting to locate it in a test image containing many other objects. To perform reliable scene analysis, it is important that the features extracted from the training image are detectable even under changes in image scale, noise and illumination. Scale-invariant features have wide applications, such as image classification, object recognition and object tracking, in the image processing area. In this thesis, color features and SIFT (scale-invariant feature transform) features are considered to be scale-invariant features. The classification, recognition and tracking results were evaluated with a novel evaluation criterion and compared with some existing methods. I also studied different types of scale-invariant features for the purpose of solving scene analysis problems. I propose probabilistic models as the foundation for analyzing the scene content of images. In order to differentiate the content of images, I develop novel algorithms for the adaptive combination of multiple features extracted from images. I demonstrate the performance of the developed algorithms on several scene analysis tasks, including object tracking, video stabilization, medical video segmentation and scene classification.
30

Zaras, Dimitrios. "Evaluating Semantic Internalization Among Users of an Online Review Platform." Thesis, University of North Texas, 2015. https://digital.library.unt.edu/ark:/67531/metadc804823/.

Abstract:
The present study draws on recent sociological literature that argues that the study of cognition and culture can benefit from theories of embodied cognition. The concept of semantic internalization is introduced, which is conceptualized as the ability to perceive and articulate the topics that are of most concern to a community as they are manifested in social discourse. Semantic internalization is partly an application of emotional intelligence in the context of community-level discourse. Semantic internalization is measured through the application of Latent Semantic Analysis. Furthermore, it is investigated whether this ability is related to an individual’s social capital and habitus. The analysis is based on data collected from the online review platform yelp.com.
31

Karlsson, Kristina. "Semantic representations of retrieved memory information depend on cue-modality." Thesis, Stockholms universitet, Psykologiska institutionen, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-58817.

Full text
Abstract:
The semantic content (i.e., the meaning of words) is the essence of retrieved autobiographical memories. In comparison to previous research, which has mainly focused on phenomenological experiences and the age distribution of memory events, the present study provides a novel view on the retrieval of event information by addressing the semantic representation of memories. In the present study, the semantic representations (i.e., word locations represented by vectors in a high-dimensional space) of retrieved memory information were investigated by analyzing the data with an automatic statistical algorithm. The experiment comprised a cued recall task, where participants were presented with unimodal (i.e., one sense modality) or multimodal (i.e., three sense modalities in conjunction) retrieval cues and asked to recall autobiographical memories. The memories were verbally narrated, recorded, and transcribed to text. The semantic content of the memory narrations was analyzed with a semantic representation generated by latent semantic analysis (LSA). The results indicated that the semantic representation of visually evoked memories was most similar to that of the multimodally evoked memories, followed by the auditorily and olfactorily evoked memories. By categorizing the semantic content into clusters, the present study also identified unique characteristics in the memory content across modalities.
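As a rough illustration of the LSA pipeline this abstract describes (narrations embedded as vectors in a reduced semantic space and compared for similarity), the short Python sketch below builds such a space with scikit-learn and compares a few invented narration snippets by cosine similarity. It is a minimal sketch under assumed tooling and toy data, not the study's actual analysis.

# Minimal LSA sketch (illustrative only): embed short texts in a reduced
# semantic space and compare them by cosine similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# Invented stand-ins for transcribed memory narrations.
narrations = [
    "I remember the smell of fresh bread in my grandmother's kitchen",
    "The concert was loud and the crowd sang along to every song",
    "We baked bread together every Sunday morning at her house",
]

tfidf = TfidfVectorizer(stop_words="english")
term_doc = tfidf.fit_transform(narrations)        # term-document matrix
lsa = TruncatedSVD(n_components=2, random_state=0)
doc_vectors = lsa.fit_transform(term_doc)         # narrations in the latent space

# Pairwise similarity of narrations in the reduced space.
print(cosine_similarity(doc_vectors))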
APA, Harvard, Vancouver, ISO, and other styles
32

Chen, Xin. "Human-centered semantic retrieval in multimedia databases." Birmingham, Ala. : University of Alabama at Birmingham, 2008. https://www.mhsl.uab.edu/dt/2008p/chen.pdf.

Full text
Abstract:
Thesis (Ph. D.)--University of Alabama at Birmingham, 2008. Additional advisors: Barrett R. Bryant, Yuhua Song, Alan Sprague, Robert W. Thacker. Description based on contents viewed Oct. 8, 2008; title from PDF t.p. Includes bibliographical references (p. 172-183).
APA, Harvard, Vancouver, ISO, and other styles
33

Haley, Debra. "Applying latent semantic analysis to computer assisted assessment in the Computer Science domain : a framework, a tool, and an evaluation." Thesis, Open University, 2009. http://oro.open.ac.uk/25955/.

Full text
Abstract:
This dissertation argues that automated assessment systems can be useful for both students and educators provided that their results correspond well with those of human markers. Thus, evaluating such a system is crucial. I present an evaluation framework and show how and why it can be useful for both producers and consumers of automated assessment systems. The framework is a refinement of a research taxonomy that emerged from an analysis of the literature on systems based on Latent Semantic Analysis (LSA), a statistical natural language processing technique that has been used for automated assessment of essays. The evaluation framework can help developers publish their results in a format that is comprehensive, relatively compact, and useful to other researchers. The thesis claims that, in order to see a complete picture of an automated assessment system, certain pieces must be emphasised. It presents the framework as a jigsaw puzzle whose pieces join together to form the whole picture. The dissertation uses the framework to compare the accuracy of human markers and EMMA, the LSA-based assessment system I wrote as part of this dissertation. EMMA marks short, free-text answers in the domain of computer science. I conducted a study of five human markers and then used the results as a benchmark against which to evaluate EMMA. An integral part of the evaluation was the success metric. The standard inter-rater reliability statistic was not useful; I located a new statistic and applied it to the domain of computer-assisted assessment for the first time, as far as I know. Although EMMA exceeds human markers on a few questions, overall it does not achieve the same level of agreement with humans as humans do with each other. The last chapter maps out a plan for further research to improve EMMA.
APA, Harvard, Vancouver, ISO, and other styles
34

Al, Batineh Mohammed S. "Latent Semantic Analysis, Corpus stylistics and Machine Learning Stylometry for Translational and Authorial Style Analysis: The Case of Denys Johnson-Davies’ Translations into English." Kent State University / OhioLINK, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=kent1429300641.

Full text
APA, Harvard, Vancouver, ISO, and other styles
35

Bose, Aishwarya. "Effective web service discovery using a combination of a semantic model and a data mining technique." Thesis, Queensland University of Technology, 2008. https://eprints.qut.edu.au/26425/1/Aishwarya_Bose_Thesis.pdf.

Full text
Abstract:
With the advent of Service Oriented Architecture, Web services have gained tremendous popularity. Due to the availability of a large number of Web services, finding an appropriate Web service according to the requirements of the user is a challenge. This warrants the need to establish an effective and reliable process of Web service discovery. A considerable body of research has emerged to develop methods that improve the accuracy of Web service discovery and match the best service. The process of Web service discovery results in suggesting many individual services that partially fulfil the user's interest. Considering the semantic relationships of the words used to describe the services, as well as their input and output parameters, can lead to accurate Web service discovery. Appropriate linking of individual matched services should fully satisfy the requirements the user is looking for. This research proposes to integrate a semantic model and a data mining technique to enhance the accuracy of Web service discovery. A novel three-phase Web service discovery methodology has been proposed. The first phase performs match-making to find semantically similar Web services for a user query. In order to perform semantic analysis on the content of the Web service description language documents, a support-based latent semantic kernel is constructed using an innovative concept of binning and merging on a large quantity of text documents covering diverse domains of knowledge. The use of a generic latent semantic kernel constructed with a large number of terms helps to find the hidden meaning of query terms that otherwise could not be found. Sometimes a single Web service is unable to fully satisfy the requirements of the user. In such cases, a composition of multiple inter-related Web services is presented to the user. The task of checking the possibility of linking multiple Web services is done in the second phase. Once the feasibility of linking Web services is checked, the objective is to provide the user with the best composition of Web services. In the link analysis phase, the Web services are modelled as nodes of a graph and an all-pair shortest-path algorithm is applied to find the optimum path at the minimum cost for traversal. The third phase, system integration, integrates the results from the preceding two phases by using an original fusion algorithm in the fusion engine. Finally, the recommendation engine, which is an integral part of the system integration phase, makes the final recommendations, including individual and composite Web services, to the user. In order to evaluate the performance of the proposed method, extensive experimentation has been performed. Results of the proposed support-based semantic kernel method of Web service discovery are compared with the results of a standard keyword-based information-retrieval method and a clustering-based machine-learning method of Web service discovery. The proposed method outperforms both the information-retrieval and machine-learning based methods. Experimental results and statistical analysis also show that the best Web service compositions are obtained by considering 10 to 15 Web services found in phase I for linking. Empirical results also ascertain that the fusion engine boosts the accuracy of Web service discovery by combining the inputs from both the semantic analysis (phase I) and the link analysis (phase II) in a systematic fashion.
Overall, the accuracy of Web service discovery with the proposed method shows a significant improvement over traditional discovery methods.
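To make the link-analysis idea above concrete, here is a small, purely illustrative Python sketch: Web services modelled as graph nodes with invented linking costs, and the Floyd-Warshall all-pair shortest-path algorithm used to find the cheapest traversal between two of them. The service names and costs are hypothetical, and the thesis's actual cost model and algorithm details are not reproduced here.

# Illustrative all-pair shortest-path over a toy "service graph"
# (hypothetical services and costs; not the thesis's implementation).
INF = float("inf")
services = ["geocode", "weather", "translate", "notify"]

# cost[i][j]: cost of chaining service i into service j (INF = not linkable).
cost = [
    [0,   2,   INF, 5],
    [INF, 0,   1,   2],
    [INF, INF, 0,   1],
    [INF, INF, INF, 0],
]

n = len(services)
dist = [row[:] for row in cost]
for k in range(n):                 # Floyd-Warshall relaxation
    for i in range(n):
        for j in range(n):
            if dist[i][k] + dist[k][j] < dist[i][j]:
                dist[i][j] = dist[i][k] + dist[k][j]

# Cheapest composition cost from "geocode" to "notify".
print(dist[services.index("geocode")][services.index("notify")])   # prints 4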
APA, Harvard, Vancouver, ISO, and other styles
36

Bose, Aishwarya. "Effective web service discovery using a combination of a semantic model and a data mining technique." Queensland University of Technology, 2008. http://eprints.qut.edu.au/26425/.

Full text
Abstract:
With the advent of Service Oriented Architecture, Web services have gained tremendous popularity. Due to the availability of a large number of Web services, finding an appropriate Web service according to the requirements of the user is a challenge. This warrants the need to establish an effective and reliable process of Web service discovery. A considerable body of research has emerged to develop methods that improve the accuracy of Web service discovery and match the best service. The process of Web service discovery results in suggesting many individual services that partially fulfil the user's interest. Considering the semantic relationships of the words used to describe the services, as well as their input and output parameters, can lead to accurate Web service discovery. Appropriate linking of individual matched services should fully satisfy the requirements the user is looking for. This research proposes to integrate a semantic model and a data mining technique to enhance the accuracy of Web service discovery. A novel three-phase Web service discovery methodology has been proposed. The first phase performs match-making to find semantically similar Web services for a user query. In order to perform semantic analysis on the content of the Web service description language documents, a support-based latent semantic kernel is constructed using an innovative concept of binning and merging on a large quantity of text documents covering diverse domains of knowledge. The use of a generic latent semantic kernel constructed with a large number of terms helps to find the hidden meaning of query terms that otherwise could not be found. Sometimes a single Web service is unable to fully satisfy the requirements of the user. In such cases, a composition of multiple inter-related Web services is presented to the user. The task of checking the possibility of linking multiple Web services is done in the second phase. Once the feasibility of linking Web services is checked, the objective is to provide the user with the best composition of Web services. In the link analysis phase, the Web services are modelled as nodes of a graph and an all-pair shortest-path algorithm is applied to find the optimum path at the minimum cost for traversal. The third phase, system integration, integrates the results from the preceding two phases by using an original fusion algorithm in the fusion engine. Finally, the recommendation engine, which is an integral part of the system integration phase, makes the final recommendations, including individual and composite Web services, to the user. In order to evaluate the performance of the proposed method, extensive experimentation has been performed. Results of the proposed support-based semantic kernel method of Web service discovery are compared with the results of a standard keyword-based information-retrieval method and a clustering-based machine-learning method of Web service discovery. The proposed method outperforms both the information-retrieval and machine-learning based methods. Experimental results and statistical analysis also show that the best Web service compositions are obtained by considering 10 to 15 Web services found in phase I for linking. Empirical results also ascertain that the fusion engine boosts the accuracy of Web service discovery by combining the inputs from both the semantic analysis (phase I) and the link analysis (phase II) in a systematic fashion.
Overall, the accuracy of Web service discovery with the proposed method shows a significant improvement over traditional discovery methods.
APA, Harvard, Vancouver, ISO, and other styles
37

Seifried, Eva [Verfasser], and Birgit [Akademischer Betreuer] Spinath. "Improving Learning and Teaching at Universities: The Potential of Applying Automatic Essay Scoring with Latent Semantic Analysis / Eva Seifried ; Betreuer: Birgit Spinath." Heidelberg : Universitätsbibliothek Heidelberg, 2016. http://d-nb.info/1180617347/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
38

Haridas, Mandar. "Exploring knowledge bases for engineering a user interests hierarchy for social network applications." Thesis, Manhattan, Kan. : Kansas State University, 2009. http://hdl.handle.net/2097/1528.

Full text
APA, Harvard, Vancouver, ISO, and other styles
39

Duff, Dawna Margaret. "Lexical semantic richness : effect on reading comprehension and on readers' hypotheses about the meanings of novel words." Diss., University of Iowa, 2015. https://ir.uiowa.edu/etd/1591.

Full text
Abstract:
Purpose: This study investigates one possible reason for individual differences in vocabulary learning from written context. A Latent Semantic Analysis (LSA) model is used to motivate the prediction of a causal relationship between readers' semantic knowledge of the words in a text and the quality of their hypotheses about the semantics of novel words, an effect mediated by reading comprehension. The purpose of this study was to test this prediction behaviorally, using a within-subject repeated measures design to control for other variables affecting semantic word learning. Methods: Participants in 6th grade (n=23) were given training to increase semantic knowledge of words from one of two texts, counterbalanced across participants. After training, participants read untreated and treated texts, which contained six nonword forms. Measures were taken of reading comprehension (RC) and the quality of the readers' hypotheses about the semantics of the novel words (HSNW). Text difficulty and the semantic informativeness of the texts about the nonwords were controlled. Results: All participants showed increases in semantic knowledge of the taught words after intervention. For the group as a whole, RC scores were significantly higher in the treated than in the untreated condition, but HSNW scores were not significantly higher in the treated condition. Reading comprehension ability was a significant moderator of the effect of treatment on HSNW. A subgroup of participants with lower scores on a standardized reading comprehension measure (n=6) had significantly higher HSNW and RC scores in the treated than in the untreated condition. Participants with higher standardized reading comprehension scores (n=17) showed no effect of treatment on either RC or HSNW. Difference scores for RC and difference scores for HSNW were strongly related, indicating that, within subjects, there is a relationship between RC and HSNW. Conclusions: The results indicate that for a subgroup of readers with weaker reading comprehension, intervention to enhance lexical semantic richness had a substantial and significant effect on both their reading comprehension and the quality of the hypotheses they generated about the meanings of novel words. Neither effect was found for the subgroup of readers with stronger reading comprehension. Clinical and educational implications are discussed.
APA, Harvard, Vancouver, ISO, and other styles
40

Morel, Gwendolyn. "Educational Technology: A Comparison of Ten Academic Journals and the New Media Consortium Horizon Reports for the Period of 2000-2017." Thesis, University of North Texas, 2017. https://digital.library.unt.edu/ark:/67531/metadc1062887/.

Full text
Abstract:
This exploratory and descriptive study provides an increased understanding of the topics being explored in both published research and industry reporting in the field of educational technology. Although literature in the field is plentiful, the task of synthesizing the information for practical use is a massive undertaking. Latent semantic analysis was used to review journal abstracts from ten highly respected journals and the New Media Consortium Horizon Reports to identify trends within the publications. As part of the analysis, 25 topics and technologies were identified in the combined corpus of academic journals and Horizon Reports. The journals tended to focus on pedagogical issues whereas the Horizon Reports tended to focus on technological aspects in education. In addition to differences between publication types, trends over time are also described. Findings may assist researchers, practitioners, administrators, and policy makers with decision-making in their respective educational areas.
APA, Harvard, Vancouver, ISO, and other styles
41

King, John Douglas. "Deep Web Collection Selection." Thesis, Queensland University of Technology, 2004. https://eprints.qut.edu.au/15992/3/John_King_Thesis.pdf.

Full text
Abstract:
The deep web contains a massive number of collections that are mostly invisible to search engines. These collections often contain high-quality, structured information that cannot be crawled using traditional methods. An important problem is selecting which of these collections to search. Automatic collection selection methods try to solve this problem by suggesting the best subset of deep web collections to search based on a query. A few methods for deep web collection selection have been proposed, such as the Collection Retrieval Inference Network system and the Glossary of Servers Server system. The drawback of these methods is that they require communication between the search broker and the collections, and need metadata about each collection. This thesis compares three different sampling methods that do not require communication with the broker or metadata about each collection. It also adapts some traditional information-retrieval-based techniques to this area. In addition, the thesis tests these techniques using the INEX collection, with a total of 18 collections (including 12,232 XML documents) and a total of 36 queries. The experiments show that the performance of the sample-based technique is satisfactory on average.
APA, Harvard, Vancouver, ISO, and other styles
42

King, John Douglas. "Deep Web Collection Selection." Queensland University of Technology, 2004. http://eprints.qut.edu.au/15992/.

Full text
Abstract:
The deep web contains a massive number of collections that are mostly invisible to search engines. These collections often contain high-quality, structured information that cannot be crawled using traditional methods. An important problem is selecting which of these collections to search. Automatic collection selection methods try to solve this problem by suggesting the best subset of deep web collections to search based on a query. A few methods for deep web collection selection have been proposed, such as the Collection Retrieval Inference Network system and the Glossary of Servers Server system. The drawback of these methods is that they require communication between the search broker and the collections, and need metadata about each collection. This thesis compares three different sampling methods that do not require communication with the broker or metadata about each collection. It also adapts some traditional information-retrieval-based techniques to this area. In addition, the thesis tests these techniques using the INEX collection, with a total of 18 collections (including 12,232 XML documents) and a total of 36 queries. The experiments show that the performance of the sample-based technique is satisfactory on average.
APA, Harvard, Vancouver, ISO, and other styles
43

Novák, Ján. "Automatická tvorba tezauru z wikipedie." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2011. http://www.nusl.cz/ntk/nusl-236964.

Full text
Abstract:
This thesis deals with the automatic acquisition of thesauri from Wikipedia. It describes Wikipedia as a suitable data set for thesaurus acquisition and presents methods for computing the semantic similarity of terms. The thesis also describes the concepts behind, and the implementation of, a system for automatic thesaurus acquisition. Finally, the implemented system is evaluated with standard metrics, such as precision and recall.
APA, Harvard, Vancouver, ISO, and other styles
44

Papadouka, Maria Eirini. "Using Topic Models to Study Journalist-Audience Convergence and Divergence: The Case of Human Trafficking Coverage on British Online Newspapers." Thesis, University of North Texas, 2016. https://digital.library.unt.edu/ark:/67531/metadc862882/.

Full text
Abstract:
Despite the accessibility of online news and the availability of sophisticated methods for analyzing news content, no previous study has focused on the simultaneous examination of news coverage of human trafficking and audiences' interpretations of this coverage. In my research, I have examined both journalists' and commenters' topic choices in the coverage and discussion of human trafficking on the online platforms of three British newspapers over the period 2009–2015. I used latent semantic analysis (LSA) to identify emergent topics in my corpus of newspaper articles and readers' comments, and I then quantitatively investigated topic preferences to identify convergence and divergence between the topics discussed by journalists and their readers. I addressed my research questions in two distinctive studies. The first case study applied topic modelling techniques and further quantitative analyses to article and comment paragraphs from The Guardian. The second, more extensive study included article and comment paragraphs from the online platforms of three British newspapers: The Guardian, The Times and the Daily Mail. The findings indicate that the theories of "agenda setting" and of the "active audience" are not mutually exclusive, and the scope of explanation of each depends partly on the specific topic or subtopic that is analyzed. Taking into account further theoretical concepts related to agenda setting, four additional research questions were addressed. Topic convergence and divergence were further identified when taking into account the newspapers' political orientation and the articles' and comments' year of publication.
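As a hedged illustration of the LSA-based topic extraction described above, the Python sketch below derives latent components from a handful of invented paragraphs and prints the top-weighted terms of each component. It assumes scikit-learn and toy placeholder data, and is not the study's actual pipeline.

# Hedged sketch of LSA topic extraction on a tiny invented corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

# Placeholder paragraphs standing in for article and comment paragraphs.
paragraphs = [
    "police rescued trafficking victims after a border operation",
    "readers debated migration policy and border enforcement",
    "victims of forced labour were identified by the police",
    "comments criticised newspaper coverage of migration policy",
]

tfidf = TfidfVectorizer(stop_words="english")
term_doc = tfidf.fit_transform(paragraphs)
svd = TruncatedSVD(n_components=2, random_state=0)
svd.fit(term_doc)

# Print the top-weighted terms for each latent component ("topic").
terms = tfidf.get_feature_names_out()
for i, component in enumerate(svd.components_):
    top = component.argsort()[::-1][:4]
    print(f"topic {i}:", ", ".join(terms[j] for j in top))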
APA, Harvard, Vancouver, ISO, and other styles
45

Alshareef, Abdulrhman M. "Academic Recommendation System Based on the Similarity Learning of the Citation Network Using Citation Impact." Thesis, Université d'Ottawa / University of Ottawa, 2019. http://hdl.handle.net/10393/39111.

Full text
Abstract:
With today's significant and rapidly increasing volume of scientific publications, exploring recent studies in a given research area and building effective scientific collaborations have become more challenging than ever before. The growth of scientific production has increased the difficulty of identifying the most relevant papers to cite and of finding an appropriate conference or journal to which to submit a paper. As a result, authors and publishers rely on different analytical approaches in order to measure the relationships within the citation network. Different parameters have been used, such as the impact factor, the number of citations, and co-citation, to assess the impact of the produced research publication. However, using one assessment factor considers only one level of relationship exploration, since it does not reflect the effect of the other factors. In this thesis, we propose an approach to measure the Academic Citation Impact, which helps to identify the impact of articles, authors, and venues in their extended nearby citation network. We combine content similarity with bibliometric indices to evaluate the citation impact of articles, authors, and venues in their surrounding citation network. Using the article metadata, we calculate the semantic similarity between any two articles in the extended network. Then we use the similarity score and bibliometric indices to evaluate the impact of the articles, authors, and venues in their extended nearby citation network. Furthermore, we propose an academic recommendation model to identify the latent preferences among the citation network of a given article in order to expose the concealed connections between the academic objects (articles, authors, and venues) in that network. To reveal the degree of trust for collaboration between academic objects, we use similarity learning to estimate a collaborative confidence score that represents the anticipation of a prospective relationship between the academic objects within a scientific community. We conducted an offline experiment with real-world datasets to measure the accuracy of delivering personalized recommendations based on the user's selection preferences. Our evaluation results show a potential improvement in the quality of the recommendation when compared to baseline recommendation algorithms that consider co-citation information.
APA, Harvard, Vancouver, ISO, and other styles
46

Macedo, Alessandra Alaniz. "Especificação, instanciação e experimentação de um arcabouço para criação automática de ligações hipertexto entre informações homogêneas." Universidade de São Paulo, 2004. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-05102004-113421/.

Full text
Abstract:
With the evolution of the Internet, different communication media have come to use the Web as a channel for publishing their information. The abundance of information sources and writing styles, combined with the innate curiosity of human beings, leads readers to seek more than a single account of the same subject. To read different accounts with similar content, readers need to search for, read, and analyze information provided by different sources. Besides consuming a great amount of time, this activity imposes a cognitive overload on users. Hypermedia research has investigated mechanisms for supporting users in the process of identifying information in homogeneous repositories, whether or not these are available on the Web. In this thesis, homogeneous repositories are those whose information addresses the same subject. The thesis investigates the specification, instantiation, and experimental evaluation of a framework to support the task of automatically creating hypertext links between homogeneous repositories. The proposed framework, called CARe (Automatic Creation of Relationships), comprises a set of classes that gather the information to be related and process it to generate indexes. These indexes are then related to one another and used to automatically create hypertext links between the original pieces of information. The framework was defined after a domain analysis phase in which requirements were identified and software components were built; in that phase, several prototypes were also developed iteratively.
APA, Harvard, Vancouver, ISO, and other styles
47

Gorrell, Genevieve. "Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing." Doctoral thesis, Linköping : Department of Computer and Information Science, Linköpings universitet, 2006. http://www.bibl.liu.se/liupubl/disp/disp2006/tek1045s.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
48

Zougris, Konstantinos. "Sociological Applications of Topic Extraction Techniques: Two Case Studies." Thesis, University of North Texas, 2015. https://digital.library.unt.edu/ark:/67531/metadc804982/.

Full text
Abstract:
Limited research has been conducted with regard to the applicability of topic extraction techniques in Sociology. Addressing the modern methodological opportunities, and responding to the skepticism regarding the absence of theoretical foundations supporting the use of text analytics, I argue that Latent Semantic Analysis (LSA), complemented by other text analysis techniques and multivariate techniques, can constitute a unique hybrid method that can facilitate the sociological interpretation of web-based textual data. To illustrate the applicability of the hybrid technique, I developed two case studies. My first case study is associated with the Sociology of media. It focuses on topic extraction and sentiment polarization among partisan texts posted on two major news sites. I find evidence of highly polarized opinions in comments posted on the Huffington Post and the Daily Caller. The most polarizing topic was associated with a commentator's reference to hoodies in the context of the Trayvon Martin incident. My findings support contemporary research suggesting that media pundits frequently use tactics of outrage to provoke polarization of public opinion. My second case study contributes to the research domain of the Sociology of knowledge. The hybrid method revealed evidence of topical divides and topical "bridges" in the intellectual landscape of British and American sociological journals. My findings confirm theoretical assertions describing Sociology as a fractured field, and partially support the existence of more globalized topics in the discipline.
APA, Harvard, Vancouver, ISO, and other styles
49

Pinheiro, José Claudio dos Santos. "USO DE TEORIAS NO CAMPO DE SISTEMAS DE INFORMAÇÃO: MAPEAMENTO USANDO TÉCNICAS DE MINERAÇÃO DE TEXTOS." Universidade Metodista de São Paulo, 2009. http://tede.metodista.br/jspui/handle/tede/152.

Full text
Abstract:
This work aims to map the use of information systems theories, drawing on analytic resources from information retrieval techniques and from data mining and text mining methodologies. The theories addressed by this research were Transaction Cost Economics (TCE), the Resource-Based View (RBV), and Institutional Theory (IT), chosen for their relevance as alternative approaches in processes of investment allocation and information systems implementation. The empirical data comprise the textual content (in English) of the abstract and theoretical review sections of articles from Information Systems Research (ISR), Management Information Systems Quarterly (MISQ), and the Journal of Management Information Systems (JMIS) over the period 2000 to 2008. The results of the text mining technique combined with data mining were compared with the EBSCO advanced search tool and demonstrated greater efficiency in identifying content. Articles grounded in the three theories accounted for 10% of all articles in the three journals, and the most prolific publication years were 2001 and 2007.
APA, Harvard, Vancouver, ISO, and other styles
50

Ong, James Kwan Yau. "The predictability problem." Phd thesis, kostenfrei, 2007. http://opus.kobv.de/ubp/volltexte/2007/1502/.

Full text
APA, Harvard, Vancouver, ISO, and other styles