Academic literature on the topic 'Allocation de Dirichlet latente (LDA)'

Generate an accurate reference in APA, MLA, Chicago, Harvard, and other styles

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Allocation de Dirichlet latente (LDA).'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "Allocation de Dirichlet latente (LDA)"

1

Guo, Yunyan, and Jianzhong Li. "Distributed Latent Dirichlet Allocation on Streams." ACM Transactions on Knowledge Discovery from Data 16, no. 1 (July 3, 2021): 1–20. http://dx.doi.org/10.1145/3451528.

Full text
Abstract:
Latent Dirichlet Allocation (LDA) has been widely used for topic modeling, with applications spanning various areas such as natural language processing and information retrieval. While LDA on small and static datasets has been extensively studied, several real-world challenges are posed in practical scenarios where datasets are often huge and are gathered in a streaming fashion. As the state-of-the-art LDA algorithm on streams, Streaming Variational Bayes (SVB) introduced Bayesian updating to provide a streaming procedure. However, the utility of SVB is limited in applications since it ignored three challenges of processing real-world streams: topic evolution, data turbulence, and real-time inference. In this article, we propose a novel distributed LDA algorithm, referred to as StreamFed-LDA, to deal with challenges on streams. For topic modeling of streaming data, the ability to capture evolving topics is essential for practical online inference. To achieve this goal, StreamFed-LDA is based on a specialized framework that supports lifelong (continual) learning of evolving topics. On the other hand, data turbulence is commonly present in streams due to real-life events. In that case, the design of StreamFed-LDA allows the model to learn new characteristics from the most recent data while maintaining the historical information. On massive streaming data, it is difficult and crucial to provide real-time inference results. To increase the throughput and reduce the latency, StreamFed-LDA introduces additional techniques that substantially reduce both computation and communication costs in distributed systems. Experiments on four real-world datasets show that the proposed framework achieves significantly better performance of online inference compared with the baselines. At the same time, StreamFed-LDA also reduces the latency by orders of magnitude in real-world datasets.
APA, Harvard, Vancouver, ISO, and other styles
2

Garg, Mohit, and Priya Rangra. "Bibliometric Analysis of Latent Dirichlet Allocation." DESIDOC Journal of Library & Information Technology 42, no. 2 (February 28, 2022): 105–13. http://dx.doi.org/10.14429/djlit.42.2.17307.

Full text
Abstract:
Latent Dirichlet Allocation (LDA) has emerged as an important algorithm in big data analysis that finds groups of topics in text data. It posits that each text document consists of a group of topics, and each topic is a mixture of words related to it. With the emergence of a plethora of text data, LDA has become a popular algorithm for topic modeling among researchers from different domains. Therefore, it is essential to understand the trends of LDA research. Bibliometric techniques are established methods for studying the research progress of a topic. In this study, bibliographic data on 18715 publications that have cited LDA were extracted from the Scopus database. The software R and VOSviewer were used to carry out the analysis. The analysis revealed that research interest in LDA has grown exponentially. The results showed that most authors preferred "Book Series" followed by "Conference Proceedings" as the publication venue. The majority of the institutions and authors were from the USA, followed by China. The co-occurrence analysis of keywords indicated that text mining and machine learning were dominant topics in LDA research, with significant interest in social media. This study attempts to provide a more comprehensive analysis and intellectual structure of LDA than previous studies.
APA, Harvard, Vancouver, ISO, and other styles
3

Kim, Anastasiia, Sanna Sevanto, Eric R. Moore, and Nicholas Lubbers. "Latent Dirichlet Allocation modeling of environmental microbiomes." PLOS Computational Biology 19, no. 6 (June 8, 2023): e1011075. http://dx.doi.org/10.1371/journal.pcbi.1011075.

Full text
Abstract:
Interactions between stressed organisms and their microbiome environments may provide new routes for understanding and controlling biological systems. However, microbiomes are a form of high-dimensional data, with thousands of taxa present in any given sample, which makes untangling the interaction between an organism and its microbial environment a challenge. Here we apply Latent Dirichlet Allocation (LDA), a technique for language modeling, which decomposes the microbial communities into a set of topics (non-mutually-exclusive sub-communities) that compactly represent the distribution of full communities. LDA provides a lens into the microbiome at broad and fine-grained taxonomic levels, which we show on two datasets. In the first dataset, from the literature, we show how LDA topics succinctly recapitulate many results from a previous study on diseased coral species. We then apply LDA to a new dataset of maize soil microbiomes under drought, and find a large number of significant associations between the microbiome topics and plant traits, as well as associations between the microbiome and the experimental factors, e.g. watering level. This yields new information on the plant-microbe interactions in maize and shows that the LDA technique is useful for studying the coupling between microbiomes and stressed organisms.
APA, Harvard, Vancouver, ISO, and other styles
4

Zhou, Qi, Haipeng Chen, Yitao Zheng, and Zhen Wang. "EvaLDA: Efficient Evasion Attacks Towards Latent Dirichlet Allocation." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 16 (May 18, 2021): 14602–11. http://dx.doi.org/10.1609/aaai.v35i16.17716.

Full text
Abstract:
As one of the most powerful topic models, Latent Dirichlet Allocation (LDA) has been used in a vast range of tasks, including document understanding, information retrieval and peer-reviewer assignment. Despite its tremendous popularity, the security of LDA has rarely been studied. This poses severe risks to security-critical tasks such as sentiment analysis and peer-reviewer assignment that are based on LDA. In this paper, we are interested in knowing whether LDA models are vulnerable to adversarial perturbations of benign document examples during inference time. We formalize the evasion attack on LDA models as an optimization problem and prove it to be NP-hard. We then propose a novel and efficient algorithm, EvaLDA, to solve it. We show the effectiveness of EvaLDA via extensive empirical evaluations. For instance, on the NIPS dataset, EvaLDA can, on average, promote the rank of a target topic from 10 to around 7 by replacing only 1% of the words with similar words in a victim document. Our work provides significant insights into the power and limitations of evasion attacks against LDA models.
APA, Harvard, Vancouver, ISO, and other styles
5

Christy, A., Anto Praveena, and Jany Shabu. "A Hybrid Model for Topic Modeling Using Latent Dirichlet Allocation and Feature Selection Method." Journal of Computational and Theoretical Nanoscience 16, no. 8 (August 1, 2019): 3367–71. http://dx.doi.org/10.1166/jctn.2019.8234.

Full text
Abstract:
In this information age, knowledge discovery and pattern matching play a significant role. Topic modeling, an area of text mining, is used to detect hidden patterns in a document collection. Topic modeling and document clustering are two important key terms which are similar in concept and functionality. In this paper, topic modeling is carried out using the Latent Dirichlet Allocation-Brute Force (LDA-BF), Latent Dirichlet Allocation-Back Tracking (LDA-BT), Latent Semantic Indexing (LSI), and Nonnegative Matrix Factorization (NMF) methods. A hybrid model is proposed which uses Latent Dirichlet Allocation (LDA) for extracting feature terms and a Feature Selection (FS) method for feature reduction. The efficiency of document clustering depends upon the selection of good features. Topic modeling is performed by enriching the good features obtained through the feature selection method. The proposed hybrid model produces better accuracy than the K-Means clustering method.
APA, Harvard, Vancouver, ISO, and other styles
6

Fernanda, Jerhi Wahyu. "PEMODELAN PERSEPSI PEMBELAJARAN ONLINE MENGGUNAKAN LATENT DIRICHLET ALLOCATION." Jurnal Statistika Universitas Muhammadiyah Semarang 9, no. 2 (December 31, 2021): 79. http://dx.doi.org/10.26714/jsunimus.9.2.2021.79-85.

Full text
Abstract:
Latent Dirichlet Allocation (LDA) is a topic-modeling method based on the concept of probability, used to find similarities between documents and to group documents into several topics or clusters. The method is a form of unsupervised learning, since the analyzed data carry no labels or targets. This study aims to group perceptions of online learning into several topics using the LDA method. The research data are primary data collected through an online form. The analysis shows that LDA modeling with 6 topics has the highest coherence score. The word-cloud visualization of the text data shows that the word "tidak" ("not") has the highest frequency of occurrence. Determining the optimal number of topics by coherence score, LDA modeling with 6 topics is the most optimal; broadly speaking, several words overlap with other topics. The modeling indicates that students' perceptions of online learning concern understanding of the material given by lecturers, internet signal or network, data quotas, and assignments. Among the words related to understanding of the material, students expressed the view that they could not properly understand the material given by the lecturers.
APA, Harvard, Vancouver, ISO, and other styles
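The study above selects the number of topics by comparing coherence scores across candidate models, though the abstract does not say which coherence measure was used. As a sketch of the idea, here is a minimal pure-Python implementation of the common UMass measure, which scores a topic's top words by how often they co-occur in documents; in practice a library such as gensim's CoherenceModel is used, and the documents and topic words below are invented for illustration:

```python
from itertools import combinations
from math import log

def umass_coherence(top_words, documents, eps=1.0):
    """UMass coherence of one topic: for each ordered pair (wi, wj) of the
    topic's top words (wi ranked above wj), add log((D(wi, wj) + eps) / D(wi)),
    where D counts the documents containing the given words."""
    docs = [set(d) for d in documents]
    def count(*words):
        return sum(all(w in doc for w in words) for doc in docs)
    score = 0.0
    for i, j in combinations(range(len(top_words)), 2):
        wi, wj = top_words[i], top_words[j]  # wi is the higher-ranked word
        score += log((count(wi, wj) + eps) / count(wi))
    return score

# Toy corpus loosely themed on the study (words are illustrative only).
docs = [["signal", "network", "quota"],
        ["signal", "quota", "assignment"],
        ["material", "lecturer", "understanding"],
        ["material", "lecturer", "signal"]]
print(umass_coherence(["material", "lecturer"], docs))
```

Fitting LDA for several topic counts and keeping the count with the highest average per-topic coherence reproduces the selection procedure the abstract describes.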
7

Yuan, Ling, JiaLi Bin, YinZhen Wei, Fei Huang, XiaoFei Hu, and Min Tan. "Big Data Aspect-Based Opinion Mining Using the SLDA and HME-LDA Models." Wireless Communications and Mobile Computing 2020 (November 18, 2020): 1–19. http://dx.doi.org/10.1155/2020/8869385.

Full text
Abstract:
To make better use of massive network comment data for decision-making support of customers and merchants in the big data era, this paper proposes two unsupervised optimized LDA (Latent Dirichlet Allocation) models, namely SLDA (SentiWordNet WordNet-Latent Dirichlet Allocation) and HME-LDA (Hierarchical Clustering MaxEnt-Latent Dirichlet Allocation), for aspect-based opinion mining. For each of the two optimized models, a scheme that uses seed words as topic words and constructs an inverted index is designed to enhance the readability of experimental results. Meanwhile, based on the LDA topic model, we introduce new indicator variables to refine the classification of topics and to classify the opinion target words and the sentiment opinion words by two different schemes. For a better classification effect, the similarity between words and seed words is calculated in two ways to offset the fixed parameters in the standard LDA. In addition, based on the SemEval2016ABSA and Yelp datasets, we design comparative experiments with training sets of different sizes and different seed words, which show that SLDA and HME-LDA achieve better accuracy, recall, and harmonic value with unannotated training sets.
APA, Harvard, Vancouver, ISO, and other styles
8

Ogundare, A. O., A. U. Saleh, O. A. James, E. E. Ajayi, and S. Gostoji. "Performance evaluation of Latent Dirichlet Allocation on legal documents." Applied and Computational Engineering 52, no. 1 (March 27, 2024): 96–101. http://dx.doi.org/10.54254/2755-2721/52/20241322.

Full text
Abstract:
Latent Dirichlet Allocation (LDA) is an algorithm capable of processing large amounts of text data. In this study, LDA is used to produce topic clusters from a corpus of legal texts generated under four topics within the Nigerian context: Employment Contract, Election Petition, Deeds, and Articles of Incorporation. Each topic has a substantial number of articles, and the LDA method proves effective in extracting topics and generating index words for each topic cluster. At the end of the experimentation, results are compared with a manually pre-annotated dataset for validation purposes, and the results show high accuracy. The LDA output shows optimal performance in the word-indexing process for Election Petition, as all the documents annotated under that topic were accurately classified.
APA, Harvard, Vancouver, ISO, and other styles
9

Syed, Shaheen, and Marco Spruit. "Exploring Symmetrical and Asymmetrical Dirichlet Priors for Latent Dirichlet Allocation." International Journal of Semantic Computing 12, no. 03 (September 2018): 399–423. http://dx.doi.org/10.1142/s1793351x18400184.

Full text
Abstract:
Latent Dirichlet Allocation (LDA) has gained much attention from researchers and is increasingly being applied to uncover underlying semantic structures from a variety of corpora. However, nearly all researchers use symmetrical Dirichlet priors, often unaware of the underlying practical implications that they bear. This research is the first to explore symmetrical and asymmetrical Dirichlet priors on topic coherence and human topic ranking when uncovering latent semantic structures from scientific research articles. More specifically, we examine the practical effects of several classes of Dirichlet priors on 2000 LDA models created from abstract and full-text research articles. Our results show that symmetrical or asymmetrical priors on the document–topic distribution or the topic–word distribution for full-text data have little effect on topic coherence scores and human topic ranking. In contrast, asymmetrical priors on the document–topic distribution for abstract data show a significant increase in topic coherence scores and improved human topic ranking compared to a symmetrical prior. Symmetrical or asymmetrical priors on the topic–word distribution show no real benefits for both abstract and full-text data.
APA, Harvard, Vancouver, ISO, and other styles
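The symmetric versus asymmetric priors studied above are easy to illustrate: a symmetric Dirichlet prior gives every topic the same concentration value, while an asymmetric one assigns different values so some topics are expected to carry more mass a priori (gensim's LdaModel, for instance, accepts alpha='symmetric' or alpha='asymmetric'). A minimal sketch of drawing a document-topic distribution from either kind of prior via the standard normalized-Gamma construction, with concentration values chosen purely for illustration:

```python
import random

def sample_dirichlet(alphas, rng):
    """Draw one sample from Dirichlet(alphas): normalize independent
    Gamma(alpha_i, 1) draws so the components sum to 1."""
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]

rng = random.Random(0)
K = 5
symmetric = [0.1] * K                     # every topic equally likely a priori
asymmetric = [1.0, 0.5, 0.25, 0.1, 0.05]  # earlier topics expected to dominate

theta = sample_dirichlet(asymmetric, rng)
print(theta)
```

The prior mean of component i is alpha_i / sum(alphas), which is why the asymmetric setting concentrates document-topic mass on a few topics, the effect the paper measures on abstract versus full-text corpora.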
10

Ohmura, Masahiro, Koh Kakusho, and Takeshi Okadome. "Tweet Sentiment Analysis with Latent Dirichlet Allocation." International Journal of Information Retrieval Research 4, no. 3 (July 2014): 66–79. http://dx.doi.org/10.4018/ijirr.2014070105.

Full text
Abstract:
The method proposed here analyzes social sentiments from collected tweets that contain at least 1 of 800 sentimental or emotional adjectives. Treating the tweets posted in half a day as an input document, the method uses Latent Dirichlet Allocation (LDA) to extract social sentiments, some of which coincide with our daily sentiments. The extracted sentiments, however, show lowered sensitivity to changes over time, which suggests that they are not suitable for predicting daily social or economic events. Using LDA on the 72 representative adjectives to which each of the 800 adjectives maps, while preserving word frequencies, permits us to obtain social sentiments that show improved sensitivity to changes over time. A regression model with autocorrelated errors, whose inputs are the social sentiments obtained by analyzing the contracted adjectives, predicts the Dow Jones Industrial Average (DJIA) more precisely than autoregressive moving-average models.
APA, Harvard, Vancouver, ISO, and other styles
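The contraction step described above, mapping each of the 800 adjectives onto one of 72 representatives while preserving word frequencies, can be sketched in a few lines. The mapping table below is a hypothetical stand-in, not the paper's actual adjective list:

```python
from collections import Counter

# Hypothetical adjective-to-representative mapping (illustrative only).
REPRESENTATIVE = {"joyful": "happy", "glad": "happy", "gloomy": "sad"}

def contract(tokens):
    """Replace each adjective by its representative, keeping total counts,
    so downstream LDA sees 72 coarse 'words' instead of 800 fine ones."""
    return Counter(REPRESENTATIVE.get(t, t) for t in tokens)

print(contract(["joyful", "glad", "gloomy", "joyful"]))  # -> Counter({'happy': 3, 'sad': 1})
```

Because counts are summed rather than discarded, the contracted documents keep the same total word mass, which is what lets the coarser vocabulary improve time sensitivity without changing the LDA input format.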
More sources

Dissertations / Theses on the topic "Allocation de Dirichlet latente (LDA)"

1

Ponweiser, Martin. "Latent Dirichlet Allocation in R." WU Vienna University of Economics and Business, 2012. http://epub.wu.ac.at/3558/1/main.pdf.

Full text
Abstract:
Topic models are a relatively new research field within the computer science disciplines of information retrieval and text mining. They are generative probabilistic models of text corpora inferred by machine learning, and they can be used for retrieval and text mining tasks. The most prominent topic model is latent Dirichlet allocation (LDA), which was introduced in 2003 by Blei et al. and has since sparked the development of other topic models for domain-specific purposes. This thesis focuses on LDA's practical application. Its main goal is the replication of the data analyses from the 2004 LDA paper "Finding scientific topics" by Thomas Griffiths and Mark Steyvers within the framework of the R statistical programming language and the R package topicmodels by Bettina Grün and Kurt Hornik. The complete process, including extraction of a text corpus from the PNAS journal's website, data preprocessing, transformation into a document-term matrix, model selection, model estimation, as well as presentation of the results, is fully documented and commented. The outcome closely matches the analyses of the original paper, so the research by Griffiths/Steyvers can be reproduced. Furthermore, this thesis demonstrates the suitability of the R environment for text mining with LDA. (author's abstract)
Series: Theses / Institute for Statistics and Mathematics
APA, Harvard, Vancouver, ISO, and other styles
2

Lindgren, Jennifer. "Evaluating Hierarchical LDA Topic Models for Article Categorization." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-167080.

Full text
Abstract:
With the vast amount of information available on the Internet today, helping users find relevant content has become a prioritized task in many software products that recommend news articles. One such product is Opera for Android, which has a news feed containing articles the user may be interested in. In order to easily determine what articles to recommend, they can be categorized by the topics they contain. One approach to categorizing articles is using Machine Learning and Natural Language Processing (NLP). A commonly used model is Latent Dirichlet Allocation (LDA), which finds latent topics within large datasets of, for example, text articles. An extension of LDA is hierarchical Latent Dirichlet Allocation (hLDA), a hierarchical variant of LDA. In hLDA, the latent topics found among a set of articles are structured hierarchically in a tree. Each node represents a topic, and the levels represent different levels of abstraction in the topics. A further extension of hLDA is constrained hLDA, where a set of predefined, constrained topics are added to the tree. The constrained topics are extracted from the dataset by grouping highly correlated words. The idea of constrained hLDA is to improve the topic structure derived by a hLDA model by making the process semi-supervised. The aim of this thesis is to create a hLDA and a constrained hLDA model from a dataset of articles provided by Opera. The models should then be evaluated using the novel metric word frequency similarity, which is a measure of the similarity between the words representing the parent and child topics in a hierarchical topic model. The results show that word frequency similarity can be used to evaluate whether the topics in a parent-child topic pair are too similar, so that the child does not specify a subtopic of the parent. It can also be used to evaluate if the topics are too dissimilar, so that the topics seem unrelated and perhaps should not be connected in the hierarchy.
The results also show that the two topic models created had comparable word frequency similarity scores. None of the models seemed to significantly outperform the other with regard to the metric.
APA, Harvard, Vancouver, ISO, and other styles
3

Jaradat, Shatha. "OLLDA: Dynamic and Scalable Topic Modelling for Twitter : AN ONLINE SUPERVISED LATENT DIRICHLET ALLOCATION ALGORITHM." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-177535.

Full text
Abstract:
Providing high-quality topic inference in today's large and dynamic corpora, such as Twitter, is a challenging task. This is especially challenging considering that the content in this environment consists of short texts and many abbreviations. This project proposes an improvement to a popular online topic modelling algorithm for Latent Dirichlet Allocation (LDA), incorporating supervision to make it suitable for the Twitter context. This improvement is motivated by the need for a single algorithm that achieves both objectives: analyzing huge amounts of documents, including new documents arriving in a stream, while at the same time achieving high-quality topic detection in special-case environments such as Twitter. The proposed algorithm is a combination of an online algorithm for LDA and a supervised variant of LDA, labeled LDA. The performance and quality of the proposed algorithm are compared with these two algorithms. The results demonstrate that the proposed algorithm shows better performance and quality when compared to the supervised variant of LDA, and it achieves better results in terms of quality in comparison to the online algorithm. These improvements make our algorithm an attractive option when applied to dynamic environments like Twitter. An environment for analyzing and labelling data was designed to prepare the dataset before executing the experiments. Possible application areas for the proposed algorithm are tweet recommendation and trend detection.
APA, Harvard, Vancouver, ISO, and other styles
4

Mungre, Surbhi. "LDA-based dimensionality reduction and domain adaptation with application to DNA sequence classification." Thesis, Kansas State University, 2011. http://hdl.handle.net/2097/8846.

Full text
Abstract:
Master of Science, Department of Computing and Information Sciences, Doina Caragea
Several computational biology and bioinformatics problems involve DNA sequence classification using supervised machine learning algorithms. The performance of these algorithms is largely dependent on the availability of labeled data and the approach used to represent DNA sequences as feature vectors. For many organisms, the labeled DNA data is scarce, while the unlabeled data is easily available. However, for a small number of well-studied model organisms, large amounts of labeled data are available. This calls for domain adaptation approaches, which can transfer knowledge from a source domain, for which labeled data is available, to a target domain, for which large amounts of unlabeled data are available. Intuitively, one approach to domain adaptation can be obtained by extracting and representing the features that the source domain and the target domain sequences share. Latent Dirichlet Allocation (LDA) is an unsupervised dimensionality reduction technique that has been successfully used to generate features for sequence data such as text. In this work, we explore the use of LDA for generating predictive DNA sequence features that can be used in both supervised and domain adaptation frameworks. More precisely, we propose two dimensionality reduction approaches, LDA Words (LDAW) and LDA Distribution (LDAD), for DNA sequences. LDA is a probabilistic model, which is generative in nature, and is used to model collections of discrete data such as document collections. For our problem, a sequence is considered to be a "document" and k-mers obtained from a sequence are "document words". We use LDA to model our sequence collection. Given the LDA model, each document can be represented as a distribution over topics (where a topic can be seen as a distribution over k-mers).
In the LDAW method, we use the top k-mers in each topic as our features (i.e., k-mers with the highest probability), while in the LDAD method, we use the topic distribution to represent a document as a feature vector. We study LDA-based dimensionality reduction approaches for both supervised DNA sequence classification, as well as domain adaptation approaches. We apply the proposed approaches to the splice site prediction problem, which is an important DNA sequence classification problem in the context of genome annotation. In the supervised learning framework, we study the effectiveness of the LDAW and LDAD methods by comparing them with a traditional dimensionality reduction technique based on the information gain criterion. In the domain adaptation framework, we study the effect of increasing the evolutionary distances between the source and target organisms, and the effect of using different weights when combining labeled data from the source domain with labeled data from the target domain. Experimental results show that LDA-based features can be successfully used to perform dimensionality reduction and domain adaptation for DNA sequence classification problems.
APA, Harvard, Vancouver, ISO, and other styles
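The sequence-as-document analogy in the abstract above can be sketched in a few lines: a DNA sequence becomes a "document" whose "words" are its overlapping k-mers, after which any standard LDA implementation applies unchanged. The function name and the choice of k below are illustrative, not taken from the thesis:

```python
def kmer_document(seq, k=4):
    """Represent a DNA sequence as a 'document' whose 'words' are the
    overlapping k-mers read along the sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

print(kmer_document("ACGTAC", k=4))  # -> ['ACGT', 'CGTA', 'GTAC']
```

Feeding such k-mer documents to LDA yields, per the abstract, either the top k-mers of each topic (LDAW) or the per-sequence topic distribution (LDAD) as the reduced feature vector.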
5

Harrysson, Mattias. "Neural probabilistic topic modeling of short and messy text." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-189532.

Full text
Abstract:
Exploring massive amounts of user-generated data with topics offers a new way to find useful information. The topics are assumed to be “hidden” and must be “uncovered” by statistical methods such as topic modeling. However, user-generated data is typically short and messy, e.g. informal chat conversations, heavy use of slang words, and “noise” such as URLs or other forms of pseudo-text. This type of data is difficult to process for most natural language processing methods, including topic modeling. This thesis attempts to find the approach that objectively gives better topics from short and messy text in a comparative study. The compared approaches are latent Dirichlet allocation (LDA), Re-organized LDA (RO-LDA), a Gaussian Mixture Model (GMM) with distributed representation of words, and a new approach based on previous work named Neural Probabilistic Topic Modeling (NPTM). It could only be concluded that NPTM has a tendency to achieve better topics on short and messy text than LDA and RO-LDA. GMM, on the other hand, could not produce any meaningful results at all. The results are less conclusive since NPTM suffers from long running times, which prevented enough samples from being obtained for a statistical test.
APA, Harvard, Vancouver, ISO, and other styles
6

Chen, Yuxin. "Apprentissage interactif de mots et d'objets pour un robot humanoïde." Thesis, Université Paris-Saclay (ComUE), 2017. http://www.theses.fr/2017SACLY003/document.

Full text
Abstract:
Les applications futures de la robotique, en particulier pour des robots de service à la personne, exigeront des capacités d’adaptation continue à l'environnement, et notamment la capacité à reconnaître des nouveaux objets et apprendre des nouveaux mots via l'interaction avec les humains. Bien qu'ayant fait d'énormes progrès en utilisant l'apprentissage automatique, les méthodes actuelles de vision par ordinateur pour la détection et la représentation des objets reposent fortement sur de très bonnes bases de données d’entrainement et des supervisions d'apprentissage idéales. En revanche, les enfants de deux ans ont une capacité impressionnante à apprendre à reconnaître des nouveaux objets et en même temps d'apprendre les noms des objets lors de l'interaction avec les adultes et sans supervision précise. Par conséquent, suivant l'approche de la robotique développementale, nous développons dans la thèse des approches d'apprentissage pour les objets, en associant leurs noms et leurs caractéristiques correspondantes, inspirées par les capacités des enfants, en particulier l'interaction ambiguë avec l’homme en s’inspirant de l'interaction qui a lieu entre les enfants et les parents. L'idée générale est d’utiliser l'apprentissage cross-situationnel (cherchant les points communs entre différentes présentations d’un objet ou d’une caractéristique) et la découverte de concepts multi-modaux basée sur deux approches de découverte de thèmes latents: la Factorisation en Matrices Non-Négatives (NMF) et l'Allocation de Dirichlet latente (LDA). Sur la base de descripteurs de vision et des entrées audio/vocales, les approches proposées vont découvrir les régularités sous-jacentes dans le flux de données brutes afin de parvenir à produire des ensembles de mots et leur signification visuelle associée (p. ex. le nom d’un objet et sa forme, ou un adjectif de couleur et sa correspondance dans les images).
Nous avons développé une approche complète basée sur ces algorithmes et comparé leurs comportements face à deux sources d'incertitudes: les ambiguïtés de référence, dans des situations où plusieurs mots sont donnés qui décrivent des caractéristiques d'objets multiples; et les ambiguïtés linguistiques, dans des situations où les mots-clés que nous avons l'intention d'apprendre sont intégrés dans des phrases complètes. Cette thèse souligne les solutions algorithmiques requises pour pouvoir effectuer un apprentissage efficace de ces associations mot-référent à partir de données acquises dans une configuration d'acquisition simplifiée mais réaliste, qui a permis d'effectuer des simulations étendues et des expériences préliminaires dans de vraies interactions homme-robot. Nous avons également apporté des solutions pour l'estimation automatique du nombre de thèmes pour les NMF et LDA. Nous avons finalement proposé deux stratégies d'apprentissage actives: la Sélection par l'Erreur de Reconstruction Maximale (MRES) et l'Exploration Basée sur la Confiance (CBE), afin d'améliorer la qualité et la vitesse de l'apprentissage incrémental en laissant les algorithmes choisir les échantillons d'apprentissage suivants. Nous avons comparé les comportements produits par ces algorithmes et montré leurs points communs et leurs différences avec ceux des humains dans des situations d'apprentissage similaires.
Future applications of robotics, especially personal service robots, will require continuous adaptability to the environment, and particularly the ability to recognize new objects and learn new words through interaction with humans. Though having made tremendous progress by using machine learning, current computational models for object detection and representation still rely heavily on good training data and ideal learning supervision. In contrast, two-year-old children have an impressive ability to learn to recognize new objects and at the same time to learn the object names during interaction with adults and without precise supervision. Therefore, following the developmental robotics approach, we develop in this thesis learning approaches for objects, associating their names and corresponding features, inspired by infants' capabilities, in particular the ambiguous interaction with humans, inspired by the interaction that occurs between children and parents. The general idea is to use cross-situational learning (finding the common points between different presentations of an object or a feature) and to implement multi-modal concept discovery based on two latent topic discovery approaches: Non-Negative Matrix Factorization (NMF) and Latent Dirichlet Allocation (LDA). Based on vision descriptors and sound/voice inputs, the proposed approaches will find the underlying regularities in the raw dataflow to produce sets of words and their associated visual meanings (e.g. the name of an object and its shape, or a color adjective and its correspondence in images). We developed a complete approach based on these algorithms and compared their behavior in the face of two sources of uncertainty: referential ambiguities, in situations where multiple words are given that describe multiple object features; and linguistic ambiguities, in situations where the keywords we intend to learn are merged into complete sentences.
This thesis highlights the algorithmic solutions required to perform efficient learning of these word-referent associations from data acquired in a simplified but realistic acquisition setup that made it possible to perform extensive simulations and preliminary experiments in real human-robot interactions. We also gave solutions for the automatic estimation of the number of topics for both NMF and LDA. We finally proposed two active learning strategies, Maximum Reconstruction Error Based Selection (MRES) and Confidence Based Exploration (CBE), to improve the quality and speed of incremental learning by letting the algorithms choose the next learning samples. We compared the behaviors produced by these algorithms and showed their common points and differences with those of humans in similar learning situations.
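The NMF side of this pipeline can be illustrated with a minimal sketch of the classic Lee-Seung multiplicative updates in plain Python. The function name `nmf`, the toy matrix, and the iteration budget are illustrative assumptions, not the thesis code, which operates on multimodal vision and speech descriptors:

```python
import random

def nmf(V, r, iters=500, seed=0):
    """Factor a nonnegative matrix V (list of lists, n x m) into W (n x r)
    and H (r x m) using Lee-Seung multiplicative updates (squared error)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    eps = 1e-9  # guard against division by zero
    W = [[rng.random() + 0.1 for _ in range(r)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(r)]

    def matmul(A, B):
        return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
                 for j in range(len(B[0]))] for i in range(len(A))]

    def T(A):  # transpose
        return [list(row) for row in zip(*A)]

    for _ in range(iters):
        WH = matmul(W, H)
        WtV, WtWH = matmul(T(W), V), matmul(T(W), WH)
        # H <- H * (W^T V) / (W^T W H), elementwise
        H = [[H[i][j] * WtV[i][j] / (WtWH[i][j] + eps)
              for j in range(m)] for i in range(r)]
        WH = matmul(W, H)
        VHt, WHHt = matmul(V, T(H)), matmul(WH, T(H))
        # W <- W * (V H^T) / (W H H^T), elementwise
        W = [[W[i][j] * VHt[i][j] / (WHHt[i][j] + eps)
              for j in range(r)] for i in range(n)]
    return W, H
```

Because the updates are multiplicative and the initialization is positive, W and H stay nonnegative throughout, which is what makes the factors interpretable as additive parts.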
APA, Harvard, Vancouver, ISO, and other styles
7

Johansson, Richard, and Heino Otto Engström. "Topic propagation over time in internet security conferences : Topic modeling as a tool to investigate trends for future research." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-177748.

Full text
Abstract:
When conducting research, it is valuable to find highly ranked papers closely related to the specific research area without spending too much time reading insignificant ones. To make this process more effective, an automated way to extract topics from documents would be useful, and this is possible using topic modeling. Topic modeling can also be used to reveal topic trends, such as where a topic was first mentioned and who the original author was. In this paper, over 5,000 articles are scraped from four different top-ranked internet security conferences, using a web scraper built in Python. From the articles, fourteen topics are extracted using the topic modeling library Gensim and LDA Mallet, and the topics are visualized in graphs to find trends in which topics are emerging and fading away over twenty years. This research finds that topic modeling is a powerful tool for extracting topics and that, when put into a time perspective, it makes it possible to identify topic trends, which can be explained when put into a bigger context.
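The trend step described here, measuring how much of each year's output a given topic accounts for, reduces to simple bookkeeping once every paper has a dominant topic label. A minimal sketch under assumed inputs (the function name `topic_trends` and the toy labels are not from the thesis):

```python
from collections import Counter, defaultdict

def topic_trends(doc_topics, doc_years):
    """Per-year topic shares: for each year, the fraction of that year's
    documents whose dominant topic is each topic label."""
    per_year = defaultdict(Counter)
    for topic, year in zip(doc_topics, doc_years):
        per_year[year][topic] += 1
    return {year: {topic: count / sum(counts.values())
                   for topic, count in counts.items()}
            for year, counts in sorted(per_year.items())}
```

Plotting each topic's share against the sorted years is then enough to see topics emerging or fading.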
APA, Harvard, Vancouver, ISO, and other styles
8

9

Ficapal, Vila Joan. "Anemone: a Visual Semantic Graph." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-252810.

Full text
Abstract:
Semantic graphs have been used for optimizing various natural language processing tasks as well as for augmenting search and information retrieval tasks. In most cases these semantic graphs have been constructed through supervised machine learning methodologies that depend on manually curated ontologies such as Wikipedia or similar. In this thesis, which consists of two parts, we explore in the first part the possibility of automatically populating a semantic graph from an ad hoc data set of 50,000 newspaper articles in a completely unsupervised manner. The utility of the visual representation of the resulting graph is tested on 14 human subjects performing basic information retrieval tasks on a subset of the articles. Our study shows that, for entity finding and document similarity, our feature engineering is viable and the visual map produced by our artifact is useful. In the second part, we explore the possibility of identifying entity relationships in an unsupervised fashion by employing abstractive deep learning methods for sentence reformulation. The reformulated sentence structures are qualitatively assessed with respect to grammatical correctness and meaningfulness as perceived by 14 test subjects. The outcomes of this second part are evaluated negatively, as they were not good enough to support any definitive conclusion, but they have opened new doors to explore.
Semantiska grafer har använts för att optimera olika processer för naturlig språkbehandling samt för att förbättra sökoch informationsinhämtningsuppgifter. I de flesta fall har sådana semantiska grafer konstruerats genom övervakade maskininlärningsmetoder som förutsätter manuellt kurerade ontologier såsom Wikipedia eller liknande. I denna uppsats, som består av två delar, undersöker vi i första delen möjligheten att automatiskt generera en semantisk graf från ett ad hoc dataset bestående av 50 000 tidningsartiklar på ett helt oövervakat sätt. Användbarheten hos den visuella representationen av den resulterande grafen testas på 14 försökspersoner som utför grundläggande informationshämtningsuppgifter på en delmängd av artiklarna. Vår studie visar att vår funktionalitet är lönsam för att hitta och dokumentera likhet med varandra, och den visuella kartan som produceras av vår artefakt är visuellt användbar. I den andra delen utforskar vi möjligheten att identifiera entitetsrelationer på ett oövervakat sätt genom att använda abstraktiva djupa inlärningsmetoder för meningsomformulering. De omformulerade meningarna utvärderas kvalitativt med avseende på grammatisk korrekthet och meningsfullhet såsom detta uppfattas av 14 testpersoner. Vi utvärderar negativt resultaten av denna andra del, eftersom de inte har varit tillräckligt bra för att få någon definitiv slutsats, men har istället öppnat nya dörrar för att utforska.
APA, Harvard, Vancouver, ISO, and other styles
10

Schneider, Bruno. "Visualização em multirresolução do fluxo de tópicos em coleções de texto." reponame:Repositório Institucional do FGV, 2014. http://hdl.handle.net/10438/11745.

Full text
Abstract:
Submitted by Bruno Schneider (bruno.sch@gmail.com) on 2014-05-08T17:46:04Z No. of bitstreams: 1 dissertacao_bruno_schneider.pdf.pdf: 8019497 bytes, checksum: 70ff1fddb844b630666397e95c188672 (MD5)
Made available in DSpace on 2014-05-14T19:45:33Z (GMT). Previous issue date: 2014-03-21
The combined use of algorithms for topic discovery in document collections with topic flow visualization techniques allows the exploration of thematic patterns in large corpora, with those patterns revealed through compact visual representations. This research investigated the requirements for viewing data about the thematic composition of documents obtained through topic modeling (data that are sparse and multi-attribute) at different levels of detail, comparing a purpose-built visualization technique with an open-source data visualization library. Regarding the studied problem of topic flow visualization, we observed conflicting requirements for displaying the data at different resolutions, which led to a detailed investigation of ways of manipulating and displaying this data. In this study, the hypothesis put forward was that the integrated use of more than one visualization technique, chosen according to the resolution of the data, expands the possibilities for exploring the object under study relative to what would be obtained using only one method. The main contribution of this work is delineating the limits on the use of these techniques according to the resolution of data exploration, in order to provide groundwork for the development of new applications.
O uso combinado de algoritmos para a descoberta de tópicos em coleções de documentos com técnicas orientadas à visualização da evolução daqueles tópicos no tempo permite a exploração de padrões temáticos em corpora extensos a partir de representações visuais compactas. A pesquisa em apresentação investigou os requisitos de visualização do dado sobre composição temática de documentos obtido através da modelagem de tópicos – o qual é esparso e possui multiatributos – em diferentes níveis de detalhe, através do desenvolvimento de uma técnica de visualização própria e pelo uso de uma biblioteca de código aberto para visualização de dados, de forma comparativa. Sobre o problema estudado de visualização do fluxo de tópicos, observou-se a presença de requisitos de visualização conflitantes para diferentes resoluções dos dados, o que levou à investigação detalhada das formas de manipulação e exibição daqueles. Dessa investigação, a hipótese defendida foi a de que o uso integrado de mais de uma técnica de visualização de acordo com a resolução do dado amplia as possibilidades de exploração do objeto em estudo em relação ao que seria obtido através de apenas uma técnica. A exibição dos limites no uso dessas técnicas de acordo com a resolução de exploração do dado é a principal contribuição desse trabalho, no intuito de dar subsídios ao desenvolvimento de novas aplicações.
APA, Harvard, Vancouver, ISO, and other styles
More sources

Books on the topic "Allocation de Dirichlet latente (LDA)"

1

Jockers, Matthew L. Theme. University of Illinois Press, 2017. http://dx.doi.org/10.5406/illinois/9780252037528.003.0008.

Full text
Abstract:
This chapter demonstrates how big data and computation can be used to identify and track recurrent themes as the products of external influence. It first considers the limitations of the Google Ngram Viewer as a tool for tracing thematic trends over time before turning to Douglas Biber's Corpus Linguistics: Investigating Language Structure and Use, a primer on various factors complicating word-focused text analysis and the subsequent conclusions one might draw regarding word meanings. It then discusses the results of the author's application of latent Dirichlet allocation (LDA) to a corpus of 3,346 nineteenth-century novels using the open-source MALLET (MAchine Learning for LanguagE Toolkit), a software package for topic modeling. It also explains the different types of analyses performed by the author, including text segmentation, word chunking, and author nationality, gender and time-themes relationship analyses. The thematic data from the LDA model reveal the degree to which author nationality, author gender, and date of publication could be predicted by the thematic signals expressed in the nineteenth-century novels corpus.
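The text-segmentation step mentioned here, splitting long novels into fixed-size word chunks before topic modeling, can be sketched in a few lines. The chunk size and function name are illustrative, not Jockers's actual setup:

```python
def segment(text, chunk_size=1000):
    """Split a long document into fixed-size word chunks; the final chunk
    may be shorter. Topic models are then fit on the chunks, not the book."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]
```

Chunking matters because a whole novel mixes many themes; smaller segments give the topic model more thematically coherent units to work with.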
APA, Harvard, Vancouver, ISO, and other styles

Book chapters on the topic "Allocation de Dirichlet latente (LDA)"

1

Bibyan, Ritu, Sameer Anand, and Ajay Jaiswal. "Latent Dirichlet Allocation (LDA) Based on Automated Bug Severity Prediction Model." In Proceedings of Data Analytics and Management, 363–77. Singapore: Springer Singapore, 2022. http://dx.doi.org/10.1007/978-981-16-6289-8_31.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Hasan, Mahedi, Anichur Rahman, Md Razaul Karim, Md Saikat Islam Khan, and Md Jahidul Islam. "Normalized Approach to Find Optimal Number of Topics in Latent Dirichlet Allocation (LDA)." In Advances in Intelligent Systems and Computing, 341–54. Singapore: Springer Singapore, 2020. http://dx.doi.org/10.1007/978-981-33-4673-4_27.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Balasubramanian, Sreejith, Supriya Kaitheri, Krishnadas Nanath, Sony Sreejith, and Cody Morris Paris. "Examining Post COVID-19 Tourist Concerns Using Sentiment Analysis and Topic Modeling." In Information and Communication Technologies in Tourism 2021, 564–69. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-65785-7_54.

Full text
Abstract:
The COVID-19 pandemic has had a destructive effect on the tourism sector, especially on tourists’ fears and risk perceptions, and is likely to have a lasting impact on their intention to travel. Governments and businesses worldwide looking to revive and revamp their tourism sector, therefore, must first develop a critical understanding of tourist concerns, from the dreaming/planning phase through booking, travel, stay, and experiencing. This formed the motivation of this study, which empirically examines tourist sentiments and concerns across the tourism supply chain. Natural Language Processing (NLP) using sentiment analysis and a Latent Dirichlet Allocation (LDA) approach was applied to analyze the semi-structured survey data collected from 72 respondents. Practitioners and policymakers could use the study findings to enable various support mechanisms for restoring tourist confidence and help them adjust to the ‘new normal’.
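The lexicon-based flavor of sentiment analysis used in pipelines like this one can be sketched very simply: score each response by the average valence of its known words. A minimal sketch; the scoring rule is generic and any lexicon shown alongside it is a made-up AFINN-style stand-in, not the study's resource:

```python
def sentiment_score(text, lexicon):
    """Average valence of the lexicon words found in a response.
    Positive result means positive sentiment; 0.0 means no known words."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    hits = [lexicon[w] for w in words if w in lexicon]
    return sum(hits) / len(hits) if hits else 0.0
```

With a toy lexicon such as `{"afraid": -2, "happy": 3, "safe": 1}`, a response voicing travel fears scores negative while a reassured one scores positive, which is the polarity signal the study aggregates across respondents.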
APA, Harvard, Vancouver, ISO, and other styles
4

Hori, Kennichiro, Ibuki Yoshida, Miki Suzuki, Zhu Yiwen, and Yohei Kurata. "Emergence and Rapid Popularization of Paid Web-Conferencing-Application-Based Tours in Japan: An Analysis of Their Business Potential." In Information and Communication Technologies in Tourism 2022, 41–54. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-94751-4_4.

Full text
Abstract:
Following the emergence of the COVID-19 pandemic, people in Japan were asked to refrain from traveling, resulting in various companies coming up with new ways of experiencing tourism. Among them, the online tourism experience of H.I.S. Co., Ltd. (HIS) drew more than 100,000 participants as of August 29, 2021. In this study, we focused on online tours where the host goes to the site and provides real-time communication using a web conference application. The destinations of online tours were analyzed through text mining, and the characteristics of online tours were analyzed using the Latent Dirichlet Allocation (LDA) topic model. The results show that the number of online tours is weakly negatively correlated with distance and time differences. From the topic model, it is evident that the guide is important in online tours. In addition, the sense of presence, the communication environment, and images, which are considered to be topics unique to online tours, are also relevant to the evaluation.
APA, Harvard, Vancouver, ISO, and other styles
5

Evangelista, Adelia, Annalina Sarra, and Tonio Di Battista. "Students’ feedback on the digital ecosystem: a structural topic modeling approach." In Proceedings e report, 203–8. Florence: Firenze University Press and Genova University Press, 2023. http://dx.doi.org/10.36253/979-12-215-0106-3.36.

Full text
Abstract:
Starting from March 2020, strict containment measures against COVID-19 forced Italian universities to activate remote learning and supply didactic methods online. This work aims to show students' perceptions of a learning-teaching experience practised within a digital learning ecosystem designed during the first emergency period and then re-proposed for the blended mode. Specifically, students attending six large courses held by four professors at two different Italian universities were asked to express their impressions in a text guided by questions, requiring reflection on and clarification of their inner thoughts on the ecosystem. To automate the analysis of the resulting open-ended responses and avoid labour-intensive human coding, we focused on a machine learning approach based on structural topic modelling (STM). Like the Latent Dirichlet Allocation (LDA) model, STM is a probabilistic generative model that treats each document as generated from a mixture of hidden topics. In addition, STM extends the LDA framework by allowing covariates of interest to be included in the prior distributions for open-ended-response topic proportions and topic-word distributions. Based on model diagnostics and the researchers' expertise, a 10-topic model best fitted the data. Prevalent topics described by respondents include: "Physical space", "Building the community: use of Whatsapp", "Communication and tools", "Interaction with Teacher", and "Feedback".
APA, Harvard, Vancouver, ISO, and other styles
6

Vílchez-Román, Carlos, Farita Huamán-Delgado, and Sol Sanguinetti-Cordero. "Topic Modeling Applied to Business Research: A Latent Dirichlet Allocation (LDA)-Based Classification for Organization Studies." In Information Management and Big Data, 212–19. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-11680-4_21.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Calleo, Yuri, and Simone Di Zio. "Unsupervised spatial data mining for the development of future scenarios: a Covid-19 application." In Proceedings e report, 173–78. Florence: Firenze University Press, 2021. http://dx.doi.org/10.36253/978-88-5518-461-8.33.

Full text
Abstract:
In the context of Futures Studies, the scenario development process makes it possible to form assumptions about what the future may be in order to support better decisions today. The initial stages of scenario building (the Framing and Scanning phases) require much time and effort to scan data and information (reading documents, reviewing the literature, and consulting experts) in order to understand more about the object of the foresight study. The daily use of social networks causes an exponential increase of data, and for this reason we deal here with the problem of speeding up and optimizing the Scanning phase by applying a new combined method based on the analysis of tweets with unsupervised classification models, text mining, and spatial data mining techniques. For a qualitative overview, we applied the bag-of-words model and a sentiment analysis with the Afinn and Vader algorithms. Then, in order to extract the influence factors and the relevant key factors (Kayser and Blind, 2017; 2020), Latent Dirichlet Allocation (LDA) was used (Tong and Zhang, 2016). Furthermore, to acquire spatial information, we used spatial data mining techniques to extract georeferenced data, from which a geographic analysis of the data was obtained. To showcase our method, we provide an example using Covid-19 tweets (Uhl and Schiebel, 2017), from which 5 topics and 6 key factors have been extracted. Finally, for each influence factor, a cartogram was created from the relative frequencies in order to show the spatial distribution of the users discussing each particular topic. The results fully answer the research objectives, and the combined method could offer benefits in the scenario development process.
APA, Harvard, Vancouver, ISO, and other styles
8

Pon, Abisheka, C. Deisy, and P. Sharmila. "A Case-Study on Topic Modeling Approach with Latent Dirichlet Allocation (LDA) Model." In New Frontiers in Communication and Intelligent Systems, 291–99. Soft Computing Research Society, 2021. http://dx.doi.org/10.52458/978-81-95502-00-4-30.

Full text
Abstract:
In natural language processing, topic modeling is a type of statistical model for identifying the topics that occur in a large collection of documents; it is a text-mining tool for the discovery of hidden semantic structures in a text body. In the proposed model, a high-dimensional text dataset named article.csv is processed to obtain the principal, frequently occurring topics in the text data by extracting the keywords of each topic. In this work, a dataset of abstracts collected from journals in two different domains is used for tagging journal abstracts. The document models are built using Latent Dirichlet Allocation (LDA). The topics thus extracted can be used to gain meaningful insights from the text data. In this paper, the LDA model provides an extra analytical boost: first the data is preprocessed as text before being given to the model, then topic modeling is performed on the preprocessed data by integrating the LDA topic modeling framework for a more optimal classification of topics in the documents.
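The preprocessing this case study describes, cleaning raw text before it reaches the LDA model, typically looks like the sketch below; the stopword list and length threshold are illustrative assumptions, not the chapter's exact pipeline:

```python
import string

# Illustrative stopword list; real pipelines use a much fuller one.
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "in", "and", "for"}

def preprocess(doc):
    """Lowercase, strip punctuation, and drop stopwords and very short
    tokens, leaving the content words the topic model should see."""
    table = str.maketrans("", "", string.punctuation)
    tokens = doc.lower().translate(table).split()
    return [t for t in tokens if t not in STOPWORDS and len(t) > 2]
```

Each cleaned token list is then converted to a bag-of-words representation before LDA inference.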
APA, Harvard, Vancouver, ISO, and other styles
9

Daud, Ali, Jamal Ahmad Khan, Jamal Abdul Nasir, Rabeeh Ayaz Abbasi, Naif Radi Aljohani, and Jalal S. Alowibdi. "Latent Dirichlet Allocation and POS Tags Based Method for External Plagiarism Detection." In Scholarly Ethics and Publishing, 319–36. IGI Global, 2019. http://dx.doi.org/10.4018/978-1-5225-8057-7.ch015.

Full text
Abstract:
In this article we present a new semantic- and syntactic-based method for external plagiarism detection. In the proposed approach, latent Dirichlet allocation (LDA) and part-of-speech (POS) tags are used together to detect plagiarism between a sample document and a number of source documents. The basic hypothesis is that considering semantic and syntactic information between two text documents may improve the performance of the plagiarism detection task. Our method is based on two steps: a pre-processing step, where we detect the topics of the sentences in the documents using LDA and convert each sentence into an array of POS tags; and a post-processing step, where the suspicious cases are verified purely on the basis of semantic rules. For two types of external plagiarism (copy and random obfuscation), we empirically compare our approach to the state-of-the-art N-gram-based and stop-word N-gram-based methods and observe significant improvements.
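For context, the N-gram baseline family the authors compare against can be sketched as Jaccard overlap of word n-grams. This is a generic formulation; the exact baseline configuration in the article may differ:

```python
def ngrams(tokens, n=3):
    """Set of word n-grams of a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def ngram_similarity(a, b, n=3):
    """Jaccard overlap of word n-grams between two texts: 1.0 for
    identical texts, 0.0 for texts sharing no n-gram."""
    A, B = ngrams(a.lower().split(), n), ngrams(b.lower().split(), n)
    if not A and not B:
        return 0.0
    return len(A & B) / len(A | B)
```

Such surface overlap catches verbatim copying well but degrades under obfuscation, which is the gap the LDA-plus-POS method aims to close.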
APA, Harvard, Vancouver, ISO, and other styles
10

Keikhosrokiani, Pantea, Moussa Pourya Asl, Kah Em Chu, and Nur Ain Nasuha Anuar. "Artificial Intelligence Framework for Opinion Mining of Netizen Readers' Reviews of Arundhati Roy's The God of Small Things." In Advances in Computational Intelligence and Robotics, 68–92. IGI Global, 2022. http://dx.doi.org/10.4018/978-1-6684-6242-3.ch004.

Full text
Abstract:
In recent years, South-Asian literature in English has experienced a surge of newfound love and popularity both in the local and the global market. In this regard, Arundhati Roy's The God of Small Things (1997) has garnered an astounding mix of positive and negative reactions from readers across the globe. This chapter adopts an artificial intelligence approach to analyse netizen readers' feedback on the novel as documented in the book cataloguing website Goodreads. To this end, an opinion mining framework is proposed based on artificial intelligence techniques such as topic modelling and sentiment analysis. Latent semantic analysis (LSA) and latent Dirichlet allocation (LDA) are applied and compared to find the abstract “topics” that occur in a collection of reviews. Furthermore, lexicon-based sentiment analysis approaches such as Vader and Textblob algorithms are used and compared to find the review sentiment polarities.

Conference papers on the topic "Allocation de Dirichlet latente (LDA)"

1

Zhao, Fangyuan, Xuebin Ren, Shusen Yang, and Xinyu Yang. "On Privacy Protection of Latent Dirichlet Allocation Model Training." In Twenty-Eighth International Joint Conference on Artificial Intelligence {IJCAI-19}. California: International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/675.

Abstract:
Latent Dirichlet Allocation (LDA) is a popular topic modeling technique for discovery of hidden semantic architecture of text datasets, and plays a fundamental role in many machine learning applications. However, like many other machine learning algorithms, the process of training an LDA model may leak the sensitive information of the training datasets and bring significant privacy risks. To mitigate the privacy issues in LDA, we focus on studying privacy-preserving algorithms of LDA model training in this paper. In particular, we first develop a privacy monitoring algorithm to investigate the privacy guarantee obtained from the inherent randomness of the Collapsed Gibbs Sampling (CGS) process in a typical LDA training algorithm on centralized curated datasets. Then, we further propose a locally private LDA training algorithm on crowdsourced data to provide local differential privacy for individual data contributors. The experimental results on real-world datasets demonstrate the effectiveness of our proposed algorithms.
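The local-differential-privacy idea mentioned in the abstract can be illustrated with classic randomized response, a toy stand-in for (not a reproduction of) the paper's CGS-based training algorithm:

```python
# Toy illustration of local differential privacy: each contributor perturbs
# a binary word-occurrence vector locally before it is ever collected.
import math
import random

def randomized_response(bits, epsilon=1.0):
    # Keep each bit with probability e^eps / (1 + e^eps), flip it otherwise;
    # per bit this satisfies eps-local differential privacy.
    p_keep = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return [b if random.random() < p_keep else 1 - b for b in bits]

random.seed(0)
doc_vector = [1, 0, 0, 1, 1, 0]  # which vocabulary words occur in the document
private_vector = randomized_response(doc_vector, epsilon=0.5)
print(private_vector)
```

A smaller epsilon flips bits more often, giving stronger privacy at the cost of noisier statistics for the downstream model.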
2

"Tutorial 2: Latent Dirichlet Allocation (LDA) by Abram Hindle." In 2014 IEEE 4th Workshop on Mining Unstructured Data (MUD). IEEE, 2014. http://dx.doi.org/10.1109/mud.2014.15.

3

Thornton, Adam, Brandon Meiners, and Donald Poole. "Latent Dirichlet Allocation (LDA) for Anomaly Detection in Avionics Networks." In 2020 IEEE/AIAA 39th Digital Avionics Systems Conference (DASC). IEEE, 2020. http://dx.doi.org/10.1109/dasc50938.2020.9256582.

4

Shakeel, Khadija, Ghulam Rasool Tahir, Irsha Tehseen, and Mubashir Ali. "A framework of Urdu topic modeling using latent dirichlet allocation (LDA)." In 2018 IEEE 8th Annual Computing and Communication Workshop and Conference (CCWC). IEEE, 2018. http://dx.doi.org/10.1109/ccwc.2018.8301655.

5

Prabhudesai, Kedar S., Boyla O. Mainsah, Leslie M. Collins, and Chandra S. Throckmorton. "Augmented Latent Dirichlet Allocation (LDA) Topic Model with Gaussian Mixture Topics." In ICASSP 2018 - 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2018. http://dx.doi.org/10.1109/icassp.2018.8462003.

6

Basuki, Setio, Yufis Azhar, Agus Eko Minarno, Christian Sri Kusuma Aditya, Fauzi Dwi Setiawan Sumadi, and Ardiansah Ilham Ramadhan. "Detection of Reference Topics and Suggestions using Latent Dirichlet Allocation (LDA)." In 2019 12th International Conference on Information & Communication Technology and System (ICTS). IEEE, 2019. http://dx.doi.org/10.1109/icts.2019.8850993.

7

Gorro, Ken D., Glicerio A. Baguia, and Moustafa F. Ali. "An analysis of Disaster Risk Suggestions using Latent Dirichlet Allocation and Hierarchical Dirichlet Process (Nonparametric LDA)." In ICIT 2021: IoT and Smart City. New York, NY, USA: ACM, 2021. http://dx.doi.org/10.1145/3512576.3512608.

8

Yoon, Young Seog, Junhee Lee, and Kwangroh Park. "Extracting Promising Topics on Smart Manufacturing Based on Latent Dirichlet Allocation (LDA)." In 2019 International Conference on Information and Communication Technology Convergence (ICTC). IEEE, 2019. http://dx.doi.org/10.1109/ictc46691.2019.8939701.

9

Ishmael, Ontiretse, Etain Kiely, Cormac Quigley, and Donal McGinty. "Topic Modelling using Latent Dirichlet Allocation (LDA) and Analysis of Students Sentiments." In 2023 20th International Joint Conference on Computer Science and Software Engineering (JCSSE). IEEE, 2023. http://dx.doi.org/10.1109/jcsse58229.2023.10201965.

10

Habibi, Muhammad, Adri Priadana, Andika Bayu Saputra, and Puji Winar Cahyo. "Topic Modelling of Germas Related Content on Instagram Using Latent Dirichlet Allocation (LDA)." In International Conference on Health and Medical Sciences (AHMS 2020). Paris, France: Atlantis Press, 2021. http://dx.doi.org/10.2991/ahsr.k.210127.060.


Reports on the topic "Allocation de Dirichlet latente (LDA)"

1

Moreno Pérez, Carlos, and Marco Minozzo. “Making Text Talk”: The Minutes of the Central Bank of Brazil and the Real Economy. Madrid: Banco de España, November 2022. http://dx.doi.org/10.53479/23646.

Abstract:
This paper investigates the relationship between the views expressed in the minutes of the meetings of the Central Bank of Brazil’s Monetary Policy Committee (COPOM) and the real economy. It applies various computational linguistic machine learning algorithms to construct measures of the minutes of the COPOM. First, we create measures of the content of the paragraphs of the minutes using Latent Dirichlet Allocation (LDA). Second, we build an uncertainty index for the minutes using Word Embedding and K-Means. Then, we combine these indices to create two topic-uncertainty indices. The first one is constructed from paragraphs with a higher probability of topics related to “general economic conditions”. The second topic-uncertainty index is constructed from paragraphs that have a higher probability of topics related to “inflation” and the “monetary policy discussion”. Finally, we employ a structural VAR model to explore the lasting effects of these uncertainty indices on certain Brazilian macroeconomic variables. Our results show that greater uncertainty leads to a decline in inflation, the exchange rate, industrial production and retail trade in the period from January 2000 to July 2019.
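The way the two topic-uncertainty indices combine per-paragraph quantities can be sketched schematically (the numbers and field names below are invented for illustration; the paper's actual LDA, Word Embedding, and K-Means pipeline is far richer):

```python
# Schematic sketch: weight each paragraph's uncertainty score by how strongly
# the paragraph loads on a topic group, then average over the document.
paragraphs = [
    {"p_general_conditions": 0.7, "p_inflation": 0.1, "uncertainty": 0.8},
    {"p_general_conditions": 0.2, "p_inflation": 0.6, "uncertainty": 0.4},
]

def topic_uncertainty(paras, topic_key):
    # One index per topic group: topic-probability-weighted mean uncertainty.
    return sum(p[topic_key] * p["uncertainty"] for p in paras) / len(paras)

idx_general = topic_uncertainty(paragraphs, "p_general_conditions")
idx_inflation = topic_uncertainty(paragraphs, "p_inflation")
print(round(idx_general, 3), round(idx_inflation, 3))
```

Computed monthly over the COPOM minutes, series built this way are what the paper feeds into its structural VAR.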