Journal articles on the topic 'Allocation de Dirichlet latente (LDA)'

Consult the top 50 journal articles for your research on the topic 'Allocation de Dirichlet latente (LDA).'

You can also download the full text of each publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse journal articles from a wide variety of disciplines and organise your bibliography correctly.

1

Guo, Yunyan, and Jianzhong Li. "Distributed Latent Dirichlet Allocation on Streams." ACM Transactions on Knowledge Discovery from Data 16, no. 1 (July 3, 2021): 1–20. http://dx.doi.org/10.1145/3451528.

Full text
Abstract:
Latent Dirichlet Allocation (LDA) has been widely used for topic modeling, with applications spanning areas such as natural language processing and information retrieval. While LDA on small and static datasets has been extensively studied, practical scenarios pose several real-world challenges, since datasets are often huge and are gathered in a streaming fashion. As the state-of-the-art LDA algorithm on streams, Streaming Variational Bayes (SVB) introduced Bayesian updating to provide a streaming procedure. However, the utility of SVB is limited in applications since it ignores three challenges of processing real-world streams: topic evolution, data turbulence, and real-time inference. In this article, we propose a novel distributed LDA algorithm, referred to as StreamFed-LDA, to deal with these challenges on streams. For topic modeling of streaming data, the ability to capture evolving topics is essential for practical online inference. To achieve this goal, StreamFed-LDA is based on a specialized framework that supports lifelong (continual) learning of evolving topics. On the other hand, data turbulence is commonly present in streams due to real-life events. In that case, the design of StreamFed-LDA allows the model to learn new characteristics from the most recent data while maintaining the historical information. On massive streaming data, it is difficult yet crucial to provide real-time inference results. To increase throughput and reduce latency, StreamFed-LDA introduces additional techniques that substantially reduce both computation and communication costs in distributed systems. Experiments on four real-world datasets show that the proposed framework achieves significantly better online inference performance than the baselines. At the same time, StreamFed-LDA also reduces latency by orders of magnitude on real-world datasets.
APA, Harvard, Vancouver, ISO, and other styles
2

Garg, Mohit, and Priya Rangra. "Bibliometric Analysis of Latent Dirichlet Allocation." DESIDOC Journal of Library & Information Technology 42, no. 2 (February 28, 2022): 105–13. http://dx.doi.org/10.14429/djlit.42.2.17307.

Full text
Abstract:
Latent Dirichlet Allocation (LDA) has emerged as an important algorithm in big data analysis that finds groups of topics in text data. It posits that each text document consists of a group of topics, and that each topic is a mixture of related words. With the emergence of a plethora of text data, LDA has become a popular algorithm for topic modeling among researchers from different domains. Therefore, it is essential to understand the trends of LDA research. Bibliometric techniques are established methods for studying the research progress of a topic. In this study, bibliographic data of 18,715 publications that have cited LDA were extracted from the Scopus database. The software R and VOSviewer were used to carry out the analysis. The analysis revealed that research interest in LDA has grown exponentially. The results showed that most authors preferred "Book Series" followed by "Conference Proceedings" as the publication venue. The majority of the institutions and authors were from the USA, followed by China. The co-occurrence analysis of keywords indicated that text mining and machine learning were dominant topics in LDA research, with significant interest in social media. This study attempts to provide a more comprehensive analysis and intellectual structure of LDA than previous studies.
APA, Harvard, Vancouver, ISO, and other styles
3

Kim, Anastasiia, Sanna Sevanto, Eric R. Moore, and Nicholas Lubbers. "Latent Dirichlet Allocation modeling of environmental microbiomes." PLOS Computational Biology 19, no. 6 (June 8, 2023): e1011075. http://dx.doi.org/10.1371/journal.pcbi.1011075.

Full text
Abstract:
Interactions between stressed organisms and their microbiome environments may provide new routes for understanding and controlling biological systems. However, microbiomes are a form of high-dimensional data, with thousands of taxa present in any given sample, which makes untangling the interaction between an organism and its microbial environment a challenge. Here we apply Latent Dirichlet Allocation (LDA), a technique for language modeling, which decomposes the microbial communities into a set of topics (non-mutually-exclusive sub-communities) that compactly represent the distribution of full communities. LDA provides a lens into the microbiome at broad and fine-grained taxonomic levels, which we show on two datasets. In the first dataset, from the literature, we show how LDA topics succinctly recapitulate many results from a previous study on diseased coral species. We then apply LDA to a new dataset of maize soil microbiomes under drought, and find a large number of significant associations between the microbiome topics and plant traits as well as associations between the microbiome and the experimental factors, e.g. watering level. This yields new information on the plant-microbial interactions in maize and shows that LDA technique is useful for studying the coupling between microbiomes and stressed organisms.
APA, Harvard, Vancouver, ISO, and other styles
4

Zhou, Qi, Haipeng Chen, Yitao Zheng, and Zhen Wang. "EvaLDA: Efficient Evasion Attacks Towards Latent Dirichlet Allocation." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 16 (May 18, 2021): 14602–11. http://dx.doi.org/10.1609/aaai.v35i16.17716.

Full text
Abstract:
As one of the most powerful topic models, Latent Dirichlet Allocation (LDA) has been used in a vast range of tasks, including document understanding, information retrieval, and peer-reviewer assignment. Despite its tremendous popularity, the security of LDA has rarely been studied. This poses severe risks to security-critical tasks, such as sentiment analysis and peer-reviewer assignment, that are based on LDA. In this paper, we are interested in knowing whether LDA models are vulnerable to adversarial perturbations of benign document examples during inference time. We formalize the evasion attack against LDA models as an optimization problem and prove it to be NP-hard. We then propose a novel and efficient algorithm, EvaLDA, to solve it. We show the effectiveness of EvaLDA via extensive empirical evaluations. For instance, on the NIPS dataset, EvaLDA can on average promote the rank of a target topic from 10 to around 7 by replacing only 1% of the words in a victim document with similar words. Our work provides significant insights into the power and limitations of evasion attacks against LDA models.
APA, Harvard, Vancouver, ISO, and other styles
5

Christy, A., Anto Praveena, and Jany Shabu. "A Hybrid Model for Topic Modeling Using Latent Dirichlet Allocation and Feature Selection Method." Journal of Computational and Theoretical Nanoscience 16, no. 8 (August 1, 2019): 3367–71. http://dx.doi.org/10.1166/jctn.2019.8234.

Full text
Abstract:
In this information age, knowledge discovery and pattern matching play a significant role. Topic modeling, an area of text mining, is used to detect hidden patterns in a document collection. Topic modeling and document clustering are two important key terms that are similar in concept and functionality. In this paper, topic modeling is carried out using the Latent Dirichlet Allocation-Brute Force (LDA-BF) method, the Latent Dirichlet Allocation-Back Tracking (LDA-BT) method, the Latent Semantic Indexing (LSI) method, and the Nonnegative Matrix Factorization (NMF) method. A hybrid model is proposed which uses Latent Dirichlet Allocation (LDA) for extracting feature terms and a Feature Selection (FS) method for feature reduction. The efficiency of document clustering depends upon the selection of good features. Topic modeling is performed by enriching the good features obtained through the feature selection method. The proposed hybrid model produces better accuracy than the K-Means clustering method.
APA, Harvard, Vancouver, ISO, and other styles
6

Fernanda, Jerhi Wahyu. "PEMODELAN PERSEPSI PEMBELAJARAN ONLINE MENGGUNAKAN LATENT DIRICHLET ALLOCATION." Jurnal Statistika Universitas Muhammadiyah Semarang 9, no. 2 (December 31, 2021): 79. http://dx.doi.org/10.26714/jsunimus.9.2.2021.79-85.

Full text
Abstract:
Latent Dirichlet Allocation (LDA) is a topic modeling method based on the concept of probability that finds similarities between documents and groups them into several topics or clusters. The method is a form of unsupervised learning because the analysed data carry no labels or targets. This study aims to group perceptions of online learning into several topics using the LDA method. The data are primary data collected through an online form. The analysis shows that LDA modeling with 6 topics has the highest coherence score. Visualising the text data with a word cloud shows that the word 'tidak' ('not') has the highest frequency of occurrence. Determining the optimal number of topics by coherence score, LDA modeling with 6 topics is the most optimal; broadly, several words overlap with other topics. The modeling results indicate that students' perceptions of online learning concern understanding of the material given by lecturers, internet signal or network quality, data quotas, and assignments. For the words related to understanding the material, students expressed the view that they could not properly understand the material delivered by their lecturers.
APA, Harvard, Vancouver, ISO, and other styles
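The entry above selects the number of topics by coherence score. Purely as an illustration of that selection loop (not the author's code), a minimal Gensim sketch might look like the following; the toy documents and the candidate topic range are assumptions.

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel, CoherenceModel

# Toy tokenized documents standing in for the survey responses (assumption).
docs = [
    ["sinyal", "internet", "kuota", "mahal"],
    ["materi", "dosen", "tidak", "paham"],
    ["tugas", "banyak", "deadline", "dekat"],
    ["jaringan", "lambat", "kuota", "habis"],
]

dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]

# Try several topic counts and keep the model with the best c_v coherence.
best_k, best_score, best_model = None, float("-inf"), None
for k in range(2, 8):
    lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=k,
                   random_state=0, passes=10)
    score = CoherenceModel(model=lda, texts=docs, dictionary=dictionary,
                           coherence="c_v").get_coherence()
    if score > best_score:
        best_k, best_score, best_model = k, score, lda

print(best_k, best_score)
```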
7

Yuan, Ling, JiaLi Bin, YinZhen Wei, Fei Huang, XiaoFei Hu, and Min Tan. "Big Data Aspect-Based Opinion Mining Using the SLDA and HME-LDA Models." Wireless Communications and Mobile Computing 2020 (November 18, 2020): 1–19. http://dx.doi.org/10.1155/2020/8869385.

Full text
Abstract:
In order to make better use of massive network comment data for decision-making support of customers and merchants in the big data era, this paper proposes two unsupervised optimized LDA (Latent Dirichlet Allocation) models, namely, SLDA (SentiWordNet WordNet-Latent Dirichlet Allocation) and HME-LDA (Hierarchical Clustering MaxEnt-Latent Dirichlet Allocation), for aspect-based opinion mining. One scheme of each of two optimized models, which both use seed words as topic words and construct the inverted index, is designed to enhance the readability of experiment results. Meanwhile, based on the LDA topic model, we introduce new indicator variables to refine the classification of topics and try to classify the opinion target words and the sentiment opinion words by two different schemes. For better classification effect, the similarity between words and seed words is calculated in two ways to offset the fixed parameters in the standard LDA. In addition, based on the SemEval2016ABSA data set and the Yelp data set, we design comparative experiments with training sets of different sizes and different seed words, which prove that the SLDA and the HME-LDA have better performance on the accuracy, recall value, and harmonic value with unannotated training sets.
APA, Harvard, Vancouver, ISO, and other styles
8

Ogundare, A. O., A. U. Saleh, O. A. James, E. E. Ajayi, and S. Gostoji. "Performance evaluation of Latent Dirichlet Allocation on legal documents." Applied and Computational Engineering 52, no. 1 (March 27, 2024): 96–101. http://dx.doi.org/10.54254/2755-2721/52/20241322.

Full text
Abstract:
Latent Dirichlet Allocation (LDA) is an algorithm capable of processing large amounts of text data. In this study, LDA is used to model topic clusters from a corpus of legal texts generated under four topics within the Nigerian context: Employment Contract, Election Petition, Deeds, and Articles of Incorporation. Each topic has a substantial number of articles, and the LDA method proves effective in extracting topics and generating index words for each topic cluster. At the end of experimentation, the results are compared with a manually pre-annotated dataset for validation purposes, and they show high accuracy. The LDA output shows optimal performance in the word-indexing process for Election Petition, as all the documents annotated under that topic were accurately classified.
APA, Harvard, Vancouver, ISO, and other styles
9

Syed, Shaheen, and Marco Spruit. "Exploring Symmetrical and Asymmetrical Dirichlet Priors for Latent Dirichlet Allocation." International Journal of Semantic Computing 12, no. 03 (September 2018): 399–423. http://dx.doi.org/10.1142/s1793351x18400184.

Full text
Abstract:
Latent Dirichlet Allocation (LDA) has gained much attention from researchers and is increasingly being applied to uncover underlying semantic structures from a variety of corpora. However, nearly all researchers use symmetrical Dirichlet priors, often unaware of the underlying practical implications that they bear. This research is the first to explore symmetrical and asymmetrical Dirichlet priors on topic coherence and human topic ranking when uncovering latent semantic structures from scientific research articles. More specifically, we examine the practical effects of several classes of Dirichlet priors on 2000 LDA models created from abstract and full-text research articles. Our results show that symmetrical or asymmetrical priors on the document–topic distribution or the topic–word distribution for full-text data have little effect on topic coherence scores and human topic ranking. In contrast, asymmetrical priors on the document–topic distribution for abstract data show a significant increase in topic coherence scores and improved human topic ranking compared to a symmetrical prior. Symmetrical or asymmetrical priors on the topic–word distribution show no real benefits for both abstract and full-text data.
APA, Harvard, Vancouver, ISO, and other styles
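The study above contrasts symmetrical and asymmetrical Dirichlet priors. In Gensim these priors are exposed through the alpha and eta arguments of LdaModel; the sketch below only shows how the two settings are switched on a toy corpus and is not the authors' experimental setup.

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel

# Tiny toy corpus; the study itself trained 2000 models on research-article data.
docs = [["gene", "cell", "protein"], ["model", "topic", "prior"],
        ["cell", "protein", "expression"], ["topic", "coherence", "prior"]]
dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]

# Symmetric priors on both the document-topic (alpha) and topic-word (eta) sides.
symmetric_lda = LdaModel(corpus, id2word=dictionary, num_topics=2,
                         alpha="symmetric", eta="symmetric", random_state=0)

# Asymmetric prior on the document-topic side only (Gensim has no built-in
# 'asymmetric' option for eta; an explicit array would be needed there).
asymmetric_lda = LdaModel(corpus, id2word=dictionary, num_topics=2,
                          alpha="asymmetric", eta="symmetric", random_state=0)
```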
10

Ohmura, Masahiro, Koh Kakusho, and Takeshi Okadome. "Tweet Sentiment Analysis with Latent Dirichlet Allocation." International Journal of Information Retrieval Research 4, no. 3 (July 2014): 66–79. http://dx.doi.org/10.4018/ijirr.2014070105.

Full text
Abstract:
The method proposed here analyzes social sentiments from collected tweets that contain at least 1 of 800 sentimental or emotional adjectives. By treating the tweets posted over half a day as an input document, the method uses Latent Dirichlet Allocation (LDA) to extract social sentiments, some of which coincide with our daily sentiments. The extracted sentiments, however, indicate lowered sensitivity to changes in time, which suggests that they are not suitable for predicting daily social or economic events. Using LDA on the representative 72 adjectives to which each of the 800 adjectives maps, while preserving word frequencies, permits us to obtain social sentiments that show improved sensitivity to changes in time. A regression model with autocorrelated errors, whose inputs are the social sentiments obtained by analyzing the contracted adjectives, predicts the Dow Jones Industrial Average (DJIA) more precisely than autoregressive moving-average models.
APA, Harvard, Vancouver, ISO, and other styles
11

Li, Chenchen, Xiang Yan, Xiaotie Deng, Yuan Qi, Wei Chu, Le Song, Junlong Qiao, Jianshan He, and Junwu Xiong. "Latent Dirichlet Allocation for Internet Price War." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 639–46. http://dx.doi.org/10.1609/aaai.v33i01.3301639.

Full text
Abstract:
Current Internet market makers are facing an intense competitive environment, where personalized price reductions or discounted coupons are provided by their peers to attract more customers. Much investment is spent to catch up with each other’s competitors but participants in such a price cut war are often incapable of winning due to their lack of information about others’ strategies or customers’ preference. We formalize the problem as a stochastic game with imperfect and incomplete information and develop a variant of Latent Dirichlet Allocation (LDA) to infer latent variables under the current market environment, which represents preferences of customers and strategies of competitors. Tests on simulated experiments and an open dataset for real data show that, by subsuming all available market information of the market maker’s competitors, our model exhibits a significant improvement for understanding the market environment and finding the best response strategies in the Internet price war. Our work marks the first successful learning method to infer latent information in the environment of price war by the LDA modeling, and sets an example for related competitive applications to follow.
APA, Harvard, Vancouver, ISO, and other styles
12

Rhys Leahy, Richard F. Sear, Nicholas J. Restrepo, Yonatan Lupu, and Neil F. Johnson. "Dynamic Latent Dirichlet Allocation Tracks Evolution of Online Hate Topics." Advances in Artificial Intelligence and Machine Learning 02, no. 01 (2022): 257–72. http://dx.doi.org/10.54364/aaiml.2022.1117.

Full text
Abstract:
Not only can online hate content spread easily between social media platforms, but its focus can also evolve over time. Machine learning and other artificial intelligence (AI) tools could play a key role in helping human moderators understand how such hate topics are evolving online. Latent Dirichlet Allocation (LDA) has been shown to be able to identify hate topics from a corpus of text associated with online communities that promote hate. However, applying LDA to each day’s data is impractical since the inferred topic list from the optimization can change abruptly from day to day, even though the underlying text and hence topics do not typically change this quickly. Hence, LDA is not well suited to capture the way in which hate topics evolve and morph. Here we solve this problem by showing that a dynamic version of LDA can help capture this evolution of topics surrounding online hate. Specifically, we show how standard and dynamical LDA models can be used in conjunction to analyze the topics over time emerging from extremist communities across multiple moderated and unmoderated social media platforms. Our dataset comprises material that we have gathered from hate-related communities on Facebook, Telegram, and Gab during the time period January-April 2021. We demonstrate the ability of dynamic LDA to shed light on how hate groups use different platforms in order to propagate their cause and interests across the online multiverse of social media platforms.
APA, Harvard, Vancouver, ISO, and other styles
13

Fatima-Zahrae, Sifi, Sabbar Wafae, and El Mzabi Amal. "Application of Latent Dirichlet Allocation (LDA) for clustering financial tweets." E3S Web of Conferences 297 (2021): 01071. http://dx.doi.org/10.1051/e3sconf/202129701071.

Full text
Abstract:
Sentiment classification is one of the hottest research areas among Natural Language Processing (NLP) topics. While it aims to detect the sentiment polarity and classification of a given opinion, it requires a large amount of aspect extraction. However, extracting aspects takes human effort and a long time. To reduce this, the Latent Dirichlet Allocation (LDA) method has recently emerged to deal with this issue. In this paper, an efficient preprocessing method for sentiment classification is presented and is used for analyzing users' comments on the Twitter social network. For this purpose, different text preprocessing techniques have been applied to the dataset to achieve an acceptable standard text. Latent Dirichlet Allocation has been applied to the data obtained after this fast and accurate preprocessing phase. The implementations of different sentiment analysis methods and their results have been compared and evaluated. The experimental results show that the combined use of this paper's preprocessing method and Latent Dirichlet Allocation gives acceptable results compared to other basic methods.
APA, Harvard, Vancouver, ISO, and other styles
14

Muhajir, Muhammad, Dedi Rosadi, and Danardono Danardono. "Improving the term weighting log entropy of latent dirichlet allocation." Indonesian Journal of Electrical Engineering and Computer Science 34, no. 1 (April 1, 2024): 455. http://dx.doi.org/10.11591/ijeecs.v34.i1.pp455-462.

Full text
Abstract:
<p class="AbstractText">The process of analyzing textual data involves the utilization of topic modeling techniques to uncover latent subjects within documents. The presence of numerous short texts in the Indonesian language poses additional challenges in the field of topic modeling. This study presents a substantial enhancement to the term weighting log entropy (TWLE) approach within the latent dirichlet allocation (LDA) framework, specifically tailored for topic modeling of Indonesian short texts. This work places significant emphasis on the utilization of LDA for word weighting. The research endeavor aimed to enhance the coherence and interpretability of an Indonesian topic model through the integration of local and global weights. Local Weight focuses on the distinct characteristics of each document, whereas global weight examines the broader perspective of the entire corpus of documents. The objective was to enhance the effectiveness of LDA themes by this amalgamation. The TWLE model of LDA was found to be more informative and effective than the TF-IDF LDA when compared with short Indonesian text. This work improves topic modeling in brief Indonesian compositions. Transfer learning for NLP and Indonesian language adaptation helps improve subject analysis knowledge and precision, this could boost NLP and topic modeling in Indonesian.</p>
APA, Harvard, Vancouver, ISO, and other styles
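Gensim ships a log-entropy transformation that can be applied to a bag-of-words corpus before topic modelling. The sketch below illustrates that weighting step on toy documents; it is a rough analogue of term-weighted LDA, not the paper's TWLE scheme.

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel, LogEntropyModel

# Toy tokenized short texts (assumption); the paper works on Indonesian short texts.
docs = [["harga", "naik", "pasar"], ["banjir", "jakarta", "hujan"],
        ["pasar", "saham", "naik"], ["hujan", "deras", "banjir"]]
dictionary = Dictionary(docs)
bow_corpus = [dictionary.doc2bow(d) for d in docs]

# Log-entropy weighting: a local log weight combined with a global entropy weight.
log_entropy = LogEntropyModel(bow_corpus)
weighted_corpus = [log_entropy[doc] for doc in bow_corpus]

# Train LDA on the re-weighted corpus.
lda = LdaModel(weighted_corpus, id2word=dictionary, num_topics=2, random_state=0)
print(lda.print_topics())
```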
15

Journal, IJSREM. "Topic Modelling of Web Pages with Latent Dirichlet Allocation Methods." INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT 07, no. 11 (November 1, 2023): 1–11. http://dx.doi.org/10.55041/ijsrem27350.

Full text
Abstract:
Topic modelling with Latent Dirichlet Allocation (LDA) is a popular technique used in natural language processing to uncover hidden thematic structures within a collection of documents. When applied to web pages, LDA can help in identifying prevalent topics or themes across these pages. This study delves into the utilization of Latent Dirichlet Allocation (LDA) methods to extract underlying topics within web pages, a fundamental pursuit in understanding the multifaceted landscape of online information. Web content analysis presents unique challenges owing to its diverse nature, comprising text, images, videos, and structured HTML elements, mandating rigorous preprocessing strategies to homogenize the data. By adapting the LDA model to accommodate these challenges, this research tackles the task of uncovering latent thematic structures prevalent across web content. Methodologically, the study explores parameter tuning and model adaptation to optimize LDA for web page analysis, navigating complexities such as varied content formats, noise, and inherent biases in web data. Addressing these intricacies involves parsing HTML, extracting meaningful textual information, and refining tokenization processes. Evaluating the fidelity and interpretability of discovered topics becomes pivotal, prompting the use of coherence scores, perplexity metrics, and human assessment to gauge the quality of generated topics. Additionally, this research confronts the dynamic nature of web content, proposing strategies such as continuous model retraining and dynamic topic modeling to accommodate evolving trends and updates. Practical applications of the extracted topics span a spectrum of domains, encompassing content recommendation systems, user behavior analysis, sentiment analysis, targeted advertising, and the enhancement of search algorithms for improved relevance and user engagement. Supported by illustrative case studies, this study elucidates how LDA serves as a potent mechanism to distill coherent and meaningful topics from web pages, offering invaluable insights into the hidden structures within the vast expanse of online information. This comprehensive abstract encapsulates the depth and breadth of employing LDA for the analysis of web content, encompassing challenges, methodologies, evaluations, applications, and real-world implications.
APA, Harvard, Vancouver, ISO, and other styles
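For the web-page setting discussed above, a typical pipeline strips HTML first and then models the remaining text. The following sketch is illustrative only; the inline pages, the scikit-learn model choice, and the preprocessing details are assumptions rather than the paper's setup.

```python
import re
from bs4 import BeautifulSoup
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Raw HTML pages would normally come from a crawler; here two inline snippets.
pages = [
    "<html><body><h1>Budget travel</h1><p>Cheap flights and hostels.</p></body></html>",
    "<html><body><h1>Deep learning</h1><p>Neural networks and GPUs.</p></body></html>",
]

# Strip tags and keep visible text only.
texts = [BeautifulSoup(p, "html.parser").get_text(separator=" ") for p in pages]
texts = [re.sub(r"\s+", " ", t).lower() for t in texts]

# Bag-of-words, then LDA on the document-term matrix.
vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(texts)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(dtm)

terms = vectorizer.get_feature_names_out()
for k, comp in enumerate(lda.components_):
    top = [terms[i] for i in comp.argsort()[-3:][::-1]]
    print(f"topic {k}: {top}")
```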
16

Liu, Hailin, Ling Xu, Mengning Yang, Meng Yan, and Xiaohong Zhang. "Predicting Component Failures Using Latent Dirichlet Allocation." Mathematical Problems in Engineering 2015 (2015): 1–15. http://dx.doi.org/10.1155/2015/562716.

Full text
Abstract:
Latent Dirichlet Allocation (LDA) is a statistical topic model that has been widely used to abstract semantic information from software source code. A failure refers to an observable error in program behavior. This work investigates whether semantic information and failures recorded in the history can be used to predict component failures. We use LDA to abstract topics from source code, and a new metric (topic failure density) is proposed by mapping failures to these topics. Exploring the basic information of topics from neighboring versions of a system, we obtain a similarity matrix. Multiplying the Topic Failure Density (TFD) by the similarity matrix yields the TFD of the next version. The prediction results achieve an average 77.8% agreement with the real failures when considering the top 3 and bottom 3 components ordered by the number of failures. We use the Spearman coefficient to measure the statistical correlation between the actual and estimated failure rates. The validation results range from 0.5342 to 0.8337, which beats a similar method. This suggests that our predictor based on topic similarity does a fine job of component failure prediction.
APA, Harvard, Vancouver, ISO, and other styles
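The prediction step described in the abstract above is essentially a matrix-vector product followed by a rank correlation. A toy numerical sketch of that arithmetic, with made-up values, is shown below.

```python
import numpy as np
from scipy.stats import spearmanr

# Topic failure density of version n for 4 topics (made-up numbers).
tfd_current = np.array([0.12, 0.05, 0.30, 0.08])

# Topic-to-topic similarity between versions n and n+1 (rows: new topics).
similarity = np.array([
    [0.9, 0.1, 0.0, 0.0],
    [0.1, 0.8, 0.1, 0.0],
    [0.0, 0.1, 0.7, 0.2],
    [0.0, 0.0, 0.2, 0.8],
])

# Predicted TFD of the next version = similarity matrix times current TFD.
tfd_next = similarity @ tfd_current

# Compare predicted and (hypothetical) actual failure densities by rank.
actual_next = np.array([0.10, 0.07, 0.25, 0.11])
rho, _ = spearmanr(tfd_next, actual_next)
print(tfd_next, rho)
```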
17

Liu, Yezheng, Fei Du, Jianshan Sun, and Yuanchun Jiang. "iLDA: An interactive latent Dirichlet allocation model to improve topic quality." Journal of Information Science 46, no. 1 (January 9, 2019): 23–40. http://dx.doi.org/10.1177/0165551518822455.

Full text
Abstract:
User-generated content has been an increasingly important data source for analysing user interests in both industries and academic research. Since the proposal of the basic latent Dirichlet allocation (LDA) model, plenty of LDA variants have been developed to learn knowledge from unstructured user-generated contents. An intractable limitation for LDA and its variants is that low-quality topics whose meanings are confusing may be generated. To handle this problem, this article proposes an interactive strategy to generate high-quality topics with clear meanings by integrating subjective knowledge derived from human experts and objective knowledge learned by LDA. The proposed interactive latent Dirichlet allocation (iLDA) model develops deterministic and stochastic approaches to obtain subjective topic-word distribution from human experts, combines the subjective and objective topic-word distributions by a linear weighted-sum method, and provides the inference process to draw topics and words from a comprehensive topic-word distribution. The proposed model is a significant effort to integrate human knowledge with LDA-based models by interactive strategy. The experiments on two real-world corpora show that the proposed iLDA model can draw high-quality topics with the assistance of subjective knowledge from human experts. It is robust under various conditions and offers fundamental supports for the applications of LDA-based topic modelling.
APA, Harvard, Vancouver, ISO, and other styles
18

Khairul Hudha Nasution, Widodo, and Bambang Prasetya Adhi. "SISTEM DETEKSI TOPIK POLITIK PADA TWITTER MENGGUNAKAN ALGORITMA LATENT DIRICHLET ALLOCATION." PINTER : Jurnal Pendidikan Teknik Informatika dan Komputer 5, no. 1 (June 1, 2021): 76–83. http://dx.doi.org/10.21009/pinter.5.1.10.

Full text
Abstract:
In 2019, Indonesia went through a political year; many political events prompted a wide range of responses from the Indonesian public, some of which were posted on the social media platform Twitter, and these data can be processed to describe public opinion about a political event. This study aims to analyse an implementation of the LDA algorithm for determining political topics on Twitter. The method uses the LDA algorithm to compute the likely topics for each tweet; LDA is a probabilistic model that can describe topics without a prior classification step, so the system automatically detects the topics that are present. Testing was performed three times with 100, 1,000, and 6,000 tweets, using the default LDA settings of the Gensim library and 10 topics, and produced an average accuracy of 90%. It can therefore be concluded that LDA can be used to detect political topics on Twitter with high accuracy.
APA, Harvard, Vancouver, ISO, and other styles
19

Nawang Sari, Wilujeng Ayu, and Hindriyanto Dwi Purnomo. "TOPIC MODELING USING THE LATENT DIRICHLET ALLOCATION METHOD ON WIKIPEDIA PANDEMIC COVID-19 DATA IN INDONESIA." Jurnal Teknik Informatika (Jutif) 3, no. 5 (October 24, 2022): 1223–30. http://dx.doi.org/10.20884/1.jutif.2022.3.5.321.

Full text
Abstract:
Wikipedia is a web-based encyclopedia that is used to search for information. A problem was found with one of the Wikipedia articles: no one had clustered the topics of the Covid-19 pandemic in Indonesia. The method used in this research is Latent Dirichlet Allocation (LDA), currently the most widely used topic modeling method. This study uses 6,658 English words as the dataset, and every word that appears is counted in the corpus. The study applies topic modeling with the LDA model to analyze COVID-19 data taken from Wikipedia. The LDA method clusters by looking at the number of words that appear in the corpus, determines the number of clusters and topics, and sets the number of iterations. The purpose of this study is to classify the information contained in the Wikipedia article so that it can be used as evaluation material for improving services and the handling of Wikipedia using the Latent Dirichlet Allocation method. The LDA method assigns every word in a topic using a semi-random distribution, and at each iteration it calculates the probability of each topic in the dataset and the probability of each word in a topic. In this study, five iteration tests were conducted on topic modeling with different numbers of topics. After the experiments, the final results were analyzed, and the best-performing number of topics was identified, with the most discussed topics concerning health.
APA, Harvard, Vancouver, ISO, and other styles
20

Masood, Muhammad Ali, Rabeeh Ayaz Abbasi, Onaiza Maqbool, Mubashar Mushtaq, Naif R. Aljohani, Ali Daud, Muhammad Ahtisham Aslam, and Jalal S. Alowibdi. "MFS-LDA: a multi-feature space tag recommendation model for cold start problem." Program 51, no. 3 (September 5, 2017): 218–34. http://dx.doi.org/10.1108/prog-01-2017-0002.

Full text
Abstract:
Purpose Tags are used to annotate resources on social media platforms. Most tag recommendation methods use popular tags, but in the case of new resources that are as yet untagged (the cold start problem), popularity-based tag recommendation methods fail to work. The purpose of this paper is to propose a novel model for tag recommendation called multi-feature space latent Dirichlet allocation (MFS-LDA) for cold start problem. Design/methodology/approach MFS-LDA is a novel latent Dirichlet allocation (LDA)-based model which exploits multiple feature spaces (title, contents, and tags) for recommending tags. Exploiting multiple feature spaces allows MFS-LDA to recommend tags even if data from a feature space is missing (the cold start problem). Findings Evaluation of a publicly available data set consisting of around 20,000 Wikipedia articles that are tagged on a social bookmarking website shows a significant improvement over existing LDA-based tag recommendation methods. Originality/value The originality of MFS-LDA lies in segregation of features for removing bias toward dominant features and in synchronization of multiple feature space for tag recommendation.
APA, Harvard, Vancouver, ISO, and other styles
21

Calistus, Ugorji C., Moses O. Onyesolu, Asogwa C. Doris, and Chukwudumebi V. Egwu. "Exploring Latent Dirichlet Allocation (LDA) in Topic Modeling: Theory, Applications, and Future Directions." NEWPORT INTERNATIONAL JOURNAL OF ENGINEERING AND PHYSICAL SCIENCES 4, no. 1 (March 11, 2024): 9–16. http://dx.doi.org/10.59298/nijep/2024/41916.1.1100.

Full text
Abstract:
In an era dominated by an unprecedented deluge of textual information, the need for effective methods to make sense of large datasets is more pressing than ever. This article takes a pragmatic approach to unraveling the intricacies of topic modeling, with a specific focus on the widely used Latent Dirichlet Allocation (LDA) algorithm. The initial segment of the article lays the groundwork by exploring the practical relevance of topic modeling in real-world scenarios. It addresses the everyday challenges faced by researchers and professionals dealing with vast amounts of unstructured text, emphasizing the potential of topic modeling to distill meaningful insights from seemingly chaotic data. Moving beyond theoretical abstraction, the article then delves into the mechanics of Latent Dirichlet Allocation. Developed in 2003 by Blei, Ng, and Jordan, LDA provides a probabilistic framework to identify latent topics within documents. The article takes a step-by-step approach to demystify LDA, offering a practical understanding of its components and the Bayesian principles governing its operation. A significant portion of the article is dedicated to the practical implementation of LDA. It provides insights into preprocessing steps, parameter tuning, and model evaluation, offering readers a hands-on guide to applying LDA in their own projects. Real-world examples and case studies showcase how LDA can be a valuable tool for tasks such as document clustering, topic summarization, and sentiment analysis. However, the journey through LDA is not without challenges, and the article candidly addresses these hurdles. Topics such as determining the optimal number of topics, the sensitivity of results to parameter settings, and the interpretability of outcomes are discussed. This realistic appraisal adds depth to the article, helping readers navigate the nuances and potential pitfalls of employing LDA in practice. Beyond the technical intricacies, the article explores the broad spectrum of applications where LDA has proven its efficacy. From text mining and information retrieval to social network analysis and healthcare informatics, LDA has left an indelible mark on diverse domains. Through practical examples, the article illustrates how LDA can be adapted to different contexts, showcasing its versatility as a tool for uncovering latent patterns. Keywords: Topic Modeling, Latent Dirichlet Allocation, Text Mining, Natural Language Processing, Document Clustering, Bayesian Inference.
APA, Harvard, Vancouver, ISO, and other styles
22

Mahoto, Naeem Ahmed. "Estimating News Coverage Patterns using Latent Dirichlet Allocation (LDA)." Sukkur IBA Journal of Emerging Technologies 1, no. 1 (June 27, 2018): 51–56. http://dx.doi.org/10.30537/sjet.v1i1.142.

Full text
Abstract:
The growing rate of unstructured textual data has created an open challenge for knowledge discovery, which aims at extracting desired information from large collections of data. This study presents a system to derive news coverage patterns with the help of a probabilistic model, Latent Dirichlet Allocation. A pattern is an arrangement of words within the collected data that are more likely to appear together in a certain context. The news coverage patterns have been computed as a function of the number of news articles comprising such patterns. A prototype has been developed, as a proof of concept, to estimate the news coverage patterns for a newspaper, The Dawn. Analysis of the news coverage patterns from different aspects has been carried out using a multidimensional data model. Further, the extracted news coverage patterns are illustrated by visual graphs to yield an in-depth understanding of the topics that have been covered in the news. The results also assist in identifying schemas related to the newspaper and journalists' articles.
APA, Harvard, Vancouver, ISO, and other styles
23

Siringoringo, Rimbun, Jamaluddin Jamaluddin, and Resianta Perangin-Angin. "Pemodelan Topik Berita Menggunakan Latent Dirichlet Allocation dan K-Means Clustering." Jurnal Informatika Kaputama (JIK) 4, no. 2 (July 1, 2020): 216–22. http://dx.doi.org/10.59697/jik.v4i2.334.

Full text
Abstract:
The majority of people now search the internet for news or information topics. The growth of the internet and social media has led to the emergence of hundreds of portals or online news sites with very diverse news topics. Searching for headlines manually is an ineffective and time-consuming method. In this study, headline modeling was performed using Latent Dirichlet Allocation (LDA). Prior to the application of the LDA model, supporting processes such as tokenization, lemmatization, tf-idf factorization, and non-negative matrix factorization were also applied. The results showed that LDA can model news topics well, with a log-likelihood score of -13615.912 and a perplexity score of 378.958. In addition to using LDA, topic modeling was also done in the form of clusters by applying k-means clustering. With the elbow method, the ideal number of clusters for k-means clustering is 5, and the silhouette score is 0.62.
APA, Harvard, Vancouver, ISO, and other styles
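A rough scikit-learn sketch of the kind of pipeline described above: LDA document-topic vectors are clustered with k-means and scored with the silhouette coefficient. The toy headlines and the cluster count are assumptions, not the paper's data or settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

headlines = [
    "stocks rally as inflation cools", "central bank holds interest rates",
    "team wins championship final", "star striker signs record transfer",
    "new phone launches with faster chip", "chipmaker unveils ai processor",
]

# Vectorize headlines; the paper also applies tf-idf and matrix factorization.
dtm = TfidfVectorizer(stop_words="english").fit_transform(headlines)

# Document-topic distributions from LDA become the clustering features.
doc_topics = LatentDirichletAllocation(n_components=3, random_state=0).fit_transform(dtm)

# K-means on the topic space; k=3 here instead of the paper's elbow-chosen k=5.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(doc_topics)
print(labels, silhouette_score(doc_topics, labels))
```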
24

Serizawa, Midori, and Ichiro Kobayashi. "Topic Tracking Based on Identifying Proper Number of the Latent Topics in Documents." Journal of Advanced Computational Intelligence and Intelligent Informatics 16, no. 5 (July 20, 2012): 611–18. http://dx.doi.org/10.20965/jaciii.2012.p0611.

Full text
Abstract:
In this paper, we propose a method for detecting and tracking topics of newspaper articles based on the latent semantics of the documents. We use Latent Dirichlet Allocation (LDA) to extract latent topics. In using LDA, we have to provide the number of latent topics in target documents in advance. To do so, perplexity is widely used as a metric for estimating the number of latent topics in documents. As a solution, we estimate the number of latent topics without any prior information in the case of using Hierarchical Dirichlet Process LDA (HDP-LDA). We propose a method to estimate the number of latent topics in target documents based on calculating the similarity among extracted topics, and conduct an experiment with three data sets to compare the method with the above two representative methods, i.e., HDP-LDA and LDA using perplexity. From experimental results, we confirmed that our method can provide results similar to that of HDP-LDA. We also detect and track topics by means of our proposed method and confirm that our method is useful.
APA, Harvard, Vancouver, ISO, and other styles
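Gensim also exposes a hierarchical Dirichlet process model, which infers the number of topics from the data instead of taking it as input. A minimal sketch on a toy corpus is shown below; it illustrates HDP in general, not the authors' implementation.

```python
from gensim.corpora import Dictionary
from gensim.models import HdpModel

docs = [["election", "vote", "party"], ["match", "goal", "league"],
        ["vote", "ballot", "party"], ["goal", "striker", "league"]]
dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]

# HDP does not need num_topics up front; it truncates at an upper bound internally.
hdp = HdpModel(corpus=corpus, id2word=dictionary, random_state=0)
print(hdp.print_topics(num_topics=5, num_words=3))
```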
25

Kondeti, Bhuvaneshwari, Jyothirani S. A, and Haragopal V. V. "Keyword Extraction – Comparison of Latent Dirichlet Allocation and Latent Semantic Analysis." European Journal of Mathematics and Statistics 3, no. 3 (June 13, 2022): 40–47. http://dx.doi.org/10.24018/ejmath.2022.3.3.119.

Full text
Abstract:
The main aim of the present study is to compare the keywords extracted from the abstracts and the full-length text of scientific research papers. In addition, we compare Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) to identify the better performer for keyword extraction. This comparative study is divided into three levels. In the first level, scientific research articles on topics such as Indian economic growth, GDP, and economic slowdown were collected; abstracts and full-length text were extracted from the sources and pre-processed to remove the words and characters that were not useful for obtaining the semantic structures or the patterns needed to build a meaningful corpus. In the second level, the pre-processed data were converted into a bag of words, and the numerical statistic TF-IDF (Term Frequency-Inverse Document Frequency) was used to assess how relevant a word is to a document in a corpus. In the third level, in order to study the feasibility of the Natural Language Processing (NLP) techniques, the Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) methods were applied to the resultant corpus.
APA, Harvard, Vancouver, ISO, and other styles
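As a loose illustration of the LSA-versus-LDA comparison described above (not the authors' code), the sketch below fits both models on the same TF-IDF matrix and prints the top terms of each component; the documents are toy placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD, LatentDirichletAllocation

abstracts = [
    "indian economic growth slows amid weak gdp numbers",
    "gdp growth rebounds as manufacturing output rises",
    "economic slowdown hits consumer demand and exports",
    "exports and manufacturing drive quarterly gdp gains",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(abstracts)
terms = vectorizer.get_feature_names_out()

lsa = TruncatedSVD(n_components=2, random_state=0).fit(tfidf)                # LSA
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(tfidf)   # LDA

def top_terms(components, n=4):
    # Highest-weighted vocabulary terms per component/topic.
    return [[terms[i] for i in row.argsort()[-n:][::-1]] for row in components]

print("LSA:", top_terms(lsa.components_))
print("LDA:", top_terms(lda.components_))
```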
26

Wang, Yansheng, Yongxin Tong, and Dingyuan Shi. "Federated Latent Dirichlet Allocation: A Local Differential Privacy Based Framework." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 04 (April 3, 2020): 6283–90. http://dx.doi.org/10.1609/aaai.v34i04.6096.

Full text
Abstract:
Latent Dirichlet Allocation (LDA) is a widely adopted topic model for industrial-grade text mining applications. However, its performance heavily relies on the collection of large amount of text data from users' everyday life for model training. Such data collection risks severe privacy leakage if the data collector is untrustworthy. To protect text data privacy while allowing accurate model training, we investigate federated learning of LDA models. That is, the model is collaboratively trained between an untrustworthy data collector and multiple users, where raw text data of each user are stored locally and not uploaded to the data collector. To this end, we propose FedLDA, a local differential privacy (LDP) based framework for federated learning of LDA models. Central in FedLDA is a novel LDP mechanism called Random Response with Priori (RRP), which provides theoretical guarantees on both data privacy and model accuracy. We also design techniques to reduce the communication cost between the data collector and the users during model training. Extensive experiments on three open datasets verified the effectiveness of our solution.
APA, Harvard, Vancouver, ISO, and other styles
27

Luo, Wang, and Tian Bing Zhang. "Blind Image Quality Assessment Using Latent Dirichlet Allocation Model." Applied Mechanics and Materials 483 (December 2013): 594–98. http://dx.doi.org/10.4028/www.scientific.net/amm.483.594.

Full text
Abstract:
In this paper, we propose a blind image quality assessment (IQA) method based on the latent Dirichlet allocation (LDA) model. To assess image quality, firstly, we learn topic-specific word distributions by training on a set of pristine and distorted images without human subjective scores. Secondly, the LDA model is used to estimate the probability distribution of topics for the regions in the test images. Finally, we calculate the perceptual quality score of the test image by comparing its estimated topic probabilities with those of the pristine images. Note that quality-aware visual words, generated with respect to natural scene statistics features, are used to represent the images. Experimental evaluation on the publicly available subjective-rated LIVE database demonstrates that our proposed method correlates reasonably well with differential mean opinion scores (DMOS).
APA, Harvard, Vancouver, ISO, and other styles
28

Sanjaya ER, Ngurah Agus. "Implementasi Latent Dirichlet Allocation (LDA) untuk Klasterisasi Cerita Berbahasa Bali." Jurnal Teknologi Informasi dan Ilmu Komputer 8, no. 1 (February 4, 2021): 127. http://dx.doi.org/10.25126/jtiik.0813556.

Full text
Abstract:
<p class="Abstrak">Cerita-cerita berbahasa Bali memiliki topik yang beragam namun memuat nilai kearifan lokal yang perlu untuk dilestarikan. Jika cerita-cerita tersebut dapat dikelompokkan berdasarkan topik, tentu akan sangat memudahkan bagi para pembacanya dalam memilih bacaan yang diinginkan. <em>Latent Dirichlet Allocation</em> (<em>LDA</em>) mengasumsikan bahwa suatu dokumen dibangun dari perpaduan topik-topik tersembunyi. Dengan menerapkan <em>LDA</em> pada kumpulan dokumen, maka dapat diketahui distribusi topik-topik tersembunyi pada kumpulan dokumen secara umum maupun masing-masing dokumen. Pada penelitian ini, distribusi topik yang ditemukan oleh LDA pada kumpulan cerita berbahasa Bali digunakan untuk melakukan pengelompokkan cerita secara otomatis. Tahapan penelitian meliputi digitalisasi cerita, tokenisasi, <em>case-folding</em>, <em>stemming</em>, pencarian topik dengan <em>LDA</em>, representasi dokumen dan klasterisasi hirarki secara <em>agglomerative</em>. Pengujian dilakukan menggunakan 100 buah data cerita berbahasa Bali yang didapat dari situs daring maupun Dinas Kebudayaan Provinsi Bali untuk menghitung akurasi hasil klasterisasi. Evaluasi dilakukan juga untuk melihat pengaruh jumlah kata dan ukuran kesamaan yang digunakan terhadap akurasi. Akurasi hasil klasterisasi tertinggi yang didapatkan adalah 62% pada saat jumlah kata yang digunakan sebagai representasi dokumen berjumlah 3000 kata. Selain itu, didapatkan suatu kesimpulan bahwa akurasi klasterisasi juga sangat dipengaruhi oleh ukuran kesamaan yang digunakan ketika melakukan penggabungan dokumen serta jumlah kata sebagai representasi dokumen.</p><p class="Abstrak"> </p><p class="Abstrak"><em><strong>Abstract</strong></em></p><p class="Abstrak"><em>Balinese folklores have diverse topics but contain local wisdom that needs to be preserved. Grouping the stories based on the topics can certainly help readers to choose their readings accordingly. Latent Dirichlet Allocation (LDA) assumes that a document is built from a combination of hidden topics. By applying LDA to a collection of documents (corpus), the global distribution of hidden topics in the corpus as well as the distribution of each individual document in the corpus can be identified. In this research, the individual distribution of topics in Balinese folklores is used to group stories based on common topics. The research stages include story digitization, tokenization, case-folding, stemming, topic search with LDA, document representation and agglomerative hierarchical clustering. Performance evaluation was carried out using 100 Balinese folklores data obtained from online sites and the Bali Provincial Cultural Office to calculate the accuracy of the clustering results. Evaluation is also carried out to see the effect of the number of words and the similarity measure used on accuracy. The highest accuracy obtained is 62% when the number of words used as the representation of a document is 3000 words. In addition, it can be concluded that accuracy is also greatly influenced by the similarity measure used when merging the documents and the number of words for document representation.</em></p>
APA, Harvard, Vancouver, ISO, and other styles
29

Rusdhi, Vira Faradhiba, and Ilmiyati Sari. "IDENTIFIKASI TOPIK ARTIKEL BERITA MENGGUNAKAN TOPIC MODELLING DENGAN LATENT DIRICHLET ALLOCATION." Jurnal Ilmiah Informatika Komputer 27, no. 2 (2022): 169–76. http://dx.doi.org/10.35760/ik.2022.v27i2.6829.

Full text
Abstract:
News portals provide highly diverse information, but headlines cannot be used as the main reference for determining the overall topic of a news item, because headlines tend to be hyperbolic in order to attract readers. Therefore, this study proposes a system for identifying the topics of news articles using topic modelling with the Latent Dirichlet Allocation (LDA) algorithm. The research stages begin with automatic data collection from the websites detik.com and tempo.co via web scraping, followed by data preprocessing. There are four preprocessing steps: tokenization, case folding, stopword removal, and stemming. The final stage is topic modelling with the LDA algorithm. Topic modelling is a statistical model for determining the essence or topics of a collection of documents. Topic identification with the LDA algorithm is based on the probability of word occurrence in the document collection. This study found that the topic appearing most frequently in crime news portals is murder.
APA, Harvard, Vancouver, ISO, and other styles
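A hedged sketch of the preprocessing chain named in the abstract (tokenization, case folding, stopword removal, stemming) followed by LDA. The stopword list and the simple suffix stemmer are stand-ins for illustration, not the tools the authors used.

```python
import re
from gensim.corpora import Dictionary
from gensim.models import LdaModel

articles = [
    "Polisi mengungkap kasus pembunuhan di Jakarta Selatan",
    "Pelaku pembunuhan ditangkap setelah buron dua pekan",
    "Kasus pencurian motor meningkat di akhir tahun",
]
stopwords = {"di", "dua", "setelah"}          # stand-in stopword list

def preprocess(text):
    tokens = re.findall(r"[a-z]+", text.lower())               # tokenize + case folding
    tokens = [t for t in tokens if t not in stopwords]          # stopword removal
    return [re.sub(r"(kan|an|nya)$", "", t) for t in tokens]    # crude suffix stemming

docs = [preprocess(a) for a in articles]
dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]

lda = LdaModel(corpus, id2word=dictionary, num_topics=2, random_state=0, passes=10)
print(lda.print_topics(num_words=4))
```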
30

Nam, Sohee. "Inference of Latent Dirichlet Allocation Topic Model using PMI." Korean Data Analysis Society 21, no. 6 (December 31, 2019): 2789–800. http://dx.doi.org/10.37727/jkdas.2019.21.6.2789.

Full text
APA, Harvard, Vancouver, ISO, and other styles
31

Fang, Debin, Haixia Yang, Baojun Gao, and Xiaojun Li. "Discovering research topics from library electronic references using latent Dirichlet allocation." Library Hi Tech 36, no. 3 (September 17, 2018): 400–410. http://dx.doi.org/10.1108/lht-06-2017-0132.

Full text
Abstract:
Purpose Discovering the research topics and trends from a large quantity of library electronic references is essential for scientific research. Current research of this kind mainly depends on human justification. The purpose of this paper is to demonstrate how to identify research topics and evolution in trends from library electronic references efficiently and effectively by employing automatic text analysis algorithms. Design/methodology/approach The authors used the latent Dirichlet allocation (LDA), a probabilistic generative topic model to extract the latent topic from the large quantity of research abstracts. Then, the authors conducted a regression analysis on the document-topic distributions generated by LDA to identify hot and cold topics. Findings First, this paper discovers 32 significant research topics from the abstracts of 3,737 articles published in the six top accounting journals during the period of 1992-2014. Second, based on the document-topic distributions generated by LDA, the authors identified seven hot topics and six cold topics from the 32 topics. Originality/value The topics discovered by LDA are highly consistent with the topics identified by human experts, indicating the validity and effectiveness of the methodology. Therefore, this paper provides novel knowledge to the accounting literature and demonstrates a methodology and process for topic discovery with lower cost and higher efficiency than the current methods.
APA, Harvard, Vancouver, ISO, and other styles
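The hot/cold-topic step described above amounts to regressing each topic's share on publication time. A toy sketch of that regression, with made-up document-topic proportions and years, follows.

```python
import numpy as np
from scipy.stats import linregress

# Rows: documents, columns: topics. Made-up LDA document-topic proportions.
doc_topic = np.array([
    [0.70, 0.20, 0.10],
    [0.60, 0.25, 0.15],
    [0.40, 0.35, 0.25],
    [0.30, 0.40, 0.30],
    [0.20, 0.45, 0.35],
])
years = np.array([1992, 1998, 2004, 2010, 2014])  # publication year per document

# A topic is "hot" if its share rises significantly over time, "cold" if it falls.
for k in range(doc_topic.shape[1]):
    slope, _, _, p_value, _ = linregress(years, doc_topic[:, k])
    label = "hot" if slope > 0 else "cold"
    print(f"topic {k}: slope={slope:+.4f}, p={p_value:.3f} -> {label}")
```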
32

Muharomah, Sallu, and Chanifah Indah Ratnasari. "Latent Dirichlet Allocation for Uncovering Fraud Cases on Twitter." Jurnal Riset Informatika 5, no. 3 (June 9, 2023): 345–54. http://dx.doi.org/10.34288/jri.v5i3.551.

Full text
Abstract:
Fraud is a phenomenon that continues to exist in society with a modus operandi that continues to evolve with the times. The mode of operation of fraud is continually evolving with technological advancements, globalization, and consumer behavior shifts. In today's digital age, social media is important in spreading information regarding fraud. Twitter is a social media platform that is widely used. Twitter provides easy and fast access to relevant information. As a result, to raise fraud awareness, it is critical to study the mode of operation of fraud spread on social media, particularly on Twitter. The Latent Dirichlet Allocation (LDA) approach is used in this work to classify and identify fraud issues often addressed by Indonesian Twitter users. By applying LDA modeling, this study aims to understand more comprehensively the fraudulent topics that often appear on Twitter. The research found that seven fraud topics are most commonly discussed by Twitter users in Indonesia, with the highest cohesion value of 0.491899.
APA, Harvard, Vancouver, ISO, and other styles
33

Muharomah, Sallu, and Chanifah Indah Ratnasari. "Latent Dirichlet Allocation for Uncovering Fraud Cases on Twitter." Jurnal Riset Informatika 5, no. 3 (June 23, 2023): 345–54. http://dx.doi.org/10.34288/jri.v5i3.227.

Full text
Abstract:
Fraud is a phenomenon that continues to exist in society with a modus operandi that continues to evolve with the times. The mode of operation of fraud is continually evolving with technological advancements, globalization, and consumer behavior shifts. In today's digital age, social media is important in spreading information regarding fraud. Twitter is a social media platform that is widely used. Twitter provides easy and fast access to relevant information. As a result, to raise fraud awareness, it is critical to study the mode of operation of fraud spread on social media, particularly on Twitter. The Latent Dirichlet Allocation (LDA) approach is used in this work to classify and identify fraud issues often addressed by Indonesian Twitter users. By applying LDA modeling, this study aims to understand more comprehensively the fraudulent topics that often appear on Twitter. The research found that seven fraud topics are most commonly discussed by Twitter users in Indonesia, with the highest cohesion value of 0.491899.
APA, Harvard, Vancouver, ISO, and other styles
34

Ogunwale, Yetunde Esther, and Micheal Olalekan Ajinaja. "Application Research on Semantic Analysis Using Latent Dirichlet Allocation and Collapsed Gibbs Sampling for Topic Discovery." Asian Journal of Research in Computer Science 16, no. 4 (December 29, 2023): 445–52. http://dx.doi.org/10.9734/ajrcos/2023/v16i4404.

Full text
Abstract:
Topic discovery is a process of identifying the main topics present in a collection of documents. It is a crucial step in text mining, digital humanities, and information retrieval, as it allows one to extract meaningful information from large volumes of unstructured text data. The most widely used algorithm for topic discovery is Latent Dirichlet Allocation (LDA). LDA assumes that the words in each document are generated by a small number of underlying topics, and the algorithm learns the topics from the text data automatically. One of the main problems of LDA is that the topics extracted are of poor quality if the document does not coherently belong to a single topic. However, Gibbs sampling operates on a word-by-word basis, which allows it to be used on documents with a variety of topics and modifies the topic assignment of a single word. The paper presents application research on Latent Dirichlet Allocation and Collapsed Gibbs Sampling Semantic Analysis for topic discovery.
APA, Harvard, Vancouver, ISO, and other styles
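For readers unfamiliar with the word-by-word updates mentioned above, a compact collapsed Gibbs sampler for LDA is sketched below on a toy corpus with fixed hyperparameters; it illustrates the standard algorithm, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

docs = [["apple", "banana", "fruit"], ["goal", "match", "league"],
        ["fruit", "banana", "juice"], ["league", "goal", "referee"]]
vocab = sorted({w for d in docs for w in d})
w2i = {w: i for i, w in enumerate(vocab)}

K, V, D = 2, len(vocab), len(docs)
alpha, beta = 0.1, 0.01

# Count tables: document-topic, topic-word, topic totals.
ndk = np.zeros((D, K)); nkw = np.zeros((K, V)); nk = np.zeros(K)
z = []  # current topic assignment of every token

for d, doc in enumerate(docs):
    zd = []
    for w in doc:
        k = rng.integers(K)                  # random initial assignment
        zd.append(k)
        ndk[d, k] += 1; nkw[k, w2i[w]] += 1; nk[k] += 1
    z.append(zd)

for _ in range(200):                          # Gibbs sweeps
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k_old, wi = z[d][i], w2i[w]
            # Remove the token's current assignment from the counts.
            ndk[d, k_old] -= 1; nkw[k_old, wi] -= 1; nk[k_old] -= 1
            # Full conditional over topics, then resample this single word.
            p = (ndk[d] + alpha) * (nkw[:, wi] + beta) / (nk + V * beta)
            k_new = rng.choice(K, p=p / p.sum())
            z[d][i] = k_new
            ndk[d, k_new] += 1; nkw[k_new, wi] += 1; nk[k_new] += 1

# Estimated topic-word distributions.
phi = (nkw + beta) / (nk[:, None] + V * beta)
for k in range(K):
    top = [vocab[i] for i in phi[k].argsort()[-3:][::-1]]
    print(f"topic {k}: {top}")
```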
35

Garbhapu, Vasantha Kumari. "A comparative analysis of Latent Semantic analysis and Latent Dirichlet allocation topic modeling methods using Bible data." Indian Journal of Science and Technology 13, no. 44 (November 20, 2020): 4474–82. http://dx.doi.org/10.17485/ijst/v13i44.1479.

Full text
Abstract:
Objective: To compare topic modeling techniques, since the no-free-lunch theorem states that, under a uniform distribution over search problems, all machine learning algorithms perform equally. Hence, we compare Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) to identify the better performer for the English Bible dataset, which has not been studied yet. Methods: This comparative study is divided into three levels. In the first level, the Bible data were extracted from the sources and preprocessed to remove the words and characters that were not useful for obtaining the semantic structures or the patterns needed to build a meaningful corpus. In the second level, the preprocessed data were converted into a bag of words, and the numerical statistic TF-IDF (Term Frequency-Inverse Document Frequency) was used to assess how relevant a word is to a document in a corpus. In the third level, the Latent Semantic Analysis and Latent Dirichlet Allocation methods were applied to the resultant corpus to study the feasibility of the techniques. Findings: Based on our evaluation, we observed that LDA achieves 60 to 75% superior performance compared to LSA on within-corpus document similarity and document similarity with an unseen document. Additionally, LDA showed a better coherence score (0.58018) than LSA (0.50395). Moreover, for any word within the corpus, word association showed better results with LDA. Some words have homonyms depending on the context; for example, in the Bible, 'bear' can mean both punishment and birth. In our study, the LDA word-association results are much closer to human word associations than those of LSA. Novelty: LDA was found to be the computationally efficient and interpretable method for the newly adopted New International Version English Bible dataset, which had not been studied before. Keywords: topic modeling; LSA; LDA; word association; document similarity; Bible dataset.
APA, Harvard, Vancouver, ISO, and other styles
36

Akdeas Oktanae Widodo, Farhan Septiadi, and Nur Aini Rakhmawati. "ANALISIS TREN KONTEN PADA VTUBER INDONESIA MENGGUNAKAN LATENT DIRICHLET ALLOCATION." Jurnal Informatika dan Rekayasa Elektronik 6, no. 1 (April 19, 2023): 56–63. http://dx.doi.org/10.36595/jire.v6i1.718.

Full text
Abstract:
YouTube is evidence of the latest developments in digital technology in media and entertainment. Not all YouTubers, i.e. people who create video content on YouTube, present themselves directly; some interact with their audience through a two- or three-dimensional virtual character created with computer software, commonly known as a Virtual YouTuber (VTuber). The VTuber trend became popular in Japan in 2016 and has grown worldwide year by year; more and more people enjoy the content VTubers present, and quite a few are interested in becoming VTubers themselves. To support this, this study identifies the content topics broadcast by popular VTubers using the Latent Dirichlet Allocation (LDA) method. The analysis was carried out after running a text mining process on 4,312 videos from the top 10 Indonesian VTuber channels. The optimal number of topics for the applied LDA was determined from the perplexity and topic coherence values. The LDA implementation yielded five topics frequently broadcast by VTubers: the game Minecraft together with reading donations, the game Apex Legends including collaborations with other VTubers, live-streamed video games, covers of other singers' songs, and other multiplayer games such as Raft or Phasmophobia.
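The topic-count selection step mentioned above can be sketched as follows; the tokens are stand-ins for the preprocessed VTuber video metadata, and the candidate range of k is an assumption.

    # Illustrative scan over candidate topic counts, scored by coherence and perplexity.
    from gensim.corpora import Dictionary
    from gensim.models import LdaModel, CoherenceModel

    texts = [["minecraft", "gameplay", "donation", "reading"],
             ["apex", "legend", "collab", "stream"],
             ["song", "cover", "karaoke", "music"],
             ["minecraft", "building", "stream", "gameplay"],
             ["apex", "legend", "ranked", "stream"],
             ["song", "music", "cover", "vocal"]]            # assumed stand-in tokens
    dictionary = Dictionary(texts)
    corpus = [dictionary.doc2bow(t) for t in texts]

    for k in range(2, 5):                                    # candidate topic counts (assumed)
        lda = LdaModel(corpus, id2word=dictionary, num_topics=k, passes=20, random_state=0)
        coherence = CoherenceModel(model=lda, texts=texts, dictionary=dictionary,
                                   coherence="c_v").get_coherence()
        bound = lda.log_perplexity(corpus)                   # per-word bound; perplexity = 2**(-bound)
        print(f"k={k}  coherence={coherence:.3f}  log_perplexity_bound={bound:.3f}")
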
APA, Harvard, Vancouver, ISO, and other styles
37

Baranowski, Mariusz. "Epistemological aspect of topic modelling in the social sciences: Latent Dirichlet Allocation." Przegląd Krytyczny 4, no. 1 (August 21, 2022): 7–16. http://dx.doi.org/10.14746/pk.2022.4.1.1.

Full text
Abstract:
Aware of the challenges the social sciences face in coping with a massive volume of published research papers, it is worth looking at novel, though no longer so new, machine learning methods for literature review. To this end, I explore a probabilistic topic model called Latent Dirichlet Allocation (LDA) in the context of the epistemological challenge of analysing texts on social welfare. This paper aims to describe how the LDA algorithm works on large corpora of data, along with its advantages and disadvantages. This preliminary characterisation of an inductive method for automated text analysis is intended to give a brief overview of how LDA can be used in the social sciences.
APA, Harvard, Vancouver, ISO, and other styles
38

Altarturi, Hamza H. M., Muntadher Saadoon, and Nor Badrul Anuar. "Web content topic modeling using LDA and HTML tags." PeerJ Computer Science 9 (July 11, 2023): e1459. http://dx.doi.org/10.7717/peerj-cs.1459.

Full text
Abstract:
An immense volume of digital documents exists online and offline with content that can offer useful information and insights. Topic modeling enhances the analysis and understanding of such documents by discovering latent semantic structures, or topics, within a set of digital textual documents. Internet of Things, blockchain, recommender system, and search engine optimization applications use topic modeling to handle data mining tasks such as classification and clustering. The usefulness of topic models depends on the quality of the resulting term patterns and topics, and topic coherence is the standard metric for measuring that quality. Previous studies build topic models that generally work on conventional documents; these are insufficient and underperform when applied to web content because of the structural differences between conventional and HTML documents. Neglecting the unique structure of web content leads to missing otherwise coherent topics and, therefore, low topic quality. This study proposes an innovative topic model to learn coherent topics in web content data. We present the HTML Topic Model (HTM), a web content topic model that takes the HTML tags into consideration to understand the structure of web pages. We conducted two series of experiments to demonstrate the limitations of existing topic models and to examine the topic coherence of the HTM against the widely used Latent Dirichlet Allocation (LDA) model and its variants, namely the Correlated Topic Model, the Dirichlet Multinomial Regression, the Hierarchical Dirichlet Process, the Hierarchical Latent Dirichlet Allocation, the pseudo-document based Topic Model, and the Supervised Latent Dirichlet Allocation models. The first experiment demonstrates the limitations of the existing topic models when applied to web content data and, therefore, the essential need for a web content topic model. When applied to web data, their overall performance dropped by an average factor of five and, in some cases, was up to approximately 20 times lower than on conventional data. The second experiment then evaluates the effectiveness of the HTM model in discovering topics and term patterns in web content data. The HTM model achieved an overall 35% improvement in topic coherence compared to the LDA.
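The HTM model itself is not reproduced here; the following is only a rough Python sketch of the general idea of letting HTML tags shape the documents fed to LDA. The toy page, the tag list, and the heading up-weighting are assumptions, not the authors' method.

    # Rough sketch only (not the authors' HTM): group page text by HTML tag before LDA.
    from bs4 import BeautifulSoup
    from gensim.corpora import Dictionary
    from gensim.models import LdaModel

    html = ("<html><body><h1>Topic models</h1><p>LDA finds latent topics in text.</p>"
            "<h2>Web content</h2><p>HTML tags carry structural hints about pages.</p>"
            "</body></html>")                                # toy page (assumed)

    soup = BeautifulSoup(html, "html.parser")
    sections = []
    for tag in soup.find_all(["h1", "h2", "p"]):             # treat each tag block as a unit
        tokens = tag.get_text(" ", strip=True).lower().split()
        if tag.name in ("h1", "h2"):                         # assumption: headings are more
            tokens = tokens * 2                              # topical, so up-weight them
        sections.append(tokens)

    dictionary = Dictionary(sections)
    corpus = [dictionary.doc2bow(s) for s in sections]
    lda = LdaModel(corpus, id2word=dictionary, num_topics=2, passes=10, random_state=0)
    print(lda.print_topics(num_words=4))
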
APA, Harvard, Vancouver, ISO, and other styles
39

Dhanal, Radhika Jinendra, and Vijay Ram Ghorpade. "Aspect term extraction from multi-source domain using enhanced latent Dirichlet allocation." Indonesian Journal of Electrical Engineering and Computer Science 35, no. 1 (July 1, 2024): 475. http://dx.doi.org/10.11591/ijeecs.v35.i1.pp475-484.

Full text
Abstract:
This study presents a comprehensive exploration of sentiment analysis across diverse domains through the introduction of a multi-source domain dataset encompassing hospitals, laptops, restaurants, cell phones, and electronics. Leveraging this extensive dataset, an enhanced latent Dirichlet allocation (E-LDA) model is proposed for topic modeling and aspect extraction, demonstrating superior performance with a remarkable coherence score of 0.5727. Comparative analyses with traditional LDA and other existing models showcase the efficacy of E-LDA in capturing sentiments and specific attributes within different domains. The extracted topics and aspects reveal valuable insights into domain-specific sentiments and aspects, contributing to the advancement of sentiment analysis methodologies. The findings underscore the significance of considering multi-source datasets for a more holistic understanding of sentiment in diverse text corpora.
APA, Harvard, Vancouver, ISO, and other styles
40

Obiorah, Philip, Friday Onuodu, and Batholowmeo Eke. "Topic Modeling Using Latent Dirichlet Allocation & Multinomial Logistic Regression." Advances in Multidisciplinary and scientific Research Journal Publication 10, no. 4 (December 30, 2022): 99–112. http://dx.doi.org/10.22624/aims/digital/v10n4p11a.

Full text
Abstract:
Unsupervised categorization of datasets has benefits, but not without difficulties. Unsupervised algorithms cluster groups of documents without supervision and often output their findings as vectors containing distributions of words clustered by their probability of occurring together. This technique therefore requires a human or domain expert to interpret the clusters of words and correctly identify them as belonging to particular topics. We propose combining Latent Dirichlet Allocation (LDA) with multi-class logistic regression in a multi-step classification process that extracts and classifies topics from unseen texts without relying on human labelling or domain-expert interpretation. The findings suggest that the two procedures are complementary in identifying textual subjects and in overcoming the difficulty of comprehending the array of topics in the output of LDA. Keywords: Natural Language Processing; Topic Modeling; Latent Dirichlet Allocation; Logistic Regression
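A minimal sketch of the two-step idea follows: LDA assigns each document its dominant topic as a pseudo-label, and a multinomial logistic regression then learns to place unseen text into those topics. The 20 Newsgroups data, vocabulary size, and topic count are assumptions standing in for the paper's setup.

    # Sketch of the two-step process: LDA pseudo-labels, then multinomial logistic regression.
    from sklearn.datasets import fetch_20newsgroups
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    docs = fetch_20newsgroups(remove=("headers", "footers", "quotes")).data[:2000]
    X = CountVectorizer(max_features=5000, stop_words="english").fit_transform(docs)

    lda = LatentDirichletAllocation(n_components=10, random_state=0)
    theta = lda.fit_transform(X)                  # document-topic proportions
    pseudo_labels = theta.argmax(axis=1)          # dominant topic per document (no human labels)

    X_tr, X_te, y_tr, y_te = train_test_split(X, pseudo_labels, random_state=0)
    clf = LogisticRegression(max_iter=1000)       # default lbfgs solver fits a multinomial model
    clf.fit(X_tr, y_tr)
    print("agreement with LDA topics on held-out documents:", round(clf.score(X_te, y_te), 3))
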
APA, Harvard, Vancouver, ISO, and other styles
41

Celard, P., A. Seara Vieira, E. L. Iglesias, and L. Borrajo. "LDA filter: A Latent Dirichlet Allocation preprocess method for Weka." PLOS ONE 15, no. 11 (November 9, 2020): e0241701. http://dx.doi.org/10.1371/journal.pone.0241701.

Full text
Abstract:
This work presents an alternative method of representing documents based on LDA (Latent Dirichlet Allocation) and examines how it affects classification algorithms in comparison with a common text representation. LDA assumes that each document deals with a set of predefined topics, which are distributions over an entire vocabulary. Our main objective is to use the probability of a document belonging to each topic to implement a new text representation model. The proposed technique is deployed as an extension of the Weka software in the form of a new filter. To demonstrate its performance, the created filter is tested with different classifiers, such as a Support Vector Machine (SVM), k-Nearest Neighbors (k-NN), and Naive Bayes, on several document corpora (OHSUMED, Reuters-21578, 20Newsgroup, Yahoo! Answers, YELP Polarity, and TREC Genomics 2015). It is then compared with the Bag of Words (BoW) representation technique. Results suggest that applying our proposed filter achieves accuracy similar to BoW while greatly improving classification processing times.
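The published filter is a Weka extension; the sketch below is only a Python analogue of the representation idea, feeding the same classifier either raw bag-of-words counts or LDA topic proportions. The dataset, feature counts, classifier, and cross-validation setup are assumptions.

    # Python analogue of the filter idea: same classifier, two document representations.
    from sklearn.datasets import fetch_20newsgroups
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.model_selection import cross_val_score

    data = fetch_20newsgroups(subset="train", categories=["sci.med", "rec.autos"],
                              remove=("headers", "footers", "quotes"))
    bow = CountVectorizer(max_features=3000, stop_words="english").fit_transform(data.data)
    topics = LatentDirichletAllocation(n_components=20, random_state=0).fit_transform(bow)

    for name, X in [("BoW, 3000 features", bow), ("LDA topics, 20 features", topics)]:
        acc = cross_val_score(MultinomialNB(), X, data.target, cv=3).mean()
        print(f"{name}: accuracy={acc:.3f}")
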
APA, Harvard, Vancouver, ISO, and other styles
42

Fahlevvi, Mohammad Rezza, and Azhari SN. "Topic Modeling on Online News Portal Using Latent Dirichlet Allocation (LDA)." IJCCS (Indonesian Journal of Computing and Cybernetics Systems) 16, no. 4 (October 31, 2022): 335. http://dx.doi.org/10.22146/ijccs.74383.

Full text
Abstract:
The sheer number of news items displayed on online news portals often does not indicate which topics are being discussed, yet the news can be read and analyzed to find the main issues and trends under discussion. A fast and efficient way to find trending topics in the news is therefore needed. One method that can be used to solve this problem is topic modeling, which allows users to understand the development of current topics easily and quickly. One of the algorithms for topic modeling is Latent Dirichlet Allocation (LDA). This research proceeds through data collection, preprocessing, n-gram formation, dictionary representation, weighting, topic model validation, topic model formation, and interpretation of the topic modeling results. Based on the topic evaluation, the best coherence value of the topic model was related to the number of passes; the modeling produced 20 key terms over five topics with a coherence value of 0.53, which can be considered relatively stable against the standard coherence benchmark.
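A sketch of the pipeline stages named above (n-gram formation, dictionary representation, weighting, and coherence-based validation) with gensim; the toy headlines and parameter values are assumptions, not the study's data.

    # Sketch of the pipeline stages: bigrams, dictionary, weighting, LDA, coherence check.
    from gensim.corpora import Dictionary
    from gensim.models import Phrases, LdaModel, CoherenceModel

    headlines = [["harga", "minyak", "goreng", "naik"],
                 ["tim", "nasional", "menang", "piala"],
                 ["harga", "bahan", "pokok", "naik"],
                 ["jadwal", "piala", "dunia", "malam"]]       # assumed preprocessed tokens

    bigram = Phrases(headlines, min_count=1, threshold=1)     # n-gram formation
    texts = [bigram[h] for h in headlines]

    dictionary = Dictionary(texts)                            # dictionary representation
    corpus = [dictionary.doc2bow(t) for t in texts]           # bag-of-words weighting

    lda = LdaModel(corpus, id2word=dictionary, num_topics=2, passes=30, random_state=0)
    score = CoherenceModel(model=lda, texts=texts, dictionary=dictionary,
                           coherence="c_v").get_coherence()   # validation by coherence
    print("coherence:", round(score, 3))
    print(lda.print_topics(num_words=3))
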
APA, Harvard, Vancouver, ISO, and other styles
43

Dinda Adimanggala, Fitra Abdurrachman Bachtiar, and Eko Setiawan. "Evaluasi Topik Tersembunyi Berdasarkan Aspect Extraction menggunakan Pengembangan Latent Dirichlet Allocation." Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) 5, no. 3 (June 19, 2021): 511–19. http://dx.doi.org/10.29207/resti.v5i3.3075.

Full text
Abstract:
Recently, sentiment analysis has been used to detect expressions about products or services. Aspect-level sentiment analysis is the category that focuses on extracting product aspects. One common method used for aspect extraction is Latent Dirichlet Allocation (LDA) with random topic identification, but this method has not always been able to find an acceptable topic for the aspects that were found; topics that cannot be determined are referred to as hidden topics. The purpose of this study is to evaluate and compare the suitability of identifying hidden topics between human and computer evaluation. The study also focuses on aspect extraction using a variety of LDA extensions. The data used in this study come from e-commerce case studies. The data were processed using feature selection and grouped using the extended LDA model, and the results were then processed using latent topic identification based on subjective and objective evaluations. The identified hidden topics were evaluated using several semantic and lexicon tests. The evaluation results indicate that the two hidden-topic identification assessments are fairly consistent, with an average difference in value of about 6%. As a result, computer calculations can assist humans in determining topics when a topic has a low coherence value.
APA, Harvard, Vancouver, ISO, and other styles
44

Pylov, Petr, Roman Maitak, and Andrey Protodyakonov. "The Latent Dirichlet Allocation (LDA) generative model for automating process of rendering judicial decisions." E3S Web of Conferences 431 (2023): 05005. http://dx.doi.org/10.1051/e3sconf/202343105005.

Full text
Abstract:
The Latent Dirichlet Allocation (LDA) generative model is widely used in statistical analysis and machine learning thanks to its ability to model multidimensional categorical data, such as the frequencies of different categories or a probability distribution across multiple categories. This article explores the potential application of the LDA model to the practical task of topic separation in documents related to judicial proceedings.
APA, Harvard, Vancouver, ISO, and other styles
45

HUANG, Bo, Jiaji JU, Huan CHEN, Yimin ZHU, Jin LIU, and Zhicai SHI. "Online Latent Dirichlet Allocation Model Based on Sentiment Polarity Time Series." Wuhan University Journal of Natural Sciences 26, no. 6 (December 2021): 464–72. http://dx.doi.org/10.1051/wujns/2021266464.

Full text
Abstract:
The Product Sensitive Online Dirichlet Allocation model (PSOLDA) proposed in this paper mainly uses the sentiment polarity of topic words in the review text to improve the accuracy of topic evolution. First, we use Latent Dirichlet Allocation (LDA) to obtain the distribution of topic words in the current time window. Second, the word2vec word vector is used as auxiliary information to determine the sentiment polarity and obtain the sentiment polarity distribution of the current topic. Finally, the sentiment polarity changes of the topics in the previous and next time window are mapped to the sentiment factors, and the distribution of topic words in the next time window is controlled through them. The experimental results show that the PSOLDA model decreases the probability distribution by 0.1601, while Online Twitter LDA only increases by 0.0699. The topic evolution method that integrates the sentimental information of topic words proposed in this paper is better than the traditional model.
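PSOLDA itself is not reproduced here; the following only sketches one ingredient it describes, estimating the sentiment polarity of a topic's words from word2vec similarities to seed words. The toy reviews, seed lexicon, and scoring rule are assumptions.

    # Not PSOLDA itself: scoring a topic's polarity via word2vec similarity to seed words.
    from gensim.models import Word2Vec

    reviews = [["battery", "great", "love", "phone"],
               ["screen", "broken", "terrible", "refund"],
               ["camera", "great", "bright", "photo"],
               ["service", "slow", "terrible", "wait"]]        # assumed toy review tokens
    w2v = Word2Vec(reviews, vector_size=50, min_count=1, seed=0, epochs=200)

    pos_seeds, neg_seeds = ["great", "love"], ["terrible", "broken"]   # assumed seed lexicon
    topic_words = ["camera", "bright", "photo"]                # e.g. top words of one LDA topic

    def polarity(word):
        pos = max(w2v.wv.similarity(word, s) for s in pos_seeds)
        neg = max(w2v.wv.similarity(word, s) for s in neg_seeds)
        return pos - neg                                       # > 0 leans positive

    score = sum(polarity(w) for w in topic_words) / len(topic_words)
    print("topic polarity:", round(score, 3))
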
APA, Harvard, Vancouver, ISO, and other styles
46

Xie, WenBo, Qiang Dong, and Hui Gao. "A Probabilistic Recommendation Method Inspired by Latent Dirichlet Allocation Model." Mathematical Problems in Engineering 2014 (2014): 1–10. http://dx.doi.org/10.1155/2014/979147.

Full text
Abstract:
The recent decade has witnessed an increasing popularity of recommendation systems, which help users acquire relevant knowledge, commodities, and services from an overwhelming information ocean on the Internet. Latent Dirichlet Allocation (LDA), originally presented as a graphical model for text topic discovery, now has found its application in many other disciplines. In this paper, we propose an LDA-inspired probabilistic recommendation method by taking the user-item collecting behavior as a two-step process: every user first becomes a member of one latent user-group at a certain probability and each user-group will then collect various items with different probabilities. Gibbs sampling is employed to approximate all the probabilities in the two-step process. The experiment results on three real-world data sets MovieLens, Netflix, and Last.fm show that our method exhibits a competitive performance on precision, coverage, and diversity in comparison with the other four typical recommendation methods. Moreover, we present an approximate strategy to reduce the computing complexity of our method with a slight degradation of the performance.
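A hedged sketch of the analogy described above: treat each user's collected items as a document, fit LDA, and rank uncollected items by their probability under the user's group mixture. The toy histories, topic count, and scoring are illustrative assumptions, and gensim's variational inference stands in for the paper's Gibbs sampling.

    # Users as documents, items as words: rank unseen items by the user's topic mixture.
    from gensim.corpora import Dictionary
    from gensim.models import LdaModel

    histories = [["matrix", "inception", "interstellar"],
                 ["inception", "tenet", "interstellar"],
                 ["notebook", "titanic", "amelie"],
                 ["titanic", "amelie", "matrix"]]              # items each user collected (assumed)
    dictionary = Dictionary(histories)
    corpus = [dictionary.doc2bow(h) for h in histories]
    lda = LdaModel(corpus, id2word=dictionary, num_topics=2, passes=50, random_state=0)

    user = 0
    theta = dict(lda.get_document_topics(corpus[user], minimum_probability=0.0))
    phi = lda.get_topics()                                     # shape: (num_topics, num_items)
    scores = sum(theta[k] * phi[k] for k in theta)             # P(item | user's group mixture)
    seen = set(histories[user])
    ranked = sorted(((s, dictionary[i]) for i, s in enumerate(scores)
                     if dictionary[i] not in seen), reverse=True)
    print("recommendations for user 0:", [item for _, item in ranked[:2]])
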
APA, Harvard, Vancouver, ISO, and other styles
47

Prakerti, Aqila Intan, Avelyna Ferariya Claresta, Muhammad Rasyid Kafif Ibrahim, and Nur Aini Rakhmawati. "Model Latent Dirichlet Allocation Pada Perilaku Siswa Menggunakan Media Pembelajaran Daring." INFORMATION MANAGEMENT FOR EDUCATORS AND PROFESSIONALS : Journal of Information Management 5, no. 1 (December 29, 2020): 35. http://dx.doi.org/10.51211/imbi.v5i1.1407.

Full text
Abstract:
Abstract: Indonesia is currently grappling with online schooling. School should be the place where teachers and students meet face to face to teach everything from academic knowledge to behaviour, but under unavoidable circumstances learning must now be carried out online through intermediary tools. The problem stems from the fact that many students already own communication devices, namely mobile phones, and have mastered various social media such as Instagram. The aim is to analyse students, especially in Indonesia, and what they do on Instagram while online learning is taking place. Crawling the text or captions posted under online-school hashtags yielded 120 posts after filtering out posts not made by students. The collected data were analysed with the Latent Dirichlet Allocation (LDA) model to find the dominant topics of the hashtags used, with stopword removal applied to discard unneeded words. The final analysis found four dominant topics, the majority concerning students receiving assignments from school, such as biology lessons. Keywords: E-Learning, Instagram, Latent Dirichlet Allocation (LDA).
APA, Harvard, Vancouver, ISO, and other styles
48

Syaifuddin, Ahmad, Reddy Alexandro Harianto, and Joan Santoso. "Analisis Trending Topik untuk Percakapan Media Sosial dengan Menggunakan Topic Modelling Berbasis Algoritme LDA." Journal of Intelligent System and Computation 2, no. 1 (July 15, 2021): 12–19. http://dx.doi.org/10.52985/insyst.v2i1.150.

Full text
Abstract:
WhatsApp is one of the most popular chat applications, especially in Indonesia. WhatsApp data is distinctive because its message patterns and topics are diverse and change very quickly, so identifying a topic from a collection of messages is very difficult and time-consuming when done manually. One way to obtain the implicit information in this social medium is topic modeling. This research analyses the application of the LDA (Latent Dirichlet Allocation) method to identify which topics are being discussed in WhatsApp groups at Universitas Islam Majapahit, and experiments with topic modeling by adding a time attribute when constructing documents. The research produces topic models and f-measure evaluation values for those models based on the experiments performed. The LDA method was applied using the LDA library in Python, with standard text preprocessing plus slang-word removal to handle non-standard words and abbreviations in the chat logs. The topic models were evaluated with a human-in-the-loop word intrusion task given to an Indonesian-language expert. The best LDA result was obtained by turning every 10 minutes of messages into one document and merging reply chats in the WhatsApp group conversation, which proved to be one way to improve topic modeling results with the Latent Dirichlet Allocation (LDA) algorithm; it yielded a precision of 0.9294, a recall of 0.7900, and an f-measure of 0.8541.
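The document-construction step that worked best above (10-minute windows of chat) can be sketched in a few lines of Python; the timestamps and messages below are toy assumptions.

    # Group chat messages into 10-minute windows; each window becomes one LDA "document".
    from datetime import datetime

    chats = [("2021-03-01 08:01", "jadwal kuliah besok jam berapa"),
             ("2021-03-01 08:04", "besok jam delapan pagi"),
             ("2021-03-01 08:09", "tugas dikumpulkan lewat email"),
             ("2021-03-01 08:25", "link zoom sudah dibagikan"),
             ("2021-03-01 08:28", "terima kasih infonya")]     # (timestamp, message), assumed

    WINDOW_MINUTES = 10
    documents = {}
    for ts, text in chats:
        t = datetime.strptime(ts, "%Y-%m-%d %H:%M")
        bucket = (t.year, t.month, t.day, t.hour, t.minute // WINDOW_MINUTES)
        documents.setdefault(bucket, []).extend(text.lower().split())

    for bucket, tokens in documents.items():
        print(bucket, tokens)
    # each token list would then go through slang-word removal and into the LDA pipeline
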
APA, Harvard, Vancouver, ISO, and other styles
49

Akbar, Jihadul, Tamrizal A. M., Yefta Tolla, Abdulrahmat E. Ahmad, Ainul Yaqin, and Ema Utami. "Pemodelan Topik Menggunakan Latent Dirichlet Allocation pada Ulasan Aplikasi PeduliLindungi." InComTech : Jurnal Telekomunikasi dan Komputer 13, no. 1 (April 30, 2023): 40. http://dx.doi.org/10.22441/incomtech.v13i1.15572.

Full text
Abstract:
The COVID-19 pandemic that struck the entire world, including Indonesia, required preventive measures such as tracing, tracking, and warning/fencing. One preventive step taken by the Government, through Decree of the Minister of Communication and Informatics Number 171 of 2020 on the Designation of the PeduliLindungi Application for Health Surveillance in Handling Coronavirus Disease 2019 (COVID-19), was to designate the PeduliLindungi application as the health-surveillance application for handling COVID-19. The public has posted a variety of comments about the PeduliLindungi application, including in the review section of the Play Store. In this research, topic modeling with LDA is performed on public reviews of the PeduliLindungi application. The data comprise 13,731 reviews obtained by scraping Google Play from 15 September to 6 December 2021 using a Google Play scraping library. The steps taken in this research are dataset preprocessing, running the word2vec process, calculating the coherence value, and performing topic modeling. Based on the coherence values, the ideal number of topics is five; after processing with the LDA algorithm, the five topics were interpreted as registration problems, vaccine certificates, mismatched dates of birth, problems opening the application, and user complaints about the application.
APA, Harvard, Vancouver, ISO, and other styles
50

Wang, Yi, and Lihong Xu. "Unsupervised segmentation of greenhouse plant images based on modified Latent Dirichlet Allocation." PeerJ 6 (June 28, 2018): e5036. http://dx.doi.org/10.7717/peerj.5036.

Full text
Abstract:
Agricultural greenhouse plant images with complicated scenes are difficult to label manually with precision, and the appearance of leaf disease spots and mosses further increases the difficulty of plant segmentation. Considering these problems, this paper proposes a statistical image segmentation algorithm, MSBS-LDA (Mean-shift Bandwidths Searching Latent Dirichlet Allocation), which can perform unsupervised segmentation of greenhouse plants. The main idea of the algorithm is to take advantage of the language model LDA (Latent Dirichlet Allocation) for image segmentation based on the design of spatial documents. The maxima of the probability density function in image space are mapped to documents, and Mean-shift is used to carry out the word-document assignment. The proportion of the most frequent word in the word-frequency statistics determines the coordinate-space bandwidth, and the spatial LDA segmentation procedure iteratively searches for the optimal color-space bandwidth in light of the LUV distances between classes. In view of the fruits present in the plant segmentation result and the ever-changing illumination conditions in greenhouses, an improved leaf segmentation method based on the watershed transform is proposed to further segment the leaves. Experimental results show that the proposed methods can segment greenhouse plants and leaves in an unsupervised way, obtaining high segmentation accuracy together with an effective extraction of the fruit parts.
APA, Harvard, Vancouver, ISO, and other styles