Journal articles on the topic 'Bigram Analysis'

Author: Grafiati

Published: 7 June 2025

Last updated: 2 August 2025

Consult the top 50 journal articles for your research on the topic 'Bigram Analysis.'

1

Dang, Edward Kai Fung, Robert Wing Pong Luk, and Qing Li. "A Study of Word Bigrams for Pseudo-relevance Feedback in Information Retrieval." JUCS - Journal of Universal Computer Science 30, no. 11 (2024): 1511–28. http://dx.doi.org/10.3897/jucs.112725.

Abstract:

Traditional information retrieval models mostly adopt a term independence assumption and are based on single terms or unigrams. Past efforts have attempted to go beyond this assumption, such as by using contiguous terms (i.e. word n-grams) or terms appearing in proximity. One such approach employs pseudo-relevance feedback (PRF) in an extended BM25 model, with an expanded query containing bigrams and proximity word pairs besides unigrams. However, the benefit of this approach over the traditional unigram PRF remains inconclusive. We speculate the uncertain effectiveness of bigram PRF in this p

3

Crossley, Scott, and Max M. Louwerse. "Multi-dimensional register classification using bigrams." International Journal of Corpus Linguistics 12, no. 4 (2007): 453–78. http://dx.doi.org/10.1075/ijcl.12.4.02cro.

Abstract:

A corpus linguistic analysis investigated register classification using frequency of bigrams in nine spoken and two written corpora. Four dimensions emerged from a factor analysis using bigram frequencies shared across corpora: (1) Scripted vs. Unscripted Discourse, (2) Deliberate vs. Unplanned Discourse, (3) Spatial vs. Non-Spatial Discourse, and (4) Directional vs. Non-Directional Discourse. These findings were replicated in a second analysis. Both analyses demonstrate the strength of bigrams for classifying spoken and written registers, especially in locating distinct collocations among spo
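
The factor analysis above starts from frequencies of bigrams shared across corpora. A minimal Python sketch of only that counting step follows. It assumes lowercase whitespace tokenization and normalization per 1,000 tokens, neither of which is taken from the paper, and the function names are hypothetical.

```python
from collections import Counter


def bigram_counts(text: str) -> Counter:
    """Count adjacent word pairs, assuming lowercase whitespace tokenization."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def shared_bigram_rates(
    corpora: dict[str, str], per: int = 1000
) -> dict[str, dict[tuple[str, str], float]]:
    """Rate per `per` tokens, in each corpus, of the bigrams found in every corpus."""
    counts = {name: bigram_counts(text) for name, text in corpora.items()}
    sizes = {name: len(text.split()) for name, text in corpora.items()}
    # Keep only bigrams that occur at least once in all corpora.
    shared = set.intersection(*(set(c) for c in counts.values()))
    return {
        name: {bg: counts[name][bg] * per / sizes[name] for bg in sorted(shared)}
        for name in corpora
    }
```

The resulting rates (one row per corpus or text, one column per shared bigram) are the kind of matrix a factor analysis takes as input; the factor extraction itself is not shown.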

4

Anas, Zakaria, and Siallagan Manahan. "Predicting Customer Satisfaction through Sentiment Analysis on Online Review." International Journal of Current Science Research and Review 06, no. 01 (2023): 515–22. https://doi.org/10.5281/zenodo.7565720.

Abstract:

User-generated content, such as user reviews, posts, tags, ratings, and opinions on the internet, can be used as a business indicator if collected and appropriately analyzed. One example is predicting customer satisfaction by applying big data analytics to online reviews. To predict customer satisfaction from user-generated content, the author implements a machine learning approach using sentiment analysis. Five-fold cross-validation was performed to train the classification model. The training was performed with a combination of

5

Neacsu, Teodor, Teodor Poncu, Stefan Ruseti, and Mihai Dascalu. "DoubleStrokeNet: Bigram-Level Keystroke Authentication." Electronics 12, no. 20 (2023): 4309. http://dx.doi.org/10.3390/electronics12204309.

Abstract:

Keystroke authentication is a well-established biometric technique that has gained significant attention due to its non-intrusive and continuous characteristics. The method analyzes the unique typing patterns of individuals to verify their identity while interacting with the keyboard, both virtual and hardware. Current deep-learning approaches like TypeNet and TypeFormer focus on generating biometric signatures as embeddings for the entire typing sequence. The authentication process is defined using the Euclidean distances between the new typing embedding and the saved biometric signatures. Th
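
DoubleStrokeNet works at the level of key bigrams rather than whole typing sequences. As a hedged illustration of what bigram-level keystroke data can look like, the sketch below computes common timing features for each pair of consecutive keystrokes. The `KeyEvent` structure, millisecond timestamps, and the choice of these four features are assumptions for illustration, not the paper's input format.

```python
from dataclasses import dataclass


@dataclass
class KeyEvent:
    """Hypothetical keystroke record: key label plus press/release times in ms."""
    key: str
    press_ms: float
    release_ms: float


def digraph_features(events: list[KeyEvent]) -> list[tuple[str, list[float]]]:
    """Timing features for each pair of consecutive keystrokes (a key bigram)."""
    features = []
    for a, b in zip(events, events[1:]):
        features.append((
            a.key + b.key,
            [
                a.release_ms - a.press_ms,  # hold time of the first key
                b.release_ms - b.press_ms,  # hold time of the second key
                b.press_ms - a.press_ms,    # press-to-press latency
                b.press_ms - a.release_ms,  # release-to-press (flight) time
            ],
        ))
    return features
```

For example, typing "the" yields two key bigrams, "th" and "he", each with its own four-value feature vector; a bigram-level model would consume such per-pair vectors instead of one embedding for the whole sequence.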

6

Sardinha, Tony Berber. "A historical characterisation of American and Brazilian cultures based on lexical representations." Corpora 15, no. 2 (2020): 183–212. http://dx.doi.org/10.3366/cor.2020.0194.

Abstract:

The goal of this study is to detect the historical distribution of representations of the United States and Brazil formed around the use of the nationality adjectives American and Brazilian. To achieve this goal, the study used a pre-existing multi-dimensional analysis of representations based on bigrams from the half-a-trillion-word Google Books bigram dataset of English writing (Berber Sardinha, 2019), which provided the major representations of both cultures. The method was based on the text type approach developed by Biber (1989), which uses cluster analysis to identify the groupings of

7

Ravichandran, M., G. Kulanthaivel, and T. Chellatamilan. "Intelligent Topical Sentiment Analysis for the Classification of E-Learners and Their Topics of Interest." Scientific World Journal 2015 (2015): 1–8. http://dx.doi.org/10.1155/2015/617358.

Abstract:

Every day, huge numbers of instant tweets (messages) are published on Twitter, one of the largest social media platforms for e-learner interaction. Various interesting topics of study are discussed among learners and teachers through sources captured from Twitter. The common sentiment toward these topics can be obtained from the massive number of instant messages about them. In this paper, rather than using the opinion polarity of each message relevant to the topic, the authors focus on sentence-level opinion classification using the unsupervi

8

Suryani Hormansyah, Dhebys, Eka Larasati Amalia, Luqman Affandi, Dimas Wahyu Wibowo, and Indinabilah Aulia. "N-Gram Accuracy Analysis in the Method of Chatbot Response." International Journal of Engineering & Technology 7, no. 4.44 (2018): 152. http://dx.doi.org/10.14419/ijet.v7i4.44.26973.

Abstract:

A chatbot is a computer program designed to simulate interactive conversation with users. In this study, a chatbot was created as a customer service tool for a public health service in Malang. The application is expected to help the public find the information they need. User input in this application is processed with N-grams, namely unigrams, bigrams, and trigrams. The application was tested with each of the three N-gram methods, yielding results of 0.436 for unigram, 0.28 for bigram, and 0.26 for trigram. From these

9

Hougham, Dan, Jon Clenton, Takumi Uchihara, and George Higginbotham. "The Impact of Lexical Bundle Length on L2 Oral Proficiency." Languages 9, no. 7 (2024): 232. http://dx.doi.org/10.3390/languages9070232.

Abstract:

Lexical bundles (LBs) are crucial in L2 oral proficiency, yet their complexity in terms of length is under-researched. This study therefore examines the relationship between longer and shorter LBs and oral proficiency among 150 L2 learners of varying proficiency levels at a UK university. Through the analysis of oral presentation data (scores ranging from intermediate to advanced) and employing a combined text-internal and text-external approach (two- to five-word bundles), this study advances an innovative text-internal LB refinement procedure, thus isolating the unique contribution of LB len

10

Cheng, Phillip M. "Bigram frequency analysis for detection of radiology report errors." Clinical Imaging 89 (September 2022): 84–88. http://dx.doi.org/10.1016/j.clinimag.2022.06.010.

11

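Chuzaimah Zulkifli, Umi. "Pengembangan Modul Preprocessing Teks untuk Kasus Formalisasi dan Pengecekan Ejaan Bahasa Indonesia pada Aplikasi Web Mining Simple Solution (WMSS)" [Development of a Text Preprocessing Module for Indonesian Formalization and Spell Checking in the Web Mining Simple Solution (WMSS) Application]. Jurnal Matematika Statistika dan Komputasi 15, no. 2 (2018): 95. http://dx.doi.org/10.20956/jmsk.v15i2.5718.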

Abstract:

Social media data is currently widely used for sentiment analysis and other types of analysis. However, data obtained from social media generally contains writing mistakes, such as misspelled words. The proposed solution is word formalization and spelling checking. To address these problems, a preprocessing model is built to handle both types of mistakes. Formalization converts words into their formal form based on KBBI (Kamus Besar Bahasa Indonesia, the official Indonesian dictionary), while spelling checking uses spelling correction. Spelling correction metho

12

Hendrikx, Roy Johannus Petrus, Hanneke Wil-Trees Drewes, Marieke Spreeuwenberg, Dirk Ruwaard, and Caroline Baan. "Measuring Regional Quality of Health Care Using Unsolicited Online Data: Text Analysis Study." JMIR Medical Informatics 7, no. 4 (2019): e13053. http://dx.doi.org/10.2196/13053.

Abstract:

Background: Regional population management (PM) health initiatives require insight into experienced quality of care at the regional level. Unsolicited online provider ratings have shown potential for this use. This study explored the addition of comments accompanying unsolicited online ratings to regional analyses. Objective: The goal was to create additional insight for each PM initiative as well as overall comparisons between these initiatives by attempting to determine the reasoning and rationale behind a rating. Methods: The Dutch Zorgkaart database provided the unsolicited ratings from 2008

13

Dwifebri Purbolaksono, Mahendra. "Sentiment Analysis of Game Review in Steam Platform using Random Forest." International Journal on Information and Communication Technology (IJoICT) 10, no. 2 (2024): 161–69. https://doi.org/10.21108/ijoict.v10i2.1007.

Abstract:

Steam provides a platform for buyers to write reviews of the software or games they have purchased. Developers will benefit from knowing the criticisms and suggestions given by their community. The number of reviews users give is so large that developers find it difficult to determine whether users like or dislike the games they create. In the Steam application, there is a rating system, but the ratings given by users do not always represent the content of the comments. Therefore, sentiment analysis is used to facilitate developers in understanding the sentiment of the reviews given by users.

14

Borisov, Leonid Andreevich, Anastasia Yurievna Ivchenko, Nikolay Alexeevich Mitin, and Yurii Nikolaevich Orlov. "Classification of text information with the use of bigram analysis." Keldysh Institute Preprints, no. 106 (2017): 1–22. http://dx.doi.org/10.20948/prepr-2017-106.

15

Setiadi Citawan, Rico, Viny Christanti Mawardi, and Bagus Mulyawan. "Automatic Essay Scoring in E-learning System Using LSA Method with N-Gram Feature for Bahasa Indonesia." MATEC Web of Conferences 164 (2018): 01037. http://dx.doi.org/10.1051/matecconf/201816401037.

Abstract:

In education, an e-learning system can be used to support the educational process. Educators commonly use e-learning systems to evaluate learners' learning outcomes. When evaluating learning outcomes in an e-learning system, the exam question types most often used are multiple choice and short answer. Essay questions are rarely used in educational evaluation because their assessment is subjective and time-consuming. This design aims to create an automatic

16

Hizqil, Ahmad, and Yova Ruldeviani. "Sentiment analysis of online licensing service quality in the energy and mineral resources sector of the Republic of Indonesia." Computer Science and Information Technologies 5, no. 1 (2024): 57–65. https://doi.org/10.11591/csit.v5i1.pp57-65.

Abstract:

The Ministry of Energy and Mineral Resources of the Republic of Indonesia regularly assessed public satisfaction with its online licensing services. Users rated their satisfaction at 3.42 on a scale of 4, below the organization's average of 3.53. Evaluating public service performance is crucial for quality improvement. Previous research relied solely on survey data to assess public satisfaction. This study goes further by analyzing user feedback in text form from an online licensing application to identify negative aspects of the service that need enhancement. The dataset spanned September 2019

17

Tremblay, Antoine, and Benjamin V. Tucker. "The effects of N-gram probabilistic measures on the recognition and production of four-word sequences." Mental Lexicon 6, no. 2 (2011): 302–24. http://dx.doi.org/10.1075/ml.6.2.04tre.

Abstract:

The present study investigates the processing and production of four-word sequences such as I don’t really know, at the age of, and I think it’s the. Specifically, we investigate the influence of families of probabilistic measures such as unigram, bigram, trigram, and quadgram frequency of occurrence, logarithmic (log) probability of occurrence, and mutual information. Log probability of occurrence emerged as the predominant predictor family in the onset latency analysis, suggesting that recognition is mainly underpinned by competition between a target N-gram and its family members. In contras
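
The probabilistic measures named above can be illustrated for the two-word case. The sketch below estimates bigram frequency, the conditional log probability log2 P(w2 | w1), and pointwise mutual information from raw counts by maximum likelihood, without smoothing. These are common textbook definitions assumed here for illustration; the abstract does not give the study's exact formulas for four-word sequences, and `bigram_measures` is a hypothetical name.

```python
import math
from collections import Counter


def bigram_measures(tokens: list[str], w1: str, w2: str) -> dict[str, float]:
    """MLE measures for the bigram (w1, w2); it must occur at least once."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    pair = bigrams[(w1, w2)]
    if pair == 0:
        raise ValueError("bigram not observed; unsmoothed estimates are undefined")
    # Occurrences of w1 as the left member of any bigram.
    left = sum(c for (a, _), c in bigrams.items() if a == w1)
    p_pair = pair / sum(bigrams.values())
    p_w1 = unigrams[w1] / len(tokens)
    p_w2 = unigrams[w2] / len(tokens)
    return {
        "frequency": float(pair),
        # log2 P(w2 | w1) = log2(count(w1 w2) / count(w1 followed by anything))
        "log_cond_prob": math.log2(pair / left),
        # Pointwise mutual information: log2(P(w1 w2) / (P(w1) * P(w2)))
        "pmi": math.log2(p_pair / (p_w1 * p_w2)),
    }
```

On the toy sequence "the cat sat on the mat the cat ran", the pair ("the", "cat") occurs twice out of three bigrams starting with "the", so its conditional log probability is log2(2/3); a positive PMI indicates the two words co-occur more often than independent sampling would predict.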

18

Ghosh, Monalisa, and Goutam Sanyal. "Performance Assessment of Multiple Classifiers Based on Ensemble Feature Selection Scheme for Sentiment Analysis." Applied Computational Intelligence and Soft Computing 2018 (October 1, 2018): 1–12. http://dx.doi.org/10.1155/2018/8909357.

Abstract:

Sentiment classification, or sentiment analysis, has been acknowledged as an open research domain. In recent years, a great deal of research has been performed in this field using a variety of methodologies. Feature generation and selection are consequential for text mining, as a high-dimensional feature set can affect the performance of sentiment analysis. This paper investigates the inability or incompetency of the widely used feature selection methods (IG, Chi-square, and Gini Index) with unigram and bigram feature sets on four machine learning classification algorithms (MNB, SVM,

19

Khushhal, Saquib, Abdul Majid, Syed Ali Abass, Rabia Riaz, Mohammad Babar, and Shafiq Ahmad. "Cword2vec: a novel morphological rule-based word embedding approach for Urdu text sentiment analysis." PeerJ Computer Science 11 (July 15, 2025): e2937. https://doi.org/10.7717/peerj-cs.2937.

Abstract:

Word embeddings are essential to natural language processing tasks because they contain a single word’s syntactic and semantic information. Word embeddings have been developed widely for numerous spoken languages across the globe like English. The research community needs to pay more attention to the Urdu language despite its significant number of speakers, which amounts to approximately 231.3 million individuals. Urdu is a complex language because word boundaries in Urdu are unspecified, as it does not employ delimiters between words. The compound word, a multiword expression, is a more compl

20

Fitriana, Dhina Nur, and Yuliant Sibaroni. "Sentiment Analysis on KAI Twitter Post Using Multiclass Support Vector Machine (SVM)." Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) 4, no. 5 (2020): 846–53. http://dx.doi.org/10.29207/resti.v4i5.2231.

Abstract:

Information in the form of unstructured text is increasing and becoming commonplace on the internet. This information is easily found and used by business people or companies through social media, one of which is Twitter. Twitter is ranked sixth among the most widely accessed social media platforms today. A disadvantage of Twitter is that its data is large and unstructured. Consequently, it is difficult for business people or companies with limited resources to know public opinion about their services. To make it easier for businesses to know the public's sentiment for better service in the future, pub

21

Puneeth, Nakka. "Sentiment Polarity Prediction for Amazon Product Reviews Using Machine Learning and Deep Learning." International Journal for Research in Applied Science and Engineering Technology 13, no. 5 (2025): 1080–88. https://doi.org/10.22214/ijraset.2025.70377.

Abstract:

The exponential growth of e-commerce platforms, particularly Amazon, has resulted in a massive volume of customer-generated reviews, making manual sentiment analysis both time-consuming and inefficient. This study proposes an automated system to predict the sentiment polarity (positive or negative) of Amazon product reviews using a combination of machine learning and deep learning techniques. The reviews are processed and converted into vectorized representations using methods such as Bag of Words (BoW), TF-IDF, Word2Vec, and their respective unigram, bigram, and weighted variants. A variety of

22

Nguyen, T. V., Q. H. T. Duong, and A. G. Kravets. "Analysis and Prediction of Trends in the Use of Terms in Computer Science Based on Neural Network Models." Vestnik komp'iuternykh i informatsionnykh tekhnologii, no. 200 (February 2021): 24–38. http://dx.doi.org/10.14489/vkit.2021.02.pp.024-038.

Abstract:

The widespread use of information and communication technologies, database technologies and the Internet has led to the development of specialized digital libraries. These digital libraries serve a huge number of different users and play an important role as repositories and providers of information and knowledge. Therefore, the automatic extraction of useful information from texts stored in digital libraries is becoming an increasingly important research topic in the field of data mining. The article discusses the statistical analysis of texts in the digital library arXiv.org to identify the

23

Hayati, Hashri, and Muhammad Riza Alifi. "Analisis Sentimen pada Tweet Terkait Vaksin Covid-19 Menggunakan Metode Support Vector Machine" [Sentiment Analysis of Tweets Related to the COVID-19 Vaccine Using the Support Vector Machine Method]. JTT (Jurnal Teknologi Terapan) 7, no. 2 (2021): 110. http://dx.doi.org/10.31884/jtt.v7i2.349.

Abstract:

COVID-19 was declared a global pandemic in March 2020. One of the challenges in dealing with the current COVID-19 pandemic is widespread doubt about vaccines, even though vaccination is one of the most successful ways to deal with infectious disease outbreaks. Vaccine hesitancy can be observed in, among other sources, public sentiment or perception on social media such as Twitter. Social media can affect how people absorb information; in this case, social media is also a medium for anti-vaccine propaganda which c

24

Seethappan, K., and K. Premalatha. "A comparative analysis of euphemistic sentences in news using feature weight scheme and intelligent techniques." Journal of Intelligent & Fuzzy Systems 42, no. 3 (2022): 1937–48. http://dx.doi.org/10.3233/jifs-211295.

Abstract:

Although there has been considerable research on detecting different kinds of figurative language, no prior work has addressed the automatic classification of euphemisms. Our primary work is to present a system for the automatic classification of euphemistic phrases in a document. In this research, a large dataset of 100,000 sentences was collected from different resources for identifying euphemistic and non-euphemistic utterances. This work focuses on several approaches to improve euphemism classification: (1) a combination of lexical n-gram features; (2) three feature-weighting schemes

26

Hizqil, Ahmad, and Yova Ruldeviyani. "Sentiment analysis of online licensing service quality in the energy and mineral resources sector of the Republic of Indonesia." Computer Science and Information Technologies 5, no. 1 (2024): 57–65. http://dx.doi.org/10.11591/csit.v5i1.p57-65.

Abstract:

The Ministry of Energy and Mineral Resources of the Republic of Indonesia regularly assessed public satisfaction with its online licensing services. Users rated their satisfaction at 3.42 on a scale of 4, below the organization's average of 3.53. Evaluating public service performance is crucial for quality improvement. Previous research relied solely on survey data to assess public satisfaction. This study goes further by analyzing user feedback in text form from an online licensing application to identify negative aspects of the service that need enhancement. The dataset spanned September 2019

27

Hizqil, Ahmad, and Yova Ruldeviyani. "Sentiment analysis of online licensing service quality in the energy and mineral resources sector of the Republic of Indonesia." Computer Science and Information Technologies 5, no. 1 (2024): 63–71. http://dx.doi.org/10.11591/csit.v5i1.p63-71.

Abstract:

The Ministry of Energy and Mineral Resources of the Republic of Indonesia regularly assessed public satisfaction with its online licensing services. Users rated their satisfaction at 3.42 on a scale of 4, below the organization's average of 3.53. Evaluating public service performance is crucial for quality improvement. Previous research relied solely on survey data to assess public satisfaction. This study goes further by analyzing user feedback in text form from an online licensing application to identify negative aspects of the service that need enhancement. The dataset spanned September 2019

28

Hizqil, Ahmad, and Yova Ruldeviani. "Sentiment analysis of online licensing service quality in the energy and mineral resources sector of the Republic of Indonesia." Computer Science and Information Technologies 5, no. 1 (2024): 63–71. http://dx.doi.org/10.11591/csit.v5i1.pp63-71.

Abstract:

The Ministry of Energy and Mineral Resources of the Republic of Indonesia regularly assessed public satisfaction with its online licensing services. Users rated their satisfaction at 3.42 on a scale of 4, below the organization's average of 3.53. Evaluating public service performance is crucial for quality improvement. Previous research relied solely on survey data to assess public satisfaction. This study goes further by analyzing user feedback in text form from an online licensing application to identify negative aspects of the service that need enhancement. The dataset spanned September 2019

29

Marjanen, Jani, Antti Kanner, and Eetu Mäkelä. "Studying the Historical Semantics of Finnishness with a Bigram Approach." Digital Humanities in the Nordic and Baltic Countries Publications 4, no. 1 (2022): 109–19. http://dx.doi.org/10.5617/dhnbpub.11279.

Abstract:

Our paper analyzes the historical understanding of Finland and Finnishness as it was expressed in newspapers published in the late eighteenth century and the early nineteenth century. As the period saw the decimation of the Swedish Kingdom and establishment of the Grand Duchy of Finland within the Russian Empire, a change in language use can be expected, but the changes occurring are rather fine-grained and difficult to detect without a systematic and transparent charting of the data. This paper suggests a method based on the analysis of bigrams to study this type of semantic change. Many exis

30

Osipyan, V. O., K. I. Litvinov, and A. S. Zhuck. "Research and development of the mathematic models of cryptosystems based on the universal Diophantine language." SHS Web of Conferences 141 (2022): 01020. http://dx.doi.org/10.1051/shsconf/202214101020.

Abstract:

This paper shows the objective necessity of improving the information security systems under the development of information and telecommunication technologies. The paper for the first time involves a new area of NP-complete problems from Diophantine analysis, namely, multi-degree systems of Diophantine equations of a given dimension and degree of Tarry-Escott type. Based on a fundamentally new number-theoretic method, a mathematical model of an alphabetic information security system (ISS) has been developed that generalizes the principle of building cryptosystems with a public key – the so cal

31

Mojžiš, Ján, Peter Krammer, Marcel Kvassay, Lenka Skovajsová, and Ladislav Hluchý. "Towards Reliable Baselines for Document-Level Sentiment Analysis in the Czech and Slovak Languages." Future Internet 14, no. 10 (2022): 300. http://dx.doi.org/10.3390/fi14100300.

Abstract:

This article helps establish reliable baselines for document-level sentiment analysis in highly inflected languages like Czech and Slovak. We revisit an earlier study representing the first comprehensive formulation of such baselines in Czech and show that some of its reported results need to be significantly revised. More specifically, we show that its online product review dataset contained more than 18% of non-trivial duplicates, which incorrectly inflated its macro F1-measure results by more than 19 percentage points. We also establish that part-of-speech-related features have no damaging

32

Erwina, Emmy, Tommy Tommy, and Mayasari Mayasari. "Mapping and Analysis of Standard Indonesian Pronunciation Errors by Using the Bigram Method." International Conference on Research and Development (ICORAD) 1, no. 1 (2022): 114–31. http://dx.doi.org/10.47841/icorad.v1i1.16.

Abstract:

The Indonesian language is increasingly being neglected; even the mass media often use non-standard language, so word usage is non-uniform, and this non-uniformity often appears in scientific articles, especially Indonesian ones. The non-uniformity of Indonesian pronunciation certainly confuses the general public, for example television news viewers and radio listeners, who must distinguish between standard and non-standard forms. Non-uniform Indonesian pronunciation often occurs in official situations such as official speeches or presentations. Based on this phenomenon, this study aims to

33

Parlak, Bekir, and Alper Kürşat Uysal. "On classification of abstracts obtained from medical journals." Journal of Information Science 46, no. 5 (2019): 648–63. http://dx.doi.org/10.1177/0165551519860982.

Abstract:

Classification of medical documents was mostly carried out on English data sets and these studies were performed on hospital records rather than academic texts. The main reasons behind this situation are the lack of publicly available data sets and the tasks being costly and time-consuming. As the first contribution of this study, two data sets including Turkish and English counterparts of the same abstracts published in Turkish medical journals were constructed. Turkish is one of the widely used agglutinative languages worldwide and English is a good example of non-agglutinative languages. Wh

34

Tiţă, Silviu Mihail, and Carmen Tiţă. "A Bibliometric Analysis of Publication on Performance Management in Public Institutions." European Journal of Public Administration Research, no. 2 (2024): 49–58. http://dx.doi.org/10.47743/ejpar.2023-2-5.

Abstract:

This article examines the implications of performance management in public institutions. The methodological approach uses the bibliometric software R-Stata Bibliometrix, specialized in science mapping workflows, to extract meaningful results about the thematic evolution of performance management in publications indexed by the Web of Science database. The search query is "performance management in public institution", and the bibliometric analysis of unigrams, bigrams, and thematic evolution covers the titles and abstracts of the articles. The scope of the research is to id

35

Nurodin, Muhammad Irsa, and Yan Puspitarani. "Phrase Detection's Impact on Sentiment Analysis of Public Opinion and Online Media Toward Political Figures." Jurnal Riset Informatika 6, no. 2 (2024): 67–76. http://dx.doi.org/10.34288/jri.v6i2.268.

Abstract:

Public opinion of political figures and policy significantly impacts general elections. Sentiment analysis, as a method to comprehend opinion and emotion in texts, requires the step of text preprocessing to improve data quality. However, textual data often encounters irrelevant words and ambiguous language. These conditions can impact the accuracy of sentiment analysis. Given the significance of precisely interpreting public opinion toward political figures, these issues may result in biased or inaccurate sentiment analysis outcomes. Irregular punctuation or unclear language can disturb the te

36

Liu, Kanglong, Rongguang Ye, Liu Zhongzhu, and Rongye Ye. "Entropy-based discrimination between translated Chinese and original Chinese using data mining techniques." PLOS ONE 17, no. 3 (2022): e0265633. http://dx.doi.org/10.1371/journal.pone.0265633.

Abstract:

The present research reports on the use of data mining techniques for differentiating between translated and non-translated original Chinese based on monolingual comparable corpora. We operationalized seven entropy-based metrics including character, wordform unigram, wordform bigram and wordform trigram, POS (Part-of-speech) unigram, POS bigram and POS trigram entropy from two balanced Chinese comparable corpora (translated vs non-translated) for data mining and analysis. We then applied four data mining techniques including Support Vector Machines (SVMs), Linear discriminant analysis (LDA), R
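
The entropy metrics named here can be read as the Shannon entropy of an n-gram frequency distribution, H = -Σ p(g) log₂ p(g), summed over the distinct n-grams g. The sketch below computes that quantity for any token sequence: passing word forms gives wordform n-gram entropy, and passing POS tags gives POS n-gram entropy. The excerpt does not say whether the authors normalize or window their counts, so this plain, unnormalized version is an assumption.

```python
import math
from collections import Counter


def ngram_entropy(tokens, n):
    """Shannon entropy (in bits) of the n-gram distribution of a token sequence."""
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    total = sum(grams.values())
    # Empty input (fewer than n tokens) yields an empty sum, i.e. entropy 0.
    return -sum((c / total) * math.log2(c / total) for c in grams.values())
```

For example, `ngram_entropy(list("abab"), 2)` sees the bigrams ab, ba, ab and returns about 0.918 bits. A more repetitive, less varied text yields lower entropy.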

37

Aziz, Sameen, Saleem Ullah, Bushra Mughal, Faheem Mushtaq, and Sabih Zahra. "Roman Urdu sentiment analysis using Machine Learning with best parameters and comparative study of Machine Learning algorithms." Pakistan Journal of Engineering and Technology 3, no. 2 (2020): 172–77. http://dx.doi.org/10.51846/vol3iss2pp172-177.

Abstract:

People talk on social media because it is a pleasant and easy way to express their feelings about a topic, post, or product on e-commerce websites. In Asia, people mostly use the Roman Urdu script to express their opinions on such topics. Sentiment analysis of Roman Urdu (Bilal et al. 2016) is a challenging task for researchers because of the lack of resources and its unstructured, non-standard syntax and script. We collected a dataset from Kaggle containing 21000 manually annotated values and prepared the data for machine learning

38

Shinde, Ganesh K. "Sentiment Analysis on Twitter Hashtag Datasets." International Journal for Research in Applied Science and Engineering Technology 9, no. 12 (2021): 278–81. http://dx.doi.org/10.22214/ijraset.2021.39201.

Abstract:

Sentiment analysis has been applied to online shopping platforms, scientific surveys, political polls, business intelligence, etc. In this work, we analyse Twitter posts about hashtags such as #MakeinIndia using a machine learning approach. By performing opinion mining in a specific area, it is possible to identify the effect of area information on sentiment analysis. We put forth a feature vector for classifying tweets as positive, negative, or neutral, and then applied the machine learning algorithms MaxEnt and SVM. We utilised unigram, bigram, and trigram features to gene
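
Unigram, bigram, and trigram features of the kind listed here can be extracted as counts of contiguous token sequences. The sketch below assumes lowercasing and plain whitespace tokenization. Real tweet pipelines usually also normalize hashtags, mentions, and URLs. The resulting count dictionaries could then be vectorized for a MaxEnt or SVM classifier.

```python
from collections import Counter


def ngram_features(text, max_n=3):
    """Count all 1..max_n-grams of a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    feats = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats
```

For "Make in India make in", the bigram "make in" is counted twice, and each of the three distinct trigrams is counted once.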

39

Korshunov, D. S. "Distinctive Features of Association Measures Applied to Chinese Character Bigram Extraction Tasks." NSU Vestnik. Series: Linguistics and Intercultural Communication 20, no. 2 (2022): 64–80. http://dx.doi.org/10.25205/1818-7935-2022-20-2-64-80.

Abstract:

When studying professional discourse, researchers now have the opportunity to create collections of texts and apply linguistic analysis software tools to them. However, for Chinese discourse, the reliability of automatic word segmentation is a problem. One way to extract lexical units from Chinese texts is to apply statistical association measures for collocations to Chinese character bigrams. The purpose of this work is to conduct a comparative analysis of seven different statistical measures for collocations as a means of extracting disyllabic lexical units
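
To illustrate an association measure applied to character bigrams, the sketch below computes pointwise mutual information (PMI), one widely used collocation measure. The excerpt does not list the seven measures compared, so whether PMI is among them is an assumption. So is the choice to drop whitespace before pairing adjacent characters.

```python
import math
from collections import Counter


def character_bigram_pmi(text):
    """PMI for each adjacent character pair in `text` (whitespace removed).

    PMI(x, y) = log2( P(xy) / (P(x) * P(y)) ), where the probabilities are
    relative frequencies of characters and of adjacent character pairs.
    """
    chars = [c for c in text if not c.isspace()]
    unigrams = Counter(chars)
    bigrams = Counter(zip(chars, chars[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    return {
        x + y: math.log2((c / n_bi) / ((unigrams[x] / n_uni) * (unigrams[y] / n_uni)))
        for (x, y), c in bigrams.items()
    }
```

High-PMI pairs are candidate two-character lexical units. Raw PMI is known to overrate rare pairs, which is one reason to compare several association measures rather than rely on one.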

40

Zaitsu, Wataru, and Mingzhe Jin. "Distinguishing ChatGPT(-3.5, -4)-generated and human-written papers through Japanese stylometric analysis." PLOS ONE 18, no. 8 (2023): e0288453. http://dx.doi.org/10.1371/journal.pone.0288453.

Abstract:

In the first half of 2023, text-generative artificial intelligence (AI), including ChatGPT from OpenAI, has attracted considerable attention worldwide. In this study, first, we compared Japanese stylometric features of texts generated by ChatGPT, equipped with GPT-3.5 and GPT-4, and those written by humans. In this work, we performed multi-dimensional scaling (MDS) to confirm the distributions of 216 texts of three classes (72 academic papers written by 36 single authors, 72 texts generated by GPT-3.5, and 72 texts generated by GPT-4 on the basis of the titles of the aforementioned papers) foc

41

K, Sakkaravarthy Iyyappan, and Balasundaram SR. "A Multi Document Summarization of Learning Materials using Bigram Embedding Technique and Integer Linear Programming." International Journal of Membrane Science and Technology 10, no. 2 (2023): 3450–56. http://dx.doi.org/10.15379/ijmst.v10i2.3149.

Abstract:

In the present era of the Internet, teachers and learners are heavily inclined to use e-learning systems for an efficient learning process. Due to the proliferation of educational text content in these e-learning systems, the need to incorporate advanced text analysis tools and techniques is becoming inevitable. Multi Document Summarization (MDS) is a technique for producing concise summaries from a collection of related text documents. The use of MDS in e-learning is particularly promising for providing summaries of learning materials, which help students and teachers to focus

42

Rahman, Parinda, and Ifeoma Adaji. "Health Misinformation Vs. Facts on Social Media: Co-Occurrence Network Analysis in Bangladesh." European Conference on Social Media 11, no. 1 (2024): 359–67. http://dx.doi.org/10.34190/ecsm.11.1.2336.

Abstract:

The increased usage of social media provides a way to disseminate health-related information more quickly. Alternatively, sharing health content on social media poses risks due to unrestricted posting, enabling misinformation to spread. Regional social and cultural contexts influence themes in social media posts, underscoring the importance of understanding content and prevalent misinformation themes. This insight is crucial for tailoring interventions, resource allocation, misinformation detection algorithms, and policy formulation. We conducted word co-occurrence network analysis, creating a
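
Word co-occurrence networks like the one described here are typically built by linking words that appear near each other and weighting each edge by how often that happens. The sketch below assumes whitespace tokenization and a sliding window that covers the next `window` tokens. With a window of 1, it reduces to plain bigram edges. The paper's exact tokenization, window size, and weighting are not given in the excerpt.

```python
from collections import Counter


def cooccurrence_edges(posts, window=2):
    """Weighted, undirected edges between distinct words that occur within
    `window` tokens of each other in the same post."""
    edges = Counter()
    for post in posts:
        tokens = post.lower().split()
        for i, w in enumerate(tokens):
            for v in tokens[i + 1:i + 1 + window]:
                if v != w:  # skip self-loops
                    edges[tuple(sorted((w, v)))] += 1  # sorted pair = undirected edge
    return edges
```

The resulting counts can be loaded into a graph library (for example with networkx's `Graph.add_edge(u, v, weight=w)`) for centrality or community analysis of misinformation versus fact themes.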

43

Kislitsyna, Maria Yurievna. "The text preprocessing influence analyze for author identification problem by bigram method." Keldysh Institute Preprints, no. 67 (2022): 1–18. http://dx.doi.org/10.20948/prepr-2022-67.

Abstract:

Using a sufficiently representative number of authors and texts, a comparative analysis is carried out of the impact of text preprocessing programs on the possibility of identifying authors. The sensitivity of the identification error to the proportion of changes in the source text is investigated. It is shown that the author's originality is preserved after preprocessing almost at the level of the original text.
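
The excerpt does not define the preprint's bigram method, so the sketch below only illustrates the general idea of bigram-based attribution, using a simpler stand-in technique. Each text is reduced to a profile of character-bigram relative frequencies. An unknown text is then assigned to the candidate author whose profile is most similar under cosine similarity. Both the profile and the similarity measure are assumptions made here, not the paper's procedure.

```python
import math
from collections import Counter


def bigram_profile(text):
    """Relative frequencies of adjacent character pairs (case-folded)."""
    text = text.lower()
    counts = Counter(text[i:i + 2] for i in range(len(text) - 1))
    total = sum(counts.values())
    return {bg: c / total for bg, c in counts.items()}


def cosine(p, q):
    """Cosine similarity of two sparse frequency profiles."""
    dot = sum(p[k] * q[k] for k in p.keys() & q.keys())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def attribute(unknown, candidates):
    """Return the candidate author whose sample text is most similar to `unknown`."""
    target = bigram_profile(unknown)
    return max(candidates, key=lambda a: cosine(target, bigram_profile(candidates[a])))
```

To mimic the preprint's question, one could run `attribute` on original and preprocessed versions of the same texts and compare how often the predicted author changes.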

44

Khomsah, Siti, and Agus Sasmito Aribowo. "Text-Preprocessing Model Youtube Comments in Indonesian." Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) 4, no. 4 (2020): 648–54. http://dx.doi.org/10.29207/resti.v4i4.2035.

Abstract:

YouTube is the most widely used platform in Indonesia, reaching 88% of internet users there. The volume of Indonesian-language YouTube comments produced by users has increased massively, and these datasets can be used to examine the polarization of public opinion on government policies. The main challenge in opinion analysis is preprocessing, especially normalizing noise such as stop words and slang words. This research aims to devise several preprocessing models for the YouTube comment dataset and then to assess their effect on the accuracy of sentiment analysis. The types of prepro

45

McGarry, Ken. "Analyzing Social Media Data Using Sentiment Mining and Bigram Analysis for the Recommendation of YouTube Videos." Information 14, no. 7 (2023): 408. http://dx.doi.org/10.3390/info14070408.

Abstract:

In this work we combine sentiment analysis with graph theory to analyze user posts, likes/dislikes on a variety of social media to provide recommendations for YouTube videos. We focus on the topic of climate change/global warming, which has caused much alarm and controversy over recent years. Our intention is to recommend informative YouTube videos to those seeking a balanced viewpoint of this area and the key arguments/issues. To this end we analyze Twitter data; Reddit comments and posts; user comments, view statistics and likes/dislikes of YouTube videos. The combination of sentiment analys

46

Chow, Christine, Yijung Kim, Lauren Bangerter, and Karl De Jonge. "Decoding EHR Data: The Role of Social Support in Pressure Injuries among Homebound Older Adults." Innovation in Aging 8, Supplement_1 (2024): 1300. https://doi.org/10.1093/geroni/igae098.4154.

Abstract:

Homebound older adults are increasingly susceptible to acquiring pressure injuries (PIs) due to a variety of pathological and physiological changes associated with aging. However, the impact of social risk factors on PIs among homebound older adults remains largely understudied. This research was conducted to explore how unstructured data from House Call clinical notes can provide insights into psychosocial factors that influence pressure ulcer conditions, compared to formal diagnoses via ICD-10 codes. Using electronic health records and clinical notes (N= 1,770) from patients enrolled in

47

Y., Sri Navya, Pranathi K., Srija G., and Hifsa Naaz Syeda. "Spam Detection Using Machine Learning: A Logistic Regression Approach." Advancement of Computer Technology and its Applications 8, no. 3 (2025): 1–10. https://doi.org/10.5281/zenodo.15093547.

Abstract:

Spam emails pose a significant challenge to digital communication, leading to security threats and reduced productivity. This paper proposes a machine learning-based strategy for spam detection using Logistic Regression with TF-IDF vectorization. The dataset is prepared by handling missing values and normalizing labels. A TF-IDF model with bigram inclusion is implemented for feature extraction, followed by a balanced Logistic Regression classifier to address class imbalance. Experimental results indicate promising accuracy, demonstrating the effectiveness of the proposed method. Future enha
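
The abstract pairs TF-IDF features "with bigram inclusion" with a class-balanced logistic regression. The standard-library sketch below covers only the feature side. It builds unigram plus bigram terms, weights them by smoothed IDF, and L2-normalizes each vector. Whitespace tokenization and the smoothing formula (the one scikit-learn's `TfidfVectorizer` uses by default) are assumptions, not details from the paper.

```python
import math
from collections import Counter


def ngrams(tokens, n_min=1, n_max=2):
    """All contiguous n-grams of the token list for n_min <= n <= n_max."""
    return [" ".join(tokens[i:i + n]) for n in range(n_min, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def tfidf_unigram_bigram(docs):
    """L2-normalized TF-IDF vectors over unigram + bigram terms.

    Uses the smoothed IDF  idf(t) = ln((1 + N) / (1 + df(t))) + 1.
    """
    tokenized = [ngrams(d.lower().split()) for d in docs]
    n_docs = len(docs)
    df = Counter(term for terms in tokenized for term in set(terms))
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for terms in tokenized:
        vec = {t: count * idf[t] for t, count in Counter(terms).items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        vectors.append({t: v / norm for t, v in vec.items()})
    return vectors, idf
```

In scikit-learn, a comparable pipeline would be `TfidfVectorizer(ngram_range=(1, 2))` followed by `LogisticRegression(class_weight="balanced")`. Note that the vectorizer's default tokenizer differs from plain `split()`, for example by dropping single-character tokens.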

48

Hung, Chihli, and You-Xin Cao. "Sentiment classification of Chinese cosmetic reviews based on integration of collocations and concepts." Electronic Library 38, no. 1 (2019): 155–69. http://dx.doi.org/10.1108/el-04-2019-0093.

Abstract:

Purpose This paper aims to propose a novel approach which integrates collocations and domain concepts for Chinese cosmetic word of mouth (WOM) sentiment classification. Most sentiment analysis works by collecting sentiment scores from each unigram or bigram. However, not every unigram or bigram in a WOM document contains sentiments. Chinese collocations consist of the main sentiments of WOM. This paper reduces the complexity of the document dimensionality and makes an improvement for sentiment classification. Design/methodology/approach This paper builds two contextual lexicons for feature wor

49

Kusumo, Fahri Aimar, Dewi Retno Sari Saputro, and Purnami Widyaningsih. "Sentiment Analysis of Reviews on X Apps on Google Play Store Using Support Vector Machine and N-Gram Feature Selection." BAREKENG: Jurnal Ilmu Matematika dan Terapan 19, no. 2 (2025): 1037–46. https://doi.org/10.30598/barekengvol19iss2pp1037-1046.

Abstract:

Sentiment analysis is an application of text mining that is used to find out opinions from a set of textual data about a particular event or topic. The main function of sentiment analysis is to extract information and find the meaning and opinions of a given user. Sentiment analysis requires classification algorithms, such as Support Vector Machine (SVM). SVM is a frequently used algorithm for text data classification because it can handle high-dimensional data. The concept of SVM is to determine the best hyperplane that serves as a separator of two classes in the input space. Text data with a

50

Pristiwanto, Pristiwanto, Heri Sunandar, and Berto Nadeak. "Analysis and Implementation of PlayFair Chipper Algorithm in Text Data Encoding Process." Jurnal Info Sains : Informatika dan Sains 10, no. 2 (2020): 19–23. http://dx.doi.org/10.54209/infosains.v10i2.33.

Abstract:

This research discusses the implementation of the Playfair Cipher to encode text data. The Playfair Cipher is one of the classic cryptographic algorithms that use symmetric keys. Invented by Sir Charles Wheatstone and named after Baron Lyon Playfair, who promoted it, the algorithm uses a 5x5 key square to encrypt and decrypt. Encryption and decryption are done by grouping the letters into bigrams. Using the 5x5 key square, plaintext (the original text data to be encrypted) is encrypted and ciphertext (the encrypted text data) is decrypted after the letter J has been removed from the plaintext. The key square is generated
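
The abstract outlines the classic Playfair procedure: build a 5x5 key square without the letter J, split the text into letter bigrams, and substitute each bigram according to the positions of its letters in the square. A minimal sketch of the usual textbook rules follows. Folding J into I and using X as the filler letter are common conventions assumed here, not details taken from the paper.

```python
ALPHABET = "ABCDEFGHIKLMNOPQRSTUVWXYZ"  # 25 letters: J is folded into I


def key_square(key):
    """5x5 key square: unique key letters first, then the rest of the alphabet."""
    letters = []
    for ch in key.upper().replace("J", "I") + ALPHABET:
        if ch in ALPHABET and ch not in letters:
            letters.append(ch)
    return [letters[r * 5:(r + 1) * 5] for r in range(5)]


def to_digraphs(plaintext):
    """Split text into letter bigrams, separating doubled letters with X
    and padding an odd final letter with X."""
    letters = [c for c in plaintext.upper().replace("J", "I") if c in ALPHABET]
    pairs, i = [], 0
    while i < len(letters):
        a = letters[i]
        b = letters[i + 1] if i + 1 < len(letters) else "X"
        if a == b:
            pairs.append((a, "X"))  # e.g. "EE" becomes "EX", the second E starts the next pair
            i += 1
        else:
            pairs.append((a, b))
            i += 2
    return pairs


def _transform(pairs, square, shift):
    """Apply the Playfair bigram rules; shift=+1 encrypts, shift=-1 decrypts."""
    pos = {ch: (r, c) for r, row in enumerate(square) for c, ch in enumerate(row)}
    out = []
    for a, b in pairs:
        (ra, ca), (rb, cb) = pos[a], pos[b]
        if ra == rb:    # same row: move along the row, wrapping around
            out += [square[ra][(ca + shift) % 5], square[rb][(cb + shift) % 5]]
        elif ca == cb:  # same column: move along the column, wrapping around
            out += [square[(ra + shift) % 5][ca], square[(rb + shift) % 5][cb]]
        else:           # rectangle: each letter takes the other letter's column
            out += [square[ra][cb], square[rb][ca]]
    return "".join(out)


def encrypt(plaintext, key):
    return _transform(to_digraphs(plaintext), key_square(key), +1)


def decrypt(ciphertext, key):
    letters = [c for c in ciphertext.upper() if c in ALPHABET]
    return _transform(list(zip(letters[0::2], letters[1::2])), key_square(key), -1)
```

With the commonly cited textbook example, `encrypt("Hide the gold in the tree stump", "playfair example")` returns `BMODZBXDNABEKUDMUIXMMOUVIF`. Decrypting that ciphertext gives `HIDETHEGOLDINTHETREXESTUMP`, with the filler X still in place, so a reader must remove padding by judgment.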
