Journal articles on the topic 'Term Frequency-Inverse Document Frequency (TF-IDF)'

Consult the top 50 journal articles for your research on the topic 'Term Frequency-Inverse Document Frequency (TF-IDF).'


1

Mohammed, Mohannad T., and Omar Fitian Rashid. "Document retrieval using term frequency inverse sentence frequency weighting scheme." Indonesian Journal of Electrical Engineering and Computer Science 31, no. 3 (2023): 1478. http://dx.doi.org/10.11591/ijeecs.v31.i3.pp1478-1485.

Abstract:
The need for an efficient method to find the furthermost appropriate document corresponding to a particular search query has become crucial due to the exponential development in the number of papers that are now readily available to us on the web. The vector space model (VSM) a perfect model used in “information retrieval”, represents these words as a vector in space and gives them weights via a popular weighting method known as term frequency inverse document frequency (TF-IDF). In this research, work has been proposed to retrieve the most relevant document focused on representing documents a
2

Widianto, Adi, Eka Pebriyanto, Fitriyanti Fitriyanti, and Marna Marna. "Document Similarity Using Term Frequency-Inverse Document Frequency Representation and Cosine Similarity." Journal of Dinda : Data Science, Information Technology, and Data Analytics 4, no. 2 (2024): 149–53. http://dx.doi.org/10.20895/dinda.v4i2.1589.

Abstract:
Document similarity is a fundamental task in natural language processing and information retrieval, with applications ranging from plagiarism detection to recommendation systems. In this study, we leverage the term frequency-inverse document frequency (TF-IDF) to represent documents in a high-dimensional vector space, capturing their unique content while mitigating the influence of common terms. Subsequently, we employ the cosine similarity metric to measure the similarity between pairs of documents, which assesses the angle between their respective TF-IDF vectors. To evaluate the effectivenes
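The pipeline described in this abstract (TF-IDF document vectors compared by cosine similarity) can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `tfidf_vectors` and `cosine_similarity`, the length-normalized TF, the unsmoothed natural-log IDF, and the toy corpus are all choices made here for illustration.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per tokenized document.

    Assumed weighting (not taken from the paper): tf = count / doc length,
    idf = ln(N / df), with no smoothing.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector has no direction
    return dot / (norm_a * norm_b)

# Toy corpus (hypothetical): two related sentences and one unrelated one.
docs = [
    "the cat sat on the mat".split(),
    "the cat lay on the rug".split(),
    "stock prices fell sharply today".split(),
]
vecs = tfidf_vectors(docs)
sim_01 = cosine_similarity(vecs[0], vecs[1])  # documents share some terms
sim_02 = cosine_similarity(vecs[0], vecs[2])  # documents share no terms
```

With this weighting, a term that appears in every document gets an IDF of zero and drops out of the comparison, which is the "mitigating the influence of common terms" effect the abstract mentions.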
3

Mohammed, Mohannad T., and Omar Fitian Rashid. "Document retrieval using term frequency inverse sentence frequency weighting scheme." Indonesian Journal of Electrical Engineering and Computer Science 31, no. 3 (2023): 1478–85. https://doi.org/10.11591/ijeecs.v31.i3.pp1478-1485.

Abstract:
The need for an efficient method to find the furthermost appropriate document corresponding to a particular search query has become crucial due to the exponential development in the number of papers that are now readily available to us on the web. The vector space model (VSM) a perfect model used in “information retrieval”, represents these words as a vector in space and gives them weights via a popular weighting method known as term frequency inverse document frequency (TF-IDF). In this research, work has been proposed to retrieve the most relevant document focused on representing
4

Yulita, Winda, Meida Cahyo Untoro, Mugi Praseptiawan, Ilham Firman Ashari, Aidil Afriansyah, and Ahmad Naim Bin Che Pee. "Automatic Scoring Using Term Frequency Inverse Document Frequency Document Frequency and Cosine Similarity." Scientific Journal of Informatics 10, no. 2 (2023): 93–104. http://dx.doi.org/10.15294/sji.v10i2.42209.

Abstract:
Purpose: In the learning process, most of the tests to assess learning achievement have been carried out by providing questions in the form of short answers or essay questions. The variety of answers given by students makes a teacher have to focus on reading them. This scoring process is difficult to guarantee quality if done manually. In addition, each class is taught by a different teacher, which can lead to unequal grades obtained by students due to the influence of differences in teacher experience. Therefore the purpose of this study is to develop an assessment of the answers. Automated s
5

Ni'mah, Ana Tsalitsatun, and Agus Zainal Arifin. "Perbandingan Metode Term Weighting terhadap Hasil Klasifikasi Teks pada Dataset Terjemahan Kitab Hadis." Rekayasa 13, no. 2 (2020): 172–80. http://dx.doi.org/10.21107/rekayasa.v13i2.6412.

Abstract:
Hadith is the second source of reference in Islam after the Qur'an. Hadith texts are currently being studied in the field of technology so that the values they contain can be captured through technological knowledge. In research on Hadith books, retrieving information from Hadith requires representing the text as vectors in order to optimize automatic classification. Hadith classification is needed to group the content of Hadith into several categories. Some categories in a given Hadith book are the same as those in other Hadith books. This shows that there are several
6

Priyanka, Mesariya, and Madia Nidhi. "Document Ranking using Customizes Vector Method." International Journal of Trend in Scientific Research and Development 1, no. 4 (2017): 278–83. https://doi.org/10.31142/ijtsrd125.

Abstract:
Information retrieval IR system is about positioning reports utilizing clients question and get the important records from extensive dataset. Archive positioning is fundamentally looking the pertinent record as per their rank. Document ranking is basically search the relevant document according to their rank. Vector space model is traditional and widely applied information retrieval models to rank the web page based on similarity values. Term weighting schemes are the significant of an information retrieval system and it is query used in document ranking. Tf idf ranked calculates the term weig
7

Christian, Hans, Mikhael Pramodana Agus, and Derwin Suhartono. "Single Document Automatic Text Summarization using Term Frequency-Inverse Document Frequency (TF-IDF)." ComTech: Computer, Mathematics and Engineering Applications 7, no. 4 (2016): 285. http://dx.doi.org/10.21512/comtech.v7i4.3746.

Abstract:
The increasing availability of online information has triggered an intensive research in the area of automatic text summarization within the Natural Language Processing (NLP). Text summarization reduces the text by removing the less useful information which helps the reader to find the required information quickly. There are many kinds of algorithms that can be used to summarize the text. One of them is TF-IDF (TermFrequency-Inverse Document Frequency). This research aimed to produce an automatic text summarizer implemented with TF-IDF algorithm and to compare it with other various online sour
8

Setiawan, Gede Herdian, and I. Made Budi Adnyana. "Improving Helpdesk Chatbot Performance with Term Frequency-Inverse Document Frequency (TF-IDF) and Cosine Similarity Models." Journal of Applied Informatics and Computing 7, no. 2 (2023): 252–57. http://dx.doi.org/10.30871/jaic.v7i2.6527.

Abstract:
Helpdesk chatbots are growing in popularity due to their ability to provide help and answers to user questions quickly and effectively. Chatbot development poses several challenges, including enhancing accuracy in understanding user queries and providing relevant responses while improving problem-solving efficiency. In this research, we aim to enhance the accuracy and efficiency of the Helpdesk Chatbot by implementing the Term Frequency-Inverse Document Frequency (TF-IDF) model and the Cosine Similarity algorithm. The TF-IDF model is a method used to measure the frequency of words in a documen
9

Tamardina, Fadhilla Atansa, Hasbi Yasin, and Dwi Ispriyanti. "ANALISIS SENTIMEN REVIEW APLIKASI CRYPTOCURRENCY MENGGUNAKAN ALGORITMA MAXIMUM ENTROPY DENGAN METODE PEMBOBOTAN TF, TF-IDF DAN BINARY." Jurnal Gaussian 11, no. 1 (2022): 1–10. http://dx.doi.org/10.14710/j.gauss.v11i1.34004.

Abstract:
The ongoing COVID-19 pandemic has caused Indonesia's economic condition to worsen. People affected by wage cuts due to the pandemic have had to find ways to earn passive income. One way to do so is to invest. Cryptocurrency is an application-based investment instrument with high returns. Pintu is the first application to provide a mobile app facility to its users. Released in 2020, the application has already received many reviews from its users. These reviews are needed to
10

Tama, Fauzaan Rakan, and Yuliant Sibaroni. "Fake News (Hoaxes) Detection on Twitter Social Media Content through Convolutional Neural Network (CNN) Method." JINAV: Journal of Information and Visualization 4, no. 1 (2023): 70–78. http://dx.doi.org/10.35877/454ri.jinav1525.

Abstract:
The use of social media is very influential for the community. Users can easily post various activities in the form of text, photos, and videos in social media. Information on social media contains fake news and hoaxes that will have an impact on society. One of the most social media used is Twitter. This study aims to detect fake news found on the Tweets using the Convolutional Neural Network (CNN) method by comparing the weighting features used of the Term Frequency Inverse Document Frequency (TF-IDF) and the Term Frequency-Relevance Frequency (TF-RF). The highest accuracy was obtained in th
11

Hla, Sann Sint, and Khine Oo Khine. "Comparison of two methods on vector space model for trust in social commerce." TELKOMNIKA (Telecommunication, Computing, Electronics and Control) 19, no. 3 (2021): 809–16. https://doi.org/10.12928/telkomnika.v19i3.18150.

Abstract:
The study of dealing with searching information in documents within web pages is information retrieval (IR). The user needs to describe information with comments or reviews that consists of a number of words. Discovering weight of an inquiry term is helpful to decide the significance of a question. Estimation of term significance is a basic piece of most information retrieval approaches and it is commonly chosen through term frequency-inverse document frequency (TF-IDF). Also, improved TF-IDF method used to retrieve information in web documents. This paper presents comparison of TF-IDF method
12

Hendra Suputra, I. Putu Gede, Kiki Dwi Prebiana, and Frisca Olivia Gorianto. "Perbandingan Jenis TF terhadap Hasil Evaluasi Information Retrieval." JELIKU (Jurnal Elektronik Ilmu Komputer Udayana) 8, no. 2 (2021): 207. http://dx.doi.org/10.24843/jlk.2019.v08.i02.p13.

Abstract:
In a retrieval system, one way to measure the similarity between a query and a document is to use Term Frequency-Inverse Document Frequency (TF-IDF). The commonly used TF is the raw term count, although many other TF variants can be combined with IDF. This study combines four TF variants, namely natural TF, normalization/max TF, logarithmic TF, and Boolean TF, with the aim of finding which variant performs better when combined with IDF. The results show that logarithmic TF is the
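The four TF variants named in this abstract can be illustrated with a short sketch. The abstract does not give the paper's formulas, so the definitions below are common textbook forms and should be read as assumptions: natural TF is the raw count, max-normalized TF divides by the largest count in the document, logarithmic TF is `1 + log10(count)`, and Boolean TF is 1 for any term that occurs. The helper names `tf_variants` and `idf` are hypothetical.

```python
import math
from collections import Counter

def tf_variants(tokens):
    """Return {term: {variant_name: value}} for one tokenized document.

    The formulas are assumed textbook definitions, not taken from the paper.
    """
    counts = Counter(tokens)
    if not counts:
        return {}
    max_count = max(counts.values())
    return {
        term: {
            "natural": count,                # raw term count
            "max_norm": count / max_count,   # scaled by the most frequent term
            "log": 1 + math.log10(count),    # dampens large counts
            "boolean": 1,                    # presence only
        }
        for term, count in counts.items()
    }

def idf(term, docs):
    """Unsmoothed inverse document frequency, log10(N / df)."""
    df = sum(1 for doc in docs if term in doc)
    if df == 0:
        return 0.0  # term absent from the corpus: treat its weight as zero
    return math.log10(len(docs) / df)

# Toy data (hypothetical): "a" occurs three times, "b" once.
variants = tf_variants("a a a b".split())
corpus = [["a", "b"], ["a"]]
weight_log_b = variants["b"]["log"] * idf("b", corpus)
```

Any of the four TF values can be multiplied by the same IDF to obtain a TF-IDF weight, which is how the variants become comparable in a retrieval experiment.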
13

Sintia, Sintia, Sarjon Defit, and Gunadi Widi Nurcahyo. "Product Codefication Accuracy With Cosine Similarity And Weighted Term Frequency And Inverse Document Frequency (TF-IDF)." Journal of Applied Engineering and Technological Science (JAETS) 2, no. 2 (2021): 62–69. http://dx.doi.org/10.37385/jaets.v2i2.210.

Abstract:
In the SiPaGa application, the codefication search process is still inaccurate, so OPD often make mistakes in choosing goods codes. So we need Cosine Similarity and TF-IDF methods that can improve the accuracy of the search. Cosine Similarity is a method for calculating similarity by using keywords from the code of goods. Term Frequency and Inverse Document (TFIDF) is a way to give weight to a one-word relationship (term). The purpose of this research is to improve the accuracy of the search for goods codification. Codification of goods processed in this study were 14,417 data sourced from the
14

Shehzad, Farhan, Abdur Rehman, Kashif Javed, Khalid A. Alnowibet, Haroon A. Babri, and Hafiz Tayyab Rauf. "Binned Term Count: An Alternative to Term Frequency for Text Categorization." Mathematics 10, no. 21 (2022): 4124. http://dx.doi.org/10.3390/math10214124.

Abstract:
In text categorization, a well-known problem related to document length is that larger term counts in longer documents cause classification algorithms to become biased. The effect of document length can be eliminated by normalizing term counts, thus reducing the bias towards longer documents. This gives us term frequency (TF), which in conjunction with inverse document frequency (IDF) became the most commonly used term weighting scheme to capture the importance of a term in a document and corpus. However, normalization may cause term frequency of a term in a related document to become equal or
15

Santoti, Jennifer Velensia, Jennifer Jocelyn, and Hafiz Irsyad. "Implementasi Term Frequency - Inverse Document Frequency dan Cosine Similarity untuk Analisis Kemiripan Deskripsi Produk Halal." Jurnal Software Engineering and Computational Intelligence 3, no. 01 (2025): 44–52. https://doi.org/10.36982/jseci.v3i01.5421.

Abstract:
In today's digital era, clarity of product information has become an important factor in supporting consumers' purchasing decisions. This study focuses on implementing feature extraction from product descriptions using TF-IDF (Term Frequency-Inverse Document Frequency) and cosine similarity to predict confusing product descriptions. The methodology comprises several preprocessing stages, including tokenizing, stopword removal, filtering, and removal of null and NaN data, as well as text feature extraction using TF-IDF and cosine similarity
16

Xu, Dong Dong, and Shao Bo Wu. "An Improved TFIDF Algorithm in Text Classification." Applied Mechanics and Materials 651-653 (September 2014): 2258–61. http://dx.doi.org/10.4028/www.scientific.net/amm.651-653.2258.

Abstract:
Term frequency/inverse document frequency (TF-IDF) is widely used in text classification at present, which is borrowed from Information Retrieval. Based on this conventional classical TF-IDF formula, we present a new TF-IDF weight schemes named CTF-IDF. The experiment shows that the improved method is feasible and effective. Furthermore, from the subsequent evaluations using 10-fold cross-validation, we can see the CTF-IDF greatly improves the accuracy of text classification.
17

Ariyanti, Meiga Ayu, Aji Prasetya Wibawa, and Utomo Pujianto. "Metode term frequency - invers document frequency pada mekanisme pencarian judul skripsi." TEKNO 28, no. 2 (2019): 177. http://dx.doi.org/10.17977/um034v28i2p177-190.

Abstract:
The aims of this research and development are (1) to design and build a search mechanism using the TF-IDF method as one of the features of SISINTA, (2) to test the accuracy, precision, and sensitivity of the TF-IDF method, and (3) to test the functionality of the TF-IDF search mechanism. The result is a thesis-title search feature based on term frequency and inverse document frequency (TF-IDF). The feature displays thesis titles relevant to the user's search keywords. Based on the results of
18

Al-Obaydy, Wasseem N. Ibrahem, Hala A. Hashim, Yassen AbdulKhaleq Najm, and Ahmed Adeeb Jalal. "Document classification using term frequency-inverse document frequency and K-means clustering." Indonesian Journal of Electrical Engineering and Computer Science 27, no. 3 (2022): 1517–24. https://doi.org/10.11591/ijeecs.v27.i3.pp1517-1524.

Abstract:
Increased advancement in a variety of study subjects and information technologies, has increased the number of published research articles. However, researchers are facing difficulties and devote a significant time amount in locating scientific research publications relevant to their domain of expertise. In this article, an approach of document classification is presented to cluster the text documents of research articles into expressive groups that encompass a similar scientific field. The main focus and scopes of target groups were adopted in designing the proposed method, each group include
19

Arif, Ridho Lubis, Khairuddin Matyuso Nasution Mahyuddin, Salim Sitompul Opim, and Muisa Zamzami Elviawaty. "The feature extraction for classifying words on social media with the Naïve Bayes algorithm." International Journal of Artificial Intelligence (IJ-AI) 11, no. 3 (2022): 1041–48. https://doi.org/10.11591/ijai.v11.i3.pp1041-1048.

Abstract:
To classify Naïve Bayes classification (NBC), however, it is necessary to have a previous pre-processing and feature extraction. Generally, pre-processing eliminates unnecessary words while feature extraction processes these words. This paper focuses on feature extraction in which calculations and searches are used by applying word2vec while in frequency using term frequency-Inverse document frequency (TF-IDF). The process of classifying words on Twitter with 1734 tweets which are defined as a document to weight the calculation of frequency with TF-IDF with words that often come out in tw
20

A. Nicholas, Danie, and Devi Jayanthila. "Data retrieval in cancer documents using various weighting schemes." i-manager's Journal on Information Technology 12, no. 4 (2023): 28. http://dx.doi.org/10.26634/jit.12.4.20365.

Abstract:
In the realm of data retrieval, sparse vectors serve as a pivotal representation for both documents and queries, where each element in the vector denotes a word or phrase from a predefined lexicon. In this study, multiple scoring mechanisms are introduced aimed at discerning the significance of specific terms within the context of a document extracted from an extensive textual dataset. Among these techniques, the widely employed method revolves around inverse document frequency (IDF) or Term Frequency-Inverse Document Frequency (TF-IDF), which emphasizes terms unique to a given context. Additi
21

Al-Obaydy, Wasseem N. Ibrahem, Hala A. Hashim, Yassen AbdelKhaleq Najm, and Ahmed Adeeb Jalal. "Document classification using term frequency-inverse document frequency and K-means clustering." Indonesian Journal of Electrical Engineering and Computer Science 27, no. 3 (2022): 1517. http://dx.doi.org/10.11591/ijeecs.v27.i3.pp1517-1524.

Abstract:
Increased advancement in a variety of study subjects and information technologies, has increased the number of published research articles. However, researchers are facing difficulties and devote a significant time amount in locating scientific research publications relevant to their domain of expertise. In this article, an approach of document classification is presented to cluster the text documents of research articles into expressive groups that encompass a similar scientific field. The main focus and scopes of target groups were adopted in designing the proposed method, each group include
22

You, Zi-Hung, Ya-Han Hu, Chih-Fong Tsai, and Yen-Ming Kuo. "Integrating Feature and Instance Selection Techniques in Opinion Mining." International Journal of Data Warehousing and Mining 16, no. 3 (2020): 168–82. http://dx.doi.org/10.4018/ijdwm.2020070109.

Abstract:
Opinion mining focuses on extracting polarity information from texts. For textual term representation, different feature selection methods, e.g. term frequency (TF) or term frequency–inverse document frequency (TF–IDF), can yield diverse numbers of text features. In text classification, however, a selected training set may contain noisy documents (or outliers), which can degrade the classification performance. To solve this problem, instance selection can be adopted to filter out unrepresentative training documents. Therefore, this article investigates the opinion mining performance associated
23

Alshuraiqi, Hamza Sulimansallam. "Improved Term Frequency Inverse Document Frequency (TF-IDF) Method for Arabic Text Classification." International Journal of Advanced Trends in Computer Science and Engineering 9, no. 5 (2020): 6939–46. http://dx.doi.org/10.30534/ijatcse/2020/11952020.

24

Riadi, Imam, Sunardi Sunardi, and Panggah Widiandana. "Mobile Forensics for Cyberbullying Detection using Term Frequency - Inverse Document Frequency (TF-IDF)." Jurnal Ilmiah Teknik Elektro Komputer dan Informatika 5, no. 2 (2020): 68. http://dx.doi.org/10.26555/jiteki.v5i2.14510.

25

Vichianchai, Vuttichai, and Sumonta Kasemvilas. "A New Term Frequency with Gaussian Technique for Text Classification and Sentiment Analysis." Journal of ICT Research and Applications 15, no. 2 (2021): 152–68. http://dx.doi.org/10.5614/itbj.ict.res.appl.2021.15.2.4.

Abstract:
This paper proposes a new term frequency with a Gaussian technique (TF-G) to classify the risk of suicide from Thai clinical notes and to perform sentiment analysis based on Thai customer reviews and English tweets of travelers that use US airline services. This research compared TF-G with term weighting techniques based on Thai text classification methods from previous researches, including the bag-of-words (BoW), term frequency (TF), term frequency-inverse document frequency (TF-IDF), and term frequency-inverse corpus document frequency (TF-ICF) techniques. Suicide risk classification and se
26

Rofiqi, Moh Afif, Abd Charis Fauzan, Afivatu Pratama Agustin, and Ahmad Agung Saputra. "Implementasi Term-Frequency Inverse Document Frequency (TF-IDF) Untuk Mencari Relevansi Dokumen Berdasarkan Query." ILKOMNIKA: Journal of Computer Science and Applied Informatics 1, no. 2 (2019): 58–64. http://dx.doi.org/10.28926/ilkomnika.v1i2.18.

Abstract:
The purpose of this study is to determine the relevance among several documents, namely news articles from several sources. The method used is Term Frequency-Inverse Document Frequency because it is relevant to the accuracy of a document. Term Frequency-Inverse Document Frequency is a calculation or weighting of words through tokenization, stopword removal, and stemming, in which the frequency of a word in a given document indicates the importance of that word within the document. Using data from news articles, this method weights the words in a
27

Anna, Fay E. Naïve, and B. Barbosa Jocelyn. "Efficient Accreditation Document Classification Using Naïve Bayes Classifier." Indian Journal of Science and Technology 15, no. 1 (2022): 9–18. https://doi.org/10.17485/IJST/v15i1.1761.

Abstract:
Objectives: To develop a desktop application that automatically classifies a document as to which area of accreditation documents it should belong to. Specifically, it aims to: a) To create a predictive model that addresses document classification tasks. b) To design and develop an application that classifies documents according to document classification. c) To evaluate the performance measures of the automatic document classification. Methods: We introduce an innovative approach for the automatic classification of accreditation docume
28

I Wayan Alston Argodi, Eva Yulia Puspaningrum, and Muhammad Muharrom Al Haromainy. "IMPLEMENTASI METODE TF-IDF DAN ALGORITMA NAIVE BAYES DALAM APLIKASI DIABETIC BERBASIS ANDROID." Jurnal Teknik Mesin, Elektro dan Ilmu Komputer 3, no. 2 (2023): 23–33. http://dx.doi.org/10.55606/teknik.v3i2.2009.

Abstract:
Diabetes is a serious disease that occurs when the pancreas does not produce enough insulin as a hormone that regulates blood sugar in the body. This disease also has an impact on health. This research builds an Android-based application called Diabetic to help classify and provide information related to diabetes and analyze the performance of the Term Frequency Inverse Document Frequency method and the Naive Bayes algorithm. The Term Frequency Inverse Document Frequency method is a technique for calculating the presence of words in a collection of documents by creating document vectors. The N
29

Mujilahwati, Siti. "Kombinasi Algoritma Data Reduksi untuk Optimalisasi Dokumen Cluster." Jurnal Eksplora Informatika 12, no. 2 (2023): 113–19. http://dx.doi.org/10.30864/eksplora.v12i2.819.

Abstract:
Clustering is a grouping process without training (unsupervised learning); one algorithm that can be applied to clustering is K-Means. The algorithm works by computing the nearest distance to a cluster. This study aims to optimize the clustering results for thesis-abstract data using K-Means. The optimization uses a combined model of Latent Semantic Analysis (LSA), Term Frequency-Inverse Document Frequency (TF-IDF), and hashing. As with the handling of text data in
30

Deo, Tula Kanta, Rajesh Keshavrao Deshmukh, and Gajendra Sharma. "Comparative Study among Term Frequency-Inverse Document Frequency and Count Vectorizer towards K Nearest Neighbor and Decision Tree Classifiers for Text Dataset." Nepal Journal of Multidisciplinary Research 7, no. 2 (2024): 1–11. http://dx.doi.org/10.3126/njmr.v7i2.68189.

Abstract:
Background: Text classification techniques are increasingly important with the exponential growth of textual data on the internet. Term Frequency-Inverse Document Frequency (TF-IDF) and Count Vectorizer(CV) are commonly used methods for feature extraction. TF-IDF assigning weights to terms based on their frequency. CV simply counts the occurrences of terms. The performance of CV as well as TF-IDF are evaluated and compared with KNN and DT classifiers across text datasets. Methodology: The investigation begins with preprocessing. The feature vectors are created using both TF-IDF and CV. Feature
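The contrast this abstract draws between a count vectorizer and TF-IDF can be made concrete with a small dense-matrix sketch. It is an illustration only: the helper names, the toy corpus, and the unsmoothed natural-log IDF are assumptions made here, and library implementations typically add smoothing and normalization that this sketch omits.

```python
import math
from collections import Counter

def build_vocab(docs):
    """Sorted list of every distinct term in the corpus."""
    return sorted({term for doc in docs for term in doc})

def count_matrix(docs, vocab):
    """One row per document, one column per vocabulary term (raw counts)."""
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[term] for term in vocab])
    return rows

def tfidf_matrix(docs, vocab):
    """Raw counts rescaled by an assumed idf = ln(N / df), no smoothing."""
    n_docs = len(docs)
    counts = count_matrix(docs, vocab)
    # df[j] is at least 1 because the vocabulary is built from these docs.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    return [
        [row[j] * math.log(n_docs / df[j]) for j in range(len(vocab))]
        for row in counts
    ]

# Toy corpus (hypothetical): "movie" and "plot" occur in both documents.
docs = [
    "good movie good plot".split(),
    "bad movie bad plot".split(),
]
vocab = build_vocab(docs)
counts = count_matrix(docs, vocab)
weights = tfidf_matrix(docs, vocab)
```

In the count matrix the shared words "movie" and "plot" carry nonzero values in both rows, while in the TF-IDF matrix their weight is zero and only the discriminating words "good" and "bad" remain, which is the practical difference a downstream KNN or decision tree classifier would see.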
31

Sharma, Saurabh, Zohaib Hasan, and Vishal Paranjape. "Applying Naive Bayes Techniques for Accurate Sentiment Analysis in Movie Reviews." International Journal of Innovative Research in Computer and Communication Engineering 10, no. 10 (2023): 8205–12. http://dx.doi.org/10.15680/ijircce.2022.1010019.

Abstract:
This study examines the effectiveness of Naive Bayes and Logistic Regression classifiers in analyzing the sentiment of movie reviews. Two feature extraction approaches, namely Bag of Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF), are utilized. We employed a dataset comprising 50,000 IMDB reviews that underwent preprocessing techniques such as denoising, stop word removal, and stemming. The reviews were transformed into vectors using Bag-of-Words (BoW) and Term Frequency-Inverse Document Frequency (TFIDF) approaches. Our investigation demonstrates that Logistic Regression s
32

Intan, Rolly, and Andrew Defeng. "HARD: SUBJECT-BASED SEARCH ENGINE MENGGUNAKAN TF-IDF DAN JACCARD’S COEFFICIENT." Jurnal Teknik Industri 8, no. 1 (2006): 61–72. http://dx.doi.org/10.9744/jti.8.1.61-72.

Abstract:
This paper proposes a hybridized concept of search engine based on subject parameter of High Accuracy Retrieval from Documents (HARD). Tf-Idf and Jaccard's Coefficient are modified and extended to providing the concept. Several illustrative examples are given including their steps of calculations in order to clearly understand the proposed concept and formulas. Abstract in Bahasa Indonesia (translated): This paper introduces a search engine algorithm based on the HARD (High Accuracy Retrieval from Documents) concept by combining the use of the TF-IDF (Term
33

Yunitarini, Rika, Jhon Filius Gultom, and Evy Maya Stefany. "Klasifikasi Jamu Tradisional Madura Menggunakan Metode K-Nearest Neighbors (KNN) dan Term Frequency-Inverse Document Frequency (TF-IDF) Sebagai Representasi Teks." Jurnal Informatika Polinema 11, no. 1 (2024): 99–106. https://doi.org/10.33795/jip.v11i1.6456.

Abstract:
Madurese jamu is a traditional herbal medicine used as an alternative treatment and for body care by both men and women. This study aims to develop an automatic system for classifying Madurese jamu using K-Nearest Neighbors (KNN) modeling supported by the TF-IDF (Term Frequency-Inverse Document Frequency) text representation. K-Nearest Neighbors is a machine learning algorithm used for classification and regression, while TF-IDF (Term Frequency-Inverse Document
34

Sulaksono, Juli, Risky Aswi Ramadhani, and Ratih Kumalasari Niswatin. "Automatic Article Summary with the Term Frequency-Inverse Document Frequency Algorithm for Information on Elderly Health." Journal of Computational and Theoretical Nanoscience 17, no. 2 (2020): 1511–13. http://dx.doi.org/10.1166/jctn.2020.8833.

Abstract:
Elderly is someone whose age ranges from 60–74 years. At that age, one’s health tends to decline. Various programs have been provided by the Indonesian government, such as providing information, giving brochures, and giving announcements on the health service website. But this counselling is not optimal because of the elderly, tend to be lazy to read this because the eyes have started to farsight. So that the health information provided by dina health can be optimal, we try to make a model that is used to summarize an article so that the article is easily understood by the elderly. To summariz
35

Qhabib, Fiqih Ainul, Abd Charis Fauzan, and Harliana Harliana. "Implementasi Algoritma Term Frequency Inverse Document Frequency (TF-IDF) dalam Menganalisis Sentimen Masyarakat Terhadap Covid-19 Varian Omicron." JTIM : Jurnal Teknologi Informasi dan Multimedia 4, no. 4 (2023): 308–18. http://dx.doi.org/10.35746/jtim.v4i4.233.

Abstract:
The latest variant, Omicron, was detected on November 24, 2021. WHO said Omicron was one of the Covid-19 variants that had mutated, with a very fast spread rate. The Government of the Republic of Indonesia has officially banned all foreigners from entering Indonesia, both those who have travelled to and those who come from countries exposed to the Omicron variant. This study uses data that has been processed using the Netlytic online website. Netlytic analyzes text and visualizes public online conversations on social media sites. Text preprocessing has several stages, namely case folding, tokeni
36

Ramadhan, Fikri Alwan, Sampe Hotlan Sitorus, and Tedy Rismawan. "Penerapan Metode Multinomial Naïve Bayes untuk Klasifikasi Judul Berita Clickbait dengan Term Frequency - Inverse Document Frequency." Jurnal Sistem dan Teknologi Informasi (JustIN) 11, no. 1 (2023): 70. http://dx.doi.org/10.26418/justin.v11i1.57452.

Abstract:
Clickbait is a bombastic news headline that gives incomplete information, making readers curious enough to click the news link. Clickbait headlines are sometimes misleading because the article's headline is incomplete. As a result, the conclusion drawn from the headline sometimes does not match the content of the news. Research is therefore needed to classify whether or not a news headline is clickbait. This study uses the Multinomial Naïve Bayes method and TF-IDF (Term Frequency - Inverse Document Freque
37

AlShammari, Ahmad Farhan. "Implementation of Keyword Extraction using Term Frequency-Inverse Document Frequency (TF-IDF) in Python." International Journal of Computer Applications 185, no. 35 (2023): 9–14. http://dx.doi.org/10.5120/ijca2023923137.

38

Sai, Ch Pranay. "Webpage Metadata Extration Using Machine Learning Techiniques." International Journal for Research in Applied Science and Engineering Technology 11, no. 12 (2023): 851–54. http://dx.doi.org/10.22214/ijraset.2023.57449.

Abstract:
This Python script defines a Flask web application enabling users to input a URL. The application fetches the webpage content and utilizes TF-IDF (Term Frequency-Inverse Document Frequency) analysis to extract information like the title, description, and top keywords. The / route renders an HTML template (index.html) for user input, while the /extract route handles a POST request, fetching the webpage content, extracting relevant information using TF-IDF analysis, and rendering the results in another HTML template (result.html). The TF-IDF process involves tokenizing the text, elimin
39

Nugroho, Satyawan Agung, Fitra A. Bachtiar, and Randy Cahya Wihandika. "ASPECT EXTRACTION IN E-COMMERCE USING LATENT DIRICHLET ALLOCATION (LDA) WITH TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF)." Jurnal Ilmiah Kursor 11, no. 2 (2022): 53. http://dx.doi.org/10.21107/kursor.v11i2.247.

Abstract:
Social media is a common thing that people use. Posts or comments found on social media describe someone’s feelings and opinions, so there are bound to be important topics that can be extracted from social media. In the e-commerce field, topics are interesting to know because they can describe people’s opinions towards a product. However, the large number of social media users currently makes the process of finding topics from social media difficult, so computer assistance is needed. One method that can be used is Latent Dirichlet Allocation (LDA). LDA is a good method for extracting to
40

Rohman, Arif Nur, Riska Dwi Handayani, Ryan Dwi Y. P., and Kusrini Kusrini. "DETEKSI EMOSI MEDIA SOSIAL MENGGUNAKAN TERM FREQUENCY- INVERSE DOCUMENT FREQUENCY." CSRID (Computer Science Research and Its Development Journal) 11, no. 3 (2021): 140. http://dx.doi.org/10.22303/csrid.11.3.2019.140-148.

Abstract:
Nowadays, people tend to express their opinions and emotions through social media. The openness of expression on social media blurs a person's personal boundaries. People no longer hesitate to write about their private lives in status updates for others to see. The authors attempt to use social media data for analysis to obtain personality information, including emotion. Before analysis, the data are first preprocessed by removing symbols and icons, normalizing the text, stemming, and removing stopwords, in order to
41

Rosalina, Siti, and Naim Rochmawati. "Membangun Search Engine “Caari” dengan Metode Term Frequency Invers Document Frequency (TF-IDF) dan Rekomendasi Simple Multi Attribute Rating Techique (SMART)." Journal of Informatics and Computer Science (JINACS) 5, no. 03 (2023): 313–23. http://dx.doi.org/10.26740/jinacs.v5n03.p313-323.

Abstract:
A search engine is a website or software built to help users find information on the internet. Not all search engines return relevant results, so users do not always find information quickly and accurately. Because users tend to click results on the first page, ranking the results can improve the relevance of the information users want. Therefore, a search engine called “Caari” was built using the Term Frequency Inverse Document Frequency (TF-IDF) method and recommendations based on the Simple Multi Attribute Rating Technique (SMAR
42

P, Vivekananth, and Sharma Navneet. "Detecting Cyberbullying in Social Media: An NLP-Based Classification Framework." Indian Journal of Science and Technology 18, no. 5 (2025): 380–89. https://doi.org/10.17485/IJST/v18i5.1491.

Abstract:
Objectives: To accomplish better cyberbullying classification accuracy through Modified Term Frequency and Inverse Document Frequency (MTF-IDF) with Machine Learning (ML) technique. The cyberbullying classification is improved by modifying the parameter by hypertuning the concept in MTF-IDF using the Optuna method. Method: To categorize the bullying text, this study discusses the Ensemble model that integrates MTF-IDF and ML methods with sophisticated feature extraction approaches. This research has considered 47692 tweets along with the label o
43

Lan, Fei. "Research on Text Similarity Measurement Hybrid Algorithm with Term Semantic Information and TF-IDF Method." Advances in Multimedia 2022 (April 23, 2022): 1–11. http://dx.doi.org/10.1155/2022/7923262.

Abstract:
TF-IDF (term frequency-inverse document frequency) is one of the traditional text similarity calculation methods based on statistics. Because TF-IDF does not consider the semantic information of words, it cannot accurately reflect the similarity between texts, and semantic information enhanced methods distinguish between text documents poorly because extended vectors with semantic similar terms aggravate the curse of dimensionality. Aiming at this problem, this paper advances a hybrid with the semantic understanding and TF-IDF to calculate the similarity of texts. Based on term similarity weig
44

Harieby, Edo, Hoiriyah Hoiriyah, and Miftahul Walid. "TWITTER TEXT MINING MENGENAI ISU VAKSINASI COVID-19 MENGGUNAKAN METODE TERM FREQUENCY, INVERSE DOCUMENT FREQUENCY (TF-IDF)." JATI (Jurnal Mahasiswa Teknik Informatika) 6, no. 2 (2022): 532–37. http://dx.doi.org/10.36040/jati.v6i2.5129.

Abstract:
The spread of information about the Covid-19 vaccine has attracted public attention. Various issues have emerged regarding whether or not Covid-19 vaccination is halal. Twitter is one of the social media platforms that gives the public space to ask questions and comment on the Covid-19 vaccine through tweets or retweets. Using the TF-IDF method, this study analyzes text (sentiment analysis) from a collection of tweets; the results show that the most frequently occurring words can serve as keywords in the Twitter conversation, indicating that many people agree
45

Alfarizi, Muhammad Ibnu, Lailis Syafaah, and Merinda Lestandy. "Emotional Text Classification Using TF-IDF (Term Frequency-Inverse Document Frequency) And LSTM (Long Short-Term Memory)." JUITA : Jurnal Informatika 10, no. 2 (2022): 225. http://dx.doi.org/10.30595/juita.v10i2.13262.

Abstract:
Humans in carrying out communication activities can express their feelings either verbally or non-verbally. Verbal communication can be in the form of oral or written communication. A person's feelings or emotions can usually be seen by their behavior, tone of voice, and expression. Not everyone can see emotion only through writing, whether in the form of words, sentences, or paragraphs. Therefore, a classification system is needed to help someone determine the emotions contained in a piece of writing. The novelty of this study is a development of previous research using a similar method, name
46

Silalahi, Natalia, and Guidio Leonarde Ginting. "Analisa Sentimen Masyarakat Dalam Penggunaan Vaksin Sinovac Dengan Menerapkan Algoritma Term Frequence – Inverse Document Frequence (TF-IDF) dan Metode Deskripsi." Journal of Information System Research (JOSH) 3, no. 3 (2022): 206–17. http://dx.doi.org/10.47065/josh.v3i3.1441.

Abstract:
Social media is a medium used by Indonesian people to socialize and also as a medium to express their thoughts on something. Communities who support and reject the procurement of the Sinovac vaccine carried out by the Indonesian government based on the responses submitted by the community on Twitter regarding the procurement of the corona virus vaccine, it can be known in general how much community support, rejects or is neutral on the procurement of the Sinovac corona virus vaccine. using RapidMiner 9.0 using the search operator twitter, then processing the data with the Text Mining Algorithm
47

Albeer, Rand Abdulwahid, Huda F. Al-Shahad, Hiba J. Aleqabie, and Noor D. Al-shakarchy. "Automatic summarization of YouTube video transcription text using term frequency-inverse document frequency." Indonesian Journal of Electrical Engineering and Computer Science 26, no. 3 (2022): 1512–19. https://doi.org/10.11591/ijeecs.v26.i3.pp1512-1519.

Abstract:
Automatic summarization is a technique for quickly introducing key information by abbreviating large sections of material. Summarization may apply to text and video with a different method to display the abstract of the subject. Natural language processing is employed in automated text summarization in this research, which applies to YouTube videos by transcribing and applying the summary stages in this study. Based on the number of words and sentences in the text, the method term frequency-inverse document frequency (TF-IDF) was used to extract the important keywords for the summary. Some vide
48

Assaf, Kamel. "Testing Different Log Bases for Vector Model Weighting Technique." International Journal on Natural Language Computing 12, no. 03 (2023): 1–15. http://dx.doi.org/10.5121/ijnlc.2023.12301.

Abstract:
Information retrieval systems retrieve relevant documents based on a query submitted by the user. The documents are initially indexed and the words in the documents are assigned weights using a weighting technique called TFIDF, which is the product of Term Frequency (TF) and Inverse Document Frequency (IDF). TF represents the number of occurrences of a term in a document. IDF measures whether the term is common or rare across all documents. It is computed by dividing the total number of documents in the system by the number of documents containing the term and then computing the logarithm of t
49

Kamel, Assaf. "Testing Different Log Bases for Vector Model Weighting Technique." International Journal on Natural Language Computing (IJNLC) 12, no. 3 (2023): 15. https://doi.org/10.5281/zenodo.8138320.

Abstract:
Information retrieval systems retrieve relevant documents based on a query submitted by the user. The documents are initially indexed and the words in the documents are assigned weights using a weighting technique called TFIDF, which is the product of Term Frequency (TF) and Inverse Document Frequency (IDF). TF represents the number of occurrences of a term in a document. IDF measures whether the term is common or rare across all documents. It is computed by dividing the total number of documents in the system by the number of documents containing the term and then computing the logarithm of t
50

Chiraratanasopha, Boonthida, Thanaruk Theeramunkong, and Salin Boonbrahm. "Hierarchical text classification using Relative Inverse Document Frequency." ECTI Transactions on Computer and Information Technology (ECTI-CIT) 15, no. 2 (2021): 166–76. http://dx.doi.org/10.37936/ecti-cit.2021152.240515.

Abstract:
Automatic hierarchical text classification has been a challenging and much-needed task, given the growth of hierarchical taxonomies from the boom in knowledge organization. The hierarchical structure identifies the dependence relationships between different categories, in which generalized and specific concepts can overlap within the tree. This paper presents the use of the frequency of term occurrence in related categories within the hierarchical tree to help in document classification. The four extended term weighting of Relative Inverse Document Frequency (IDFr) including its locat