
Journal articles on the topic 'Term Frequency-Inverse Document Frequency Vectors'


Consult the top 50 journal articles for your research on the topic 'Term Frequency-Inverse Document Frequency Vectors.'


1

Mohammed, Mohannad T., and Omar Fitian Rashid. "Document retrieval using term frequency inverse sentence frequency weighting scheme." Indonesian Journal of Electrical Engineering and Computer Science 31, no. 3 (2023): 1478–85. http://dx.doi.org/10.11591/ijeecs.v31.i3.pp1478-1485.

Full text
Abstract:
The need for an efficient method to find the furthermost appropriate document corresponding to a particular search query has become crucial due to the exponential development in the number of papers that are now readily available to us on the web. The vector space model (VSM), a perfect model used in “information retrieval”, represents these words as a vector in space and gives them weights via a popular weighting method known as term frequency inverse document frequency (TF-IDF). In this research, work has been proposed to retrieve the most relevant document focused on representing documents a
APA, Harvard, Vancouver, ISO, and other styles
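For reference, the TF-IDF weighting that this entry and several of the entries below build on is commonly written as follows. This is the standard textbook formulation, not necessarily the exact variant used in the paper above.

```latex
% Standard TF-IDF weight of term t in document d over a corpus D.
% tf(t,d): number of occurrences of t in d; |D|: number of documents;
% df(t): number of documents containing t. Individual papers listed
% here may use smoothed or normalized variants of this formula.
\[
  w_{t,d} \;=\; \mathrm{tf}(t,d)\;\times\;\log\frac{|D|}{\mathrm{df}(t)}
\]
```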
2

Mohammed, Mohannad T., and Omar Fitian Rashid. "Document retrieval using term frequency inverse sentence frequency weighting scheme." Indonesian Journal of Electrical Engineering and Computer Science 31, no. 3 (2023): 1478–85. https://doi.org/10.11591/ijeecs.v31.i3.pp1478-1485.

Full text
Abstract:
The need for an efficient method to find the furthermost appropriate document corresponding to a particular search query has become crucial due to the exponential development in the number of papers that are now readily available to us on the web. The vector space model (VSM), a perfect model used in “information retrieval”, represents these words as a vector in space and gives them weights via a popular weighting method known as term frequency inverse document frequency (TF-IDF). In this research, work has been proposed to retrieve the most relevant document focused on representing
APA, Harvard, Vancouver, ISO, and other styles
3

Widianto, Adi, Eka Pebriyanto, Fitriyanti Fitriyanti, and Marna Marna. "Document Similarity Using Term Frequency-Inverse Document Frequency Representation and Cosine Similarity." Journal of Dinda : Data Science, Information Technology, and Data Analytics 4, no. 2 (2024): 149–53. http://dx.doi.org/10.20895/dinda.v4i2.1589.

Full text
Abstract:
Document similarity is a fundamental task in natural language processing and information retrieval, with applications ranging from plagiarism detection to recommendation systems. In this study, we leverage the term frequency-inverse document frequency (TF-IDF) to represent documents in a high-dimensional vector space, capturing their unique content while mitigating the influence of common terms. Subsequently, we employ the cosine similarity metric to measure the similarity between pairs of documents, which assesses the angle between their respective TF-IDF vectors. To evaluate the effectivenes
APA, Harvard, Vancouver, ISO, and other styles
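A minimal sketch of the TF-IDF-plus-cosine-similarity pipeline described in entry 3, using scikit-learn. The sample documents are made up for illustration and do not come from the paper's dataset.

```python
# Sketch: TF-IDF document vectors + cosine similarity (cf. entry 3).
# The example documents are illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "term frequency inverse document frequency weights rare terms higher",
    "cosine similarity compares the angle between two tf-idf vectors",
    "document similarity supports plagiarism detection and recommendation",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(docs)      # sparse (n_docs x n_terms) matrix

# Pairwise cosine similarity between all TF-IDF document vectors.
sim = cosine_similarity(tfidf)
print(sim.round(3))
```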
4

Hu, Zheng, Hua Dai, Geng Yang, Xun Yi, and Wenjie Sheng. "Semantic-Based Multi-Keyword Ranked Search Schemes over Encrypted Cloud Data." Security and Communication Networks 2022 (April 29, 2022): 1–15. http://dx.doi.org/10.1155/2022/4478618.

Full text
Abstract:
Traditional searchable encryption schemes construct document vectors based on the term frequency-inverse document frequency (TF-IDF) model. Such vectors are not only high-dimensional and sparse but also ignore the semantic information of the documents. The Sentence Bidirectional Encoder Representations from Transformers (SBERT) model can be used to train vectors containing document semantic information to realize semantic-aware multi-keyword search. In this paper, we propose a privacy-preserving searchable encryption scheme based on the SBERT model. The SBERT model is used to train vectors con
APA, Harvard, Vancouver, ISO, and other styles
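For readers unfamiliar with SBERT, the sketch below shows how dense semantic document vectors of the kind used in entry 4 can be produced with the sentence-transformers library. The model checkpoint name is an assumption, and the searchable-encryption layer of the scheme itself is not shown.

```python
# Sketch: dense semantic vectors via SBERT (cf. entry 4).
# "all-MiniLM-L6-v2" is an assumed, commonly used checkpoint; the paper's
# privacy-preserving search machinery is out of scope here.
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

model = SentenceTransformer("all-MiniLM-L6-v2")
docs = ["encrypted cloud storage", "semantic multi-keyword search"]
query = ["keyword search over encrypted documents"]

doc_vecs = model.encode(docs)       # shape: (2, embedding_dim)
query_vec = model.encode(query)     # shape: (1, embedding_dim)
print(cosine_similarity(query_vec, doc_vecs))
```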
5

Priyanka, Mesariya, and Madia Nidhi. "Document Ranking using Customizes Vector Method." International Journal of Trend in Scientific Research and Development 1, no. 4 (2017): 278–83. https://doi.org/10.31142/ijtsrd125.

Full text
Abstract:
An information retrieval (IR) system is about ranking documents using the user's query and retrieving the important records from an extensive dataset. Document ranking is basically searching the relevant documents according to their rank. The vector space model is a traditional and widely applied information retrieval model used to rank web pages based on similarity values. Term weighting schemes are a significant part of an information retrieval system and are used with the query in document ranking. Tf idf ranked calculates the term weig
APA, Harvard, Vancouver, ISO, and other styles
6

A. Nicholas, Danie, and Devi Jayanthila. "Data retrieval in cancer documents using various weighting schemes." i-manager's Journal on Information Technology 12, no. 4 (2023): 28. http://dx.doi.org/10.26634/jit.12.4.20365.

Full text
Abstract:
In the realm of data retrieval, sparse vectors serve as a pivotal representation for both documents and queries, where each element in the vector denotes a word or phrase from a predefined lexicon. In this study, multiple scoring mechanisms are introduced aimed at discerning the significance of specific terms within the context of a document extracted from an extensive textual dataset. Among these techniques, the widely employed method revolves around inverse document frequency (IDF) or Term Frequency-Inverse Document Frequency (TF-IDF), which emphasizes terms unique to a given context. Additi
APA, Harvard, Vancouver, ISO, and other styles
7

Murata, Hiroshi, Takashi Onoda, and Seiji Yamada. "Comparative Analysis of Relevance for SVM-Based Interactive Document Retrieval." Journal of Advanced Computational Intelligence and Intelligent Informatics 17, no. 2 (2013): 149–56. http://dx.doi.org/10.20965/jaciii.2013.p0149.

Full text
Abstract:
Support Vector Machines (SVMs) were applied to interactive document retrieval that uses active learning. In such a retrieval system, the degree of relevance is evaluated by using a signed distance from the optimal hyperplane. It is not clear, however, how the signed distance in SVMs relates to the characteristics of the vector space model. We therefore formulated the degree of relevance by using the signed distance in SVMs and comparatively analyzed it with a conventional Rocchio-based method. Although vector normalization has been utilized as preprocessing for document retrieval, few studies explained why v
APA, Harvard, Vancouver, ISO, and other styles
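The "signed distance from the optimal hyperplane" used as a relevance score in entry 7 corresponds, in scikit-learn terms, to the SVM decision function. A toy sketch on TF-IDF features follows; the documents and relevance labels are illustrative only.

```python
# Sketch: ranking documents by the signed distance to an SVM hyperplane
# (cf. entry 7). Data and labels are toy values for illustration.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

docs = ["relevant retrieval paper", "irrelevant cooking recipe",
        "another retrieval study", "a gardening note"]
labels = [1, 0, 1, 0]          # 1 = relevant feedback, 0 = not relevant

X = TfidfVectorizer().fit_transform(docs)
svm = LinearSVC().fit(X, labels)

# decision_function gives the signed distance (up to a scale factor)
# of each document from the hyperplane; larger means more relevant.
scores = svm.decision_function(X)
ranking = np.argsort(-scores)
print(ranking, scores.round(3))
```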
8

Ni'mah, Ana Tsalitsatun, and Agus Zainal Arifin. "Perbandingan Metode Term Weighting terhadap Hasil Klasifikasi Teks pada Dataset Terjemahan Kitab Hadis." Rekayasa 13, no. 2 (2020): 172–80. http://dx.doi.org/10.21107/rekayasa.v13i2.6412.

Full text
Abstract:
The Hadith is the second source of reference in Islam after the Qur'an. Hadith texts are currently being studied in the technology field so that the values contained in them can be captured with technological knowledge. With research on the Books of Hadith, retrieving information from the Hadith certainly requires representing the text as vectors in order to optimize automatic classification. Hadith classification is needed to group the contents of the Hadith into several categories. There are some categories in a particular Book of Hadith that are the same as in other Books of Hadith. This shows that there are some
APA, Harvard, Vancouver, ISO, and other styles
9

I Wayan Alston Argodi, Eva Yulia Puspaningrum, and Muhammad Muharrom Al Haromainy. "IMPLEMENTASI METODE TF-IDF DAN ALGORITMA NAIVE BAYES DALAM APLIKASI DIABETIC BERBASIS ANDROID." Jurnal Teknik Mesin, Elektro dan Ilmu Komputer 3, no. 2 (2023): 23–33. http://dx.doi.org/10.55606/teknik.v3i2.2009.

Full text
Abstract:
Diabetes is a serious disease that occurs when the pancreas does not produce enough insulin as a hormone that regulates blood sugar in the body. This disease also has an impact on health. This research builds an Android-based application called Diabetic to help classify and provide information related to diabetes and analyze the performance of the Term Frequency Inverse Document Frequency method and the Naive Bayes algorithm. The Term Frequency Inverse Document Frequency method is a technique for calculating the presence of words in a collection of documents by creating document vectors. The N
APA, Harvard, Vancouver, ISO, and other styles
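A minimal sketch of the TF-IDF feature extraction plus Naive Bayes classification described in entry 9, using scikit-learn. The tiny two-class training set stands in for the diabetes-related text used in the paper.

```python
# Sketch: TF-IDF features + Naive Bayes text classification (cf. entry 9).
# The training texts and labels are illustrative, not the paper's dataset.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["frequent thirst and high blood sugar",
         "regular exercise and balanced diet tips",
         "insulin regulates blood sugar levels",
         "general wellness and sleep advice"]
labels = ["diabetes", "general", "diabetes", "general"]

clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
clf.fit(texts, labels)
print(clf.predict(["symptoms of high blood sugar"]))
```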
10

Suhartono, Didit, and Khodirun Khodirun. "System of Information Feedback on Archive Using Term Frequency-Inverse Document Frequency and Vector Space Model Methods." IJIIS: International Journal of Informatics and Information Systems 3, no. 1 (2020): 36–42. http://dx.doi.org/10.47738/ijiis.v3i1.6.

Full text
Abstract:
The archive is one example of a document that is important. Archives are stored systematically with a view to helping and simplifying the storage and retrieval of the archive. Information retrieval is the process of retrieving relevant documents and not retrieving documents that are not relevant. To retrieve the relevant documents, a method is needed. Using the Term Frequency-Inverse Document Frequency and Vector Space Model methods can find relevant documents according to the level of closeness or similarity, in addition to applying the Nazief-Adriani stemming algorithm
APA, Harvard, Vancouver, ISO, and other styles
11

Naeem, Muhammad Zaid, Furqan Rustam, Arif Mehmood, Mui-zzud-din, Imran Ashraf, and Gyu Sang Choi. "Classification of movie reviews using term frequency-inverse document frequency and optimized machine learning algorithms." PeerJ Computer Science 8 (March 15, 2022): e914. http://dx.doi.org/10.7717/peerj-cs.914.

Full text
Abstract:
The Internet Movie Database (IMDb), being one of the popular online databases for movies and personalities, provides a wide range of movie reviews from millions of users. This provides a diverse and large dataset to analyze users’ sentiments about various personalities and movies. Despite being helpful to provide the critique of movies, the reviews on IMDb cannot be read as a whole and requires automated tools to provide insights on the sentiments in such reviews. This study provides the implementation of various machine learning models to measure the polarity of the sentiments presented in us
APA, Harvard, Vancouver, ISO, and other styles
12

Dhumale, Rakesh Bapu, Ajay Kumar Dass, Amit Umbrajkaar, and Pradeep Mane. "Enhancing cyberbullying detection with advanced text preprocessing and machine learning." International Journal of Electrical and Computer Engineering (IJECE) 15, no. 3 (2025): 3139. https://doi.org/10.11591/ijece.v15i3.pp3139-3148.

Full text
Abstract:
The use of social media and the internet has been increasing dramatically in recent years. Cyber-bullying is the term used to describe the misuse of social media by some people who make threatening comments. This has a devastating influence on people's lives, especially those of children and teenagers, and can lead to feelings of depression and suicidal thoughts. The methodology proposed in this paper includes four steps for identifying cyberbullying: preprocessing, feature extraction, classification, and evaluation. The first step is to create a labeled, varied dataset. Word2Vec and term freq
APA, Harvard, Vancouver, ISO, and other styles
13

Labd, Zakia, Said Bahassine, Khalid Housni, Fatima Zahrae Ait Hamou Aadi, and Khalid Benabbes. "Text classification supervised algorithms with term frequency inverse document frequency and global vectors for word representation: a comparative study." International Journal of Electrical and Computer Engineering (IJECE) 14, no. 1 (2024): 589–99. https://doi.org/10.11591/ijece.v14i1.pp589-599.

Full text
Abstract:
Over the course of the previous two decades, there has been a rise in the quantity of text documents stored digitally. The ability to organize and categorize those documents in an automated mechanism is known as text categorization, which is used to classify them into a set of predefined categories so they may be preserved and sorted more efficiently. Identifying appropriate structures, architectures, and methods for text classification presents a challenge for researchers. This is due to the significant impact this concept has on content management, contextual search, opinion mining, product
APA, Harvard, Vancouver, ISO, and other styles
14

Xie, Lixia, Ziying Wang, Yue Wang, Hongyu Yang, and Jiyong Zhang. "New Multi-Keyword Ciphertext Search Method for Sensor Network Cloud Platforms." Sensors 18, no. 9 (2018): 3047. http://dx.doi.org/10.3390/s18093047.

Full text
Abstract:
This paper proposed a multi-keyword ciphertext search, based on an improved-quality hierarchical clustering (MCS-IQHC) method. MCS-IQHC is a novel technique, which is tailored to work with encrypted data. It has improved search accuracy and can self-adapt when performing multi-keyword ciphertext searches on privacy-protected sensor network cloud platforms. Document vectors are first generated by combining the term frequency-inverse document frequency (TF-IDF) weight factor and the vector space model (VSM). The improved quality hierarchical clustering (IQHC) algorithm then generates document ve
APA, Harvard, Vancouver, ISO, and other styles
15

Shehzad, Farhan, Abdur Rehman, Kashif Javed, Khalid A. Alnowibet, Haroon A. Babri, and Hafiz Tayyab Rauf. "Binned Term Count: An Alternative to Term Frequency for Text Categorization." Mathematics 10, no. 21 (2022): 4124. http://dx.doi.org/10.3390/math10214124.

Full text
Abstract:
In text categorization, a well-known problem related to document length is that larger term counts in longer documents cause classification algorithms to become biased. The effect of document length can be eliminated by normalizing term counts, thus reducing the bias towards longer documents. This gives us term frequency (TF), which in conjunction with inverse document frequency (IDF) became the most commonly used term weighting scheme to capture the importance of a term in a document and corpus. However, normalization may cause the term frequency of a term in a related document to become equal or
APA, Harvard, Vancouver, ISO, and other styles
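Entry 15 contrasts raw term counts with length-normalized term frequency; the normalization it refers to is usually written as below. This is the common textbook form, not necessarily the exact notation of the paper.

```latex
% Length-normalized term frequency, as commonly defined:
% tc(t,d) is the raw count of term t in document d, and the denominator
% is the total number of term occurrences in d.
\[
  \mathrm{tf}(t,d) \;=\; \frac{\mathrm{tc}(t,d)}{\sum_{t' \in d} \mathrm{tc}(t',d)}
\]
```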
16

Srikanth, Bethu, and B. Sankara Babu. "DATA MINING AND TEXT MINING: EFFICIENT TEXT CLASSIFICATION USING SVMS FOR LARGE DATASETS." Global Journal of Engineering Science and Research Management 3, no. 8 (2016): 47–56. https://doi.org/10.5281/zenodo.60657.

Full text
Abstract:
The Text mining and Data mining supports different kinds of algorithms for classification of large data sets. The Text Categorization is traditionally done by using the Term Frequency and Inverse Document Frequency. This method does not satisfy elimination of unimportant words in the document. For reducing the error classifying of documents in wrong category, efficient classification algorithms are needed. Support Vector Machines (SVM) is used based on the large margin data sets for classification algorithms that give good generalization, compactness and performance. Support Vector Machines (S
APA, Harvard, Vancouver, ISO, and other styles
17

Sharma, Saurabh, Zohaib Hasan, and Vishal Paranjape. "Applying Naive Bayes Techniques for Accurate Sentiment Analysis in Movie Reviews." International Journal of Innovative Research in Computer and Communication Engineering 10, no. 10 (2023): 8205–12. http://dx.doi.org/10.15680/ijircce.2022.1010019.

Full text
Abstract:
This study examines the effectiveness of Naive Bayes and Logistic Regression classifiers in analyzing the sentiment of movie reviews. Two feature extraction approaches, namely Bag of Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF), are utilized. We employed a dataset comprising 50,000 IMDB reviews that underwent preprocessing techniques such as denoising, stop word removal, and stemming. The reviews were transformed into vectors using Bag-of-Words (BoW) and Term Frequency-Inverse Document Frequency (TFIDF) approaches. Our investigation demonstrates that Logistic Regression s
APA, Harvard, Vancouver, ISO, and other styles
18

S, Nithish Kumar, Sai Subhakar M, and Veeresh K. "Students Query Classification System." International Journal of Recent Technology and Engineering (IJRTE) 9, no. 5 (2021): 191–94. https://doi.org/10.35940/ijrte.E5247.019521.

Full text
Abstract:
A university or educational institute generally receives a bulk of complaints posted by students every day. The issues relate to their academics, their education, or the exam sections, etc. This bulk of complaints received from the students every day makes it difficult for the university to sort them out, classify them, and send them to their respective departments for resolving the issues. In this project, we work on classifying these complaints based on the classes or departments they belong to. By using TF-IDF (term frequency-inv
APA, Harvard, Vancouver, ISO, and other styles
19

Jathe, Narendra M., and Hemant S. Mahalle. "A Comparative Study of Boolean Model and Vector Space Model (Vsm) For Information Aggregation from Various Websites Using Web Content Mining Techniques: A Matlab Approach." International Journal of Advance and Applied Research 5, no. 23 (2024): 392–400. https://doi.org/10.5281/zenodo.13637383.

Full text
Abstract:
The rapid growth of the World Wide Web and the abundance of documents and different forms of information available on it have created the need for good information retrieval techniques. By and large, three classic framework models have been used in the process of retrieving information: Boolean, Vector Space, and Probabilistic. The (standard) Boolean model of information retrieval is a classical information retrieval (IR) model and, at the same time, the first and most-adopted one. The BIR is based on B
APA, Harvard, Vancouver, ISO, and other styles
20

Labd, Zakia, Said Bahassine, Khalid Housni, Fatima Zahrae Ait Hamou Aadi, and Khalid Benabbes. "Text classification supervised algorithms with term frequency inverse document frequency and global vectors for word representation: a comparative study." International Journal of Electrical and Computer Engineering (IJECE) 14, no. 1 (2024): 589. http://dx.doi.org/10.11591/ijece.v14i1.pp589-599.

Full text
Abstract:
Over the course of the previous two decades, there has been a rise in the quantity of text documents stored digitally. The ability to organize and categorize those documents in an automated mechanism is known as text categorization, which is used to classify them into a set of predefined categories so they may be preserved and sorted more efficiently. Identifying appropriate structures, architectures, and methods for text classification presents a challenge for researchers. This is due to the significant impact this concept has on content management, contextual search, opinion mining, product
APA, Harvard, Vancouver, ISO, and other styles
21

Schofield, Matthew, Gulsum Alicioglu, Bo Sun, et al. "COMPARISON OF MALWARE CLASSIFICATION METHODS USING CONVOLUTIONAL NEURAL NETWORK BASED ON API CALL STREAM." International Journal of Network Security & Its Applications (IJNSA) 13, no. 2 (2021): 1–19. https://doi.org/10.5281/zenodo.4674294.

Full text
Abstract:
Malicious software is constantly being developed and improved, so detection and classification of malware is an ever-evolving problem. Since traditional malware detection techniques fail to detect new/unknown malware, machine learning algorithms have been used to overcome this disadvantage. We present a Convolutional Neural Network (CNN) for malware type classification based on the API (Application Program Interface) calls. This research uses a database of 7107 instances of API call streams and 8 different malware types: Adware, Backdoor, Downloader, Dropper, Spyware, Trojan, Virus, Worm. We used
APA, Harvard, Vancouver, ISO, and other styles
22

Ali, Yasser Ibrahim Abdelmonem, Mohammed Abdel Razek, and Nasser El-Sherbeny. "Social cyber-criminal, towards automatic real time recognition of malicious posts on Twitter." Indonesian Journal of Electrical Engineering and Computer Science 25, no. 2 (2022): 1199–207. https://doi.org/10.11591/ijeecs.v25.i2.pp1199-1207.

Full text
Abstract:
Easy access to the internet throughout the world has fully reformed the usage of social communication platforms such as Facebook, Twitter, and LinkedIn, which are becoming a part of our lives. Accordingly, cybercrime has become a vital problem, especially in developing countries. The dissemination of information with no risk of being discovered and fetched leads to an increase in cyber-crime. Meanwhile, the huge amount of data continuously produced from Twitter makes the discovery of cyber-criminals a tough assignment. This research will contribute in determined on the build the comparable vecto
APA, Harvard, Vancouver, ISO, and other styles
23

Angga, Putra, Lastri Widya Astuti, and Mustafa Ramadhan. "Pencarian Materi Kuliah Pada Aplikasi Blended Learning Menggunakan Metode Vector Space Model." Jurnal ULTIMATICS 8, no. 2 (2017): 92–101. http://dx.doi.org/10.31937/ti.v8i2.517.

Full text
Abstract:
Searching through a large amount of material requires the needed materials to be found quickly and accurately, for example by ranking them. Ranking is one branch of the science of information retrieval. One model for document search is the Vector Space Model (VSM). VSM uses a concept from linear algebra, the vector space. Based on the concept that is used, the development of the blended learning application uses the vector space modeling method as an alternative for students in searching for relevant material among the materials needed, reducing the error level in the returned information so that students can achieve their goals quickly. Col
APA, Harvard, Vancouver, ISO, and other styles
24

Fauzi, A., E. B. Setiawan, and Z. K. A. Baizal. "Hoax News Detection on Twitter using Term Frequency Inverse Document Frequency and Support Vector Machine Method." Journal of Physics: Conference Series 1192 (March 2019): 012025. http://dx.doi.org/10.1088/1742-6596/1192/1/012025.

Full text
APA, Harvard, Vancouver, ISO, and other styles
25

Deo, Tula Kanta, Rajesh Keshavrao Deshmukh, and Gajendra Sharma. "Comparative Study among Term Frequency-Inverse Document Frequency and Count Vectorizer towards K Nearest Neighbor and Decision Tree Classifiers for Text Dataset." Nepal Journal of Multidisciplinary Research 7, no. 2 (2024): 1–11. http://dx.doi.org/10.3126/njmr.v7i2.68189.

Full text
Abstract:
Background: Text classification techniques are increasingly important with the exponential growth of textual data on the internet. Term Frequency-Inverse Document Frequency (TF-IDF) and Count Vectorizer (CV) are commonly used methods for feature extraction. TF-IDF assigns weights to terms based on their frequency, while CV simply counts the occurrences of terms. The performance of CV as well as TF-IDF is evaluated and compared with KNN and DT classifiers across text datasets. Methodology: The investigation begins with preprocessing. The feature vectors are created using both TF-IDF and CV. Feature
APA, Harvard, Vancouver, ISO, and other styles
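A brief sketch of the two feature extractors compared in entry 25, with the KNN and decision-tree classifiers swapped in behind either representation. The toy corpus is illustrative only.

```python
# Sketch: Count Vectorizer vs. TF-IDF features feeding KNN / decision-tree
# classifiers (cf. entry 25). Toy data for illustration only.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

texts = ["stock prices fell sharply", "the team won the match",
         "markets rallied on earnings", "a late goal sealed the game"]
labels = ["finance", "sport", "finance", "sport"]

for vec in (CountVectorizer(), TfidfVectorizer()):
    X = vec.fit_transform(texts)
    for clf in (KNeighborsClassifier(n_neighbors=1), DecisionTreeClassifier()):
        clf.fit(X, labels)
        pred = clf.predict(vec.transform(["goal scored in the final minute"]))
        print(type(vec).__name__, type(clf).__name__, pred)
```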
26

Ghag, Kranti Vithal, and Ketan Shah. "Conceptual Sentiment Analysis Model." International Journal of Electrical and Computer Engineering (IJECE) 8, no. 4 (2018): 2358. http://dx.doi.org/10.11591/ijece.v8i4.pp2358-2366.

Full text
Abstract:
Bag-of-words approach is popularly used for Sentiment analysis. It maps the terms in the reviews to term-document vectors and thus disrupts the syntactic structure of sentences in the reviews. Association among the terms or the semantic structure of sentences is also not preserved. This research work focuses on classifying the sentiments by considering the syntactic and semantic structure of the sentences in the review. To improve accuracy, sentiment classifiers based on relative frequency, average frequency and term frequency inverse document frequency were proposed. To handle ter
APA, Harvard, Vancouver, ISO, and other styles
27

Ghag, Kranti Vithal, and Ketan Shah. "Conceptual Sentiment Analysis Model." International Journal of Electrical and Computer Engineering (IJECE) 8, no. 4 (2018): 2358–66. https://doi.org/10.11591/ijece.v8i4.pp2358-2366.

Full text
Abstract:
Bag-of-words approach is popularly used for Sentiment analysis. It maps the terms in the reviews to term-document vectors and thus disrupts the syntactic structure of sentences in the reviews. Association among the terms or the semantic structure of sentences is also not preserved. This research work focuses on classifying the sentiments by considering the syntactic and semantic structure of the sentences in the review. To improve accuracy, sentiment classifiers based on relative frequency, average frequency and term frequency inverse document frequency were proposed. To handle terms with apos
APA, Harvard, Vancouver, ISO, and other styles
28

Pratama, Septya Egho, Wahyudin Darmalaksana, Dian Sa'adillah Maylawati, Hamdan Sugilar, Teddy Mantoro, and Muhammad Ali Ramdhani. "Weighted inverse document frequency and vector space model for hadith search engine." Indonesian Journal of Electrical Engineering and Computer Science 18, no. 2 (2020): 1004. http://dx.doi.org/10.11591/ijeecs.v18.i2.pp1004-1014.

Full text
Abstract:
Hadith is the second source of Islamic law after Qur’an which make many types and references of hadith need to be studied. However, there are not many Muslims know about it and many even have difficulties in studying hadiths. This study aims to build a hadith search engine from reliable source by utilizing Information Retrieval techniques. The structured representation of the text that used is Bag of Word (1-term) with the Weighted Inverse Document Frequency (WIDF) method to calculate the frequency of occurrence of each term before being converted in vector form with the Vector Space Model (VS
APA, Harvard, Vancouver, ISO, and other styles
29

Pratama, Septya Egho, Wahyudin Darmalaksana, Dian Sa'adillah Maylawati, Hamdan Sugilar, Teddy Mantoro, and Muhammad Ali Ramdhani. "Weighted inverse document frequency and vector space model for hadith search engine." Indonesian Journal of Electrical Engineering and Computer Science (IJEECS) 18, no. 2 (2020): 1004–14. https://doi.org/10.11591/ijeecs.v18.i2.pp1004-1014.

Full text
Abstract:
Hadith is the second source of Islamic law after Qur'an which make many types and references of hadith need to be studied. However, there are not many Muslims know about it and many even have difficulties in studying hadiths. This study aims to build a hadith search engine from reliable source by utilizing Information Retrieval techniques. The structured representation of the text that used is Bag of Word (1-term) with the Weighted Inverse Document Frequency (WIDF) method to calculate the frequency of occurrence of each term before being converted in vector form with the Vector Space Mod
APA, Harvard, Vancouver, ISO, and other styles
30

Wawrzyński, Adam, and Julian Szymański. "Study of Statistical Text Representation Methods for Performance Improvement of a Hierarchical Attention Network." Applied Sciences 11, no. 13 (2021): 6113. http://dx.doi.org/10.3390/app11136113.

Full text
Abstract:
To effectively process textual data, many approaches have been proposed to create text representations. The transformation of a text into a form of numbers that can be computed using computers is crucial for further applications in downstream tasks such as document classification, document summarization, and so forth. In our work, we study the quality of text representations using statistical methods and compare them to approaches based on neural networks. We describe in detail nine different algorithms used for text representation and then we evaluate five diverse datasets: BBCSport, BBC, Ohs
APA, Harvard, Vancouver, ISO, and other styles
31

Assaf, Kamel. "Testing Different Log Bases for Vector Model Weighting Technique." International Journal on Natural Language Computing 12, no. 03 (2023): 1–15. http://dx.doi.org/10.5121/ijnlc.2023.12301.

Full text
Abstract:
Information retrieval systems retrieve relevant documents based on a query submitted by the user. The documents are initially indexed and the words in the documents are assigned weights using a weighting technique called TFIDF which is the product of Term Frequency (TF) and Inverse Document Frequency (IDF). TF represents the number of occurrences of a term in a document. IDF measures whether the term is common or rare across all documents. It is computed by dividing the total number of documents in the system by the number of documents containing the term and then computing the logarithm of t
APA, Harvard, Vancouver, ISO, and other styles
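Entry 31 studies the effect of the logarithm base in the IDF factor; for reference, the quantity being varied can be written as follows, with the base b left as the experimental parameter. This is the standard definition, not a claim about the paper's exact notation.

```latex
% IDF of term t with logarithm base b (the parameter varied in entry 31):
% N is the total number of documents, df(t) the number containing t.
% The change-of-base identity relates the different bases tested.
\[
  \mathrm{idf}_b(t) \;=\; \log_b\!\frac{N}{\mathrm{df}(t)},
  \qquad
  \log_b x \;=\; \frac{\ln x}{\ln b}
\]
```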
32

Assaf, Kamel. "Testing Different Log Bases for Vector Model Weighting Technique." International Journal on Natural Language Computing (IJNLC) 12, no. 3 (2023): 1–15. https://doi.org/10.5281/zenodo.8138320.

Full text
Abstract:
Information retrieval systems retrieve relevant documents based on a query submitted by the user. The documents are initially indexed and the words in the documents are assigned weights using a weighting technique called TFIDF which is the product of Term Frequency (TF) and Inverse Document Frequency (IDF). TF represents the number of occurrences of a term in a document. IDF measures whether the term is common or rare across all documents. It is computed by dividing the total number of documents in the system by the number of documents containing the term and then computing the logarithm of t
APA, Harvard, Vancouver, ISO, and other styles
33

Plisiecki, Hubert, and Agnieszka Kwiatkowska. "Discovering Representations of Democracy in Big Data: Purposive Semantic Sample Selection for Qualitative and Mixed-Methods Research." Przegląd Socjologii Jakościowej 20, no. 4 (2024): 18–43. https://doi.org/10.18778/1733-8069.20.4.02.

Full text
Abstract:
The increasing volume of large, multi-thematic text corpora in social sciences presents a challenge in selecting relevant documents for qualitative and mixed-methods research. Traditional sample selection methods require extensive manual coding or prior dataset knowledge, while unsupervised methods can yield inconsistent results with theory-driven coding. To address this, we propose purposive semantic sampling – a Natural Language Processing approach using document-level embeddings created by a weighted average of word vectors with term frequency-inverse document frequency (tf-idf). We demonst
APA, Harvard, Vancouver, ISO, and other styles
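A compact sketch of the document-level embedding described in entry 33, i.e., a TF-IDF-weighted average of word vectors. The word-vector lookup below is a random stand-in dictionary, whereas the paper would use pretrained embeddings.

```python
# Sketch: document embedding as a tf-idf-weighted average of word vectors
# (cf. entry 33). `word_vecs` is a placeholder for pretrained embeddings.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["citizens debate democratic reform", "parliament votes on reform"]
rng = np.random.default_rng(0)

tfidf = TfidfVectorizer()
X = tfidf.fit_transform(docs)                 # (n_docs, n_terms)
vocab = tfidf.get_feature_names_out()

dim = 50                                      # embedding size (illustrative)
word_vecs = {w: rng.normal(size=dim) for w in vocab}   # stand-in embeddings

doc_embeddings = []
for i in range(X.shape[0]):
    row = X.getrow(i).toarray().ravel()
    weights = row / row.sum()                 # normalize the tf-idf weights
    emb = sum(weights[j] * word_vecs[vocab[j]]
              for j in range(len(vocab)) if weights[j] > 0)
    doc_embeddings.append(emb)

print(np.vstack(doc_embeddings).shape)        # (2, 50)
```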
34

Yogish, Deepa, T. N. Manjunath, and Ravindra S. Hegadi. "Analysis of Vector Space Method in Information Retrieval for Smart Answering System." Journal of Computational and Theoretical Nanoscience 17, no. 9 (2020): 4468–72. http://dx.doi.org/10.1166/jctn.2020.9099.

Full text
Abstract:
In the world of the internet, searching plays a vital role in retrieving the relevant answers for user-specific queries. The most promising application of natural language processing and information retrieval systems is the question answering system, which directly provides the accurate answer instead of a set of documents. The main objective of information retrieval is to retrieve relevant documents from a huge volume of data sets underlying the internet using an appropriate model. There are many models proposed for the retrieval process, such as the Boolean, Vector space, and Probabilistic methods. Vector space mod
APA, Harvard, Vancouver, ISO, and other styles
35

Xie, Chunli, Xia Wang, Cheng Qian, and Mengqi Wang. "A Source Code Similarity Based on Siamese Neural Network." Applied Sciences 10, no. 21 (2020): 7519. http://dx.doi.org/10.3390/app10217519.

Full text
Abstract:
Finding similar code snippets is a fundamental task in the field of software engineering. Several approaches have been proposed for this task by using statistical language model which focuses on syntax and structure of codes rather than deep semantic information underlying codes. In this paper, a Siamese Neural Network is proposed that maps codes into continuous space vectors and try to capture their semantic meaning. Firstly, an unsupervised pre-trained method that models code snippets as a weighted series of word vectors. The weights of the series are fitted by the Term Frequency-Inverse Doc
APA, Harvard, Vancouver, ISO, and other styles
36

Vichianchai, Vuttichai, and Sumonta Kasemvilas. "A New Term Frequency with Gaussian Technique for Text Classification and Sentiment Analysis." Journal of ICT Research and Applications 15, no. 2 (2021): 152–68. http://dx.doi.org/10.5614/itbj.ict.res.appl.2021.15.2.4.

Full text
Abstract:
This paper proposes a new term frequency with a Gaussian technique (TF-G) to classify the risk of suicide from Thai clinical notes and to perform sentiment analysis based on Thai customer reviews and English tweets of travelers that use US airline services. This research compared TF-G with term weighting techniques based on Thai text classification methods from previous researches, including the bag-of-words (BoW), term frequency (TF), term frequency-inverse document frequency (TF-IDF), and term frequency-inverse corpus document frequency (TF-ICF) techniques. Suicide risk classification and se
APA, Harvard, Vancouver, ISO, and other styles
37

Edward, Steven. "IMPACT OF TRANSFORMED FEATURES IN AUTOMATED SURVEY CODING." COMPUSOFT: An International Journal of Advanced Computer Technology 03, no. 03 (2014): 609–13. https://doi.org/10.5281/zenodo.14682169.

Full text
Abstract:
Survey coding is a process of transforming respondents' responses or descriptions into a code in the process of data analysis. This is an expensive task, and this is the reason social scientists or other professionals in charge of designing and administering surveys tend to avoid the inclusion of many open-ended questions in their surveys. They tend to rely more on the less expensive multiple-choice questions, which by definition do not require a coding phase. However, multiple-choice questions strictly limit the respondents' possible answers. This study aims at automating the survey co
APA, Harvard, Vancouver, ISO, and other styles
38

Lan, Fei. "Research on Text Similarity Measurement Hybrid Algorithm with Term Semantic Information and TF-IDF Method." Advances in Multimedia 2022 (April 23, 2022): 1–11. http://dx.doi.org/10.1155/2022/7923262.

Full text
Abstract:
TF-IDF (term frequency-inverse document frequency) is one of the traditional text similarity calculation methods based on statistics. Because TF-IDF does not consider the semantic information of words, it cannot accurately reflect the similarity between texts, and semantic information enhanced methods distinguish between text documents poorly because extended vectors with semantic similar terms aggravate the curse of dimensionality. Aiming at this problem, this paper advances a hybrid with the semantic understanding and TF-IDF to calculate the similarity of texts. Based on term similarity weig
APA, Harvard, Vancouver, ISO, and other styles
39

Schofield, Matthew, Gulsum Alicioglu, Bo Sun, et al. "Comparison of Malware Classification Methods using Convolutional Neural Network based on API Call Stream." International Journal of Network Security & Its Applications 13, no. 2 (2021): 1–19. http://dx.doi.org/10.5121/ijnsa.2021.13201.

Full text
Abstract:
Malicious software is constantly being developed and improved, so detection and classification of malware is an ever-evolving problem. Since traditional malware detection techniques fail to detect new/unknown malware, machine learning algorithms have been used to overcome this disadvantage. We present a Convolutional Neural Network (CNN) for malware type classification based on the API (Application Program Interface) calls. This research uses a database of 7107 instances of API call streams and 8 different malware types: Adware, Backdoor, Downloader, Dropper, Spyware, Trojan, Virus, Worm. We used
APA, Harvard, Vancouver, ISO, and other styles
40

Hla, Sann Sint, and Khine Oo Khine. "Comparison of two methods on vector space model for trust in social commerce." TELKOMNIKA (Telecommunication, Computing, Electronics and Control) 19, no. 3 (2021): 809–16. https://doi.org/10.12928/telkomnika.v19i3.18150.

Full text
Abstract:
The study of dealing with searching information in documents within web pages is information retrieval (IR). The user needs to describe information with comments or reviews that consists of a number of words. Discovering weight of an inquiry term is helpful to decide the significance of a question. Estimation of term significance is a basic piece of most information retrieval approaches and it is commonly chosen through term frequency-inverse document frequency (TF-IDF). Also, improved TF-IDF method used to retrieve information in web documents. This paper presents comparison of TF-IDF method
APA, Harvard, Vancouver, ISO, and other styles
41

Aprilio, Pajri, Michael Felix, Putu Surya Nugraha, and Hasanul Fahmi. "Hybrid Feature Combination of TF-IDF and BERT for Enhanced Information Retrieval Accuracy." JISA(Jurnal Informatika dan Sains) 8, no. 1 (2025): 8–15. https://doi.org/10.31326/jisa.v8i1.2179.

Full text
Abstract:
Text representation is a critical component in Natural Language Processing tasks such as information retrieval and text classification. Traditional approaches like Term Frequency-Inverse Document Frequency (TF-IDF) provide a simple and efficient way to represent term importance but lack the ability to capture semantic meaning. On the other hand, deep learning models such as Bidirectional Encoder Representations from Transformers (BERT) produce context-aware embeddings that enhance semantic understanding but may overlook exact term relevance. This study proposes a hybrid approach that combines
APA, Harvard, Vancouver, ISO, and other styles
42

Tamrakar, Sujan, Bal Krishna Bal, and Rajendra Bahadur Thapa. "Aspect Based Sentiment Analysis of Nepali Text Using Support Vector Machine and Naive Bayes." Technical Journal 2, no. 1 (2020): 22–29. http://dx.doi.org/10.3126/tj.v2i1.32824.

Full text
Abstract:
Aspect-based Sentiment Analysis assists in understanding the opinion of the associated entities helping for a better quality of a service or a product. A model is developed to detect the aspect-based sentiment in Nepali text using Machine Learning (ML) classifier algorithms namely Support Vector Machine (SVM) and Naïve Bayes (NB). The system collects Nepali text data from various websites and Part of Speech (POS) tagging is applied to extract the desired features of aspect and sentiment. Manual labeling is done for each sentence to identify the sentiment of the sentence. Term Frequency – Inver
APA, Harvard, Vancouver, ISO, and other styles
43

Pradhan, Ligaj. "Enhancing Rating Prediction by Discovering and Incorporating Hidden User Associations and Behaviors." International Journal of Multimedia Data Engineering and Management 10, no. 1 (2019): 40–59. http://dx.doi.org/10.4018/ijmdem.2019010103.

Full text
Abstract:
Collaborative filtering (CF)-based rating prediction would greatly benefit by incorporating additional user associations and behavioral similarity. This article focuses on infusing such additional side information in three common techniques used for building CF-based systems. First, multi-view clustering is used over neighborhood-based rating predictions. Secondly, additional user behavior knowledge discovered by mining user reviews are infused into non-negative matrix factorization (NMF) techniques. Finally, the article explores how to infuse such additional behavioral knowledge into a Deep N
APA, Harvard, Vancouver, ISO, and other styles
44

Asgari, Meysam, Jeffrey Kaye, and Hiroko Dodge. "LINGUISTIC MEASURES OF SPOKEN UTTERANCES FOR DETECTING MILD COGNITIVE IMPAIRMENT." Innovation in Aging 3, Supplement_1 (2019): S224—S225. http://dx.doi.org/10.1093/geroni/igz038.826.

Full text
Abstract:
Abstract Studies have shown that speech characteristics can aid in early-identification of those with mild cognitive impairment (MCI). We performed a linguistic analysis on spoken utterances of 41 participants (15 MCI, 26 healthy controls) from conversations with a trained interviewer using the Term Frequency-Inverse Document Frequency (TF-IDF) method. Data came from a randomized controlled behavioral clinical trial (ClinicalTrials.gov: NCT01571427) to examine effects of conversation-based cognitive stimulation on cognitive functions among older adults with normal cognition or MCI, which serve
APA, Harvard, Vancouver, ISO, and other styles
45

Ichsan, Taufik, Agra Agra, and Aditia Gerhana Yana. "Vector space model, term frequency-inverse document frequency with linear search, and object-relational mapping Django on hadith data search." Computer Science and Information Technologies 5, no. 3 (2024): 306–14. https://doi.org/10.11591/csit.v5i3.pp306-314.

Full text
Abstract:
For Muslims, the Hadith ranks as the secondary legal authority following the Quran. This research leverages hadith data to streamline the search process within the nine imams' compendium using the vector space model (VSM) approach. The primary objective of this research is to enhance the efficiency and effectiveness of the search process within Hadith collections by implementing pre-filtering techniques. This study aims to demonstrate the potential of linear search and Django object-relational mapping (ORM) filters in reducing search times and improving retrieval performance, thereby fac
APA, Harvard, Vancouver, ISO, and other styles
46

Alshehri, Arwa, and Abdulmohsen Algarni. "TF-TDA: A Novel Supervised Term Weighting Scheme for Sentiment Analysis." Electronics 12, no. 7 (2023): 1632. http://dx.doi.org/10.3390/electronics12071632.

Full text
Abstract:
In text classification tasks, such as sentiment analysis (SA), feature representation and weighting schemes play a crucial role in classification performance. Traditional term weighting schemes depend on the term frequency within the entire document collection; therefore, they are called unsupervised term weighting (UTW) schemes. One of the most popular UTW schemes is term frequency–inverse document frequency (TF-IDF); however, this is not sufficient for SA tasks. Newer weighting schemes have been developed to take advantage of the membership of documents in their categories. These are called
APA, Harvard, Vancouver, ISO, and other styles
47

Elhadad, Mohamed K., Khaled M. Badran, and Gouda I. Salama. "A Novel Approach for Ontology-Based Dimensionality Reduction for Web Text Document Classification." International Journal of Software Innovation 5, no. 4 (2017): 44–58. http://dx.doi.org/10.4018/ijsi.2017100104.

Full text
Abstract:
Dimensionality reduction of feature vector size plays a vital role in enhancing the text processing capabilities; it aims in reducing the size of the feature vector used in the mining tasks (classification, clustering, etc.). This paper proposes an efficient approach to be used in reducing the size of the feature vector for web text document classification process. This approach is based on using WordNet ontology, utilizing the benefit of its hierarchal structure, to eliminate words from the generated feature vector that has no relation with any of WordNet lexical categories; this leads to the
APA, Harvard, Vancouver, ISO, and other styles
48

Shrabanti, Mandal, and Kumar Singh Girish. "LSA Based Text Summarization." International Journal of Recent Technology and Engineering (IJRTE) 9, no. 2 (2020): 150–56. https://doi.org/10.35940/ijrte.B3288.079220.

Full text
Abstract:
In this study we propose an automatic single document text summarization technique using Latent Semantic Analysis (LSA) and diversity constraint in combination. The proposed technique uses the query based sentence ranking. Here we are not considering the concept of IR (Information Retrieval) so we generate the query by using the TF-IDF(Term Frequency-Inverse Document Frequency). For producing the query vector, we identify the terms having the high IDF. We know that LSA utilizes the vectorial semantics to analyze the relationships between documents in a corpus or between sentences within a docu
APA, Harvard, Vancouver, ISO, and other styles
49

Mahmoud, Adnen, and Mounir Zrigui. "Semantic Similarity Analysis for Corpus Development and Paraphrase Detection in Arabic." International Arab Journal of Information Technology 18, no. 1 (2020): 1–7. http://dx.doi.org/10.34028/iajit/18/1/1.

Full text
Abstract:
Paraphrase detection allows determining how original and suspect documents convey the same meaning. It has attracted attention from researchers in many Natural Language Processing (NLP) tasks such as plagiarism detection, question answering, information retrieval, etc., Traditional methods (e.g., Term Frequency-Inverse Document Frequency (TF-IDF), Latent Dirichlet Allocation (LDA), and Latent Semantic Analysis (LSA)) cannot capture efficiently hidden semantic relations when sentences may not contain any common words or the co-occurrence of words is rarely present. Therefore, we proposed a deep
APA, Harvard, Vancouver, ISO, and other styles
50

Nafea, Ahmed A., Mustafa S. Ibrahim, Abdulrahman A. Mukhlif, Mohammed M. AL-Ani, and Nazlia Omar. "An Ensemble Model for Detection of Adverse Drug Reactions." ARO-THE SCIENTIFIC JOURNAL OF KOYA UNIVERSITY 12, no. 1 (2024): 41–47. http://dx.doi.org/10.14500/aro.11403.

Full text
Abstract:
The detection of adverse drug reactions (ADRs) plays a necessary role in comprehending the safety and benefit profiles of medicines. Although spontaneous reporting stays the standard approach for ADR documents, it suffers from significant under reporting rates and limitations in terms of treatment inspection. This study proposes an ensemble model that combines decision trees, support vector machines, random forests, and adaptive boosting (ADA-boost) to improve ADR detection. The experimental evaluation applied the benchmark data set and many preprocessing techniques such as tokenization, stop-
APA, Harvard, Vancouver, ISO, and other styles