Academic literature on the topic 'IDF (Inverse Document Frequency)'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'IDF (Inverse Document Frequency).'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "IDF (Inverse Document Frequency)"

1

Ni'mah, Ana Tsalitsatun, and Agus Zainal Arifin. "Perbandingan Metode Term Weighting terhadap Hasil Klasifikasi Teks pada Dataset Terjemahan Kitab Hadis." Rekayasa 13, no. 2 (2020): 172–80. http://dx.doi.org/10.21107/rekayasa.v13i2.6412.

Abstract:
Hadith is the second source of reference for Islam after the Qur'an. Hadith texts are now studied in the field of technology so that the values they contain can be captured computationally. Retrieving information from the Hadith requires representing the text as vectors in order to optimize automatic classification, which is needed to group the contents of the Hadith into categories. Some categories in a given Book of Hadith also appear in other Books, meaning that certain documents share topics across Books. A term weighting method is therefore needed that can decide which words should receive high or low weights in the Hadith Book space to optimize classification results. This study compares several term weighting methods: Term Frequency Inverse Document Frequency (TF-IDF), Term Frequency Inverse Document Frequency Inverse Class Frequency (TF-IDF-ICF), Term Frequency Inverse Document Frequency Inverse Class Space Density Frequency (TF-IDF-ICSδF), and Term Frequency Inverse Document Frequency Inverse Class Space Density Frequency Inverse Hadith Space Density Frequency (TF-IDF-ICSδF-IHSδF). The weighting schemes are evaluated on the Translation of the 9 Books of Hadith dataset with Naive Bayes and SVM classifiers. The 9 Books of Hadith used are: Sahih Bukhari, Sahih Muslim, Abu Dawud, at-Turmudzi, an-Nasa'i, Ibn Majah, Ahmad, Malik, and Darimi. The experimental results show that classification using the TF-IDF-ICSδF-IHSδF term weighting outperformed the other weightings, achieving a Precision of 90%, Recall of 93%, F1-Score of 92%, and Accuracy of 83%.
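For reference, the baseline these class-aware variants extend is the standard TF-IDF weight; one common formulation is

w_{t,d} = \mathrm{tf}(t,d) \cdot \log\frac{N}{\mathrm{df}(t)}

where tf(t,d) is the frequency of term t in document d, N is the number of documents, and df(t) is the number of documents containing t. The ICF-style factors named above multiply in analogous inverse statistics computed over classes (and, in the last variant, over Hadith spaces) rather than documents.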
2

Widianto, Adi, Eka Pebriyanto, Fitriyanti Fitriyanti, and Marna Marna. "Document Similarity Using Term Frequency-Inverse Document Frequency Representation and Cosine Similarity." Journal of Dinda : Data Science, Information Technology, and Data Analytics 4, no. 2 (2024): 149–53. http://dx.doi.org/10.20895/dinda.v4i2.1589.

Abstract:
Document similarity is a fundamental task in natural language processing and information retrieval, with applications ranging from plagiarism detection to recommendation systems. In this study, we leverage the term frequency-inverse document frequency (TF-IDF) to represent documents in a high-dimensional vector space, capturing their unique content while mitigating the influence of common terms. Subsequently, we employ the cosine similarity metric to measure the similarity between pairs of documents, which assesses the angle between their respective TF-IDF vectors. To evaluate the effectiveness of our approach, we conducted experiments on the Document Similarity Triplets Dataset, a benchmark dataset specifically designed for assessing document similarity techniques. Our experimental results demonstrate a significant performance with an accuracy score of 93.6% using bigram-only representation. However, we observed instances where false predictions occurred due to paired documents having similar terms but differing semantics, revealing a weakness in the TF-IDF approach. To address this limitation, future research could focus on augmenting document representations with semantic features. Incorporating semantic information, such as word embeddings or contextual embeddings, could enhance the model's ability to capture nuanced semantic relationships between documents, thereby improving accuracy in scenarios where term overlap does not adequately signify similarity.
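As a concrete sketch of this pipeline (toy documents and parameter choices are ours, not the authors'; assumes scikit-learn):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical mini-corpus; the paper uses the Document Similarity Triplets Dataset.
docs = [
    "the cat sat on the mat",
    "the cat sat on the sofa",
    "stock prices fell sharply today",
]

# ngram_range=(2, 2) mimics the bigram-only representation reported to work best.
vectorizer = TfidfVectorizer(ngram_range=(2, 2))
tfidf = vectorizer.fit_transform(docs)

# Cosine similarity between all pairs of TF-IDF vectors.
sims = cosine_similarity(tfidf)
print(sims.round(2))  # docs 0 and 1 share bigrams, so they score higher with each other than with doc 2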
3

Yulita, Winda, Meida Cahyo Untoro, Mugi Praseptiawan, Ilham Firman Ashari, Aidil Afriansyah, and Ahmad Naim Bin Che Pee. "Automatic Scoring Using Term Frequency Inverse Document Frequency Document Frequency and Cosine Similarity." Scientific Journal of Informatics 10, no. 2 (2023): 93–104. http://dx.doi.org/10.15294/sji.v10i2.42209.

Abstract:
Purpose: In the learning process, most tests of learning achievement are carried out with short-answer or essay questions. The variety of answers given by students means a teacher has to read them all closely, and the quality of scoring is difficult to guarantee when done manually. In addition, each class is taught by a different teacher, which can lead to unequal grades caused by differences in teacher experience. The purpose of this study is therefore to develop automated assessment of answers: automated short answer scoring is designed to automatically grade and evaluate students' answers based on a set of trained answer documents. Methods: The method used is TF-IDF-DF weighting with similarity and scoring computation. The word weighting used is the Term Frequency-Inverse Document Frequency-Document Frequency (TF-IDF-DF) method. The data consist of 5 questions, each answered by 30 students, with the students' answers graded by teachers/experts to establish the real scores. The study was evaluated by Mean Absolute Error (MAE). Result: The evaluation obtained a Mean Absolute Error (MAE) of 0.123. Value: The word weighting method used, Term Frequency-Inverse Document Frequency-Document Frequency (TF-IDF-DF), is an improvement over the Term Frequency-Inverse Document Frequency (TF-IDF) method; it is applied before calculating the similarity of sentences between teachers and students.
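A rough sketch of this kind of scoring-plus-MAE loop, using plain TF-IDF since the TF-IDF-DF variant is specific to the paper; the answers and scores below are invented:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

# Hypothetical teacher reference answer and student answers
# (the paper uses 5 questions x 30 students).
reference = "photosynthesis converts light energy into chemical energy"
answers = [
    "plants convert light energy into chemical energy by photosynthesis",
    "plants need water",
]
teacher_scores = np.array([0.9, 0.2])  # hypothetical expert scores on a 0-1 scale

vec = TfidfVectorizer()
tfidf = vec.fit_transform([reference] + answers)

# Predicted score: cosine similarity of each answer to the reference answer.
pred = cosine_similarity(tfidf[0], tfidf[1:]).ravel()

mae = np.abs(pred - teacher_scores).mean()  # evaluation metric used in the paper
print(pred.round(2), mae.round(3))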
4

Mohammed, Mohannad T., and Omar Fitian Rashid. "Document retrieval using term frequency inverse sentence frequency weighting scheme." Indonesian Journal of Electrical Engineering and Computer Science 31, no. 3 (2023): 1478–85. https://doi.org/10.11591/ijeecs.v31.i3.pp1478-1485.

Abstract:
The need for an efficient method to find the most appropriate document for a particular search query has become crucial due to the exponential growth in the number of papers readily available on the web. The vector space model (VSM), a standard model in information retrieval, represents words as vectors in space and assigns them weights via a popular weighting method known as term frequency inverse document frequency (TF-IDF). This work proposes retrieving the most relevant documents by representing documents and queries as vectors of average term frequency inverse sentence frequency (TF-ISF) weights instead of vectors of TF-IDF weights; two basic and effective similarity measures, Cosine and Jaccard, were used. Using the MS MARCO dataset, the article analyzes and assesses the retrieval effectiveness of the TF-ISF weighting scheme. The results show that the TF-ISF model with the Cosine similarity measure retrieves more relevant documents. The model was evaluated against the conventional TF-IDF technique and performs significantly better on MS MARCO data (Microsoft-curated data from Bing queries).
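Reading from the standard definitions (the paper's exact formulation may differ), the document-level and sentence-level inverse factors are

\mathrm{idf}(t) = \log\frac{N}{n_t}, \qquad \mathrm{isf}(t) = \log\frac{S}{s_t}

where N is the number of documents and n_t the number containing term t, while S is the total number of sentences and s_t the number of sentences containing t; per the abstract, each document vector then holds the average tf·isf weight of its terms.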
5

Nicholas, Danie A., and Devi Jayanthila. "Data retrieval in cancer documents using various weighting schemes." i-manager's Journal on Information Technology 12, no. 4 (2023): 28. http://dx.doi.org/10.26634/jit.12.4.20365.

Abstract:
In the realm of data retrieval, sparse vectors serve as a pivotal representation for both documents and queries, where each element in the vector denotes a word or phrase from a predefined lexicon. In this study, multiple scoring mechanisms are introduced aimed at discerning the significance of specific terms within the context of a document extracted from an extensive textual dataset. Among these techniques, the widely employed method revolves around inverse document frequency (IDF) or Term Frequency-Inverse Document Frequency (TF-IDF), which emphasizes terms unique to a given context. Additionally, the integration of BM25 complements TF-IDF, sustaining its prevalent usage. However, a notable limitation of these approaches lies in their reliance on near-perfect matches for document retrieval. To address this issue, researchers have devised latent semantic analysis (LSA), wherein documents are densely represented as low-dimensional vectors. Through rigorous testing within a simulated environment, findings indicate a superior level of accuracy compared to preceding methodologies.
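For context, the BM25 function mentioned here is conventionally written as

\mathrm{score}(D,Q) = \sum_{q \in Q} \mathrm{idf}(q) \cdot \frac{f(q,D)\,(k_1+1)}{f(q,D) + k_1\left(1 - b + b\,\frac{|D|}{\mathrm{avgdl}}\right)}

where f(q,D) is the frequency of query term q in document D, |D| is the document length, avgdl is the average document length in the collection, and k_1 and b are tuning parameters (typical defaults are k_1 ≈ 1.2 and b = 0.75).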
6

Christian, Hans, Mikhael Pramodana Agus, and Derwin Suhartono. "Single Document Automatic Text Summarization using Term Frequency-Inverse Document Frequency (TF-IDF)." ComTech: Computer, Mathematics and Engineering Applications 7, no. 4 (2016): 285. http://dx.doi.org/10.21512/comtech.v7i4.3746.

Abstract:
The increasing availability of online information has triggered intensive research in the area of automatic text summarization within Natural Language Processing (NLP). Text summarization reduces the text by removing the less useful information, which helps the reader find the required information quickly. There are many kinds of algorithms that can be used to summarize text; one of them is TF-IDF (Term Frequency-Inverse Document Frequency). This research aimed to produce an automatic text summarizer implemented with the TF-IDF algorithm and to compare it with various online automatic text summarizers. To evaluate the summaries produced by each summarizer, the F-Measure was used as the standard comparison value. The research achieves 67% accuracy on three data samples, which is higher than the other online summarizers.
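A minimal sketch of TF-IDF-based extractive summarization in this spirit; the sentence-scoring rule is one plausible choice, not necessarily the authors' exact algorithm:

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

sentences = [
    "TF-IDF weights terms by how distinctive they are across documents.",
    "It is widely used in information retrieval.",
    "This sketch ranks sentences by their average TF-IDF weight and keeps the top ones.",
]

# Treat each sentence as a "document" so distinctive words get high weights.
vec = TfidfVectorizer()
tfidf = vec.fit_transform(sentences)

# Score each sentence by the mean weight of its non-zero terms.
nonzero = (tfidf != 0).sum(axis=1).A1
scores = tfidf.sum(axis=1).A1 / np.maximum(nonzero, 1)

top = np.argsort(scores)[::-1][:2]                     # indices of the 2 best sentences
summary = " ".join(sentences[i] for i in sorted(top))  # restore original order
print(summary)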
7

Mesariya, Priyanka, and Nidhi Madia. "Document Ranking using Customizes Vector Method." International Journal of Trend in Scientific Research and Development 1, no. 4 (2017): 278–83. https://doi.org/10.31142/ijtsrd125.

Abstract:
An information retrieval (IR) system is about ranking documents against a user's query and retrieving the relevant records from a large dataset. Document ranking essentially means searching for the relevant documents and ordering them by rank. The vector space model is a traditional and widely applied information retrieval model that ranks web pages based on similarity values. Term weighting schemes are central to an information retrieval system and are applied to the query in document ranking. TF-IDF ranking calculates the term weights according to the user's query on the basis of the terms appearing in documents: when a user enters a query, the system finds the documents containing the query terms, counts the terms, computes the TF-IDF weights, and returns the documents ranked by the highest weights.
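A sketch of that query-ranking loop (toy corpus and query; assumes scikit-learn rather than the authors' implementation):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "term weighting schemes for information retrieval",
    "ranking web pages in the vector space model",
    "cooking recipes for the weekend",
]
query = "vector space model ranking"

vec = TfidfVectorizer()
doc_vecs = vec.fit_transform(docs)
query_vec = vec.transform([query])  # weight the query in the same term space

# Rank documents by similarity of their TF-IDF vectors to the query vector.
scores = cosine_similarity(query_vec, doc_vecs).ravel()
for i in scores.argsort()[::-1]:
    print(f"{scores[i]:.2f}  {docs[i]}")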
8

Mohammed, Mohannad T., and Omar Fitian Rashid. "Document retrieval using term frequency inverse sentence frequency weighting scheme." Indonesian Journal of Electrical Engineering and Computer Science 31, no. 3 (2023): 1478. http://dx.doi.org/10.11591/ijeecs.v31.i3.pp1478-1485.

Abstract:
The need for an efficient method to find the most appropriate document for a particular search query has become crucial due to the exponential growth in the number of papers readily available on the web. The vector space model (VSM), a standard model in information retrieval, represents words as vectors in space and assigns them weights via a popular weighting method known as term frequency inverse document frequency (TF-IDF). This work proposes retrieving the most relevant documents by representing documents and queries as vectors of average term frequency inverse sentence frequency (TF-ISF) weights instead of vectors of TF-IDF weights; two basic and effective similarity measures, Cosine and Jaccard, were used. Using the MS MARCO dataset, the article analyzes and assesses the retrieval effectiveness of the TF-ISF weighting scheme. The results show that the TF-ISF model with the Cosine similarity measure retrieves more relevant documents. The model was evaluated against the conventional TF-IDF technique and performs significantly better on MS MARCO data (Microsoft-curated data from Bing queries).
9

Setiawan, Gede Herdian, and I. Made Budi Adnyana. "Improving Helpdesk Chatbot Performance with Term Frequency-Inverse Document Frequency (TF-IDF) and Cosine Similarity Models." Journal of Applied Informatics and Computing 7, no. 2 (2023): 252–57. http://dx.doi.org/10.30871/jaic.v7i2.6527.

Abstract:
Helpdesk chatbots are growing in popularity due to their ability to provide help and answers to user questions quickly and effectively. Chatbot development poses several challenges, including enhancing accuracy in understanding user queries and providing relevant responses while improving problem-solving efficiency. In this research, we aim to enhance the accuracy and efficiency of the Helpdesk Chatbot by implementing the Term Frequency-Inverse Document Frequency (TF-IDF) model and the Cosine Similarity algorithm. The TF-IDF model is a method used to measure the frequency of words in a document and their occurrence in the entire document collection, while the Cosine Similarity algorithm is used to measure the similarity between two documents. After implementing and testing the TF-IDF and Cosine Similarity models in the Helpdesk Chatbot, we achieved a 75% question recognition rate. To increase accuracy and precision, it is necessary to enlarge the knowledge dataset and improve pre-processing, especially in recognizing and correcting inaccurate spelling.
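A sketch of the FAQ-matching core described here; the questions, answers, and the 0.3 fallback threshold are invented for illustration:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

faq = {
    "how do i reset my password": "Open Settings > Account > Reset password.",
    "how do i contact the helpdesk": "Email helpdesk@example.org or call extension 100.",
}
questions = list(faq.keys())

vec = TfidfVectorizer()
q_vecs = vec.fit_transform(questions)

def answer(user_query: str, threshold: float = 0.3) -> str:
    """Return the stored answer whose question is most similar to the query."""
    sims = cosine_similarity(vec.transform([user_query]), q_vecs).ravel()
    best = sims.argmax()
    if sims[best] < threshold:  # unrecognized question -> fallback reply
        return "Sorry, I did not understand. Could you rephrase?"
    return faq[questions[best]]

print(answer("password reset not working"))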
10

Al-Obaydy, Wasseem N. Ibrahem, Hala A. Hashim, Yassen AbdelKhaleq Najm, and Ahmed Adeeb Jalal. "Document classification using term frequency-inverse document frequency and K-means clustering." Indonesian Journal of Electrical Engineering and Computer Science 27, no. 3 (2022): 1517. http://dx.doi.org/10.11591/ijeecs.v27.i3.pp1517-1524.

Abstract:
Increased advancement in a variety of study subjects and information technologies has increased the number of published research articles. However, researchers face difficulties and devote a significant amount of time to locating scientific publications relevant to their domain of expertise. This article presents a document classification approach that clusters the text documents of research articles into expressive groups covering similar scientific fields. The main focus and scope of the target groups were adopted in designing the proposed method; each group includes several topics, and word tokens were separately extracted from the topics related to a single group. The repeated appearance of word tokens in a document affects the document's weight, which is computed using the term frequency-inverse document frequency (TF-IDF) statistic. To perform the categorization, the proposed approach employs each paper's title, abstract, and keywords, as well as the categories' topics. The K-means clustering algorithm is used to classify and cluster the documents into primary categories, with category weights initializing the cluster centers (centroids). Experimental results show that the suggested technique outperforms the k-nearest neighbors algorithm in terms of accuracy in retrieving information.
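A generic TF-IDF + K-means sketch of this pipeline; note it uses default centroid initialization, not the paper's category-weight initialization:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Hypothetical article snippets (title + abstract + keywords concatenated).
docs = [
    "deep learning image classification neural networks",
    "convolutional networks for object detection",
    "database query optimization indexing",
    "transaction processing in relational databases",
]

vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(docs)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)  # e.g. vision papers vs. database papers (label ids are arbitrary)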

Dissertations / Theses on the topic "IDF (Inverse Document Frequency)"

1

Regard, Viktor. "Studying the effectiveness of dynamic analysis for fingerprinting Android malware behavior." Thesis, Linköpings universitet, Databas och informationsteknik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-163090.

Abstract:
Android is the second most targeted operating system among malware authors, and to counter the development of Android malware, more knowledge about malware behavior is needed. There are two main approaches to analyzing Android malware: static and dynamic analysis. In 2017, a study and well-labeled dataset named AMD (Android Malware Dataset), consisting of over 24,000 malware samples, was released. It is divided into 135 varieties based on similar malicious behavior, retrieved through static analysis of the classes.dex file in each malware's APK, with the labeled features determined by manual inspection of three samples per variety. However, static analysis is known to be weak against obfuscation techniques, such as repackaging or dynamic loading, which can be exploited to evade analysis. In this study the second approach is utilized, and all malware in the dataset are analyzed at run-time in order to monitor their dynamic behavior. Analyzing malware at run-time has known weaknesses as well, as it can be evaded through, for instance, anti-emulator techniques. The study therefore aimed to explore the available sandbox environments for dynamic analysis, study the effectiveness of fingerprinting Android malware using one of the tools, and investigate whether the static features from AMD and the dynamic analysis correlate, for instance by attempting to classify the samples based on similar dynamic features and calculating the Pearson correlation coefficient (r) for all combinations of features from AMD and the dynamic analysis. The comparison of tools for dynamic analysis showed a need for development, as the most popular tools were released long ago and share a lack of continuous maintenance. The sandbox environment chosen for this study was DroidBox, for aspects such as ease of use and installation and easy adaptability to large-scale analysis. Based on the dynamic features extracted with DroidBox, it could be shown that Android malware samples are most similar to the varieties to which they belong. The best of the four investigated metrics for classifying samples into varieties turned out to be Cosine Similarity, which achieved an accuracy of 83.6% for the entire dataset. The high accuracy indicated a correlation between the dynamic features and the static features on which the varieties are based. Furthermore, the Pearson correlation coefficient confirmed that the manually extracted features used to describe the varieties and the dynamic features are correlated to some extent, which was partially confirmed by a manual inspection at the end of the study.
2

Zhang, Wenlong. "Forward and Inverse Problems Under Uncertainty." Thesis, Paris Sciences et Lettres (ComUE), 2017. http://www.theses.fr/2017PSLEE024/document.

Abstract:
This thesis contains two different subjects. In the first part, two cases are considered: the thin plate spline smoother model and elliptic boundary equations with uncertain boundary data. In this part, stochastic convergences of the finite element methods are proved for each problem. In the second part, we provide a mathematical analysis of the linearized inverse problem in multifrequency electrical impedance tomography. We present a mathematical and numerical framework for a procedure of imaging the anisotropic electrical conductivity tensor using a novel technique called Diffusion Tensor Magneto-acoustography and propose an optimal control approach for reconstructing the cross-property factor relating the diffusion tensor to the anisotropic electrical conductivity tensor. We prove convergence and Lipschitz-type stability of the algorithm and present numerical examples to illustrate its accuracy. The cell model for electropermeabilization is demonstrated. We study effective parameters in a homogenization model and demonstrate numerically the sensitivity of these effective parameters to the critical microscopic parameters governing electropermeabilization.
3

Ríos, Araya Paula Andrea. "Tag Clouds para investigadores de Ciencias de la Computación." Tesis, Universidad de Chile, 2018. http://repositorio.uchile.cl/handle/2250/168614.

Abstract:
Thesis submitted for the degree of Civil Engineer in Computer Science. Millions of publications by researchers currently exist in the various areas of Computer Science, and they continue to grow every day. On the profile of each researcher on websites such as DBLP or Google Scholar, a list of their publications can be found. However, from this information alone it is difficult to grasp at a glance the topics of interest of each researcher, which may be needed in collaboration between academics or between academics and students. This work aims to make summarized information on the research topics of Computer Science academics easy to access by generating visualizations such as word clouds, or tag clouds, from the keywords and key phrases mentioned in the publications found in online bibliographic repositories such as those mentioned above. The system developed in this thesis is a tool for creating tag clouds for DBLP profiles. The tool retrieves the publications found in the profile, extracts potential keywords, and selects the most relevant keywords according to four ranking models; for each model, a tag cloud variant is created. A website was also built that makes the tool available to any user. The work focuses mainly on investigating learning-to-rank models and comparing their performance on the task of defining the most relevant keywords for a Computer Science researcher. Since there are three different approaches to the ranking task, four learning-to-rank models are used, with at least one per approach: linear regression, RankSVM, LambdaMART, and AdaRank. Evaluations of the tag clouds created by the tool show no absolute preference for one method over the others; preferences vary from person to person, but in most cases the maximum score is assigned to at least one of the generated tag clouds. This may be because the models tend to differ in approach, in some cases selecting more technical keywords and in others more generic ones, so the appreciation of one method over another is affected by individual preferences. From this, the importance of letting users choose between different variants is concluded.
4

Sullivan, Daniel Edward. "Evaluation of Word and Paragraph Embeddings and Analogical Reasoning as an Alternative to Term Frequency-Inverse Document Frequency-based Classification in Support of Biocuration." Diss., Virginia Tech, 2016. http://hdl.handle.net/10919/80572.

Abstract:
This research addresses the problem: can unsupervised learning generate a representation that improves on the commonly used term frequency-inverse document frequency (TF-IDF) representation by capturing semantic relations? The analysis measures the quality of sentence classification using TF-IDF representations and finds a practical upper limit to precision and recall in a biomedical text classification task (F1-score of 0.85). Arguably, one could use ontologies to supplement TF-IDF, but ontologies are sparse in coverage and costly to create. This prompts a related question: can unsupervised learning capture semantic relations at least as well as existing ontologies, and thus supplement them? A shallow neural network implementing the Skip-Gram algorithm is used to generate semantic vectors from a corpus of approximately 2.4 billion words. The ability to capture meaning is assessed by comparing the semantic vectors with MeSH. Results indicate that semantic vectors trained by unsupervised methods capture comparable levels of semantic features in some cases, such as amino acid (92% of similarity represented in MeSH), but perform substantially worse in more expansive topics, such as pathogenic bacteria (37.8% of similarity represented in MeSH). Possible explanations for this difference in performance are proposed, along with a method to combine manually curated ontologies with semantic vector spaces to produce a more comprehensive representation than either alone. Semantic vectors are also used as representations for paragraphs, which, when used for classification, achieve an F1-score of 0.92. The results of the classification and analogical reasoning tasks are promising, but a formal model of semantic vectors, subject to the constraints of known linguistic phenomena, is needed. This research includes initial steps toward such a model, based on a combination of linear algebra and fuzzy set theory subject to the semantic molecularism linguistic model. The research is novel in its analysis of semantic vectors applied to the biomedical domain, its analysis of performance characteristics in biomedical analogical reasoning tasks, its comparison of the semantic relations captured by vectors and MeSH, and its initial development of a formal model of semantic vectors. Ph.D.
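A minimal gensim sketch of the Skip-Gram training step described here (toy corpus; the research trains on roughly 2.4 billion words):

from gensim.models import Word2Vec

# Toy tokenized corpus standing in for the biomedical text.
sentences = [
    ["amino", "acid", "sequence", "of", "the", "protein"],
    ["pathogenic", "bacteria", "cause", "infection"],
    ["protein", "folding", "depends", "on", "amino", "acid", "properties"],
]

# sg=1 selects the Skip-Gram algorithm (sg=0 would be CBOW).
model = Word2Vec(sentences, vector_size=50, window=5, min_count=1, sg=1, epochs=50)

# The resulting semantic vectors can then be compared, e.g. by cosine similarity.
print(model.wv.similarity("amino", "protein"))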
5

Azpiroz, Izar. "Contribution à la Résolution Numérique de Problèmes Inverses de Diffraction Élasto-acoustique." Thesis, Pau, 2018. http://www.theses.fr/2018PAUU3004/document.

Abstract:
The characterization of hidden objects from scattered wave measurements arises in many applications such as geophysical exploration, non-destructive testing, medical imaging, etc. It can be achieved numerically by solving an inverse problem. However, this is a nonlinear and ill-posed problem, and thus a difficult task. A successful reconstruction requires careful selection of very different parameters depending on the data and the chosen numerical optimization method. The main contribution of this thesis is an investigation of the full reconstruction of immersed elastic scatterers from far-field pattern measurements. The sought-after parameters are the boundary, the Lamé coefficients, the density, and the location of the obstacle. First, existence and uniqueness results for a generalized boundary value problem including the direct elasto-acoustic problem are established. The sensitivity of the scattered field with respect to the different parameters describing the solid is analyzed, ending with the characterization of the corresponding partial Fréchet derivatives as solutions to the direct problem with modified right-hand sides. These Fréchet derivatives are computed numerically with the Interior Penalty Discontinuous Galerkin method, and the code is validated by comparison with analytical solutions. Then, two solution methodologies are introduced for solving the inverse problem. Both are based on an iterative regularized Newton-type methodology; the first retrieves the parameters of different nature independently, while the second reconstructs all parameters together. Due to the different behavior of the parameters, sensitivity tests are performed to assess the impact of the parameters on the measurements. We conclude that material parameters have a weaker influence on the measurements than shape parameters, and therefore a successful strategy to retrieve parameters of distinct nature should take into account these different levels of sensitivity. Various experiments at different noise levels and with full or limited aperture data are carried out to retrieve some of the physical properties, e.g. Lamé coefficients with shape parameters; density with shape parameters; and density, shape, and location. This set of tests contributes to a final strategy for full reconstruction under more realistic conditions. In the final part of the thesis, the results are extended to more complex material parameters, in particular anisotropic elastic ones.
6

Alastal, Khalil. "Ecoulements oscillatoires et effets capillaires en milieux poreux partiellement saturés et non saturés : applications en hydrodynamique côtière." Thesis, Toulouse, INPT, 2012. http://www.theses.fr/2012INPT0039/document.

Abstract:
In this thesis, we study hydrodynamic oscillations in porous bodies (unsaturated or partially saturated) due to tidal oscillations of water levels in adjacent open water bodies. The focus is on beach hydrodynamics, but potential applications concern, more generally, time-varying and oscillating water levels in coupled systems involving subsurface/open water interactions (natural and artificial beaches, harbor dykes, earth dams, river banks, estuaries). The tidal forcing of groundwater is represented and modeled (both experimentally and numerically) by quasi-static oscillations of water levels in an open water reservoir connected to the porous medium. Specifically, we focus on vertical water movements forced by an oscillating pressure imposed at the bottom of a soil column. Experimentally, a rotating tide machine is used to achieve this forcing. Overall, we use three types of methods (experimental, numerical, analytical) to study the vertical motion of the groundwater table and the unsaturated flow above it, taking into account the vertical head drop in the saturated zone as well as capillary pressure gradients in the unsaturated zone. Laboratory experiments are conducted on vertical sand columns, with a tide machine to force water table oscillations and porous cup tensiometers to measure both positive pressures and suctions along the column (among other measurement methods). Numerical simulations of oscillatory water flow are implemented with the BIGFLOW 3D code (implicit finite volumes, with conjugate gradients for the matrix solver and modified Picard iterations for the nonlinear problem). In addition, an automatic calibration based on a genetic optimization algorithm is implemented for a given tidal frequency to obtain the hydrodynamic parameters of the experimental soil. Calibrated simulations are then compared to experimental results for other non-calibrated frequencies. Finally, a family of quasi-analytical multi-front solutions is developed for the tidal oscillation problem, as an extension of the Green-Ampt piston flow approximation, leading to nonlinear, non-autonomous systems of ordinary differential equations with initial conditions (dynamical systems). The multi-front solutions are tested by comparing them with a refined finite volume solution of the Richards equation. Multi-front solutions are at least 100 times faster, and the match is quite good even for a loamy soil with strong capillary effects (the number of fronts required is small, no more than N ≈ 20 at most). A large set of multi-front simulations is then produced in order to analyze water table and flux fluctuations for a broad range of forcing frequencies. The results, analyzed in terms of means and amplitudes of hydrodynamic variables, indicate the existence, for each soil, of a characteristic frequency separating low-frequency/high-frequency flow regimes in the porous system.
7

Fan, Fang-Syuan (范芳瑄). "Classified Term Frequency-Inverse Document Frequency technique applied to school regulations." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/6hb936.

Abstract:
Master's thesis, National Central University, In-service Master Program in Computer Science and Information Engineering, 2018 (ROC year 107). This study combines the Term Frequency-Inverse Document Frequency technique with compatibility and applies it to the "Regulations of National Central University and Extensions of Off-campus Regulations," establishing them on a cloud platform for text classification. The Term Frequency-Inverse Document Frequency technique can only present one type of measurement and quantification and cannot present a diverse selection. Therefore, through the combination of compatibility, Cosine Similarity, Hierarchical Clustering, and other techniques, a regulation can produce different results under different compatibility settings. A wide range of choices can be produced through classification, helping users find the related regulations they need. Keywords: text mining, TF-IDF, Cosine Similarity, Hierarchical Clustering.
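A generic sketch of TF-IDF plus hierarchical clustering of short regulation texts, as a rough analogue of the described pipeline; the texts and the distance threshold standing in for the study's compatibility parameter are invented:

from sklearn.feature_extraction.text import TfidfVectorizer
from scipy.cluster.hierarchy import linkage, fcluster

regulations = [
    "rules for student dormitory management",
    "dormitory safety and management guidelines",
    "regulations on faculty research grants",
]

X = TfidfVectorizer().fit_transform(regulations).toarray()

# Average-linkage hierarchical clustering on cosine distances.
Z = linkage(X, method="average", metric="cosine")

# A looser threshold merges more regulations into the same group.
labels = fcluster(Z, t=0.8, criterion="distance")
print(labels)  # e.g. the two dormitory rules land in the same cluster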
8

Lin, Jun-liang (林俊良). "A New Auto Document Category System by Using Google N-gram and Probability based Term Frequency and Inverse Category Frequency." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/20409545542311421955.

Abstract:
Master's thesis, National Kaohsiung First University of Science and Technology, Institute of Information Management, 2011 (ROC year 100). The volume of electronic documents exchanged between companies and organizations is growing fast, and automatic classification is an important issue in information services and knowledge management. Keywords are the smallest units that represent a document; almost every part of document automation processing, such as knowledge mining, automatic filtering, automatic summarization, event tracking, or concept retrieval, has to retrieve keywords from documents first and then proceed with analytical processing. We propose the N-gram Segmentation Algorithm (NSA) in this study to improve on static keyword extraction. The NSA method combines stopwords, stemming, N-gram selection, and the Google N-gram Corpus, and fetches meaningful N-gram keywords. In addition to the keyword extraction method, this research also proposes a new keyword weighting method that uses Google N-gram frequency as a weight for term frequency, enhancing the weighting mechanism for keyword extraction within a particular group. Probability based Term Frequency and Inverse Category Frequency (PTFICF) is used to weight the keywords in documents. Finally, we use an SVM to classify the test documents. The study set up three experiments: Experiment 1 used Classic4 as a balanced dataset, and the resulting F1 value was 96.4%. Experiment 2 used Reuters-21578 as an imbalanced dataset, and the resulting F1 value was 78.7%. Experiment 3 used Google frequency as a weighting method and demonstrated that the higher the Google frequency, the more accurate the classification result. Overall, the proposed methods are more accurate than traditional methods and also reduce training time by 90%.
9

Costa, Joao Mario Goncalves da. "Classificação automática de páginas web usando features visuais." Master's thesis, 2014. http://hdl.handle.net/10316/40401.

Abstract:
Integrated Master's dissertation in Electrical and Computer Engineering presented to the Faculty of Sciences and Technology of the University of Coimbra. The world of the Internet grows every day: a large number of web pages are active at this moment, and more are released daily, making manual web page classification impossible. Several approaches have already been developed in this area, most of which use only the text contained in web pages and ignore their visual content. This work shows that visual content can improve the accuracy of classifiers that use only text. Text features were extracted from web pages using the term frequency-inverse document frequency method, along with two types of visual features: low-level features and local SIFT features. Since the number of SIFT features is extremely high, a dictionary was created using the "Bag-of-Words" method. After extraction, the features were merged using all possible combinations, and the Chi-Square method, which selects the best features of a vector, was also applied. Four different classifiers were used. A multi-label classification was implemented, in which classifiers were given unknown web pages and predicted each page's main topic. A binary classification was also implemented, using only visual features to verify whether a web page is a blog. Good results were obtained, showing that adding visual content to text improves accuracy; the best classification was obtained using only four different categories, achieving 98% accuracy. A web application was later developed where a user can discover the main topic of a web page simply by inserting its URL. It can be accessed at "http://scrat.isr.uc.pt/uniprojection/wpc.html".
10

Zeng, Shih-Fong (曾士峰). "A New Auto Document Category System by Using Microsoft N-gram and Probability based Microsoft N-gram probability and Inverse Category Frequency." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/10808089079398377000.

Abstract:
Master's thesis, National Kaohsiung First University of Science and Technology, Institute of Information Management, 2012 (ROC year 101). With the widespread use of information technology, the amount of networked information keeps increasing and digital data grows ever larger, making automated classification a significant subject. Text classification relies largely on the keywords of documents, which are crucial carriers of thematic significance; classifying unstructured documents requires capturing keywords and then performing automated classification. We propose the Noun Phrase Selecting Algorithm (NPSA) in this study to improve on static keyword extraction. The NPSA method uses the Microsoft N-gram Cloud Service to obtain effective probabilities, so we propose a new keyword weighting method: NPSA captures the noun phrases of a document as keywords and combines them with Probability based Microsoft N-gram probability and Inverse Category Frequency (PMNPICF), and these parameters are then used to weight the keywords. Finally, we use an SVM to classify the text documents. The study contains three experiments. The first used Classic4 as a balanced dataset, with a resulting F1 value of 96.6%. The second used Reuters-21578 as an imbalanced dataset, with a resulting F1 value of 83.3%. The third concerned the number of features, showing that a large number of features does not yield the best classification accuracy. Overall, the proposed method is significantly superior to other approaches.

Books on the topic "IDF (Inverse Document Frequency)"

1

Shi, Feng. Learn About Term Frequency–Inverse Document Frequency in Text Analysis in R With Data From How ISIS Uses Twitter Dataset (2016). SAGE Publications, Ltd., 2019. http://dx.doi.org/10.4135/9781526489012.

2

Shi, Feng. Learn About Term Frequency–Inverse Document Frequency in Text Analysis in Python With Data From How ISIS Uses Twitter Dataset (2016). SAGE Publications, Ltd., 2019. http://dx.doi.org/10.4135/9781526498038.


Book chapters on the topic "IDF (Inverse Document Frequency)"

1

Church, K., and W. Gale. "Inverse Document Frequency (IDF): A Measure of Deviations from Poisson." In Text, Speech and Language Technology. Springer Netherlands, 1999. http://dx.doi.org/10.1007/978-94-017-2390-9_18.

2

Zhang, Jincheng, Thada Jantakoon, and Potsirin Limpinan. "Multiple Novel Algorithms Based on TF-IDF and Inverse Document Frequency, Experimented with Text Data in the Education Field." In Communications in Computer and Information Science. Springer Nature Switzerland, 2025. https://doi.org/10.1007/978-3-031-88042-1_23.

3

Ounis, Iadh. "Inverse Document Frequency." In Encyclopedia of Database Systems. Springer New York, 2016. http://dx.doi.org/10.1007/978-1-4899-7993-3_933-2.

4

Ounis, Iadh. "Inverse Document Frequency." In Encyclopedia of Database Systems. Springer US, 2009. http://dx.doi.org/10.1007/978-0-387-39940-9_933.

5

Ounis, Iadh. "Inverse Document Frequency." In Encyclopedia of Database Systems. Springer New York, 2018. http://dx.doi.org/10.1007/978-1-4614-8265-9_933.

6

Arun, K. P., and Arun Mishra. "Image Based Password Composition Using Inverse Document Frequency." In Communications in Computer and Information Science. Springer Singapore, 2017. http://dx.doi.org/10.1007/978-981-10-6544-6_10.

7

Kumar, Mukesh, and Renu Vig. "Term-Frequency Inverse-Document Frequency Definition Semantic (TIDS) Based Focused Web Crawler." In Communications in Computer and Information Science. Springer Berlin Heidelberg, 2012. http://dx.doi.org/10.1007/978-3-642-29216-3_5.

8

Quan, Do Viet, and Phan Duy Hung. "Application of Customized Term Frequency-Inverse Document Frequency for Vietnamese Document Classification in Place of Lemmatization." In Advances in Intelligent Systems and Computing. Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-68154-8_37.

9

Viswanath, P., J. Rohini, and Y. C. A. Padmanabha Reddy. "Query Performance Prediction Using Joint Inverse Document Frequency of Multiple Terms." In Lecture Notes in Electrical Engineering. Springer Singapore, 2016. http://dx.doi.org/10.1007/978-981-10-1540-3_10.

10

Grycuk, Rafał, Marcin Gabryel, Marcin Korytkowski, and Rafał Scherer. "Content-Based Image Indexing by Data Clustering and Inverse Document Frequency." In Communications in Computer and Information Science. Springer International Publishing, 2014. http://dx.doi.org/10.1007/978-3-319-06932-6_36.


Conference papers on the topic "IDF (Inverse Document Frequency)"

1

Lubis, Hasby Sahendri, Mahyuddin K. M. Nasution, and Amalia Amalia. "Performance of Term Frequency - Inverse Document Frequency and K-Means in Government Service Identification." In 2024 4th International Conference of Science and Information Technology in Smart Administration (ICSINTESA). IEEE, 2024. http://dx.doi.org/10.1109/icsintesa62455.2024.10748106.

2

Wen, Xiaojiao. "Student Grade Prediction and Classification based on Term Frequency-Inverse Document Frequency with Random Forest." In 2024 First International Conference on Software, Systems and Information Technology (SSITCON). IEEE, 2024. https://doi.org/10.1109/ssitcon62437.2024.10796287.

3

Bharattej R, Rana Veer Samara Sihman, Prashanth V, Haideer Alabdeli, Sunaina Sangeet Thottan, and S. Ananthi. "Modified Term Frequency and Inverse Document Frequency with Optimized Deep Learning Algorithm based Fake News Detection." In 2025 International Conference on Intelligent Systems and Computational Networks (ICISCN). IEEE, 2025. https://doi.org/10.1109/iciscn64258.2025.10934578.

4

Liu, Fengjuan. "Japanese Dependency Analysis using Multi-Kernel Support Vector Machine based on Term Frequency and Inverse Document Frequency." In 2024 International Conference on Integrated Intelligence and Communication Systems (ICIICS). IEEE, 2024. https://doi.org/10.1109/iciics63763.2024.10860205.

5

Islavath, Srinivas, and C. Rohith Bhat. "Uniform Resource Locator Phishing in Real Time Scenario Predicted Using Novel Term Frequency-Inverse Document Frequency +N Gram in Comparison with Support Vector Machine Algorithm." In 2024 15th International Conference on Computing Communication and Networking Technologies (ICCCNT). IEEE, 2024. http://dx.doi.org/10.1109/icccnt61001.2024.10725919.

6

Hakim, Ari Aulia, Alva Erwin, Kho I. Eng, Maulahikmah Galinium, and Wahyu Muliady. "Automated document classification for news article in Bahasa Indonesia based on term frequency inverse document frequency (TF-IDF) approach." In 2014 6th International Conference on Information Technology and Electrical Engineering (ICITEE). IEEE, 2014. http://dx.doi.org/10.1109/iciteed.2014.7007894.

7

Costa, José Alfredo, and Nielsen Dantas. "Análise Comparativa de Embeddings Jurídicos aplicados a Algoritmos de Clustering." In Congresso Brasileiro de Inteligência Computacional. SBIC, 2023. http://dx.doi.org/10.21528/cbic2023-181.

Abstract:
Text clustering analysis plays an important role in the organization and comprehension of extensive amounts of textual data. By grouping semantically similar documents into coherent categories, or clusters, it is possible to extract pertinent information and unearth latent patterns embedded within the text. Text clustering enables a deeper understanding of the underlying structure and relationships within textual data, unveiling patterns and thematic trends. This paper aims to evaluate the impact of different text embeddings on the task of clustering Brazilian legal documents. The embeddings were obtained from BERT (Bidirectional Encoder Representations from Transformers) models: Jurisbert, Bert Law, and Irisbert. Term Frequency-Inverse Document Frequency (TF-IDF) was also used as a base representation model for comparisons. Nine different clustering algorithms were tested, including methods such as MB K-means, DBSCAN, and BIRCH. Experiments were conducted on a database of 30,000 documents in Brazilian Portuguese of judicial moves of the Tribunal de Justiça do Rio Grande do Norte. To evaluate the performance of the clustering algorithms, the Normalized Mutual Information (NMI) and Jaccard coefficients were used, and processing times are also reported for the different algorithms. Results favor the "Irisbert" embedding and TF-IDF when considering NMI, and Bert Law and TF-IDF when considering the Jaccard coefficient, although "Irisbert" also produced good scores.
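For the evaluation step, a small sketch of scoring cluster labels against reference classes with scikit-learn (toy labels; the Jaccard computation assumes cluster ids have been aligned with class ids):

from sklearn.metrics import normalized_mutual_info_score, jaccard_score

true_labels = [0, 0, 1, 1, 2, 2]   # hypothetical reference classes of judicial documents
pred_labels = [0, 0, 1, 2, 2, 2]   # hypothetical cluster assignments

# NMI is invariant to label permutations; Jaccard here is averaged over classes.
print(normalized_mutual_info_score(true_labels, pred_labels))
print(jaccard_score(true_labels, pred_labels, average="macro"))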
8

Khunruksa, Sahussawud, and Somkiat Wangsiripitak. "Learning Extended Term Frequency-Inverse Document Frequency (TF-IDF++) for Depression Screening From Sentences in Thai Blog Post." In 2023 8th International Conference on Business and Industrial Research (ICBIR). IEEE, 2023. http://dx.doi.org/10.1109/icbir57571.2023.10147692.

9

Rodrigues da Silva, Mônica, Anita Maria da Rocha Fernandes, and Guilherme Falcão da Silva Campos. "Implementação de Chatbot para Aprimorar a Comunicação com Usuários de Serviços Públicos." In Computer on the Beach. Universidade do Vale do Itajaí, 2021. http://dx.doi.org/10.14210/cotb.v12.p480-482.

Abstract:
This paper describes the study and implementation of a chatbot to help users of public services with their most frequent questions. The chatbot was developed based on the TF-IDF (Term Frequency-Inverse Document Frequency) model, using the Python language and the Django framework. Functions such as registration of questions and answers were implemented using the Java language with a RESTful API and Spring Boot, and the MongoDB database. Finally, to enable the interaction of internal and external users with the system, front ends were built using the TypeScript language and the Angular platform. The system is in the testing and validation phase.
10

Zen, Bita Parga, Irwan Susanto, Khofifah Putriyani, and Sintiya. "Automatic document classification for tempo news articles about covid 19 based on term frequency, inverse document frequency (TF-IDF), and Vector Space Model (VSM)." In THE 8TH INTERNATIONAL CONFERENCE ON TECHNOLOGY AND VOCATIONAL TEACHERS 2022. AIP Publishing, 2024. http://dx.doi.org/10.1063/5.0212036.


Reports on the topic "IDF (Inverse Document Frequency)"

1

Budzich, Jeffrey. PR-685-184506-R03 Monitoring Techniques For Determining Critical Return Period Flood Alert Triggers. Pipeline Research Council International, Inc. (PRCI), 2020. http://dx.doi.org/10.55274/r0011667.

Abstract:
This document summarizes the development of regional rainfall intensity-duration-frequency (IDF) curves for determining critical return period alert triggers, and compares them with pipeline operator-specific QPE monitoring locations for known crossings of potential concern. These IDF curves aid in determining the rainfall depth that has accumulated within a specified period of time. One example of a trigger alert requested by a pipeline operator might be the 100-year return period rainfall at a location upstream of their pipeline crossing in a small watershed.