Journal articles on the topic 'Tf-idf vectors'

1

Asgari, Meysam, Jeffrey Kaye, and Hiroko Dodge. "LINGUISTIC MEASURES OF SPOKEN UTTERANCES FOR DETECTING MILD COGNITIVE IMPAIRMENT." Innovation in Aging 3, Supplement_1 (2019): S224–S225. http://dx.doi.org/10.1093/geroni/igz038.826.

Abstract:
Studies have shown that speech characteristics can aid in the early identification of those with mild cognitive impairment (MCI). We performed a linguistic analysis on spoken utterances of 41 participants (15 MCI, 26 healthy controls) from conversations with a trained interviewer using the Term Frequency-Inverse Document Frequency (TF-IDF) method. Data came from a randomized controlled behavioral clinical trial (ClinicalTrials.gov: NCT01571427) to examine effects of conversation-based cognitive stimulation on cognitive functions among older adults with normal cognition or MCI, which serve
2

Bounabi, Mariem, Karim Elmoutaouakil, and Khalid Satori. "A new neutrosophic TF-IDF term weighting for text mining tasks: text classification use case." International Journal of Web Information Systems 17, no. 3 (2021): 229–49. http://dx.doi.org/10.1108/ijwis-11-2020-0067.

Abstract:
Purpose: This paper aims to present a new term weighting approach for text classification as a text mining task. The original method, neutrosophic term frequency-inverse document frequency (NTF-IDF), is an extended version of the popular fuzzy TF-IDF (FTF-IDF) and uses neutrosophic reasoning to analyze and generate weights for terms in natural languages. The paper also proposes a comparative study between the popular FTF-IDF and NTF-IDF and their impacts on different machine learning (ML) classifiers for document categorization goals. Design/methodology/approach: After preprocessing textual dat
3

Ni'mah, Ana Tsalitsatun, and Agus Zainal Arifin. "Perbandingan Metode Term Weighting terhadap Hasil Klasifikasi Teks pada Dataset Terjemahan Kitab Hadis." Rekayasa 13, no. 2 (2020): 172–80. http://dx.doi.org/10.21107/rekayasa.v13i2.6412.

Abstract:
Hadith is the second source of reference in Islam after the Qur'an. Hadith texts are currently being studied in the field of technology so that the values they contain can be captured computationally. In research on Hadith collections, retrieving information from Hadith requires representing the text as vectors in order to optimize automatic classification. Hadith classification is needed to group the content of Hadith into several categories. Some categories in a given Hadith collection are the same as those in other Hadith collections. This shows that there are several
4

Grishmanov, E., I. Zakharchenko, P. Berdnik, and M. Kasyanenko. "ВИБІР МАТЕМАТИЧНОГО АПАРАТУ ДЛЯ ПОБУДОВИ ВЕКТОРНОЇ МОДЕЛІ ТЕКСТОВИХ ПОВІДОМЛЕНЬ ДЛЯ НАВЧАННЯ ГЛИБОКОЇ НЕЙРОННОЇ МЕРЕЖІ ПРОГНОЗУВАННЮ НЕСПРИЯТЛИВИХ АВІАЦІЙНИХ ПОДІЙ В ПОЛЬОТІ." Системи управління, навігації та зв’язку. Збірник наукових праць 2, no. 54 (2019): 18–21. http://dx.doi.org/10.26906/sunz.2019.2.018.

Abstract:
This paper investigates and selects the mathematical apparatus for building a dictionary and a vector model of text messages for training a deep hybrid neural network to predict adverse aviation events in flight. To determine the weights of words in text messages about adverse in-flight aviation events when forming the dictionary, weighting models based on the TF-IDF, TF-RF, and TF-ICF measures are analyzed. As methods for the vector representation of textual information, the paper examines: bag of words, latent semantic analysis (LSA), models of vecto
5

Mazurek, Marcin, and Mateusz Romaniuk. "Attribution of authorship in instant messaging software applications, based on similarity measures of the stylometric features’ vector." Computer Science and Mathematical Modelling, no. 11-12/2020 (June 30, 2021): 33–41. http://dx.doi.org/10.5604/01.3001.0015.2735.

Abstract:
This paper describes the issue of authorship attribution based on the content of conversations originating from instant messaging software applications. The results presented in the paper refer to the corpus of conversations conducted in Polish. On the basis of a standardised model of the corpus of conversations, stylometric features were extracted, which were divided into four groups: word and message length distributions, character frequencies, tf-idf matrix and features extracted on the basis of turns (conversational features). The vectors of users’ stylometric features were compared in pai
6

Pradhan, Ligaj. "Enhancing Rating Prediction by Discovering and Incorporating Hidden User Associations and Behaviors." International Journal of Multimedia Data Engineering and Management 10, no. 1 (2019): 40–59. http://dx.doi.org/10.4018/ijmdem.2019010103.

Abstract:
Collaborative filtering (CF)-based rating prediction would benefit greatly from incorporating additional user associations and behavioral similarity. This article focuses on infusing such additional side information into three common techniques used for building CF-based systems. First, multi-view clustering is used over neighborhood-based rating predictions. Second, additional user behavior knowledge discovered by mining user reviews is infused into non-negative matrix factorization (NMF) techniques. Finally, the article explores how to infuse such additional behavioral knowledge into a Deep N
7

Haq, Bishrul, Ghulam Mujtaba, Zahid Hussain Khand, Javed Ahmad, and Zafar Ali. "A Comparative Study of Sentiment Analysis on Mask-Wearing Practices during the COVID-19 Pandemic." Quaid-e-Awam University Research Journal of Engineering, Science & Technology 18, no. 02 (2020): 116–26. http://dx.doi.org/10.52584/qrj.1802.17.

Abstract:
COVID-19 has become one of the most widely discussed subjects these days. Countries have taken many viable actions to prevent the spread of the virus, guided by international recommendations, which has led to many disputes concerning wearing a face mask as a preventive measure against the virus. This study aims to assess and compare the overall accuracy, macro precision, macro F-measure, and macro recall of different decision models applied to COVID-19 mask-wearing practices via sentiment analysis. Tweets are labeled and text pre-processing techniques are applied, such as stemming, normalizat
8

Xie, Lixia, Ziying Wang, Yue Wang, Hongyu Yang, and Jiyong Zhang. "New Multi-Keyword Ciphertext Search Method for Sensor Network Cloud Platforms." Sensors 18, no. 9 (2018): 3047. http://dx.doi.org/10.3390/s18093047.

Abstract:
This paper proposes a multi-keyword ciphertext search method based on improved-quality hierarchical clustering (MCS-IQHC). MCS-IQHC is a novel technique tailored to work with encrypted data. It has improved search accuracy and can self-adapt when performing multi-keyword ciphertext searches on privacy-protected sensor network cloud platforms. Document vectors are first generated by combining the term frequency-inverse document frequency (TF-IDF) weight factor and the vector space model (VSM). The improved quality hierarchical clustering (IQHC) algorithm then generates document ve
9

Ianina, Anastasia, and Konstantin Vorontsov. "Hierarchical Interpretable Topical Embeddings for Exploratory Search and Real-Time Document Tracking." International Journal of Embedded and Real-Time Communication Systems 11, no. 4 (2020): 134–52. http://dx.doi.org/10.4018/ijertcs.2020100107.

Abstract:
Real-time monitoring of scientific papers and technological news requires fast processing of complicated search demands motivated by thematically relevant information acquisition. For this case, the authors develop an exploratory search engine based on probabilistic hierarchical topic modeling. A topic model gives a low-dimensional, sparse, interpretable vector representation (topical embedding) of a text, which is used for ranking documents by their similarity to the query. They explore several ways of comparing topical vectors, including searching with thematically homogeneous text segments. Topi
10

Xie, Chunli, Xia Wang, Cheng Qian, and Mengqi Wang. "A Source Code Similarity Based on Siamese Neural Network." Applied Sciences 10, no. 21 (2020): 7519. http://dx.doi.org/10.3390/app10217519.

Abstract:
Finding similar code snippets is a fundamental task in the field of software engineering. Several approaches have been proposed for this task using statistical language models, which focus on the syntax and structure of code rather than the deep semantic information underlying it. In this paper, a Siamese Neural Network is proposed that maps code into continuous space vectors and tries to capture its semantic meaning. First, an unsupervised pre-training method models code snippets as a weighted series of word vectors. The weights of the series are fitted by the Term Frequency-Inverse Doc
11

Man Kwon, Young, So Hee Jun, Won Mo Gal, and Myung Jae Lim. "The Performance Comparison of the Classifiers According to Binary Bow, Count Bow and Tf-Idf Feature Vectors for Malware Detection." International Journal of Engineering & Technology 7, no. 3.33 (2018): 15. http://dx.doi.org/10.14419/ijet.v7i3.33.18515.

Abstract:
In this paper, we compared the performance of classifiers on Binary BOW, Count BOW, and TF-IDF feature vectors for malware detection. We used opcode features extracted from PE files. For the performance comparison, we measured the AUC score of the classifiers DT, KNN, MLP, MNB, and SVM. As a result, we recommend the neural network (MLP) and the instance-based model (KNN) because they show a high AUC score and accuracy regardless of the unbalanced dataset and the feature vector. If you use classical classifiers, we recommend DT because it guarantees high AUC score an
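The three feature representations named in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the opcode sequences in `samples` are invented, and the TF-IDF weighting (raw count times `log(N / df)`) is one common variant among several.

```python
import math
from collections import Counter

# Hypothetical opcode sequences; real ones would be disassembled from PE files.
samples = [
    ["push", "mov", "call", "mov", "ret"],
    ["push", "xor", "xor", "jmp"],
    ["mov", "mov", "add", "ret"],
]

# Fixed, sorted vocabulary so every vector has the same column order.
vocab = sorted({op for seq in samples for op in seq})

# Document frequency: in how many samples each opcode occurs.
df = Counter(op for seq in samples for op in set(seq))


def binary_bow(seq):
    """1 if the opcode occurs in the sample, else 0."""
    present = set(seq)
    return [1 if op in present else 0 for op in vocab]


def count_bow(seq):
    """Number of occurrences of each opcode in the sample."""
    counts = Counter(seq)
    return [counts[op] for op in vocab]


def tfidf(seq):
    """Raw count weighted by log(N / df); an assumed, unsmoothed variant."""
    counts = Counter(seq)
    n = len(samples)
    return [counts[op] * math.log(n / df[op]) for op in vocab]


binary_features = [binary_bow(s) for s in samples]
count_features = [count_bow(s) for s in samples]
tfidf_features = [tfidf(s) for s in samples]
```

Each of the three feature matrices could then be passed to any classifier; which classifier and evaluation metric to use is outside the scope of this sketch.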
12

Eddamiri, Siham, Asmaa Benghabrit, and Elmoukhtar Zemmouri. "RDF graph mining for cluster-based theme identification." International Journal of Web Information Systems 16, no. 2 (2020): 223–47. http://dx.doi.org/10.1108/ijwis-10-2019-0048.

Abstract:
Purpose: The purpose of this paper is to present a generic pipeline for Resource Description Framework (RDF) graph mining to provide a comprehensive review of each step in the knowledge discovery from data process. The authors also investigate different approaches and combinations to extract feature vectors from RDF graphs for the clustering and theme identification tasks. Design/methodology/approach: The proposed methodology comprises four steps. First, the authors generate several graph substructures (Walks, Set of Walks, Walks with backward and Set of Walks with backward). Second, the au
13

Boonchuay, Kesinee. "Sentiment Classification Using Text Embedding for Thai Teaching Evaluation." Applied Mechanics and Materials 886 (January 2019): 221–26. http://dx.doi.org/10.4028/www.scientific.net/amm.886.221.

Abstract:
Sentiment classification has gained a lot of attention recently. For a university, the knowledge obtained from classifying sentiments of student learning in courses is highly valuable and can be used to help teachers improve their teaching skills. In this research, sentiment classification based on text embedding is applied to enhance the performance of sentiment classification for Thai teaching evaluation. Text embedding techniques consider both syntactic and semantic elements of sentences that can be used to improve the performance of the classification. This research uses two approaches to app
14

Gaye, Babacar, Dezheng Zhang, and Aziguli Wulamu. "Sentiment classification for employees reviews using regression vector-stochastic gradient descent classifier (RV-SGDC)." PeerJ Computer Science 7 (September 23, 2021): e712. http://dx.doi.org/10.7717/peerj-cs.712.

Abstract:
The satisfaction of employees is very important for any organization to make sufficient progress in production and to achieve its goals. Organizations try to keep their employees satisfied by making their policies according to employees’ demands which help to create a good environment for the collective. For this reason, it is beneficial for organizations to perform staff satisfaction surveys to be analyzed, allowing them to gauge the levels of satisfaction among employees. Sentiment analysis is an approach that can assist in this regard as it categorizes sentiments of reviews into positive an
15

Schofield, Matthew, Gulsum Alicioglu, Bo Sun, et al. "Comparison of Malware Classification Methods using Convolutional Neural Network based on API Call Stream." International Journal of Network Security & Its Applications 13, no. 2 (2021): 1–19. http://dx.doi.org/10.5121/ijnsa.2021.13201.

Abstract:
Malicious software is constantly being developed and improved, so detection and classification of malware is an ever-evolving problem. Since traditional malware detection techniques fail to detect new/unknown malware, machine learning algorithms have been used to overcome this disadvantage. We present a Convolutional Neural Network (CNN) for malware type classification based on the API (Application Program Interface) calls. This research uses a database of 7107 instances of API call streams and 8 different malware types: Adware, Backdoor, Downloader, Dropper, Spyware, Trojan, Virus, Worm. We used
16

Tamrakar, Sujan, Bal Krishna Bal, and Rajendra Bahadur Thapa. "Aspect Based Sentiment Analysis of Nepali Text Using Support Vector Machine and Naive Bayes." Technical Journal 2, no. 1 (2020): 22–29. http://dx.doi.org/10.3126/tj.v2i1.32824.

Abstract:
Aspect-based Sentiment Analysis assists in understanding opinions about the associated entities, helping to improve the quality of a service or a product. A model is developed to detect aspect-based sentiment in Nepali text using the Machine Learning (ML) classifier algorithms Support Vector Machine (SVM) and Naïve Bayes (NB). The system collects Nepali text data from various websites, and Part of Speech (POS) tagging is applied to extract the desired features of aspect and sentiment. Manual labeling is done for each sentence to identify the sentiment of the sentence. Term Frequency – Inver
17

Angga, Putra, Lastri Widya Astuti, and Mustafa Ramadhan. "Pencarian Materi Kuliah Pada Aplikasi Blended Learning Menggunakan Metode Vector Space Model." Jurnal ULTIMATICS 8, no. 2 (2017): 92–101. http://dx.doi.org/10.31937/ti.v8i2.517.

Abstract:
Searching through a large volume of course materials requires finding the needed materials quickly and accurately, which can be done by ranking them. Ranking is a branch of information retrieval. Document search here uses the Vector Space Model (VSM), which is built on the linear-algebra concept of a vector space. Based on this concept, the blended learning application uses the vector space model as an alternative for students searching for material relevant to their needs, reducing errors in the returned information so that students can reach their goals quickly. Col
18

Feng, Jian, Ying Zhang, and Yuqiang Qiao. "A Detection Method for Phishing Web Page Using DOM-Based Doc2Vec Model." Journal of Computing and Information Technology 28, no. 1 (2020): 19–31. http://dx.doi.org/10.20532/cit.2020.1004899.

Abstract:
Detecting phishing web pages is a challenging task. Existing detection methods for phishing web pages based on the DOM (Document Object Model) mainly aim at obtaining structural characteristics but ignore the overall representation of web pages and the semantic information that HTML tags may have. This paper treats DOMs as a natural language, using the Doc2Vec model to learn the structural semantics automatically and detect phishing web pages. First, the DOM structure of the obtained web page is parsed to construct the DOM tree, then the Doc2Vec model is used to vectorize the DOM tree, and
19

Zhiquan, Wang. "Trouble Of Vespa Mandarinia: Confirming the Buzz about Hornets." E3S Web of Conferences 245 (2021): 02044. http://dx.doi.org/10.1051/e3sconf/202124502044.

Abstract:
To help Washington State interpret the data about Vespa mandarinia provided by public reports, and to enable government agencies to prioritize correct reports for further investigation when resources are limited, this article establishes two targeted models. The first is an unsupervised probability prediction model. First, the text information of the misjudged classifications in the data set is extracted and preprocessed. The data set is divided into a training set and a test set at a ratio of 8:2, and the Latent Dirichlet Allocation model is t
20

Hallac, Ibrahim Riza, Betul Ay, and Galip Aydin. "User Representation Learning for Social Networks: An Empirical Study." Applied Sciences 11, no. 12 (2021): 5489. http://dx.doi.org/10.3390/app11125489.

Abstract:
Gathering useful insights from social media data has gained great interest in recent years. User representation can be a key task in mining the publicly available, rich user-generated content offered by social media platforms. The way to automatically create meaningful observations about users of a social network is to obtain real-valued vectors for the users with user embedding representation learning models. In this study, we presented one of the most comprehensive studies in the literature in terms of learning high-quality social media user representations by leveraging state-of-the-ar
21

Bessou, Sadik, and Racha Sari. "Efficient Discrimination between Arabic Dialects." Recent Advances in Computer Science and Communications 13, no. 4 (2020): 725–30. http://dx.doi.org/10.2174/2213275912666190716115604.

Abstract:
Background: With the explosion of communication technologies and the accompanying pervasive use of social media, we notice an outstanding proliferation of posts, reviews, comments, and other forms of expression in different languages. This content has attracted researchers from different fields: economics, political sciences, social sciences, psychology, and particularly language processing. One of the prominent subjects is the discrimination between similar languages and dialects using natural language processing and machine learning techniques. The problem is usually addressed by formulating the
22

Mahmoud, Adnen, and Mounir Zrigui. "Semantic Similarity Analysis for Corpus Development and Paraphrase Detection in Arabic." International Arab Journal of Information Technology 18, no. 1 (2020): 1–7. http://dx.doi.org/10.34028/iajit/18/1/1.

Abstract:
Paraphrase detection allows determining how original and suspect documents convey the same meaning. It has attracted attention from researchers in many Natural Language Processing (NLP) tasks such as plagiarism detection, question answering, information retrieval, etc. Traditional methods (e.g., Term Frequency-Inverse Document Frequency (TF-IDF), Latent Dirichlet Allocation (LDA), and Latent Semantic Analysis (LSA)) cannot efficiently capture hidden semantic relations when sentences may not contain any common words or the co-occurrence of words is rarely present. Therefore, we proposed a deep
23

Babić, Karlo, Francesco Guerra, Sanda Martinčić-Ipšić, and Ana Meštrović. "A Comparison of Approaches for Measuring the Semantic Similarity of Short Texts Based on Word Embeddings." Journal of Information and Organizational Sciences 44, no. 2 (2020): 231–46. http://dx.doi.org/10.31341/jios.44.2.2.

Abstract:
Measuring the semantic similarity of texts has a vital role in various tasks from the field of natural language processing. In this paper, we describe a set of experiments we carried out to evaluate and compare the performance of different approaches for measuring the semantic similarity of short texts. We perform a comparison of four models based on word embeddings: two variants of Word2Vec (one based on Word2Vec trained on a specific dataset and the second extending it with embeddings of word senses), FastText, and TF-IDF. Since these models provide word vectors, we experiment with various m
24

Kumar, S. Nithish, M. Sai Subhakar, and K. Veeresh. "Students Query Classification System." International Journal of Recent Technology and Engineering 9, no. 5 (2021): 191–94. http://dx.doi.org/10.35940/ijrte.e5247.019521.

Abstract:
A university or educational institute generally receives a large number of complaints from students every day. The issues relate to academics, education, exam sections, and so on. This volume of daily complaints makes it difficult for the university to sort and classify them and send them to the respective departments for resolution. In this project, we work on classifying these complaints based on the classes or departments they belong to. By using TF-IDF (term frequency-inverse docum
25

Shyamasundar, L. B., and P. Jhansi Rani. "A Multiple-Layer Machine Learning Architecture for Improved Accuracy in Sentiment Analysis." Computer Journal 63, no. 3 (2019): 395–409. http://dx.doi.org/10.1093/comjnl/bxz038.

Abstract:
Twitter is an online micro-blogging platform through which one can explore the hidden valuable and delightful information about the current context at any point of time, which also serves as a data source to carry out sentiment analysis. In this paper, the sentiments of a large number of tweets generated from Twitter in the form of big data have been analyzed using machine learning algorithms. A multi-tier architecture for sentiment classification is proposed in this paper, which includes modules such as tokenization, data cleaning, preprocessing, stemming, updated lexicon, stopwords an
26

Ahmed, Nizar, Fatih Dilmaç, and Adil Alpkocak. "Classification of Biomedical Texts for Cardiovascular Diseases with Deep Neural Network Using a Weighted Feature Representation Method." Healthcare 8, no. 4 (2020): 392. http://dx.doi.org/10.3390/healthcare8040392.

Abstract:
This study aims to improve the performance of multiclass classification of biomedical texts for cardiovascular diseases by combining two different feature representation methods, i.e., bag-of-words (BoW) and word embeddings (WE). To hybridize the two feature representations, we investigated a set of possible statistical weighting schemes to combine with each element of WE vectors, which were term frequency (TF), inverse document frequency (IDF) and class probability (CP) methods. Thus, we built a multiclass classification model using a bidirectional long short-term memory (BLSTM) with deep neu
27

Klungpornkun, Mongkud, and Peerapon Vateekul. "Hierarchical Text Categorization Using Level Based Neural Networks of Word Embedding Sequences with Sharing Layer Information." Walailak Journal of Science and Technology (WJST) 16, no. 2 (2018): 121–31. http://dx.doi.org/10.48048/wjst.2019.4145.

Abstract:
In text corpora, it is common to categorize each document into a predefined class hierarchy, which is usually a tree. One of the most widely used approaches is a level-based strategy that induces a multiclass classifier for each class level independently. However, no prior attempt utilized information from the parent level, and all employed a bag of words rather than considering a sequence of words. In this paper, we present a novel level-based hierarchical text categorization with a strategy called “sharing layer information.” For each class level, a neural network is constructed, where its i
28

Ali, Daler, Malik Muhammad Saad Missen, and Mujtaba Husnain. "Multiclass Event Classification from Text." Scientific Programming 2021 (January 12, 2021): 1–15. http://dx.doi.org/10.1155/2021/6660651.

Abstract:
Social media has become one of the most popular sources of information. People communicate with each other and share their ideas, commenting on global issues and events in a multilingual environment. While social media has been popular for several years, it has recently driven an exponential rise in online data volumes because of the increasing popularity of local languages on the web. This allows researchers in the NLP community to exploit the richness of different languages while overcoming the challenges posed by these languages. Urdu is also one of the most widely used local languages being used
29

Aljofey, Ali, Qingshan Jiang, Qiang Qu, Mingqing Huang, and Jean-Pierre Niyigena. "An Effective Phishing Detection Model Based on Character Level Convolutional Neural Network from URL." Electronics 9, no. 9 (2020): 1514. http://dx.doi.org/10.3390/electronics9091514.

Abstract:
Phishing is the easiest way to commit cybercrime, with the aim of enticing people to give up accurate information such as account IDs, bank details, and passwords. This type of cyberattack is usually triggered by emails, instant messages, or phone calls. Existing anti-phishing techniques are mainly based on source code features, which require scraping the content of web pages, and on third-party services, which slow down the classification of phishing URLs. Although machine learning techniques have lately been used to detect phishing, they require essential manual feature engineering and
30

Park, Dae-Seo, and Hwa-Jong Kim. "A Proposal of Join Vector for Semantic Factor Reflection in TF-IDF Based Keyword Extraction." Journal of Korean Institute of Information Technology 16, no. 2 (2018): 1–16. http://dx.doi.org/10.14801/jkiit.2018.16.2.1.

31

Kurniasari, Iin, Kusrini Kusrini, and Hanif Al Fatta. "Analysis of Public Opinion Sentiment on Instagram regarding Covid-19 with SVM." JTECS : Jurnal Sistem Telekomunikasi Elektronika Sistem Kontrol Power Sistem dan Komputer 1, no. 1 (2021): 67. http://dx.doi.org/10.32503/jtecs.v1i1.1416.

Abstract:
Today's technological developments push people to stay technologically literate, especially in the era of the COVID-19 pandemic, which prioritizes social distancing. Social media is used as a tool for conveying public opinion to a wider audience. In this study, the authors examine public opinion on the social media platform Instagram using a Support Vector Machine. After accuracy and precision tests, it turned out that SVM is not yet suitable as an algorithm that can capture word order, because sentences whose words are rearranged, even though their meaning differs, are still
32

Andayani, Sri, and Ady Ryansyah. "Implementasi Algoritma TF-IDF Pada Pengukuran Kesamaan Dokumen." JuSiTik : Jurnal Sistem dan Teknologi Informasi Komunikasi 1, no. 1 (2017): 53. http://dx.doi.org/10.32524/jusitik.v1i1.218.

Abstract:
Measuring document similarity is a time-consuming problem. The large number of documents and the large number of pages per document make measuring similarity manually a complicated and difficult job. In this research, a system that can automatically measure similarity between documents is built by implementing TF-IDF. Measurement begins by creating a vector representation of the documents being compared. This vector representation contains the weight of each term in the documents. After that, the similarity value is calculated using cosine similarity. The fini
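The two steps described in this abstract, weighting each term and then comparing the resulting vectors with cosine similarity, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the system built in the article: the three toy documents are invented, tokenization is plain whitespace splitting, and the weight is raw term count times `log(N / df)`, one of several common TF-IDF variants.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors): one TF-IDF vector per tokenized document."""
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        # Raw count times log(N / df); terms found in every document get weight 0.
        vectors.append([counts[t] * math.log(n_docs / df[t]) for t in vocab])
    return vocab, vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented example documents, tokenized on whitespace.
docs = [
    "the cat sat on the mat".split(),
    "the cat lay on the rug".split(),
    "stock prices fell sharply today".split(),
]
vocab, vectors = tfidf_vectors(docs)
sim_01 = cosine_similarity(vectors[0], vectors[1])  # documents sharing several terms
sim_02 = cosine_similarity(vectors[0], vectors[2])  # documents sharing no terms
```

In this toy corpus the first two documents share terms and so score higher than the unrelated pair, which shares none and scores zero.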
33

Truşcă, Maria Mihaela. "Efficiency of SVM classifier with Word2Vec and Doc2Vec models." Proceedings of the International Conference on Applied Statistics 1, no. 1 (2019): 496–503. http://dx.doi.org/10.2478/icas-2019-0043.

Abstract:
The Support Vector Machine model has been one of the most intensively used text data classifiers ever since its development. However, its performance depends not only on its features but also on data preprocessing and model tuning. The main purpose of this paper is to compare the efficiency of several Support Vector Machine models using both the TF-IDF approach and the Word2Vec and Doc2Vec neural networks for text data representation. Besides the data vectorization process, I try to enhance the models’ efficiency by identifying which kind of kernel fits the data better or if it is just better
34

Bouarara, Hadj Ahmed, Reda Mohamed Hamou, and Abdelmalek Amine. "Text Clustering using Distances Combination by Social Bees." International Journal of Information Retrieval Research 4, no. 3 (2014): 34–53. http://dx.doi.org/10.4018/ijirr.2014070103.

Abstract:
Recently, researchers showed that 90% of the information on the web is presented in an unstructured format (free text). Automatic text classification (clustering) has become a crucial challenge in the computer science community, where most classical techniques have run into various problems in terms of execution time, multiplicity of data (marketing, biology, economics), and the initialization of the number of clusters. Nowadays, the bio-inspired paradigm has seen genuine success in several sectors, particularly in the world of data mining. The content of our work is a
35

Feng, Yiwei, M. Asif Naeem, Farhaan Mirza, and Ali Tahir. "Reply Using Past Replies—A Deep Learning-Based E-Mail Client." Electronics 9, no. 9 (2020): 1353. http://dx.doi.org/10.3390/electronics9091353.

Abstract:
Email is the most common and effective source of communication for most enterprises and individuals. In the corporate sector the volume of email received daily is significant while timely reply of each email is important. This generates a huge amount of work for the organisation, in particular for the staff located in the help-desk role. In this paper we present a novel Smart E-mail Management System (SEMS) for handling the issue of E-mail overload. The Term Frequency-Inverse Document Frequency (TF-IDF) model was used for designing a Smart Email Client in previous research. Since TF-IDF does n
36

Alizah, Muhammad Dwison, Arifin Nugroho, Ummu Radiyah, and Windu Gata. "Sentimen Analisis Terkait Lockdown pada Sosial Media Twitter." Indonesian Journal on Software Engineering (IJSE) 6, no. 2 (2020): 223–29. http://dx.doi.org/10.31294/ijse.v6i2.8991.

Abstract:
Covid-19 has been declared a pandemic by the World Health Organization (WHO). Its very large impact and the speed of infection are the reasons for declaring Covid-19 a pandemic and for the efforts to overcome it. One precaution that can be taken is a lockdown. The decision to carry out a lockdown is intended to reduce the spread. A lockdown is certainly not a 100% good solution for every individual. There are individuals who agree that the lockdown should be implemented, and there are those who think that the lockdown is better not carried out considering the
37

Islam, Tanvirul, Ashik Iqbal Prince, Md Mehedee Zaman Khan, Md Ismail Jabiullah, and Md Tarek Habib. "An in-depth exploration of Bangla blog post classification." Bulletin of Electrical Engineering and Informatics 10, no. 2 (2021): 742–49. http://dx.doi.org/10.11591/eei.v10i2.2873.

Abstract:
Bangla blogs are increasing rapidly in the information era, and consequently blogs have diverse layouts and categorizations. In this context, automated blog post classification is a comparatively more efficient way to organize Bangla blog posts in a standard manner so that users can easily find the articles that interest them. In this research, nine supervised learning models, namely Support Vector Machine (SVM), multinomial naïve Bayes (MNB), multi-layer perceptron (MLP), k-nearest neighbours (k-NN), stochastic gradient descent (SGD), decision tree, perceptron, ridge
38

Sanjaya, Eko, Agi Prasetiadi, and Wahyu Andi Saputra. "Klasifikasi Analisis Sentimen Pada Gambar Meme Politik Dengan Library Tesseract Dan Algoritme Support Vector Machine." Journal of Informatics, Information System, Software Engineering and Applications (INISTA) 2, no. 1 (2019): 56–64. http://dx.doi.org/10.20895/inista.v2i1.96.

Abstract:
Memes are a way of spreading information in the form of images. Based on the data obtained, the development of memes began to increase ahead of the 2019 election. The information conveyed by political memes varies. Some of it expresses support for a party or political figure, while some is used to criticize or abuse a political party or figure. A system that can classify memes by class is therefore needed. This study aims to create a system that can classify political memes by class. The algorithm to be used for the classification
39

Rahim, Robbi, Nuning Kurniasih, Muhammad Dedi Irawan, et al. "Latent Semantic Indexing for Indonesian Text Similarity." International Journal of Engineering & Technology 7, no. 2.3 (2018): 73. http://dx.doi.org/10.14419/ijet.v7i2.3.12619.

Abstract:
A document is a written record that can be used as evidence of information. Plagiarism is a deliberate or unintentional act of obtaining or attempting to obtain credit or value for a scientific work by citing some or all of another party's recognized scientific work without stating the source properly and adequately. The Latent Semantic Indexing method serves to find text in a document that matches other text. The algorithm used is the TF/IDF algorithm, which multiplies the TF value by the IDF value for a term in a document, while the Vector Space Model (VSM) is met
40

Siregar, Riki Ruli A., Fera Amelia Sinaga, and Rakhmat Arianto. "Aplikasi Penentuan Dosen Penguji Skripsi Menggunakan Metode TF-IDF dan Vector Space Model." Computatio : Journal of Computer Science and Information Systems 1, no. 2 (2017): 171. http://dx.doi.org/10.24912/computatio.v1i2.1014.

Abstract:
At Sekolah Tinggi Teknik PLN (STT-PLN), assigning examiners for the final project or thesis is the task of the department secretary. This study aims to provide an alternative way of assigning thesis examiners. The methods applied to build this system are text mining, TF-IDF, and the Vector Space Model (VSM). Text mining is used to process the data, where the data to be processed are the thesis titles and abstracts, while VSM is used to classify competencies. This study can recommend three lecturers to serve as thesis examiners based on
41

Smirnov, D. A., and G. B. Sologub. "Automatic Recommendation of Video for Online School Lesson Using Neuro-Linguistic Programming." Моделирование и анализ данных 10, no. 2 (2020): 102–9. http://dx.doi.org/10.17759/mda.2020100208.

Abstract:
The article describes an approach to automating the matching of video materials to text slides in English classes in an online school by vectorizing slide text and video subtitles using the TF-IDF measure and maximizing the cosine similarity measure of these vector representations.
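The matching step this abstract describes (vectorize slide text and video subtitles with TF-IDF, then pick the video with the highest cosine similarity) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the whitespace tokenizer, and the smoothed IDF formula are all assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    # Naive whitespace tokenization; an assumption for this sketch.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        # Smoothed IDF (an illustrative choice): log((1 + N) / (1 + df)) + 1.
        vectors.append({
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_video_for_slide(slide_text, subtitles):
    """Index of the subtitle track most similar to the slide text."""
    # Fit TF-IDF on the slide and all subtitle tracks together so that
    # every vector shares one vocabulary and one set of IDF weights.
    vectors = tfidf_vectors([slide_text] + list(subtitles))
    slide_vec, video_vecs = vectors[0], vectors[1:]
    scores = [cosine(slide_vec, vec) for vec in video_vecs]
    return max(range(len(scores)), key=scores.__getitem__)
```

Calling `best_video_for_slide(slide, subtitles)` with one slide string and a non-empty list of subtitle strings returns the position of the closest subtitle track. A real system would likely add proper tokenization and stop-word handling.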
42

Dias Canedo, Edna, and Bruno Cordeiro Mendes. "Software Requirements Classification Using Machine Learning Algorithms." Entropy 22, no. 9 (2020): 1057. http://dx.doi.org/10.3390/e22091057.

Abstract:
The correct classification of requirements has become an essential task within software engineering. This study compares text feature extraction techniques and machine learning algorithms on the problem of requirements classification in order to answer two major questions: “Which works best (Bag of Words (BoW) vs. Term Frequency–Inverse Document Frequency (TF-IDF) vs. Chi Squared (CHI2)) for classifying Software Requirements into Functional Requirements (FR) and Non-Functional Requirements (NF), and the sub-classes of Non-Functional Requirements?” and “Which Machine Lea
43

Al-Radaei, Sami A. M., and R. B. Mishra. "A Heuristic Method for Learning Path Sequencing for Intelligent Tutoring System (ITS) in E-learning." International Journal of Intelligent Information Technologies 7, no. 4 (2011): 65–80. http://dx.doi.org/10.4018/jiit.2011100104.

Abstract:
Course sequencing is one of the vital aspects of an Intelligent Tutoring System (ITS) for e-learning, used to generate a dynamic and individual learning path for each learner. Many researchers have used different methods, such as Genetic Algorithm, Artificial Neural Network, and TF-IDF (Term Frequency-Inverse Document Frequency), in e-learning systems to find the adaptive course sequencing by obtaining the relation between the courseware. In this paper, heuristic semantic values are assigned to the keywords in the courseware based on the importance of the keyword. These values are used to find the relations
44

Yazdani, Sepideh Foroozan, Masrah Azrifah Azmi Murad, Nurfadhlina Mohd Sharef, Yashwant Prasad Singh, and Ahmed Razman Abdul Latiff. "Sentiment Classification of Financial News Using Statistical Features." International Journal of Pattern Recognition and Artificial Intelligence 31, no. 03 (2017): 1750006. http://dx.doi.org/10.1142/s0218001417500069.

Abstract:
Sentiment classification of financial news deals with the identification of positive and negative news so that they can be applied in decision support systems for stock trend predictions. This paper explores several types of feature spaces as different data spaces for sentiment classification of the news article. Experiments are conducted using n-gram models (unigram, bigram, and the combination of unigram and bigram) as feature extraction with traditional feature weighting methods (binary, term frequency (TF), and term frequency-inverse document frequency (TF-IDF)), while document freq
45

Wang, Jin, Yangning Tang, Shiming He, et al. "LogEvent2vec: LogEvent-to-Vector Based Anomaly Detection for Large-Scale Logs in Internet of Things." Sensors 20, no. 9 (2020): 2451. http://dx.doi.org/10.3390/s20092451.

Abstract:
Log anomaly detection is an efficient method to manage modern large-scale Internet of Things (IoT) systems. More and more works apply natural language processing (NLP) methods, in particular word2vec, to log feature extraction. Word2vec can extract the relevance between words and vectorize the words. However, the computing cost of training word2vec is high. Anomalies in logs depend not only on an individual log message but also on the log message sequence. Therefore, the word vectors from word2vec cannot be used directly and need to be transformed into the vecto
46

Wang, Haoriqin, Huaji Zhu, Huarui Wu, Xiaomin Wang, Xiao Han, and Tongyu Xu. "A Densely Connected GRU Neural Network Based on Coattention Mechanism for Chinese Rice-Related Question Similarity Matching." Agronomy 11, no. 7 (2021): 1307. http://dx.doi.org/10.3390/agronomy11071307.

Abstract:
In the question-and-answer (Q&A) communities of the “China Agricultural Technology Extension Information Platform”, thousands of new rice-related Chinese questions are added every day. Rapid detection of semantically identical questions is the key to the success of a rice-related intelligent Q&A system. To allow fast and automatic detection of semantically identical rice-related questions, we propose a new method based on the Coattention-DenseGRU (Gated Recurrent Unit). According to the rice-related question characteristics, we applied word2vec with the TF-IDF (Term Frequency–Inverse Do
47

Celik, Mete, and Ahmet Sakir Dokuz. "Daily and hourly mood pattern discovery of Turkish twitter users." Global Journal of Computer Science 5, no. 2 (2015): 90. http://dx.doi.org/10.18844/gjcs.v5i2.183.

Abstract:
The massive number of data-related applications and the widespread usage of web technologies have started the big data era. Social media data is one of the big data sources. Mining social media data provides useful insights for companies and organizations for developing their services, products or organizations. This study aims to analyze Turkish Twitter users based on their daily and hourly social media posts. In this way, daily and hourly mood patterns of Turkish social media users can be revealed as positive or negative. For this purpose, Support Vector Machines (SVM) classification algor
48

Powell, Michael, Jamison A. Rotz, and Kevin D. O’Malley. "How Machine Learning Is Improving U.S. Navy Customer Support." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 08 (2020): 13188–95. http://dx.doi.org/10.1609/aaai.v34i08.7023.

Abstract:
The U.S. Navy is successfully using natural language processing (NLP) and common machine-learning (ML) algorithms to categorize and automatically route plain text support requests at a Navy fleet support center. The algorithms enhance routine IT support tasks with automation and reduce the workload of service desk agents. The ML pipeline works in a five-step process. First, an archive of documents is created from various sources, including standard operating procedure (SOP) memos, frequently asked questions (FAQs), knowledge articles, Wikipedia articles, encyclopedia articles, previously close
49

Seliverstov, Yaroslav, Viktoriya Chigur, Arseniy Sazanov, Svyatoslav Seliverstov, and Aliaksandra Svistunova. "Sentiment Analysis of "AUTOSTRADA.INFO/RU" Users’ Comments." SPIIRAS Proceedings 18, no. 2 (2019): 354–89. http://dx.doi.org/10.15622/sp.18.2.354-389.

Abstract:
The analysis revealed that social networks (Vkontakte, Facebook), thematic communities in microblogging networks (Twitter), resources for travelers (TripAdvisor), and transport portals (Autostrada) are a source of up-to-date, timely information about the traffic situation, the quality of transport services, and passenger satisfaction with the level of transport services. However, existing transport monitoring systems do not contain software tools capable of collecting and analyzing traffic information found on the Internet. This paper dis
50

Pasma, Cadea Mikha, Ulla Delfana Rosiani, and Rudy Ariyanto. "PENGEMBANGAN SISTEM PENDETEKSI KEMIRIPAN KARYA PADA INAICTA 2013." Jurnal Informatika Polinema 1, no. 4 (2017): 14. http://dx.doi.org/10.33795/jip.v1i4.117.

Abstract:
The Indonesia ICT Award (INAICTA) 2013 is the largest competition for creative and innovative works in the field of ICT (Information and Computer Technology) in Indonesia. It aims to keep encouraging the development of local ICT products through improved product quality and innovation. Each year, the number of contestants taking part in INAICTA grows. This affects how difficult it is for the judges or assessment team to identify similarities among the contestants' innovations. An application is needed that can help in