To see the other types of publications on this topic, follow the link: TfIdf vectorizer.

Journal articles on the topic 'TfIdf vectorizer'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Select a source type:

Consult the top 22 journal articles for your research on the topic 'TfIdf vectorizer.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles from a wide variety of disciplines and organise your bibliography correctly.

1

Setiawan, Assegaff, Rasywir Errissya, and Pratama Yovi. "Experimental of vectorizer and classifier for scrapped social media data." TELKOMNIKA 21, no. 04 (2023): 815–24. https://doi.org/10.12928/telkomnika.v21i4.24180.

Full text
Abstract:
In this study, we used several classifiers and vectorizers to see their effect on processing social media data. The classifiers used were random forest, logistic regression, Bernoulli Naive Bayes (NB), and support vector clustering (SVC). Random forests are used to reduce spatial complexity and to minimize errors. Logistic regression is a statistical method whose basic form uses a logistic function to represent a binary dependent variable. Bernoulli NB uses binary features, and SVC has so far given good results that rival other supervised learning methods. Our tests use social media data. Based on the tests carried out on classifier and vectorizer variations, it was found that the best classifier is the logistic regression algorithm, based on adaptive prediction, compared with the random forest method based on decision trees, the probability-based Bernoulli NB, and SVC, which works by clustering. Meanwhile, from the test results on the count vectorizer, term frequency-inverse document frequency (TFIDF) vectorizer, and hashing vectorizer, the best accuracy is achieved with the TFIDF vectorizer. This means that the TFIDF vectorizer is better at representing word feature dimensions.
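The comparison above hinges on how each vectorizer weights word features. As a minimal sketch of the TF-IDF weighting that came out on top (using the smoothed formulation popularized by scikit-learn; the toy documents are invented for illustration):

```python
import math
from collections import Counter

def tfidf_weights(docs):
    """Smoothed TF-IDF: raw term count scaled by log((1+N)/(1+df)) + 1."""
    n = len(docs)
    df = Counter()                      # document frequency per term
    for doc in docs:
        df.update(set(doc))
    out = []
    for doc in docs:
        tf = Counter(doc)
        out.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1)
                    for t, c in tf.items()})
    return out

docs = [["good", "service", "fast"],
        ["good", "price"],
        ["slow", "service"]]
w = tfidf_weights(docs)
# "good" and "service" each occur in two documents, so their weights are
# damped relative to document-specific terms like "fast" or "price".
```

This is what the abstract means by the TFIDF vectorizer "presenting word feature dimensions" better than raw counts: shared, uninformative terms are down-weighted.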
APA, Harvard, Vancouver, ISO, and other styles
2

Rahmatul Kholiq, Muhammad Hatta, Wiranto Wiranto, and Sari Widya Sihwi. "News classification using light gradient boosted machine algorithm." Indonesian Journal of Electrical Engineering and Computer Science 27, no. 1 (2022): 206. http://dx.doi.org/10.11591/ijeecs.v27.i1.pp206-213.

Full text
Abstract:
News classification is a complex issue as people are easily convinced of misleading information and lack control over the spread of fake news. However, we can break the problem of spreading fake news with artificial intelligence (AI), which has developed rapidly. This study proposes a news classification model using a light gradient boosted machine (LightGBM) algorithm. The model is analyzed using two feature extraction techniques, count vectorizer and Tfidf vectorizer, and compared with a deep learning model using long-short term memory (LSTM). The experimental evaluation showed that all LightGBM models outperform LSTM. The best model is the count vectorizer LightGBM, which achieves an accuracy value of 0.9933 and an area under curve (AUC) score of 0.9999.
APA, Harvard, Vancouver, ISO, and other styles
3

Kholiq, Muhammad Hatta Rahmatul, Wiranto Wiranto, and Sari Widya Sihwi. "News classification using light gradient boosted machine algorithm." Indonesian Journal of Electrical Engineering and Computer Science 27, no. 1 (2022): 206–13. https://doi.org/10.11591/ijeecs.v27.i1.pp206-213.

Full text
Abstract:
News classification is a complex issue as people are easily convinced of misleading information and lack control over the spread of fake news. However, we can break the problem of spreading fake news with artificial intelligence (AI), which has developed rapidly. This study proposes a news classification model using a light gradient boosted machine (LightGBM) algorithm. The model is analyzed using two feature extraction techniques, count vectorizer and Tfidf vectorizer and compared with a deep learning model using long-short term memory (LSTM). The experimental evaluation showed that all LightGBM models outperform LSTM. The best model is the count vectorizer LightGBM, which achieves an accuracy value of 0.9933 and an area under curve (AUC) score of 0.9999.
APA, Harvard, Vancouver, ISO, and other styles
4

Ramalingam, Gomathi, Logeswari S, M. D. Kumar, Manjula Prabakaran, Neerav Nishant, and Syed A. Ahmed. "Machine learning classifiers to predict the quality of semantic web queries." Scientific Temper 15, no. 01 (2024): 1777–83. http://dx.doi.org/10.58414/scientifictemper.2024.15.1.28.

Full text
Abstract:
In this research, a classification framework to automatically identify well- and poorly designed SPARQL queries is proposed. Evaluating SPARQL queries is a challenging issue because of the query design and the volume of data to be handled. The proposed framework applies various machine learning algorithms, including decision trees, k-nearest neighbours, support vector machine, and naive Bayes. In addition, two feature extraction techniques, the TFIDF measure and the count vectorizer, are used to identify the key features. The experimental results show that the four machine learning classifiers are able to classify SPARQL queries into three categories: well designed, accepted, and poorly designed. The framework also provides promising results with respect to recall, precision, and F1-score. On the datasets used for experimentation, the decision tree classifier outperformed the other classifiers, achieving 92% in terms of F1-measure. The count vectorizer also performs well in measuring the TFIDF property to predict poorly designed queries.
APA, Harvard, Vancouver, ISO, and other styles
5

Borkar, Sumedh. "Identifying Fake News Using Real Time Analytics." International Journal for Research in Applied Science and Engineering Technology 10, no. 7 (2022): 994–1000. http://dx.doi.org/10.22214/ijraset.2022.45406.

Full text
Abstract:
Before the internet, people acquired their news from the radio, television, and newspapers. With the internet, the news moved online, and suddenly, anyone could post information on websites such as Facebook and Twitter. The spread of fake news has also increased with social media. It has become one of the most significant issues of this century. People use the method of fake news to pollute the reputation of a well-reputed organization for their benefit. The most important reason for such a project is to frame a device to examine the language designs that describe fake and right news through machine learning. This paper proposes models of machine learning that can successfully detect fake news. These models identify which news is real or fake and specify the accuracy of said news, even in a complex environment. After data preprocessing and exploration, we applied three machine learning models: random forest classifier, logistic regression, and term frequency-inverse document frequency (TFIDF) vectorizer. The accuracy of the TFIDF vectorizer, logistic regression, random forest classifier, and decision tree classifier models was approximately 99.52%, 98.63%, 99.63%, and 99.68%, respectively. Machine learning models can be considered a great choice to find reality-based results and applied to other unstructured data for various sentiment analysis applications.
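The TF-IDF-plus-logistic-regression combination the abstract describes can be sketched with scikit-learn as follows (the headlines and labels here are invented stand-ins, not the paper's dataset):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus: 1 = fake, 0 = real
texts = [
    "shocking miracle cure doctors hate",
    "you won a free prize click now",
    "celebrity secret exposed shocking",
    "parliament passes budget bill today",
    "central bank holds interest rates",
    "city council approves new transit plan",
]
labels = [1, 1, 1, 0, 0, 0]

# Vectorizer and classifier chained so raw text goes in, a label comes out
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)
pred = model.predict(["free miracle prize exposed"])[0]
```

On a real dataset the vectorizer and classifier would be fitted on a training split and scored on a held-out test split to obtain accuracy figures like those reported above.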
APA, Harvard, Vancouver, ISO, and other styles
6

Awan, Mazhar Javed, Awais Yasin, Haitham Nobanee, et al. "Fake News Data Exploration and Analytics." Electronics 10, no. 19 (2021): 2326. http://dx.doi.org/10.3390/electronics10192326.

Full text
Abstract:
Before the internet, people acquired their news from the radio, television, and newspapers. With the internet, the news moved online, and suddenly, anyone could post information on websites such as Facebook and Twitter. The spread of fake news has also increased with social media. It has become one of the most significant issues of this century. People use the method of fake news to pollute the reputation of a well-reputed organization for their benefit. The most important reason for such a project is to frame a device to examine the language designs that describe fake and right news through machine learning. This paper proposes models of machine learning that can successfully detect fake news. These models identify which news is real or fake and specify the accuracy of said news, even in a complex environment. After data-preprocessing and exploration, we applied three machine learning models; random forest classifier, logistic regression, and term frequency-inverse document frequency (TF-IDF) vectorizer. The accuracy of the TFIDF vectorizer, logistic regression, random forest classifier, and decision tree classifier models was approximately 99.52%, 98.63%, 99.63%, and 99.68%, respectively. Machine learning models can be considered a great choice to find reality-based results and applied to other unstructured data for various sentiment analysis applications.
APA, Harvard, Vancouver, ISO, and other styles
7

Bhuvaneshwari, K., Dr S. A. Jyothi Rani, and Dr V. V. Haragopal. "Sentiment Analysis of Tweets on Telangana State Government Flagship Schemes." International Journal of Engineering and Advanced Technology 12, no. 1 (2022): 23–27. http://dx.doi.org/10.35940/ijeat.a3794.1012122.

Full text
Abstract:
Over the last decade, the usage of social media has evolved to a great extent. Today, social media platforms like Twitter, Facebook, and Snapchat are widely used to capture the opinions of the public about a particular entity. Social media has become a great source of text data. Text analytics plays a crucial role on social media data in answering a wide variety of questions about public feedback on many issues or topics. The primary objective of this work is to analyse the public opinion or sentiment in social media on Telangana state government welfare schemes. The purpose of sentiment analysis is to find opinions in tweets, extract sentiments from them, and find their polarity, i.e., positive, neutral, or negative. Here we use Twitter as it has gained much popularity and media attention. The first step is to extract the tweets on particular schemes through the Twitter API and the Python language, followed by cleaning and pre-processing of the raw tweets. Then the tfidf vectorizer was invoked for feature extraction and creation of a bag of words, and finally sentiment polarity scores were obtained using VADER (Valence Aware Dictionary and sEntiment Reasoner), a lexicon and rule-based sentiment analysis tool.
APA, Harvard, Vancouver, ISO, and other styles
8

Reddy, Vookanti Anurag, CH Vamsidhar Reddy, and Dr R. Lakshminarayanan. "Fake News Detection using Machine Learning." International Journal for Research in Applied Science and Engineering Technology 10, no. 4 (2022): 227–30. http://dx.doi.org/10.22214/ijraset.2022.41124.

Full text
Abstract:
This project applies NLP (Natural Language Processing) techniques to detecting 'fake news', that is, misleading news stories that come from non-reputable sources. Building a model based only on a count vectorizer (using word tallies) or a term frequency-inverse document frequency (TFIDF) matrix (word tallies relative to how often they are used in other articles in your dataset) can only get you so far, because these models do not consider important qualities like word ordering and context. It is very possible that two articles that are similar in their word counts will be completely different in their meaning. The data science community has responded by taking action against the problem: there is a Kaggle competition called the "Fake News Challenge", and Facebook is employing AI to filter fake news stories out of users' feeds. Combatting fake news is a classic text classification project with a straightforward proposition: is it possible to build a model that can differentiate between "real" news and "fake" news? The proposed work assembles a dataset of both fake and real news and employs a Naive Bayes classifier to create a model that classifies an article as fake or real based on its words and phrases.
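The Naive-Bayes-on-word-tallies approach this abstract proposes can be sketched with scikit-learn (the toy articles and labels are invented for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy articles: 1 = fake, 0 = real
texts = [
    "miracle cure banned by doctors",
    "secret trick makes you rich overnight",
    "aliens built the ancient pyramids claim",
    "council approves annual city budget",
    "researchers publish peer reviewed study",
    "minister announces new school funding",
]
labels = [1, 1, 1, 0, 0, 0]

vec = CountVectorizer()
X = vec.fit_transform(texts)              # sparse matrix of word tallies
clf = MultinomialNB().fit(X, labels)      # multinomial NB suits count features
probs = clf.predict_proba(vec.transform(["secret miracle trick"]))
```

As the abstract notes, this bag-of-words view ignores word order and context, which is exactly its stated limitation.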
APA, Harvard, Vancouver, ISO, and other styles
9

Nandini, Mrs Kagita, Sreya Deepthi Puppala, Vadupu Varshita, Sudulagunta Ratna Megda, and Thommandru Sumanth. "Deep Hybrid System for Personalized Movie Recommendations." Journal of Nonlinear Analysis and Optimization 16, no. 01 (2025): 739–47. https://doi.org/10.36893/jnao.2025.v16i01.087.

Full text
Abstract:
A recommendation system is a software application or algorithm designed to suggest items such as products, movies, songs, or articles to users based on their preferences, behaviours, or the activities of similar users. This paper presents a novel hybrid recommendation system that integrates content-based and collaborative filtering approaches using deep learning techniques to improve movie recommendations. Our model merges movie metadata, including genres, cast, and crew from the MovieLens dataset, with user ratings to construct a comprehensive feature set. We employ a Term Frequency-Inverse Document Frequency (TFIDF) vectorizer to extract content-based features and utilize Singular Value Decomposition (SVD) to derive collaborative filtering features, thereby addressing both user preferences and item characteristics. We further enhance the model by concatenating these features into a unified representation, which is then processed through a deep neural network to predict movie ratings. The network architecture consists of multiple dense layers with dropout regularization to prevent overfitting, ensuring robustness in learning complex user-item interactions. We evaluate our model on a standard dataset, focusing on mean squared error (MSE) as the performance metric to assess accuracy. The results demonstrate the effectiveness of our hybrid approach in providing precise recommendations by leveraging both the semantic content of movies and the historical interactions of users, thereby outperforming traditional methods that rely on singular recommendation strategies. This research contributes to the recommendation system community by showcasing a scalable and efficient method to improve
APA, Harvard, Vancouver, ISO, and other styles
10

Terisri, Paladugula, Nandyala Hiranmayee, V. V. S. S. C. Ekantha S, Dungala Puthin, Kishor Ambati Karteek, and Tanmai Ramisetti Jyothi. "Sentimental Analysis using NLP." Sentimental Analysis using NLP 8, no. 12 (2023): 5. https://doi.org/10.5281/zenodo.10401483.

Full text
Abstract:
Sentiment analysis is a subset of text analysis techniques that uses automatic text polarity detection. Sentiment analysis, often known as opinion mining, is one of the main tasks of NLP (Natural Language Processing). In recent years, sentiment analysis has gained a lot of popularity. The aim is to build a system that can recognize and categorize sentiment or opinion as it is expressed in electronic text. Nowadays, people who wish to purchase consumer goods prefer to read user reviews and participate in public online forums where others discuss the product. This is because consumers frequently have to make trade-offs when making purchases. Before making a purchase, many customers read other people's reviews. Individuals frequently voice their opinions about many things, so opinion mining has grown in significance. Sentiment analysis is the process of determining whether the expressed opinion about a subject is favorable or negative. Customers must choose which portion of the available data to utilize. Sentiment analysis is the technique of locating and extracting subjective information from unprocessed data. If we could accurately forecast sentiments, we could gather online opinions and anticipate the preferences of online customers. This information could be useful for research in marketing or economics. At present, sentiment classification, feature-based classification, and handling negations are the three main issues facing this research community. Keywords: Numpy, Pandas, TF-IDF, Tfidf Vectorizer, Linear SVC, Train-Test Split, Accuracy Score, Classification Report, Confusion Matrix, User Input, Vectorization, Prediction, Preprocessing, Text Classification, Supervised Learning, Machine Learning Model, Scikit-Learn.
APA, Harvard, Vancouver, ISO, and other styles
11

Swetha, D. "Detecting Faux Information Using Machine Learning." International Journal of Scientific Development and Research 7, no. 9 (2022): 954–57. https://doi.org/10.5281/zenodo.10442975.

Full text
Abstract:
Fake news is false or misleading information presented as news. Fake news, or fake news websites, has no basis in fact but is presented as being factually accurate. Fake news has also been called junk news, pseudo-news, false news, and hoax news. Recent political events have led to an increase in the popularity and spread of fake news. As demonstrated by the wide effects of the large onset of fake news, humans are inconsistent, if not outright poor, detectors of fake news. Because of this, attempts have been made to automate the process of fake news detection. The most popular of such attempts include "blacklists" of sources and authors that are unreliable. While these tools are useful, in order to produce a more complete end-to-end solution, we need to account for more difficult cases where reliable sources and authors release fake news. As such, the goal of this project was to produce a tool for detecting the language patterns that characterize fake and real news through the use of machine learning and natural language processing techniques. The results of this project demonstrate the capability of machine learning to be useful in this task. We have built a model that catches many intuitive indications of real and fake news, as well as an application that aids in visualizing the classification decision. This project applies NLP (Natural Language Processing) techniques to detecting 'fake news', that is, misleading news stories that come from non-reputable sources, by building a model based on a count vectorizer or a term frequency-inverse document frequency (tfidf) matrix. There is a Kaggle competition called the "Fake News Challenge", and Facebook is employing AI to filter fake news stories out of users' feeds. Combatting fake news is a classic text classification project with a straightforward proposition.
APA, Harvard, Vancouver, ISO, and other styles
12

Itoo, Rayees Ahmad. "Classifying Opinions and Sentiments on Social Networking Sites using Machine Learning Classifiers." International Journal for Research in Applied Science and Engineering Technology 12, no. 2 (2024): 1613–23. http://dx.doi.org/10.22214/ijraset.2024.58664.

Full text
Abstract:
People now publish evaluations on social media for any product, movie, or location they visit as a result of the Web's rapid development. Customers and product owners can both benefit from the reviews posted on social media in order to assess their offerings. Compared to unstructured data, structured data is simpler to analyze. The reviews are mostly available in an unstructured format. Aspect-Based Sentiment Analysis extracts from the reviews the features of a product and then calculates sentiment for each feature. Sentiment analysis, also known as opinion mining, is a natural language processing (NLP) technique used to determine the sentiment or emotional tone expressed in a piece of text, such as a sentence, paragraph, or document. Machine learning classifiers are used to classify sentiments. Machine learning classifiers cannot process raw text, so raw text needs to be converted into vectorized form. Feature extraction techniques are used to convert raw text to numerical form, also called vectorized data. In the present research, four feature extraction techniques with five different machine learning classifiers, namely SVM, Logistic Regression, Naïve Bayes, Random Forest, and KNN, are applied to classify sentiments associated with tweets. Two online Twitter data sets containing tweets about product reviews and tweets about people's thoughts on public policy are selected for experimentation. In the experiments done, it has been found that the SVM classifier using TFIDF and HFE shows better performance as compared to other classifiers. Using the feature sets, 97% accuracy and 98% F1-score is achieved in the aspect category prediction task.
APA, Harvard, Vancouver, ISO, and other styles
13

K., Bhuvaneshwari, S. A. Jyothi Rani Dr., and V. V. Haragopal Dr. "Sentiment Analysis of Tweets on Telangana State Government Flagship Schemes." International Journal of Engineering and Advanced Technology (IJEAT) 12, no. 1 (2022): 23–27. https://doi.org/10.35940/ijeat.A3794.1012122.

Full text
Abstract:
Over the last decade, the usage of social media has evolved to a great extent. Today, social media platforms like Twitter, Facebook, and Snapchat are widely used to capture the opinions of the public about a particular entity. Social media has become a great source of text data. Text analytics plays a crucial role on social media data in answering a wide variety of questions about public feedback on many issues or topics. The primary objective of this work is to analyse the public opinion or sentiment in social media on Telangana state government welfare schemes. The purpose of sentiment analysis is to find opinions in tweets, extract sentiments from them, and find their polarity, i.e., positive, neutral, or negative. Here we use Twitter as it has gained much popularity and media attention. The first step is to extract the tweets on particular schemes through the Twitter API and the Python language, followed by cleaning and pre-processing of the raw tweets. Then the tfidf vectoriser was invoked for feature extraction and creation of a bag of words, and finally sentiment polarity scores were obtained using VADER (Valence Aware Dictionary and sEntiment Reasoner), a lexicon and rule-based sentiment analysis tool.
APA, Harvard, Vancouver, ISO, and other styles
14

Shinde, Anjali, Essa Q. Shahra, Shadi Basurra, Faisal Saeed, Abdulrahman A. AlSewari, and Waheb A. Jabbar. "SMS Scam Detection Application Based on Optical Character Recognition for Image Data Using Unsupervised and Deep Semi-Supervised Learning." Sensors 24, no. 18 (2024): 6084. http://dx.doi.org/10.3390/s24186084.

Full text
Abstract:
The growing problem of unsolicited text messages (smishing) and data irregularities necessitates stronger spam detection solutions. This paper explores the development of a sophisticated model designed to identify smishing messages by understanding the complex relationships among words, images, and context-specific factors, areas that remain underexplored in existing research. To address this, we merge a UCI spam dataset of regular text messages with real-world spam data, leveraging OCR technology for comprehensive analysis. The study employs a combination of traditional machine learning models, including K-means, Non-Negative Matrix Factorization, and Gaussian Mixture Models, along with feature extraction techniques such as TF_IDF and PCA. Additionally, deep learning models like RNN-Flatten, LSTM, and Bi-LSTM are utilized. The selection of these models is driven by their complementary strengths in capturing both the linear and non-linear relationships inherent in smishing messages. Machine learning models are chosen for their efficiency in handling structured text data, while deep learning models are selected for their superior ability to capture sequential dependencies and contextual nuances. The performance of these models is rigorously evaluated using metrics like accuracy, precision, recall, and F1 score, enabling a comparative analysis between the machine learning and deep learning approaches. Notably, the K-means feature extraction with vectorizer achieved 91.01% accuracy, and the KNN-Flatten model reached 94.13% accuracy, emerging as the top performer. The rationale behind highlighting these models is their potential to significantly improve smishing detection rates. For instance, the high accuracy of the KNN-Flatten model suggests its applicability in real-time spam detection systems, but its computational complexity might limit scalability in large-scale deployments. Similarly, while K-means with vectorizer excels in accuracy, it may struggle with the dynamic and evolving nature of smishing attacks, necessitating continual retraining.
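The K-means-with-vectorizer step described above can be sketched as follows with scikit-learn (the toy messages are invented; the paper's OCR pipeline and datasets are not reproduced):

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical SMS texts: two spam-like, two ham-like
messages = [
    "win cash now claim your free prize",
    "free prize winner claim cash today",
    "meeting moved to 3pm see you there",
    "can you pick up milk on the way home",
]

# Vectorize the texts, then cluster the sparse TF-IDF vectors
X = TfidfVectorizer().fit_transform(messages)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
# Messages with heavily shared vocabulary should land in the same cluster.
```

Because K-means is unsupervised, the clusters carry no spam/ham labels by themselves; a labeling step (or the semi-supervised stage the paper describes) is still needed to interpret them.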
APA, Harvard, Vancouver, ISO, and other styles
15

Jiang, Xuehui. "A Sentiment Classification Model of E-Commerce User Comments Based on Improved Particle Swarm Optimization Algorithm and Support Vector Machines." Scientific Programming 2022 (April 1, 2022): 1–9. http://dx.doi.org/10.1155/2022/3330196.

Full text
Abstract:
With the rapid increase of the number of Internet users and the amount of online comment data, a large number of referable information samples are provided for data mining technology. As a technical application of data mining, text sentiment classification can be widely used in public opinion management, marketing, and other fields. In this study, a combination approach to SVM (support vector machine) and IPSO (improved particle swarm optimization) is proposed to classify sentiment by using text data. First, the text data of 30,000 goods reviews and corresponding ratings are collected through the web crawler. Then, TFIDF (term frequency-inverse document frequency) and Word2vec are used to vectorize the goods review text data. Next, the proposed classification model is trained by the SVM, and the initial parameters of the SVM are optimized by the IPSO. Finally, we applied the trained SVM-IPSO model to the test set and evaluated the performance by several measures. Our experiment results indicate that the proposed model performed the best for text data sentiment classification. Additionally, the traditional machine learning model SVM becomes very effective after parameter optimization, which demonstrates that the parameters’ optimization by IPSO has successfully improved the classification accuracy. Furthermore, our proposed model SVM-IPSO significantly outperforms other benchmark models, indicating that it could be applied to improve the accuracy and efficiency for text data sentiment classification.
APA, Harvard, Vancouver, ISO, and other styles
16

Arya, Vishakha, Amit Kumar Mishra Mishra, and Alfonso González-Briones. "Analysis of sentiments on the onset of Covid-19 using Machine Learning Techniques." ADCAIJ: Advances in Distributed Computing and Artificial Intelligence Journal 11, no. 1 (2022): 45–63. http://dx.doi.org/10.14201/adcaij.27348.

Full text
Abstract:
The novel coronavirus (Covid-19) pandemic has struck the whole world and is one of the most striking topics on social media platforms. The sentiment outbreak on social media carries various thoughts, opinions, and emotions about the Covid-19 disease, with users expressing the views they are presently feeling. Analyzing sentiments helps to yield better results. Data were gathered from different platforms like Facebook, Twitter, Weibo, YouTube, Instagram, etc., with Twitter being the largest repository. Videos, text, and audio were also collected from repositories. Sentiment analysis uses opinion mining to acquire the sentiments of users and categorizes them accordingly as positive, negative, or neutral. Analytical and machine learning classification is applied to 3586 tweets collected in different time frames. In this paper, sentiment analysis was performed on tweets accumulated during the Covid-19 pandemic. Tweets are collected from the Twitter database using Hydrator, a web-based application. Data preprocessing removes all the noise and outliers from the raw data. With the Natural Language Toolkit (NLTK), text classification for sentiment analysis calculates the subjective polarity score, counts, and sentiment distribution. N-grams are used in text mining and Natural Language Processing for continuous sequences of words in a text or document, applying uni-grams, bi-grams, and tri-grams for statistical computation. Term frequency-inverse document frequency (TF-IDF) is a feature extraction technique that converts textual data into numeric form. The vectorized data are fed to our model to obtain insights from linguistic data. Linear SVC, MultinomialNB, GBM, and Random Forest classifiers with the Tfidf classification model are applied in our proposed model. Linear Support Vector classification performs better than the other classifiers. Results depict that RF performs better.
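The uni-, bi-, and tri-gram statistics mentioned above can be sketched in plain Python (the sample tweet is invented):

```python
def ngrams(tokens, n):
    """Return the contiguous n-grams of a token sequence as joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "stay home stay safe".split()
unigrams = ngrams(tokens, 1)   # ['stay', 'home', 'stay', 'safe']
bigrams = ngrams(tokens, 2)    # ['stay home', 'home stay', 'stay safe']
trigrams = ngrams(tokens, 3)   # ['stay home stay', 'home stay safe']
```

In a TF-IDF setting these n-grams simply become the vocabulary over which the weights are computed (scikit-learn exposes this as the vectorizer's `ngram_range` parameter).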
APA, Harvard, Vancouver, ISO, and other styles
17

"Identification of Duplication in Questions Posed on Knowledge Sharing Platform Quora using Machine Learning Techniques." International Journal of Innovative Technology and Exploring Engineering 8, no. 12 (2019): 2444–51. http://dx.doi.org/10.35940/ijitee.l3017.1081219.

Full text
Abstract:
Quora, an online question-answering platform, has a lot of duplicate questions, i.e., questions that convey the same meaning. Since it is open to all users, anyone can pose a question any number of times, which increases the count of duplicate questions. This paper uses a dataset comprising question pairs (taken from the Quora website) in different columns, with an indication of whether the pair of questions are duplicates or not. Traditional comparison methods like Sequence matcher perform a letter-by-letter comparison without understanding the contextual information, hence they give lower accuracy. Machine learning methods predict the similarity using features extracted from the context. Both the traditional methods as well as the machine learning methods were compared in this study. The features for the machine learning methods are extracted using the Bag of Words models: Count-Vectorizer and TFIDF-Vectorizer. Among the traditional comparison methods, Sequence matcher gave the highest accuracy of 65.29%. Among the machine learning methods, XGBoost gave the highest accuracy: 80.89% when Count-Vectorizer is used and 80.12% when TFIDF-Vectorizer is used.
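The traditional baseline compared here, the Sequence matcher, is available in Python's standard library as `difflib.SequenceMatcher`. A minimal sketch (the question pairs are invented):

```python
from difflib import SequenceMatcher

def similarity(q1, q2):
    """Letter-by-letter similarity ratio in [0, 1]."""
    return SequenceMatcher(None, q1, q2).ratio()

same = similarity("How do I learn Python?", "How do I learn Python?")
close = similarity("How do I learn Python?",
                   "What is the best way to learn Python?")
far = similarity("How do I learn Python?", "Why is the sky blue?")
```

Because the ratio is computed over characters rather than meaning, paraphrased duplicates can score low while superficially similar non-duplicates score high, which is exactly the weakness the context-based machine learning features address.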
APA, Harvard, Vancouver, ISO, and other styles
18

Venkatramulu, S., Md Sharfuddin Waseem, Arshiya Taneem, Sri Yashaswini Thoutam, Snigdha Apuri, and Nachiketh Nachiketh. "Research on SQL Injection Attacks using Word Embedding Techniques and Machine Learning." Journal of Sensors, IoT & Health Sciences, March 31, 2024. http://dx.doi.org/10.69996/jsihs.2024005.

Full text
Abstract:
Most of the damage done by web application attacks comes from SQL injection attacks, in which the attacker(s) can change, remove, and read data from the database servers. All three tenets of security—confidentiality, integrity, and availability—are vulnerable to a SQL injection attack. Database management systems receive their queries in the form of SQL (structured query language). SQL injection is not a new field of study, but it is still important to detect and prevent such attacks. A method of SQL injection detection based on machine learning is proposed. Feature extraction is performed, followed by various word embedding techniques like the count vectorizer and TFIDF vectorizer to process the text data and effectively represent the SQLI features. Classification algorithms like Logistic Regression and SVM and ensemble techniques like XGBoost are employed. Our goal in this systematic review is to find a better machine learning model to detect SQL injection attacks by implementing different word embedding techniques. The accuracy and F1-score of the machine learning algorithms in predicting SQLI queries have been calculated and reported in this research paper.
APA, Harvard, Vancouver, ISO, and other styles
19

Fatima, Rubab, Mian Muhammad Sadiq Fareed, Saleem Ullah, Gulnaz Ahmad, and Saqib Mahmood. "An Optimized Approach for Detection and Classification of Spam Email’s Using Ensemble Methods." Wireless Personal Communications, November 13, 2024. http://dx.doi.org/10.1007/s11277-024-11628-9.

Full text
Abstract:
Since the advent of email services, spam emails have been a major concern, because users' security depends on the classification of emails as ham or spam. Spam is a malware attack vector that has been used for spear phishing, whaling, clone phishing, website forgery, and other harmful activities. However, the various ensemble Machine Learning (ML) algorithms used for the detection and filtering of spam emails have been less explored. In this research, we offer an optimized ML-based algorithm for detecting spam emails, enhanced using hyper-parameter tuning approaches. The proposed approach uses two feature extraction modules, namely Count-Vectorizer and TFIDF-Vectorizer, which provide the most effective classification results when applied to three different publicly available email data sets: Ling Spam, UCI SMS Spam, and the proposed dataset. Moreover, to extend the performance of the classifiers we used various ML methods such as Naive Bayes (NB), Logistic Regression (LR), Extra Tree, Stochastic Gradient Descent (SGD), XG-Boost, Support Vector Machine (SVM), Random Forest (RF), and Multi-layer Perceptron (MLP), together with parameter optimization approaches such as manual search, random search, grid search, and a genetic algorithm. For all three data sets, SGD outperformed the other algorithms. All of the other ensembles (Extra Tree, RF), linear models (LR, Linear-SVC), and MLP performed admirably, with relatively high precision, recall, accuracy, and F1-score.
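Of the parameter-optimization approaches the abstract lists, grid search is the simplest to sketch. Below, `evaluate` is a hypothetical stand-in for a cross-validated score (in the paper it would be something like SGD accuracy on a spam dataset); the toy scoring surface and the `alpha`/`loss` parameter names are my own illustrative assumptions, not taken from the paper:

```python
import itertools

def evaluate(params):
    # Hypothetical scoring function standing in for cross-validated accuracy.
    # Peaks at alpha=1e-4 with hinge loss, purely for illustration.
    alpha, loss = params["alpha"], params["loss"]
    return 1.0 - abs(alpha - 1e-4) * 1000 - (0.05 if loss != "hinge" else 0.0)

grid = {"alpha": [1e-5, 1e-4, 1e-3], "loss": ["hinge", "log_loss"]}

def grid_search(grid):
    """Exhaustively score every parameter combination; return the best."""
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

best_params, best_score = grid_search(grid)
```

Random search replaces the exhaustive product with a fixed number of random draws from the same grid; a genetic algorithm instead evolves a population of parameter sets by mutation and crossover.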
APA, Harvard, Vancouver, ISO, and other styles
20

Sahu, Laxminarayan, and Bhavana Narain. "FAKE NEWS DETECTION USING MACHINE LEARNING MULTI-MODEL METHOD." ShodhKosh: Journal of Visual and Performing Arts 5, no. 2 (2024). http://dx.doi.org/10.29121/shodhkosh.v5.i2.2024.1811.

Full text
Abstract:
A news article that originates from an unverified source, such as a WhatsApp forward, is known as fake news. Fake news is becoming more and more prevalent on social media and other platforms, and this is a serious worry since it has the potential to have devastating effects on society and the country. This is why a lot of research has already been done on its detection. This study uses supervised machine learning techniques to develop a working model of a fake news detection system through research and implementation. In brief, this work uses a Naive Bayes classifier to build a model that can identify fake news by measuring its words and phrases against a set of criteria, applying techniques like a count vectorizer (word tallies) or a TFIDF (Term Frequency-Inverse Document Frequency) matrix to categorize news as real or false. It is highly likely that two documents with comparable word counts have entirely different meanings.
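The Naive Bayes classifier this abstract describes — word tallies plus class priors — fits in a short stdlib-only sketch. The four toy headlines and their labels are invented for illustration; the paper's own datasets and criteria are not reproduced here:

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Multinomial Naive Bayes over raw word counts (count-vectorizer style)."""
    word_counts = defaultdict(Counter)  # per-class word tallies
    class_counts = Counter(labels)      # class priors come from these
    for doc, y in zip(docs, labels):
        word_counts[y].update(doc.lower().split())
    vocab = {w for c in word_counts.values() for w in c}
    return word_counts, class_counts, vocab

def predict_nb(model, doc):
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    best, best_lp = None, float("-inf")
    for y in class_counts:
        lp = math.log(class_counts[y] / total)          # log prior
        denom = sum(word_counts[y].values()) + len(vocab)
        for w in doc.lower().split():
            # Laplace (add-one) smoothing so unseen words don't zero out a class.
            lp += math.log((word_counts[y][w] + 1) / denom)
        if lp > best_lp:
            best, best_lp = y, lp
    return best

docs = ["shocking miracle cure discovered",
        "officials confirm budget report",
        "miracle weight loss secret shocking",
        "government publishes official report"]
labels = ["fake", "real", "fake", "real"]
model = train_nb(docs, labels)
```

Swapping raw counts for TF-IDF weights — the alternative the abstract mentions — would change only the feature values, not the classifier's structure.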
APA, Harvard, Vancouver, ISO, and other styles
21

Kavitha I., Arshad Ahamed M., Deral Akshan A., Gokul S., and Kogul M. "Discerning Truth: Leveraging Naïve Bayes for Fake News Detection." International Journal For Multidisciplinary Research 6, no. 2 (2024). http://dx.doi.org/10.36948/ijfmr.2024.v06i02.18312.

Full text
Abstract:
These days people learn about news, current affairs, and political affairs through social media, so the main aim is to verify the truthfulness and originality of that news. This kind of news spreading poses a serious threat to social cohesiveness and well-being, since it fosters political polarization and mistrust among people. Producers of false news use elaborate, colorful traps to further the success of their fabrications, one of which is to incite the readers' emotions. The data-science community has responded by adopting measures to address the issue. Hence, by utilizing a machine learning algorithm, we aim to build a model that separates genuine and fake news. The system applies NLP (Natural Language Processing) techniques to recognize real-time 'fake news', i.e. misleading stories that come from untrustworthy sources. By performing sentiment analysis, the model is trained to characterize the opinions, feelings, and attitude in a news corpus. In this framework we used TextBlob, one of the effective Python libraries for performing sentiment analysis. Our model is grounded on a TFIDF vectorizer (Term Frequency-Inverse Document Frequency). We accumulated our datasets from Facebook, Instagram, Telegram, Twitter, and various other social media, and also drew some datasets from Kaggle for testing and training our system. To offer a model that classifies a piece of writing as false or genuine based on its words and expressions, the proposed method involves gathering a dataset of both fake and genuine news and using a Naïve Bayes classifier. For visualization we used Tableau, which blends every kind of data to help create appealing visualizations.
APA, Harvard, Vancouver, ISO, and other styles
22

"Interrogation of Sentiment Perusing with Hash Counting Vectorizer and Term Inverse Frequency Transformer using Machine Learning Classification." International Journal of Recent Technology and Engineering 8, no. 4 (2019): 3895–901. http://dx.doi.org/10.35940/ijrte.d8303.118419.

Full text
Abstract:
With fast-growing technology, businesses are moving towards increasing their profit by interpreting customer satisfaction. Customer satisfaction can be analyzed in many ways, and it is the responsibility of the business to analyze it in order to improve turnover and profit. In the current trend, customers give their feedback through mobile devices and the internet. With this overview, this paper attempts to analyze the sentiment of customer feedback for movies. The Sentiment Analysis on Movie Review dataset from the Kaggle machine learning repository is used for implementation. The sentiment classes are predicted in the following ways. First, the sentiment count for each class is displayed and the top feature words for each sentiment class are extracted from the dataset. Second, the dataset is sampled with the counting vectorizer and then fitted with classifiers: Logistic Regression, Linear SVM, Multinomial Naive Bayes, Gradient Boosting, Gaussian Naive Bayes, Random Forest, Decision Tree, and Extra Tree. Third, the dataset is sampled with the hashing vectorizer and then fitted with the above classifiers. Fourth, the dataset is sampled with the TFIDF vectorizer and then fitted with the above classifiers. Fifth, the dataset is sampled with the TFIDF transformer and then fitted with the above classifiers. Sixth, performance analysis of the classifiers is done by examining metrics such as precision, recall, F-score, and accuracy. The implementation is carried out using Python code in the Spyder IPython console of Anaconda Navigator.
Experimental results show that the sentiment analysis done by the Random Forest classifier is the most effective, with an accuracy of 89% for the counting vectorizer and the TFIDF transformer, 87% for the hashing vectorizer, and 88% for the TFIDF vectorizer.
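The hashing vectorizer this abstract compares against the counting and TFIDF vectorizers can be sketched in a few lines (my own illustration, not from the paper). The key idea is that tokens are mapped straight to fixed-size vector slots by a hash function, so no vocabulary is stored; the 31-based rolling hash below is a deterministic stand-in for the hash a real implementation such as scikit-learn's `HashingVectorizer` uses:

```python
def hashing_vectorize(doc, n_features=16):
    """Map a document to a fixed-length count vector via the hashing trick."""
    vec = [0] * n_features
    for token in doc.lower().split():
        # Python's built-in hash() is salted per process for strings,
        # so use a stable polynomial hash instead.
        h = sum(ord(c) * 31 ** i for i, c in enumerate(token))
        vec[h % n_features] += 1  # distinct tokens may collide into one slot
    return vec
```

The trade-off is the one the paper's accuracy figures hint at: constant memory and no fit step, but hash collisions merge unrelated tokens, which can cost a little accuracy relative to the TFIDF vectorizer.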
APA, Harvard, Vancouver, ISO, and other styles