Academic literature on the topic 'Skip-gram model'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Skip-gram model.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Skip-gram model"

1

Wang, Xiaojie, Haijun Zhao, and Huayue Chen. "Improved Skip-Gram Based on Graph Structure Information." Sensors 23, no. 14 (2023): 6527. http://dx.doi.org/10.3390/s23146527.

Abstract:
Applying Skip-gram to graph representation learning has become a widely researched topic in recent years. Prior works usually focus on migrating the Skip-gram model to new applications, while Skip-gram itself, initially devised for word embedding, is left insufficiently explored in graph representation learning. To compensate for this shortcoming, we analyze the difference between word embedding and graph embedding and reveal the principle of graph representation learning through a case study that explains the essential idea of graph embedding intuitively. Building on this case study and an in-depth understanding of graph embeddings, we propose Graph Skip-gram, an extension of the Skip-gram model that uses graph structure information. Graph Skip-gram can be combined with a variety of algorithms for excellent adaptability. Inspired by word embeddings in natural language processing, we design a novel feature fusion algorithm to fuse node vectors based on node vector similarity. We fully articulate the ideas of our approach on a small network and provide extensive experimental comparisons, including multiple classification tasks and link prediction tasks, demonstrating that our proposed approach is more applicable to graph representation learning.
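For orientation, the standard route by which Skip-gram reaches graphs, and the baseline that extensions like Graph Skip-gram build on, is the DeepWalk recipe: generate random walks over the graph and feed them to Skip-gram as if they were sentences. A minimal sketch under that assumption, using a toy graph and the gensim and networkx libraries (not the authors' implementation):

```python
import random

import networkx as nx
from gensim.models import Word2Vec

def random_walks(graph, num_walks=10, walk_len=20, seed=42):
    """Uniform random walks; each walk plays the role of a 'sentence' of node ids."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        for start in graph.nodes():
            walk = [start]
            while len(walk) < walk_len:
                neighbors = list(graph.neighbors(walk[-1]))
                if not neighbors:
                    break
                walk.append(rng.choice(neighbors))
            walks.append([str(node) for node in walk])
    return walks

G = nx.karate_club_graph()                       # toy stand-in for a real network
model = Word2Vec(random_walks(G), vector_size=64, window=5,
                 sg=1, negative=5, min_count=1, epochs=5)
print(model.wv.most_similar("0"))                # nodes embedded like words
```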
2

Chai, Bianfang, Xinyu Ji, Jianglin Guo, Lixiao Ma, and Yibo Zheng. "A Generative Model for Topic Discovery and Polysemy Embeddings on Directed Attributed Networks." Symmetry 14, no. 4 (2022): 703. http://dx.doi.org/10.3390/sym14040703.

Abstract:
Combining topic discovery with topic-specific word embeddings is a popular, powerful method for text mining in small collections of documents. However, existing research models only the contents of documents, which leads to the discovery of noisy topics. This paper proposes a generative model, the skip-gram topical word-embedding model (abbreviated steoLC), on asymmetric document link networks, where nodes correspond to documents and links refer to directed references between documents. It simultaneously improves the performance of topic discovery and polysemous word embeddings. Each skip-gram in a document is generated based on the topic distribution of the document and the two word embeddings in the skip-gram. Each directed link is generated based on the hidden topic distribution of its source document node. For a given document, the skip-grams and links share a common topic distribution. Parameter estimation is derived, and an algorithm is designed to learn the model parameters by combining the expectation-maximization (EM) algorithm with negative sampling. Experimental results show that our method generates more useful topic-specific word embeddings and more coherent latent topics than state-of-the-art models.
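For reference, the negative-sampling building block that steoLC couples with EM is the standard skip-gram with negative sampling (SGNS) objective of Mikolov et al. (2013); the topic-conditioning described above is the paper's addition and is not shown here. For a center word w_I, an observed context word w_O, and k negatives drawn from a noise distribution P_n(w), the per-pair objective is

```latex
\log \sigma\left( {\mathbf{v}'_{w_O}}^{\top} \mathbf{v}_{w_I} \right)
  + \sum_{i=1}^{k} \mathbb{E}_{w_i \sim P_n(w)}
    \left[ \log \sigma\left( -{\mathbf{v}'_{w_i}}^{\top} \mathbf{v}_{w_I} \right) \right]
```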
3

Celik, Enes, and Sevinc Ilhan Omurca. "Skip-Gram and Transformer Model for Session-Based Recommendation." Applied Sciences 14, no. 14 (2024): 6353. http://dx.doi.org/10.3390/app14146353.

Abstract:
Session-based recommendation uses past clicks and interaction sequences from anonymous users to predict the next item most likely to be clicked. Predicting the user’s subsequent behavior in online transactions becomes a problem mainly due to the lack of user information and limited behavioral information. Existing methods, such as recurrent neural network (RNN)-based models that model user’s past behavior sequences and graph neural network (GNN)-based models that capture potential relationships between items, miss different time intervals in the past behavior sequence and can only capture certain types of user interest patterns due to the characteristics of neural networks. Graphic models created to improve the current session reduce the model’s success due to the addition of irrelevant items. Moreover, attention mechanisms in recent approaches have been insufficient due to weak representations of users and products. In this study, we propose a model based on the combination of skip-gram and transformer (SkipGT) to solve the above-mentioned drawbacks in session-based recommendation systems. In the proposed method, skip-gram both captures chained user interest in the session thread through item-specific subreddits and learns complex interaction information between items. The proposed method captures short-term and long-term preference representations to predict the next click with the help of a transformer. The transformer in our proposed model overcomes many limitations in turn-based models and models longer contextual connections between items more effectively. In our proposed model, by giving the transformer trained item embeddings from the skip-gram model as input, the transformer has better performance because it does not learn item representations from scratch. By conducting extensive experiments with three real-world datasets, we confirm that SkipGT significantly outperforms state-of-the-art solutions with an average MRR score of 5.58%.
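The pre-training step described above, learning item embeddings with skip-gram before handing them to the transformer, can be approximated by treating each session as a sentence and each item as a word (the item2vec idea). A hedged sketch with hypothetical session data, assuming gensim; the SkipGT architecture itself is not reproduced here:

```python
from gensim.models import Word2Vec

sessions = [                                    # hypothetical anonymous click sequences
    ["item42", "item7", "item13"],
    ["item7", "item13", "item99"],
    ["item42", "item99", "item7"],
]
item2vec = Word2Vec(sessions, vector_size=32, window=3, sg=1,
                    negative=5, min_count=1, epochs=20)

# item2vec.wv.vectors could then initialize a transformer's item-embedding table,
# so the transformer does not have to learn item representations from scratch.
print(item2vec.wv.most_similar("item7"))
```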
4

Joko, Hideaki, Yoshitatsu Matsuda, and Kazunori Yamaguchi. "Automatic Synonym Acquisition Using a Context-Restricted Skip-gram Model." Journal of Natural Language Processing 24, no. 2 (2017): 187–204. http://dx.doi.org/10.5715/jnlp.24.187.

5

Yajni, Archit, and Sabu Lama Tamang. "Chunker Based Sentiment Analysis and Tense Classification for Nepali Text." International Journal on Natural Language Computing 12, no. 6 (2023): 1–14. http://dx.doi.org/10.5121/ijnlc.2023.12601.

Abstract:
The article presents Sentiment Analysis (SA) and tense classification for Nepali text, using the Skip-gram model for word-to-vector encoding. The experiment on SA for positive-negative classification is carried out in two ways. In the first experiment, the vector representation of each sentence is generated using the Skip-gram model followed by Multi-Layer Perceptron (MLP) classification, and an F1 score of 0.6486 is achieved for positive-negative classification with an overall accuracy of 68%. In the second experiment, verb chunks are extracted using a Nepali parser and the same experiment is carried out on the verb chunks; an F1 score of 0.6779 is observed for positive-negative classification with an overall accuracy of 85%. Hence, chunker-based sentiment analysis proves better than sentiment analysis on whole sentences. The paper also proposes using a skip-gram model to identify the tenses of Nepali sentences and verbs. In the third experiment, the vector representation of each verb chunk is generated using the Skip-gram model followed by MLP classification, and the verb chunks yield a very low overall accuracy of 53%. The fourth experiment, tense classification using whole sentences, results in improved performance with an overall accuracy of 89%; past tenses are identified and classified more accurately than other tenses. Hence, sentence-based tense classification proves better than verb-chunk-based tense classification.
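The first experiment's pipeline, skip-gram vectors averaged into a sentence vector and then fed to an MLP, is easy to sketch. The toy English tokens and labels below are stand-ins for the Nepali data, and gensim and scikit-learn are assumed; this illustrates the pipeline shape, not the paper's code:

```python
import numpy as np
from gensim.models import Word2Vec
from sklearn.neural_network import MLPClassifier

corpus = [["good", "movie"], ["bad", "plot"], ["great", "acting"], ["poor", "movie"]]
labels = [1, 0, 1, 0]                            # hypothetical positive/negative tags

w2v = Word2Vec(corpus, vector_size=16, window=2, sg=1, min_count=1, epochs=50)

def sentence_vector(tokens, model):
    """Average the skip-gram vectors of the tokens found in the vocabulary."""
    vectors = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vectors, axis=0) if vectors else np.zeros(model.wv.vector_size)

X = np.stack([sentence_vector(s, w2v) for s in corpus])
clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
clf.fit(X, labels)
print(clf.predict(X))
```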
6

Hsieh, I.-Chung, and Cheng-Te Li. "Toward an Adaptive Skip-Gram Model for Network Representation Learning." IEEE Access 10 (2022): 37506–14. http://dx.doi.org/10.1109/access.2022.3164670.

7

Tang, Yachun. "Research on Word Vector Training Method Based on Improved Skip-Gram Algorithm." Advances in Multimedia 2022 (February 27, 2022): 1–8. http://dx.doi.org/10.1155/2022/4414207.

Abstract:
Effective word vector training methods yield semantically rich word vectors that achieve better results on the same tasks. In view of the shortcomings of the traditional skip-gram model in encoding and modeling context words, this study proposes an improved word vector training method based on the skip-gram algorithm. Building on an analysis of the existing skip-gram model, the distributional hypothesis is introduced: the distribution of a word across its contexts is taken as the representation of the word, the word is placed into a semantic space, and it is then modeled there, which allows better modeling through the smoothing of words within that semantic space. In the training process, stochastic gradient descent is used to solve for the vector representation of each word and each Chinese character. The proposed training method is compared with skip-gram, CWE+P, and SEING on a word-sense similarity task and a text classification task. Experimental results show that the proposed method has significant advantages in the Chinese word segmentation task, with a performance gain of about 30%. The method proposed in this study provides a reference for further study of word vectors and text mining.
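To make the training step concrete: the vanilla skip-gram update that stochastic gradient descent performs, with negative sampling, is shown below as a bare numpy sketch. This is the standard algorithm, not the paper's distribution-hypothesis variant:

```python
import numpy as np

rng = np.random.default_rng(0)
V, D = 1000, 50                       # vocabulary size, embedding dimension
W_in = rng.normal(0, 0.1, (V, D))     # input (center-word) vectors
W_out = rng.normal(0, 0.1, (V, D))    # output (context-word) vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_step(center, context, negatives, lr=0.025):
    """One SGD update: pull the true (center, context) pair together, push negatives apart."""
    v = W_in[center]
    grad_v = np.zeros(D)
    for w, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
        u = W_out[w]
        g = sigmoid(v @ u) - label    # gradient of the logistic loss w.r.t. the score
        grad_v += g * u
        W_out[w] -= lr * g * v
    W_in[center] -= lr * grad_v

sgns_step(center=3, context=17, negatives=rng.integers(0, V, size=5))
```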
8

Khomsah, Siti. "Sentiment Analysis On YouTube Comments Using Word2Vec and Random Forest." Telematika 18, no. 1 (2021): 61. http://dx.doi.org/10.31315/telematika.v18i1.4493.

Abstract:
Purpose: This study aims to determine the accuracy of sentiment classification using Random Forest with Word2Vec Skip-gram feature extraction. Word2Vec is an effective method for representing aspects of word meaning, and it helps to improve sentiment classification accuracy.
Methodology: The research data consist of 31,947 comments downloaded from the YouTube channel for the 2019 presidential election debate: 23,612 positive comments and 8,335 negative comments. To avoid bias, we balance the amounts of positive and negative data using oversampling. We use Skip-gram to extract word features; the Skip-gram produces features around the context (input) word, each carrying a weight, and the feature weight of each comment is calculated by an average-based approach. Random Forest is used to build the sentiment classification model. Experiments were carried out several times with different epoch and window parameters, and the performance of each model was measured by cross-validation.
Result: In experiments using 1, 5, and 20 epochs and window sizes of 3, 5, and 10, the average model accuracy ranges from 90.1% to 91%, while test accuracy reaches between 88.77% and 89.05%; the gap between the two is small and not significant. For future experiments, it is recommended to use more than twenty epochs and a window size greater than ten, so that accuracy increases significantly.
Value: The number of epochs and the window size of the Skip-gram affect accuracy; larger values tend to increase it.
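The experiment grid the abstract describes, retraining skip-gram under different epoch/window settings and scoring a Random Forest by cross-validation, looks roughly like the sketch below. The tokenized comments and labels are hypothetical placeholders; gensim and scikit-learn are assumed:

```python
import numpy as np
from gensim.models import Word2Vec
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

comments = [["debat", "bagus"], ["komentar", "buruk"]] * 20   # placeholder tokens
labels = np.array([1, 0] * 20)                                # placeholder sentiment

def average_vectors(model, docs):
    """Average-based comment vectors, as in the abstract."""
    return np.stack([
        np.mean([model.wv[t] for t in doc if t in model.wv]
                or [np.zeros(model.wv.vector_size)], axis=0)
        for doc in docs
    ])

for epochs in (1, 5, 20):
    for window in (3, 5, 10):
        w2v = Word2Vec(comments, vector_size=32, sg=1, window=window,
                       min_count=1, epochs=epochs)
        X = average_vectors(w2v, comments)
        acc = cross_val_score(RandomForestClassifier(random_state=0), X, labels, cv=5).mean()
        print(f"epochs={epochs} window={window} accuracy={acc:.3f}")
```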
9

Santos, Flávio Arthur O., Thiago Dias Bispo, Hendrik Teixeira Macedo, and Cleber Zanchettin. "Morphological Skip-Gram: Replacing FastText characters n-gram with morphological knowledge." Inteligencia Artificial 24, no. 67 (2021): 1–17. http://dx.doi.org/10.4114/intartif.vol24iss67pp1-17.

Abstract:
Natural language processing systems have attracted much interest from industry. This branch of study comprises applications such as machine translation, sentiment analysis, named entity recognition, question answering, and others. Word embeddings (i.e., continuous word representations) are an essential module for those applications, generally used as the word representation fed to machine learning models. Popular methods to train word embeddings include GloVe and Word2Vec. They achieve good word representations despite two limitations: both ignore the morphological information of words and consider only one representation vector per word. This means the word embeddings do not properly account for different word contexts and are unaware of a word's inner structure. To mitigate this problem, the FastText method represents each word as a bag of character n-grams: a continuous vector describes each n-gram, and the final word representation is the sum of its character n-gram vectors. Nevertheless, using all character n-grams of a word is a poor approach, since some n-grams have no semantic relation with their words; this increases the amount of potentially useless information and also increases training time. In this work, we propose a new method for training word embeddings whose goal is to replace FastText's bag of character n-grams with a bag of word morphemes obtained through morphological analysis of the word. Thus, words with similar contexts and morphemes are represented by vectors close to each other. To evaluate our new approach, we performed intrinsic evaluations on 15 different tasks, and the results show competitive performance compared to FastText. Moreover, the proposed model is 40% faster than FastText in the training phase. We also outperform the baseline approaches in extrinsic evaluations on hate speech detection and NER tasks in different scenarios.
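To make concrete what the proposed method replaces: FastText represents a word by the character n-grams (by default of length 3 to 6) of the word wrapped in boundary markers, plus the word itself. A small self-contained function reproducing that scheme:

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams FastText would use for `word` (the whole word is kept too)."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(wrapped[i:i + n] for i in range(len(wrapped) - n + 1))
    return grams

print(char_ngrams("reading"))
# ['<re', 'rea', 'ead', ..., 'ading>'] -- the Morphological Skip-Gram would instead
# use morphemes such as ['read', 'ing'] (illustrative segmentation).
```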

Dissertations / Theses on the topic "Skip-gram model"

1

Lopes, Evandro Dalbem. "Utilização do modelo skip-gram para representação distribuída de palavras no projeto Media Cloud Brasil." Repositório Institucional do FGV, 2015. http://hdl.handle.net/10438/16685.

Abstract:
There is a representation problem when working with natural language processing: because the traditional bag-of-words model represents documents and words in a single matrix, that matrix tends to be extremely sparse. To deal with this problem, methods have emerged that are capable of representing words in a distributed way, in a smaller and more compact space, with the added property of relating words semantically. The aim of this work is to apply the skip-gram model to a dataset obtained through the Media Cloud Brasil project, in order to explore relations and search for patterns that help in understanding the content.
2

Moreno Pérez, Carlos. "Text Mining in Macroeconomics and Finance Using Unsupervised Machine Learning Algorithms." Doctoral thesis, 2021. http://hdl.handle.net/11562/1042759.

Abstract:
This thesis presents three different applications of text mining techniques based on unsupervised machine learning algorithms to macroeconomics and finance. In particular, these techniques are applied to official documents of central banks and to newspaper articles written in English and Spanish. Implementing them involved considerable preprocessing to remove paragraphs and articles not relevant to the analysis. For the official documents of the central banks, we also tagged each paragraph with its date and other useful information. We then applied unsupervised machine learning algorithms such as Latent Dirichlet Allocation (LDA), word embeddings (with the Skip-gram model), and K-Means to construct text measures. Some of these algorithms, already available for English, were adapted to Spanish. We produced simple measures identifying the topics, that is, the themes or subjects, and the tone, that is, the sentiment or degree of uncertainty, of the text. Finally, we investigated the relationship between these uncertainty indices and key macroeconomic and financial variables using structural VAR and exponential GARCH models. The first paper investigates the relationship between the views expressed in the minutes of the meetings of the Central Bank of Brazil's Monetary Policy Committee (COPOM) and the real economy. First, we infer the content of the paragraphs of the minutes with LDA, and then we build an uncertainty index for the minutes with word embeddings and K-Means. We thus create two topic-uncertainty indices: the first is constructed from paragraphs with a higher probability of topics related to "general economic conditions", whereas the second is constructed from paragraphs with a higher probability of topics related to "inflation" and the "monetary policy discussion". Finally, via a structural VAR, we explore the lasting effects of these uncertainty indices on Brazilian macroeconomic variables. The second paper studies and measures uncertainty in the minutes of the meetings of the board of governors of the Central Bank of Mexico and relates it to monetary policy variables. In particular, we conceive two uncertainty indices for the Spanish version of the minutes using unsupervised machine learning techniques: the first exploits LDA, whereas the second uses word embeddings (with the Skip-gram model) and K-Means. We also create uncertainty indices for the three main sections of the minutes. We find that higher uncertainty in the minutes is related to an increase in inflation and the money supply. The third paper investigates the reactions of US financial markets to newspaper news from January 2019 to the first of May 2020. To this end, we deduce the content and sentiment of the news by developing indices from the headlines and snippets of the New York Times. In particular, we use LDA to infer the content of the articles, and word embeddings and K-Means to measure their sentiment (uncertainty). In this way, we arrive at a set of daily topic-specific uncertainty indices. These indices are then used to explain the behaviour of the US financial markets by implementing a batch of EGARCH models. In substance, we find that two topic-specific uncertainty indices, one related to COVID-19 news and the other to trade war news, explain much of the movement in the financial markets from the beginning of 2019 through the first four months of 2020.
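Purely as an illustration of the mechanics, one common way to combine Skip-gram vectors with K-Means into a document-level uncertainty index is to cluster the word vectors, identify the cluster containing seed words such as 'uncertain', and score each paragraph by the share of its words falling in that cluster. The sketch below follows that generic recipe with toy data and assumes gensim and scikit-learn; it is not necessarily the thesis's exact construction:

```python
import numpy as np
from gensim.models import Word2Vec
from sklearn.cluster import KMeans

paragraphs = [["inflation", "may", "remain", "uncertain"],   # toy minutes paragraphs
              ["output", "grew", "steadily"]]
w2v = Word2Vec(paragraphs, vector_size=16, sg=1, min_count=1, epochs=100)

words = list(w2v.wv.index_to_key)
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit(w2v.wv[words])
label = dict(zip(words, clusters.labels_))
uncertainty_cluster = label["uncertain"]          # seed word selects the cluster

index = [np.mean([label[w] == uncertainty_cluster for w in p]) for p in paragraphs]
print(index)                                      # per-paragraph uncertainty share
```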

Book chapters on the topic "Skip-gram model"

1

Preethi Krishna, P., and A. Sharada. "Word Embeddings - Skip Gram Model." In ICICCT 2019 – System Reliability, Quality Control, Safety, Maintenance and Management. Springer Singapore, 2019. http://dx.doi.org/10.1007/978-981-13-8461-5_15.

2

Mu, Cun (Matthew), Guang Yang, and Yan (John) Zheng. "Revisiting Skip-Gram Negative Sampling Model with Rectification." In Advances in Intelligent Systems and Computing. Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-22871-2_33.

3

Ren, Dedong, and Yong Liu. "SkipCas: Information Diffusion Prediction Model Based on Skip-Gram." In Machine Learning and Knowledge Discovery in Databases. Springer International Publishing, 2023. http://dx.doi.org/10.1007/978-3-031-26390-3_16.

4

Law, Jarvan, Hankz Hankui Zhuo, JunHua He, and Erhu Rong. "LTSG: Latent Topical Skip-Gram for Mutually Improving Topic Model and Vector Representations." In Pattern Recognition and Computer Vision. Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-030-03338-5_32.

5

Röchert, Daniel, German Neubaum, and Stefan Stieglitz. "Identifying Political Sentiments on YouTube: A Systematic Comparison Regarding the Accuracy of Recurrent Neural Network and Machine Learning Models." In Disinformation in Open Online Media. Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-61841-4_8.

Abstract:
Since social media have increasingly become forums for exchanging personal opinions, more and more approaches have been suggested to analyze those sentiments automatically. Neural networks and traditional machine learning methods allow individual adaptation through training on the data, tailoring the algorithm to the particular topic under discussion. Still, a great number of methodological combinations involving algorithms (e.g., recurrent neural networks (RNN)), techniques (e.g., word2vec), and methods (e.g., Skip-gram) are possible. This work offers a systematic comparison of sentiment-analysis approaches using different word embeddings with RNN architectures and traditional machine learning techniques. Using German comments on controversial political discussions on YouTube, this study applies metrics such as F1 score, precision, and recall to compare the performance of the different approaches. First results show that, on small datasets, deep neural networks outperform traditional machine learning models with word embeddings in multiclass prediction.
6

Grover, Madhur. "Text Emotion Categorization Using a Convolutional Recurrent Neural Network Enhanced by an Attention Mechanism-based Skip Gram Method." In Demystifying Emerging Trends in Machine Learning. BENTHAM SCIENCE PUBLISHERS, 2025. https://doi.org/10.2174/9789815305395125020032.

Abstract:
Text-based web archives have become increasingly common as technology has advanced. For many text classification applications, classic machine learning classifiers like support vector machines (SVMs) and naïve Bayes (NBayes) perform well. However, because short texts contain few words, these classifiers suffer from sparsity, and convolutional and pooling layers are limited in their ability to capture long-term dependencies. In this study, we present a convolutional recurrent neural network architecture that makes use of a modified skip-gram method. For the adversarial training of the skip-gram algorithm, we employ the L2 regularization technique, which boosts the model's performance in text sentiment classification tasks and increases its robustness and generalizability. To extract information from the entire text while dampening the influence of irrelevant words, we deploy a convolutional neural network equipped with attention mechanisms, which performs the categorization of text emotion. Compared to other classifiers applied to the Twitter dataset, our model and algorithm proved more efficient and accurate.
7

Ren, Taiyong. "Research on Chinese Spam SMS Classification Based on ECA-TextCNN Model." In Frontiers in Artificial Intelligence and Applications. IOS Press, 2024. http://dx.doi.org/10.3233/faia231403.

Abstract:
In response to the shortcomings of traditional Chinese spam SMS classification models, which fail to comprehensively extract text features or obtain representative ones, we propose a text classification model based on the ECA-TextCNN model, combining an efficient channel attention mechanism with the TextCNN classification model. First, the model uses a skip-gram pre-trained model for text representation. Second, it employs convolutional kernels of different sizes for feature extraction. ECA-Net is then used to assign a corresponding weight to each channel feature. Finally, the feature vectors are fed into the softmax layer to obtain the classification results. The experiments are conducted on the publicly available Message80W Chinese SMS dataset. The results show that, compared to the baseline model, the text classification model based on ECA-TextCNN exhibits varying degrees of improvement in classification accuracy, precision, recall, and F1 score.
8

Arguello Casteleiro, Mercedes, Diego Maseda Fernandez, George Demetriou, et al. "A Case Study on Sepsis Using PubMed and Deep Learning for Ontology Learning." In Studies in Health Technology and Informatics. IOS Press, 2017. https://doi.org/10.3233/978-1-61499-753-5-516.

Abstract:
We investigate the application of distributional semantics models to facilitate the unsupervised extraction of biomedical terms from unannotated corpora. Term extraction is used as the first step of an ontology learning process that aims at the (semi-)automatic annotation of biomedical concepts and relations from more than 300K PubMed titles and abstracts. We experimented both with traditional distributional semantics methods, such as Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA), and with the neural language models CBOW and Skip-gram from deep learning. The evaluation concentrates on sepsis, a major life-threatening condition, and shows that the deep learning models outperform LSA and LDA with much higher precision.
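As a side note, the CBOW/Skip-gram pair compared in this study differs, in gensim's implementation, by a single flag; a minimal sketch with a toy stand-in corpus (not the chapter's PubMed pipeline):

```python
from gensim.models import Word2Vec

corpus = [["sepsis", "septic", "shock"],         # toy stand-in for PubMed abstracts
          ["sepsis", "infection", "icu"]]

cbow = Word2Vec(corpus, vector_size=32, min_count=1, sg=0)       # sg=0 selects CBOW
skipgram = Word2Vec(corpus, vector_size=32, min_count=1, sg=1)   # sg=1 selects Skip-gram
print(skipgram.wv.most_similar("sepsis"))
```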
9

Souza de Oliveira, Raphael, and Erick Giovani Sperandio Nascimento. "Clustering by Similarity of Brazilian Legal Documents Using Natural Language Processing Approaches." In Artificial Intelligence. IntechOpen, 2021. http://dx.doi.org/10.5772/intechopen.99875.

Abstract:
The Brazilian legal system postulates the expeditious resolution of judicial proceedings. However, legal courts are working under budgetary constraints and with reduced staff. As a way to face these restrictions, artificial intelligence (AI) has been tackling many complex problems in natural language processing (NLP). This work aims to detect the degree of similarity between judicial documents that can be achieved in the inference group using unsupervised learning, by applying three NLP techniques, namely term frequency-inverse document frequency (TF-IDF), Word2Vec CBoW, and Word2Vec Skip-gram, the last two specialized with a Brazilian-language corpus. We developed a template for grouping lawsuits, calculated from the cosine distance between the elements of a group and its centroid. The Ordinary Appeal was chosen as the reference document, since it moves legal proceedings on to the higher court and because a sizable contingent of such lawsuits awaits judgment. After the data-processing steps, documents had their content transformed into a vector representation using the three NLP techniques. We observe that specialized word-embedding models like Word2Vec present better performance, making it possible to advance the current state of the art in NLP applied to the legal sector.
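The grouping criterion described above, cosine distance between each document vector and its group centroid, reduces to a few lines; the random vectors below are placeholders for the TF-IDF or Word2Vec representations the chapter compares:

```python
import numpy as np

docs = np.random.default_rng(1).normal(size=(5, 300))   # 5 placeholder document vectors
centroid = docs.mean(axis=0)

def cosine_distance(a, b):
    """1 - cosine similarity; 0 means identical direction."""
    return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

for i, d in enumerate(docs):
    print(f"doc {i}: distance to centroid = {cosine_distance(d, centroid):.3f}")
```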

Conference papers on the topic "Skip-gram model"

1

Grzegorczyk, Karol, and Marcin Kurdziel. "Disambiguated skip-gram model." In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2018. http://dx.doi.org/10.18653/v1/d18-1174.

2

Kaji, Nobuhiro, and Hayato Kobayashi. "Incremental Skip-gram Model with Negative Sampling." In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2017. http://dx.doi.org/10.18653/v1/d17-1037.

3

Onishi, Takamune, and Hiromitsu Shiina. "Distributed Representation Computation Using CBOW Model and Skip–gram Model." In 2020 9th International Congress on Advanced Applied Informatics (IIAI-AAI). IEEE, 2020. http://dx.doi.org/10.1109/iiai-aai50415.2020.00179.

4

Choudhary, Kailash, and Ruby Beniwal. "Xplore Word Embedding Using CBOW Model and Skip-Gram Model." In 2021 7th International Conference on Signal Processing and Communication (ICSC). IEEE, 2021. http://dx.doi.org/10.1109/icsc53193.2021.9673321.

5

Zheng, Suncong, Hongyun Bao, Jiaming Xu, Yuexing Hao, Zhenyu Qi, and Hongwei Hao. "A Bidirectional Hierarchical Skip-Gram model for text topic embedding." In 2016 International Joint Conference on Neural Networks (IJCNN). IEEE, 2016. http://dx.doi.org/10.1109/ijcnn.2016.7727289.

6

Zhao, Mingzhen, Bo Xu, Hongfei Lin, Zhihao Yang, and Jian Wang. "Discover potential adverse drug reactions using the skip-gram model." In 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE, 2015. http://dx.doi.org/10.1109/bibm.2015.7359955.

7

Maillard, Jean, and Stephen Clark. "Learning Adjective Meanings with a Tensor-Based Skip-Gram Model." In Proceedings of the Nineteenth Conference on Computational Natural Language Learning. Association for Computational Linguistics, 2015. http://dx.doi.org/10.18653/v1/k15-1035.

8

Feng, Yan-hong, Yu Hong, Sun Geng, and Yu Xun-ran. "Domain named entity recognition method based on skip-gram model." In 2017 First International Conference on Electronics Instrumentation & Information Systems (EIIS). IEEE, 2017. http://dx.doi.org/10.1109/eiis.2017.8298655.

9

Lazaridou, Angeliki, Nghia The Pham, and Marco Baroni. "Combining Language and Vision with a Multimodal Skip-gram Model." In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 2015. http://dx.doi.org/10.3115/v1/n15-1016.

10

Poulopoulos, Dimitris, and Athina Kalampogia. "Using a Skip-gram Architecture for Model Contextualization in CARS." In Special Session on Appliances for Data-Intensive and Time Critical Applications. SCITEPRESS - Science and Technology Publications, 2019. http://dx.doi.org/10.5220/0008256304430446.
