Academic literature on the topic 'Word2Vec embedding'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Word2Vec embedding.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Word2Vec embedding"

1

Lu, Zihao, Xiaohui Hu, and Yun Xue. "Dual-Word Embedding Model Considering Syntactic Information for Cross-Domain Sentiment Classification." Mathematics 10, no. 24 (2022): 4704. http://dx.doi.org/10.3390/math10244704.

Abstract:
The purpose of cross-domain sentiment classification (CDSC) is to fully utilize the rich labeled data in the source domain to help the target domain perform sentiment classification even when labeled data are insufficient. Most of the existing methods focus on obtaining domain transferable semantic information but ignore syntactic information. The performance of BERT may decrease because of domain transfer, and traditional word embeddings, such as word2vec, cannot obtain contextualized word vectors. Therefore, achieving the best results in CDSC is difficult when only BERT or word2vec is used. In this paper, we propose a Dual-word Embedding Model Considering Syntactic Information for Cross-domain Sentiment Classification. Specifically, we obtain dual-word embeddings using BERT and word2vec. After performing BERT embedding, we pay closer attention to semantic information, mainly using self-attention and TextCNN. After word2vec word embedding is obtained, the graph attention network is used to extract the syntactic information of the document, and the attention mechanism is used to focus on the important aspects. Experiments on two real-world datasets show that our model outperforms other strong baselines.
2

Liu, Ruoyu. "Exploring the Impact of Word2Vec Embeddings Across Neural Network Architectures for Sentiment Analysis." Applied and Computational Engineering 97, no. 1 (2024): 93–98. http://dx.doi.org/10.54254/2755-2721/97/2024melb0085.

Abstract:
Sentiment analysis is crucial for understanding public opinion, gauging customer satisfaction, and making informed business decisions based on the emotional tone of textual data. This study investigates the performance of different Word2Vec-based embedding strategies (static, non-static, and multichannel) for sentiment analysis across various neural network architectures, including Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM), and Gated Recurrent Units (GRUs). Despite the rise of advanced contextual embedding methods such as Bidirectional Encoder Representations from Transformers (BERT), Word to Vector (Word2Vec) retains its importance due to its simplicity and lower computational demands, making it ideal for use in settings with limited resources. The goal is to evaluate the impact of fine-tuning Word2Vec embeddings on the accuracy of sentiment classification. Using the Internet Movie Database (IMDb), this work finds that multichannel embeddings, which combine static and non-static representations, provide the best performance across most architectures, while static embeddings continue to deliver strong results in specific sequential models. These findings highlight the balance between efficiency and accuracy in traditional word embeddings, particularly when advanced models are not feasible.
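The multichannel strategy the abstract describes feeds the network both a frozen ("static") copy and a fine-tuned ("non-static") copy of the same pretrained vectors. A minimal stdlib sketch with invented values (real models hold thousands of vectors and update the non-static channel by backpropagation):

```python
# Static channel: pretrained Word2Vec vector, never updated.
# Non-static channel: the same vector after hypothetical fine-tuning.
# All numbers below are invented for illustration.
static_emb = {"good": [0.7, 0.3]}
tuned_emb  = {"good": [0.9, 0.2]}

def multichannel(word):
    """Stack both channels, as a two-channel CNN input would."""
    return [static_emb[word], tuned_emb[word]]

print(multichannel("good"))
```

The network then convolves over both channels, letting it exploit general-purpose and task-adapted representations at once.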
3

Liu, Ruoyu. "Exploring the Impact of Word2Vec Embeddings Across Neural Network Architectures for Sentiment Analysis." Applied and Computational Engineering 94, no. 1 (2024): 106–11. http://dx.doi.org/10.54254/2755-2721/94/2024melb0085.

4

Tahmasebi, Nina. "A Study on Word2Vec on a Historical Swedish Newspaper Corpus." Digital Humanities in the Nordic and Baltic Countries Publications 1, no. 1 (2018): 25–37. http://dx.doi.org/10.5617/dhnbpub.11007.

Abstract:
Detecting word sense changes can be of great interest in the field of digital humanities. Thus far, most investigations and automatic methods have been developed and carried out on English text, and most recent methods make use of word embeddings. This paper presents a study on using Word2Vec, a neural word embedding method, on a Swedish historical newspaper collection. Our study includes a set of 11 words, and our focus is the quality and stability of the word vectors over time. We investigate whether a word embedding method like Word2Vec can be effectively used on texts where the volume and quality are limited.
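The stability analysis described here can be made concrete: train one embedding model per time slice, then score a word by how much its nearest-neighbour set overlaps across slices. A stdlib sketch with invented toy vectors standing in for Word2Vec output (words and numbers are illustrative, not from the study):

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u))
                  * math.sqrt(sum(b * b for b in v)))

def nearest(word, vectors, k=1):
    """Return the k most cosine-similar words to `word` (excluding itself)."""
    sims = [(other, cosine(vectors[word], vec))
            for other, vec in vectors.items() if other != word]
    return [w for w, _ in sorted(sims, key=lambda p: -p[1])[:k]]

# Hypothetical embeddings from two time slices of a corpus.
slice_1890 = {"telefon": [0.9, 0.1], "apparat": [0.8, 0.2], "brev": [0.1, 0.9]}
slice_1900 = {"telefon": [0.2, 0.9], "apparat": [0.9, 0.1], "brev": [0.1, 0.8]}

# Neighbour-set overlap across slices is a simple stability score.
n1 = set(nearest("telefon", slice_1890))
n2 = set(nearest("telefon", slice_1900))
stability = len(n1 & n2) / 1
print(stability)
```

A low score flags either a genuine sense change or, on small noisy corpora, unstable vectors, which is exactly the ambiguity the paper investigates.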
5

Upadhye, Akshata. "A Deep Dive into Word2Vec and Doc2Vec Models in Natural Language Processing." Journal of Scientific and Engineering Research 7, no. 3 (2020): 244–49. https://doi.org/10.5281/zenodo.10902940.

Abstract:
In the field of natural language processing, the advent of word2vec and doc2vec models has reshaped the paradigm of language representation. This paper provides a comprehensive exploration of these distributed embedding models, tracing their historical development, key contributions, and advancements. The literature review provides the intricate details of word2vec and doc2vec, which act as the foundation for understanding their operational principles and variations. A critical analysis in the comparison section presents the strengths and weaknesses of both models and offers insights into their suitability for different applications. Real-world case studies are summarized to highlight the effectiveness of word2vec and doc2vec in several fields. Additionally, the challenges and limitations of these models are discussed to provide a holistic view of the models' capabilities. Finally, future perspectives on potential developments, including advancements in embedding techniques, domain-specific embeddings, etc., are presented. Emerging trends, including continued growth in contextual embeddings, ethical considerations, and interpretability, are also discussed. In conclusion, this paper offers a comprehensive overview of word2vec and doc2vec models helpful for the ongoing exploration of distributed representations in natural language understanding.
6

Li, Saihan, and Bing Gong. "Word embedding and text classification based on deep learning methods." MATEC Web of Conferences 336 (2021): 06022. http://dx.doi.org/10.1051/matecconf/202133606022.

Abstract:
Traditional manual text classification methods have been unable to cope with the current huge data volumes. Improvements in deep learning technology have also accelerated text classification. Against this background, we present different word embedding methods such as word2vec, doc2vec, TF-IDF, and a trainable embedding layer. After word embedding, we demonstrate 8 deep learning models that classify news text automatically and compare the accuracy of all the models; the '2-layer GRU model with pretrained word2vec embeddings' achieved the highest accuracy. Automatic text classification can help people summarize text accurately and quickly from the mass of text information. Whether in academia or in industry, it is a topic worth discussing.
7

Romanyuk, Andriy. "Vector Representations of Ukrainian Words." Ukraina Moderna 27, no. 27 (2019): 46–72. http://dx.doi.org/10.30970/uam.2019.27.1062.

Abstract:
In this paper, Ukrainian word embeddings and their properties are examined. Provided are a theoretical description, a brief account of the most common technologies used to produce an embedding, and lists of implemented algorithms. Word2vec, the first technology for calculating word embeddings, is used to demonstrate modern approaches to calculation using neural networks. Word2vec and FastText, which evolved from word2vec, are compared, and FastText's benefits are described. Word embeddings have been applied to solving the majority of the practical tasks of natural language processing. One of the latest such applications has been in the automatic construction of translation dictionaries. A previous analysis indicates that most of the words found in English-Ukrainian dictionaries are absent in the Great Electronic Dictionary of the Ukrainian Language (VESUM) project. For embeddings in Ukrainian based on word2vec, GloVe, lex2vec, and FastText, the Gensim open-source library was used to demonstrate the potential of the calculated models, and the results of repeating known calculation experiments are provided. They indicate that the hypothesis about the existence of biases and stereotypes in such models does not pertain to the Ukrainian language. The quality of the word embeddings is assessed on the basis of testing analogies, and adapting lexical data from a Ukrainian associative dictionary in order to construct a selection of data for assessing the quality of word embeddings is proposed. Listed are necessary tasks for future research in the field of creating and utilizing Ukrainian word embeddings.
8

Alachram, Halima, Hryhorii Chereda, Tim Beißbarth, Edgar Wingender, and Philip Stegmaier. "Text mining-based word representations for biomedical data analysis and protein-protein interaction networks in machine learning tasks." PLOS ONE 16, no. 10 (2021): e0258623. http://dx.doi.org/10.1371/journal.pone.0258623.

Abstract:
Biomedical and life science literature is an essential way to publish experimental results. With the rapid growth of the number of new publications, the amount of scientific knowledge represented in free text is increasing remarkably. There has been much interest in developing techniques that can extract this knowledge and make it accessible to aid scientists in discovering new relationships between biological entities and answering biological questions. Making use of the word2vec approach, we generated word vector representations based on a corpus consisting of over 16 million PubMed abstracts. We developed a text mining pipeline to produce word2vec embeddings with different properties and performed validation experiments to assess their utility for biomedical analysis. An important pre-processing step consisted in the substitution of synonymous terms by their preferred terms in biomedical databases. Furthermore, we extracted gene-gene networks from two embedding versions and used them as prior knowledge to train Graph-Convolutional Neural Networks (CNNs) on large breast cancer gene expression data and on other cancer datasets. Performances of resulting models were compared to Graph-CNNs trained with protein-protein interaction (PPI) networks or with networks derived using other word embedding algorithms. We also assessed the effect of corpus size on the variability of word representations. Finally, we created a web service with a graphical and a RESTful interface to extract and explore relations between biomedical terms using annotated embeddings. Comparisons to biological databases showed that relations between entities such as known PPIs, signaling pathways and cellular functions, or narrower disease ontology groups correlated with higher cosine similarity. Graph-CNNs trained with word2vec-embedding-derived networks performed sufficiently well on the metastatic event prediction tasks compared to other networks. Such performance was good enough to validate the utility of our generated word embeddings in constructing biological networks. Word representations as produced by text mining algorithms like word2vec are therefore able to capture biologically meaningful relations between entities. Our generated embeddings are publicly available at https://github.com/genexplain/Word2vec-based-Networks/blob/main/README.md.
9

JP, Sanjanasri, Vijay Krishna Menon, Soman KP, Rajendran S, and Agnieszka Wolk. "Generation of Cross-Lingual Word Vectors for Low-Resourced Languages Using Deep Learning and Topological Metrics in a Data-Efficient Way." Electronics 10, no. 12 (2021): 1372. http://dx.doi.org/10.3390/electronics10121372.

Abstract:
Linguists have been focused on a qualitative comparison of the semantics from different languages. Evaluation of the semantic interpretation among disparate language pairs like English and Tamil is an even more formidable task than for Slavic languages. The concept of word embedding in Natural Language Processing (NLP) has enabled a felicitous opportunity to quantify linguistic semantics. Multi-lingual tasks can be performed by projecting the word embeddings of one language onto the semantic space of the other. This research presents a suite of data-efficient deep learning approaches to deduce the transfer function from the embedding space of English to that of Tamil, deploying three popular embedding algorithms: Word2Vec, GloVe and FastText. A novel evaluation paradigm was devised for the generation of embeddings to assess their effectiveness, using the original embeddings as ground truths. Transferability across other target languages of the proposed model was assessed via pre-trained Word2Vec embeddings from Hindi and Chinese languages. We empirically prove that with a bilingual dictionary of a thousand words and a corresponding small monolingual target (Tamil) corpus, useful embeddings can be generated by transfer learning from a well-trained source (English) embedding. Furthermore, we demonstrate the usability of generated target embeddings in a few NLP use-case tasks, such as text summarization, part-of-speech (POS) tagging, and bilingual dictionary induction (BDI), bearing in mind that those are not the only possible applications.
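The "transfer function" from the English embedding space to the Tamil one is, in its simplest formulation, a linear map fitted on a seed bilingual dictionary. A stdlib sketch with invented 2-D vectors; real systems fit the map by least squares (or Procrustes) over roughly a thousand dictionary pairs in hundreds of dimensions:

```python
# Seed dictionary of hypothetical (English vector, Tamil vector) pairs.
# The source vectors happen to be basis vectors, so the map W can be
# read off directly; in practice W is fitted by least squares.
seed = {
    "dog": ([1.0, 0.0], [0.6, 0.8]),
    "cat": ([0.0, 1.0], [0.8, 0.6]),
}
W = [  # columns of W are the images of the basis vectors
    [seed["dog"][1][0], seed["cat"][1][0]],
    [seed["dog"][1][1], seed["cat"][1][1]],
]

def apply_map(W, x):
    """Project a source-space vector into the target space: y = W x."""
    return [W[0][0] * x[0] + W[0][1] * x[1],
            W[1][0] * x[0] + W[1][1] * x[1]]

# An unseen English vector is projected, then matched against Tamil
# vectors by cosine similarity (projection shown here).
print(apply_map(W, [1.0, 0.0]))
```

Bilingual dictionary induction then amounts to nearest-neighbour search in the target space over the projected vectors.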
10

Ahn, Yoonjoo, Eugene Rhee, and Jihoon Lee. "Dual embedding with input embedding and output embedding for better word representation." Indonesian Journal of Electrical Engineering and Computer Science 27, no. 2 (2022): 1091–99. https://doi.org/10.11591/ijeecs.v27.i2.pp1091-1099.

Abstract:
Recent studies in distributed vector representations for words offer a variety of ways to represent words. We propose several ways of using input embeddings and output embeddings to represent words better than a single model does. We compared performance in terms of word analogy and word similarity between each of the input and output embeddings and various dual embeddings, which are combinations of those two embeddings. Performance evaluation results show that the proposed dual embeddings outperform each single embedding, especially when input and output embeddings are simply added. We establish two things in this paper: i) not only the input embedding but also the output embedding carries meaning useful for representing words, and ii) combining input and output embeddings into a dual embedding outperforms using either embedding individually.
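The best-performing combination reported here, simply adding input and output embeddings, is a one-liner once both weight tables are available (in gensim these would correspond to `model.wv.vectors` and `model.syn1neg`; that mapping is my reading, not stated in the abstract). A stdlib sketch with invented toy tables:

```python
# word2vec training keeps two weight tables: input embeddings and
# output embeddings. The dual embedding studied here is their
# element-wise sum. The values below are invented for illustration.
input_emb  = {"bank": [0.5, 0.1], "money": [0.4, 0.2]}
output_emb = {"bank": [0.1, 0.5], "money": [0.2, 0.4]}

def dual(word):
    """Element-wise sum of a word's input and output embeddings."""
    return [a + b for a, b in zip(input_emb[word], output_emb[word])]

print(dual("bank"))
```

Analogy and similarity benchmarks are then run on the summed vectors exactly as they would be on a single table.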

Dissertations / Theses on the topic "Word2Vec embedding"

1

Fulda, Nancy Ellen. "Semantically Aligned Sentence-Level Embeddings for Agent Autonomy and Natural Language Understanding." BYU ScholarsArchive, 2019. https://scholarsarchive.byu.edu/etd/7550.

Abstract:
Many applications of neural linguistic models rely on their use as pre-trained features for downstream tasks such as dialog modeling, machine translation, and question answering. This work presents an alternate paradigm: Rather than treating linguistic embeddings as input features, we treat them as common sense knowledge repositories that can be queried using simple mathematical operations within the embedding space, without the need for additional training. Because current state-of-the-art embedding models were not optimized for this purpose, this work presents a novel embedding model designed and trained specifically for the purpose of "reasoning in the linguistic domain". Our model jointly represents single words, multi-word phrases, and complex sentences in a unified embedding space. To facilitate common-sense reasoning beyond straightforward semantic associations, the embeddings produced by our model exhibit carefully curated properties including analogical coherence and polarity displacement. In other words, rather than training the model on a smorgasbord of tasks and hoping that the resulting embeddings will serve our purposes, we have instead crafted training tasks and placed constraints on the system that are explicitly designed to induce the properties we seek. The resulting embeddings perform competitively on the SemEval 2013 benchmark and outperform state-of-the-art models on two key semantic discernment tasks introduced in Chapter 8. The ultimate goal of this research is to empower agents to reason about low level behaviors in order to fulfill abstract natural language instructions in an autonomous fashion. An agent equipped with an embedding space of sufficient caliber could potentially reason about new situations based on their similarity to past experience, facilitating knowledge transfer and one-shot learning. As our embedding model continues to improve, we hope to see these and other abilities become a reality.
2

Wang, Run Fen. "Semantic Text Matching Using Convolutional Neural Networks." Thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-362134.

Abstract:
Semantic text matching is a fundamental task for many applications in Natural Language Processing (NLP). Traditional methods using term frequency-inverse document frequency (TF-IDF) to match exact words in documents have one strong drawback: TF-IDF is unable to capture semantic relations between closely-related words, which leads to disappointing matching results. Neural networks have recently been used for various applications in NLP and have achieved state-of-the-art performances on many tasks. Recurrent Neural Networks (RNN) have been tested on text classification and text matching but did not gain any remarkable results, as RNNs work more effectively on short texts than on long documents. In this paper, Convolutional Neural Networks (CNN) are applied to match texts in a semantic aspect. The method uses word embedding representations of two texts as inputs to the CNN construction to extract the semantic features between the two texts and gives a score as the output of how certain the CNN model is that they match. The results show that after some tuning of the parameters the CNN model could produce accuracy, precision, recall and F1-scores all over 80%. This is a great improvement over the previous TF-IDF results, and further improvements could be made by using dynamic word vectors, better pre-processing of the data, generating larger and more feature-rich data sets, and further tuning of the parameters.
3

Moon, Gordon Euhyun. "Parallel Algorithms for Machine Learning." The Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1561980674706558.

4

Šůstek, Martin. "Word2vec modely s přidanou kontextovou informací." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2017. http://www.nusl.cz/ntk/nusl-363837.

Abstract:
This thesis is concerned with the explanation of the word2vec models. Even though word2vec was introduced recently (2013), many researchers have already tried to extend, understand, or at least use the model because it provides surprisingly rich semantic information. This information is encoded in an N-dimensional vector representation and can be recovered by performing operations over this vector algebra. In addition, I suggest model modifications in order to obtain different word representations. To achieve that, I use public picture datasets. This thesis also includes parts dedicated to a word2vec extension based on convolutional neural networks.
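The "operations over the vector algebra" that recover semantic information are additive analogies: vec(king) - vec(man) + vec(woman) lands nearest to vec(queen). A stdlib sketch with invented 3-D vectors (trained word2vec vectors show the same regularity at higher dimension):

```python
import math

# Invented toy vectors for illustration only.
vecs = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.5, 0.5, 0.5],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u))
                  * math.sqrt(sum(b * b for b in v)))

# king - man + woman, then search for the nearest remaining word.
query = [k - m + w
         for k, m, w in zip(vecs["king"], vecs["man"], vecs["woman"])]
best = max((w for w in vecs if w not in ("king", "man", "woman")),
           key=lambda w: cosine(vecs[w], query))
print(best)
```

With gensim-trained models the same query is `model.wv.most_similar(positive=["king", "woman"], negative=["man"])`.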
5

Ramström, Kasper. "Botnet detection on flow data using the reconstruction error from Autoencoders trained on Word2Vec network embeddings." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-393285.

Abstract:
Botnet network attacks are a growing issue in network security. These attacks are carried out by compromised devices which are used for malicious activities. Many traditional systems use pre-defined pattern matching methods for detecting network intrusions based on the characteristics of previously seen attacks. This means that previously unseen attacks often go unnoticed, as they do not have the patterns that the traditional systems are looking for. This paper proposes an anomaly detection approach which doesn't use the characteristics of known attacks in order to detect new ones; instead it looks for anomalous events which deviate from the normal. The approach uses Word2Vec, a neural network model from the field of Natural Language Processing, and applies it to NetFlow data in order to produce meaningful representations of network features. These representations, together with statistical features, are then fed into an Autoencoder model which attempts to reconstruct the NetFlow data, where poor reconstructions could indicate anomalous data. The approach was evaluated on multiple flow-based network datasets and the results show that it has potential for botnet detection, where the reconstruction errors can be used as metrics for finding botnet events. However, the results vary between datasets, and the approach performs poorly as a botnet detector for some of them, indicating that further investigation is required before real-world use.
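The detection step described here reduces to thresholding a reconstruction error: flows the Autoencoder reconstructs poorly, relative to errors observed on benign traffic, are flagged. A stdlib sketch with invented error values standing in for real model output:

```python
def reconstruction_error(x, x_hat):
    """Mean squared error between a flow vector and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

# Hypothetical per-flow errors from an autoencoder run on benign traffic.
benign_errors = [0.01, 0.02, 0.015, 0.03]
threshold = max(benign_errors) * 1.5  # simple cutoff learned from benign data

# Hypothetical errors observed on new traffic.
flows = {"flow-a": 0.02, "flow-b": 0.31}
flagged = [f for f, err in flows.items() if err > threshold]
print(flagged)
```

The choice of cutoff (a multiple of the benign maximum here, a percentile in practice) is what trades false positives against missed botnet events.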
6

Fong, Vivian Lin. "Software Requirements Classification Using Word Embeddings and Convolutional Neural Networks." DigitalCommons@CalPoly, 2018. https://digitalcommons.calpoly.edu/theses/1851.

Abstract:
Software requirements classification, the practice of categorizing requirements by their type or purpose, can improve organization and transparency in the requirements engineering process and thus promote requirement fulfillment and software project completion. Requirements classification automation is a prominent area of research, as automation can alleviate the tediousness of manual labeling and reduce its reliance on domain expertise. This thesis explores the application of deep learning techniques to software requirements classification, specifically the use of word embeddings for document representation when training a convolutional neural network (CNN). As past research endeavors mainly utilize information retrieval and traditional machine learning techniques, we explore the potential of deep learning on this particular task. With the support of learning libraries such as TensorFlow and Scikit-Learn and word embedding models such as word2vec and fastText, we build a Python system that trains and validates configurations of Naïve Bayes and CNN requirements classifiers. Applying our system to a suite of experiments on two well-studied requirements datasets, we recreate or establish the Naïve Bayes baselines and evaluate the impact of CNNs equipped with word embeddings trained from scratch versus word embeddings pre-trained on Big Data.
7

Maitre, Julien. "Détection et analyse des signaux faibles. Développement d’un framework d’investigation numérique pour un service caché Lanceurs d’alerte." Thesis, La Rochelle, 2022. http://www.theses.fr/2022LAROS020.

Abstract:
This manuscript provides the basis for a complete chain of document analysis for a whistleblower service, such as GlobalLeaks. We propose a chain of semi-automated analysis of text documents and search using web queries to ultimately present dashboards describing potential weak signals. We identify and solve methodological and technological barriers inherent to: 1) automated analysis of text documents with minimal a priori information, 2) enrichment of information using web search, and 3) visualization in a dashboard and an interactive 3D environment. These static and dynamic approaches are applied in the context of data journalism for processing, analyzing, and prioritizing heterogeneous types of information within documents. This thesis also proposes a feasibility study and prototyping through the implementation of the processing chain as a software system. This construction required the characterization of a weak signal, for which we propose a definition. Our goal is to provide a configurable tool generic to any topic. Our solution is based on two approaches: static and dynamic. In the static approach, in contrast to existing approaches that require knowledge of the relevant terms of a specific domain, we propose a solution relying on techniques requiring less intervention from the domain expert. In this context, we propose a new approach to multi-level topic modeling. This joint approach combines topic modeling, word embedding, and an algorithm in which a domain expert assesses the relevance of the results and identifies topics carrying potential weak signals. In the dynamic approach, we integrate a monitoring solution based on the potential weak signals found in the initial corpora and follow them up to study their evolution. We therefore propose an agent mining solution combining data mining and a multi-agent system in which agents representing documents and words move under attraction/repulsion forces. The results are presented in a dashboard and an interactive 3D environment in a Unity client. First, the static approach was evaluated in a proof of concept on synthetic and real corpora used as ground truth. Second, the complete chain of document analysis (static and dynamic approaches), implemented in the WILD software, was applied to real data from document databases.
8

Murgia, Antonio. "Lightweight Internet Traffic Classification - A Subject Based Solution with Word Embeddings." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2016. http://amslaurea.unibo.it/10569/.

Abstract:
Internet traffic classification is a relevant and mature research field, yet one of growing importance and with still open technical challenges, due in part to the pervasive presence of Internet-connected devices in everyday life. We claim the need for innovative traffic classification solutions capable of being lightweight, of adopting a domain-based approach, and of not only concentrating on application-level protocol categorization but also classifying Internet traffic by subject. To this purpose, this thesis originally proposes a classification solution that leverages domain name information extracted from IPFIX summaries, DNS logs, and DHCP leases, with the possibility to be applied to any kind of traffic. Our proposed solution is based on an extension of Word2vec unsupervised learning techniques running on a specialized Apache Spark cluster. In particular, learning techniques are leveraged to generate word embeddings from a mixed dataset composed of domain names and natural language corpuses in a lightweight way and with general applicability. The thesis also reports lessons learnt from our implementation and deployment experience, which demonstrates that our solution can process 5500 IPFIX summaries per second on an Apache Spark cluster with 1 slave instance in Amazon EC2 at a cost of $3860 per year. Reported experimental results about Precision, Recall, F-Measure, Accuracy, and Cohen's Kappa show the feasibility and effectiveness of the proposal. The experiments prove that words contained in domain names do have a relation with the kind of traffic directed towards them; therefore, using specifically trained word embeddings we are able to classify them into customizable categories. We also show that training word embeddings on larger natural language corpuses leads to improvements in precision of up to 180%.
APA, Harvard, Vancouver, ISO, and other styles
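The subject-based approach above treats domain names as text for Word2vec training. A minimal sketch of how domain names might be split into word-like tokens before training (the log entries below are hypothetical; the thesis' actual IPFIX/DNS pipeline and Spark cluster are not reproduced here):

```python
import re

def domain_to_tokens(domain):
    """Split a domain name into word-like tokens on dots, dashes, underscores, and digits."""
    return [t for t in re.split(r"[.\-_0-9]+", domain.lower()) if t]

# Hypothetical DNS-log entries standing in for the thesis' real dataset.
logs = ["cdn.sportsnews.example.com", "mail.university-archive.example.org"]
sentences = [domain_to_tokens(d) for d in logs]
print(sentences)
```

Token sequences like these could then be mixed with natural-language sentences and fed to any Word2vec implementation.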
9

Kindberg, Erik. "Word embeddings and Patient records : The identification of MRI risk patients." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-157467.

Full text
Abstract:
Identification of risks ahead of MRI examinations has been identified as a cumbersome and time-consuming process at the Linköping University Hospital radiology clinic. The hospital staff often have to search through large amounts of unstructured patient data to find information about implants. Word embeddings have been identified as a possible tool to speed up this process. The purpose of this thesis is to evaluate this method, which is done by training a Word2Vec model on patient journal data and analyzing the close neighbours of key search words by calculating cosine similarity. The 50 closest neighbours of each search word are categorized and annotated as either relevant or not relevant to the task of identifying risk patients ahead of MRI examinations. 10 search words were explored, leading to a total of 500 terms being annotated. In total, 14 different categories were observed in the results, and of these, 8 were considered relevant. Out of the 500 terms, 340 (68%) were considered relevant. In addition, 48 implant models could be observed, which are particularly interesting because if a patient has an implant, hospital staff need to determine its exact model and the MRI conditions of that model. Overall, these findings point towards a positive answer to the aim of the thesis, although further developments are needed.
APA, Harvard, Vancouver, ISO, and other styles
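The nearest-neighbour analysis described in this thesis can be illustrated with a cosine-similarity ranking over toy vectors (the actual patient-record embeddings are confidential, so the terms and vectors below are invented stand-ins for a trained Word2Vec model):

```python
import numpy as np

def nearest_neighbours(embeddings, query, topn=3):
    """Rank vocabulary terms by cosine similarity to the query word's vector."""
    q = embeddings[query]
    sims = {}
    for word, vec in embeddings.items():
        if word == query:
            continue
        sims[word] = float(np.dot(q, vec) / (np.linalg.norm(q) * np.linalg.norm(vec)))
    return sorted(sims.items(), key=lambda kv: kv[1], reverse=True)[:topn]

# Toy vectors: "implant" is deliberately close to "pacemaker", "headache" is not.
emb = {
    "pacemaker": np.array([0.9, 0.1, 0.0]),
    "implant":   np.array([0.8, 0.2, 0.1]),
    "headache":  np.array([0.0, 0.9, 0.4]),
}
print(nearest_neighbours(emb, "pacemaker", topn=2))
```

In the thesis, the top-50 neighbours returned by a query like this are what the annotators categorized as relevant or not.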
10

Korger, Christina. "Clustering of Distributed Word Representations and its Applicability for Enterprise Search." Master's thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2016. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-208869.

Full text
Abstract:
Machine learning of distributed word representations with neural embeddings is a state-of-the-art approach to modelling semantic relationships hidden in natural language. The thesis “Clustering of Distributed Word Representations and its Applicability for Enterprise Search” covers different aspects of how such a model can be applied to knowledge management in enterprises. A review of distributed word representations and related language modelling techniques, combined with an overview of applicable clustering algorithms, constitutes the basis for practical studies. The latter have two goals: firstly, they examine the quality of German embedding models trained with gensim and a selected choice of parameter configurations. Secondly, clusterings conducted on the resulting word representations are evaluated against the objective of retrieving immediate semantic relations for a given term. The application of the final results to company-wide knowledge management is subsequently outlined by the example of the platform intergator and conceptual extensions.
APA, Harvard, Vancouver, ISO, and other styles
More sources

Book chapters on the topic "Word2Vec embedding"

1

Dridi, Amna, Mohamed Medhat Gaber, R. Muhammad Atif Azad, and Jagdev Bhogal. "k-NN Embedding Stability for word2vec Hyper-Parametrisation in Scientific Text." In Discovery Science. Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-030-01771-2_21.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Zim, Sumaiya Kashmin, Fardeen Ashraf, Tasnia Iqbal, et al. "Exploring Word2vec Embedding for Sentiment Analysis of Bangla Raw and Romanized Text." In Proceedings of International Conference on Data Science and Applications. Springer Nature Singapore, 2023. http://dx.doi.org/10.1007/978-981-19-6634-7_48.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Chabukswar, Arati, P. Deepa Shenoy, and K. R. Venugopal. "Identification of Misinformation Using Word Embedding Technique Word2Vec, Machine Learning, and Deep Learning Models." In Data Management, Analytics and Innovation. Springer Nature Singapore, 2024. http://dx.doi.org/10.1007/978-981-97-3242-5_4.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Zhao, Junhao, Basem Suleiman, and Muhammad Johan Alibasa. "Feature Encoding by Location-Enhanced Word2Vec Embedding for Human Activity Recognition in Smart Homes." In Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering. Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-34776-4_11.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Sitender, Sangeeta, N. Sudha Sushma, and Saksham Kumar Sharma. "Effect of GloVe, Word2Vec and FastText Embedding on English and Hindi Neural Machine Translation Systems." In Proceedings of Data Analytics and Management. Springer Nature Singapore, 2023. http://dx.doi.org/10.1007/978-981-19-7615-5_37.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Horn, Nils, Michel Sebastian Erhardt, Manuel Di Stefano, Florian Bosten, and Rüdiger Buchkremer. "Vergleichende Analyse der Word-Embedding-Verfahren Word2Vec und GloVe am Beispiel von Kundenbewertungen eines Online-Versandhändlers." In Künstliche Intelligenz in Wirtschaft & Gesellschaft. Springer Fachmedien Wiesbaden, 2020. http://dx.doi.org/10.1007/978-3-658-29550-9_29.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Chugh, Mansi, Peter A. Whigham, and Grant Dick. "Stability of Word Embeddings Using Word2Vec." In AI 2018: Advances in Artificial Intelligence. Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-030-03991-2_73.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Gu, Siyi, Jinxi Zhang, Xinyu Qiu, Fengzuo Du, and Haoxinran Yu. "Music Recommendation Algorithm Through Word2vec Embeddings." In Communications in Computer and Information Science. Springer Singapore, 2021. http://dx.doi.org/10.1007/978-981-16-8885-0_30.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Gabsi, Imen, Hager Kammoun, Rawed Mtar, and Ikram Amous. "Word2Vec-GloVe-BERT Embeddings for Query Expansion." In Intelligent Systems Design and Applications. Springer Nature Switzerland, 2024. http://dx.doi.org/10.1007/978-3-031-64836-6_17.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Röchert, Daniel, German Neubaum, and Stefan Stieglitz. "Identifying Political Sentiments on YouTube: A Systematic Comparison Regarding the Accuracy of Recurrent Neural Network and Machine Learning Models." In Disinformation in Open Online Media. Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-61841-4_8.

Full text
Abstract:
Since social media have increasingly become forums to exchange personal opinions, more and more approaches have been suggested to analyze those sentiments automatically. Neural networks and traditional machine learning methods allow individual adaptation through training on the data, tailoring the algorithm to the particular topic being discussed. Still, a great number of methodological combinations involving algorithms (e.g., recurrent neural networks (RNN)), techniques (e.g., word2vec), and methods (e.g., Skip-Gram) are possible. This work offers a systematic comparison of sentiment-analytical approaches using different word embeddings with RNN architectures and traditional machine learning techniques. Using German comments from controversial political discussions on YouTube, this study uses metrics such as F1-score, precision, and recall to compare the quality of performance of different approaches. First results show that, in contrast to traditional machine learning models with word embeddings, deep neural networks perform better at multiclass prediction with small datasets.
APA, Harvard, Vancouver, ISO, and other styles
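The Skip-Gram method named in the abstract trains word2vec on (center, context) pairs extracted with a sliding window. A minimal sketch of that pair generation (the tokenized German comment is a hypothetical example, not taken from the study's dataset):

```python
def skipgram_pairs(tokens, window=2):
    """Generate (center, context) training pairs as used by word2vec's Skip-Gram method."""
    pairs = []
    for i, center in enumerate(tokens):
        # Every token within `window` positions of the center becomes a context word.
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

# Hypothetical tokenized comment.
print(skipgram_pairs(["das", "ist", "gut"], window=1))
# → [('das', 'ist'), ('ist', 'das'), ('ist', 'gut'), ('gut', 'ist')]
```

The model is then trained to predict each context word from its center word, which is what yields the embeddings the compared classifiers consume.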

Conference papers on the topic "Word2Vec embedding"

1

Moitra, Agnij. "Efficient Unicode Ordinal Values for Text Embedding with FastText and Word2Vec." In 2024 19th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP). IEEE, 2024. https://doi.org/10.1109/isai-nlp64410.2024.10799464.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Acharya, Archana, and Rajeev Goyal. "Ensemble Learning-Based Sarcasm Detection in Hinglish Tweets Using Word2Vec Embedding." In 2025 IEEE International Conference on Interdisciplinary Approaches in Technology and Management for Social Innovation (IATMSI). IEEE, 2025. https://doi.org/10.1109/iatmsi64286.2025.10985529.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Nasution, Nur Amalia, Erna Budhiarti Nababan, and Herman Mawengkang. "Comparing LSTM Algorithm with Word Embedding: FastText and Word2Vec in Bahasa Batak-English Translation." In 2024 12th International Conference on Information and Communication Technology (ICoICT). IEEE, 2024. http://dx.doi.org/10.1109/icoict61617.2024.10698481.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Riyanto, Aditya, and Amalia Zahra. "Dual Word Embedding Framework for Gender-Based Writing Style Analysis Using Word2Vec and BERT." In 2024 Beyond Technology Summit on Informatics International Conference (BTS-I2C). IEEE, 2024. https://doi.org/10.1109/bts-i2c63534.2024.10942182.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Over, Laura, Sabine Apfeld, Isabel Schlangen, and Snezhana Jovanoska. "Word Embeddings for Radar Emissions: A Comparison Between Word2vec and fastText." In 2025 IEEE International Radar Conference (RADAR). IEEE, 2025. https://doi.org/10.1109/radar52380.2025.11031834.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Chirumamilla, Pardha Saradhi, Nagagopiraju Vullam, Modugula Sivajyothi, Vunnava Dinesh Babu, Kamarajugadda Indumathy, and A. Lakshmana Rao. "Advanced News Classification Model with Capsule Networks Through Word2Vec and BERT Embeddings." In 2025 5th International Conference on Pervasive Computing and Social Networking (ICPCSN). IEEE, 2025. https://doi.org/10.1109/icpcsn65854.2025.11035204.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Arianto, Wahyu Ramadhan, Yuyun, Ahmad Abrar, et al. "Comparative Study of Word2Vec, FastText, and Glove Embeddings for Synonym Identification in Bugis Language." In 2024 Beyond Technology Summit on Informatics International Conference (BTS-I2C). IEEE, 2024. https://doi.org/10.1109/bts-i2c63534.2024.10942212.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Singamsetty, Sanjana, Harshitha Somayajula, Anu Likitha Immadisetty, Keerthi Sree Konkimalla, Srilatha Tokala, and Murali Krishna Enduri. "Employing TF-IDF and Word2Vec Embeddings to Identify Multi-Class Toxicity Through Machine and Deep Learning Approaches." In 2024 OITS International Conference on Information Technology (OCIT). IEEE, 2024. https://doi.org/10.1109/ocit65031.2024.00055.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Wu, Lingfei, Ian En-Hsu Yen, Kun Xu, et al. "Word Mover’s Embedding: From Word2Vec to Document Embedding." In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2018. http://dx.doi.org/10.18653/v1/d18-1482.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Uchida, Shuto, Tomohiro Yoshikawa, and Takeshi Furuhashi. "Application of Output Embedding on Word2Vec." In 2018 Joint 10th International Conference on Soft Computing and Intelligent Systems (SCIS) and 19th International Symposium on Advanced Intelligent Systems (ISIS). IEEE, 2018. http://dx.doi.org/10.1109/scis-isis.2018.00224.

Full text
APA, Harvard, Vancouver, ISO, and other styles
