
Journal articles on the topic 'Word Vector Models'



Consult the top 50 journal articles for your research on the topic 'Word Vector Models.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles across a wide variety of disciplines and organise your bibliography correctly.

1

Budenkov, S. S. "SEMANTIC WORD VECTOR MODELS FOR SENTIMENT ANALYSIS." Scientific and Technical Volga region Bulletin 7, no. 2 (2017): 75–78. http://dx.doi.org/10.24153/2079-5920-2017-7-2-75-78.

2

Haroon, Muhammad, Junaid Baber, Ihsan Ullah, Sher Muhammad Daudpota, Maheen Bakhtyar, and Varsha Devi. "Video Scene Detection Using Compact Bag of Visual Word Models." Advances in Multimedia 2018 (November 8, 2018): 1–9. http://dx.doi.org/10.1155/2018/2564963.

Abstract:
Video segmentation into shots is the first step for video indexing and searching. Video shots are mostly very short in duration and do not give meaningful insight into the visual contents. However, grouping of shots based on similar visual contents gives a better understanding of the video scene; grouping of similar shots is known as scene boundary detection or video segmentation into scenes. In this paper, we propose a model for video segmentation into visual scenes using the bag of visual words (BoVW) model. Initially, the video is divided into shots, which are later represented by a set of key
3

Ma, Zhiyang, Wenfeng Zheng, Xiaobing Chen, and Lirong Yin. "Joint embedding VQA model based on dynamic word vector." PeerJ Computer Science 7 (March 3, 2021): e353. http://dx.doi.org/10.7717/peerj-cs.353.

Abstract:
The existing joint embedding Visual Question Answering models use different combinations of image characterization, text characterization and feature fusion method, but all the existing models use static word vectors for text characterization. However, in the real language environment, the same word may represent different meanings in different contexts, and may also be used as different grammatical components. These differences cannot be effectively expressed by static word vectors, so there may be semantic and grammatical deviations. In order to solve this problem, our article constructs a j
4

Nishida, Satoshi, Antoine Blanc, Naoya Maeda, Masataka Kado, and Shinji Nishimoto. "Behavioral correlates of cortical semantic representations modeled by word vectors." PLOS Computational Biology 17, no. 6 (2021): e1009138. http://dx.doi.org/10.1371/journal.pcbi.1009138.

Abstract:
The quantitative modeling of semantic representations in the brain plays a key role in understanding the neural basis of semantic processing. Previous studies have demonstrated that word vectors, which were originally developed for use in the field of natural language processing, provide a powerful tool for such quantitative modeling. However, whether semantic representations in the brain revealed by the word vector-based models actually capture our perception of semantic information remains unclear, as there has been no study explicitly examining the behavioral correlates of the modeled brain
5

Tissier, Julien, Christophe Gravier, and Amaury Habrard. "Near-Lossless Binarization of Word Embeddings." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 7104–11. http://dx.doi.org/10.1609/aaai.v33i01.33017104.

Abstract:
Word embeddings are commonly used as a starting point in many NLP models to achieve state-of-the-art performances. However, with a large vocabulary and many dimensions, these floating-point representations are expensive both in terms of memory and calculations which makes them unsuitable for use on low-resource devices. The method proposed in this paper transforms real-valued embeddings into binary embeddings while preserving semantic information, requiring only 128 or 256 bits for each vector. This leads to a small memory footprint and fast vector operations. The model is based on an autoenco
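The memory argument in this abstract can be made concrete with a deliberately naive baseline: threshold each dimension to a single bit and compare words with bitwise operations. The sketch below is not the authors' autoencoder-based method, and the embedding matrix is random toy data.

```python
# Naive sign/median-threshold binarization -- an illustrative baseline only,
# not the autoencoder model proposed in the paper. Toy data throughout.
import numpy as np

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 256))        # 1000 words x 256 dimensions

# One bit per dimension: 1 if the value is above that dimension's median.
bits = (embeddings > np.median(embeddings, axis=0)).astype(np.uint8)
packed = np.packbits(bits, axis=1)               # 256 bits -> 32 bytes per word

def hamming_similarity(i, j):
    """Fraction of matching bits between word i and word j (in [0, 1])."""
    differing = np.unpackbits(packed[i] ^ packed[j]).sum()
    return 1.0 - differing / bits.shape[1]

print(hamming_similarity(0, 1))
```

Storing 256 bits per word instead of 256 float32 values is a 32x reduction; the paper's contribution is learning the binary codes so that semantic similarity survives the compression.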
6

Sassenhagen, Jona, and Christian J. Fiebach. "Traces of Meaning Itself: Encoding Distributional Word Vectors in Brain Activity." Neurobiology of Language 1, no. 1 (2020): 54–76. http://dx.doi.org/10.1162/nol_a_00003.

Abstract:
How is semantic information stored in the human mind and brain? Some philosophers and cognitive scientists argue for vectorial representations of concepts, where the meaning of a word is represented as its position in a high-dimensional neural state space. At the intersection of natural language processing and artificial intelligence, a class of very successful distributional word vector models has developed that can account for classic EEG findings of language, that is, the ease versus difficulty of integrating a word with its sentence context. However, models of semantics have to account not
7

Bojanowski, Piotr, Edouard Grave, Armand Joulin, and Tomas Mikolov. "Enriching Word Vectors with Subword Information." Transactions of the Association for Computational Linguistics 5 (December 2017): 135–46. http://dx.doi.org/10.1162/tacl_a_00051.

Abstract:
Continuous word representations, trained on large unlabeled corpora are useful for many natural language processing tasks. Popular models that learn such representations ignore the morphology of words, by assigning a distinct vector to each word. This is a limitation, especially for languages with large vocabularies and many rare words. In this paper, we propose a new approach based on the skipgram model, where each word is represented as a bag of character n-grams. A vector representation is associated to each character n-gram; words being represented as the sum of these representations. Our
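The subword representation described in this abstract is easy to sketch: a word's vector is the sum of vectors associated with its character n-grams (with boundary markers). In the sketch below the n-gram vectors are random placeholders; in the actual model they are learned with the skip-gram objective.

```python
# Sketch of the bag-of-character-n-grams representation from the abstract.
# N-gram vectors are random stand-ins for parameters learned via skip-gram.
import numpy as np

DIM = 50
rng = np.random.default_rng(0)
ngram_vectors = {}                     # n-gram -> vector, created lazily here

def char_ngrams(word, n_min=3, n_max=6):
    w = f"<{word}>"                    # boundary markers
    return [w[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

def word_vector(word):
    grams = char_ngrams(word)
    for g in grams:
        ngram_vectors.setdefault(g, rng.normal(size=DIM))
    return np.sum([ngram_vectors[g] for g in grams], axis=0)

print(char_ngrams("where")[:5])        # ['<wh', 'whe', 'her', 'ere', 're>']
print(word_vector("unseenword").shape) # rare/unseen words still get a vector
```

Because morphologically related words share n-grams, they share vector mass, which is exactly what helps with large vocabularies and rare words.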
8

Xu, Beibei, Zhiying Tan, Kenli Li, Taijiao Jiang, and Yousong Peng. "Predicting the host of influenza viruses based on the word vector." PeerJ 5 (July 18, 2017): e3579. http://dx.doi.org/10.7717/peerj.3579.

Abstract:
Newly emerging influenza viruses continue to threaten public health. A rapid determination of the host range of newly discovered influenza viruses would assist in early assessment of their risk. Here, we attempted to predict the host of influenza viruses using the Support Vector Machine (SVM) classifier based on the word vector, a new representation and feature extraction method for biological sequences. The results show that the length of the word within the word vector, the sequence type (DNA or protein) and the species from which the sequences were derived for generating the word vector all
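One way to read "word vector" for sequences is to treat overlapping k-mers as words, embed them, and feed sequence-level vectors to an SVM. The pipeline below is a loose sketch under that assumption; the k-mer size, mean pooling, toy sequences and labels are all invented and are not the authors' exact setup.

```python
# Loose sketch of a k-mer "word vector" + SVM pipeline. All data and choices
# (k=3, mean pooling) are illustrative assumptions, not the paper's method.
import numpy as np
from gensim.models import Word2Vec
from sklearn.svm import SVC

def kmers(seq, k=3):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

seqs = ["ATGCGTACGTTAGC", "ATGCGTTCGTTAGA", "TTACGGATCCGATC", "TTACGGTTCCGAAC"]
labels = [0, 0, 1, 1]                        # two hypothetical host classes

tokenised = [kmers(s) for s in seqs]
w2v = Word2Vec(tokenised, vector_size=16, window=3, min_count=1, sg=1, seed=1)

def sequence_vector(tokens):
    return np.mean([w2v.wv[t] for t in tokens], axis=0)

X = np.vstack([sequence_vector(t) for t in tokenised])
clf = SVC(kernel="rbf").fit(X, labels)
print(clf.predict(X))
```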
9

Nguyen, Dat Quoc, Richard Billingsley, Lan Du, and Mark Johnson. "Improving Topic Models with Latent Feature Word Representations." Transactions of the Association for Computational Linguistics 3 (December 2015): 299–313. http://dx.doi.org/10.1162/tacl_a_00140.

Abstract:
Probabilistic topic models are widely used to discover latent topics in document collections, while latent feature vector representations of words have been used to obtain high performance in many NLP tasks. In this paper, we extend two different Dirichlet multinomial topic models by incorporating latent feature vector representations of words trained on very large corpora to improve the word-topic mapping learnt on a smaller corpus. Experimental results show that by using information from the external corpora, our new models produce significant improvements on topic coherence, document cluste
10

Li, Zhen, Dan Qu, Yanxia Li, Chaojie Xie, and Qi Chen. "A Position Weighted Information Based Word Embedding Model for Machine Translation." International Journal on Artificial Intelligence Tools 29, no. 07n08 (2020): 2040005. http://dx.doi.org/10.1142/s0218213020400059.

Abstract:
Deep learning technology promotes the development of neural network machine translation (NMT). End-to-End (E2E) has become the mainstream in NMT. It uses word vectors as the initial value of the input layer. The effect of word vector model directly affects the accuracy of E2E-NMT. Researchers have proposed many approaches to learn word representations and have achieved significant results. However, the drawbacks of these methods still limit the performance of E2E-NMT systems. This paper focuses on the word embedding technology and proposes the PW-CBOW word vector model which can present better
11

Losieva, Y. "Representation of Words in Natural Language Processing: A Survey." Bulletin of Taras Shevchenko National University of Kyiv. Series: Physics and Mathematics, no. 2 (2019): 82–87. http://dx.doi.org/10.17721/1812-5409.2019/2.10.

Abstract:
The article is devoted to research on state-of-the-art vector representations of words in natural language processing. Three main types of vector representation of a word are described, namely: static word embeddings, the use of deep neural networks for word representation, and dynamic word embeddings based on the context of the text. This is a highly topical and much-demanded area in natural language processing, computational linguistics and artificial intelligence in general. Several different models for vector representation of the word (or word embeddings) are considered, from the simplest (as a
12

Bhatta, Janardan, Dipesh Shrestha, Santosh Nepal, Saurav Pandey, and Shekhar Koirala. "Efficient Estimation of Nepali Word Representations in Vector Space." Journal of Innovations in Engineering Education 3, no. 1 (2020): 71–77. http://dx.doi.org/10.3126/jiee.v3i1.34327.

Abstract:
Word representation is a means of representing a word as mathematical entities that can be read, reasoned and manipulated by computational models. The representation is required for input to any new modern data models and in many cases, the accuracy of a model depends on it. In this paper, we analyze various methods of calculating vector space for Nepali words and postulate a word to vector model based on the Skip-gram model with NCE loss capturing syntactic and semantic word relationships.
 This is an attempt to implement a paper by Mikolov on Nepali words.
13

Yeh, Hsiang-Yuan, Yu-Ching Yeh, and Da-Bai Shen. "Word Vector Models Approach to Text Regression of Financial Risk Prediction." Symmetry 12, no. 1 (2020): 89. http://dx.doi.org/10.3390/sym12010089.

Abstract:
Linking textual information in finance reports to the stock return volatility provides a perspective on exploring useful insights for risk management. We introduce different kinds of word vector representations in the modeling of textual information: bag-of-words, pre-trained word embeddings, and domain-specific word embeddings. We apply linear and non-linear methods to establish a text regression model for volatility prediction. A large number of collected annually-published financial reports in the period from 1996 to 2013 is used in the experiments. We demonstrate that the domain-specific w
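A minimal bag-of-words baseline of the kind compared in this abstract — TF-IDF features from report text feeding a linear regressor — might look like the sketch below; the report texts and volatility targets are invented.

```python
# Minimal TF-IDF + linear regression sketch for text-based volatility
# prediction. Reports and target values are toy examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge

reports = ["revenue increased and risk factors were reduced",
           "significant litigation risk and going-concern doubt",
           "stable operations with modest currency exposure"]
volatility = [0.12, 0.55, 0.20]      # hypothetical post-report volatilities

X = TfidfVectorizer().fit_transform(reports)
model = Ridge(alpha=1.0).fit(X, volatility)
print(model.predict(X).round(3))
```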
14

Paperno, Denis, and Marco Baroni. "When the Whole Is Less Than the Sum of Its Parts: How Composition Affects PMI Values in Distributional Semantic Vectors." Computational Linguistics 42, no. 2 (2016): 345–50. http://dx.doi.org/10.1162/coli_a_00250.

Abstract:
Distributional semantic models, deriving vector-based word representations from patterns of word usage in corpora, have many useful applications (Turney and Pantel 2010 ). Recently, there has been interest in compositional distributional models, which derive vectors for phrases from representations of their constituent words (Mitchell and Lapata 2010 ). Often, the values of distributional vectors are pointwise mutual information (PMI) scores obtained from raw co-occurrence counts. In this article we study the relation between the PMI dimensions of a phrase vector and its components in order to
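For readers unfamiliar with the weighting scheme mentioned here, PMI for a target–context pair is log p(t, c) / (p(t) p(c)), estimated from co-occurrence counts. A toy worked example (all counts invented):

```python
# Toy PMI computation from raw co-occurrence counts (all counts invented).
import math
from collections import Counter

cooc = Counter({("boiled", "water"): 8, ("boiled", "egg"): 12,
                ("hot", "water"): 30, ("hot", "egg"): 2})
target_total, context_total = Counter(), Counter()
for (t, c), n in cooc.items():
    target_total[t] += n
    context_total[c] += n
grand_total = sum(cooc.values())

def pmi(t, c):
    p_tc = cooc[(t, c)] / grand_total
    return math.log(p_tc / ((target_total[t] / grand_total) *
                            (context_total[c] / grand_total)))

print(round(pmi("boiled", "egg"), 3))   # positive: co-occurs more than chance
print(round(pmi("hot", "egg"), 3))      # negative: co-occurs less than chance
```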
15

Lu, Wei, Kailun Shi, Yuanyuan Cai, and Xiaoping Che. "Semantic Similarity Measurement Using Knowledge-Augmented Multiple-prototype Distributed Word Vector." International Journal of Interdisciplinary Telecommunications and Networking 8, no. 2 (2016): 45–57. http://dx.doi.org/10.4018/ijitn.2016040105.

Abstract:
In recent years, textual semantic similarity measurements have played an important role in Natural Language Processing. The semantic similarity between concepts or terms can be measured by various resources like corpora, ontologies, taxonomies, etc. With the development of deep learning, distributed vector models are constructed for extracting the latent semantic information from corpora. Most existing models create a single prototype vector to represent the meaning of a word such as CBOW. However, due to lexical ambiguity, encoding word meaning with a single vector is problematic. In this work, the
16

Robnik-Šikonja, Marko, Kristjan Reba, and Igor Mozetič. "Cross-lingual transfer of sentiment classifiers." Slovenščina 2.0: empirical, applied and interdisciplinary research 9, no. 1 (2021): 1–25. http://dx.doi.org/10.4312/slo2.0.2021.1.1-25.

Abstract:
Word embeddings represent words in a numeric space so that semantic relations between words are represented as distances and directions in the vector space. Cross-lingual word embeddings transform vector spaces of different languages so that similar words are aligned. This is done by mapping one language’s vector space to the vector space of another language or by construction of a joint vector space for multiple languages. Cross-lingual embeddings can be used to transfer machine learning models between languages, thereby compensating for insufficient data in less-resourced languages. We use c
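The "mapping one language's vector space to the vector space of another language" step can be sketched as an ordinary least-squares fit of a linear transformation over a seed dictionary of translation pairs. This is one common alignment recipe, not necessarily the exact method used in the article, and the data below is synthetic.

```python
# Linear mapping between two embedding spaces fitted by least squares over a
# seed dictionary. Synthetic data; one common alignment recipe, shown for
# illustration only.
import numpy as np

rng = np.random.default_rng(0)
dim, n_pairs = 100, 500
src = rng.normal(size=(n_pairs, dim))                  # source-language vectors
true_map = rng.normal(size=(dim, dim))
tgt = src @ true_map + 0.01 * rng.normal(size=(n_pairs, dim))  # "translations"

# Find W minimising ||src @ W - tgt||_F, then project new source vectors.
W, *_ = np.linalg.lstsq(src, tgt, rcond=None)
projected = src[:5] @ W
print(np.allclose(projected, tgt[:5], atol=0.5))       # True: spaces aligned
```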
17

Mao, Xingliang, Shuai Chang, Jinjing Shi, Fangfang Li, and Ronghua Shi. "Sentiment-Aware Word Embedding for Emotion Classification." Applied Sciences 9, no. 7 (2019): 1334. http://dx.doi.org/10.3390/app9071334.

Abstract:
Word embeddings are effective intermediate representations for capturing semantic regularities between words in natural language processing (NLP) tasks. We propose sentiment-aware word embedding for emotional classification, which consists of integrating sentiment evidence within the emotional embedding component of a term vector. We take advantage of the multiple types of emotional knowledge, just as the existing emotional lexicon, to build emotional word vectors to represent emotional information. Then the emotional word vector is combined with the traditional word embedding to construct the
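The combination step described here — an emotional word vector combined with a traditional word embedding — can be illustrated in its simplest form as concatenation of lexicon-derived scores with a generic vector. The lexicon entries and vectors below are toy values, and the paper's actual construction may differ in detail.

```python
# Toy illustration: concatenate lexicon-based emotional features with a
# generic word embedding. Values are invented; the paper's construction
# may differ in detail.
import numpy as np

rng = np.random.default_rng(0)
generic = {w: rng.normal(size=8) for w in ["awful", "great", "table"]}

# Tiny stand-in for an emotional lexicon: (valence, arousal) per word.
emotion_lexicon = {"awful": (-0.9, 0.7), "great": (0.8, 0.5)}

def sentiment_aware_vector(word):
    emotional = np.array(emotion_lexicon.get(word, (0.0, 0.0)))  # neutral default
    return np.concatenate([generic[word], emotional])

print(sentiment_aware_vector("awful").shape)   # (10,): 8 generic + 2 emotional
```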
18

Chatterjee, Soma, and Kamal Sarkar. "Combining IR Models for Bengali Information Retrieval." International Journal of Information Retrieval Research 8, no. 3 (2018): 68–83. http://dx.doi.org/10.4018/ijirr.2018070105.

Abstract:
Word mismatch between queries and documents is a fundamental problem in information retrieval domain. In this article, the authors present an effective approach to Bengali information retrieval that combines two IR models to tackle the word mismatch problem in Bengali IR. The proposed hybrid model combines the traditional word-based IR model with another IR model that uses semantic text similarity measure based on vector embeddings of words. Experimental results show that the performance of our proposed hybrid Bengali IR model significantly improves over the baseline IR model.
19

Mrkšić, Nikola, Ivan Vulić, Diarmuid Ó. Séaghdha, et al. "Semantic Specialization of Distributional Word Vector Spaces using Monolingual and Cross-Lingual Constraints." Transactions of the Association for Computational Linguistics 5 (December 2017): 309–24. http://dx.doi.org/10.1162/tacl_a_00063.

Abstract:
We present Attract-Repel, an algorithm for improving the semantic quality of word vectors by injecting constraints extracted from lexical resources. Attract-Repel facilitates the use of constraints from mono- and cross-lingual resources, yielding semantically specialized cross-lingual vector spaces. Our evaluation shows that the method can make use of existing cross-lingual lexicons to construct high-quality vector spaces for a plethora of different languages, facilitating semantic transfer from high- to lower-resource ones. The effectiveness of our approach is demonstrated with state-of-the-a
20

Yang, Hejung, Young-In Lee, Hyun-jung Lee, Sook Whan Cho, and Myoung-Wan Koo. "A Study on Word Vector Models for Representing Korean Semantic Information." Phonetics and Speech Sciences 7, no. 4 (2015): 41–47. http://dx.doi.org/10.13064/ksss.2015.7.4.041.

21

Erk, Katrin. "Vector Space Models of Word Meaning and Phrase Meaning: A Survey." Language and Linguistics Compass 6, no. 10 (2012): 635–53. http://dx.doi.org/10.1002/lnco.362.

22

Ye, Na, Xin Qin, Lili Dong, Xiang Zhang, and Kangkang Sun. "Chinese Named Entity Recognition Based on Character-Word Vector Fusion." Wireless Communications and Mobile Computing 2020 (July 4, 2020): 1–7. http://dx.doi.org/10.1155/2020/8866540.

Abstract:
Due to the lack of explicit markers in Chinese text to define the boundaries of words, it is often more difficult to identify named entities in Chinese than in English. At present, the pretreatment of the character or word vector models is adopted in the training of the Chinese named entity recognition model. Aimed at the problems that taking character vector as an input of the neural network cannot use the words’ semantic meanings and give up the words’ explicit boundary information, and taking the word vector as an input of the neural network relies on the accuracy of the segmentation algori
23

Da’u, Aminu, and Naomie Salim. "Aspect extraction on user textual reviews using multi-channel convolutional neural network." PeerJ Computer Science 5 (May 6, 2019): e191. http://dx.doi.org/10.7717/peerj-cs.191.

Abstract:
Aspect extraction is a subtask of sentiment analysis that deals with identifying opinion targets in an opinionated text. Existing approaches to aspect extraction typically rely on using handcrafted features, linear and integrated network architectures. Although these methods can achieve good performances, they are time-consuming and often very complicated. In real-life systems, a simple model with competitive results is generally more effective and preferable over complicated models. In this paper, we present a multichannel convolutional neural network for aspect extraction. The model consists
24

Zhao, Fuqiang, Zhengyu Zhu, and Ping Han. "A novel model for semantic similarity measurement based on wordnet and word embedding." Journal of Intelligent & Fuzzy Systems 40, no. 5 (2021): 9831–42. http://dx.doi.org/10.3233/jifs-202337.

Abstract:
To measure semantic similarity between words, a novel model DFRVec that encodes multiple semantic information of a word in WordNet into a vector space is presented in this paper. Firstly, three different sub-models are proposed: 1) DefVec: encoding the definitions of a word in WordNet; 2) FormVec: encoding the part-of-speech (POS) of a word in WordNet; 3) RelVec: encoding the relations of a word in WordNet. Then by combining the three sub-models with an existing word embedding, the new model for generating the vector of a word is proposed. Finally, based on DFRVec and the path information in W
25

Abdelkader, Mostefai, and Mekour Mansour. "A Method Based on a New Word Embedding Approach for Process Model Matching." International Journal of Artificial Intelligence and Machine Learning 11, no. 1 (2021): 1–14. http://dx.doi.org/10.4018/ijaiml.2021010101.

Abstract:
This paper proposes a method based on a new word embedding approach for matching business process model. The proposed method aligns two process models in four steps. First activity labels are extracted and pre-processed to remove meaningless words, then each word composing an activity label and using a semantic similarity metric based on WordNet is represented with an n-dimensional vector in the space of the vocabulary of the two labels to be compared. Based on these representations, a vector representation of each activity label is computed by averaging the vectors representing words found in
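The label-comparison step described here — average the word vectors of each activity label, then compare labels — can be sketched as below; the word vectors are random placeholders rather than the WordNet-based representation the method actually builds.

```python
# Sketch of label matching by averaging word vectors and comparing with cosine
# similarity. Word vectors are random placeholders, not the WordNet-based
# representation described in the abstract.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["create", "purchase", "order", "send", "invoice"]
word_vec = {w: rng.normal(size=32) for w in vocab}

def label_vector(label):
    words = [w for w in label.lower().split() if w in word_vec]
    return np.mean([word_vec[w] for w in words], axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(label_vector("Create purchase order"),
             label_vector("Send order invoice")))
```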
26

Naseem, Usman, Imran Razzak, Shah Khalid Khan, and Mukesh Prasad. "A Comprehensive Survey on Word Representation Models: From Classical to State-of-the-Art Word Representation Language Models." ACM Transactions on Asian and Low-Resource Language Information Processing 20, no. 5 (2021): 1–35. http://dx.doi.org/10.1145/3434237.

Abstract:
Word representation has always been an important research area in the history of natural language processing (NLP). Understanding such complex text data is imperative, given that it is rich in information and can be used widely across various applications. In this survey, we explore different word representation models and its power of expression, from the classical to modern-day state-of-the-art word representation language models (LMS). We describe a variety of text representation methods, and model designs have blossomed in the context of NLP, including SOTA LMs. These models can transform
27

Santos, Flávio Arthur O., Thiago Dias Bispo, Hendrik Teixeira Macedo, and Cleber Zanchettin. "Morphological Skip-Gram: Replacing FastText characters n-gram with morphological knowledge." Inteligencia Artificial 24, no. 67 (2021): 1–17. http://dx.doi.org/10.4114/intartif.vol24iss67pp1-17.

Abstract:
Natural language processing systems have attracted much interest of the industry. This branch of study is composed of some applications such as machine translation, sentiment analysis, named entity recognition, question and answer, and others. Word embeddings (i.e., continuous word representations) are an essential module for those applications generally used as word representation to machine learning models. Some popular methods to train word embeddings are GloVe and Word2Vec. They achieve good word representations, despite limitations: both ignore morphological information of the words and c
28

Cotterell, Ryan, and Hinrich Schütze. "Joint Semantic Synthesis and Morphological Analysis of the Derived Word." Transactions of the Association for Computational Linguistics 6 (December 2018): 33–48. http://dx.doi.org/10.1162/tacl_a_00003.

Abstract:
Much like sentences are composed of words, words themselves are composed of smaller units. For example, the English word questionably can be analyzed as question+ able+ ly. However, this structural decomposition of the word does not directly give us a semantic representation of the word’s meaning. Since morphology obeys the principle of compositionality, the semantics of the word can be systematically derived from the meaning of its parts. In this work, we propose a novel probabilistic model of word formation that captures both the analysis of a word w into its constituent segments and the syn
29

Zhu, Lixing, Yulan He, and Deyu Zhou. "A Neural Generative Model for Joint Learning Topics and Topic-Specific Word Embeddings." Transactions of the Association for Computational Linguistics 8 (August 2020): 471–85. http://dx.doi.org/10.1162/tacl_a_00326.

Abstract:
We propose a novel generative model to explore both local and global context for joint learning topics and topic-specific word embeddings. In particular, we assume that global latent topics are shared across documents, a word is generated by a hidden semantic vector encoding its contextual semantic meaning, and its context words are generated conditional on both the hidden semantic vector and global latent topics. Topics are trained jointly with the word embeddings. The trained model maps words to topic-dependent embeddings, which naturally addresses the issue of word polysemy. Experimental re
30

Krishnamurthy, Balaji, Nikaash Puri, and Raghavender Goel. "Learning Vector-space Representations of Items for Recommendations Using Word Embedding Models." Procedia Computer Science 80 (2016): 2205–10. http://dx.doi.org/10.1016/j.procs.2016.05.380.

31

Lauscher, Anne, Goran Glavaš, Simone Paolo Ponzetto, and Ivan Vulić. "A General Framework for Implicit and Explicit Debiasing of Distributional Word Vector Spaces." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (2020): 8131–38. http://dx.doi.org/10.1609/aaai.v34i05.6325.

Abstract:
Distributional word vectors have recently been shown to encode many of the human biases, most notably gender and racial biases, and models for attenuating such biases have consequently been proposed. However, existing models and studies (1) operate on under-specified and mutually differing bias definitions, (2) are tailored for a particular bias (e.g., gender bias) and (3) have been evaluated inconsistently and non-rigorously. In this work, we introduce a general framework for debiasing word embeddings. We operationalize the definition of a bias by discerning two types of bias specification: e
32

El-Alami, Fatima-Zahra, Said Ouatik El Alaoui, and Noureddine En-Nahnahi. "Deep Neural Models and Retrofitting for Arabic Text Categorization." International Journal of Intelligent Information Technologies 16, no. 2 (2020): 74–86. http://dx.doi.org/10.4018/ijiit.2020040104.

Abstract:
Arabic text categorization is an important task in text mining particularly with the fast-increasing quantity of the Arabic online data. Deep neural network models have shown promising performance and indicated great data modeling capacities in managing large and substantial datasets. This article investigates convolution neural networks (CNNs), long short-term memory (LSTM) and their combination for Arabic text categorization. This work additionally handles the morphological variety of Arabic words by exploring the word embeddings model using position weights and subword information. To guara
33

Al Mahmud, Nahyan, and Shahfida Amjad Munni. "Qualitative Analysis of PLP in LSTM for Bangla Speech Recognition." International journal of Multimedia & Its Applications 12, no. 5 (2020): 1–8. http://dx.doi.org/10.5121/ijma.2020.12501.

Abstract:
The performance of various acoustic feature extraction methods has been compared in this work using Long Short-Term Memory (LSTM) neural network in a Bangla speech recognition system. The acoustic features are a series of vectors that represents the speech signals. They can be classified in either words or sub word units such as phonemes. In this work, at first linear predictive coding (LPC) is used as acoustic vector extraction technique. LPC has been chosen due to its widespread popularity. Then other vector extraction techniques like Mel frequency cepstral coefficients (MFCC) and perceptual
34

Jiang, Shengchen, Yantuan Xian, Hongbin Wang, Zhiju Zhang, and Huaqin Li. "Representation Learning with LDA Models for Entity Disambiguation in Specific Domains." Journal of Advanced Computational Intelligence and Intelligent Informatics 25, no. 3 (2021): 326–34. http://dx.doi.org/10.20965/jaciii.2021.p0326.

Abstract:
Entity disambiguation is extremely important in knowledge construction. The word representation model ignores the influence of the ordering between words on the sentence or text information. Thus, we propose a domain entity disambiguation method that fuses the doc2vec and LDA topic models. In this study, the doc2vec document is used to indicate that the model obtains the vector form of the entity reference item and the candidate entity from the domain corpus and knowledge base, respectively. Moreover, the context similarity and category referential similarity calculations are performed based o
35

VIRPIOJA, SAMI, MARI-SANNA PAUKKERI, ABHISHEK TRIPATHI, TIINA LINDH-KNUUTILA, and KRISTA LAGUS. "Evaluating vector space models with canonical correlation analysis." Natural Language Engineering 18, no. 3 (2011): 399–436. http://dx.doi.org/10.1017/s1351324911000271.

Abstract:
Vector space models are used in language processing applications for calculating semantic similarities of words or documents. The vector spaces are generated with feature extraction methods for text data. However, evaluation of the feature extraction methods may be difficult. Indirect evaluation in an application is often time-consuming and the results may not generalize to other applications, whereas direct evaluations that measure the amount of captured semantic information usually require human evaluators or annotated data sets. We propose a novel direct evaluation method based on c
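The evaluation idea — measure how strongly two different vector-space representations of the same words correlate under canonical correlation analysis — can be sketched with synthetic data; the paper's exact scoring protocol is not reproduced here.

```python
# CCA between two synthetic vector spaces built over the same "words".
# Higher canonical correlations suggest the spaces capture shared structure.
# Illustrative only; not the paper's exact evaluation protocol.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n_words = 300
latent = rng.normal(size=(n_words, 10))                    # shared structure
space_a = latent @ rng.normal(size=(10, 50)) + 0.1 * rng.normal(size=(n_words, 50))
space_b = latent @ rng.normal(size=(10, 40)) + 0.1 * rng.normal(size=(n_words, 40))

cca = CCA(n_components=5)
a_scores, b_scores = cca.fit_transform(space_a, space_b)
correlations = [float(np.corrcoef(a_scores[:, k], b_scores[:, k])[0, 1])
                for k in range(5)]
print([round(c, 3) for c in correlations])
```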
36

Garabík, Radovan. "Word Embedding Based on Large-Scale Web Corpora as a Powerful Lexicographic Tool." Rasprave Instituta za hrvatski jezik i jezikoslovlje 46, no. 2 (2020): 603–18. http://dx.doi.org/10.31724/rihjj.46.2.8.

Abstract:
The Aranea Project offers a set of comparable corpora for two dozen (mostly European) languages, providing a convenient dataset for NLP applications that require training on large amounts of data. The article presents word embedding models trained on the Aranea corpora and an online interface to query the models and visualize the results. The implementation is aimed towards lexicographic use but can also be useful in other fields of linguistic study, since the vector space is a plausible model of the semantic space of word meanings. Three different models are available – one for a combination of
37

Socher, Richard, Andrej Karpathy, Quoc V. Le, Christopher D. Manning, and Andrew Y. Ng. "Grounded Compositional Semantics for Finding and Describing Images with Sentences." Transactions of the Association for Computational Linguistics 2 (December 2014): 207–18. http://dx.doi.org/10.1162/tacl_a_00177.

Abstract:
Previous work on Recursive Neural Networks (RNNs) shows that these models can produce compositional feature vectors for accurately representing and classifying sentences or images. However, the sentence vectors of previous models cannot accurately represent visually grounded meaning. We introduce the DT-RNN model which uses dependency trees to embed sentences into a vector space in order to retrieve images that are described by those sentences. Unlike previous RNN-based models which use constituency trees, DT-RNNs naturally focus on the action and agents in a sentence. They are better able to
38

Dehghan, M., K. Faez, M. Ahmadi, and M. Shridhar. "Unconstrained Farsi handwritten word recognition using fuzzy vector quantization and hidden Markov models." Pattern Recognition Letters 22, no. 2 (2001): 209–14. http://dx.doi.org/10.1016/s0167-8655(00)00090-8.

39

TURNEY, P. D., and S. M. MOHAMMAD. "Experiments with three approaches to recognizing lexical entailment." Natural Language Engineering 21, no. 3 (2014): 437–76. http://dx.doi.org/10.1017/s1351324913000387.

Abstract:
Inference in natural language often involves recognizing lexical entailment (RLE), that is, identifying whether one word entails another. For example, buy entails own. Two general strategies for RLE have been proposed: One strategy is to manually construct an asymmetric similarity measure for context vectors (directional similarity) and another is to treat RLE as a problem of learning to recognize semantic relations using supervised machine-learning techniques (relation classification). In this paper, we experiment with two recent state-of-the-art representatives of the two general strate
40

Kumar, Vaibhav, Tenzin Singhay Bhotia, Vaibhav Kumar, and Tanmoy Chakraborty. "Nurse is Closer to Woman than Surgeon? Mitigating Gender-Biased Proximities in Word Embeddings." Transactions of the Association for Computational Linguistics 8 (August 2020): 486–503. http://dx.doi.org/10.1162/tacl_a_00327.

Abstract:
Word embeddings are the standard model for semantic and syntactic representations of words. Unfortunately, these models have been shown to exhibit undesirable word associations resulting from gender, racial, and religious biases. Existing post-processing methods for debiasing word embeddings are unable to mitigate gender bias hidden in the spatial arrangement of word vectors. In this paper, we propose RAN-Debias, a novel gender debiasing methodology that not only eliminates the bias present in a word vector but also alters the spatial distribution of its neighboring vectors, achieving a bias-f
41

Fiok, Krzysztof, Waldemar Karwowski, Edgar Gutierrez, and Mohammad Reza-Davahli. "Comparing the Quality and Speed of Sentence Classification with Modern Language Models." Applied Sciences 10, no. 10 (2020): 3386. http://dx.doi.org/10.3390/app10103386.

Abstract:
After the advent of Glove and Word2vec, the dynamic development of language models (LMs) used to generate word embeddings has enabled the creation of better text classifier frameworks. With the vector representations of words generated by newer LMs, embeddings are no longer static but are context-aware. However, the quality of results provided by state-of-the-art LMs comes at the price of speed. Our goal was to present a benchmark to provide insight into the speed–quality trade-off of a sentence classifier framework based on word embeddings provided by selected LMs. We used a recurrent neural
42

Padó, Sebastian, and Mirella Lapata. "Dependency-Based Construction of Semantic Space Models." Computational Linguistics 33, no. 2 (2007): 161–99. http://dx.doi.org/10.1162/coli.2007.33.2.161.

Abstract:
Traditionally, vector-based semantic space models use word co-occurrence counts from large corpora to represent lexical meaning. In this article we present a novel framework for constructing semantic spaces that takes syntactic relations into account. We introduce a formalization for this class of models, which allows linguistic knowledge to guide the construction process. We evaluate our framework on a range of tasks relevant for cognitive science and natural language processing: semantic priming, synonymy detection, and word sense disambiguation. In all cases, our framework obtains results t
43

Arora, Sanjeev, Yuanzhi Li, Yingyu Liang, Tengyu Ma, and Andrej Risteski. "A Latent Variable Model Approach to PMI-based Word Embeddings." Transactions of the Association for Computational Linguistics 4 (December 2016): 385–99. http://dx.doi.org/10.1162/tacl_a_00106.

Abstract:
Semantic word embeddings represent the meaning of a word via a vector, and are created by diverse methods. Many use nonlinear operations on co-occurrence statistics, and have hand-tuned hyperparameters and reweighting methods. This paper proposes a new generative model, a dynamic version of the log-linear topic model of Mnih and Hinton (2007). The methodological novelty is to use the prior to compute closed form expressions for word statistics. This provides a theoretical justification for nonlinear models like PMI, word2vec, and GloVe, as well as some hyperparameter choices. It also helps exp
44

Turney, P. D., and P. Pantel. "From Frequency to Meaning: Vector Space Models of Semantics." Journal of Artificial Intelligence Research 37 (February 27, 2010): 141–88. http://dx.doi.org/10.1613/jair.2934.

Abstract:
Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are beginning to address these limits. This paper surveys the use of VSMs for semantic processing of text. We organize the literature on VSMs according to the structure of the matrix in a VSM. There are currently three broad classes of VSMs, based on term-document, word-context, and pair-pattern matrices, y
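Of the three matrix classes this survey organises the literature around, the term–document matrix is the simplest to show; below, document vectors from a toy corpus are compared with cosine similarity.

```python
# Toy term-document style matrix (here rows are documents, columns are terms)
# with cosine similarity between documents. Documents are invented.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["word vectors represent meaning",
        "vectors of words capture semantic meaning",
        "the stock market fell sharply"]

doc_term = CountVectorizer().fit_transform(docs)
print(cosine_similarity(doc_term)[0])   # doc 0 is closer to doc 1 than to doc 2
```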
45

Di Carlo, Valerio, Federico Bianchi, and Matteo Palmonari. "Training Temporal Word Embeddings with a Compass." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 6326–34. http://dx.doi.org/10.1609/aaai.v33i01.33016326.

Abstract:
Temporal word embeddings have been proposed to support the analysis of word meaning shifts during time and to study the evolution of languages. Different approaches have been proposed to generate vector representations of words that embed their meaning during a specific time interval. However, the training process used in these approaches is complex, may be inefficient or it may require large text corpora. As a consequence, these approaches may be difficult to apply in resource-scarce domains or by scientists with limited in-depth knowledge of embedding models. In this paper, we propose a new
46

Liu, Chang, Pengyuan Zhang, Ta Li, and Yonghong Yan. "Semantic Features Based N-Best Rescoring Methods for Automatic Speech Recognition." Applied Sciences 9, no. 23 (2019): 5053. http://dx.doi.org/10.3390/app9235053.

Abstract:
In this work, we aim to re-rank the n-best hypotheses of an automatic speech recognition system by punishing the sentences which have words that are semantically different from the context and rewarding the sentences where all words are in semantical harmony. To achieve this, we proposed a topic similarity score that measures the difference between topic distribution of words and the corresponding sentence. We also proposed another word-discourse score that quantifies the likeliness for a word to appear in the sentence by the inner production of word vector and discourse vector. Besides, we us
47

Chen, Xiaojun. "Synthetic Network and Search Filter Algorithm in English Oral Duplicate Correction Map." Complexity 2021 (April 13, 2021): 1–12. http://dx.doi.org/10.1155/2021/9960101.

Abstract:
Combining the communicative language competence model and the perspective of multimodal research, this research proposes a research framework for oral communicative competence under the multimodal perspective. This not only truly reflects the language communicative competence but also fully embodies the various contents required for assessment in the basic attributes of spoken language. Aiming at the feature sparseness of the user evaluation matrix, this paper proposes a feature weight assignment algorithm based on the English spoken category keyword dictionary and user search records. The alg
48

Sun, Yanfeng, Minglei Zhang, Si Chen, and Xiaohu Shi. "A Financial Embedded Vector Model and Its Applications to Time Series Forecasting." International Journal of Computers Communications & Control 13, no. 5 (2018): 881–94. http://dx.doi.org/10.15837/ijccc.2018.5.3286.

Abstract:
Inspired by the embedding representation in Natural Language Processing (NLP), we develop a financial embedded vector representation model to abstract the temporal characteristics of financial time series. Original financial features are discretized firstly, and then each set of discretized features is considered as a “word” of NLP, while the whole financial time series corresponds to the “sentence” or “paragraph”. Therefore the embedded vector models in NLP could be applied to the financial time series. To test the proposed model, we use RBF neural networks as regression model to predict fina
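The discretisation step described here — bin each period's continuous features so that the bin labels act as "words" of a financial "sentence" — might look like the sketch below; the binning scheme and data are invented, and the downstream RBF regression from the paper is omitted.

```python
# Sketch of discretising financial features into "words" so an NLP-style
# embedding model can be applied. Bins, data and parameters are toy choices;
# the downstream regression model from the paper is omitted.
import numpy as np
from gensim.models import Word2Vec

rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.02, size=200)       # toy daily returns
volumes = rng.lognormal(10.0, 1.0, size=200)    # toy daily volumes

vol_edges = np.quantile(volumes, [0.33, 0.66])

def to_word(r, v):
    r_bin = int(np.digitize(r, [-0.01, 0.0, 0.01]))   # 4 return buckets
    v_bin = int(np.digitize(v, vol_edges))            # 3 volume buckets
    return f"r{r_bin}_v{v_bin}"

sentence = [to_word(r, v) for r, v in zip(returns, volumes)]
model = Word2Vec([sentence], vector_size=8, window=5, min_count=1, seed=1)
print(model.wv[sentence[0]])                     # embedded "financial word"
```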
49

Camacho-Collados, Jose, and Mohammad Taher Pilehvar. "From Word To Sense Embeddings: A Survey on Vector Representations of Meaning." Journal of Artificial Intelligence Research 63 (December 6, 2018): 743–88. http://dx.doi.org/10.1613/jair.1.11259.

Abstract:
Over the past years, distributed semantic representations have proved to be effective and flexible keepers of prior knowledge to be integrated into downstream applications. This survey focuses on the representation of meaning. We start from the theoretical background behind word vector space models and highlight one of their major limitations: the meaning conflation deficiency, which arises from representing a word with all its possible meanings as a single vector. Then, we explain how this deficiency can be addressed through a transition from the word level to the more fi
50

Zhou, Wang, Sun, and Sun. "A Method of Short Text Representation Based on the Feature Probability Embedded Vector." Sensors 19, no. 17 (2019): 3728. http://dx.doi.org/10.3390/s19173728.

Abstract:
Text representation is one of the key tasks in the field of natural language processing (NLP). Traditional feature extraction and weighting methods often use the bag-of-words (BoW) model, which may lead to a lack of semantic information as well as the problems of high dimensionality and high sparsity. At present, to solve these problems, a popular idea is to utilize deep learning methods. In this paper, feature weighting, word embedding, and topic models are combined to propose an unsupervised text representation method named the feature, probability, and word embedding method. The main idea i