Journal articles on the topic 'Named entity disambiguation'

Consult the top 50 journal articles for your research on the topic 'Named entity disambiguation.'

1

Nguyen, Hien T., and Tru H. Cao. "Named Entity Disambiguation: A Hybrid Approach." International Journal of Computational Intelligence Systems 5, no. 6 (November 2012): 1052–67. http://dx.doi.org/10.1080/18756891.2012.747661.

2

Alokaili, Amal, and Mohamed El Bachir Menai. "SVM ensembles for named entity disambiguation." Computing 102, no. 4 (August 21, 2019): 1051–76. http://dx.doi.org/10.1007/s00607-019-00748-x.

3

Habib, Mena B., and Maurice van Keulen. "TwitterNEED: A hybrid approach for named entity extraction and disambiguation for tweet." Natural Language Engineering 22, no. 3 (July 10, 2015): 423–56. http://dx.doi.org/10.1017/s1351324915000194.

Abstract:
Twitter is a rich source of continuously and instantly updated information. Shortness and informality of tweets are challenges for Natural Language Processing tasks. In this paper, we present TwitterNEED, a hybrid approach for Named Entity Extraction and Named Entity Disambiguation for tweets. We believe that disambiguation can help to improve the extraction process. This mimics the way humans understand language and reduces error propagation in the whole system. Our extraction approach aims for high extraction recall first, after which a Support Vector Machine attempts to filter out false positives among the extracted candidates using features derived from the disambiguation phase in addition to other word shape and Knowledge Base features. For Named Entity Disambiguation, we obtain a list of entity candidates from the YAGO Knowledge Base in addition to top-ranked pages from the Google search engine for each extracted mention. We use a Support Vector Machine to rank the candidate pages according to a set of URL and context similarity features. For evaluation, five data sets are used to evaluate the extraction approach, and three of them to evaluate both the disambiguation approach and the combined extraction and disambiguation approach. Experiments show better results compared to our competitors DBpedia Spotlight, Stanford Named Entity Recognition, and the AIDA disambiguation system.
4

Guo, Zhaochen, and Denilson Barbosa. "Robust named entity disambiguation with random walks." Semantic Web 9, no. 4 (June 29, 2018): 459–79. http://dx.doi.org/10.3233/sw-170273.

5

Virliani, Muthia, Moch Arif Bijaksana, and Arie Ardiyanti Suryani. "Analysis of Name Entities in Text Using Robust Disambiguation Method." SISFOTENIKA 10, no. 2 (May 25, 2020): 178. http://dx.doi.org/10.30700/jst.v10i2.963.

Abstract:
Named entities are proper nouns or objects contained in a text, such as a person's name, a country name, and others. Names of persons in a text are often ambiguous, which makes it difficult for ordinary readers to tell whether the same name refers to the same person. Ambiguous names are also found in hadith: the name Abdullah in hadith numbers 86 and 2411, for example, might refer to the same person or to different people. This study therefore focuses on named entity disambiguation, which considers the semantic and lexical relations between named entities. In the future, it is expected to help people understand ambiguous names and distinguish between them. The method used in this research was Robust Disambiguation, because this method takes the context of the named entity into account. The output was a set of named entities grouped by whether they refer to the same person or to different persons, processed with Density-based Spatial Clustering of Applications with Noise. The research achieved an accuracy of 90%, a precision of 97%, and a recall of 89%, computed from actual and predicted values.
6

Fernández, Norberto, Jesús Arias Fisteus, Luis Sánchez, and Gonzalo López. "IdentityRank: Named entity disambiguation in the news domain." Expert Systems with Applications 39, no. 10 (August 2012): 9207–21. http://dx.doi.org/10.1016/j.eswa.2012.02.084.

7

Barrena, Ander, Aitor Soroa, and Eneko Agirre. "Towards zero-shot cross-lingual named entity disambiguation." Expert Systems with Applications 184 (December 2021): 115542. http://dx.doi.org/10.1016/j.eswa.2021.115542.

8

Wang, Fang, Wei Wu, Zhoujun Li, and Ming Zhou. "Named entity disambiguation for questions in community question answering." Knowledge-Based Systems 126 (June 2017): 68–77. http://dx.doi.org/10.1016/j.knosys.2017.03.017.

9

Lašek, Ivo, and Peter Vojtáš. "Various approaches to text representation for named entity disambiguation." International Journal of Web Information Systems 9, no. 3 (August 23, 2013): 242–59. http://dx.doi.org/10.1108/ijwis-05-2013-0016.

10

Barua, Jayendra, and Rajdeep Niyogi. "Improving named entity recognition and disambiguation in news headlines." International Journal of Intelligent Information and Database Systems 12, no. 4 (2019): 279. http://dx.doi.org/10.1504/ijiids.2019.10026240.

11

Barua, Jayendra, and Rajdeep Niyogi. "Improving named entity recognition and disambiguation in news headlines." International Journal of Intelligent Information and Database Systems 12, no. 4 (2019): 279. http://dx.doi.org/10.1504/ijiids.2019.104530.

12

Jia, Bingjing, Hu Yang, Bin Wu, and Ying Xing. "Collective Entity Disambiguation Based on Hierarchical Semantic Similarity." International Journal of Data Warehousing and Mining 16, no. 2 (April 2020): 1–17. http://dx.doi.org/10.4018/ijdwm.2020040101.

Abstract:
Entity disambiguation involves mapping mentions in texts to the corresponding entities in a given knowledge base. Most previous approaches were based on handcrafted features and failed to capture semantic information over multiple granularities. For accurately disambiguating entities, various information aspects of mentions and entities should be used. This article proposes a hierarchical semantic similarity model to find important clues related to mentions and entities based on multiple sources of information, such as contexts of the mentions, entity descriptions and categories. This model can effectively measure the semantic matching between mentions and target entities. Global features are also added, including prior popularity and global coherence, to improve the performance. In order to verify the effect of the hierarchical semantic similarity model combined with global features, named HSSMGF, experiments were carried out on five publicly available benchmark datasets. Results demonstrate that the proposed method is very effective when documents contain more mentions.
13

Makris, Christos, and Michael Angelos Simos. "OTNEL: A Distributed Online Deep Learning Semantic Annotation Methodology." Big Data and Cognitive Computing 4, no. 4 (October 29, 2020): 31. http://dx.doi.org/10.3390/bdcc4040031.

Abstract:
Semantic representation of unstructured text is crucial in modern artificial intelligence and information retrieval applications. The semantic information extraction process from an unstructured text fragment to a corresponding representation from a concept ontology is known as named entity disambiguation. In this work, we introduce a distributed, supervised deep learning methodology employing a long short-term memory-based deep learning architecture model for entity linking with Wikipedia. In the context of a frequently changing online world, we introduce and study the domain of online training named entity disambiguation, featuring on-the-fly adaptation to underlying knowledge changes. Our novel methodology evaluates polysemous anchor mentions with sense compatibility based on thematic segmentation of the Wikipedia knowledge graph representation. We aim at both robust performance and high entity-linking accuracy results. The introduced modeling process efficiently addresses conceptualization, formalization, and computational challenges for the online training entity-linking task. The novel online training concept can be exploited for wider adoption, as it is considerably beneficial for targeted topic, online global context consensus for entity disambiguation.
14

Chithira, C. H. "Named Entity Disambiguation Anaphora Resolution and Question Answering in Speech." International Journal of Computer Sciences and Engineering 6, no. 12 (December 31, 2018): 628–32. http://dx.doi.org/10.26438/ijcse/v6i12.628632.

15

Chai, Mingke, Dongmei Li, Tingting Zhuang, and Shuyi Yang. "Named Entity Disambiguation Based on Classified and Structural Semantic Relatedness." Chinese Journal of Electronics 27, no. 6 (November 1, 2018): 1176–82. http://dx.doi.org/10.1049/cje.2018.08.008.

16

Zhu, Ganggao, and Carlos A. Iglesias. "Exploiting semantic similarity for named entity disambiguation in knowledge graphs." Expert Systems with Applications 101 (July 2018): 8–24. http://dx.doi.org/10.1016/j.eswa.2018.02.011.

17

Medad, Amine, Mauro Gaio, Ludovic Moncla, Sébastien Mustière, and Yannick Le Nir. "Comparing supervised learning algorithms for Spatial Nominal Entity recognition." AGILE: GIScience Series 1 (July 15, 2020): 1–18. http://dx.doi.org/10.5194/agile-giss-1-15-2020.

Abstract:
Discourse may contain both named and nominal entities. Most common nouns or nominal mentions in natural language do not have a single, simple meaning but rather a number of related meanings. This form of ambiguity led to the development of a task in natural language processing known as Word Sense Disambiguation. Recognition and categorisation of named and nominal entities is an essential step for Word Sense Disambiguation methods. Up to now, named entity recognition and categorisation systems mainly focused on the annotation, categorisation and identification of named entities. This paper focuses on the annotation and the identification of spatial nominal entities. We explore the combination of Transfer Learning principle and supervised learning algorithms, in order to build a system to detect spatial nominal entities. For this purpose, different supervised learning algorithms are evaluated with three different context sizes on two manually annotated datasets built from Wikipedia articles and hiking description texts. The studied algorithms have been selected for one or more of their specific properties potentially useful in solving our problem. The results of the first phase of experiments reveal that the selected algorithms have similar performances in terms of ability to detect spatial nominal entities. The study also confirms the importance of the size of the window to describe the context, when word-embedding principle is used to represent the semantics of each word.
18

Malyszko, Jacek, Witold Abramowicz, and Milena Strozyna. "Named Entity Disambiguation for Maritime-related Data Retrieved from Heterogenous Sources." TransNav, the International Journal on Marine Navigation and Safety of Sea Transportation 10, no. 3 (2016): 465–77. http://dx.doi.org/10.12716/1001.10.03.12.

19

Ji, Zizheng, Lin Dai, Jin Pang, and Tingting Shen. "Leveraging Concept-Enhanced Pre-Training Model and Masked-Entity Language Model for Named Entity Disambiguation." IEEE Access 8 (2020): 100469–84. http://dx.doi.org/10.1109/access.2020.2994247.

20

Moro, Andrea, Alessandro Raganato, and Roberto Navigli. "Entity Linking meets Word Sense Disambiguation: a Unified Approach." Transactions of the Association for Computational Linguistics 2 (December 2014): 231–44. http://dx.doi.org/10.1162/tacl_a_00179.

Abstract:
Entity Linking (EL) and Word Sense Disambiguation (WSD) both address the lexical ambiguity of language. But while the two tasks are pretty similar, they differ in a fundamental respect: in EL the textual mention can be linked to a named entity which may or may not contain the exact mention, while in WSD there is a perfect match between the word form (better, its lemma) and a suitable word sense. In this paper we present Babelfy, a unified graph-based approach to EL and WSD based on a loose identification of candidate meanings coupled with a densest subgraph heuristic which selects high-coherence semantic interpretations. Our experiments show state-of-the-art performances on both tasks on 6 different datasets, including a multilingual setting. Babelfy is online at http://babelfy.org
21

Santos, Joao Tiago Luis, Ivo Miguel Anastacio, and Bruno Emanuel Martins. "Named Entity Disambiguation over Texts Written in the Portuguese or Spanish Languages." IEEE Latin America Transactions 13, no. 3 (March 2015): 856–62. http://dx.doi.org/10.1109/tla.2015.7069115.

22

Nguyen, Dat Ba, Martin Theobald, and Gerhard Weikum. "J-NERD: Joint Named Entity Recognition and Disambiguation with Rich Linguistic Features." Transactions of the Association for Computational Linguistics 4 (December 2016): 215–29. http://dx.doi.org/10.1162/tacl_a_00094.

Abstract:
Methods for Named Entity Recognition and Disambiguation (NERD) perform NER and NED in two separate stages. Therefore, NED may be penalized with respect to precision by NER false positives, and suffers in recall from NER false negatives. Conversely, NED does not fully exploit information computed by NER such as types of mentions. This paper presents J-NERD, a new approach to perform NER and NED jointly, by means of a probabilistic graphical model that captures mention spans, mention types, and the mapping of mentions to entities in a knowledge base. We present experiments with different kinds of texts from the CoNLL’03, ACE’05, and ClueWeb’09-FACC1 corpora. J-NERD consistently outperforms state-of-the-art competitors in end-to-end NERD precision, recall, and F1.
23

Molina-Villegas, Alejandro, Victor Muñiz-Sanchez, Jean Arreola-Trapala, and Filomeno Alcántara. "Geographic Named Entity Recognition and Disambiguation in Mexican News using word embeddings." Expert Systems with Applications 176 (August 2021): 114855. http://dx.doi.org/10.1016/j.eswa.2021.114855.

24

Fan, Chao, and Yu Li. "Chinese Personal Name Disambiguation Based on Clustering." Wireless Communications and Mobile Computing 2021 (May 14, 2021): 1–7. http://dx.doi.org/10.1155/2021/3790176.

Abstract:
Personal name disambiguation is a significant issue in natural language processing, which is the basis for many tasks in automatic information processing. This research explores the Chinese personal name disambiguation based on clustering technique. Preprocessing is applied to transform raw corpus into standardized format at the beginning. And then, Chinese word segmentation, part-of-speech tagging, and named entity recognition are accomplished by lexical analysis. Furthermore, we make an effort to extract features that can better disambiguate Chinese personal names. Some rules for identifying target personal names are created to improve the experimental effect. Additionally, many calculation methods of feature weights are implemented such as bool weight, absolute frequency weight, tf-idf weight, and entropy weight. As for clustering algorithm, an agglomerative hierarchical clustering is selected by comparison with other clustering methods. Finally, a labeling approach is employed to bring forward feature words that can represent each cluster. The experiment achieves a good result for five groups of Chinese personal names.
25

Mazur, Pawel, and Robert Dale. "Handling conjunctions in named entities." Lingvisticæ Investigationes. International Journal of Linguistics and Language Resources 30, no. 1 (August 10, 2007): 49–68. http://dx.doi.org/10.1075/li.30.1.05maz.

Abstract:
Although the literature contains reports of very high accuracy figures for the recognition of named entities in text, there are still some named entity phenomena that remain problematic for existing text processing systems. One of these is the ambiguity of conjunctions in candidate named entity strings, an all-too-prevalent problem in corporate and legal documents. In this paper, we distinguish four uses of the conjunction in these strings, and explore the use of a supervised machine learning approach to conjunction disambiguation trained on a very limited set of ‘name internal’ features that avoids the need for expensive lexical or semantic resources. We achieve 84% correctly classified examples using k-fold evaluation on a data set of 600 instances. We argue that further improvements are likely to require the use of wider domain knowledge and name external features.
26

Basile, Pierpaolo, and Annalina Caputo. "Entity linking for tweets." Encyclopedia with Semantic Computing and Robotic Intelligence 01, no. 01 (March 2017): 1630020. http://dx.doi.org/10.1142/s2425038416300202.

Abstract:
Named Entity Linking (NEL) is the task of semantically annotating entity mentions in a portion of text with links to a knowledge base. The automatic annotation, which requires the recognition and disambiguation of the entity mention, usually exploits contextual clues like the context of usage and the coherence with respect to other entities. On Twitter, the 140-character limit produces very short and noisy text messages that pose new challenges to the entity linking task. We propose an overview of NEL methods focusing on approaches specifically developed to deal with short messages, like tweets. NEL is a fundamental task for the extraction and annotation of concepts in tweets, which is necessary for making Twitter's huge amount of interconnected user-generated content machine readable and enabling intelligent information access.
27

Zhu, Ming, Busra Celikkaya, Parminder Bhatia, and Chandan K. Reddy. "LATTE: Latent Type Modeling for Biomedical Entity Linking." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (April 3, 2020): 9757–64. http://dx.doi.org/10.1609/aaai.v34i05.6526.

Abstract:
Entity linking is the task of linking mentions of named entities in natural language text, to entities in a curated knowledge-base. This is of significant importance in the biomedical domain, where it could be used to semantically annotate a large volume of clinical records and biomedical literature, to standardized concepts described in an ontology such as Unified Medical Language System (UMLS). We observe that with precise type information, entity disambiguation becomes a straightforward task. However, fine-grained type information is usually not available in biomedical domain. Thus, we propose LATTE, a LATent Type Entity Linking model, that improves entity linking by modeling the latent fine-grained type information about mentions and entities. Unlike previous methods that perform entity linking directly between the mentions and the entities, LATTE jointly does entity disambiguation, and latent fine-grained type learning, without direct supervision. We evaluate our model on two biomedical datasets: MedMentions, a large scale public dataset annotated with UMLS concepts, and a de-identified corpus of dictated doctor's notes that has been annotated with ICD concepts. Extensive experimental evaluation shows our model achieves significant performance improvements over several state-of-the-art techniques.
28

Weichselbraun, Albert, Daniel Streiff, and Arno Scharl. "Consolidating Heterogeneous Enterprise Data for Named Entity Linking and Web Intelligence." International Journal on Artificial Intelligence Tools 24, no. 02 (April 2015): 1540008. http://dx.doi.org/10.1142/s0218213015400084.

Abstract:
Linking named entities to structured knowledge sources paves the way for state-of-the-art Web intelligence applications which assign sentiment to the correct entities, identify trends, and reveal relations between organizations, persons and products. For this purpose this paper introduces Recognyze, a named entity linking component that uses background knowledge obtained from linked data repositories, and outlines the process of transforming heterogeneous data silos within an organization into a linked enterprise data repository which draws upon popular linked open data vocabularies to foster interoperability with public data sets. The presented examples use comprehensive real-world data sets from Orell Füssli Business Information, Switzerland's largest business information provider. The linked data repository created from these data sets comprises more than nine million triples on companies, the companies' contact information, key people, products and brands. We identify the major challenges of tapping into such sources for named entity linking, and describe required data pre-processing techniques to use and integrate such data sets, with a special focus on disambiguation and ranking algorithms. Finally, we conduct a comprehensive evaluation based on business news from the New Journal of Zurich and AWP Financial News to illustrate how these techniques improve the performance of the Recognyze named entity linking component.
29

Ruch, Patrick, Robert Baud, and Antoine Geissbühler. "Using lexical disambiguation and named-entity recognition to improve spelling correction in the electronic patient record." Artificial Intelligence in Medicine 29, no. 1-2 (September 2003): 169–84. http://dx.doi.org/10.1016/s0933-3657(03)00052-6.

30

Zhou, Xiaoling, Yukai Miao, Wei Wang, and Jianbin Qin. "A Recurrent Model for Collective Entity Linking with Adaptive Features." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 01 (April 3, 2020): 329–36. http://dx.doi.org/10.1609/aaai.v34i01.5367.

Abstract:
The vast amount of web data enables us to build knowledge bases with unprecedented quality and coverage. Named Entity Disambiguation (NED) is an important task that automatically resolves ambiguous mentions in free text to correct target entries in the knowledge base. Traditional machine learning based methods for NED were outperformed and made obsolete by the state-of-the-art deep learning based models. However, deep learning models are more complex, requiring large amount of training data and lengthy training and parameter tuning time. In this paper, we revisit traditional machine learning techniques and propose a light-weight, tuneable and time-efficient method without using deep learning or deep learning generated features. We propose novel adaptive features that focus on extracting discriminative features to better model similarities between candidate entities and the mention's context. We learn a local ranking model based on traditional and the new adaptive features based on the learning-to-rank framework. While arriving at linking decisions individually via the local model, our method also takes into consideration the correlation between decisions by running multiple recurrent global models, which can be deemed as a learned local search method. Our method attains performances comparable to the state-of-the-art deep learning-based methods on NED benchmark datasets while being significantly faster to train.
31

Atmakuri, Shriya, Bhavya Shahi, Ashwath Rao B, and Muralikrishna S N. "A comparison of features for POS tagging in Kannada." International Journal of Engineering & Technology 7, no. 4 (September 19, 2018): 2418. http://dx.doi.org/10.14419/ijet.v7i4.14900.

Abstract:
This paper proposes a system of part of speech tagging for the South Indian language Kannada using supervised machine learning. POS tagging is an important step in Natural Language Processing and has varied applications such as word sense disambiguation, natural language understanding etc. Based on extensive research into methods used for POS tagging, Conditional Random fields have been chosen as our algorithm. CRFs are used for sequence modeling in POS tagging, named entity recognition and as an alternative to Hidden Markov Models. Three very large corpora are used and their results are compared. The feature sets for all three corpora are also varied. The best method for the task is determined using these results.
32

Surdeanu, Mihai, Massimiliano Ciaramita, and Hugo Zaragoza. "Learning to Rank Answers to Non-Factoid Questions from Web Collections." Computational Linguistics 37, no. 2 (June 2011): 351–83. http://dx.doi.org/10.1162/coli_a_00051.

Abstract:
This work investigates the use of linguistically motivated features to improve search, in particular for ranking answers to non-factoid questions. We show that it is possible to exploit existing large collections of question–answer pairs (from online social Question Answering sites) to extract such features and train ranking models which combine them effectively. We investigate a wide range of feature types, some exploiting natural language processing such as coarse word sense disambiguation, named-entity identification, syntactic parsing, and semantic role labeling. Our experiments demonstrate that linguistic features, in combination, yield considerable improvements in accuracy. Depending on the system settings we measure relative improvements of 14% to 21% in Mean Reciprocal Rank and Precision@1, providing one of the most compelling evidence to date that complex linguistic features such as word senses and semantic roles can have a significant impact on large-scale information retrieval tasks.
33

Mihaljević, Helena, and Lucía Santamaría. "Disambiguation of author entities in ADS using supervised learning and graph theory methods." Scientometrics 126, no. 5 (April 20, 2021): 3893–917. http://dx.doi.org/10.1007/s11192-021-03951-w.

Abstract:
Disambiguation of authors in digital libraries is essential for many tasks, including efficient bibliographical searches and scientometric analyses to the level of individuals. The question of how to link documents written by the same person has been given much attention by academic publishers and information retrieval researchers alike. Usual approaches rely on publications’ metadata such as affiliations, email addresses, co-authors, or scholarly topics. Lack of homogeneity in the structure of bibliographic collections and discipline-specific dissimilarities between them make the creation of general-purpose disambiguators arduous. We present an algorithm to disambiguate authorships in the Astrophysics Data System (ADS) following an established semi-supervised approach of training a classifier on authorship pairs and clustering the resulting graphs. Due to the lack of high-signal features such as email addresses and citations, we engineer additional content- and location-based features via text embeddings and named-entity recognition. We train various nonlinear tree-based classifiers and detect communities from the resulting weighted graphs through label propagation, a fast yet efficient algorithm that requires no tuning. The resulting procedure reaches reasonable complexity and offers possibilities for interpretation. We apply our method to the creation of author entities in a recent ADS snapshot. The algorithm is evaluated on 39 manually-labeled author blocks comprising 9545 authorships from 562 author profiles. Our best approach utilizes the Random Forest classifier and yields a micro- and macro-averaged BCubed $\mathrm{F}_1$ score of 0.95 and 0.87, respectively. We release our code and labeled data publicly to foster the development of further disambiguation procedures for ADS.
34

Hoang, Phuong, Thomas Mahoney, Faizan Javed, and Matt McNair. "Large-Scale Occupational Skills Normalization for Online Recruitment." AI Magazine 39, no. 1 (March 27, 2018): 5–14. http://dx.doi.org/10.1609/aimag.v39i1.2775.

Abstract:
Job openings often go unfulfilled despite a surfeit of unemployed or underemployed workers. One of the main reasons for this is a mismatch between the skills required by employers and the skills that workers possess. This mismatch, also known as the skills gap, can pose socioeconomic challenges for an economy. A first step in alleviating the skills gap is to accurately detect skills in human capital data such as resumes and job ads. Comprehensive and accurate detection of skills facilitates analysis of labor market dynamics. It also helps bridge the divide between supply and demand of labor by facilitating reskilling and workforce training programs. In this paper, we describe SKILL, a Named Entity Normalization (NEN) system for occupational skills. SKILL is composed of 1) a skills tagger which uses properties of semantic word vectors to recognize and normalize relevant skills, and 2) a skill entity sense disambiguation component which infers the correct meaning of an identified skill. We discuss the technical design and the synergy between data science and engineering that was required to transform the system from a research prototype to a production service that serves customers from across the organization. We also discuss establishing customer feedback loops and how they led to improvements to the system over time. SKILL is currently used by various internal teams at CareerBuilder for big data workforce analytics, semantic search, job matching, and recommendations.
35

Dornescu, Iustin, and Constantin Orăsan. "Densification: Semantic document analysis using Wikipedia." Natural Language Engineering 20, no. 4 (October 14, 2013): 469–500. http://dx.doi.org/10.1017/s1351324913000296.

Abstract:
This paper proposes a new method for semantic document analysis: densification, which identifies and ranks Wikipedia pages relevant to a given document. Although there are similarities with established tasks such as wikification and entity linking, the method does not aim for strict disambiguation of named entity mentions. Instead, densification uses existing links to rank additional articles that are relevant to the document, a form of explicit semantic indexing that enables higher-level semantic retrieval procedures that can be beneficial for a wide range of NLP applications. Because a gold standard for densification evaluation does not exist, a study is carried out to investigate the level of agreement achievable by humans, which questions the feasibility of creating an annotated data set. As a result, a semi-supervised approach is employed to develop a two-stage densification system: filtering unlikely candidate links and then ranking the remaining links. In a first evaluation experiment, Wikipedia articles are used to automatically estimate the performance in terms of recall. Results show that the proposed densification approach outperforms several wikification systems. A second experiment measures the impact of integrating the links predicted by the densification system into a semantic question answering (QA) system that relies on Wikipedia links to answer complex questions. Densification enables the QA system to find twice as many additional answers than when using a state-of-the-art wikification system.
36

Saadi, Abdelhalim, and Hacene Belhadef. "Deep neural networks for Arabic information extraction." Smart and Sustainable Built Environment 9, no. 4 (April 3, 2020): 467–82. http://dx.doi.org/10.1108/sasbe-03-2019-0031.

Abstract:
Purpose: The purpose of this paper is to present a system based on deep neural networks to extract particular entities from natural language text, knowing that a massive amount of textual information is electronically available at present. Notably, a large amount of electronic text data indicates great difficulty in finding or extracting relevant information from them. Design/methodology/approach: This study presents an original system to extract Arabic named entities by combining a deep neural network-based part-of-speech tagger and a neural network-based named entity extractor. Firstly, the system extracts the grammatical classes of the words with high precision depending on the context of the word. This module plays the role of the disambiguation process. Then, a second module is used to extract the named entities. Findings: Using deep neural networks in natural language processing requires tuning many hyperparameters, which is a time-consuming process. To deal with this problem, applying statistical methods like the Taguchi method is much requested. In this study, the system is successfully applied to Arabic named entity recognition, where accuracy of 96.81 per cent was reported, which is better than the state-of-the-art results. Research limitations/implications: The system is designed and trained for the Arabic language, but the architecture can be used for other languages. Practical implications: Information extraction systems are developed for different applications, such as analysing newspaper articles and databases for commercial, political and social objectives. Information extraction systems also can be built over an information retrieval (IR) system. The IR system eliminates irrelevant documents and paragraphs. Originality/value: The proposed system can be regarded as the first attempt to use double deep neural networks to increase the accuracy. It also can be built over an IR system. The IR system eliminates irrelevant documents and paragraphs. This process reduces the mass number of documents from which the authors wish to extract the relevant information using an information extraction system.
37

He, Zhe, Duo Wei, Gai Elhanan, Yan Chen, and Huanying Gu. "Validating UMLS Semantic Type Assignments Using SNOMED CT Semantic Tags." Methods of Information in Medicine 57, no. 01/02 (February 2018): 43–53. http://dx.doi.org/10.3414/me17-01-0120.

Abstract:
Background: The UMLS assigns semantic types to all its integrated concepts. The semantic types are widely used in various natural language processing tasks in the biomedical domain, such as named entity recognition, semantic disambiguation, and semantic annotation. Due to the size of the UMLS, erroneous semantic type assignments are hard to detect. It is imperative to devise automated techniques to identify errors and inconsistencies in semantic type assignments. Objectives: Designing a methodology to perform programmatic checks to detect semantic type assignment errors for UMLS concepts with one or more SNOMED CT terms and evaluating concepts in a selected set of SNOMED CT hierarchies to verify our hypothesis that UMLS semantic type assignment errors may exist in concepts residing in semantically inconsistent groups. Methods: Our methodology is a four-stage process: 1) partitioning concepts in a SNOMED CT hierarchy into semantically uniform groups based on their assigned semantic tags; 2) partitioning concepts in each group from 1) into disjoint sub-groups based on their semantic type assignments; 3) mapping all SNOMED CT semantic tags into one or more semantic types in the UMLS; 4) identifying semantically inconsistent groups that have inconsistent assignments between semantic tags and semantic types according to the mapping from 3) and providing concepts in such groups to the domain experts for review. Results: We applied our method to the UMLS 2013AA release. Concepts of the semantically inconsistent groups in the PHYSICAL FORCE and RECORD ARTIFACT hierarchies have error rates of 33% and 62.5% respectively, which are much larger than the error rates of 0.6% and 1% in semantically consistent groups of the two hierarchies. Conclusion: Concepts in semantically inconsistent groups are more likely to contain semantic type assignment errors. Our methodology can make auditing more efficient by focusing auditing resources on concepts of semantically inconsistent groups.
38

Astari, Adelya, Moch Arif Bijaksana, and Arie Ardiyanti Suryani. "Analysis Name Entity Disambiguation Using Mining Evidence Method." Paradigma - Jurnal Komputer dan Informatika 22, no. 2 (August 31, 2020): 101–8. http://dx.doi.org/10.31294/p.v22i2.8196.

Abstract:
Hadith is the second guideline and source of Islamic teachings after the Qur'an. One of the most authentic (saheeh) hadith collections is the book of Saheeh al-Bukhaari. Each hadith in Sahih Bukhari has a chain of narrators, a hadith number, and its own content. This tradition also has a discipline that studies the history of hadith narrators, called the Science of Rijalul Hadith. In Sahih Bukhari, some narrators share the same name, causing ambiguity between names. This makes it difficult for ordinary readers to understand these ambiguous names, because it is not known whether two identical names refer to the same person. To solve this problem, a name disambiguation solution is built that resolves the ambiguity by checking the name, hadith number, chain of narrators, content topics, circles, countries, and companions of the Prophet, taken from the last three names before the Prophet in the chain of narrators. The solution uses the Mining Evidence method together with several other approaches, i.e. document association labels, word association labels, context similarity, cosine similarity, and word2vec, to obtain all similarity values between name entities. After the similarity values are obtained, the data are grouped using a clustering algorithm. The system is expected to achieve good performance, evaluated with a confusion matrix in terms of precision, recall, and accuracy.
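The cosine similarity mentioned in the abstract above can be illustrated with a minimal sketch. The vectors here are hypothetical context vectors (for example, averaged word2vec embeddings of the text around a narrator's name); they are not taken from the paper, and the function is a generic textbook definition rather than the authors' implementation.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention: a zero vector is similar to nothing
    return dot / (norm_u * norm_v)

# Hypothetical context vectors for three mentions of the same name string.
mention_a = [1.0, 0.0, 2.0]
mention_b = [2.0, 0.0, 4.0]   # same direction as mention_a
mention_c = [0.0, 3.0, 0.0]   # orthogonal to mention_a

sim_ab = cosine_similarity(mention_a, mention_b)
sim_ac = cosine_similarity(mention_a, mention_c)
```

Pairs with high similarity (here `mention_a` and `mention_b`) are candidates for the same cluster, while orthogonal contexts (`mention_a` and `mention_c`) suggest different people.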
39

Qian, Kun, Poornima Chozhiyath Raman, Yunyao Li, and Lucian Popa. "PARTNER: Human-in-the-Loop Entity Name Understanding with Deep Learning." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 09 (April 3, 2020): 13634–35. http://dx.doi.org/10.1609/aaai.v34i09.7104.

Abstract:
Entity name disambiguation is an important task for many text-based AI tasks. Entity names usually have internal semantic structures that are useful for resolving different variations of the same entity. We present PARTNER, a deep learning-based interactive system for entity name understanding. Powered by effective active learning and weak supervision, PARTNER can learn deep learning-based models for identifying entity name structure with low human effort. PARTNER also allows the user to design complex normalization and variant generation functions without coding skills.
40

Zhou, Jie, Bicheng Li, and Yongwang Tang. "Chinese Person Name Disambiguation Based on Two-Stage Clustering." Journal of Advanced Computational Intelligence and Intelligent Informatics 20, no. 5 (September 20, 2016): 755–64. http://dx.doi.org/10.20965/jaciii.2016.p0755.

Abstract:
Person name clustering disambiguation is the process that partitions name mentions according to the corresponding target person entities in reality. Existing methods cannot effectively identify the important features needed to disambiguate person names. This paper presents a method of Chinese person name disambiguation based on two-stage clustering. This method adopts a stage-by-stage processing model to identify and utilize different types of important features. Firstly, we extract three kinds of core evidence, namely direct social relation, indirect social relation and common description prefix, recognize document pairs referring to the same person entity, and realize initial clustering of person names with high precision. Then, we take the result of initial clustering as new initial input, utilize the statistical properties of multiple documents to recognize and evaluate important features, and build a double-vector representation of clusters (cluster feature vector and important feature vector). Based on the processes above, the final clustering of person names is generated, and the recall of clustering is improved effectively. The experiments have been conducted on the CLP2010 Chinese person name disambiguation dataset, and experimental results show that this method has good performance in person name clustering disambiguation.
41

Hamzei, E., F. Hakimpour, and A. Forati. "SPATIAL QUERIES ENTITY RECOGNITION AND DISAMBIGUATION USING RULE-BASED APPROACH." ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XL-1-W5 (December 11, 2015): 277–80. http://dx.doi.org/10.5194/isprsarchives-xl-1-w5-277-2015.

Abstract:
In the digital world, search engines are among the challenging research areas. One of the main issues in search engine studies is query processing, whose aim is to understand the user's needs. If an unsuitable spatial query processing approach is employed, the results will carry a high degree of ambiguity. To avoid this ambiguity, in this paper we present a new algorithm that relies on a rule-based system to process queries. Our algorithm is implemented in three basic steps: deductively and iteratively splitting the query; finding candidates for the location names, the location types and spatial relationships; and finally checking the relationships logically and conceptually using a rule-based system. As we show in the paper, the proposed method has two major advantages: first, search engines can provide spatial analysis capability based on the specific process, and second, because of its disambiguation technique, the user reaches a more desirable result.
42

Rodriguez-Esteban, Raul. "Semantic persistence of ambiguous biomedical names in the citation network." Bioinformatics 36, no. 7 (December 12, 2019): 2224–28. http://dx.doi.org/10.1093/bioinformatics/btz923.

Abstract:
Motivation: Name ambiguity has long been a central problem in biomedical text mining. To tackle it, it has been usually assumed that names present only one meaning within a given text. It is not known whether this assumption applies beyond the scope of single documents. Results: Using a new method that leverages large numbers of biomedical annotations and normalized citations, this study shows that ambiguous biomedical names mentioned in scientific articles tend to present the same meaning in articles that cite them or that they cite, and, to a lesser extent, two steps away in the citation network. Citations, therefore, can be regarded as semantic connections between articles and the citation network should be considered for tasks such as automatic name disambiguation, entity linking and biomedical database annotation. A simple experiment shows the applicability of these findings to name disambiguation. Availability and implementation: The code used for this analysis is available at: https://github.com/raroes/one-sense-per-citation-network.
43

Kumar, N. Senthil, and Dinakaran Muruganantham. "Disambiguating the Twitter Stream Entities and Enhancing the Search Operation Using DBpedia Ontology." International Journal of Information Technology and Web Engineering 11, no. 2 (April 2016): 51–62. http://dx.doi.org/10.4018/ijitwe.2016040104.

Abstract:
The web and the social web hold a huge amount of unstructured data, which makes searching more cumbersome. The principal task here is to convert the unstructured data into structured data through appropriate use of named entity detection. The goal of the paper is to automatically build and store a deep knowledge base of important facts and to construct comprehensive details about those facts, so that their related named entities, the semantic classes of the entities and their mutual relationships within a temporal context can be thoroughly analyzed and probed. In this paper, the authors propose a model to identify all the major interpretations of the named entities and effectively link them to the appropriate mentions of the knowledge base (DBpedia). They finally evaluate the approaches that uniquely identify the DBpedia URIs of the selected entities and eliminate the other candidate mentions of the entities based on the authority rankings of those candidate mentions.
44

Aldana-Bobadilla, Edwin, Alejandro Molina-Villegas, Ivan Lopez-Arevalo, Shanel Reyes-Palacios, Victor Muñiz-Sanchez, and Jean Arreola-Trapala. "Adaptive Geoparsing Method for Toponym Recognition and Resolution in Unstructured Text." Remote Sensing 12, no. 18 (September 17, 2020): 3041. http://dx.doi.org/10.3390/rs12183041.

Abstract:
The automatic extraction of geospatial information is an important aspect of data mining. Computer systems capable of discovering geographic information from natural language involve a complex process called geoparsing, which includes two important tasks: geographic entity recognition and toponym resolution. The first task could be approached through a machine learning approach, in which case a model is trained to recognize a sequence of characters (words) corresponding to geographic entities. The second task consists of assigning such entities to their most likely coordinates. Frequently, the latter process involves solving referential ambiguities. In this paper, we propose an extensible geoparsing approach including geographic entity recognition based on a neural network model and disambiguation based on what we have called dynamic context disambiguation. Once place names are recognized in an input text, they are solved using a grammar, in which a set of rules specifies how ambiguities could be solved, in a similar way to that which a person would utilize, considering the context. As a result, we have an assignment of the most likely geographic properties of the recognized places. We propose an assessment measure based on a ranking of closeness relative to the predicted and actual locations of a place name. Regarding this measure, our method outperforms OpenStreetMap Nominatim. We include other assessment measures to assess the recognition ability of place names and the prediction of what we called geographic levels (administrative jurisdiction of places).
45

Han, Hongqi, Yongsheng Yu, Lijun Wang, Xiaorui Zhai, Yaxin Ran, and Jingpeng Han. "Disambiguating USPTO inventor names with semantic fingerprinting and DBSCAN clustering." Electronic Library 37, no. 2 (April 1, 2019): 225–39. http://dx.doi.org/10.1108/el-12-2018-0232.

Abstract:
Purpose: The aim of this study is to present a novel approach based on semantic fingerprinting and a clustering algorithm called density-based spatial clustering of applications with noise (DBSCAN), which can be used to convert inventor records into 128-bit semantic fingerprints. Inventor disambiguation is a method used to discover a unique set of underlying inventors and map a set of patents to their corresponding inventors. Resolving the ambiguities between inventors is necessary to improve the quality of the patent database and to ensure accurate entity-level analysis. Most existing methods are based on machine learning and, while they often show good performance, this comes at the cost of time, computational power and storage space. Design/methodology/approach: Using DBSCAN, the meta and textual data in inventor records are converted into 128-bit semantic fingerprints. However, rather than using a string comparison or cosine similarity to calculate the distance between pair-wise fingerprint records, a binary number comparison function was used in DBSCAN. DBSCAN then clusters the inventor records based on this distance to disambiguate inventor names. Findings: Experiments conducted on the PatentsView campaign database of the United States Patent and Trademark Office show that this method disambiguates inventor names with recall greater than 99 per cent in less time and with a substantially smaller storage requirement. Research limitations/implications: A better semantic fingerprint algorithm and a better distance function may improve precision. Setting of different clustering parameters for each block or other clustering algorithms will be considered to improve the accuracy of the disambiguation results even further. Originality/value: Compared with the existing methods, the proposed method does not rely on feature selection and complex feature comparison computation. Most importantly, running time and storage requirements are drastically reduced.
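As a rough illustration of clustering binary fingerprints with DBSCAN, the sketch below implements a small textbook DBSCAN over integers. It assumes that the "binary number comparison function" is a Hamming distance (the count of differing bits), which the abstract does not confirm; the 8-bit toy fingerprints, `eps`, and `min_pts` are hypothetical values chosen for readability, not the paper's 128-bit fingerprints or parameters.

```python
NOISE = -1

def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")

def dbscan(points, eps, min_pts):
    """Textbook DBSCAN; returns one cluster label per point (NOISE = -1)."""
    labels = [None] * len(points)

    def neighbors(i):
        # All points within eps bits of point i (including i itself).
        return [j for j in range(len(points))
                if hamming(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: claimed, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:  # core point: keep expanding
                queue.extend(j_nbrs)
    return labels

# Hypothetical 8-bit fingerprints for six inventor records.
fingerprints = [0b10110000, 0b10110001, 0b10100000,
                0b00001111, 0b00001110, 0b01010101]
labels = dbscan(fingerprints, eps=2, min_pts=2)
```

Records that receive the same label would be treated as the same inventor, and a record labeled `-1` as an inventor with no close match. Comparing integers with XOR and a bit count avoids string or floating-point vector comparisons, which is consistent with the running-time and storage motivation stated in the abstract.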
46

Wei, Chih-Hsuan, Hung-Yu Kao, and Zhiyong Lu. "GNormPlus: An Integrative Approach for Tagging Genes, Gene Families, and Protein Domains." BioMed Research International 2015 (2015): 1–7. http://dx.doi.org/10.1155/2015/918710.

Abstract:
The automatic recognition of gene names and their associated database identifiers from biomedical text has been widely studied in recent years, as these tasks play an important role in many downstream text-mining applications. Despite significant previous research, only a small number of tools are publicly available and these tools are typically restricted to detecting only mention level gene names or only document level gene identifiers. In this work, we report GNormPlus: an end-to-end and open source system that handles both gene mention and identifier detection. We created a new corpus of 694 PubMed articles to support our development of GNormPlus, containing manual annotations for not only gene names and their identifiers, but also closely related concepts useful for gene name disambiguation, such as gene families and protein domains. GNormPlus integrates several advanced text-mining techniques, including SimConcept for resolving composite gene names. As a result, GNormPlus compares favorably to other state-of-the-art methods when evaluated on two widely used public benchmarking datasets, achieving 86.7% F1-score on the BioCreative II Gene Normalization task dataset and 50.1% F1-score on the BioCreative III Gene Normalization task dataset. The GNormPlus source code and its annotated corpus are freely available, and the results of applying GNormPlus to the entire PubMed are freely accessible through our web-based tool PubTator.
47

Çelebi, Arda, and Arzucan Özgür. "Cluster-based mention typing for named entity disambiguation." Natural Language Engineering, August 20, 2020, 1–37. http://dx.doi.org/10.1017/s1351324920000443.

Abstract:
An entity mention in text such as “Washington” may correspond to many different named entities such as the city “Washington D.C.” or the newspaper “Washington Post.” The goal of named entity disambiguation (NED) is to identify the mentioned named entity correctly among all possible candidates. If the type (e.g., location or person) of a mentioned entity can be correctly predicted from the context, it may increase the chance of selecting the right candidate by assigning low probability to the unlikely ones. This paper proposes cluster-based mention typing for NED. The aim of mention typing is to predict the type of a given mention based on its context. Generally, manually curated type taxonomies such as Wikipedia categories are used. We introduce cluster-based mention typing, where named entities are clustered based on their contextual similarities and the cluster ids are assigned as types. The hyperlinked mentions and their context in Wikipedia are used in order to obtain these cluster-based types. Then, mention typing models are trained on these mentions, which have been labeled with their cluster-based types through distant supervision. At the NED phase, first the cluster-based types of a given mention are predicted and then, these types are used as features in a ranking model to select the best entity among the candidates. We represent entities at multiple contextual levels and obtain different clusterings (and thus typing models) based on each level. As each clustering breaks the entity space differently, mention typing based on each clustering discriminates the mention differently. When predictions from all typing models are used together, our system achieves better or comparable results based on randomization tests with respect to the state-of-the-art levels on four de facto test sets.
48

Ruas, Pedro, Andre Lamurias, and Francisco M. Couto. "Linking chemical and disease entities to ontologies by integrating PageRank with extracted relations from literature." Journal of Cheminformatics 12, no. 1 (September 21, 2020). http://dx.doi.org/10.1186/s13321-020-00461-4.

Abstract:
Background: Named Entity Linking systems are a powerful aid to the manual curation of digital libraries, which is getting increasingly costly and inefficient due to the information overload. Models based on the Personalized PageRank (PPR) algorithm are one of the state-of-the-art approaches, but these have low performance when the disambiguation graphs are sparse. Findings: This work proposes a Named Entity Linking framework designated by Relation Extraction for Entity Linking (REEL) that uses automatically extracted relations to overcome this limitation. Our method builds a disambiguation graph, where the nodes are the ontology candidates for the entities and the edges are added according to the relations established in the text, which the method extracts automatically. The PPR algorithm and the information content of each ontology are then applied to choose the candidate for each entity that maximises the coherence of the disambiguation graph. We evaluated the method on three gold standards: the subset of the CRAFT corpus with ChEBI annotations (CRAFT-ChEBI), the subset of the BC5CDR corpus with disease annotations from the MEDIC vocabulary (BC5CDR-Diseases) and the subset with chemical annotations from the CTD-Chemical vocabulary (BC5CDR-Chemicals). The F1-Score achieved by REEL was 85.8%, 80.9% and 90.3% in these gold standards, respectively, outperforming baseline approaches. Conclusions: We demonstrated that RE tools can improve Named Entity Linking by capturing semantic information expressed in text missing in Knowledge Bases and use it to improve the disambiguation graph of Named Entity Linking models. REEL can be adapted to any text mining pipeline and potentially to any domain, as long as there is an ontology or other knowledge base available.
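To make the role of Personalized PageRank on a disambiguation graph more concrete, here is a minimal power-iteration sketch. It is a generic PPR, not REEL's implementation: the candidate names, the single extracted relation, the uniform seed set, and the damping value are all hypothetical, and the information-content weighting described in the abstract is not modeled.

```python
def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Power-iteration PPR. `graph` maps each node to its list of out-neighbors."""
    nodes = list(graph)
    # Restart distribution: uniform over the seed nodes.
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            out = graph[n]
            if out:
                share = damping * rank[n] / len(out)
                for m in out:
                    new_rank[m] += share
            else:
                # Dangling node: send its mass back to the restart distribution.
                for m in nodes:
                    new_rank[m] += damping * rank[n] * restart[m]
        rank = new_rank
    return rank

# Hypothetical candidates: two for mention A, one for mention B.
# An extracted relation links candidate A1 with candidate B1; A2 is isolated.
graph = {
    "A1": ["B1"],
    "B1": ["A1"],
    "A2": [],
}
scores = personalized_pagerank(graph, seeds={"A1", "A2", "B1"})
best_for_mention_a = max(["A1", "A2"], key=scores.get)
```

In this toy graph the candidate supported by an extracted relation (`A1`) outranks the isolated one (`A2`), which is the intuition behind adding relation edges when the graph would otherwise be sparse.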
49

Kandasamy, Saravanakumar, and Aswani Kumar Cherukuri. "Query expansion using named entity disambiguation for a question‐answering system." Concurrency and Computation: Practice and Experience 32, no. 4 (December 27, 2018). http://dx.doi.org/10.1002/cpe.5119.

50

"Semantic based entity retrieval and disambiguation system for Twitter streams." Knowledge Management & E-Learning: An International Journal, June 28, 2019, 262–80. http://dx.doi.org/10.34105/j.kmel.2019.11.014.

Abstract:
Social media networks have evolved into a large repository of short documents, which makes it challenging to retrieve content from them effectively. Many factors are involved, such as the restricted length of content, informal use of language (i.e., slang, abbreviations, styles, etc.) and low contextualization of user-generated content. To address these problems, recent studies on context-based information searching have added semantics to user-generated content by linking it to an existing knowledge base. Earlier work also used bag-of-concepts to link potential noun phrases to existing knowledge sources. In this paper, we utilize the relationships among concepts and the equivalences among related concepts of the selected named entities by deriving the potential meaning of entities, and we find the semantic similarity between the named entities using three other potential sources of reference (DBpedia, Anchor Texts and Twitter Trends).