Academic literature on the topic 'Multilingual, named entity recognition, NER'

Create an accurate reference in APA, MLA, Chicago, Harvard, and other citation styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Multilingual, named entity recognition, NER.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Multilingual, named entity recognition, NER"

1

Sharma, Yashvardhan, Rupal Bhargava, and Bapiraju Vamsi Tadikonda. "Named Entity Recognition for Code Mixed Social Media Sentences." International Journal of Software Science and Computational Intelligence 13, no. 2 (2021): 23–36. http://dx.doi.org/10.4018/ijssci.2021040102.

Full text
Abstract:
With the growth of internet applications and social media platforms, there has been an increase in informal text communication. People from different regions tend to mix their regional language with English in social media text. This trend is now common in many multilingual nations and is known as code mixing. In code mixing, multiple languages are used within a single statement. Named entity recognition (NER) is a well-researched topic in natural language processing (NLP), but present NER systems tend to perform poorly on code-mixed text. This paper proposes three approaches to improve named entity recognizers for handling code mixing. The first approach is based on machine learning techniques such as support vector machines and tree-based classifiers. The second approach is based on neural networks, and the third uses a long short-term memory (LSTM) architecture.
APA, Harvard, Vancouver, ISO, and other styles
2

Yan, Huijiong, Tao Qian, Liang Xie, and Shanguang Chen. "Unsupervised cross-lingual model transfer for named entity recognition with contextualized word representations." PLOS ONE 16, no. 9 (2021): e0257230. http://dx.doi.org/10.1371/journal.pone.0257230.

Full text
Abstract:
Named entity recognition (NER) is one fundamental task in the natural language processing (NLP) community. Supervised neural network models based on contextualized word representations can achieve highly competitive performance, but require a large-scale manually annotated corpus for training. For resource-scarce languages, the construction of such a corpus is expensive and time-consuming, so unsupervised cross-lingual transfer is one good solution to the problem. In this work, we investigate unsupervised cross-lingual NER with model transfer based on contextualized word representations, which greatly advances cross-lingual NER performance. We study several model transfer settings, including (1) different types of pretrained transformer-based language models as input, (2) exploration strategies for the multilingual contextualized word representations, and (3) multi-source adaptation. In particular, we propose an adapter-based word representation method combined with a parameter generation network (PGN) to better capture the relationship between the source and target languages. We conduct experiments on the benchmark CoNLL dataset involving four languages to simulate the cross-lingual setting. Results show that we can obtain highly competitive performance via cross-lingual model transfer. In particular, our proposed adapter-based PGN model leads to significant improvements for cross-lingual NER.
3

Van der Heijden, Niels, Samira Abnar, and Ekaterina Shutova. "A Comparison of Architectures and Pretraining Methods for Contextualized Multilingual Word Embeddings." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (2020): 9090–97. http://dx.doi.org/10.1609/aaai.v34i05.6443.

Full text
Abstract:
The lack of annotated data in many languages is a well-known challenge within the field of multilingual natural language processing (NLP). Therefore, many recent studies focus on zero-shot transfer learning and joint training across languages to overcome data scarcity for low-resource languages. In this work we (i) perform a comprehensive comparison of state-of-the-art multilingual word and sentence encoders on the tasks of named entity recognition (NER) and part of speech (POS) tagging; and (ii) propose a new method for creating multilingual contextualized word embeddings, compare it to multiple baselines and show that it performs at or above state-of-the-art level in zero-shot transfer settings. Finally, we show that our method allows for better knowledge sharing across languages in a joint training setting.
4

Berend, Gábor. "Sparse Coding of Neural Word Embeddings for Multilingual Sequence Labeling." Transactions of the Association for Computational Linguistics 5 (December 2017): 247–61. http://dx.doi.org/10.1162/tacl_a_00059.

Full text
Abstract:
In this paper we propose and carefully evaluate a sequence labeling framework which solely utilizes sparse indicator features derived from dense distributed word representations. The proposed model obtains (near) state-of-the-art performance for both part-of-speech tagging and named entity recognition for a variety of languages. Our model relies only on a few thousand sparse coding-derived features, without applying any modification of the word representations employed for the different tasks. The proposed model has favorable generalization properties, as it retains over 89.8% of its average POS tagging accuracy when trained on 1.2% of the total available training data, i.e., 150 sentences per language.
5

Nguyen, Huyen T. M., Quyen T. Ngo, Luong X. Vu, Vu M. Tran, and Hien T. T. Nguyen. "VLSP Shared Task: Named Entity Recognition." Journal of Computer Science and Cybernetics 34, no. 4 (2019): 283–94. http://dx.doi.org/10.15625/1813-9663/34/4/13161.

Full text
Abstract:
Named entities (NE) are phrases that contain the names of persons, organizations, locations, times and quantities, monetary values, percentages, etc. Named Entity Recognition (NER) is the task of recognizing named entities in documents. NER is an important subtask of Information Extraction, which has attracted researchers all over the world since the 1990s. For the Vietnamese language, although there exist some research projects and publications on the NER task before 2016, no systematic comparison of the performance of NER systems had been done. In 2016, the organizing committee of the VLSP workshop decided to launch the first NER shared task, in order to get an objective evaluation of Vietnamese NER systems and to promote the development of high-quality systems. As a result, the first dataset with morpho-syntactic and NE annotations was released for benchmarking NER systems. At VLSP 2018, the NER shared task was organized for the second time, providing a bigger dataset containing texts from various domains, but without morpho-syntactic annotation. These resources are available for research purposes via the VLSP website vlsp.org.vn/resources. In this paper, we describe the datasets as well as the evaluation results obtained from these two campaigns.
6

Rana, Prince, Sunil Kumar Gupta, and Kamlesh Dutta. "Named Entity Recognition (NER) for Hindi." International Journal of Computer Sciences and Engineering 6, no. 7 (2018): 856–59. http://dx.doi.org/10.26438/ijcse/v6i7.856859.

Full text
7

Rachmad, Dwi Swasono. "Review Named Entity Recognition dengan Menggunakan Machine Learning." Jurnal Sains dan Informatika 6, no. 1 (2020): 28–33. http://dx.doi.org/10.34128/jsi.v6i1.204.

Full text
Abstract:
NER, or Named Entity Recognition, is often known as one of the main components of question answering systems. Traditional NER methods have been further developed as components for obtaining information by extracting words, with techniques that focus on the final processing stage. This article reviews the approaches several researchers have used to study NER as a form of word-level information extraction. NER has been applied in a variety of settings, and it serves to extract words that carry information about sentences or phrases. The reviewed research shows that question answering remains an interesting open problem for Indonesian, Indian languages (particularly Telugu), and Arabic, and that NER on classes such as person names, locations, and organizations yields good results with high accuracy. However, NER has not been applied to classes such as dates, times, and places, and large datasets have not been used for extraction. Going forward, NER will be leveraged in improved machine learning systems that recognize a wider range of words and extracted elements.
8

Ekbal, Asif, and Sivaji Bandyopadhyay. "Named Entity Recognition in Bengali." Northern European Journal of Language Technology 1 (February 2, 2010): 26–58. http://dx.doi.org/10.3384/nejlt.2000-1533.091226.

Full text
Abstract:
This paper reports on a multi-engine approach to the development of a Named Entity Recognition (NER) system for Bengali that combines classifiers such as Maximum Entropy (ME), Conditional Random Field (CRF) and Support Vector Machine (SVM) with the help of weighted voting techniques. The training set consists of approximately 272K wordforms, out of which 150K wordforms have been manually annotated with the four major named entity (NE) tags, namely Person name, Location name, Organization name and Miscellaneous name. An appropriate tag conversion routine has been defined in order to convert the 122K wordforms of the IJCNLP-08 NER Shared Task on South and South East Asian Languages (NERSSEAL) data into the desired forms. The individual classifiers make use of the different contextual information of the words, along with a variety of features that help to predict the various NE classes. Lexical context patterns, generated in a semi-automatic way from an unlabeled corpus of 3 million wordforms, have been used as classifier features to improve performance. In addition, we propose a number of techniques to post-process the output of each classifier in order to reduce errors and improve performance further. Finally, we use three weighted voting techniques to combine the individual models. Experimental results show the effectiveness of the proposed multi-engine approach, with overall Recall, Precision and F-Score values of 93.98%, 90.63% and 92.28%, respectively, an improvement of 14.92% in F-Score over the best performing baseline SVM-based system and of 18.36% in F-Score over the least performing baseline ME-based system. Comparative evaluation results also show that the proposed system outperforms three other existing Bengali NER systems.
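The weighted-voting combination described in this abstract can be illustrated with a short sketch. This is not the authors' implementation; the classifier outputs, weights, and tag set below are invented for demonstration:

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """Combine per-token NE tag sequences from several classifiers:
    each classifier contributes its weight to the tag it predicts,
    and the highest-scoring tag wins for each token."""
    combined = []
    for token_preds in zip(*predictions):  # tags for one token, per classifier
        scores = defaultdict(float)
        for clf_idx, tag in enumerate(token_preds):
            scores[tag] += weights[clf_idx]
        combined.append(max(scores, key=scores.get))
    return combined

# Hypothetical outputs of ME, CRF and SVM taggers for a three-token sentence
me_tags  = ["B-PER", "O",     "B-LOC"]
crf_tags = ["B-PER", "B-ORG", "B-LOC"]
svm_tags = ["O",     "B-ORG", "B-LOC"]

print(weighted_vote([me_tags, crf_tags, svm_tags], weights=[0.2, 0.4, 0.4]))
# ['B-PER', 'B-ORG', 'B-LOC']
```

Here the two higher-weighted classifiers outvote the ME tagger on the first two tokens, which is the basic mechanism behind combining classifiers of unequal reliability.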
9

Bari, M. Saiful, Shafiq Joty, and Prathyusha Jwalapuram. "Zero-Resource Cross-Lingual Named Entity Recognition." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (2020): 7415–23. http://dx.doi.org/10.1609/aaai.v34i05.6237.

Full text
Abstract:
Recently, neural methods have achieved state-of-the-art (SOTA) results in Named Entity Recognition (NER) tasks for many languages without the need for manually crafted features. However, these models still require manually annotated training data, which is not available for many languages. In this paper, we propose an unsupervised cross-lingual NER model that can transfer NER knowledge from one language to another in a completely unsupervised way without relying on any bilingual dictionary or parallel data. Our model achieves this through word-level adversarial learning and augmented fine-tuning with parameter sharing and feature augmentation. Experiments on five different languages demonstrate the effectiveness of our approach, outperforming existing models by a good margin and setting a new SOTA for each language pair.
10

Mayhew, Stephen, Nitish Gupta, and Dan Roth. "Robust Named Entity Recognition with Truecasing Pretraining." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (2020): 8480–87. http://dx.doi.org/10.1609/aaai.v34i05.6368.

Full text
Abstract:
Although modern named entity recognition (NER) systems show impressive performance on standard datasets, they perform poorly when presented with noisy data. In particular, capitalization is a strong signal for entities in many languages, and even state-of-the-art models overfit to this feature, with drastically lower performance on uncapitalized text. In this work, we address the robustness of NER systems on data with noisy or uncertain casing, using a pretraining objective that predicts casing in text (a truecaser), leveraging unlabeled data. The pretrained truecaser is combined with a standard BiLSTM-CRF model for NER by appending its output distributions to character embeddings. In experiments over several datasets of varying domain and casing quality, we show that our new model improves performance on uncased text, even adding value to uncased BERT embeddings. Our method achieves a new state of the art on the WNUT17 shared task dataset.
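The truecasing task this paper uses as a pretraining objective can be illustrated with a toy frequency-based truecaser. The paper itself trains a neural truecaser on unlabeled data; the lookup version below is only a sketch of the task, and the example sentences are made up:

```python
from collections import Counter, defaultdict

def train_truecaser(sentences):
    """Map each lowercased word to its most frequently observed casing."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for word in sentence.split():
            counts[word.lower()][word] += 1
    return {low: forms.most_common(1)[0][0] for low, forms in counts.items()}

def truecase(text, table):
    """Restore casing word by word; unknown words are left unchanged."""
    return " ".join(table.get(w.lower(), w) for w in text.split())

corpus = ["Barack Obama visited Paris", "Obama spoke in Paris today"]
table = train_truecaser(corpus)
print(truecase("barack obama visited paris", table))
# Barack Obama visited Paris
```

A truecaser like this recovers the casing signal that NER models rely on; the paper's contribution is to learn this mapping with a neural model and reuse its representations for NER.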
More sources

Dissertations / Theses on the topic "Multilingual, named entity recognition, NER"

1

Bridal, Olle. "Named-entity recognition with BERT for anonymization of medical records." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-176547.

Full text
Abstract:
Sharing data is an important part of the progress of science in many fields. In the largely deep-learning-dominated field of natural language processing, textual resources are in high demand. In certain domains, such as that of medical records, the sharing of data is limited by ethical and legal restrictions and therefore requires anonymization. The process of manual anonymization is tedious and expensive, so automated anonymization is of great value. Since medical records consist of unstructured text, pieces of sensitive information have to be identified in order to be masked for anonymization. Named-entity recognition (NER) is the subtask of information extraction in which named entities, such as person names or locations, are identified and categorized. Recently, models that leverage unsupervised training on large quantities of unlabeled data have performed impressively on the NER task, which shows promise for their use in anonymization. In this study, a small set of medical records was annotated with named-entity tags. Because of the lack of any training data, a BERT model already fine-tuned for NER was evaluated on this set. The aim was to find out how well the model would perform NER on medical records, and to explore the possibility of using the model to anonymize medical records. The most positive result was that the model was able to identify all person names in the dataset. The average accuracy for identifying all entity types was, however, relatively low. The success in identifying person names shows promise for the model's application to anonymization. However, because the overall accuracy is significantly worse than that of models fine-tuned on domain-specific data, it is suggested that there might be better methods for anonymization in the absence of relevant training data.
2

Nikolic, Vladan. "Creating a Graph Database from a Set of Documents." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-176042.

Full text
Abstract:
In the context of search, it may be advantageous in some use cases to have documents saved in a graph database rather than a document-oriented database. Graph databases are able to model relationships between objects, in this case documents, in ways which allow for efficient retrieval, as well as search queries that are slightly more specific or complex. This report attempts to explore the possibilities of storing an existing set of documents in a graph database. A Named Entity Recognizer was used on a set of news articles in order to extract entities from each news article's body of text. News articles that contain the same entities are then connected to each other in the graph. Ideas to improve this entity extraction are also explored. The method of evaluation that was utilized in this report proved not to be ideal for this task, in that only a relative measure was given, not an absolute one. As such, no absolute answer with regard to the quality of the method can be presented. It is clear that improvements can be made, and the result should be subject to further study.
3

Chau, Ting-Hey. "Translation Memory System Optimization : How to effectively implement translation memory system optimization." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-169218.

Full text
Abstract:
Translation of technical manuals is expensive, especially when a larger company needs to publish manuals for their whole product range in over 20 different languages. When a text segment (i.e., a phrase, sentence or paragraph) is manually translated, we would like to reuse these translated segments in future translation tasks. A translated segment is stored with its corresponding source language, often called a language pair, in a Translation Memory System. A language pair in a Translation Memory represents a Translation Entry, also known as a Translation Unit. During a translation, when a text segment in a source document matches a segment in the Translation Memory, the available target languages in the Translation Unit will not require a human translation; the previously translated segment can be inserted into the target document. Such functionality is provided in the single-source publishing software Skribenta, developed by Excosoft. Skribenta requires text segments in source documents to find an exact or a full match in the Translation Memory in order to apply a translation to a target language. A full match can only be achieved if a source segment is stored in a standardized form, which requires manual tagging of entities and of frequently recurring words such as model names and product numbers. This thesis investigates different ways to improve and optimize a Translation Memory System. One way was to aid users with the work of manually tagging entities, by developing heuristic algorithms to approach the problem of Named Entity Recognition (NER). The evaluation results from the developed heuristic algorithms were compared with the results from an off-the-shelf NER tool developed by Stanford. The results show that the developed heuristic algorithms achieve a higher F-measure than the Stanford NER, and may be a good initial step to help Excosoft's users improve their Translation Memories.
4

Hathurusinghe, Rajitha. "Building a Personally Identifiable Information Recognizer in a Privacy Preserved Manner Using Automated Annotation and Federated Learning." Thesis, Université d'Ottawa / University of Ottawa, 2020. http://hdl.handle.net/10393/41011.

Full text
Abstract:
This thesis explores the training of a deep neural network based named entity recognizer in an end-to-end privacy-preserved setting, where dataset creation and model training happen in an environment with minimal manual interventions. With the improvement of accuracy in deep learning models for practical tasks, a rising concern is satisfying the demand for training data for these models amidst concerns about data privacy. Several data protection scenarios have been suggested in the recent past due to public concerns, hence the legal guidelines to enforce them. A promising new development is decentralized model training on isolated datasets, which eliminates the compromises of privacy upon providing data to a centralized entity. However, in this federated setting, curating the data source is still a privacy risk, mostly in unstructured data sources such as text. We explore the feasibility of automatic dataset annotation for a Named Entity Recognition (NER) task and of training a deep learning model with it in two federated learning settings. We explore the feasibility of utilizing a dataset created in this manner for fine-tuning a state-of-the-art deep learning language model for the downstream task of named entity recognition. We also examine this novel combination of deep learning NLP models and federated learning for its deviation from the classical centralized setting. We created an automatically annotated dataset containing around 80,000 sentences, a manually annotated test set, and tools to extend the dataset with more manual annotations. We observed that the noise from automated annotation can be overcome to a degree by increasing the dataset size. We also contributed to the federated learning framework with state-of-the-art NLP model developments. Overall, our NER model achieved around 0.80 F1-score for recognition of entities in sentences.
5

Täckström, Oscar. "Predicting Linguistic Structure with Incomplete and Cross-Lingual Supervision." Doctoral thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-197610.

Full text
Abstract:
Contemporary approaches to natural language processing are predominantly based on statistical machine learning from large amounts of text, which has been manually annotated with the linguistic structure of interest. However, such complete supervision is currently only available for the world's major languages, in a limited number of domains and for a limited range of tasks. As an alternative, this dissertation considers methods for linguistic structure prediction that can make use of incomplete and cross-lingual supervision, with the prospect of making linguistic processing tools more widely available at a lower cost. An overarching theme of this work is the use of structured discriminative latent variable models for learning with indirect and ambiguous supervision; as instantiated, these models admit rich model features while retaining efficient learning and inference properties. The first contribution to this end is a latent-variable model for fine-grained sentiment analysis with coarse-grained indirect supervision. The second is a model for cross-lingual word-cluster induction and the application thereof to cross-lingual model transfer. The third is a method for adapting multi-source discriminative cross-lingual transfer models to target languages, by means of typologically informed selective parameter sharing. The fourth is an ambiguity-aware self- and ensemble-training algorithm, which is applied to target language adaptation and relexicalization of delexicalized cross-lingual transfer parsers. The fifth is a set of sequence-labeling models that combine constraints at the level of tokens and types, and an instantiation of these models for part-of-speech tagging with incomplete cross-lingual and crowdsourced supervision. In addition to these contributions, comprehensive overviews are provided of structured prediction with no or incomplete supervision, as well as of learning in the multilingual and cross-lingual settings. 
Through careful empirical evaluation, it is established that the proposed methods can be used to create substantially more accurate tools for linguistic processing, compared to both unsupervised methods and to recently proposed cross-lingual methods. The empirical support for this claim is particularly strong in the latter case; our models for syntactic dependency parsing and part-of-speech tagging achieve the hitherto best published results for a wide number of target languages, in the setting where no annotated training data is available in the target language.
6

Doležal, Jan. "Komponent pro sémantické obohacení." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2018. http://www.nusl.cz/ntk/nusl-385991.

Full text
Abstract:
This master's thesis describes the Semantic Enrichment Component (SEC), which searches for entities (e.g., persons or places) in an input text document and returns information about them. The goals of this component are to create a single interface for named entity recognition tools, to enable parallel document processing, to save memory while using the knowledge base, and to speed up access to its content. To achieve these goals, the output of the named entity recognition tools was specified, a tool for storing the preprocessed knowledge base in shared memory was implemented, and a client-server scheme was used to build the component.
7

Thomas, Stefan. "Verbesserung einer Erkennungs- und Normalisierungsmaschine für natürlichsprachige Zeitausdrücke." 2012. https://ul.qucosa.de/id/qucosa%3A17239.

Full text
Abstract:
Digitally stored data are being used more and more. Computer-based communication via e-mail, SMS, messengers, and the like in particular has almost completely displaced classical means of communication. Generating added value from these data is of crucial importance in both business and private contexts. One way to support users is to comprehensively analyze their textual data, highlight certain elements, and create, or at least prepare, entries for calendars, address books, and the like on their behalf. Another possibility is semantic search over the user's data. Even with full-text search, one has so far had to know the exact wording when searching for a specific piece of information. With a deep understanding of time, however, it becomes possible to use a timeline to find all data linked to a specific point in time or time span. Many approaches already exist for performing Named Entity Recognition fully or semi-automatically, but methods that work largely language-independently, and thus scale easily to many languages, are scarcely published. Based on extensive analyses, this thesis presents ways to improve such a method for natural-language temporal expressions. In particular, a strategy based on machine learning is developed that reduces the manual effort required to support new languages. These and further strategies were implemented and integrated into the existing architecture of the ExB Group's temporal expression recognition engine.
8

Almutairi, Abeer N. "Unsupervised Method for Disease Named Entity Recognition." Thesis, 2019. http://hdl.handle.net/10754/659966.

Full text
Abstract:
Diseases take a central role in biomedical research; many studies aim to enable access to disease information by designing named entity recognition models that make use of the available information. Disease recognition is a problem that has been tackled by various approaches, of which the most famous are the lexical and supervised approaches. However, the aforementioned approaches have many drawbacks, as their performance is affected by the amount of human-annotated data available. Moreover, lexical approaches cannot distinguish between real mentions of diseases and mentions of other entities that share the same name or acronym. The challenge of this project is to find a model that combines the strengths of the lexical and supervised approaches to design a named entity recognizer. We demonstrate that our model can accurately identify disease name mentions in text by using word embeddings to capture context information for each mention, which enables the model to distinguish whether or not it is a real disease mention. We evaluate our model on a gold standard data set, which showed a high precision of 84% and accuracy of 96%. Finally, we compare the performance of our model to different statistical named entity recognition models, and the results show that our model outperforms the unsupervised lexical approaches.
APA, Harvard, Vancouver, ISO, and other styles
9

Ghaddar, Abbas. "Leveraging distant supervision for improved named entity recognition." Thesis, 2020. http://hdl.handle.net/1866/24799.

Full text
Abstract:
Recent years have seen a leap in deep learning techniques that has greatly changed the way Natural Language Processing (NLP) tasks are tackled. In a couple of years, neural networks and word embeddings quickly became central components adopted in the domain. Distant supervision (DS) is a well-established technique in NLP for producing labeled data from partially annotated examples. Traditionally, it was mainly used as training data in the absence of manual annotations, or as additional training data to improve generalization performance. In this thesis, we study how distant supervision can be employed within a modern deep learning based NLP framework. As deep learning algorithms get better when massive amounts of data are provided (especially for representation learning), we revisit the task of generating distant supervision data from Wikipedia. We apply post-processing treatments on the original dump to further increase the quantity of labeled examples, while introducing a reasonable amount of noise. Then, we explore different methods for using distant supervision data for representation learning, mainly to learn classic and contextualized word representations. Due to its importance as a basic component in many NLP applications, we choose Named-Entity Recognition (NER) as our main task. We experiment on standard NER benchmarks, showing state-of-the-art performance. In doing so, we investigate a more interesting setting, namely improving cross-domain (generalization) performance.
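Distant supervision as described in this abstract, automatically labeling text by matching it against a knowledge source, can be sketched with a toy gazetteer. The entity list and the BIO tagging below are illustrative; the thesis derives its labels from Wikipedia anchor links rather than a hand-written dictionary:

```python
# Toy gazetteer mapping surface forms to entity types; a real pipeline
# would derive these from Wikipedia links, as the thesis does.
GAZETTEER = {
    ("New", "York"): "LOC",
    ("Barack", "Obama"): "PER",
    ("IBM",): "ORG",
}

def distant_labels(tokens):
    """Greedy longest-match tagging in the BIO scheme."""
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        for length in (2, 1):  # prefer the longest match
            span = tuple(tokens[i:i + length])
            if len(span) == length and span in GAZETTEER:
                etype = GAZETTEER[span]
                tags[i] = "B-" + etype
                for j in range(i + 1, i + length):
                    tags[j] = "I-" + etype
                i += length
                matched = True
                break
        if not matched:
            i += 1
    return tags
```

For instance, `distant_labels(["Barack", "Obama", "visited", "IBM"])` yields `["B-PER", "I-PER", "O", "B-ORG"]`. Such automatically produced tags are noisy, which is why the thesis studies post-processing and representation learning on top of them.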
APA, Harvard, Vancouver, ISO, and other styles
10

Dittrich, Felix. "Künstliche neuronale Netze zur Verarbeitung natürlicher Sprache." 2021. https://htwk-leipzig.qucosa.de/id/qucosa%3A74486.

Full text
Abstract:
Natural language processing by computer-based systems has always been an area of active development and research, aimed at solving tasks in the most widely used languages. This thesis describes various approaches for solving problems in this field by means of artificial neural networks. It focuses mainly on more recent architectures such as Transformers and BERT. The goal is to understand them better and to find out what advantages they have over conventional artificial neural networks. The knowledge gained is then put to the test on a natural language processing task in which named entity recognition (NER) is used to extract specific information from texts. Contents: 1 Introduction; 1.1 Natural language processing (NLP); 1.2 Neural networks; 1.2.1 Biological background; 1.3 Structure of the thesis; 2 Fundamentals; 2.1 Artificial neural networks; 2.1.1 Types of learning; 2.1.2 Activation functions; 2.1.3 Loss functions; 2.1.4 Optimizers; 2.1.5 Over- and underfitting; 2.1.6 Exploding and vanishing gradients; 2.1.7 Optimization methods; 3 Network architectures for natural language processing; 3.1 Recurrent neural networks (RNN); 3.1.1 Long short-term memory (LSTM); 3.2 Autoencoders; 3.3 Transformer; 3.3.1 Word embeddings; 3.3.2 Positional encoding; 3.3.3 Encoder block; 3.3.4 Decoder block; 3.3.5 Limitations of the Transformer architecture; 3.4 Bidirectional Encoder Representations from Transformers (BERT); 3.4.1 Pre-training; 3.4.2 Fine-tuning; 4 Practical part and results; 4.1 Task; 4.2 Libraries, programming languages, and software used; 4.2.1 Python; 4.2.2 NumPy; 4.2.3 pandas; 4.2.4 scikit-learn; 4.2.5 Tensorflow; 4.2.6 Keras; 4.2.7 ktrain; 4.2.8 Data Version Control (dvc); 4.2.9 FastAPI; 4.2.10 Docker; 4.2.11 Amazon Web Services; 4.3 Data; 4.4 Network architecture; 4.5 Training; 4.6 Evaluation; 4.7 Implementation; 5 Concluding remarks; 5.1 Summary and outlook
APA, Harvard, Vancouver, ISO, and other styles

Book chapters on the topic "Multilingual, named entity recognition, NER"

1

Maharjan, Gopal, Bal Krishna Bal, and Santosh Regmi. "Named Entity Recognition (NER) for Nepali." In Communications in Computer and Information Science. Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-29750-3_6.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Esteves, Diego, José Marcelino, Piyush Chawla, Asja Fischer, and Jens Lehmann. "HORUS-NER: A Multimodal Named Entity Recognition Framework for Noisy Data." In Advances in Intelligent Data Analysis XIX. Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-74251-5_8.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Biswas, Sitanath, Sujata Dash, and Sweta Acharya. "Firefly Algorithm Based Multilingual Named Entity Recognition for Indian Languages." In Communications in Computer and Information Science. Springer Singapore, 2018. http://dx.doi.org/10.1007/978-981-13-3140-4_49.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Theivendiram, Pranavan, Megala Uthayakumar, Nilusija Nadarasamoorthy, et al. "Named-Entity-Recognition (NER) for Tamil Language Using Margin-Infused Relaxed Algorithm (MIRA)." In Computational Linguistics and Intelligent Text Processing. Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-75477-2_33.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Wang, Yu, Yun Li, Ziye Zhu, Bin Xia, and Zheng Liu. "SC-NER: A Sequence-to-Sequence Model with Sentence Classification for Named Entity Recognition." In Advances in Knowledge Discovery and Data Mining. Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-16148-4_16.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Zhu, Peng, Dawei Cheng, Fangzhou Yang, Yifeng Luo, Weining Qian, and Aoying Zhou. "ZH-NER: Chinese Named Entity Recognition with Adversarial Multi-task Learning and Self-Attentions." In Database Systems for Advanced Applications. Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-73197-7_40.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Szarvas, György, Richárd Farkas, and András Kocsor. "A Multilingual Named Entity Recognition System Using Boosting and C4.5 Decision Tree Learning Algorithms." In Discovery Science. Springer Berlin Heidelberg, 2006. http://dx.doi.org/10.1007/11893318_27.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Vīksna, Rinalds, and Inguna Skadiņa. "Large Language Models for Latvian Named Entity Recognition." In Frontiers in Artificial Intelligence and Applications. IOS Press, 2020. http://dx.doi.org/10.3233/faia200603.

Full text
Abstract:
Transformer-based language models pre-trained on large corpora have demonstrated good results on multiple natural language processing tasks for widely used languages including named entity recognition (NER). In this paper, we investigate the role of the BERT models in the NER task for Latvian. We introduce the BERT model pre-trained on the Latvian language data. We demonstrate that the Latvian BERT model, pre-trained on large Latvian corpora, achieves better results (81.91 F1-measure on average vs 78.37 on M-BERT for a dataset with nine named entity types, and 79.72 vs 78.83 on another dataset with seven types) than multilingual BERT and outperforms previously developed Latvian NER systems.
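The F1-measure figures quoted above (e.g. 81.91 vs 78.37) are the standard way NER systems are compared. A simplified sketch of span-level exact-match F1, the scoring scheme such evaluations typically use over typed entity spans:

```python
def ner_f1(gold_spans, pred_spans):
    """Exact-match F1 over (start, end, type) entity spans."""
    gold, pred = set(gold_spans), set(pred_spans)
    tp = len(gold & pred)                      # spans correct in both boundary and type
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

With one of two predictions correct against two gold spans, precision and recall are both 0.5, giving F1 = 0.5; a one-point F1 gap like the 81.91 vs 78.37 reported above thus reflects a concrete difference in exactly-matched entity spans.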
APA, Harvard, Vancouver, ISO, and other styles
9

Palshikar, Girish Keshav. "Techniques for Named Entity Recognition." In Advances in Human and Social Aspects of Technology. IGI Global, 2012. http://dx.doi.org/10.4018/978-1-4666-0894-8.ch011.

Full text
Abstract:
While building and using a fully semantic understanding of Web contents is a distant goal, named entities (NEs) provide a small, tractable set of elements carrying a well-defined semantics. Generic named entities are names of persons, locations, organizations, phone numbers, and dates, while domain-specific named entities include names of, for example, proteins, enzymes, organisms, genes, and cells in the biological domain. An ability to automatically perform named entity recognition (NER) – i.e., identify occurrences of NEs in Web contents – can have multiple benefits, such as improving the expressiveness of queries and the quality of the search results. A number of factors make building a highly accurate NER system a challenging task. Given the importance of NER in semantic processing of text, this chapter presents a detailed survey of NER techniques for English text.
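Of the generic entity types the chapter lists, phone numbers and dates are the most amenable to pattern-based recognition, one of the classic technique families such surveys cover. A toy illustration (the US-style patterns below are assumptions for the sake of the example):

```python
import re

# Illustrative patterns; production NER combines rules, gazetteers,
# and statistical models, as the survey discusses.
RULES = {
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),   # 555-123-4567
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),    # 2021-05-01
}

def rule_based_ner(text):
    """Return (entity_type, matched_text) pairs found by the rules."""
    entities = []
    for etype, pattern in RULES.items():
        for m in pattern.finditer(text):
            entities.append((etype, m.group(0)))
    return sorted(entities)
```

Running it on "Call 555-123-4567 before 2021-05-01." yields one DATE and one PHONE entity; person and organization names, by contrast, resist such patterns, which is one of the factors that make highly accurate NER challenging.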
APA, Harvard, Vancouver, ISO, and other styles
10

Sandhya P. and Mahek Laxmikant Kantesaria. "Named Entity Recognition in Document Summarization." In Trends and Applications of Text Summarization Techniques. IGI Global, 2020. http://dx.doi.org/10.4018/978-1-5225-9373-7.ch005.

Full text
Abstract:
Named entity recognition (NER) is a subtask of information extraction. An NER system reads text and highlights the entities, separating different entity types according to the needs of the project. NER is a two-step process: detection of names, followed by their classification. The first step is further divided into segmentation, while the second step consists of choosing an ontology that organizes things categorically. Document summarization, also called automatic summarization, is a process in which software creates a summary of a text document by selecting the important points of the original text. In this chapter, the authors explain how document summarization is performed using named entity recognition. They discuss the different types of summarization techniques, how NER works, and its applications. The libraries available for NER-based information extraction are explained. Finally, they explain how NER is applied to document summarization.
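The two-step process this chapter describes (first detect candidate names, then classify them) can be sketched as follows; the capitalization heuristic for detection and the tiny gazetteer for classification are illustrative assumptions:

```python
# Step 1: detection — segment candidate names (here, runs of
# capitalized tokens). Step 2: classification — assign each candidate
# a category, here via a toy gazetteer with a fallback class.
GAZETTEER = {"Paris": "LOCATION", "Alice": "PERSON"}

def detect(tokens):
    """Group consecutive capitalized tokens into candidate name spans."""
    spans, current = [], []
    for tok in tokens:
        if tok[:1].isupper():
            current.append(tok)
        elif current:
            spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans

def classify(span):
    return GAZETTEER.get(span, "MISC")

def two_step_ner(tokens):
    return [(span, classify(span)) for span in detect(tokens)]
```

For example, `two_step_ner(["Alice", "flew", "to", "Paris", "yesterday"])` returns `[("Alice", "PERSON"), ("Paris", "LOCATION")]`; a summarizer can then weight sentences containing such entities more heavily.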
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Multilingual, named entity recognition, NER"

1

Al-Rfou, Rami, Vivek Kulkarni, Bryan Perozzi, and Steven Skiena. "POLYGLOT-NER: Massive Multilingual Named Entity Recognition." In Proceedings of the 2015 SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics, 2015. http://dx.doi.org/10.1137/1.9781611974010.66.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Poibeau, Thierry. "The multilingual named entity recognition framework." In the tenth conference. Association for Computational Linguistics, 2003. http://dx.doi.org/10.3115/1067737.1067772.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Hakala, Kai, and Sampo Pyysalo. "Biomedical Named Entity Recognition with Multilingual BERT." In Proceedings of The 5th Workshop on BioNLP Open Shared Tasks. Association for Computational Linguistics, 2019. http://dx.doi.org/10.18653/v1/d19-5709.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Druzhkina, Anna, Alexey Leontyev, and Maria Stepanova. "German NER with a Multilingual Rule Based Information Extraction System: Analysis and Issues." In Proceedings of the Sixth Named Entity Workshop. Association for Computational Linguistics, 2016. http://dx.doi.org/10.18653/v1/w16-2704.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Ni, Jian, and Radu Florian. "Improving Multilingual Named Entity Recognition with Wikipedia Entity Type Mapping." In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2016. http://dx.doi.org/10.18653/v1/d16-1135.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Mueller, David, Nicholas Andrews, and Mark Dredze. "Sources of Transfer in Multilingual Named Entity Recognition." In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2020. http://dx.doi.org/10.18653/v1/2020.acl-main.720.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Moreno, Jose G., Elvys Linhares Pontes, Mickael Coustaty, and Antoine Doucet. "TLR at BSNLP2019: A Multilingual Named Entity Recognition System." In Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing. Association for Computational Linguistics, 2019. http://dx.doi.org/10.18653/v1/w19-3711.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Arkhipov, Mikhail, Maria Trofimova, Yuri Kuratov, and Alexey Sorokin. "Tuning Multilingual Transformers for Language-Specific Named Entity Recognition." In Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing. Association for Computational Linguistics, 2019. http://dx.doi.org/10.18653/v1/w19-3712.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Bhardwaj, Bhavya, Syed Ishtiyaq Ahmed, J. Jaiharie, R. Sorabh Dadhich, and M. Ganesan. "Web Scraping Using Summarization and Named Entity Recognition (NER)." In 2021 7th International Conference on Advanced Computing and Communication Systems (ICACCS). IEEE, 2021. http://dx.doi.org/10.1109/icaccs51430.2021.9441888.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Mikhailov, Vladislav, and Tatiana Shavrina. "Domain-Transferable Method for Named Entity Recognition Task." In 9th International Conference on Natural Language Processing (NLP 2020). AIRCC Publishing Corporation, 2020. http://dx.doi.org/10.5121/csit.2020.101407.

Full text
Abstract:
Named Entity Recognition (NER) is a fundamental task in the fields of natural language processing and information extraction. NER has been widely used as a standalone tool or an essential component in a variety of applications such as question answering, dialogue assistants and knowledge graphs development. However, training reliable NER models requires a large amount of labelled data which is expensive to obtain, particularly in specialized domains. This paper describes a method to learn a domain-specific NER model for an arbitrary set of named entities when domain-specific supervision is not available. We assume that the supervision can be obtained with no human effort, and neural models can learn from each other. The code, data and models are publicly available.
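The paper's premise that "neural models can learn from each other" is in the spirit of teacher–student training: one model labels unlabeled text, and another is trained on those pseudo-labels. A toy, non-neural sketch of the pseudo-labeling loop (the rule-based teacher and memorizing student here are assumptions for illustration, not the paper's models):

```python
from collections import Counter, defaultdict

def teacher(token):
    """Toy 'teacher': a rule that tags capitalized tokens as entities."""
    return "ENT" if token[:1].isupper() else "O"

def train_student(unlabeled_sentences):
    """Student memorizes the majority pseudo-label for each token type."""
    votes = defaultdict(Counter)
    for sentence in unlabeled_sentences:
        for token in sentence:
            votes[token.lower()][teacher(token)] += 1
    return {tok: counts.most_common(1)[0][0] for tok, counts in votes.items()}

student = train_student([["Paris", "is", "nice"], ["Paris", "in", "spring"]])
```

After training, the student labels "paris" as an entity and "is" as outside, having learned this entirely from the teacher's pseudo-labels rather than human annotation, which is the kind of supervision-free transfer the paper exploits.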
APA, Harvard, Vancouver, ISO, and other styles