Academic literature on the topic 'Cross lingual text classification'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Cross lingual text classification.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online, whenever these are available in the metadata.

Journal articles on the topic "Cross lingual text classification"

1. Zhang, Mozhi, Yoshinari Fujinuma, and Jordan Boyd-Graber. "Exploiting Cross-Lingual Subword Similarities in Low-Resource Document Classification." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (April 3, 2020): 9547–54. http://dx.doi.org/10.1609/aaai.v34i05.6500.

Abstract:
Text classification must sometimes be applied in a low-resource language with no labeled training data. However, training data may be available in a related language. We investigate whether character-level knowledge transfer from a related language helps text classification. We present a cross-lingual document classification framework (caco) that exploits cross-lingual subword similarity by jointly training a character-based embedder and a word-based classifier. The embedder derives vector representations for input words from their written forms, and the classifier makes predictions based on the word vectors. We use a joint character representation for both the source language and the target language, which allows the embedder to generalize knowledge about source language words to target language words with similar forms. We propose a multi-task objective that can further improve the model if additional cross-lingual or monolingual resources are available. Experiments confirm that character-level knowledge transfer is more data-efficient than word-level transfer between related languages.
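
To make the transfer mechanism concrete, here is a minimal sketch (in PyTorch, with toy dimensions and random data, not the authors' released code) of the core idea: a character-level embedder produces word vectors from written forms, and a word-based classifier is trained jointly on top of it.

    import torch
    import torch.nn as nn

    class CharEmbedder(nn.Module):
        # Builds word vectors from character sequences, so source- and
        # target-language words with similar spellings get similar vectors.
        def __init__(self, n_chars, char_dim=32, word_dim=64):
            super().__init__()
            self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
            self.lstm = nn.LSTM(char_dim, word_dim // 2,
                                bidirectional=True, batch_first=True)

        def forward(self, char_ids):                 # (batch, words, chars)
            b, w, c = char_ids.shape
            x = self.char_emb(char_ids.view(b * w, c))
            _, (h, _) = self.lstm(x)
            return torch.cat([h[0], h[1]], dim=-1).view(b, w, -1)

    class CacoStyleClassifier(nn.Module):
        def __init__(self, n_chars, n_classes):
            super().__init__()
            self.embedder = CharEmbedder(n_chars)    # shared across languages
            self.out = nn.Linear(64, n_classes)

        def forward(self, char_ids):
            words = self.embedder(char_ids)          # vectors from written forms
            return self.out(words.mean(dim=1))       # mean-pooled document

    model = CacoStyleClassifier(n_chars=100, n_classes=4)
    logits = model(torch.randint(1, 100, (8, 20, 12)))    # toy batch
    loss = nn.functional.cross_entropy(logits, torch.randint(0, 4, (8,)))
    loss.backward()                 # embedder and classifier train jointly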

2. Moreo Fernández, Alejandro, Andrea Esuli, and Fabrizio Sebastiani. "Distributional Correspondence Indexing for Cross-Lingual and Cross-Domain Sentiment Classification." Journal of Artificial Intelligence Research 55 (January 20, 2016): 131–63. http://dx.doi.org/10.1613/jair.4762.

Abstract:
Domain Adaptation (DA) techniques aim at enabling machine learning methods to learn effective classifiers for a "target" domain when the only available training data belongs to a different "source" domain. In this paper we present the Distributional Correspondence Indexing (DCI) method for domain adaptation in sentiment classification. DCI derives term representations in a vector space common to both domains where each dimension reflects its distributional correspondence to a pivot, i.e., to a highly predictive term that behaves similarly across domains. Term correspondence is quantified by means of a distributional correspondence function (DCF). We propose a number of efficient DCFs that are motivated by the distributional hypothesis, i.e., the hypothesis according to which terms with similar meaning tend to have similar distributions in text. Experiments show that DCI obtains better performance than current state-of-the-art techniques for cross-lingual and cross-domain sentiment classification. DCI also brings about a significantly reduced computational cost, and requires a smaller amount of human intervention. As a final contribution, we discuss a more challenging formulation of the domain adaptation problem, in which both the cross-domain and cross-lingual dimensions are tackled simultaneously.
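
The pivot-based projection at the heart of DCI can be sketched in a few lines of NumPy: each term is re-represented by its correspondence to a handful of pivot terms, yielding a space shared by both domains. The cosine correspondence below stands in for the paper's family of DCFs; matrices and pivot indices are toy placeholders.

    import numpy as np

    def dci_projection(term_doc, pivot_ids):
        # term_doc:  (n_terms, n_docs) occurrence matrix for ONE domain.
        # pivot_ids: indices of pivot terms that behave alike in both domains.
        X = term_doc / (np.linalg.norm(term_doc, axis=1, keepdims=True) + 1e-12)
        P = X[pivot_ids]                  # pivot profiles, (n_pivots, n_docs)
        return X @ P.T                    # cosine correspondence to each pivot

    src = np.random.rand(6, 50)              # source-domain term-document matrix
    tgt = np.random.rand(6, 40)              # target domain has different documents
    src_vecs = dci_projection(src, [0, 1])   # both term sets now live in the
    tgt_vecs = dci_projection(tgt, [0, 1])   # same pivot-indexed space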

3. Steinberger, Ralf, and Bruno Pouliquen. "Cross-lingual Named Entity Recognition." Lingvisticæ Investigationes. International Journal of Linguistics and Language Resources 30, no. 1 (August 10, 2007): 135–62. http://dx.doi.org/10.1075/li.30.1.09ste.

Abstract:
Named Entity Recognition and Classification (NERC) is a known and well-explored text analysis application that has been applied to various languages. We are presenting an automatic, highly multilingual news analysis system that fully integrates NERC for locations, persons and organisations with document clustering, multi-label categorisation, name attribute extraction, name variant merging and the calculation of social networks. The proposed application goes beyond the state-of-the-art by automatically merging the information found in news written in ten different languages, and by using the aggregated name information to automatically link related news documents across languages for all 45 language pair combinations. While state-of-the-art approaches for cross-lingual name variant merging and document similarity calculation require bilingual resources, the methods proposed here are mostly language-independent and require a minimal amount of monolingual language-specific effort. The development of resources for additional languages is therefore kept to a minimum and new languages can be plugged into the system effortlessly. The presented online news analysis application is fully functional and has, at the end of the year 2006, reached average usage statistics of 600,000 hits per day.

4. Pelicon, Andraž, Marko Pranjić, Dragana Miljković, Blaž Škrlj, and Senja Pollak. "Zero-Shot Learning for Cross-Lingual News Sentiment Classification." Applied Sciences 10, no. 17 (August 29, 2020): 5993. http://dx.doi.org/10.3390/app10175993.

Abstract:
In this paper, we address the task of zero-shot cross-lingual news sentiment classification. Given the annotated dataset of positive, neutral, and negative news in Slovene, the aim is to develop a news classification system that assigns the sentiment category not only to Slovene news, but to news in another language without any training data required. Our system is based on the multilingual BERT model, while we test different approaches for handling long documents and propose a novel technique for sentiment enrichment of the BERT model as an intermediate training step. With the proposed approach, we achieve state-of-the-art performance on the sentiment analysis task on Slovenian news. We evaluate the zero-shot cross-lingual capabilities of our system on a novel news sentiment test set in Croatian. The results show that the cross-lingual approach also largely outperforms the majority classifier, as well as all settings without sentiment enrichment in pre-training.
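
The zero-shot recipe can be sketched with the Hugging Face transformers library: fine-tune multilingual BERT on Slovene labels, then apply the model unchanged to Croatian text. The two-example dataset is a placeholder, and the paper's sentiment-enrichment step and long-document handling are omitted.

    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    train_sl = [("Gospodarstvo raste.", 2), ("Nesreča na cesti.", 0)]  # Slovene
    test_hr = ["Vlada je pala."]                                       # Croatian

    tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-multilingual-cased", num_labels=3)
    opt = torch.optim.AdamW(model.parameters(), lr=2e-5)

    model.train()
    for text, label in train_sl:              # fine-tune on Slovene only
        batch = tok(text, return_tensors="pt", truncation=True)
        loss = model(**batch, labels=torch.tensor([label])).loss
        loss.backward()
        opt.step()
        opt.zero_grad()

    model.eval()
    with torch.no_grad():                     # zero-shot: no Croatian labels used
        for text in test_hr:
            batch = tok(text, return_tensors="pt", truncation=True)
            pred = model(**batch).logits.argmax(-1).item()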

5. Wan, Xiaojun. "Bilingual Co-Training for Sentiment Classification of Chinese Product Reviews." Computational Linguistics 37, no. 3 (September 2011): 587–616. http://dx.doi.org/10.1162/coli_a_00061.

Abstract:
The lack of reliable Chinese sentiment resources limits research progress on Chinese sentiment classification. However, there are many freely available English sentiment resources on the Web. This article focuses on the problem of cross-lingual sentiment classification, which leverages only available English resources for Chinese sentiment classification. We first investigate several basic methods (including lexicon-based methods and corpus-based methods) for cross-lingual sentiment classification by simply leveraging machine translation services to eliminate the language gap, and then propose a bilingual co-training approach to make use of both the English view and the Chinese view based on additional unlabeled Chinese data. Experimental results on two test sets show the effectiveness of the proposed approach, which can outperform basic methods and transductive methods.
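
The bilingual co-training loop can be sketched with scikit-learn: one classifier per language view, each round absorbing the most confidently pseudo-labelled unlabeled documents into the training pool. This is a simplified sketch (confidences are summed across views rather than exchanged view-to-view as in the paper); feature matrices are assumed precomputed, with the second view of each document obtained via machine translation.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def co_train(Xe_l, Xc_l, y, Xe_u, Xc_u, rounds=5, k=10):
        # Xe_*: English-view features, Xc_*: Chinese-view features of the
        # same documents; *_l are labelled, *_u unlabelled.
        y = list(y)
        for _ in range(rounds):
            ce = LogisticRegression(max_iter=1000).fit(Xe_l, y)
            cc = LogisticRegression(max_iter=1000).fit(Xc_l, y)
            if len(Xe_u) == 0:
                break
            conf = ce.predict_proba(Xe_u).max(1) + cc.predict_proba(Xc_u).max(1)
            pick = np.argsort(-conf)[:k]          # most confident documents
            y += list(ce.predict(Xe_u[pick]))     # pseudo-label and absorb them
            Xe_l = np.vstack([Xe_l, Xe_u[pick]])
            Xc_l = np.vstack([Xc_l, Xc_u[pick]])
            keep = np.setdiff1d(np.arange(len(Xe_u)), pick)
            Xe_u, Xc_u = Xe_u[keep], Xc_u[keep]
        return ce, cc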

6. Liu, Ling, and Sang-Bing Tsai. "Intelligent Recognition and Teaching of English Fuzzy Texts Based on Fuzzy Computing and Big Data." Wireless Communications and Mobile Computing 2021 (July 10, 2021): 1–10. http://dx.doi.org/10.1155/2021/1170622.

Abstract:
In this paper, we conduct in-depth research and analysis on the intelligent recognition and teaching of English fuzzy text through parallel projection and region expansion. Multisense Soft Cluster Vector (MSCVec), a multisense word vector model based on nonnegative matrix decomposition and sparse soft clustering, is constructed. The MSCVec model is a monolingual word vector model: it uses nonnegative matrix decomposition of the positive pointwise mutual information between words and contexts to extract low-rank expressions of the mixed semantics of polysemous words, and then uses a sparse soft clustering algorithm to partition the multiple word senses of the polysemous words and to obtain the global sense-affiliation distribution of each polysemous word; the specific polysemous word cluster classes are determined based on the negative mean log-likelihood of the global affiliation between the contextual semantics and the polysemous words, and finally, the polysemous word vectors are learned using the FastText model under the extended dictionary word set. The advantage of the MSCVec model is that it is an unsupervised learning process without any knowledge base, and the substring representation in the model ensures the generation of vectors for unregistered words; in addition, the global affiliation of the MSCVec model also makes it possible to reduce polysemous word vectors to single word vectors. Compared with traditional static word vectors, MSCVec shows excellent results in both word similarity and downstream text classification task experiments. The two sets of features are then fused and extended into new semantic features, and similarity classification experiments and stack generalization experiments are designed for comparison. In the cross-lingual sentence-level similarity detection task, SCLVec cross-lingual word vector lexical-level features outperform MSCVec multisense word vector features as the input embedding layer; deep semantic sentence-level features trained by twin recurrent neural networks outperform the semantic features of twin convolutional neural networks; extensions of traditional statistical features can effectively improve cross-lingual similarity detection performance, especially the cross-lingual topic model (BL-LDA); and the stack generalization integration approach offsets the error rate of the underlying classifiers and improves the detection accuracy.
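
Stripped of the surrounding system, the first step of the MSCVec construction (nonnegative factorization of a positive PMI word-context matrix) can be sketched as follows; the corpus counts and dimensions are toy placeholders, and the sparse soft clustering and FastText stages are omitted.

    import numpy as np
    from sklearn.decomposition import NMF

    def ppmi(counts):
        # counts[i, j]: co-occurrence count of word i with context j.
        total = counts.sum()
        pw = counts.sum(axis=1, keepdims=True) / total
        pc = counts.sum(axis=0, keepdims=True) / total
        pmi = np.log((counts / total + 1e-12) / (pw * pc + 1e-12))
        return np.maximum(pmi, 0.0)           # keep positive PMI only

    counts = np.random.randint(0, 5, size=(30, 40)).astype(float)
    W = NMF(n_components=8, init="nndsvda", max_iter=500).fit_transform(ppmi(counts))
    # Rows of W are low-rank, nonnegative word representations whose
    # components mix the senses of polysemous words.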

7. Santini, Marina, and Min-Chun Shih. "Exploring the Potential of an Extensible Domain-Specific Web Corpus for “Layfication”." International Journal of Cyber-Physical Systems 2, no. 1 (January 2020): 20–32. http://dx.doi.org/10.4018/ijcps.2020010102.

Abstract:
This article presents experiments based on the extensible domain-specific web corpus for “layfication”. For these experiments, both the existing layfication corpus (in Swedish and in English) and a new addition in English (the NHS-PubMed subcorpus) are used. With this extended corpus, methods to classify lay-specialized medical sublanguages cross-linguistically using small data and noisy web documents are investigated. A sublanguage is a language variety used in specific domains. Here, the authors focus on two medical sublanguages, namely the “patientspeak” (lay) and the medical jargon (specialized). Cross-lingual sublanguage classification is still largely underexplored, although it can be crucial in downstream applications for digital health and cyber-physical systems. Classification models are built using small and noisy training sets in Swedish and evaluated on English test sets. The performance of Naive Bayes classifiers—built with stopwords and with Bag-of-Words—is compared with that of convolutional neural network classifiers leveraging MUSE multilingual word embeddings. Results are promising and nuanced. These results are proposed as a first baseline for cross-lingual sublanguage classification.

8. Moreo Fernández, Alejandro, Andrea Esuli, and Fabrizio Sebastiani. "Lightweight Random Indexing for Polylingual Text Classification." Journal of Artificial Intelligence Research 57 (October 13, 2016): 151–85. http://dx.doi.org/10.1613/jair.5194.

Abstract:
Multilingual Text Classification (MLTC) is a text classification task in which documents are written each in one among a set L of natural languages, and in which all documents must be classified under the same classification scheme, irrespective of language. There are two main variants of MLTC, namely Cross-Lingual Text Classification (CLTC) and Polylingual Text Classification (PLTC). In PLTC, which is the focus of this paper, we assume (differently from CLTC) that for each language in L there is a representative set of training documents; PLTC consists of improving the accuracy of each of the |L| monolingual classifiers by also leveraging the training documents written in the other (|L| − 1) languages. The obvious solution, consisting of generating a single polylingual classifier from the juxtaposed monolingual vector spaces, is usually infeasible, since the dimensionality of the resulting vector space is roughly |L| times that of a monolingual one, and is thus often unmanageable. As a response, the use of machine translation tools or multilingual dictionaries has been proposed. However, these resources are not always available, or are not always free to use. One machine-translation-free and dictionary-free method that, to the best of our knowledge, has never been applied to PLTC before, is Random Indexing (RI). We analyse RI in terms of space and time efficiency, and propose a particular configuration of it (that we dub Lightweight Random Indexing – LRI). By running experiments on two well known public benchmarks, Reuters RCV1/RCV2 (a comparable corpus) and JRC-Acquis (a parallel one), we show LRI to outperform (both in terms of effectiveness and efficiency) a number of previously proposed machine-translation-free and dictionary-free PLTC methods that we use as baselines.
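
Random Indexing itself takes only a few lines: every term, in any language, receives a sparse random "index vector" of fixed dimensionality, and a document is the sum of the index vectors of its terms, so the juxtaposed polylingual vocabulary never blows up the feature space. The dimensions and sparsity below are illustrative, not the paper's LRI configuration.

    import numpy as np

    rng = np.random.default_rng(0)

    def index_vector(dim=500, nonzeros=6):
        v = np.zeros(dim)
        pos = rng.choice(dim, size=nonzeros, replace=False)
        v[pos] = rng.choice([-1.0, 1.0], size=nonzeros)  # sparse ternary vector
        return v

    vocab = {}                                  # term -> index vector

    def doc_vector(tokens, dim=500):
        for t in tokens:
            vocab.setdefault(t, index_vector(dim))
        return sum(vocab[t] for t in tokens)    # fixed-size document vector

    d1 = doc_vector("the cat sat".split())
    d2 = doc_vector("el gato negro".split())    # same space, no translation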

9. Artetxe, Mikel, and Holger Schwenk. "Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond." Transactions of the Association for Computational Linguistics 7 (November 2019): 597–610. http://dx.doi.org/10.1162/tacl_a_00288.

Abstract:
We introduce an architecture to learn joint multilingual sentence representations for 93 languages, belonging to more than 30 different families and written in 28 different scripts. Our system uses a single BiLSTM encoder with a shared byte-pair encoding vocabulary for all languages, which is coupled with an auxiliary decoder and trained on publicly available parallel corpora. This enables us to learn a classifier on top of the resulting embeddings using English annotated data only, and transfer it to any of the 93 languages without any modification. Our experiments in cross-lingual natural language inference (XNLI data set), cross-lingual document classification (MLDoc data set), and parallel corpus mining (BUCC data set) show the effectiveness of our approach. We also introduce a new test set of aligned sentences in 112 languages, and show that our sentence embeddings obtain strong results in multilingual similarity search even for low-resource languages. Our implementation, the pre-trained encoder, and the multilingual test set are available at https://github.com/facebookresearch/LASER.
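
Downstream use of such sentence embeddings is simple: encode with the shared encoder, train a plain classifier on English, and apply it to any other language. The embed() function below is a random stub standing in for the actual LASER encoder (the repository above provides the real tooling).

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def embed(texts, lang):
        # Stub for the multilingual encoder: one 1024-d vector per sentence.
        rng = np.random.default_rng(abs(hash(lang)) % 2**32)
        return rng.standard_normal((len(texts), 1024))

    X_en = embed(["good product", "terrible service"], "en")
    clf = LogisticRegression().fit(X_en, [1, 0])    # trained on English only

    X_de = embed(["sehr gut", "ganz schlecht"], "de")
    preds = clf.predict(X_de)                       # applied to German unchanged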

10. Li, Gen, Nan Duan, Yuejian Fang, Ming Gong, and Daxin Jiang. "Unicoder-VL: A Universal Encoder for Vision and Language by Cross-Modal Pre-Training." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 11336–44. http://dx.doi.org/10.1609/aaai.v34i07.6795.

Abstract:
We propose Unicoder-VL, a universal encoder that aims to learn joint representations of vision and language in a pre-training manner. Borrowing ideas from cross-lingual pre-trained models, such as XLM (Lample and Conneau 2019) and Unicoder (Huang et al. 2019), both visual and linguistic contents are fed into a multi-layer Transformer (Vaswani et al. 2017) for the cross-modal pre-training, where three pre-training tasks are employed, including Masked Language Modeling (MLM), Masked Object Classification (MOC) and Visual-linguistic Matching (VLM). The first two tasks learn context-aware representations for input tokens based on linguistic and visual contents jointly. The last task tries to predict whether an image and a text describe each other. After pre-training on large-scale image-caption pairs, we transfer Unicoder-VL to caption-based image-text retrieval and visual commonsense reasoning, with just one additional output layer. We achieve state-of-the-art or comparable results on both tasks and show the powerful ability of the cross-modal pre-training.

Dissertations / Theses on the topic "Cross lingual text classification"

1. Petrenz, Philipp. "Cross-lingual genre classification." Thesis, University of Edinburgh, 2014. http://hdl.handle.net/1842/9658.

Abstract:
Automated classification of texts into genres can benefit NLP applications, in that the structure, location and even interpretation of information within a text are dictated by its genre. Cross-lingual methods promise such benefits to languages which lack genre-annotated training data. While there has been work on genre classification for over two decades, none has considered cross-lingual methods before the start of this project. My research aims to fill this gap. It follows previous approaches to monolingual genre classification that exploit simple, low-level text features, many of which can be extracted in different languages and have similar functions. This contrasts with work on cross-lingual topic or sentiment classification of texts that typically use word frequencies as features. These have been shown to have limited use when it comes to genres. Many such methods also assume cross-lingual resources, such as machine translation, which limits the range of their application. A selection of these approaches are used as baselines in my experiments. I report the results of two semi-supervised methods for exploiting genre-labelled source language texts and unlabelled target language texts. The first is a relatively simple algorithm that bridges the language gap by exploiting cross-lingual features and then iteratively re-trains a classification model on previously predicted target texts. My results show that this approach works well where only few cross-lingual resources are available and texts are to be classified into broad genre categories. It is also shown that further improvements can be achieved through multi-lingual training or cross-lingual feature selection if genre-annotated texts are available in several source languages. The second is a variant of the label propagation algorithm. This graph-based classifier learns genre-specific feature set weights from both source and target language texts and uses them to adjust the propagation channels for each text. This allows further feature sets to be added as additional resources, such as Part of Speech taggers, become available. While the method performs well even with basic text features, it is shown to benefit from additional feature sets. Results also indicate that it handles fine-grained genre classes better than the iterative re-labelling method.
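
The first of the two methods reduces to a self-training loop over language-independent features; below is a minimal scikit-learn sketch (feature extraction and data are placeholders, not the thesis's exact feature sets).

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def iterative_relabel(X_src, y_src, X_tgt, iters=5):
        # X_* must be cross-lingual features (e.g. POS ratios, punctuation
        # densities) whose meaning is comparable across languages.
        clf = LogisticRegression(max_iter=1000).fit(X_src, y_src)
        for _ in range(iters):
            y_tgt = clf.predict(X_tgt)                   # pseudo-label targets
            X = np.vstack([X_src, X_tgt])
            y = np.concatenate([y_src, y_tgt])
            clf = LogisticRegression(max_iter=1000).fit(X, y)   # re-train
        return clf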

2. Shih, Min-Chun. "Exploring Cross-lingual Sublanguage Classification with Multi-lingual Word Embeddings." Thesis, Linköpings universitet, Statistik och maskininlärning, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-166148.

Abstract:
Cross-lingual text classification is an important task due to the globalization and the increased availability of multilingual data. This thesis explores the method of implementing cross-lingual classification on Swedish and English medical corpora. Specifically, this thesis explores the simple convolutional neural network (CNN) with MUSE pre-trained word embeddings to approach binary classification of sublanguages (“lay” and “specialized”) from Swedish healthcare texts to English healthcare texts. MUSE is a library that provides state-of-the-art multilingual word embeddings and large-scale high-quality bilingual dictionaries. The thesis presents experiments with imbalanced and balanced class distribution on training data and test data to examine the effect of class distribution, and also to examine the influence of clean and noisy test datasets. The results show that balanced distribution of classes in training data performs significantly better than training data with imbalanced class distribution, and clean test data gives the benefit of transferring the labels from one language to another. The thesis also compares the performance of the simple convolutional neural network model with the Naive Bayes baseline. Results show that on this task a simple Naive Bayes classifier based on bag-of-words translated using the MUSE English-Swedish dictionary outperforms a simple CNN model based on MUSE pre-trained word embeddings in several experimental settings.
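
The winning baseline is easy to reproduce in outline: translate test tokens word by word with a bilingual dictionary, then classify the translated bag-of-words with Naive Bayes trained on the source language. The three-word dictionary and texts below are placeholders, not MUSE's actual resources.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    en2sv = {"doctor": "läkare", "pain": "smärta", "treatment": "behandling"}

    def translate(text):
        # Word-by-word dictionary lookup; out-of-dictionary words are dropped.
        return " ".join(en2sv[w] for w in text.lower().split() if w in en2sv)

    train_sv = ["läkare ordinerar behandling", "ont i magen hela dagen"]
    labels = ["specialized", "lay"]

    vec = CountVectorizer().fit(train_sv)
    nb = MultinomialNB().fit(vec.transform(train_sv), labels)

    test_en = ["the doctor recommended a treatment"]
    pred = nb.predict(vec.transform([translate(t) for t in test_en]))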

3. Tafreshi, Shabnam. "Cross-Genre, Cross-Lingual, and Low-Resource Emotion Classification." Thesis, The George Washington University, 2021. http://pqdtopen.proquest.com/#viewpdf?dispub=28088437.

Abstract:
Emotions can be defined as a natural, instinctive state of mind arising from one's circumstances, mood, and relationships with others. How and what humans feel has long been a question for psychology to answer. Enabling computers to recognize human emotions has been of interest to researchers since the 1990s (Picard et al., 1995). Ever since, this area of research has grown significantly and emotion detection is becoming an important component in many natural language processing tasks. Several theories exist for defining emotions and are chosen by researchers according to their needs. For instance, according to appraisal theory, a psychology theory, emotions are produced by our evaluations (appraisals or estimates) of events that cause a specific reaction in different people. Some emotions are easy and universal, while others are complex and nuanced. Emotion classification is generally the process of labeling a piece of text with one or more corresponding emotion labels. Psychologists have developed numerous models and taxonomies of emotions. The model or taxonomy depends on the problem, and thorough study is often required to select the best model. Early studies of emotion classification focused on building computational models to classify basic emotion categories. In recent years, increasing volumes of social media and the digitization of data have opened a new horizon in this area of study, where emotion classification is a key component of applications including mood and behavioral studies as well as disaster relief, amongst many others. Sophisticated models have been built to detect and classify emotion in text, but few analyze how well a model is able to learn emotion cues. The ability to learn emotion cues properly, and to generalize this learning, is very important. This work investigates the robustness of emotion classification approaches across genres and languages, with a focus on quantifying how well state-of-the-art models are able to learn emotion cues. First, we use multi-task learning and hierarchical models to build emotion models that were trained on data combined from multiple genres. Our hypothesis is that a multi-genre, noisy training environment will help the classifier learn emotion cues that are prevalent across genres. Second, we explore splitting text (i.e. sentences) into clauses and testing whether the model's performance improves. Emotion analysis needs fine-grained annotation, and clause-level annotation can be beneficial for designing features that improve emotion detection performance. Intuitively, clause-level annotations may help the model focus on emotion cues, while ignoring irrelevant portions of the text. Third, we adopted a transfer learning approach for cross-lingual/genre emotion classification to focus the classifier's attention on emotion cues which are consistent across languages. Fourth, we empirically show how to combine different genres to build robust models that can be used as source models for emotion transfer to low-resource target languages. Finally, this study involved curating and re-annotating popular emotional data sets in different genres, annotating a multi-genre corpus of Persian tweets and news, and generating a collection of emotional sentences for Azerbaijani, a low-resource language spoken in the northwest of Iran.

4. Weijand, Sasha. "Automated Gender Classification in Wikipedia Biographies: A Cross-lingual Comparison." Thesis, Umeå universitet, Institutionen för datavetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-163371.

Abstract:
The written word plays an important role in the reinforcement of gender stereotypes, especially in texts of a more formal character. Wikipedia biographies have a lot of information about famous people, but do they describe men and women with different kinds of words? This thesis aims to evaluate and explore a method for gender classification of text. In this study, two machine learning classifiers, Random Forest (RF) and Support Vector Machine (SVM), are applied to the gender classification of Wikipedia biographies in two languages, English and French. Their performance is evaluated and compared. The 500 most important words (features) are listed for each of the classifiers. A short review is given of the theoretic foundations of text classification, and a detailed description of how the datasets are built, what tools are used, and why. The datasets used are built from the first 5 paragraphs in each biography, with only nouns, verbs, adjectives and adverbs remaining. Feature ranking is also applied, where the top tenth of the features are kept. Performance is measured using the F0.5-score. The comparison shows that the RF and SVM classifiers' performance are close to each other, but that the classifiers perform worse on the French set than on the English. Initial performance scores range from 0.82 to 0.86, but they drop drastically when the most important features are removed from the set. A majority of the most important features are nouns related to career and family roles, in both languages. The results show that there are indeed some semantic differences in language depending on the gender of the person described. Whether these depend on the writers' biased views, an unequal gender distribution of real-world contexts, such as careers, or on how the datasets were built, is not clear.
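
The evaluation setup can be sketched with scikit-learn's fbeta_score, which weights precision over recall at beta=0.5; the synthetic features here stand in for the processed biography texts.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import fbeta_score
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=400, n_features=50, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    for clf in (RandomForestClassifier(random_state=0), SVC()):
        pred = clf.fit(X_tr, y_tr).predict(X_te)
        print(type(clf).__name__, fbeta_score(y_te, pred, beta=0.5))  # F0.5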

5. Krithivasan, Bhavani. "Cross-Language Tweet Classification Using Bing Translator." Kansas State University, 2017. http://hdl.handle.net/2097/38556.

Abstract:
Master of Science, Department of Computing and Information Sciences. Major Professor: Doina Caragea.
Social media affects our daily lives. It is one of the first sources for finding breaking news. In particular, Twitter is one of the popular social media platforms, with around 330 million monthly users. From local events such as Fake Patty's Day to happenings across the world, Twitter gets there first. During a disaster, tweets can be used to post warnings, the status of available medical and food supplies, emergency personnel, and updates. Users were practically tweeting about Hurricane Sandy despite the lack of network coverage during the storm. Analysis of these tweets can help monitor the disaster, plan and manage the crisis, and aid in research. In this research, we use the publicly available tweets posted during several disasters and identify the relevant tweets. As the languages in the datasets are different, the Bing translation API has been used to detect and translate the tweets. The translations are then used as training datasets for supervised machine learning algorithms. Supervised learning is the process of learning from a labeled training dataset. This learned classifier can then be used to predict the correct output for any valid input. When trained on more observations, the algorithm improves its predictive performance.

6. Varga, Andrea. "Exploiting domain knowledge for cross-domain text classification in heterogeneous data sources." Thesis, University of Sheffield, 2014. http://etheses.whiterose.ac.uk/7538/.

Abstract:
With the growing amount of data generated in large heterogeneous repositories (such as the World Wide Web, corporate repositories, citation databases), there is an increased need for the end users to locate relevant information efficiently. Text Classification (TC) techniques provide automated means for classifying fragments of text (phrases, paragraphs or documents) into predefined semantic types, allowing an efficient way of organising and analysing such large document collections. Current approaches to TC rely on supervised learning, which performs well on the domains on which the TC system is built, but tends to adapt poorly to different domains. This thesis presents a body of work for exploring adaptive TC techniques across heterogeneous corpora in large repositories with the goal of finding novel ways of bridging the gap across domains. The proposed approaches rely on the exploitation of domain knowledge for the derivation of stable cross-domain features. This thesis also investigates novel ways of estimating the performance of a TC classifier, by means of domain similarity measures. For this purpose, two novel knowledge-based similarity measures are proposed that capture the usefulness of the selected cross-domain features for cross-domain TC. The evaluation of these approaches and measures is presented on real world datasets against various strong baseline methods and content-based measures used in transfer learning. This thesis explores how domain knowledge can be used to enhance the representation of documents to address the lexical gap across the domains. Given that the effectiveness of a text classifier largely depends on the availability of annotated data, this thesis explores techniques which can leverage data from social knowledge sources (such as DBpedia and Freebase). Techniques are further presented which explore the feasibility of exploiting different semantic graph structures from knowledge sources in order to create novel cross-domain features and domain similarity metrics. The methodologies presented provide a novel representation of documents, and exploit four wide-coverage knowledge sources: DBpedia, Freebase, SNOMED-CT and MeSH. The contribution of this thesis demonstrates the feasibility of exploiting domain knowledge for adaptive TC and domain similarity, providing an enhanced representation of documents with semantic information about entities that can indeed reduce the lexical differences between domains.

7. Asian, Jelita. "Effective Techniques for Indonesian Text Retrieval." Thesis, RMIT University, Computer Science and Information Technology, 2007. http://adt.lib.rmit.edu.au/adt/public/adt-VIT20080110.084651.

Abstract:
The Web is a vast repository of data, and information on almost any subject can be found with the aid of search engines. Although the Web is international, the majority of research on finding information has focused on languages such as English and Chinese. In this thesis, we investigate information retrieval techniques for Indonesian. Although Indonesia is the fourth most populous country in the world, little attention has been given to search of Indonesian documents. Stemming is the process of reducing morphological variants of a word to a common stem form. Previous research has shown that stemming is language-dependent. Although several stemming algorithms have been proposed for Indonesian, there is no consensus on which gives better performance. We empirically explore these algorithms, showing that even the best algorithm still has scope for improvement. We propose novel extensions to this algorithm and develop a new Indonesian stemmer, and show that these can improve stemming correctness by up to three percentage points; our approach makes less than one error in thirty-eight words. We propose a range of techniques to enhance the performance of Indonesian information retrieval. These techniques include stopping, sub-word tokenisation, identification of proper nouns, and modifications to existing similarity functions. Our experiments show that many of these techniques can increase retrieval performance, with the highest increase achieved when we use grams of size five to tokenise words. We also present an effective method for identifying the language of a document; this allows various information retrieval techniques to be applied selectively depending on the language of target documents. We also address the problem of automatic creation of parallel corpora (collections of documents that are the direct translations of each other), which are essential for cross-lingual information retrieval tasks. Well-curated parallel corpora are rare, and for many languages, such as Indonesian, do not exist at all. We describe algorithms that we have developed to automatically identify parallel documents for Indonesian and English. Unlike most current approaches, which consider only the context and structure of the documents, our approach is based on the document content itself. Our algorithms do not make any prior assumptions about the documents, and are based on the Needleman-Wunsch algorithm for global alignment of protein sequences. Our approach works well in identifying Indonesian-English parallel documents, especially when no translation is performed. It can increase the separation value, a measure to discriminate good matches of parallel documents from bad matches, by approximately ten percentage points. We also investigate the applicability of our identification algorithms for other languages that use the Latin alphabet. Our experiments show that, with minor modifications, our alignment methods are effective for English-French, English-German, and French-German corpora, especially when the documents are not translated. Our technique can increase the separation value for the European corpus by up to twenty-eight percentage points. Together, these results provide a substantial advance in understanding techniques that can be applied for effective Indonesian text retrieval.
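
The global-alignment scoring borrowed from protein sequence analysis can be sketched directly; below is a generic Needleman-Wunsch dynamic program over token sequences (the scoring constants are illustrative, and the thesis applies its own document-level scoring scheme).

    def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
        # Global alignment score between sequences a and b.
        n, m = len(a), len(b)
        dp = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            dp[i][0] = i * gap
        for j in range(1, m + 1):
            dp[0][j] = j * gap
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                s = match if a[i - 1] == b[j - 1] else mismatch
                dp[i][j] = max(dp[i - 1][j - 1] + s,    # align the two tokens
                               dp[i - 1][j] + gap,      # gap in b
                               dp[i][j - 1] + gap)      # gap in a
        return dp[n][m]

    score = needleman_wunsch("berita utama hari ini".split(),
                             "top news of the day".split())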

8. Mozafari, Marzieh. "Hate speech and offensive language detection using transfer learning approaches." Electronic thesis or dissertation, Institut polytechnique de Paris, 2021. http://www.theses.fr/2021IPPAS007.

Abstract:
The great promise of social media platforms (e.g., Twitter and Facebook) is to provide a safe place for users to communicate their opinions and share information. However, concerns are growing that they enable abusive behaviors as well, e.g., threatening or harassing other users, cyberbullying, hate speech, and racial and sexual discrimination. In this thesis, we focus on hate speech as one of the most concerning phenomena in online social media. Given the high progression of online hate speech and its severe negative effects, institutions, social media platforms, and researchers have been trying to react as quickly as possible. The recent advancements in Natural Language Processing (NLP) and Machine Learning (ML) algorithms can be adapted to develop automatic methods for hate speech detection in this area. The aim of this thesis is to investigate the problem of hate speech and offensive language detection in social media, where we define hate speech as any communication criticizing a person or a group based on some characteristics, e.g., gender, sexual orientation, nationality, religion, race. We propose different approaches in which we adapt advanced Transfer Learning (TL) models and NLP techniques to detect hate speech and offensive content automatically, in a monolingual and multilingual fashion. In the first contribution, we only focus on the English language. Firstly, we analyze user-generated textual content to gain a brief insight into the type of content by introducing a new framework able to categorize contents in terms of topical similarity based on different features. Furthermore, using the Perspective API from Google, we measure and analyze the toxicity of the content. Secondly, we propose a TL approach for the identification of hate speech by employing a combination of the unsupervised pre-trained model BERT (Bidirectional Encoder Representations from Transformers) and new supervised fine-tuning strategies. Finally, we investigate the effect of unintended bias in our pre-trained BERT-based model and propose a new generalization mechanism in training data by reweighting samples and then changing the fine-tuning strategies in terms of the loss function to mitigate the racial bias propagated through the model. To evaluate the proposed models, we use two publicly available datasets from Twitter. In the second contribution, we consider a multilingual setting where we focus on low-resource languages in which there is no or little labeled data available. First, we present the first corpus of Persian offensive language, consisting of 6k micro-blog posts from Twitter, to deal with offensive language detection in Persian as a low-resource language in this domain. After annotating the corpus, we perform extensive experiments to investigate the performance of transformer-based monolingual and multilingual pre-trained language models (e.g., ParsBERT, mBERT, XLM-R) on the downstream task. Furthermore, we propose an ensemble model to boost the performance of our model. Then, we expand our study into a cross-lingual few-shot learning problem, where we have a few labeled data in the target language, and adapt a meta-learning based approach to address the identification of hate speech and offensive language in low-resource languages.

9. Franco Salvador, Marc. "A Cross-domain and Cross-language Knowledge-based Representation of Text and its Meaning." Doctoral thesis, Universitat Politècnica de València, 2017. http://hdl.handle.net/10251/84285.

Abstract:
Natural Language Processing (NLP) is a field of computer science, artificial intelligence, and computational linguistics concerned with the interactions between computers and human languages. One of its most challenging aspects involves enabling computers to derive meaning from human natural language. To do so, several meaning or context representations have been proposed with competitive performance. However, these representations still have room for improvement when working in a cross-domain or cross-language scenario. In this thesis we study the use of knowledge graphs as a cross-domain and cross-language representation of text and its meaning. A knowledge graph is a graph that expands and relates the original concepts belonging to a set of words. We obtain its characteristics using a wide-coverage multilingual semantic network as knowledge base. This allows a coverage of hundreds of languages and millions of general and specific human concepts. As a starting point of our research we employ knowledge graph-based features, along with other traditional ones and meta-learning, for the NLP task of single- and cross-domain polarity classification. The analysis and conclusions of that work provide evidence that knowledge graphs capture meaning in a domain-independent way. The next part of our research takes advantage of the multilingual semantic network and focuses on cross-language Information Retrieval (IR) tasks. First, we propose a fully knowledge graph-based model of similarity analysis for cross-language plagiarism detection. Next, we improve that model to cover out-of-vocabulary words and verbal tenses and apply it to cross-language document retrieval, categorisation, and plagiarism detection. Finally, we study the use of knowledge graphs for the NLP tasks of community question answering, native language identification, and language variety identification. The contributions of this thesis manifest the potential of knowledge graphs as a cross-domain and cross-language representation of text and its meaning for NLP and IR tasks. These contributions have been published in several international conferences and journals.

10. van Luenen, Anne Fleur. "Recognising Moral Foundations in Online Extremist Discourse: A Cross-Domain Classification Study." Thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-426921.

Abstract:
So far, studies seeking to recognise moral foundations in texts have been relatively successful (Araque et al., 2019; Lin et al., 2018; Mooijman et al., 2017; Rezapour et al., 2019). There are, however, two issues with these studies: Firstly, it is an extensive process to gather and annotate sufficient material for training. Secondly, models are only trained and tested within the same domain. It is yet unexplored how these models for moral foundation prediction perform when tested in other domains, but from their experience with annotation, Hoover et al. (2017) describe how moral sentiments on one topic (e.g. Black Lives Matter) might be completely different from moral sentiments on another (e.g. presidential elections). This study attempts to explore to what extent such models generalise to other domains. More specifically, we focus on training on Twitter data from non-extremist sources, and testing on data from an extremist (white nationalist) forum. We conducted two experiments. In our first experiment we test whether it is possible to do cross-domain classification of moral foundations. Additionally, we compare the performance of a model using the Word2Vec embeddings used in previous studies to a model using the newer BERT embeddings. We find that although the performance drops significantly on the extremist out-domain test sets, out-domain classification is not impossible. Furthermore, we find that the BERT model generalises marginally better to the out-domain test set than the Word2Vec model. In our second experiment we attempt to improve the generalisation to extremist test data by providing contextual knowledge. Although this does not improve the model, it does show the model's robustness against noise. Finally, we suggest an alternative approach for accounting for contextual knowledge.

Books on the topic "Cross lingual text classification"

1. Peters, Carol, Fredric Gey, Julio Gonzalo, Henning Müller, Gareth Jones, Michael Kluck, Bernardo Magnini, and Maarten de Rijke, eds. Accessing Multilingual Information Repositories: 6th Workshop of the Cross-Language Evaluation Forum, CLEF 2005, Vienna, Austria, 21-23 September 2005, Revised Selected Papers. Lecture Notes in Computer Science. Springer, 2006.

2. Widiger, Thomas A., ed. The Oxford Handbook of Personality Disorders. Oxford University Press, 2012. http://dx.doi.org/10.1093/oxfordhb/9780199735013.001.0001.

Abstract:
On the cusp of the newest edition of the American Psychiatric Association's Diagnostic and Statistical Manual of Mental Disorders (DSM), the field of personality disorders is thriving and productive. This is certainly a time of major transition for the classification, study, and treatment of personality disorders, as the personality disorders section of the DSM is undergoing major revision, leaving researchers and clinicians to wonder whether their area of specialty in the field of personality disorders will be retained, deleted, or revised in DSM-5. In advance of DSM-5, The Oxford Handbook of Personality Disorders provides a summary of the latest information concerning the diagnosis, assessment, construct validity, etiology, pathology, and treatment of personality disorders. The text looks at personality disorders proposed for retention in DSM-5. It also investigates personality disorders that are slated for deletion. The book further examines issues concerning three disorders that have never obtained or had previously lost official recognition (i.e., passive-aggressive, depressive, and racist). The book also includes articles authored by members of the DSM-5 Personality Disorders Work Group, which succinctly outline and explain the proposals, as well as articles by authors who raise significant questions and concerns (often differing) about these proposals. The text includes special coverage of largely neglected areas of investigation (i.e. childhood antecedents of personality disorder, cross-cultural validity). The book finally looks into controversial areas for the DSM, such as schizotypal personality disorder, narcissism, depressive personality disorder, dependent personality disorder, and dimensional classification.

Book chapters on the topic "Cross lingual text classification"

1. Chen, Guan-Yuan, and Von-Wun Soo. "Deep Domain Adaptation for Low-Resource Cross-Lingual Text Classification Tasks." In Communications in Computer and Information Science, 155–68. Singapore: Springer Singapore, 2020. http://dx.doi.org/10.1007/978-981-15-6168-9_14.

2. Li, Xiuhong, Zhe Li, Jiabao Sheng, and Wushour Slamu. "Low-Resource Text Classification via Cross-Lingual Language Model Fine-Tuning." In Lecture Notes in Computer Science, 231–46. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-63031-7_17.

3. Cancedda, Nicola, and Jean-Michel Renders. "Cross-Lingual Text Mining." In Encyclopedia of Machine Learning and Data Mining, 299–306. Boston, MA: Springer US, 2017. http://dx.doi.org/10.1007/978-1-4899-7687-1_189.

4. Bel, Nuria, Cornelis H. A. Koster, and Marta Villegas. "Cross-Lingual Text Categorization." In Research and Advanced Technology for Digital Libraries, 126–39. Berlin, Heidelberg: Springer Berlin Heidelberg, 2003. http://dx.doi.org/10.1007/978-3-540-45175-4_13.

5. Shultz, Thomas R., Scott E. Fahlman, Susan Craw, Periklis Andritsos, Panayiotis Tsaparas, Ricardo Silva, Chris Drummond, et al. "Cross-Lingual Text Mining." In Encyclopedia of Machine Learning, 243–49. Boston, MA: Springer US, 2011. http://dx.doi.org/10.1007/978-0-387-30164-8_189.

6. Torres-Moreno, Juan-Manuel. "Multi and Cross-Lingual Summarization." In Automatic Text Summarization, 151–77. Hoboken, NJ, USA: John Wiley & Sons, Inc., 2014. http://dx.doi.org/10.1002/9781119004752.ch5.

7. Linhares Pontes, Elvys, Carlos-Emiliano González-Gallardo, Juan-Manuel Torres-Moreno, and Stéphane Huet. "Cross-Lingual Speech-to-Text Summarization." In Cryptology and Network Security, 385–95. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-98678-4_39.

8. Khare, Prashant, Grégoire Burel, Diana Maynard, and Harith Alani. "Cross-Lingual Classification of Crisis Data." In Lecture Notes in Computer Science, 617–33. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-030-00671-6_36.

9. Pikuliak, Matúš, and Marián Šimko. "Combining Cross-lingual and Cross-task Supervision for Zero-Shot Learning." In Text, Speech, and Dialogue, 162–70. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-58323-1_17.

10. Dahiya, Anirudh, Manish Shrivastava, and Dipti Misra Sharma. "Cross-Lingual Transfer for Hindi Discourse Relation Identification." In Text, Speech, and Dialogue, 240–47. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-58323-1_26.

Conference papers on the topic "Cross lingual text classification"

1. Xu, Ruochen, and Yiming Yang. "Cross-lingual Distillation for Text Classification." In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, PA, USA: Association for Computational Linguistics, 2017. http://dx.doi.org/10.18653/v1/p17-1130.

2. Guo, Yuhong, and Min Xiao. "Transductive Representation Learning for Cross-Lingual Text Classification." In 2012 IEEE 12th International Conference on Data Mining (ICDM). IEEE, 2012. http://dx.doi.org/10.1109/icdm.2012.29.

3. Moreo, Alejandro, Andrea Pedrotti, and Fabrizio Sebastiani. "Heterogeneous document embeddings for cross-lingual text classification." In SAC '21: The 36th ACM/SIGAPP Symposium on Applied Computing. New York, NY, USA: ACM, 2021. http://dx.doi.org/10.1145/3412841.3442093.

4. Faqeeh, Mosab, Nawaf Abdulla, Mahmoud Al-Ayyoub, Yaser Jararweh, and Muhannad Quwaider. "Cross-Lingual Short-Text Document Classification for Facebook Comments." In 2014 2nd International Conference on Future Internet of Things and Cloud (FiCloud). IEEE, 2014. http://dx.doi.org/10.1109/ficloud.2014.99.

5. Andrade, Daniel, Kunihiko Sadamasa, Akihiro Tamura, and Masaaki Tsuchida. "Cross-lingual Text Classification Using Topic-Dependent Word Probabilities." In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg, PA, USA: Association for Computational Linguistics, 2015. http://dx.doi.org/10.3115/v1/n15-1170.

6. Wang, Ziyun, Xuan Liu, Peiji Yang, Shixing Liu, and Zhisheng Wang. "Cross-lingual Text Classification with Heterogeneous Graph Neural Network." In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). Stroudsburg, PA, USA: Association for Computational Linguistics, 2021. http://dx.doi.org/10.18653/v1/2021.acl-short.78.

7. Xu, Ruochen, Yiming Yang, Hanxiao Liu, and Andrew Hsi. "Cross-lingual Text Classification via Model Translation with Limited Dictionaries." In CIKM '16: ACM Conference on Information and Knowledge Management. New York, NY, USA: ACM, 2016. http://dx.doi.org/10.1145/2983323.2983732.

8. Ni, Xiaochuan, Jian-Tao Sun, Jian Hu, and Zheng Chen. "Cross lingual text classification by mining multilingual topics from Wikipedia." In Proceedings of the Fourth ACM International Conference on Web Search and Data Mining (WSDM '11). New York, NY, USA: ACM Press, 2011. http://dx.doi.org/10.1145/1935826.1935887.

9. Moh, Teng-Sheng, and Zhang Zhang. "Cross-lingual text classification with model translation and document translation." In Proceedings of the 50th Annual Southeast Regional Conference. New York, NY, USA: ACM Press, 2012. http://dx.doi.org/10.1145/2184512.2184530.

10. Dong, Xin, and Gerard de Melo. "A Robust Self-Learning Framework for Cross-Lingual Text Classification." In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Stroudsburg, PA, USA: Association for Computational Linguistics, 2019. http://dx.doi.org/10.18653/v1/d19-1658.
