To see other types of publications on this topic, follow the link: Cross lingual text classification.

Journal articles on the topic 'Cross lingual text classification'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 journal articles for your research on the topic 'Cross lingual text classification.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles in a wide variety of disciplines and organise your bibliography correctly.

1

Zhang, Mozhi, Yoshinari Fujinuma, and Jordan Boyd-Graber. "Exploiting Cross-Lingual Subword Similarities in Low-Resource Document Classification." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (April 3, 2020): 9547–54. http://dx.doi.org/10.1609/aaai.v34i05.6500.

Abstract:
Text classification must sometimes be applied in a low-resource language with no labeled training data. However, training data may be available in a related language. We investigate whether character-level knowledge transfer from a related language helps text classification. We present a cross-lingual document classification framework (CACO) that exploits cross-lingual subword similarity by jointly training a character-based embedder and a word-based classifier. The embedder derives vector representations for input words from their written forms, and the classifier makes predictions based on the word vectors. We use a joint character representation for both the source language and the target language, which allows the embedder to generalize knowledge about source language words to target language words with similar forms. We propose a multi-task objective that can further improve the model if additional cross-lingual or monolingual resources are available. Experiments confirm that character-level knowledge transfer is more data-efficient than word-level transfer between related languages.
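
To make the transfer mechanism concrete, here is a minimal PyTorch sketch of the idea (not the authors' CACO implementation): a character-level LSTM builds word vectors from spellings, which related languages partly share, and a simple averaging classifier consumes them. All names and dimensions are illustrative.

```python
# Minimal sketch of character-level transfer (illustrative, not CACO itself).
import torch
import torch.nn as nn

class CharWordEmbedder(nn.Module):
    """Builds a word vector from its spelling with a char-level LSTM."""
    def __init__(self, n_chars, char_dim=32, word_dim=64):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.lstm = nn.LSTM(char_dim, word_dim, batch_first=True)

    def forward(self, char_ids):              # (n_words, max_word_len)
        _, (h, _) = self.lstm(self.char_emb(char_ids))
        return h[-1]                          # (n_words, word_dim)

class DocClassifier(nn.Module):
    """Averages char-derived word vectors, then applies a linear layer."""
    def __init__(self, embedder, word_dim=64, n_classes=4):
        super().__init__()
        self.embedder = embedder
        self.out = nn.Linear(word_dim, n_classes)

    def forward(self, doc_char_ids):
        word_vecs = self.embedder(doc_char_ids)
        return self.out(word_vecs.mean(dim=0, keepdim=True))

# Because the character inventory is shared, a model trained on a related
# source language can score documents of the target language directly.
model = DocClassifier(CharWordEmbedder(n_chars=100))
logits = model(torch.randint(0, 100, (12, 10)))   # one 12-word document
```
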
2

Moreo Fernández, Alejandro, Andrea Esuli, and Fabrizio Sebastiani. "Distributional Correspondence Indexing for Cross-Lingual and Cross-Domain Sentiment Classification." Journal of Artificial Intelligence Research 55 (January 20, 2016): 131–63. http://dx.doi.org/10.1613/jair.4762.

Abstract:
Domain Adaptation (DA) techniques aim at enabling machine learning methods to learn effective classifiers for a "target" domain when the only available training data belongs to a different "source" domain. In this paper we present the Distributional Correspondence Indexing (DCI) method for domain adaptation in sentiment classification. DCI derives term representations in a vector space common to both domains, where each dimension reflects the term's distributional correspondence to a pivot, i.e., to a highly predictive term that behaves similarly across domains. Term correspondence is quantified by means of a distributional correspondence function (DCF). We propose a number of efficient DCFs that are motivated by the distributional hypothesis, i.e., the hypothesis according to which terms with similar meaning tend to have similar distributions in text. Experiments show that DCI obtains better performance than current state-of-the-art techniques for cross-lingual and cross-domain sentiment classification. DCI also brings about a significantly reduced computational cost and requires a smaller amount of human intervention. As a final contribution, we discuss a more challenging formulation of the domain adaptation problem, in which both the cross-domain and cross-lingual dimensions are tackled simultaneously.
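
A hedged sketch of the pivot-profile idea behind DCI follows; the paper proposes several specific DCFs, whereas this toy version simply uses cosine similarity between term-occurrence vectors and a handful of pivots.

```python
# Toy pivot-profile construction (cosine stands in for the paper's DCFs).
import numpy as np

def dci_profiles(X, pivot_idx):
    """X: (n_terms, n_docs) term-occurrence matrix.
    Returns an (n_terms, n_pivots) profile: each term is represented by
    its correspondence to the pivot terms, a space shared across domains."""
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
    return Xn @ Xn[pivot_idx].T

X = (np.random.rand(1000, 200) > 0.95).astype(float)   # synthetic data
profiles = dci_profiles(X, pivot_idx=[0, 3, 7, 42])
```
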
3

Steinberger, Ralf, and Bruno Pouliquen. "Cross-lingual Named Entity Recognition." Lingvisticæ Investigationes. International Journal of Linguistics and Language Resources 30, no. 1 (August 10, 2007): 135–62. http://dx.doi.org/10.1075/li.30.1.09ste.

Abstract:
Named Entity Recognition and Classification (NERC) is a known and well-explored text analysis application that has been applied to various languages. We are presenting an automatic, highly multilingual news analysis system that fully integrates NERC for locations, persons and organisations with document clustering, multi-label categorisation, name attribute extraction, name variant merging and the calculation of social networks. The proposed application goes beyond the state-of-the-art by automatically merging the information found in news written in ten different languages, and by using the aggregated name information to automatically link related news documents across languages for all 45 language pair combinations. While state-of-the-art approaches for cross-lingual name variant merging and document similarity calculation require bilingual resources, the methods proposed here are mostly language-independent and require a minimal amount of monolingual language-specific effort. The development of resources for additional languages is therefore kept to a minimum and new languages can be plugged into the system effortlessly. The presented online news analysis application is fully functional and has, at the end of the year 2006, reached average usage statistics of 600,000 hits per day.
4

Pelicon, Andraž, Marko Pranjić, Dragana Miljković, Blaž Škrlj, and Senja Pollak. "Zero-Shot Learning for Cross-Lingual News Sentiment Classification." Applied Sciences 10, no. 17 (August 29, 2020): 5993. http://dx.doi.org/10.3390/app10175993.

Abstract:
In this paper, we address the task of zero-shot cross-lingual news sentiment classification. Given the annotated dataset of positive, neutral, and negative news in Slovene, the aim is to develop a news classification system that assigns the sentiment category not only to Slovene news, but to news in another language without any training data required. Our system is based on the multilingual BERT model, while we test different approaches for handling long documents and propose a novel technique for sentiment enrichment of the BERT model as an intermediate training step. With the proposed approach, we achieve state-of-the-art performance on the sentiment analysis task on Slovenian news. We evaluate the zero-shot cross-lingual capabilities of our system on a novel news sentiment test set in Croatian. The results show that the cross-lingual approach also largely outperforms the majority classifier, as well as all settings without sentiment enrichment in pre-training.
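
The zero-shot recipe described here can be sketched with the Hugging Face transformers API as below; the label set, training data, and example sentence are placeholders, not the authors' Slovene corpus or their sentiment-enrichment step.

```python
# Hedged sketch: fine-tune multilingual BERT on one language, predict on
# another. The fine-tuning loop and data are placeholders.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=3)  # neg / neutral / pos

# ... fine-tune on labelled Slovene news here (standard training loop) ...

croatian_news = ["Vlada je najavila nove mjere."]  # language unseen in training
batch = tok(croatian_news, return_tensors="pt", truncation=True)
with torch.no_grad():
    pred = model(**batch).logits.argmax(dim=-1)    # zero-shot prediction
```
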
5

Wan, Xiaojun. "Bilingual Co-Training for Sentiment Classification of Chinese Product Reviews." Computational Linguistics 37, no. 3 (September 2011): 587–616. http://dx.doi.org/10.1162/coli_a_00061.

Abstract:
The lack of reliable Chinese sentiment resources limits research progress on Chinese sentiment classification. However, there are many freely available English sentiment resources on the Web. This article focuses on the problem of cross-lingual sentiment classification, which leverages only available English resources for Chinese sentiment classification. We first investigate several basic methods (including lexicon-based methods and corpus-based methods) for cross-lingual sentiment classification by simply leveraging machine translation services to eliminate the language gap, and then propose a bilingual co-training approach to make use of both the English view and the Chinese view based on additional unlabeled Chinese data. Experimental results on two test sets show the effectiveness of the proposed approach, which can outperform basic methods and transductive methods.
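
A simplified co-training loop in the spirit of this approach is sketched below; the construction of the two language views via machine translation is elided, and the data and classifiers are placeholders.

```python
# Toy bilingual co-training: two classifiers, one per language view,
# repeatedly label the unlabelled pool for each other.
import numpy as np
from sklearn.linear_model import LogisticRegression

def co_train(Xen, Xzh, y, Xen_u, Xzh_u, rounds=5, k=10):
    en_clf = LogisticRegression(max_iter=1000)
    zh_clf = LogisticRegression(max_iter=1000)
    for _ in range(rounds):
        en_clf.fit(Xen, y)
        zh_clf.fit(Xzh, y)
        if len(Xen_u) == 0:
            break
        # move the k unlabelled examples both views are most confident on
        conf = en_clf.predict_proba(Xen_u).max(1) + zh_clf.predict_proba(Xzh_u).max(1)
        pick = np.argsort(-conf)[:k]
        y = np.concatenate([y, en_clf.predict(Xen_u[pick])])
        Xen = np.vstack([Xen, Xen_u[pick]])
        Xzh = np.vstack([Xzh, Xzh_u[pick]])
        keep = np.setdiff1d(np.arange(len(Xen_u)), pick)
        Xen_u, Xzh_u = Xen_u[keep], Xzh_u[keep]
    return en_clf, zh_clf

rng = np.random.default_rng(0)
en, zh = co_train(rng.random((40, 5)), rng.random((40, 5)),
                  rng.integers(0, 2, 40),
                  rng.random((60, 5)), rng.random((60, 5)))
```
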
6

Liu, Ling, and Sang-Bing Tsai. "Intelligent Recognition and Teaching of English Fuzzy Texts Based on Fuzzy Computing and Big Data." Wireless Communications and Mobile Computing 2021 (July 10, 2021): 1–10. http://dx.doi.org/10.1155/2021/1170622.

Abstract:
In this paper, we conduct in-depth research and analysis on the intelligent recognition and teaching of English fuzzy text through parallel projection and region expansion. Multisense Soft Cluster Vector (MSCVec), a multisense word vector model based on nonnegative matrix decomposition and sparse soft clustering, is constructed. MSCVec is a monolingual word vector model: it uses nonnegative matrix decomposition of the positive pointwise mutual information between words and contexts to extract low-rank expressions of the mixed semantics of polysemous words, and then uses a sparse soft clustering algorithm to partition the multiple senses of those words and to obtain the global sense-affiliation distribution of each polysemous word. The specific sense clusters are determined based on the negative mean log-likelihood of the global affiliation between the contextual semantics and the polysemous words, and finally the polysemous word vectors are learned using the fastText model under the extended dictionary word set. The advantage of the MSCVec model is that it is an unsupervised learning process requiring no knowledge base, and the substring representation in the model ensures that vectors can be generated for out-of-vocabulary words; in addition, the global affiliation of the MSCVec model can also map polysemous word vectors to single word vectors. Compared with traditional static word vectors, MSCVec shows excellent results in both word similarity and downstream text classification experiments. The two sets of features are then fused and extended into new semantic features, and similarity classification experiments and stack generalization experiments are designed for comparison. In the cross-lingual sentence-level similarity detection task, SCLVec cross-lingual word vector lexical-level features outperform MSCVec multisense word vector features as the input embedding layer; deep semantic sentence-level features trained by twin recurrent neural networks outperform the semantic features of twin convolutional neural networks; extensions of traditional statistical features can effectively improve cross-lingual similarity detection performance, especially the cross-lingual topic model (BL-LDA); and the stack generalization ensemble approach compensates for the error rate of the underlying classifiers and improves detection accuracy.
7

Santini, Marina, and Min-Chun Shih. "Exploring the Potential of an Extensible Domain-Specific Web Corpus for “Layfication”." International Journal of Cyber-Physical Systems 2, no. 1 (January 2020): 20–32. http://dx.doi.org/10.4018/ijcps.2020010102.

Abstract:
This article presents experiments based on the extensible domain-specific web corpus for "layfication". For these experiments, both the existing layfication corpus (in Swedish and in English) and a new addition in English (the NHS-PubMed subcorpus) are used. With this extended corpus, methods to classify lay-specialized medical sublanguages cross-linguistically using small data and noisy web documents are investigated. A sublanguage is a language variety used in specific domains. Here, the authors focus on two medical sublanguages, namely the "patientspeak" (lay) and the medical jargon (specialized). Cross-lingual sublanguage classification is still largely underexplored although it can be crucial in downstream applications for digital health and cyber-physical systems. Classification models are built using small and noisy training sets in Swedish and evaluated on English test sets. The performance of Naive Bayes classifiers (built with stopwords and with Bag-of-Words) is compared with convolutional neural network classifiers leveraging MUSE multilingual word embeddings. Results are promising and nuanced, and are proposed as a first baseline for cross-lingual sublanguage classification.
8

Moreo Fernández, Alejandro, Andrea Esuli, and Fabrizio Sebastiani. "Lightweight Random Indexing for Polylingual Text Classification." Journal of Artificial Intelligence Research 57 (October 13, 2016): 151–85. http://dx.doi.org/10.1613/jair.5194.

Abstract:
Multilingual Text Classification (MLTC) is a text classification task in which documents are written each in one among a set L of natural languages, and in which all documents must be classified under the same classification scheme, irrespective of language. There are two main variants of MLTC, namely Cross-Lingual Text Classification (CLTC) and Polylingual Text Classification (PLTC). In PLTC, which is the focus of this paper, we assume (differently from CLTC) that for each language in L there is a representative set of training documents; PLTC consists of improving the accuracy of each of the |L| monolingual classifiers by also leveraging the training documents written in the other (|L| − 1) languages. The obvious solution, consisting of generating a single polylingual classifier from the juxtaposed monolingual vector spaces, is usually infeasible, since the dimensionality of the resulting vector space is roughly |L| times that of a monolingual one, and is thus often unmanageable. As a response, the use of machine translation tools or multilingual dictionaries has been proposed. However, these resources are not always available, or are not always free to use. One machine-translation-free and dictionary-free method that, to the best of our knowledge, has never been applied to PLTC before, is Random Indexing (RI). We analyse RI in terms of space and time efficiency, and propose a particular configuration of it (that we dub Lightweight Random Indexing – LRI). By running experiments on two well known public benchmarks, Reuters RCV1/RCV2 (a comparable corpus) and JRC-Acquis (a parallel one), we show LRI to outperform (both in terms of effectiveness and efficiency) a number of previously proposed machine-translation-free and dictionary-free PLTC methods that we use as baselines.
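
The basic random-indexing scheme that LRI builds on can be sketched in a few lines; the lightweight configuration proposed in the paper adds efficiency refinements not reproduced here.

```python
# Basic random indexing: every term (in any language) gets a fixed sparse
# ternary index vector, and a document is the sum of its terms' vectors,
# so all languages share one space of fixed dimensionality.
import numpy as np
import zlib

def index_vector(term, dim=512, nnz=8):
    rng = np.random.default_rng(zlib.crc32(term.encode("utf-8")))
    v = np.zeros(dim)
    pos = rng.choice(dim, size=nnz, replace=False)
    v[pos] = rng.choice([-1.0, 1.0], size=nnz)
    return v

def doc_vector(tokens, dim=512):
    return sum(index_vector(t, dim) for t in tokens)

d1 = doc_vector("the parliament adopted the directive".split())
d2 = doc_vector("das parlament verabschiedete die richtlinie".split())
```
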
9

Artetxe, Mikel, and Holger Schwenk. "Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond." Transactions of the Association for Computational Linguistics 7 (November 2019): 597–610. http://dx.doi.org/10.1162/tacl_a_00288.

Abstract:
We introduce an architecture to learn joint multilingual sentence representations for 93 languages, belonging to more than 30 different families and written in 28 different scripts. Our system uses a single BiLSTM encoder with a shared byte-pair encoding vocabulary for all languages, which is coupled with an auxiliary decoder and trained on publicly available parallel corpora. This enables us to learn a classifier on top of the resulting embeddings using English annotated data only, and transfer it to any of the 93 languages without any modification. Our experiments in cross-lingual natural language inference (XNLI data set), cross-lingual document classification (MLDoc data set), and parallel corpus mining (BUCC data set) show the effectiveness of our approach. We also introduce a new test set of aligned sentences in 112 languages, and show that our sentence embeddings obtain strong results in multilingual similarity search even for low-resource languages. Our implementation, the pre-trained encoder, and the multilingual test set are available at https://github.com/facebookresearch/LASER.
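
The transfer setup evaluated here reduces to the sketch below: a classifier trained on English sentence embeddings only is applied unchanged to another language. The embed function is a stand-in for the LASER encoder (see the linked repository for the real interface); data and dimensions are illustrative.

```python
# Hedged sketch of zero-shot transfer over multilingual sentence embeddings.
import numpy as np
from sklearn.linear_model import LogisticRegression

def embed(sentences):
    """Placeholder for the multilingual encoder (LASER vectors are 1024-d)."""
    return np.random.default_rng(0).random((len(sentences), 1024))

en_texts = ["markets rallied today", "the team lost the final"]
en_labels = [1, 0]
clf = LogisticRegression(max_iter=1000).fit(embed(en_texts), en_labels)

target_texts = ["los mercados subieron hoy"]   # a different language
print(clf.predict(embed(target_texts)))        # zero-shot prediction
```
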
10

Li, Gen, Nan Duan, Yuejian Fang, Ming Gong, and Daxin Jiang. "Unicoder-VL: A Universal Encoder for Vision and Language by Cross-Modal Pre-Training." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 11336–44. http://dx.doi.org/10.1609/aaai.v34i07.6795.

Abstract:
We propose Unicoder-VL, a universal encoder that aims to learn joint representations of vision and language in a pre-training manner. Borrowing ideas from cross-lingual pre-trained models such as XLM (Lample and Conneau 2019) and Unicoder (Huang et al. 2019), both visual and linguistic contents are fed into a multi-layer Transformer (Vaswani et al. 2017) for cross-modal pre-training, where three pre-training tasks are employed: Masked Language Modeling (MLM), Masked Object Classification (MOC), and Visual-Linguistic Matching (VLM). The first two tasks learn context-aware representations for input tokens based on linguistic and visual contents jointly. The last task tries to predict whether an image and a text describe each other. After pre-training on large-scale image-caption pairs, we transfer Unicoder-VL to caption-based image-text retrieval and visual commonsense reasoning with just one additional output layer. We achieve state-of-the-art or comparable results on both tasks and show the powerful ability of cross-modal pre-training.
11

Prokhorov, Victor, Mohammad Taher Pilehvar, Dimitri Kartsaklis, Pietro Lio, and Nigel Collier. "Unseen Word Representation by Aligning Heterogeneous Lexical Semantic Spaces." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 6900–6907. http://dx.doi.org/10.1609/aaai.v33i01.33016900.

Abstract:
Word embedding techniques heavily rely on the abundance of training data for individual words. Given the Zipfian distribution of words in natural language texts, a large number of words do not usually appear frequently or at all in the training data. In this paper we put forward a technique that exploits the knowledge encoded in lexical resources, such as WordNet, to induce embeddings for unseen words. Our approach adapts graph embedding and cross-lingual vector space transformation techniques in order to merge lexical knowledge encoded in ontologies with that derived from corpus statistics. We show that the approach can provide consistent performance improvements across multiple evaluation benchmarks: in vitro, on multiple rare word similarity datasets, and in vivo, in two downstream text classification tasks.
12

Kumari, Divya, Asif Ekbal, Rejwanul Haque, Pushpak Bhattacharyya, and Andy Way. "Reinforced NMT for Sentiment and Content Preservation in Low-resource Scenario." ACM Transactions on Asian and Low-Resource Language Information Processing 20, no. 4 (June 28, 2021): 1–27. http://dx.doi.org/10.1145/3450970.

Abstract:
The preservation of domain knowledge from source to target is crucial in any translation workflow. Hence, translation service providers that use machine translation (MT) in production could reasonably expect that the translation process should transfer both the underlying pragmatics and the semantics of the source-side sentences into the target language. However, recent studies suggest that MT systems often fail to preserve such crucial information (e.g., sentiment, emotion, gender traits) embedded in the source text. In this context, raw automatic translations are often directly fed to other natural language processing (NLP) applications (e.g., a sentiment classifier) in a cross-lingual platform. Hence, the loss of such crucial information during translation could negatively affect the performance of downstream NLP tasks that heavily rely on the output of the MT systems. In our current research, we carefully balance both sides (i.e., sentiment and semantics) during translation by controlling a global-attention-based neural MT (NMT) system, to generate translations that encode the underlying sentiment of a source sentence while preserving its non-opinionated semantic content. Toward this, we use a state-of-the-art reinforcement learning method, namely actor-critic, that includes a novel reward combination module, to fine-tune the NMT system so that it learns to generate translations best suited for a downstream task, viz., sentiment classification, while ensuring the source-side semantics is kept intact in the process. Experimental results for the Hindi–English language pair show that our proposed method significantly improves the performance of the sentiment classifier and also yields an improved NMT system.
13

Wu, Hanqian, Zhike Wang, Feng Qing, and Shoushan Li. "Reinforced Transformer with Cross-Lingual Distillation for Cross-Lingual Aspect Sentiment Classification." Electronics 10, no. 3 (January 23, 2021): 270. http://dx.doi.org/10.3390/electronics10030270.

Abstract:
Though great progress has been made on the Aspect-Based Sentiment Analysis (ABSA) task, most previous work focuses on English-based ABSA problems, and there are few efforts on other languages, mainly due to the lack of training data. In this paper, we propose an approach for performing the Cross-Lingual Aspect Sentiment Classification (CLASC) task, which leverages the rich resources in one language (the source language) for aspect sentiment classification in an under-resourced language (the target language). Specifically, we first build a bilingual lexicon for domain-specific training data to translate the aspect category annotated in the source-language corpus, and then translate sentences from the source language to the target language via Machine Translation (MT) tools. However, since most MT systems are general-purpose, they unavoidably introduce translation ambiguities that degrade the performance of CLASC. In this context, we propose a novel approach called Reinforced Transformer with Cross-Lingual Distillation (RTCLD), combined with target-sensitive adversarial learning, to minimize the undesirable effects of translation ambiguities in sentence translation. We conduct experiments on different language combinations, treating English as the source language and Chinese, Russian, and Spanish as target languages. The experimental results show that our proposed approach outperforms state-of-the-art methods on different target languages.
14

Zhou, Guangyou, Zhiyuan Zhu, Tingting He, and Xiaohua Tony Hu. "Cross-lingual sentiment classification with stacked autoencoders." Knowledge and Information Systems 47, no. 1 (June 11, 2015): 27–44. http://dx.doi.org/10.1007/s10115-015-0849-0.

15

Pikuliak, Matúš, Marián Šimko, and Mária Bieliková. "Cross-lingual learning for text processing: A survey." Expert Systems with Applications 165 (March 2021): 113765. http://dx.doi.org/10.1016/j.eswa.2020.113765.

16

Chau, Rowena, and Chung-Hsing Yeh. "A multilingual text mining approach to web cross-lingual text retrieval." Knowledge-Based Systems 17, no. 5-6 (August 2004): 219–27. http://dx.doi.org/10.1016/j.knosys.2004.04.001.

17

Kumar, Aarti, and Sujoy Das. "Dealing with Relevance Ranking in Cross-Lingual Cross-Script Text Reuse." International Journal of Information Retrieval Research 6, no. 1 (January 2016): 16–35. http://dx.doi.org/10.4018/ijirr.2016010102.

Abstract:
The proliferation of multilingual content on the web has paved the way for text reuse to become cross-lingual and cross-script. Identifying cross-language text reuse becomes harder when one considers cross-script, less-resourced languages. This paper focuses on identifying text reuse between English and Hindi news articles and improving their relevance ranking using two phases: (i) a heuristic retrieval phase for reducing the search space and (ii) a post-processing phase for improving the relevance ranking. A dictionary-based strategy from Cross-Language Information Retrieval is used for heuristic retrieval, and a Parse Feature Vector Model (PFVS) is proposed for post-processing to improve the relevance ranking. The application of this model has been successful in tackling the obfuscation problems of synonymy, hyponymy, hypernymy, antonymy, sentence addition/deletion, and word inflection. Instead of traditional approaches, parse feature vectors have been explored to detect reused documents, and to the best of the authors' knowledge this is a novel contribution with regard to this language pair.
18

Seki, Kazuhiro. "On Cross-Lingual Text Similarity Using Neural Translation Models." Journal of Information Processing 27 (2019): 315–21. http://dx.doi.org/10.2197/ipsjjip.27.315.

19

Ehsan, Nava, Azadeh Shakery, and Frank Wm Tompa. "Cross-lingual text alignment for fine-grained plagiarism detection." Journal of Information Science 45, no. 4 (August 13, 2018): 443–59. http://dx.doi.org/10.1177/0165551518787696.

Abstract:
Fast and easy access to a wide range of documents in various languages, in conjunction with the wide availability of translation and editing tools, has led to the need to develop effective tools for detecting cross-lingual plagiarism. Given a suspicious document, cross-lingual plagiarism detection comprises two main subtasks: retrieving documents that are candidate sources for that document and analysing those candidates one by one to determine their similarity to the suspicious document. In this article, we examine the second subtask, also called the detailed analysis subtask, where the goal is to align plagiarised fragments from source and suspicious documents in different languages. Our proposed approach has two main steps: the first step tries to find candidate plagiarised fragments and focuses on high recall, followed by a more precise similarity analysis based on dynamic text alignment that filters the results by finding alignments between the identified fragments. With these two steps, the proximity of the terms is considered at different levels of granularity. In both steps, our approach uses a dictionary to obtain translations of individual terms instead of using a machine translation system to convert longer passages from one language to another. We use a weighting scheme to distinguish among multiple translations of the terms. Experimental results show that our method outperforms the methods used by the systems that achieved the best results in the PAN-2012 and PAN-2014 competitions.
20

Chen, Xilun, Yu Sun, Ben Athiwaratkun, Claire Cardie, and Kilian Weinberger. "Adversarial Deep Averaging Networks for Cross-Lingual Sentiment Classification." Transactions of the Association for Computational Linguistics 6 (December 2018): 557–70. http://dx.doi.org/10.1162/tacl_a_00039.

Abstract:
In recent years great success has been achieved in sentiment classification for English, thanks in part to the availability of copious annotated resources. Unfortunately, most languages do not enjoy such an abundance of labeled data. To tackle the sentiment classification problem in low-resource languages without adequate annotated data, we propose an Adversarial Deep Averaging Network (ADAN) to transfer the knowledge learned from labeled data on a resource-rich source language to low-resource languages where only unlabeled data exist. ADAN has two discriminative branches: a sentiment classifier and an adversarial language discriminator. Both branches take input from a shared feature extractor to learn hidden representations that are simultaneously indicative for the classification task and invariant across languages. Experiments on Chinese and Arabic sentiment classification demonstrate that ADAN significantly outperforms state-of-the-art systems.
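
ADAN's core mechanism, adversarial training through gradient reversal, can be sketched as follows; the dimensions, feature extractor, and training details are illustrative rather than the paper's exact configuration.

```python
# Gradient reversal: the language discriminator's loss pushes the shared
# feature extractor toward language-invariant representations.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lamb * grad, None   # flip the gradient's sign

feature = nn.Sequential(nn.Linear(300, 128), nn.ReLU())  # shared extractor
sentiment = nn.Linear(128, 2)   # trained on labeled source data only
language = nn.Linear(128, 2)    # source-vs-target discriminator

x_src, y_src = torch.randn(8, 300), torch.randint(0, 2, (8,))
x_tgt = torch.randn(8, 300)     # unlabeled target-language batch

f_src, f_tgt = feature(x_src), feature(x_tgt)
loss_sent = F.cross_entropy(sentiment(f_src), y_src)
f_all = torch.cat([f_src, f_tgt])
lang_y = torch.cat([torch.zeros(8), torch.ones(8)]).long()
loss_lang = F.cross_entropy(language(GradReverse.apply(f_all, 1.0)), lang_y)
(loss_sent + loss_lang).backward()
```
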
21

Majewska, Olga, Ivan Vulić, Diana McCarthy, Yan Huang, Akira Murakami, Veronika Laippala, and Anna Korhonen. "Investigating the cross-lingual translatability of VerbNet-style classification." Language Resources and Evaluation 52, no. 3 (October 20, 2017): 771–99. http://dx.doi.org/10.1007/s10579-017-9403-x.

22

Haneef, Israr, Rao Muhammad Adeel Nawab, Ehsan Ullah Munir, and Imran Sarwar Bajwa. "Design and Development of a Large Cross-Lingual Plagiarism Corpus for Urdu-English Language Pair." Scientific Programming 2019 (March 17, 2019): 1–11. http://dx.doi.org/10.1155/2019/2962040.

Abstract:
Cross-lingual plagiarism occurs when the source (or original) text(s) is in one language and the plagiarized text is in another language. In recent years, cross-lingual plagiarism detection has attracted the attention of the research community because a large amount of digital text is easily accessible in many languages through online digital repositories, and machine translation systems are readily available, making it easier to perform cross-lingual plagiarism and harder to detect it. To develop and evaluate cross-lingual plagiarism detection systems, standard evaluation resources are needed. The majority of earlier studies have developed cross-lingual plagiarism corpora for English and other European language pairs. However, for the Urdu-English language pair, the problem of cross-lingual plagiarism detection has not been thoroughly explored, although a large amount of digital text is readily available in Urdu and it is spoken in many countries of the world (particularly in Pakistan, India, and Bangladesh). To fill this gap, this paper presents a large benchmark cross-lingual corpus for the Urdu-English language pair. The proposed corpus contains 2,395 source-suspicious document pairs (540 are automatic translation, 539 are artificially paraphrased, 508 are manually paraphrased, and 808 are nonplagiarized). Furthermore, our proposed corpus contains three types of cross-lingual examples, including artificial (automatic translation and artificially paraphrased), simulated (manually paraphrased), and real (nonplagiarized), which have not been previously reported in the development of cross-lingual corpora. Detailed analysis of our proposed corpus was carried out using n-gram overlap and longest common subsequence approaches. Using word unigrams, mean similarity scores of 1.00, 0.68, 0.52, and 0.22 were obtained for automatic translation, artificially paraphrased, manually paraphrased, and nonplagiarized documents, respectively. These results show that documents in the proposed corpus are created using different obfuscation techniques, which makes the dataset more realistic and challenging. We believe that the corpus developed in this study will help to foster research in the under-resourced language of Urdu and will be useful in the development, comparison, and evaluation of cross-lingual plagiarism detection systems for the Urdu-English language pair. Our proposed corpus is free and publicly available for research purposes.
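
The word-unigram overlap analysis reported above corresponds to a containment-style score along these lines (the exact normalization used in the paper may differ):

```python
# Unigram containment: fraction of the suspicious document's vocabulary
# that also occurs in the source document.
def unigram_overlap(suspicious, source):
    s = set(suspicious.lower().split())
    t = set(source.lower().split())
    return len(s & t) / max(len(s), 1)

print(unigram_overlap("the cat sat on the mat", "a cat sat on a mat"))
```
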
23

Espinosa-Anke, Luis, Geraint Palmer, Padraig Corcoran, Maxim Filimonov, Irena Spasić, and Dawn Knight. "English–Welsh Cross-Lingual Embeddings." Applied Sciences 11, no. 14 (July 16, 2021): 6541. http://dx.doi.org/10.3390/app11146541.

Abstract:
Cross-lingual embeddings are vector space representations where word translations tend to be co-located. These representations enable learning transfer across languages, thus bridging the gap between data-rich languages such as English and others. In this paper, we present and evaluate a suite of cross-lingual embeddings for the English–Welsh language pair. To train the bilingual embeddings, a Welsh corpus of approximately 145 M words was combined with an English Wikipedia corpus. We used a bilingual dictionary to frame the problem of learning bilingual mappings as a supervised machine learning task, where a word vector space is first learned independently on a monolingual corpus, after which a linear alignment strategy is applied to map the monolingual embeddings to a common bilingual vector space. Two approaches were used to learn monolingual embeddings, namely word2vec and fastText. Three cross-language alignment strategies were explored: cosine similarity, inverted softmax, and cross-domain similarity local scaling (CSLS). We evaluated different combinations of these approaches using two tasks, bilingual dictionary induction and cross-lingual sentiment analysis. The best results were achieved using monolingual fastText embeddings and the CSLS metric. We also demonstrated that by including a few automatically translated training documents, the performance of a cross-lingual text classifier for Welsh can increase by approximately 20 percentage points.
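
The supervised linear-alignment step plus CSLS retrieval can be sketched as below; the paper compares several strategies, and this shows only an orthogonal Procrustes variant applied to dictionary-aligned embedding rows.

```python
# Align monolingual spaces with orthogonal Procrustes, then retrieve
# translations with CSLS, which penalizes "hub" targets.
import numpy as np

def procrustes(X, Y):
    """Orthogonal W minimizing ||X W^T - Y||; rows of X, Y are aligned."""
    U, _, Vt = np.linalg.svd(Y.T @ X)
    return U @ Vt

def csls(sim, k=10):
    r_src = np.sort(sim, axis=1)[:, -k:].mean(axis=1, keepdims=True)
    r_tgt = np.sort(sim, axis=0)[-k:, :].mean(axis=0, keepdims=True)
    return 2 * sim - r_src - r_tgt

X = np.random.rand(500, 300)      # e.g. Welsh vectors (illustrative)
Y = np.random.rand(500, 300)      # their English translations
W = procrustes(X, Y)
scores = csls((X @ W.T) @ Y.T)
translations = scores.argmax(axis=1)   # induced dictionary entries
```
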
24

Yu, Jianxing, Shiqi Wang, and Jian Yin. "Adaptive Cross-Lingual Question Generation with Minimal Resources." Computer Journal 64, no. 7 (July 2021): 1056–68. http://dx.doi.org/10.1093/comjnl/bxab106.

Abstract:
The task of question generation (QG) aims to create valid questions and correlated answers from the given text. Although neural QG approaches have achieved promising results, they are typically developed for languages with rich annotated training data. Because of the high annotation cost, they are difficult to deploy to other, low-resource languages. Besides, different samples have their own characteristics in terms of text contextual structure, question type, and correlations. Without capturing these diversified characteristics, the traditional one-size-fits-all model can hardly generate the best results. To address this problem, we study the task of cross-lingual QG from an adaptive learning perspective. Concretely, we first build a basic QG model on a multilingual space using the labelled data. In this way, we can transfer the supervision from the high-resource language to languages lacking labelled data. We then design a task-specific meta-learner to optimize the basic QG model. Each sample and its similar instances are viewed as a pseudo-QG task. The asking patterns and logical forms contained in the similar samples can be used as a guide to fine-tune the model appropriately and produce the optimal results accordingly. Considering that each sample contains the text, question, and answer, with unknown semantic correlations among them, we propose a context-dependent retriever to measure the similarity of such structured inputs. Experimental results on three typical data sets in three languages show the effectiveness of our approach.
25

Wang, Zhouhao, Enda Liu, Hiroki Sakaji, Tomoki Ito, Kiyoshi Izumi, Kota Tsubouchi, and Tatsuo Yamashita. "Estimation of Cross-Lingual News Similarities Using Text-Mining Methods." Journal of Risk and Financial Management 11, no. 1 (January 31, 2018): 8. http://dx.doi.org/10.3390/jrfm11010008.

26

Chau, R., and C. H. Yeh. "Fuzzy conceptual indexing for concept-based cross-lingual text retrieval." IEEE Internet Computing 8, no. 5 (September 2004): 14–21. http://dx.doi.org/10.1109/mic.2004.38.

27

Lyu, Dau-Cheng, Ren-Yuan Lyu, Yuang-Chin Chiang, and Chun-Nan Hsu. "Cross-lingual audio-to-text alignment for multimedia content management." Decision Support Systems 45, no. 3 (June 2008): 554–66. http://dx.doi.org/10.1016/j.dss.2007.07.003.

28

Wei, Chih-Ping, Yen-Ting Lin, and Christopher C. Yang. "Cross-lingual text categorization: Conquering language boundaries in globalized environments." Information Processing & Management 47, no. 5 (September 2011): 786–804. http://dx.doi.org/10.1016/j.ipm.2011.01.011.

29

Mouriño García, Marcos Antonio, Roberto Pérez Rodríguez, and Luis Anido Rifón. "Wikipedia-based cross-language text classification." Information Sciences 406-407 (September 2017): 12–28. http://dx.doi.org/10.1016/j.ins.2017.04.024.

30

Zhang, Peng, Suge Wang, and Deyu Li. "Cross-lingual sentiment classification: Similarity discovery plus training data adjustment." Knowledge-Based Systems 107 (September 2016): 129–41. http://dx.doi.org/10.1016/j.knosys.2016.06.004.

31

Lin, Zheng, Xiaolong Jin, Xueke Xu, Yuanzhuo Wang, Xueqi Cheng, Weiping Wang, and Dan Meng. "An Unsupervised Cross-Lingual Topic Model Framework for Sentiment Classification." IEEE/ACM Transactions on Audio, Speech, and Language Processing 24, no. 3 (March 2016): 432–44. http://dx.doi.org/10.1109/taslp.2015.2512041.

32

Hakami, H., and D. Bollegala. "A classification approach for detecting cross-lingual biomedical term translations." Natural Language Engineering 23, no. 1 (December 14, 2015): 31–51. http://dx.doi.org/10.1017/s1351324915000431.

Abstract:
Finding translations for technical terms is an important problem in machine translation. In particular, in highly specialized domains such as biology or medicine, it is difficult to find bilingual experts to annotate sufficient cross-lingual texts in order to train machine translation systems. Moreover, new terms are constantly being generated in the biomedical community, which makes it difficult to keep the translation dictionaries up to date for all language pairs of interest. Given a biomedical term in one language (source language), we propose a method for detecting its translations in a different language (target language). Specifically, we train a binary classifier to determine whether two biomedical terms written in two languages are translations. Training such a classifier is often complicated due to the lack of common features between the source and target languages. We propose several feature space concatenation methods to successfully overcome this problem. Moreover, we study the effectiveness of contextual and character n-gram features for detecting term translations. Experiments conducted using a standard dataset for biomedical term translation show that the proposed method outperforms several competitive baseline methods in terms of mean average precision and top-k translation accuracy.
33

Yang, Jian, Shuming Ma, Dongdong Zhang, ShuangZhi Wu, Zhoujun Li, and Ming Zhou. "Alternating Language Modeling for Cross-Lingual Pre-Training." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (April 3, 2020): 9386–93. http://dx.doi.org/10.1609/aaai.v34i05.6480.

Abstract:
Language model pre-training has achieved success in many natural language processing tasks. Existing methods for cross-lingual pre-training adopt a Translation Language Model to predict masked words with the concatenation of the source sentence and its target equivalent. In this work, we introduce a novel cross-lingual pre-training method, called Alternating Language Modeling (ALM). It code-switches sentences of different languages rather than simply concatenating them, hoping to capture the rich cross-lingual context of words and phrases. More specifically, we randomly substitute source phrases with target translations to create code-switched sentences. Then, we use these code-switched data to train the ALM model to learn to predict words of different languages. We evaluate our pre-training ALM on the downstream tasks of machine translation and cross-lingual classification. Experiments show that ALM can outperform previous pre-training methods on three benchmarks.
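
The code-switched data that ALM pre-trains on can be illustrated with the toy function below; the paper substitutes phrases using translation alignments, whereas this sketch switches single words from a hypothetical lexicon.

```python
# Toy code-switching: randomly replace words that have a translation.
import random

def code_switch(tokens, lexicon, p=0.3, seed=13):
    rng = random.Random(seed)
    return [lexicon[t] if t in lexicon and rng.random() < p else t
            for t in tokens]

lexicon = {"we": "wir", "thank": "danken", "you": "euch"}
print(code_switch("we thank you all".split(), lexicon))
# a possible output: ['wir', 'thank', 'euch', 'all'] -> LM training data
```
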
34

Hajmohammadi, Mohammad Sadegh, Roliana Ibrahim, and Ali Selamat. "Bi-view semi-supervised active learning for cross-lingual sentiment classification." Information Processing & Management 50, no. 5 (September 2014): 718–32. http://dx.doi.org/10.1016/j.ipm.2014.03.005.

35

Fu, Zuohui, Yikun Xian, Shijie Geng, Yingqiang Ge, Yuting Wang, Xin Dong, Guang Wang, and Gerard De Melo. "ABSent: Cross-Lingual Sentence Representation Mapping with Bidirectional GANs." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (April 3, 2020): 7756–63. http://dx.doi.org/10.1609/aaai.v34i05.6279.

Abstract:
A number of cross-lingual transfer learning approaches based on neural networks have been proposed for the case when large amounts of parallel text are at our disposal. However, in many real-world settings, the size of parallel annotated training data is restricted. Additionally, prior cross-lingual mapping research has mainly focused on the word level. This raises the question of whether such techniques can also be applied to effortlessly obtain cross-lingually aligned sentence representations. To this end, we propose an Adversarial Bi-directional Sentence Embedding Mapping (ABSent) framework, which learns mappings of cross-lingual sentence representations from limited quantities of parallel data. The experiments show that our method outperforms several technically more powerful approaches, especially under challenging low-resource circumstances. The source code is available from https://github.com/zuohuif/ABSent along with relevant datasets.
36

Siddhant, Aditya, Melvin Johnson, Henry Tsai, Naveen Ari, Jason Riesa, Ankur Bapna, Orhan Firat, and Karthik Raman. "Evaluating the Cross-Lingual Effectiveness of Massively Multilingual Neural Machine Translation." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (April 3, 2020): 8854–61. http://dx.doi.org/10.1609/aaai.v34i05.6414.

Abstract:
The recently proposed massively multilingual neural machine translation (NMT) system has been shown to be capable of translating over 100 languages to and from English within a single model (Aharoni, Johnson, and Firat 2019). Its improved translation performance on low resource languages hints at potential cross-lingual transfer capability for downstream tasks. In this paper, we evaluate the cross-lingual effectiveness of representations from the encoder of a massively multilingual NMT model on 5 downstream classification and sequence labeling tasks covering a diverse set of over 50 languages. We compare against a strong baseline, multilingual BERT (mBERT) (Devlin et al. 2018), in different cross-lingual transfer learning scenarios and show gains in zero-shot transfer in 4 out of these 5 tasks.
37

Zola, Paola, Paulo Cortez, Costantino Ragno, and Eugenio Brentari. "Social Media Cross-Source and Cross-Domain Sentiment Classification." International Journal of Information Technology & Decision Making 18, no. 05 (September 2019): 1469–99. http://dx.doi.org/10.1142/s0219622019500305.

Abstract:
Due to the expansion of the Internet and the Web 2.0 phenomenon, there is a growing interest in sentiment analysis of freely opinionated text. In this paper, we propose a novel cross-source, cross-domain sentiment classification approach, in which cross-domain-labeled Web sources (Amazon and Tripadvisor) are used to train supervised learning models (including two deep learning algorithms) that are tested on typically nonlabeled social media reviews (Facebook and Twitter). We explored a three-step methodology, in which distinct balanced training, text preprocessing, and machine learning methods were tested, using two languages: English and Italian. The best results were achieved using undersampling training and a Convolutional Neural Network. Interesting cross-source classification performances were achieved, in particular when using Amazon and Tripadvisor reviews to train a model that is tested on Facebook data, for both English and Italian.
38

Sharoff, Serge. "Finding next of kin: Cross-lingual embedding spaces for related languages." Natural Language Engineering 26, no. 2 (September 4, 2019): 163–82. http://dx.doi.org/10.1017/s1351324919000354.

Abstract:
Some languages have very few NLP resources, while many of them are closely related to better-resourced languages. This paper explores how the similarity between the languages can be utilised by porting resources from better- to lesser-resourced languages. The paper introduces a way of building a representation shared across related languages by combining cross-lingual embedding methods with a lexical similarity measure which is based on the weighted Levenshtein distance. One of the outcomes of the experiments is a Panslavonic embedding space for nine Balto-Slavonic languages. The paper demonstrates that the resulting embedding space helps in such applications as morphological prediction, named-entity recognition and genre classification.
39

Ayetiran, Eniafe Festus. "An index-based joint multilingual/cross-lingual text categorization using topic expansion via BabelNet." Turkish Journal of Electrical Engineering & Computer Sciences 28, no. 1 (January 27, 2020): 224–37. http://dx.doi.org/10.3906/elk-1901-140.

40

Bispo, Thiago D., Hendrik T. Macedo, Flávio de O. Santos, Rafael P. da Silva, Leonardo N. Matos, Bruno O. P. Prado, Gilton J. F. da Silva, and Adolfo Guimarães. "Long Short-Term Memory Model for Classification of English-PtBR Cross-Lingual Hate Speech." Journal of Computer Science 15, no. 10 (October 1, 2019): 1546–71. http://dx.doi.org/10.3844/jcssp.2019.1546.1571.

41

Hajmohammadi, Mohammad Sadegh, Roliana Ibrahim, and Ali Selamat. "Cross-lingual sentiment classification using multiple source languages in multi-view semi-supervised learning." Engineering Applications of Artificial Intelligence 36 (November 2014): 195–203. http://dx.doi.org/10.1016/j.engappai.2014.07.020.

42

Bitton, Yonatan, Raphael Cohen, Tamar Schifter, Eitan Bachmat, Michael Elhadad, and Noémie Elhadad. "Cross-lingual Unified Medical Language System entity linking in online health communities." Journal of the American Medical Informatics Association 27, no. 10 (September 10, 2020): 1585–92. http://dx.doi.org/10.1093/jamia/ocaa150.

Abstract:
Objective: In Hebrew online health communities, participants commonly write medical terms that appear as transliterated forms of a source term in English. Such transliterations introduce high variability in text and challenge text-analytics methods. To reduce their variability, medical terms must be normalized, such as linking them to Unified Medical Language System (UMLS) concepts. We present a method to identify both transliterated and translated Hebrew medical terms and link them with UMLS entities. Materials and Methods: We investigate the effect of linking terms in Camoni, a popular Israeli online health community in Hebrew. Our method, MDTEL (Medical Deep Transliteration Entity Linking), includes (1) an attention-based recurrent neural network encoder-decoder to transliterate words and map UMLS from English to Hebrew, (2) an unsupervised method for creating a transliteration dataset in any language without manually labeled data, and (3) an efficient way to identify and link medical entities in the Hebrew corpus to UMLS concepts, by producing a high-recall list of candidate medical terms in the corpus and then filtering the candidates to relevant medical terms. Results: We carry out experiments on 3 disease-specific communities: diabetes, multiple sclerosis, and depression. MDTEL tagging and normalizing on Camoni posts achieved 99% accuracy, 92% recall, and 87% precision. When tagging and normalizing terms in queries from the Camoni search logs, UMLS-normalized queries improved search results in 46% of the cases. Conclusions: Cross-lingual UMLS entity linking from Hebrew is possible and improves search performance across communities. Annotated datasets, annotation guidelines, and code are made available online (https://github.com/yonatanbitton/mdtel).
43

Thomas, Merin, Dr Latha C A, and Antony Puthussery. "Identification of language in a cross linguistic environment." Indonesian Journal of Electrical Engineering and Computer Science 18, no. 1 (April 1, 2020): 544. http://dx.doi.org/10.11591/ijeecs.v18.i1.pp544-548.

Abstract:
The world has become very small due to software internationalization, and applications of machine translation are increasing day by day. Using multiple languages in social media text is a developing trend, and the availability of fonts in native languages has enhanced the usage of native text in internet communications. Usage of transliteration has become quite common. In the Indian scenario, current generations are familiar with speaking their native language but not with reading and writing it, so they have started using English representations of the native language in textual messages. This paper describes the identification of transliterated text in a cross-lingual environment. A neural network model identifies the prominent language in the text, which can then be used to identify the meaning of the text in the language concerned. The model is based on recurrent neural networks, which have been found to be the most efficient in machine translation. Language identification can serve as a base for many applications in a multilingual environment. Currently, the South Indian languages Malayalam and Tamil are identified from the given text. An algorithmic approach based on a stop-words model is also presented, and the model can be enhanced to cover all Indian languages in use.
44

Kwon, Yuri, Ji-Won Kim, Jae-Hoon Heo, Hyeong-Min Jeon, Eui-Bum Choi, and Gwang-Moon Eom. "Classification of Spinal Postures during Cross-Legged Sitting on the Floor." Journal of Mechanics in Medicine and Biology 19, no. 08 (December 2019): 1940056. http://dx.doi.org/10.1142/s0219519419400566.

Abstract:
One of the most frequent sitting styles of Asians in everyday life is cross-legged sitting. Cross-legged sitting results in a higher compression load on the spine than sitting on a chair, so a proper sitting posture is all the more necessary. The purpose of this study was to classify the spinal posture during cross-legged sitting from the seat pressure pattern, for future use in a posture monitoring system. Twenty young men participated in this study. The seat pressure was measured for three spinal postures (flat, slump, and lordosis) while subjects were instructed to hold a given posture, seated on the floor with legs crossed. The contact area was divided into feet and buttocks by using a filter with a pressure threshold. A decision tree was developed for the classification of the three postures, with the feet-to-buttocks pressure ratio as the decision variable. The three spinal postures were classified by comparing the feet-to-buttocks ratio with an upper and a lower threshold: a slump posture when the ratio exceeds the upper threshold, and a lordosis posture when it falls below the lower threshold. Each threshold was calculated by adding or subtracting a certain percentage margin to or from the flat-posture ratio, and the classification accuracy was investigated over a range of thresholds. The accuracy of classification reached 99.38% for certain ranges of thresholds; the developed algorithm performed best when the upper and lower margins were in the ranges 2.85–5.67% and 1.58–2.20%, respectively. The feet-to-buttocks pressure ratio showed a significant correlation with the lumbar angle. Anterior and posterior tilts of the upper body in the slump and lordosis postures result in more pressure concentration in the feet and buttocks, respectively, which was incorporated in the classification algorithm of this study. The results of this study could be extended to real-time or offline monitoring of sitting posture.
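
The two-threshold decision rule reduces to a few lines of code; the margins below are taken from the reported best-performing ranges, while the flat-posture reference ratio is a placeholder that would be calibrated per subject.

```python
# Sketch of the decision rule: slump tilts pressure toward the feet,
# lordosis toward the buttocks. Margin values are assumptions drawn from
# the reported ranges (upper 2.85-5.67%, lower 1.58-2.20%).
def classify_posture(ratio, flat_ratio, up=0.04, down=0.02):
    if ratio > flat_ratio * (1 + up):
        return "slump"       # relatively more pressure under the feet
    if ratio < flat_ratio * (1 - down):
        return "lordosis"    # relatively more pressure under the buttocks
    return "flat"

print(classify_posture(ratio=0.55, flat_ratio=0.50))  # -> "slump"
```
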
45

Enweiji, Musbah Zaid, Taras Lehinevych, and Andrey Glybovets. "Cross-Language Text Classification with Convolutional Neural Networks from Scratch." EUREKA: Physics and Engineering 2 (March 31, 2017): 24–33. http://dx.doi.org/10.21303/2461-4262.2017.00304.

Abstract:
Cross-language classification is an important task in multilingual learning, where documents in different languages often share the same set of categories. The main goal is to reduce the labeling cost of training a classification model for each individual language. A novel approach using convolutional neural networks for multilingual language classification is proposed in this article. It learns a representation of the knowledge gained from the languages seen in training and, moreover, works for a new language that was not used in training. The results of an empirical study on a large dataset of 21 languages demonstrate the robustness and competitiveness of the presented approach.
46

Gao, Dehong, Furu Wei, Wenjie Li, Xiaohua Liu, and Ming Zhou. "Cross-lingual Sentiment Lexicon Learning With Bilingual Word Graph Label Propagation." Computational Linguistics 41, no. 1 (March 2015): 21–40. http://dx.doi.org/10.1162/coli_a_00207.

Abstract:
In this article we address the task of cross-lingual sentiment lexicon learning, which aims to automatically generate sentiment lexicons for the target languages with available English sentiment lexicons. We formalize the task as a learning problem on a bilingual word graph, in which the intra-language relations among the words in the same language and the inter-language relations among the words between different languages are properly represented. With the words in the English sentiment lexicon as seeds, we propose a bilingual word graph label propagation approach to induce sentiment polarities of the unlabeled words in the target language. Particularly, we show that both synonym and antonym word relations can be used to build the intra-language relation, and that the word alignment information derived from bilingual parallel sentences can be effectively leveraged to build the inter-language relation. The evaluation of Chinese sentiment lexicon learning shows that the proposed approach outperforms existing approaches in both precision and recall. Experiments conducted on the NTCIR data set further demonstrate the effectiveness of the learned sentiment lexicon in sentence-level sentiment classification.
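
The propagation step can be sketched on a generic word graph as below; the paper's graph is bilingual, with synonym, antonym, and word-alignment edges, and antonym edges would flip polarity, which this toy version omits.

```python
# Toy label propagation: seed words carry +1/-1 polarity, neighbours
# absorb a weighted average, seeds stay clamped.
import numpy as np

def propagate(W, seeds, iters=50):
    """W: (n, n) nonnegative edge weights; seeds: {word_index: +/-1.0}."""
    y = np.zeros(W.shape[0])
    for i, s in seeds.items():
        y[i] = s
    P = W / (W.sum(axis=1, keepdims=True) + 1e-12)  # row-normalize
    for _ in range(iters):
        y = P @ y
        for i, s in seeds.items():
            y[i] = s
    return y   # sign(y) is the induced polarity

W = np.random.rand(6, 6); W = (W + W.T) / 2
polarity = propagate(W, seeds={0: +1.0, 5: -1.0})
```
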
47

Xu, Dong Dong, and Shao Bo Wu. "An Improved TFIDF Algorithm in Text Classification." Applied Mechanics and Materials 651-653 (September 2014): 2258–61. http://dx.doi.org/10.4028/www.scientific.net/amm.651-653.2258.

Abstract:
Term frequency/inverse document frequency (TF-IDF), borrowed from Information Retrieval, is widely used in text classification at present. Based on the classical TF-IDF formula, we present a new TF-IDF weighting scheme named CTF-IDF. The experiment shows that the improved method is feasible and effective. Furthermore, subsequent evaluations using 10-fold cross-validation show that CTF-IDF greatly improves the accuracy of text classification.
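
For reference, the classical TF-IDF weighting that the paper starts from can be computed as follows; the proposed CTF-IDF modification itself is not reproduced here.

```python
# Classical TF-IDF: term frequency scaled by log inverse document frequency.
import math
from collections import Counter

def tfidf(docs):
    N = len(docs)
    df = Counter(t for d in docs for t in set(d.split()))
    weights = []
    for d in docs:
        toks = d.split()
        tf = Counter(toks)
        weights.append({t: (c / len(toks)) * math.log(N / df[t])
                        for t, c in tf.items()})
    return weights

print(tfidf(["the cat sat", "the dog ran", "a cat ran"]))
```
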
48

Fei, Rong, Quanzhu Yao, Yuanbo Zhu, Qingzheng Xu, Aimin Li, Haozheng Wu, and Bo Hu. "Deep Learning Structure for Cross-Domain Sentiment Classification Based on Improved Cross Entropy and Weight." Scientific Programming 2020 (June 29, 2020): 1–20. http://dx.doi.org/10.1155/2020/3810261.

Abstract:
Within the sentiment classification field, the convolutional neural network (CNN) and long short-term memory (LSTM) are praised for their classification and prediction performance, but their accuracy, loss rate, and training time are not ideal. To this end, a deep learning structure combining an improved cross entropy loss and per-word weights is proposed for cross-domain sentiment classification, which focuses on achieving better text sentiment classification by optimizing and improving the recurrent neural network (RNN) and the CNN. First, we use the ideas behind the hinge loss and the triplet loss to improve the cross entropy loss. The improved cross entropy loss function is combined with the CNN model and the LSTM network, and both are tested on two classification problems. Then, the LSTM binary-optimize (LSTM-BO) and CNN binary-optimize (CNN-BO) models are proposed, which are more effective at fitting the prediction errors and preventing overfitting. Finally, considering how recurrent neural networks process text, the influence of input words on the final classification is analysed to obtain the importance of each word to the classification results. The experiment results show that, within the same time, the proposed weight-recurrent neural network (W-RNN) model gives higher weight to words with stronger emotional tendency to reduce the loss of emotional information, which improves the accuracy of classification.
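
One way to blend a hinge-style margin into cross entropy, in the spirit of the loss this abstract describes, is sketched below; the authors' exact combination is not specified here, so treat this as an assumption-laden illustration rather than their method.

```python
# Cross entropy plus a hinge-style margin between the true class's logit
# and the best competing logit (an assumed, illustrative combination).
import torch
import torch.nn.functional as F

def margin_cross_entropy(logits, target, margin=1.0, alpha=0.5):
    ce = F.cross_entropy(logits, target)
    true = logits.gather(1, target.unsqueeze(1)).squeeze(1)
    rival = logits.scatter(1, target.unsqueeze(1), float("-inf")).max(1).values
    hinge = F.relu(margin - (true - rival)).mean()
    return ce + alpha * hinge

logits = torch.randn(4, 2, requires_grad=True)
loss = margin_cross_entropy(logits, torch.tensor([0, 1, 1, 0]))
loss.backward()
```
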
49

Peter, and Maxwell. "Co-Clustering based Classification Algorithm with Latent Semantic Relationship for Cross-Domain Text Classification through Wikipedia." Bonfring International Journal of Data Mining 7, no. 2 (May 31, 2017): 01–05. http://dx.doi.org/10.9756/bijdm.8330.

50

Lee, Yong-Gu. "Classification Performance Analysis of Cross-Language Text Categorization using Machine Translation." Journal of the Korean Society for Library and Information Science 43, no. 1 (March 30, 2009): 313–32. http://dx.doi.org/10.4275/kslis.2009.43.1.313.
