Theses on the topic "Cross-lingual text classification"
Create a precise citation in APA, MLA, Chicago, Harvard, and other styles
Consult the 24 best theses for your research on the topic "Cross-lingual text classification".
Next to every source in the list of references there is an "Add to bibliography" button. Press it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Vancouver, Chicago, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.
Explore theses on a wide variety of disciplines and organize your bibliography correctly.
Petrenz, Philipp. "Cross-lingual genre classification". Thesis, University of Edinburgh, 2014. http://hdl.handle.net/1842/9658.
Shih, Min-Chun. "Exploring Cross-lingual Sublanguage Classification with Multi-lingual Word Embeddings". Thesis, Linköpings universitet, Statistik och maskininlärning, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-166148.
Tafreshi, Shabnam. "Cross-Genre, Cross-Lingual, and Low-Resource Emotion Classification". Thesis, The George Washington University, 2021. http://pqdtopen.proquest.com/#viewpdf?dispub=28088437.
Weijand, Sasha. "AUTOMATED GENDER CLASSIFICATION IN WIKIPEDIA BIOGRAPHIES: a cross-lingual comparison". Thesis, Umeå universitet, Institutionen för datavetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-163371.
Krithivasan, Bhavani. "Cross-Language tweet classification using Bing Translator". Kansas State University, 2017. http://hdl.handle.net/2097/38556.
Texto completoDepartment of Computing and Information Sciences
Doina Caragea
Social media affects our daily lives and is one of the first sources for breaking news. In particular, Twitter is one of the most popular social media platforms, with around 330 million monthly users. From local events such as Fake Patty's Day to happenings across the world, Twitter gets there first. During a disaster, tweets can be used to post warnings, the status of available medical and food supplies, emergency personnel, and updates. Users were still tweeting about Hurricane Sandy despite the lack of network coverage during the storm. Analysis of these tweets can help monitor the disaster, plan and manage the crisis, and aid research. In this research, we use publicly available tweets posted during several disasters and identify the relevant tweets. As the languages in the datasets differ, the Bing translation API is used to detect and translate the tweets. The translations are then used as training datasets for supervised machine learning algorithms. Supervised learning is the process of learning from a labeled training dataset; the learned classifier can then be used to predict the correct output for any valid input. When trained on more observations, the algorithm improves its predictive performance.
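The pipeline the abstract describes (translate the tweets, then train a supervised classifier on the translated text) can be sketched as follows. This is a minimal illustration with invented toy tweets: the `translate` stub stands in for the Bing translation API, and a small multinomial Naive Bayes stands in for the supervised learners evaluated in the thesis.

```python
import math
from collections import Counter

def translate(text):
    # Stand-in for the Bing translation API used in the thesis
    # (hypothetical stub: the toy tweets here are already in English).
    return text

def tokenize(text):
    return text.lower().split()

class NaiveBayes:
    """Multinomial Naive Bayes with Laplace smoothing."""
    def fit(self, docs, labels):
        self.classes = set(labels)
        self.priors = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.counts[y].update(tokenize(translate(doc)))
        self.vocab = {w for c in self.classes for w in self.counts[c]}
        return self

    def predict(self, doc):
        def log_score(c):
            total = sum(self.counts[c].values()) + len(self.vocab)
            return self.priors[c] + sum(
                math.log((self.counts[c][w] + 1) / total)
                for w in tokenize(translate(doc)))
        return max(self.classes, key=log_score)

# Invented toy disaster tweets with relevance labels.
tweets = ["flood warning downtown evacuate now",
          "medical supplies needed at shelter",
          "great concert last night so fun",
          "new cafe opened amazing coffee"]
labels = ["relevant", "relevant", "irrelevant", "irrelevant"]

clf = NaiveBayes().fit(tweets, labels)
print(clf.predict("shelter needs water and medical help"))  # → relevant
```

Keeping translation behind a stub makes the example self-contained; in the thesis the same train-on-translations setup is evaluated with several supervised learners.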
Varga, Andrea. "Exploiting domain knowledge for cross-domain text classification in heterogeneous data sources". Thesis, University of Sheffield, 2014. http://etheses.whiterose.ac.uk/7538/.
Asian, Jelita, and jelitayang@gmail.com. "Effective Techniques for Indonesian Text Retrieval". RMIT University. Computer Science and Information Technology, 2007. http://adt.lib.rmit.edu.au/adt/public/adt-VIT20080110.084651.
Mozafari, Marzieh. "Hate speech and offensive language detection using transfer learning approaches". Electronic Thesis or Diss., Institut polytechnique de Paris, 2021. http://www.theses.fr/2021IPPAS007.
The great promise of social media platforms (e.g., Twitter and Facebook) is to provide a safe place for users to communicate their opinions and share information. However, concerns are growing that they also enable abusive behaviors such as threatening or harassing other users, cyberbullying, hate speech, and racial and sexual discrimination. In this thesis, we focus on hate speech as one of the most concerning phenomena in online social media. Given the rapid growth of online hate speech and its severe negative effects, institutions, social media platforms, and researchers have been trying to react as quickly as possible. Recent advances in Natural Language Processing (NLP) and Machine Learning (ML) can be adapted to develop automatic methods for hate speech detection in this area. The aim of this thesis is to investigate the problem of hate speech and offensive language detection in social media, where we define hate speech as any communication criticizing a person or a group based on characteristics such as gender, sexual orientation, nationality, religion, or race. We propose different approaches in which we adapt advanced Transfer Learning (TL) models and NLP techniques to detect hate speech and offensive content automatically, in both monolingual and multilingual settings. In the first contribution, we focus only on English. First, we analyze user-generated textual content by introducing a new framework able to categorize content by topical similarity based on different features; furthermore, using the Perspective API from Google, we measure and analyze the toxicity of the content. Second, we propose a TL approach for the identification of hate speech that combines the unsupervised pre-trained model BERT (Bidirectional Encoder Representations from Transformers) with new supervised fine-tuning strategies.
Finally, we investigate the effect of unintended bias in our pre-trained BERT-based model and propose a new generalization mechanism that reweights training samples and then changes the fine-tuning strategies in terms of the loss function to mitigate the racial bias propagated through the model. To evaluate the proposed models, we use two publicly available datasets from Twitter. In the second contribution, we consider a multilingual setting and focus on low-resource languages for which little or no labeled data is available. First, we present the first corpus of Persian offensive language, consisting of 6,000 microblog posts from Twitter, to address offensive language detection in Persian as a low-resource language in this domain. After annotating the corpus, we perform extensive experiments to investigate the performance of transformer-based monolingual and multilingual pre-trained language models (e.g., ParsBERT, mBERT, XLM-R) on the downstream task. Furthermore, we propose an ensemble model to boost performance. We then expand our study into a cross-lingual few-shot learning problem, where we have a few labeled examples in the target language, and adapt a meta-learning-based approach to identify hate speech and offensive language in low-resource languages.
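The sample-reweighting idea mentioned in the abstract (down-weighting over-represented group/label combinations before fine-tuning) can be illustrated with a toy sketch. The grouping variable and the weighting formula below are illustrative assumptions, not the thesis's exact mechanism.

```python
from collections import Counter

def reweight(samples):
    """Weight each sample inversely to the frequency of its
    (group, label) combination, so that over-represented combinations
    (e.g., a dialect marker co-occurring with the 'hateful' label)
    contribute less to the fine-tuning loss. Weights are normalized
    to average 1 over the dataset. Illustrative scheme only."""
    counts = Counter(samples)
    raw = [1.0 / counts[s] for s in samples]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]

# (group, label) pairs: the group could be an inferred dialect,
# the label 1 = flagged as hateful, 0 = not flagged.
data = [("aae", 1), ("aae", 1), ("aae", 1), ("sae", 1), ("aae", 0), ("sae", 0)]
weights = reweight(data)
print(weights)  # over-represented ("aae", 1) samples get weight 0.5
```

In a fine-tuning loop these weights would multiply each sample's loss term, which is the general shape of loss-reweighting approaches to bias mitigation.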
Franco, Salvador Marc. "A Cross-domain and Cross-language Knowledge-based Representation of Text and its Meaning". Doctoral thesis, Universitat Politècnica de València, 2017. http://hdl.handle.net/10251/84285.
Natural Language Processing (NLP) is a field of computer science, artificial intelligence, and computational linguistics concerned with the interactions between machines and human language. One of its greatest challenges is enabling machines to infer the meaning of human natural language. To this end, various representations of meaning and context have been proposed, achieving competitive performance. However, these representations still leave room for improvement in cross-domain and cross-language scenarios. In this thesis we study the use of knowledge graphs as a cross-domain and cross-language representation of text and its meaning. A knowledge graph is a graph that expands and relates the original concepts belonging to a set of words. Its properties are obtained by using a wide-coverage multilingual semantic network as the knowledge base, which provides coverage of hundreds of languages and millions of general and human-specific concepts. As the starting point of our research, we employ knowledge-graph-based features, together with traditional features and meta-learning, for the NLP task of single- and cross-domain polarity classification. The analysis and conclusions of that work provide evidence that knowledge graphs capture meaning in a domain-independent way. The next part of our research exploits the capability of the multilingual semantic network and focuses on Information Retrieval (IR) tasks. First, we propose a similarity-analysis model based entirely on knowledge graphs for cross-language plagiarism detection. We then improve that model to cover out-of-vocabulary words and verbal tenses, and apply it to the cross-language tasks of document retrieval, categorization, and plagiarism detection. Finally, we study the use of knowledge graphs for the NLP tasks of community question answering, native-language identification, and language-variety identification. The contributions of this thesis demonstrate the potential of knowledge graphs as a cross-domain and cross-language representation of text and its meaning in NLP and IR tasks. These contributions have been published in several international journals and conferences.
Franco Salvador, M. (2017). A Cross-domain and Cross-language Knowledge-based Representation of Text and its Meaning [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/84285
Van Luenen, Anne Fleur. "Recognising Moral Foundations in Online Extremist Discourse: A Cross-Domain Classification Study". Thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-426921.
Tran, Thi Quynh Nhi. "Robust and comprehensive joint image-text representations". Thesis, Paris, CNAM, 2017. http://www.theses.fr/2017CNAM1096/document.
This thesis investigates the joint modeling of the visual and textual content of multimedia documents to address cross-modal problems. Such tasks require the ability to match information across modalities. A common representation space, obtained by, e.g., Kernel Canonical Correlation Analysis, on which images and text can both be represented and directly compared, is the generally adopted solution. Nevertheless, such a joint space still suffers from several deficiencies that may hinder the performance of cross-modal tasks. An important contribution of this thesis is therefore to identify two major limitations of such a space. The first limitation concerns information that is poorly represented in the common space yet very significant for a retrieval task. The second limitation is a separation between modalities in the common space, which leads to coarse cross-modal matching. To deal with the first limitation, we put forward a model that first identifies poorly-represented information and then finds ways to combine it with data that is relatively well represented in the joint space. Evaluations on text illustration tasks show that by appropriately identifying and taking such information into account, the results of cross-modal retrieval can be strongly improved. The major work in this thesis aims to cope with the separation between modalities in the joint space to enhance the performance of cross-modal tasks. We propose two representation methods for bi-modal or uni-modal documents that aggregate information from both the visual and textual modalities projected onto the joint space. Specifically, for uni-modal documents we suggest a completion process relying on an auxiliary dataset to find the corresponding information in the absent modality and then use that information to build a final bi-modal representation for the uni-modal document.
Evaluations show that our approaches achieve state-of-the-art results on several standard and challenging datasets for cross-modal retrieval and for bi-modal and cross-modal classification.
Pagliarani, Andrea. "New Markov chain based methods for single and cross-domain sentiment classification". Master's thesis, Alma Mater Studiorum - Università di Bologna, 2015. http://amslaurea.unibo.it/8445/.
Saad, Motaz. "Fouille de documents et d'opinions multilingue". Thesis, Université de Lorraine, 2015. http://www.theses.fr/2015LORR0003/document.
The aim of this thesis is to study sentiments in comparable documents. First, we collect English, French, and Arabic comparable corpora from Wikipedia and Euronews, and we align each corpus at the document level. We further gather English-Arabic news documents from local and foreign news agencies: the English documents are collected from the BBC website and the Arabic documents from the Al-Jazeera website. Second, we present a cross-lingual document similarity measure to automatically retrieve and align comparable documents. Then, we propose a cross-lingual sentiment annotation method to label source and target documents with sentiments. Finally, we use statistical measures to compare the agreement of sentiments in the source and target documents of each comparable pair. The methods presented in this thesis are language independent and can be applied to any language pair.
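A bag-of-words sketch of such a cross-lingual similarity measure, under the simplifying assumption of a word-for-word bilingual lexicon (the toy `FR_EN` entries are invented for illustration; the thesis works with real comparable corpora and resources):

```python
import math
from collections import Counter

# Toy bilingual lexicon; the entries are invented for illustration.
FR_EN = {"économie": "economy", "crise": "crisis", "marché": "market",
         "football": "football", "match": "match"}

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def crosslingual_sim(en_tokens, fr_tokens):
    """Map the French document into English word by word (dropping
    out-of-lexicon words), then compare bags of words with cosine."""
    mapped = [FR_EN[t] for t in fr_tokens if t in FR_EN]
    return cosine(Counter(en_tokens), Counter(mapped))

en_doc = "economy crisis hits the market".split()
fr_doc = "la crise frappe le marché économie".split()
fr_sport = "le match de football".split()
print(crosslingual_sim(en_doc, fr_doc) > crosslingual_sim(en_doc, fr_sport))  # → True
```

The comparable document pair (economy news in both languages) scores higher than the unrelated sports document, which is exactly the property an alignment step needs.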
Reimann, Sebastian Michael. "Multilingual Zero-Shot and Few-Shot Causality Detection". Thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-446516.
Lasch, Alexander. "Nicoline Hortzitz, Die Sprache der Judenfeindschaft in der frühen Neuzeit (1450–1700): Untersuchungen zu Wortschatz, Text und Argumentation". De Gruyter, 2006. https://tud.qucosa.de/id/qucosa%3A74905.
Pollettini, Juliana Tarossi. "Auxílio na prevenção de doenças crônicas por meio de mapeamento e relacionamento conceitual de informações em biomedicina". Universidade de São Paulo, 2011. http://www.teses.usp.br/teses/disponiveis/95/95131/tde-24042012-223141/.
Genomic medicine has suggested that exposure to risk factors since conception may influence gene expression and consequently induce the development of chronic diseases in adulthood. Scientific papers reporting these discoveries indicate that epigenetics must be exploited to prevent diseases of high prevalence, such as cardiovascular diseases, diabetes, and obesity. The large amount of scientific information burdens health care professionals interested in staying up to date, since searches for accurate information become complex and expensive. Computational techniques can support the management of large biomedical information repositories and the discovery of knowledge. This study presents a framework to support surveillance systems that alert health professionals about human development problems by retrieving scientific papers that relate chronic diseases to risk factors detected in a patient's clinical record. As a contribution, healthcare professionals will be able to create a routine with the family, setting up the best growing conditions. According to Butte, the effective transformation of results from biomedical research into knowledge that actually improves public health has been considered an important domain of informatics and has been called Translational Bioinformatics. Since chronic diseases are a serious health problem worldwide and lead the causes of mortality, accounting for 60% of all deaths, this scientific investigation will probably enable results from bioinformatics research to directly benefit public health.
Michel, David. "All Negative on the Western Front: Analyzing the Sentiment of the Russian News Coverage of Sweden with Generic and Domain-Specific Multinomial Naive Bayes and Support Vector Machines Classifiers". Thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-447398.
Kubalík, Jakub. "Mining of Textual Data from the Web for Speech Recognition". Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2010. http://www.nusl.cz/ntk/nusl-237170.
Texto completoChen, Guan-Yuan y 陳冠元. "Deep Transfer Learning for Cross-Lingual Text Classification Problems". Thesis, 2018. http://ndltd.ncl.edu.tw/handle/992hpt.
National Tsing Hua University
Institute of Information Systems and Applications
106
Recently, data-driven machine learning approaches have shown success on many text classification tasks in resource-abundant languages. However, many languages still lack sufficient labeled data for carrying out the same tasks. For these low-resource languages, it may be costly to obtain a high-quality parallel corpus, and automated machine translation may be unreliable or unavailable. In this work, we propose an effective transfer learning method for scenarios where large-scale cross-lingual data is not available. It combines the transfer learning schemes of parameter sharing (parameter based) and domain adaptation (feature based), jointly trained on the high-resource and low-resource languages together. We conducted cross-lingual transfer learning experiments on text classification of sentiment, subjectivity, and question types, from English to Chinese and from English to Vietnamese, respectively. The experiments show that the proposed approach significantly outperforms state-of-the-art models trained merely on monolingual data on the corresponding benchmarks.
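The "parameter based" half of such a scheme, one set of classifier parameters jointly trained on both languages, can be sketched with a toy logistic regression. This assumes the "feature based" half has already mapped both languages into a shared feature space; the 2-d vectors below are invented stand-ins for shared cross-lingual embeddings.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def joint_train(source, target, dim, epochs=200, lr=0.5):
    """Train ONE logistic regression on the pooled source- and
    target-language samples. Sharing the classifier parameters across
    languages is the 'parameter based' idea; inputs are assumed to
    already live in a shared cross-lingual feature space."""
    w, b = [0.0] * dim, 0.0
    data = source + target  # parameter sharing: one model, both languages
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y  # gradient of the log loss w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    return int(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5)

# Hypothetical 2-d "shared embeddings": many labeled English points,
# only two labeled Chinese points with the same class geometry.
english = [([1.0, 0.9], 1), ([0.9, 1.1], 1), ([-1.0, -0.8], 0), ([-1.1, -0.9], 0)]
chinese = [([0.8, 1.0], 1), ([-0.9, -1.0], 0)]
w, b = joint_train(english, chinese, dim=2)
print(predict(w, b, [0.7, 0.8]), predict(w, b, [-0.8, -0.7]))  # → 1 0
```

The low-resource language contributes only two labeled points, yet the shared parameters learned mostly from the high-resource language classify its region correctly, which is the intuition behind joint training.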
Lin, Yen-Ting and 林彥廷. "Cross-Lingual Text Categorization". Thesis, 2004. http://ndltd.ncl.edu.tw/handle/82607711205882030045.
National Sun Yat-sen University
Institute of Information Management
92
With the emergence and proliferation of Internet services and e-commerce applications, a tremendous amount of information is accessible online, typically as textual documents. To facilitate subsequent access to and leverage of this information, the efficient and effective management, specifically text categorization, of the ever-increasing volume of textual documents is essential to organizations and individuals. Existing text categorization techniques focus mainly on categorizing monolingual documents. However, with the globalization of business environments and advances in Internet technology, an organization or person often retrieves and archives documents in different languages, thus creating the need for cross-lingual text categorization. Motivated by the significance of and need for such a technique, this thesis designs a cross-lingual text categorization technique with two different category assignment methods, namely individual-based and cluster-based. The empirical evaluation results show that the cross-lingual text categorization technique performs well and that the cluster-based method outperforms the individual-based method.
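The two category assignment methods can be sketched as follows: individual-based assignment takes the label of the single most similar training document, while cluster-based assignment compares against one merged centroid per category. The toy documents are invented, and cosine over raw bags of words stands in for the thesis's actual representation.

```python
import math
from collections import Counter

def vec(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def assign_individual(doc, labeled):
    """Individual-based: label of the single most similar training doc."""
    best_doc, best_label = max(labeled, key=lambda dy: cosine(vec(doc), vec(dy[0])))
    return best_label

def assign_cluster(doc, labeled):
    """Cluster-based: most similar category centroid (all documents of
    a category merged into a single bag of words)."""
    centroids = {}
    for d, y in labeled:
        centroids.setdefault(y, Counter()).update(vec(d))
    return max(centroids, key=lambda y: cosine(vec(doc), centroids[y]))

train = [("stock market prices fall", "finance"),
         ("bank lowers interest rates", "finance"),
         ("team wins championship game", "sports"),
         ("player scores in final match", "sports")]
query = "interest rates and market prices"
print(assign_individual(query, train), assign_cluster(query, train))  # → finance finance
```

The cluster-based variant pools evidence from every document in a category, which is one plausible reason it can be more robust than matching a single nearest document.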
"Multi-lingual text retrieval and mining". 2003. http://library.cuhk.edu.hk/record=b5891637.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2003.
Includes bibliographical references (leaves 130-134).
Abstracts in English and Chinese.
Chapter 1: Introduction
1.1 Cross-Lingual Information Retrieval (CLIR)
1.2 Bilingual Term Association Mining
1.3 Our Contributions
1.3.1 CLIR
1.3.2 Bilingual Term Association Mining
1.4 Thesis Organization
Chapter 2: Related Work
2.1 CLIR Techniques
2.1.1 Existing Approaches
2.1.2 Difference Between Our Model and Existing Approaches
2.2 Bilingual Term Association Mining Techniques
2.2.1 Existing Approaches
2.2.2 Difference Between Our Model and Existing Approaches
Chapter 3: Cross-Lingual Information Retrieval (CLIR)
3.1 Cross-Lingual Query Processing and Translation
3.1.1 Query Context and Document Context Generation
3.1.2 Context-Based Query Translation
3.1.3 Query Term Weighting
3.1.4 Final Weight Calculation
3.2 Retrieval on Documents and Automated Summaries
Chapter 4: Experiments on Cross-Lingual Information Retrieval
4.1 Experimental Setup
4.2 Results of English-to-Chinese Retrieval
4.2.1 Using Mono-Lingual Retrieval as the Gold Standard
4.2.2 Using Human Relevance Judgments as the Gold Standard
4.3 Results of Chinese-to-English Retrieval
4.3.1 Using Mono-Lingual Retrieval as the Gold Standard
4.3.2 Using Human Relevance Judgments as the Gold Standard
Chapter 5: Discovering Comparable Multi-lingual Online News for Text Mining
5.1 Story Representation
5.2 Gloss Translation
5.3 Comparable News Discovery
Chapter 6: Mining Bilingual Term Association Based on Co-occurrence
6.1 Bilingual Term Cognate Generation
6.2 Term Mining Algorithm
Chapter 7: Phonetic Matching
7.1 Algorithm Design
7.2 Discovering Associations of English Terms and Chinese Terms
7.2.1 Converting English Terms into Phonetic Representation
7.2.2 Discovering Associations of English Terms and Mandarin Chinese Terms
7.2.3 Discovering Associations of English Terms and Cantonese Chinese Terms
Chapter 8: Experiments on Bilingual Term Association Mining
8.1 Experimental Setup
8.2 Result and Discussion of Bilingual Term Association Mining Based on Co-occurrence
8.3 Result and Discussion of Phonetic Matching
Chapter 9: Conclusions and Future Work
9.1 Conclusions
9.1.1 CLIR
9.1.2 Bilingual Term Association Mining
9.2 Future Work
Bibliography
Appendix A: Original English Queries
Appendix B: Manually Translated Chinese Queries
Appendix C: Pronunciation Symbols Used by the PRONLEX Lexicon
Appendix D: Initial Letter-to-Phoneme Tags
Appendix E: English Sounds with Their Chinese Equivalents
Hsu, Kai-hsiang and 許凱翔. "Cross-Lingual Text Categorization: A Training-corpus Translation-based Approach". Thesis, 2005. http://ndltd.ncl.edu.tw/handle/29566553950618841626.
National Sun Yat-sen University
Institute of Information Management
93
Text categorization deals with the automatic learning of a text categorization model from a training set of preclassified documents, on the basis of their contents, and the assignment of unclassified documents to appropriate categories. Most existing text categorization techniques deal with monolingual documents (i.e., all documents written in one language) during model learning and category assignment (or prediction). However, with the globalization of business environments and advances in Internet technology, an organization or individual often generates/acquires and subsequently archives documents in different languages, thus creating the need for cross-lingual text categorization (CLTC). Existing studies on CLTC focus on the prediction-corpus translation-based approach, which lacks a systematic mechanism for reducing translation noise, thus limiting cross-lingual categorization effectiveness. Motivated by the need to provide more effective CLTC support, we design a training-corpus translation-based CLTC approach. Using the prediction-corpus translation-based approach as the performance benchmark, our empirical evaluation results show that our proposed CLTC approach achieves significantly better classification effectiveness than the benchmark approach does in both Chinese
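The contrast between the two approaches, where translation happens at prediction time versus once on the training corpus, can be sketched as follows. The word-for-word `translate` stub and toy lexicon are hypothetical stand-ins for real machine translation, and a trivial centroid classifier stands in for the actual learner.

```python
from collections import Counter

# Toy word-for-word lexicon (hypothetical); real MT is far richer.
ZH_EN = {"股市": "stocks", "下跌": "fall", "球隊": "team", "獲勝": "wins"}
EN_ZH = {v: k for k, v in ZH_EN.items()}

def translate(doc, lexicon):
    # Word-for-word stand-in for machine translation; unknown words
    # pass through unchanged (one source of "translation noise").
    return " ".join(lexicon.get(w, w) for w in doc.split())

def train(docs_labels):
    # Trivial centroid classifier over bags of words.
    centroids = {}
    for d, y in docs_labels:
        centroids.setdefault(y, Counter()).update(d.split())
    return lambda doc: max(centroids,
                           key=lambda y: sum(centroids[y][w] for w in doc.split()))

english_train = [("stocks fall sharply", "finance"), ("team wins again", "sports")]
chinese_doc = "股市 下跌"

# Prediction-corpus translation (the benchmark): train on English,
# translate every incoming Chinese document at prediction time.
clf_en = train(english_train)
pred_a = clf_en(translate(chinese_doc, ZH_EN))

# Training-corpus translation (the thesis's approach): translate the
# training corpus once, learn in the prediction language, then
# classify Chinese documents directly.
clf_zh = train([(translate(d, EN_ZH), y) for d, y in english_train])
pred_b = clf_zh(chinese_doc)

print(pred_a, pred_b)  # → finance finance
```

Translating the training corpus moves the noisy translation step to a one-time, offline stage, which is where a systematic noise-reduction mechanism can be applied.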
Farra, Noura. "Cross-Lingual and Low-Resource Sentiment Analysis". Thesis, 2019. https://doi.org/10.7916/d8-x3b7-1r92.
CHIU, HUANG-CHIEH and 邱皇傑. "Improving Cross-Lingual Retrieval of Healthcare Questions by Classification of Healthcare Information Needs". Thesis, 2018. http://ndltd.ncl.edu.tw/handle/55z747.
Tzu Chi University
Master's Program, Department of Medical Informatics
107
People often use the Internet to find answers to healthcare questions. Many healthcare information websites thus collect and maintain a database of frequently asked questions (FAQs) answered by healthcare professionals. However, with the increasing number of FAQs, it is difficult for users to identify the specific FAQs that satisfy their information needs; moreover, many reliable healthcare FAQs are written in English. Therefore, we propose a technique to rank English healthcare FAQs with respect to Chinese healthcare questions. The technique considers information-need aspects, which indicate the basic types of healthcare information required by people. By recognizing these aspects, our technique can improve the performance of various kinds of FAQ retrievers. Empirical evaluation on thousands of English and Chinese healthcare FAQs shows that our technique significantly enhances several kinds of FAQ retrievers and can thus help users find reliable answers to healthcare questions.
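One simple way to realize aspect-aware FAQ ranking, sketched under the assumption that a classifier has already labeled each question and FAQ with an information-need aspect (the `alpha` mixing weight, the aspect labels, and the toy scores are illustrative, not the thesis's model):

```python
def rerank(question_aspect, candidates, alpha=0.7):
    """Mix a base retrieval score with an aspect-match bonus so that
    FAQs answering the same information-need aspect as the (translated)
    question are promoted over lexically similar but off-aspect FAQs."""
    scored = [(alpha * score + (1 - alpha) * (aspect == question_aspect), faq)
              for faq, aspect, score in candidates]
    return [faq for _, faq in sorted(scored, reverse=True)]

# (FAQ, classified aspect, base retriever score) - invented toy values.
faqs = [("What causes type 2 diabetes?", "cause", 0.62),
        ("How is type 2 diabetes treated?", "treatment", 0.60),
        ("How is diabetes diagnosed?", "diagnosis", 0.55)]

# A user question (originally in Chinese) classified as asking about treatment:
print(rerank("treatment", faqs)[0])  # → How is type 2 diabetes treated?
```

The base retriever alone would rank the "cause" FAQ first; the aspect bonus flips the order, which mirrors how recognizing information needs can improve any underlying retriever.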