Academic literature on the topic 'Interpretable Textual Semantic Similarity'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Interpretable Textual Semantic Similarity.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "Interpretable Textual Semantic Similarity"

1

Abafogi, Abdo Ababor. "Survey on Interpretable Semantic Textual Similarity, and its Applications." International Journal of Innovative Technology and Exploring Engineering (IJITEE) 10, no. 3 (2021): 14–18. https://doi.org/10.35940/ijitee.B8294.0110321.

Full text
Abstract:
Both semantic representation and related natural language processing (NLP) tasks have become more popular due to the introduction of distributional semantics. Semantic textual similarity (STS) is one of the tasks in NLP; it determines the similarity between two short texts (sentences) based on their meanings. Interpretable STS is the task of giving an explanation for the semantic similarity between short texts. Giving such an interpretation is indeed possible for humans, but constructing computational models that explain at a human level is challenging. The interpretable STS task gives its output in a natural way as a continuous value on the scale [0, 5] that represents the strength of the semantic relation between a pair of sentences, where 0 is no similarity and 5 is complete similarity. This paper reviews the available methods used in interpretable STS computation, classifies them, specifies existing limitations, and finally gives directions for future work. The survey is organized into nine sections: first a brief introduction, then chunking techniques and available tools, followed by rule-based approaches, machine learning approaches, work done via neural networks, and finally hybrid approaches. Applications of interpretable STS, conclusions and future directions are also part of this paper.
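As a rough illustration of the [0, 5] scoring convention described in this abstract (and not of any method from the survey itself), the sketch below rescales a naive token-overlap measure to the STS range; the tokenization and the example sentences are assumptions made purely for demonstration.

```python
# Illustrative only: a naive token-overlap score rescaled to the [0, 5] STS range.
# Real interpretable STS systems instead align chunks and explain each alignment.

def tokens(sentence: str) -> set:
    return set(sentence.lower().split())

def naive_sts_score(s1: str, s2: str) -> float:
    """Return a score on the [0, 5] scale (0 = no similarity, 5 = complete similarity)."""
    t1, t2 = tokens(s1), tokens(s2)
    if not t1 or not t2:
        return 0.0
    overlap = len(t1 & t2) / len(t1 | t2)   # Jaccard overlap in [0, 1]
    return round(5.0 * overlap, 2)          # rescale to the [0, 5] STS range

print(naive_sts_score("A man is playing a guitar", "A man plays the guitar"))     # mid-range score
print(naive_sts_score("A man is playing a guitar", "Stock markets fell sharply")) # 0.0
```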
APA, Harvard, Vancouver, ISO, and other styles
2

Abafogi, Abdo Ababor. "Survey on Interpretable Semantic Textual Similarity, and its Applications." International Journal of Innovative Technology and Exploring Engineering 10, no. 3 (2021): 14–18. http://dx.doi.org/10.35940/ijitee.b8294.0110321.

Full text
Abstract:
Both semantic representation and related natural language processing (NLP) tasks have become more popular due to the introduction of distributional semantics. Semantic textual similarity (STS) is one of the tasks in NLP; it determines the similarity between two short texts (sentences) based on their meanings. Interpretable STS is the task of giving an explanation for the semantic similarity between short texts. Giving such an interpretation is indeed possible for humans, but constructing computational models that explain at a human level is challenging. The interpretable STS task gives its output in a natural way as a continuous value on the scale [0, 5] that represents the strength of the semantic relation between a pair of sentences, where 0 is no similarity and 5 is complete similarity. This paper reviews the available methods used in interpretable STS computation, classifies them, specifies existing limitations, and finally gives directions for future work. The survey is organized into nine sections: first a brief introduction, then chunking techniques and available tools, followed by rule-based approaches, machine learning approaches, work done via neural networks, and finally hybrid approaches. Applications of interpretable STS, conclusions and future directions are also part of this paper.
APA, Harvard, Vancouver, ISO, and other styles
3

Lopez-Gazpio, I., M. Maritxalar, A. Gonzalez-Agirre, G. Rigau, L. Uria, and E. Agirre. "Interpretable semantic textual similarity: Finding and explaining differences between sentences." Knowledge-Based Systems 119 (March 2017): 186–99. http://dx.doi.org/10.1016/j.knosys.2016.12.013.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Majumder, Goutam, Partha Pakray, Ranjita Das, and David Pinto. "Interpretable semantic textual similarity of sentences using alignment of chunks with classification and regression." Applied Intelligence 51, no. 10 (2021): 7322–49. http://dx.doi.org/10.1007/s10489-020-02144-x.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Lowin, Maximilian. "A Text-Based Predictive Maintenance Approach for Facility Management Requests Utilizing Association Rule Mining and Large Language Models." Machine Learning and Knowledge Extraction 6, no. 1 (2024): 233–58. http://dx.doi.org/10.3390/make6010013.

Full text
Abstract:
Introduction: Due to the lack of labeled data, applying predictive maintenance algorithms for facility management is cumbersome. Most companies are unwilling to share data or do not have time for annotation. In addition, most available facility management data are text data. Thus, there is a need for an unsupervised predictive maintenance algorithm that is capable of handling textual data. Methodology: This paper proposes applying association rule mining on maintenance requests to identify upcoming needs in facility management. By coupling temporal association rule mining with the concept of semantic similarity derived from large language models, the proposed methodology can discover meaningful knowledge in the form of rules suitable for decision-making. Results: Relying on the large German language models works best for the presented case study. Introducing a temporal lift filter allows for reducing the created rules to the most important ones. Conclusions: Only a few maintenance requests are sufficient to mine association rules that show links between different infrastructural failures. Due to the unsupervised manner of the proposed algorithm, domain experts need to evaluate the relevance of the specific rules. Nevertheless, the algorithm enables companies to efficiently utilize their data stored in databases to create interpretable rules supporting decision-making.
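The temporal lift filter mentioned above builds on the standard lift measure from association rule mining. A minimal sketch of plain (non-temporal) lift over a handful of invented maintenance-request transactions is shown below; it does not reproduce the paper's temporal or semantic-similarity components.

```python
# Minimal sketch: support and lift for an association rule A -> B over transactions.
# The transactions below are invented maintenance requests reduced to failure tags.

transactions = [
    {"heating", "thermostat"},
    {"heating", "thermostat", "ventilation"},
    {"lighting"},
    {"heating", "ventilation"},
    {"thermostat"},
]

def support(itemset: set) -> float:
    """Fraction of transactions that contain every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def lift(antecedent: set, consequent: set) -> float:
    """lift(A -> B) = support(A | B) / (support(A) * support(B))."""
    return support(antecedent | consequent) / (support(antecedent) * support(consequent))

# A value above 1 suggests the two failures co-occur more often than chance.
print(lift({"heating"}, {"thermostat"}))
```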
APA, Harvard, Vancouver, ISO, and other styles
6

Ismail, Shimaa, AbdelWahab Alsammak, and Tarek Elshishtawy. "Arabic Semantic-Based Textual Similarity." Benha Journal of Applied Sciences 7, no. 4 (2022): 133–42. http://dx.doi.org/10.21608/bjas.2022.254708.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

McCrae, John P., and Paul Buitelaar. "Linking Datasets Using Semantic Textual Similarity." Cybernetics and Information Technologies 18, no. 1 (2018): 109–23. http://dx.doi.org/10.2478/cait-2018-0010.

Full text
Abstract:
Linked data has been widely recognized as an important paradigm for representing data and one of the most important aspects of supporting its use is discovery of links between datasets. For many datasets, there is a significant amount of textual information in the form of labels, descriptions and documentation about the elements of the dataset and the fundament of a precise linking is in the application of semantic textual similarity to link these datasets. However, most linking tools so far rely on only simple string similarity metrics such as Jaccard scores. We present an evaluation of some metrics that have performed well in recent semantic textual similarity evaluations and apply these to linking existing datasets.
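For context, a bare-bones version of threshold-based label linking, using the kind of simple token-overlap score the authors argue should be replaced by semantic textual similarity metrics, might look like the following sketch; the dataset labels and the 0.3 threshold are invented for illustration.

```python
# Minimal sketch of threshold-based label linking between two datasets using a
# plain token-overlap score; the labels and the threshold are invented.

def token_sim(a: str, b: str) -> float:
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

source = ["population of urban areas", "gross domestic product"]
target = ["urban area population", "GDP per capita", "life expectancy"]

links = []
for s in source:
    best = max(target, key=lambda t: token_sim(s, t))   # most similar target label
    if token_sim(s, best) >= 0.3:                        # accept a link only above the threshold
        links.append((s, best))

print(links)  # [('population of urban areas', 'urban area population')]
```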
APA, Harvard, Vancouver, ISO, and other styles
8

McCrae, John P., and Paul Buitelaar. "Linking Datasets Using Semantic Textual Similarity." Cybernetics and Information Technologies 18, no. 1 (2018): 109–23. https://doi.org/10.2478/cait-2018-0010.

Full text
Abstract:
Linked data has been widely recognized as an important paradigm for representing data and one of the most important aspects of supporting its use is discovery of links between datasets. For many datasets, there is a significant amount of textual information in the form of labels, descriptions and documentation about the elements of the dataset and the fundament of a precise linking is in the application of semantic textual similarity to link these datasets. However, most linking tools so far rely on only simple string similarity metrics such as Jaccard scores. We present an evaluation of some metrics that have performed well in recent semantic textual similarity evaluations and apply these to linking existing datasets.
APA, Harvard, Vancouver, ISO, and other styles
9

Rao, N. Srinivas. "Text Summarization Based on Semantic Similarity." International Journal of Scientific Research in Engineering and Management 8, no. 4 (2024): 1–5. http://dx.doi.org/10.55041/ijsrem32218.

Full text
Abstract:
In the contemporary information age, the sheer volume of textual data poses a significant challenge for efficient comprehension and utilization. This project endeavors to address this challenge by developing a Text Summarization System grounded in semantic similarities. The primary goal is to create a robust and intuitive tool that extracts key information from large textual datasets, offering users a concise and meaningful summary. The proposed system employs advanced Natural Language Processing (NLP) techniques to analyze the semantic relationships within the text. Rather than relying solely on syntactic structures, the model identifies and leverages semantic similarities, such as shared concepts, themes, and contextual relationships, to distill the essential content. This approach enhances the summarization process by ensuring that the generated summaries reflect a deeper understanding of the underlying semantics, thereby capturing the core meaning of the text. Throughout the development of this project, the B.Tech student will delve into the intricacies of semantic analysis, exploring techniques to recognize and prioritize key concepts. The system's effectiveness will be evaluated through rigorous testing on diverse textual datasets, assessing its ability to generate coherent and relevant summaries across various domains. This project not only contributes to the field of NLP but also has practical applications in information retrieval, document summarization, and content curation. By providing an innovative solution to the challenges of information overload, the Text Summarization System based on semantic similarities offers a valuable tool for enhancing efficiency in information processing and decision-making. Index terms: Text Summarization, Semantic Similarities, Natural Language Processing (NLP), Semantic Analysis, Information Retrieval, Document Summarization, Content Curation, Information Overload, Decision Making, Textual Data Analysis, Key Concept Recognition, Conceptual Relationships, Syntactic Structures, Semantic Understanding, Textual Dataset Evaluation.
APA, Harvard, Vancouver, ISO, and other styles
10

Luo, Jiajia, Hongtao Shan, Gaoyu Zhang, et al. "Exploiting Syntactic and Semantic Information for Textual Similarity Estimation." Mathematical Problems in Engineering 2021 (January 23, 2021): 1–12. http://dx.doi.org/10.1155/2021/4186750.

Full text
Abstract:
The textual similarity task, which measures the similarity between two text pieces, has recently received much attention in the natural language processing (NLP) domain. However, due to the vagueness and diversity of language expression, only considering semantic or syntactic features, respectively, may cause the loss of critical textual knowledge. This paper proposes a new type of structure tree for sentence representation, which exploits both syntactic (structural) and semantic information known as the weight vector dependency tree (WVD-tree). WVD-tree comprises structure trees with syntactic information along with word vectors representing semantic information of the sentences. Further, Gaussian attention weight is proposed for better capturing important semantic features of sentences. Meanwhile, we design an enhanced tree kernel to calculate the common parts between two structures for similarity judgment. Finally, WVD-tree is tested on widely used semantic textual similarity tasks. The experimental results prove that WVD-tree can effectively improve the accuracy of sentence similarity judgments.
APA, Harvard, Vancouver, ISO, and other styles

Dissertations / Theses on the topic "Interpretable Textual Semantic Similarity"

1

Vo, Ngoc Phuoc An. "Contributions to Semantic Textual Similarity Algorithms." Doctoral thesis, Università degli studi di Trento, 2016. https://hdl.handle.net/11572/369262.

Full text
Abstract:
Similarity plays a central role in the language understanding process. However, it is always difficult to precisely define on which type of data and with what similarity metrics we can assess the similarity of two texts. In this spirit, the Semantic Textual Similarity (STS) task was introduced as a pilot task at the Semantic Evaluation (SemEval) workshop in 2012. This thesis investigates the variation in performance of STS systems with respect to heterogeneous data sources, and finds solutions to alleviate this variation and improve system performance. We carry out a series of works addressing different aspects of measuring semantic similarity for texts under the umbrella of the Semantic Textual Similarity task. Firstly, we analyze the variance of system performance on different corpora with preliminary experiments and propose the hypothesis that system performance depends heavily on the type of training and test corpora coming from heterogeneous sources. We analyze a standard textual similarity model built on vectorial representations and derive a couple of modalities which help significantly alleviate the negative influence of the vectorial mapping model. In particular, we study how structural information and the most advanced word alignment models in Machine Translation improve the accuracy of systems. Our analysis also leads us to carry out, for the first time, an analysis between Semantic Relatedness and Textual Entailment, and we propose a co-learning model to improve the accuracy on both tasks by exploiting their mutual relationship. As a result, all these steps lead to a consistent improvement over the standard model which is manifested across corpora. The evaluation shows that our system systematically achieves and goes beyond the former state of the art, while also reducing the variation of accuracy across various types of corpora.
APA, Harvard, Vancouver, ISO, and other styles
2

Vo, Ngoc Phuoc An. "Contributions to Semantic Textual Similarity Algorithms." Doctoral thesis, University of Trento, 2016. http://eprints-phd.biblio.unitn.it/1735/1/PhD-Thesis_VO.pdf.

Full text
Abstract:
Similarity plays a central role in the language understanding process. However, it is always difficult to precisely define on which type of data and with what similarity metrics we can assess the similarity of two texts. In this spirit, the Semantic Textual Similarity (STS) task was introduced as a pilot task at the Semantic Evaluation (SemEval) workshop in 2012. This thesis investigates the variation in performance of STS systems with respect to heterogeneous data sources, and finds solutions to alleviate this variation and improve system performance. We carry out a series of works addressing different aspects of measuring semantic similarity for texts under the umbrella of the Semantic Textual Similarity task. Firstly, we analyze the variance of system performance on different corpora with preliminary experiments and propose the hypothesis that system performance depends heavily on the type of training and test corpora coming from heterogeneous sources. We analyze a standard textual similarity model built on vectorial representations and derive a couple of modalities which help significantly alleviate the negative influence of the vectorial mapping model. In particular, we study how structural information and the most advanced word alignment models in Machine Translation improve the accuracy of systems. Our analysis also leads us to carry out, for the first time, an analysis between Semantic Relatedness and Textual Entailment, and we propose a co-learning model to improve the accuracy on both tasks by exploiting their mutual relationship. As a result, all these steps lead to a consistent improvement over the standard model which is manifested across corpora. The evaluation shows that our system systematically achieves and goes beyond the former state of the art, while also reducing the variation of accuracy across various types of corpora.
APA, Harvard, Vancouver, ISO, and other styles
3

Gaona, Miguel Angel Rios. "Methods for measuring semantic similarity of texts." Thesis, University of Wolverhampton, 2014. http://hdl.handle.net/2436/346894.

Full text
Abstract:
Measuring semantic similarity is a task needed in many Natural Language Processing (NLP) applications. For example, in Machine Translation evaluation, semantic similarity is used to assess the quality of the machine translation output by measuring the degree of equivalence between a reference translation and the machine translation output. The problem of semantic similarity (Corley and Mihalcea, 2005) is defined as measuring and recognising semantic relations between two texts. Semantic similarity covers different types of semantic relations, mainly bidirectional and directional. This thesis proposes new methods to address the limitations of existing work on both types of semantic relations. Recognising Textual Entailment (RTE) is a directional relation where a text T entails the hypothesis H (entailment pair) if the meaning of H can be inferred from the meaning of T (Dagan and Glickman, 2005; Dagan et al., 2013). Most of the RTE methods rely on machine learning algorithms. de Marneffe et al. (2006) propose a multi-stage architecture where a first stage determines an alignment between the T-H pairs, to be followed by an entailment decision stage. A limitation of such approaches is that instead of recognising a non-entailment, an alignment that fits an optimisation criterion will be returned, but the alignment by itself is a poor predictor of non-entailment. We propose an RTE method following a multi-stage architecture, where both stages are based on semantic representations. Furthermore, instead of using simple similarity metrics to predict the entailment decision, we use a Markov Logic Network (MLN). The MLN is based on rich relational features extracted from the output of the predicate-argument alignment structures between T-H pairs. This MLN learns to reward pairs with similar predicates and similar arguments, and penalise pairs otherwise. The proposed methods show promising results. A source of errors was found to be the alignment step, which has low coverage. However, we show that when an alignment is found, the relational features improve the final entailment decision. The task of Semantic Textual Similarity (STS) (Agirre et al., 2012) is defined as measuring the degree of bidirectional semantic equivalence between a pair of texts. The STS evaluation campaigns use datasets that consist of pairs of texts from NLP tasks such as Paraphrasing and Machine Translation evaluation. Methods for STS are commonly based on computing similarity metrics between the pair of sentences, where the similarity scores are used as features to train regression algorithms. Existing methods for STS achieve high performance over certain tasks, but poor results over others, particularly on unknown (surprise) tasks. Our solution to alleviate this unbalanced performance is to model STS in the context of Multi-task Learning using Gaussian Processes (MTL-GP) (Álvarez et al., 2012) and state-of-the-art STS features (Šarić et al., 2012). We show that the MTL-GP outperforms previous work on the same datasets.
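The pattern described at the end of this abstract, similarity scores used as features for a regression model that predicts an STS score, can be sketched roughly as follows; the two features, the training pairs and the gold scores are invented, and the thesis itself uses far richer features and Gaussian-process models rather than plain linear regression.

```python
# Rough sketch: pairwise similarity features feeding a regression model that predicts
# an STS score. The features, sentence pairs and gold scores below are invented.
from sklearn.linear_model import LinearRegression

def features(s1: str, s2: str) -> list:
    t1, t2 = set(s1.lower().split()), set(s2.lower().split())
    jaccard = len(t1 & t2) / len(t1 | t2) if t1 | t2 else 0.0
    length_ratio = min(len(t1), len(t2)) / max(len(t1), len(t2))
    return [jaccard, length_ratio]

train_pairs = [
    ("a man plays guitar", "a man is playing a guitar", 4.5),
    ("a dog runs in the park", "stock prices fell sharply", 0.2),
    ("the cat sleeps", "a cat is sleeping", 4.0),
]
X = [features(a, b) for a, b, _ in train_pairs]
y = [score for _, _, score in train_pairs]

model = LinearRegression().fit(X, y)
print(model.predict([features("a woman plays piano", "a woman is playing the piano")]))
```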
APA, Harvard, Vancouver, ISO, and other styles
4

Nanda, Rohan. "Automated Identification of National Implementations of European Union Directives with Multilingual Information Retrieval based on Semantic Textual Similarity." Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2019. http://amsdottorato.unibo.it/8978/3/NANDA_ROHAN_tesi.pdf.

Full text
Abstract:
The effective transposition of European Union (EU) directives into Member States is important to achieve the policy goals defined in the Treaties and secondary legislation. National Implementing Measures (NIMs) are the legal texts officially adopted by the Member States to transpose the provisions of an EU directive. The measures undertaken by the Commission to monitor NIMs are time-consuming and expensive, as they resort to manual conformity checking studies and legal analysis. In this thesis, we developed a legal information retrieval system using semantic textual similarity techniques to automatically identify the transposition of EU directives into the national law at a fine-grained provision level. We modeled and developed various text similarity approaches such as lexical, semantic, knowledge-based, embeddings-based and concept-based methods. The text similarity systems utilized both textual features (tokens, N-grams, topic models, word and paragraph embeddings) and semantic knowledge from external knowledge bases (EuroVoc, IATE and Babelfy) to identify transpositions. This thesis work also involved the development of a multilingual corpus of 43 directives and their corresponding NIMs from Ireland (English legislation), Italy (Italian legislation) and Luxembourg (French legislation) to validate the text similarity based information retrieval system. A gold standard mapping (prepared by two legal researchers) between directive articles and NIM provisions was prepared to evaluate the various text similarity models. The results show that the lexical and semantic text similarity techniques were more effective in identifying transpositions as compared to the embeddings-based techniques. We also observed that the unsupervised text similarity techniques had the best performance in case of the Luxembourg Directive-NIM corpus.
APA, Harvard, Vancouver, ISO, and other styles
5

Morbieu, Stanislas. "Leveraging textual embeddings for unsupervised learning." Electronic Thesis or Diss., Université Paris Cité, 2020. http://www.theses.fr/2020UNIP5191.

Full text
Abstract:
Textual data is ubiquitous and is a useful information pool for many companies. In particular, the web provides an almost inexhaustible source of textual data that can be used for recommendation systems, business or technological watch, information retrieval, etc. Recent advances in natural language processing have made it possible to capture the meaning of words in their context in order to improve automatic translation systems, text summarization, or the classification of documents according to predefined categories. However, the majority of these applications often rely on significant human intervention to annotate corpora: this annotation consists, for example in the context of supervised classification, in providing algorithms with examples of assigning categories to documents. The algorithm therefore learns to reproduce human judgment in order to apply it to new documents. The object of this thesis is to take advantage of these latest advances, which capture the semantics of text, and to use them in an unsupervised framework. The contributions of this thesis revolve around three main axes. First, we propose a method to transfer the information captured by a neural network to the co-clustering of documents and words. Co-clustering consists in partitioning the two dimensions of a data matrix simultaneously, thus forming both groups of similar documents and groups of coherent words. This facilitates the interpretation of a large corpus of documents, since it is possible to characterize groups of documents by groups of words, thus summarizing a large volume of text. More precisely, we train the Paragraph Vectors algorithm on an augmented dataset by varying the different hyperparameters, cluster the documents from the different vector representations and apply a consensus algorithm on the different partitions. A constrained co-clustering of the co-occurrence matrix between terms and documents is then applied to maintain the consensus partitioning. This method results in significantly better quality of document partitioning on various document corpora and provides the advantage of the interpretation offered by co-clustering. Secondly, we present a method for evaluating co-clustering algorithms by exploiting vector representations of words called word embeddings. Word embeddings are vectors constructed using large volumes of text, one major characteristic of which is that two semantically close words have word embeddings that are close by cosine distance. Our method makes it possible to measure the match between the partition of the documents and the partition of the words, thus offering, in a totally unsupervised setting, a measure of the quality of the co-clustering. Thirdly, we are interested in recommending classified ads. We present a system that recommends similar classified ads when a user consults one. The descriptions of classified ads are often short, syntactically incorrect, and the use of synonyms makes it difficult for traditional systems to accurately measure semantic similarity. In addition, the high renewal rate of classified ads that are still valid (product not sold) calls for choices that keep computation time low. Our method, simple to implement, responds to this use case and is again based on word embeddings. Their use has advantages but also involves some difficulties: the creation of such vectors requires choosing the values of some parameters, and the difference between the corpus on which the word embeddings were built and the one on which they are used raises the problem of out-of-vocabulary words, which have no vector representation. To overcome these problems, we present an analysis of the impact of the different parameters on word embeddings as well as a study of methods for dealing with out-of-vocabulary words.
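Two practical points raised here, that semantic closeness between words is read off a cosine measure over word embeddings and that out-of-vocabulary words have no vector at all, can be pictured with the small sketch below; the three-dimensional toy vectors are invented, and a real system would load pretrained embeddings instead.

```python
# Toy sketch: cosine similarity over averaged word embeddings, skipping
# out-of-vocabulary words. The 3-dimensional vectors below are invented.
import math

embeddings = {
    "car":    [0.90, 0.10, 0.00],
    "auto":   [0.85, 0.15, 0.05],
    "banana": [0.00, 0.20, 0.90],
}

def sentence_vector(sentence: str):
    vecs = [embeddings[w] for w in sentence.lower().split() if w in embeddings]
    if not vecs:                       # every word was out-of-vocabulary
        return None
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

def cosine(u, v) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

v1, v2 = sentence_vector("car"), sentence_vector("auto")
print(cosine(v1, v2))                        # close to 1: semantically near
print(sentence_vector("unseen words only"))  # None: all words out of vocabulary
```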
APA, Harvard, Vancouver, ISO, and other styles
6

Guzmán, Alejandro Tarafa. "Modelo para sumarização computacional de textos científicos." Universidade de São Paulo, 2017. http://www.teses.usp.br/teses/disponiveis/3/3139/tde-10082017-145217/.

Full text
Abstract:
In this work, a model is proposed for the computational extractive summarization of scientific papers in English. Its methodology is based on a semantic textual similarity module for evaluating the equivalence between sentences, specially developed to integrate the summarization model. A variable-width sliding window facilitates the application of this similarity module to detect semantic equivalence between phrases in the article and those in a lexicon of typical phrases attributable to a basic structure of the articles. The summaries obtained in applications of the model show reasonable, usable quality for the purpose of anticipating the information contained in the papers.
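The variable-width sliding window described above can be pictured with the following sketch, which slides a window of one to three sentences over an article and checks each window against a small lexicon of typical phrases; the overlap-based similarity and the sample texts are stand-ins, not the module developed in the thesis.

```python
# Sketch: a sliding window of 1-3 sentences checked against a lexicon of typical
# phrases; windows whose best match clears a threshold are kept for the summary.
# The overlap-based similarity, the threshold and the sample texts are stand-ins.

def sim(a: str, b: str) -> float:
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

lexicon = ["this paper proposes a method", "the results show that"]
article = [
    "this paper proposes a summarization method",
    "data were collected from three sources",
    "the results show that the approach is effective",
]

selected = []
for width in (1, 2, 3):                                    # variable window width
    for start in range(len(article) - width + 1):
        window = " ".join(article[start:start + width])
        best = max(sim(window, phrase) for phrase in lexicon)
        if best >= 0.5 and window not in selected:
            selected.append(window)

print(selected)
```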
APA, Harvard, Vancouver, ISO, and other styles
7

Ferrero, Jérémy. "Similarités textuelles sémantiques translingues : vers la détection automatique du plagiat par traduction." Thesis, Université Grenoble Alpes (ComUE), 2017. http://www.theses.fr/2017GREAM088/document.

Full text
Abstract:
The massive amount of documents available through the Internet (e.g. web pages, data warehouses and digital or transcribed texts) makes the recycling of ideas easier. Unfortunately, this phenomenon is accompanied by an increase in plagiarism cases. Indeed, claiming ownership of content, without the consent of its author and without crediting its source, and presenting it as new and original, is considered plagiarism. In addition, the expansion of the Internet, which facilitates access to documents throughout the world (written in foreign languages) as well as increasingly efficient (and freely available) machine translation tools, contributes to the spread of a new kind of plagiarism: cross-language plagiarism. Cross-language plagiarism means plagiarism by translation, i.e. a text has been plagiarized while being translated (manually or automatically) from its original language into the language of the document in which the plagiarist wishes to include it. While prevention of plagiarism is an active field of research and development, it covers mostly monolingual comparison techniques. This thesis is a joint work between an academic laboratory (LIG) and Compilatio (a software publishing company of solutions for plagiarism detection), and proposes cross-lingual semantic textual similarity measures, an important sub-task of cross-language plagiarism detection. After defining plagiarism and the different concepts discussed during this thesis, we present a state of the art of the different cross-language plagiarism detection approaches. We also present the preexisting corpora for cross-language plagiarism detection and show their limits. Then we describe how we gathered and built a new dataset, which does not suffer from most of the limits encountered by the preexisting corpora. Using this new dataset, we conduct a rigorous evaluation of several state-of-the-art methods and discover that they behave differently according to certain characteristics of the texts on which they operate. We next present new methods for measuring cross-lingual semantic textual similarities based on word embeddings. We also propose a notion of morphosyntactic and frequency weighting of words, which can be used both within a vector and within a bag of words, and we show that its introduction in the new methods increases their respective performance. Then we test different fusion systems (mostly based on linear regression). Our experiments show that we obtain better results than the state of the art in all the sub-corpora studied. We conclude by presenting and discussing the results of these methods obtained during our participation in the cross-lingual Semantic Textual Similarity (STS) task of SemEval-2017, where we ranked 1st on the sub-task that best corresponds to Compilatio's use-case scenario.
APA, Harvard, Vancouver, ISO, and other styles
8

Silva, Allan de Barcelos. "O uso de recursos linguísticos para mensurar a semelhança semântica entre frases curtas através de uma abordagem híbrida." Universidade do Vale do Rio dos Sinos, 2017. http://www.repositorio.jesuita.org.br/handle/UNISINOS/6974.

Full text
Abstract:
Assessing Semantic Textual Similarity (STS) is one of the challenges in Natural Language Processing (NLP) and plays an increasingly important role in related applications. STS is a fundamental part of techniques and approaches in several areas, such as information retrieval, text classification, document clustering, translation, duplicate detection, and others. The literature describes experimentation almost exclusively for the English language, with priority given to probabilistic resources, while linguistic ones are explored only in an incipient way. Linguistics plays a fundamental role in the analysis of semantic textual similarity between short sentences, because exclusively probabilistic approaches fail in some cases (e.g. identification of closely or distantly related sentences, anaphora) due to a lack of understanding of the language, a problem aggravated by the small amount of information available in short sentences. It is therefore vital to identify and apply linguistic resources to better understand what makes two or more sentences similar or not. The current work presents a hybrid approach in which distributed, lexical and linguistic aspects are all used in the evaluation of semantic textual similarity between short sentences in Brazilian Portuguese. We evaluated the proposed approach on well-known datasets from the literature (PROPOR 2016) and obtained good results.
APA, Harvard, Vancouver, ISO, and other styles
9

Hrinčár, Peter. "Použití neuronových sítí pro určení sémantické podobnosti dvou vět." Master's thesis, 2017. http://www.nusl.cz/ntk/nusl-355652.

Full text
Abstract:
Figuring out the degree of semantic similarity between two sentences is important for many practical applications of natural language processing. The goal is to determine the similarity of sentences on a scale from "sentences are unrelated" to "sentences are equivalent". In this thesis we examined the application of different neural network architectures to solve this problem. We proposed models based on recurrent neural networks, which convert a text sequence into a constant-sized vector. We followed up with a suitable representation of unknown words. Our experiments showed that simple architectures achieved better results on the dataset used. We see a future extension of this thesis in using a bigger training dataset.
APA, Harvard, Vancouver, ISO, and other styles
10

Santos, José Pedro Pessoa dos. "Exploração de técnicas para a Resposta Automática a Perguntas por Agentes Conversacionais." Master's thesis, 2019. http://hdl.handle.net/10316/87897.

Full text
Abstract:
This thesis introduces the subject of Question Answering by Conversational Agents, whose interest has been rising over the past few years due to their importance in creating a relationship between consumers and the products they use, for example through personal assistants, customer support websites, among others. In a first phase, a set of models to compute the Semantic Textual Similarity between sentences in Portuguese was developed with the aim of mapping questions from a user to their corresponding responses. These models require the extraction of textual features between pairs of sentences in order to train a variety of machine learning algorithms that assign them a single similarity value. The evaluation of these models resorted to the ASSIN 2016 task collection, and, although they did not reach state-of-the-art performance, they were on par with the results obtained by the best participating teams. In a second phase, the best-performing model was integrated into a domain-specific dialogue agent as its search engine. This agent is capable of identifying out-of-domain interactions and responding to them using a set of movie subtitles, which makes the conversation feel more natural. In order to test how well the agent performed, a set of variations of the questions in the agent's knowledge base was created. These allowed quantifying the number of correct responses in a more realistic conversational environment. The results were promising and substantially superior to the baselines developed.
APA, Harvard, Vancouver, ISO, and other styles

Book chapters on the topic "Interpretable Textual Semantic Similarity"

1

Majumder, Goutam, Partha Pakray, and David Eduardo Pinto Avendaño. "Interpretable Semantic Textual Similarity Using Lexical and Cosine Similarity." In Social Transformation – Digital Way. Springer Singapore, 2018. http://dx.doi.org/10.1007/978-981-13-1343-1_59.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Fakouri-Kapourchali, Roghayeh, Mohammad-Ali Yaghoub-Zadeh-Fard, and Mehdi Khalili. "Semantic Textual Similarity as a Service." In Service Research and Innovation. Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-76587-7_14.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Kazuła, Maciej, and Marek Kozłowski. "Semantic Textual Similarity Using Various Approaches." In Studies in Big Data. Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-30315-4_5.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Vázquez, Sonia, Zornitsa Kozareva, and Andrés Montoyo. "Textual Entailment Beyond Semantic Similarity Information." In Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2006. http://dx.doi.org/10.1007/11925231_86.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Svoboda, Lukáš, and Tomáš Brychcín. "Czech Dataset for Semantic Textual Similarity." In Text, Speech, and Dialogue. Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-030-00794-2_23.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Kairaldeen, Ammar Riadh, and Gonenc Ercan. "Calculation of Textual Similarity Using Semantic Relatedness Functions." In Computational Linguistics and Intelligent Text Processing. Springer International Publishing, 2015. http://dx.doi.org/10.1007/978-3-319-18117-2_38.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Lu, Hsuehkuan, Yixin Cao, Lei Hou, and Juanzi Li. "Knowledge-Enhanced Bilingual Textual Representations for Cross-Lingual Semantic Textual Similarity." In Communications in Computer and Information Science. Springer Singapore, 2019. http://dx.doi.org/10.1007/978-981-15-0118-0_33.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Hay, Julien, Tim Van de Cruys, Philippe Muller, Bich-Liên Doan, Fabrice Popineau, and Ouassim Ait-Elhara. "Automatically Selecting Complementary Vector Representations for Semantic Textual Similarity." In Advances in Knowledge Discovery and Management. Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-18129-1_3.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Shajalal, Md, and Masaki Aono. "Semantic Sentence Modeling for Learning Textual Similarity Exploiting LSTM." In Cyber Security and Computer Science. Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-52856-0_34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Do, Heejin, and Gary Geunbae Lee. "Aspect-Based Semantic Textual Similarity for Educational Test Items." In Lecture Notes in Computer Science. Springer Nature Switzerland, 2024. http://dx.doi.org/10.1007/978-3-031-64299-9_30.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Interpretable Textual Semantic Similarity"

1

Tu, Jingxuan, Keer Xu, Liulu Yue, Bingyang Ye, Kyeongmin Rim, and James Pustejovsky. "Linguistically Conditioned Semantic Textual Similarity." In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, 2024. http://dx.doi.org/10.18653/v1/2024.acl-long.64.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Al-Saiyd, Nedhal A., and Intisar A. Al-Sayed. "Semantic Textual Similarity (STS) in Arabic using Lexical-Semantic Analysis." In 2024 25th International Arab Conference on Information Technology (ACIT). IEEE, 2024. https://doi.org/10.1109/acit62805.2024.10877260.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Li, Xianming, and Jing Li. "AoE: Angle-optimized Embeddings for Semantic Textual Similarity." In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, 2024. http://dx.doi.org/10.18653/v1/2024.acl-long.101.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Yang, Shanshan, Steve Yang, and Feng Mai. "Financial Semantic Textual Similarity: A New Dataset and Model." In 2024 IEEE Symposium on Computational Intelligence for Financial Engineering and Economics (CIFEr). IEEE, 2024. https://doi.org/10.1109/cifer62890.2024.10772793.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Zhang, Bowen, and Chunping Li. "Pcc-tuning: Breaking the Contrastive Learning Ceiling in Semantic Textual Similarity." In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2024. http://dx.doi.org/10.18653/v1/2024.emnlp-main.791.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Rep, Ivan, David Dukić, and Jan Šnajder. "Are ELECTRA’s Sentence Embeddings Beyond Repair? The Case of Semantic Textual Similarity." In Findings of the Association for Computational Linguistics: EMNLP 2024. Association for Computational Linguistics, 2024. http://dx.doi.org/10.18653/v1/2024.findings-emnlp.535.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Sasoko, Wasis Haryo, Arief Setyanto, Kusrini, and Rodrigo Martinez-Bejar. "Comparative Study and Evaluation of Machine Learning Models for Semantic Textual Similarity." In 2024 8th International Conference on Information Technology, Information Systems and Electrical Engineering (ICITISEE). IEEE, 2024. http://dx.doi.org/10.1109/icitisee63424.2024.10730053.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Danish, Syed Muhammad Hasan, Syed Muhammad Ejaz Hasnain, Hamza Ashraf, and Rukaiya Rukaiya. "Comparative Analysis of BERT and TF-IDF for Textual Semantic Similarity Assessment." In 2024 26th International Multitopic Conference (INMIC). IEEE, 2024. https://doi.org/10.1109/inmic64792.2024.11004377.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Rajagukguk, Rio Chandra, and Masayu Leylia Khodra. "Interpretable Semantic Textual Similarity for Indonesian Sentence." In 2018 5th International Conference on Advanced Informatics: Concept Theory and Applications (ICAICTA). IEEE, 2018. http://dx.doi.org/10.1109/icaicta.2018.8541297.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Agirre, Eneko, Aitor Gonzalez-Agirre, Inigo Lopez-Gazpio, Montse Maritxalar, German Rigau, and Larraitz Uria. "SemEval-2016 Task 2: Interpretable Semantic Textual Similarity." In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016). Association for Computational Linguistics, 2016. http://dx.doi.org/10.18653/v1/s16-1082.

Full text
APA, Harvard, Vancouver, ISO, and other styles