
Journal articles on the topic 'Interpretable Textual Semantic Similarity'


Consult the top 50 journal articles for your research on the topic 'Interpretable Textual Semantic Similarity.'


1

Abafogi, Abdo Ababor. "Survey on Interpretable Semantic Textual Similarity, and its Applications." International Journal of Innovative Technology and Exploring Engineering (IJITEE) 10, no. 3 (2021): 14–18. https://doi.org/10.35940/ijitee.B8294.0110321.

Abstract:
Both semantic representation and related natural language processing (NLP) tasks have become more popular due to the introduction of distributional semantics. Semantic textual similarity (STS) is an NLP task that determines the similarity between two short texts (sentences) based on their meanings. Interpretable STS adds an explanation of the semantic similarity between the short texts. Humans can readily provide such interpretations, but constructing computational models that explain at a human level is challenging. The interpretable STS task outputs a continuous value on the scale [0, 5] representing the strength of the semantic relation between a sentence pair, where 0 means no similarity and 5 means complete similarity. This paper reviews the available methods used for interpretable STS computation, classifies them, specifies existing limitations, and gives directions for future work. The survey is organized into nine sections: a brief introduction; chunking techniques and available tools; rule-based approaches; machine learning approaches; neural network approaches; and hybrid approaches. Applications of interpretable STS, conclusions, and future directions are also part of this paper.
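To make the task concrete: an interpretable STS system aligns chunks of the two sentences and attaches a relation label and a [0, 5] score to each alignment. A minimal sketch, using word overlap as a stand-in for the richer lexical and semantic evidence the surveyed systems use (the labels follow the SemEval interpretable STS convention; the scoring rule here is purely illustrative):

```python
def score_alignment(chunk_a, chunk_b):
    # Jaccard word overlap mapped onto the task's [0, 5] scale;
    # real systems combine much richer evidence than surface overlap.
    a, b = set(chunk_a.lower().split()), set(chunk_b.lower().split())
    overlap = len(a & b) / len(a | b)
    score = round(5 * overlap, 1)
    label = "EQUI" if score == 5 else "SIMI" if score > 0 else "NOALI"
    return score, label

print(score_alignment("a red car", "a red car"))   # (5.0, 'EQUI')
print(score_alignment("a red car", "a blue car"))  # partial overlap
```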
3

Lopez-Gazpio, I., M. Maritxalar, A. Gonzalez-Agirre, G. Rigau, L. Uria, and E. Agirre. "Interpretable semantic textual similarity: Finding and explaining differences between sentences." Knowledge-Based Systems 119 (March 2017): 186–99. http://dx.doi.org/10.1016/j.knosys.2016.12.013.

4

Majumder, Goutam, Partha Pakray, Ranjita Das, and David Pinto. "Interpretable semantic textual similarity of sentences using alignment of chunks with classification and regression." Applied Intelligence 51, no. 10 (2021): 7322–49. http://dx.doi.org/10.1007/s10489-020-02144-x.

5

Lowin, Maximilian. "A Text-Based Predictive Maintenance Approach for Facility Management Requests Utilizing Association Rule Mining and Large Language Models." Machine Learning and Knowledge Extraction 6, no. 1 (2024): 233–58. http://dx.doi.org/10.3390/make6010013.

Abstract:
Introduction: Due to the lack of labeled data, applying predictive maintenance algorithms for facility management is cumbersome. Most companies are unwilling to share data or do not have time for annotation. In addition, most available facility management data are text data. Thus, there is a need for an unsupervised predictive maintenance algorithm that is capable of handling textual data. Methodology: This paper proposes applying association rule mining on maintenance requests to identify upcoming needs in facility management. By coupling temporal association rule mining with the concept of semantic similarity derived from large language models, the proposed methodology can discover meaningful knowledge in the form of rules suitable for decision-making. Results: Relying on the large German language models works best for the presented case study. Introducing a temporal lift filter allows for reducing the created rules to the most important ones. Conclusions: Only a few maintenance requests are sufficient to mine association rules that show links between different infrastructural failures. Due to the unsupervised manner of the proposed algorithm, domain experts need to evaluate the relevance of the specific rules. Nevertheless, the algorithm enables companies to efficiently utilize their data stored in databases to create interpretable rules supporting decision-making.
6

Ismail, Shimaa, AbdelWahab Alsammak, and Tarek Elshishtawy. "Arabic Semantic-Based Textual Similarity." Benha Journal of Applied Sciences 7, no. 4 (2022): 133–42. http://dx.doi.org/10.21608/bjas.2022.254708.

7

McCrae, John P., and Paul Buitelaar. "Linking Datasets Using Semantic Textual Similarity." Cybernetics and Information Technologies 18, no. 1 (2018): 109–23. http://dx.doi.org/10.2478/cait-2018-0010.

Abstract:
Linked data has been widely recognized as an important paradigm for representing data and one of the most important aspects of supporting its use is discovery of links between datasets. For many datasets, there is a significant amount of textual information in the form of labels, descriptions and documentation about the elements of the dataset and the fundament of a precise linking is in the application of semantic textual similarity to link these datasets. However, most linking tools so far rely on only simple string similarity metrics such as Jaccard scores. We present an evaluation of some metrics that have performed well in recent semantic textual similarity evaluations and apply these to linking existing datasets.
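For reference, the simple string-similarity baseline the authors contrast with can be sketched as Jaccard overlap, here over character trigrams of two dataset labels (an illustrative variant; linking tools may compute it over tokens instead):

```python
def trigrams(s):
    # Character trigrams of a label, lowercased.
    s = s.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}

def jaccard_link_score(label_a, label_b):
    # Jaccard similarity: |intersection| / |union| of trigram sets.
    a, b = trigrams(label_a), trigrams(label_b)
    return len(a & b) / len(a | b) if a | b else 0.0

score = jaccard_link_score("population", "populace")
```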
9

Rao, N. Srinivas. "Text Summarization Based on Semantic Similarity." International Journal of Scientific Research in Engineering and Management 8, no. 4 (2024): 1–5. http://dx.doi.org/10.55041/ijsrem32218.

Abstract:
In the contemporary information age, the sheer volume of textual data poses a significant challenge for efficient comprehension and utilization. This project endeavors to address this challenge by developing a Text Summarization System grounded in semantic similarities. The primary goal is to create a robust and intuitive tool that extracts key information from large textual datasets, offering users a concise and meaningful summary. The proposed system employs advanced Natural Language Processing (NLP) techniques to analyze the semantic relationships within the text. Rather than relying solely on syntactic structures, the model identifies and leverages semantic similarities, such as shared concepts, themes, and contextual relationships, to distill the essential content. This approach enhances the summarization process by ensuring that the generated summaries reflect a deeper understanding of the underlying semantics, thereby capturing the core meaning of the text. Throughout the development of this project, the B.Tech student will delve into the intricacies of semantic analysis, exploring techniques to recognize and prioritize key concepts. The system's effectiveness will be evaluated through rigorous testing on diverse textual datasets, assessing its ability to generate coherent and relevant summaries across various domains. This project not only contributes to the field of NLP but also has practical applications in information retrieval, document summarization, and content curation. By providing an innovative solution to the challenges of information overload, the Text Summarization System based on semantic similarities offers a valuable tool for enhancing efficiency in information processing and decision-making. 
Index terms: Text Summarization, Semantic Similarities, Natural Language Processing (NLP), Semantic Analysis, Information Retrieval, Document Summarization, Content Curation, Information Overload, Decision Making, Textual Data Analysis, Key Concept Recognition, Conceptual Relationships, Syntactic Structures, Semantic Understanding, Textual Datasets Evaluation.
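The extractive idea described above can be illustrated in a few lines: score each sentence by its average similarity to the rest of the text and keep the most central one. Word overlap stands in here for the semantic similarity measures such a system would employ:

```python
def overlap(a, b):
    # Jaccard word overlap between two sentences.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def summarize(sentences):
    # Extract the sentence most similar, on average, to all others.
    def centrality(s):
        others = [t for t in sentences if t is not s]
        return sum(overlap(s, t) for t in others) / len(others)
    return max(sentences, key=centrality)

summary = summarize([
    "the cat sat on the mat",
    "the cat lay on the mat",
    "dogs bark loudly",
])
```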
10

Luo, Jiajia, Hongtao Shan, Gaoyu Zhang, et al. "Exploiting Syntactic and Semantic Information for Textual Similarity Estimation." Mathematical Problems in Engineering 2021 (January 23, 2021): 1–12. http://dx.doi.org/10.1155/2021/4186750.

Abstract:
The textual similarity task, which measures the similarity between two text pieces, has recently received much attention in the natural language processing (NLP) domain. However, due to the vagueness and diversity of language expression, only considering semantic or syntactic features, respectively, may cause the loss of critical textual knowledge. This paper proposes a new type of structure tree for sentence representation, which exploits both syntactic (structural) and semantic information known as the weight vector dependency tree (WVD-tree). WVD-tree comprises structure trees with syntactic information along with word vectors representing semantic information of the sentences. Further, Gaussian attention weight is proposed for better capturing important semantic features of sentences. Meanwhile, we design an enhanced tree kernel to calculate the common parts between two structures for similarity judgment. Finally, WVD-tree is tested on widely used semantic textual similarity tasks. The experimental results prove that WVD-tree can effectively improve the accuracy of sentence similarity judgments.
11

Zhu, Mingdong, Derong Shen, Lixin Xu, and Xianfang Wang. "Scalable Multi-grained Cross-modal Similarity Query with Interpretability." Data Science and Engineering 6, no. 3 (2021): 280–93. http://dx.doi.org/10.1007/s41019-021-00162-4.

Abstract:
Cross-modal similarity query has become a highlighted research topic for managing multimodal datasets such as images and texts. Existing researches generally focus on query accuracy by designing complex deep neural network models and hardly consider query efficiency and interpretability simultaneously, which are vital properties of cross-modal semantic query processing system on large-scale datasets. In this work, we investigate multi-grained common semantic embedding representations of images and texts and integrate interpretable query index into the deep neural network by developing a novel Multi-grained Cross-modal Query with Interpretability (MCQI) framework. The main contributions are as follows: (1) By integrating coarse-grained and fine-grained semantic learning models, a multi-grained cross-modal query processing architecture is proposed to ensure the adaptability and generality of query processing. (2) In order to capture the latent semantic relation between images and texts, the framework combines LSTM and attention mode, which enhances query accuracy for the cross-modal query and constructs the foundation for interpretable query processing. (3) Index structure and corresponding nearest neighbor query algorithm are proposed to boost the efficiency of interpretable queries. (4) A distributed query algorithm is proposed to improve the scalability of our framework. Comparing with state-of-the-art methods on widely used cross-modal datasets, the experimental results show the effectiveness of our MCQI approach.
12

Alfeo, Antonio L., Mario G. C. A. Cimino, and Gigliola Vaglini. "Technological troubleshooting based on sentence embedding with deep transformers." Journal of Intelligent Manufacturing 32, no. 6 (2021): 1699–710. http://dx.doi.org/10.1007/s10845-021-01797-w.

Abstract:
In today's manufacturing, each technical assistance operation is digitally tracked. This results in a huge amount of textual data that can be exploited as a knowledge base to improve these operations. For instance, an ongoing problem can be addressed by retrieving potential solutions among the ones used to cope with similar problems during past operations. To be effective, most approaches for semantic textual similarity need to be supported by a structured semantic context (e.g. an industry-specific ontology), resulting in high development and management costs. We overcome this limitation with a textual similarity approach featuring three functional modules. The data preparation module provides punctuation and stop-word removal and word lemmatization. The pre-processed sentences undergo the sentence embedding module, based on Sentence-BERT (Bidirectional Encoder Representations from Transformers) and aimed at transforming the sentences into fixed-length vectors. Their cosine similarity is processed by the scoring module to match the expected similarity between the two original sentences. Finally, this similarity measure is employed to retrieve the most suitable recorded solutions for the ongoing problem. The effectiveness of the proposed approach is tested (i) against a state-of-the-art competitor and two well-known textual similarity approaches, and (ii) with two case studies, i.e. private company technical assistance reports and a benchmark dataset for semantic textual similarity. With respect to the state-of-the-art, the proposed approach results in comparable retrieval performance and significantly lower management cost: 30-min questionnaires are sufficient to obtain the semantic context knowledge to be injected into our textual search engine.
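The three-module pipeline can be sketched end to end. The bag-of-words `embed` below is only a stand-in for the Sentence-BERT encoder (which would return a fixed-length dense vector); the cosine scoring and retrieval steps mirror the description above, and the example reports are invented:

```python
import math
from collections import Counter

def embed(sentence):
    # Stand-in for Sentence-BERT: a sparse bag-of-words count vector.
    return Counter(sentence.lower().split())

def cosine(u, v):
    # Cosine similarity between two sparse count vectors.
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve(problem, past_reports):
    # Rank past (problem, solution) reports by similarity to the new problem.
    q = embed(problem)
    return max(past_reports, key=lambda r: cosine(q, embed(r[0])))

reports = [
    ("conveyor belt motor overheating", "replace worn bearing and relubricate"),
    ("printer feed jams on thick paper", "adjust roller tension"),
]
best = retrieve("motor on the conveyor runs hot", reports)
```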
13

Wang, Yuxia, Shimin Tao, Ning Xie, Hao Yang, Timothy Baldwin, and Karin Verspoor. "Collective Human Opinions in Semantic Textual Similarity." Transactions of the Association for Computational Linguistics 11 (2023): 997–1013. http://dx.doi.org/10.1162/tacl_a_00584.

Abstract:
Despite the subjective nature of semantic textual similarity (STS) and pervasive disagreements in STS annotation, existing benchmarks have used averaged human ratings as gold standard. Averaging masks the true distribution of human opinions on examples of low agreement, and prevents models from capturing the semantic vagueness that the individual ratings represent. In this work, we introduce USTS, the first Uncertainty-aware STS dataset with ∼15,000 Chinese sentence pairs and 150,000 labels, to study collective human opinions in STS. Analysis reveals that neither a scalar nor a single Gaussian fits a set of observed judgments adequately. We further show that current STS models cannot capture the variance caused by human disagreement on individual instances, but rather reflect the predictive confidence over the aggregate dataset.
14

Vijayan, Naveen Edapurath. "Enhancing Chatbot Response Relevance through Semantic Similarity Measures." Journal of Artificial Intelligence & Cloud Computing 1, no. 1 (2022): 1–5. http://dx.doi.org/10.47363/jaicc/2022(1)e182.

Abstract:
Semantic similarity measures have shown promise in enhancing natural language understanding by quantifying the likeness between textual elements. This paper investigates the application of semantic similarity measures to improve chatbot response relevance.
15

Liu, Yang, Mengyuan Liu, Shudong Huang, and Jiancheng Lv. "Asymmetric Visual Semantic Embedding Framework for Efficient Vision-Language Alignment." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 6 (2025): 5676–84. https://doi.org/10.1609/aaai.v39i6.32605.

Abstract:
Learning visual semantic similarity is a critical challenge in bridging the gap between images and texts. However, there exist inherent variations between vision and language data, such as information density, i.e., images can contain textual information from multiple different views, which makes it difficult to compute the similarity between these two modalities accurately and efficiently. In this paper, we propose a novel framework called Asymmetric Visual Semantic Embedding (AVSE) to dynamically select features from various regions of images tailored to different textual inputs for similarity calculation. To capture information from different views in the image, we design a radial bias sampling module to sample image patches and obtain image features from various views. Furthermore, AVSE introduces a novel module for efficient computation of visual semantic similarity between asymmetric image and text embeddings. Central to this module is the presumption of foundational semantic units within the embeddings, denoted as "meta-semantic embeddings". It segments all embeddings into meta-semantic embeddings with the same dimension and calculates visual semantic similarity by finding the optimal match of meta-semantic embeddings of two modalities. Our proposed AVSE model is extensively evaluated on the large-scale MS-COCO and Flickr30K datasets, demonstrating its superiority over recent state-of-the-art methods.
16

Sowmya, Vangapelli, Bulusu Vardhan, and Mantena Raju. "Improving Semantic Textual Similarity with Phrase Entity Alignment." International Journal of Intelligent Engineering and Systems 10, no. 4 (2017): 193–204. http://dx.doi.org/10.22266/ijies2017.0831.21.

17

Brychcín, Tomáš. "Linear transformations for cross-lingual semantic textual similarity." Knowledge-Based Systems 187 (January 2020): 104819. http://dx.doi.org/10.1016/j.knosys.2019.06.027.

18

Wang, Chunlin, Irene Castellón, and Elisabet Comelles. "Linguistic analysis of datasets for semantic textual similarity." Digital Scholarship in the Humanities 35, no. 2 (2019): 471–84. http://dx.doi.org/10.1093/llc/fqy076.

Abstract:
Semantic Textual Similarity (STS), which measures the equivalence of meanings between two textual segments, is an important and useful task in Natural Language Processing. In this article, we have analyzed the datasets provided by the Semantic Evaluation (SemEval) 2012–2014 campaigns for this task in order to find out appropriate linguistic features for each dataset, taking into account the influence that linguistic features at different levels (e.g. syntactic constituents and lexical semantics) might have on the sentence similarity. Results indicate that a linguistic feature may have a different effect on different corpora due to the great difference in sentence structure and vocabulary between datasets. Thus, we conclude that the selection of linguistic features according to the genre of the text might be a good strategy for obtaining better results in the STS task. This analysis could be a useful reference for measuring system building and linguistic feature tuning.
19

Hassan, Basma, Samir E. Abdelrahman, Reem Bahgat, and Ibrahim Farag. "UESTS: An Unsupervised Ensemble Semantic Textual Similarity Method." IEEE Access 7 (2019): 85462–82. http://dx.doi.org/10.1109/access.2019.2925006.

20

Wang, Yanshan, Naveed Afzal, Sunyang Fu, et al. "MedSTS: a resource for clinical semantic textual similarity." Language Resources and Evaluation 54, no. 1 (2018): 57–72. http://dx.doi.org/10.1007/s10579-018-9431-1.

21

Zhao, Ying, Tingyu Xia, Yunqi Jiang, and Yuan Tian. "Enhancing inter-sentence attention for Semantic Textual Similarity." Information Processing & Management 61, no. 1 (2024): 103535. http://dx.doi.org/10.1016/j.ipm.2023.103535.

22

Zhou, Yan, Yunhan Zhang, Fangfang Zhang, Yeting Zhang, and Xiaodi Wang. "Trajectory Compression with Spatio-Temporal Semantic Constraints." ISPRS International Journal of Geo-Information 13, no. 6 (2024): 212. http://dx.doi.org/10.3390/ijgi13060212.

Abstract:
Most trajectory compression methods primarily focus on geometric similarity between compressed and original trajectories, lacking explainability of compression results due to ignoring semantic information. This paper proposes a spatio-temporal semantic constrained trajectory compression method. It constructs a new trajectory distance measurement model integrating both semantic and spatio-temporal features. This model quantifies semantic features using information entropy and measures spatio-temporal features with synchronous Euclidean distance. The compression principle is to retain feature points with maximum spatio-temporal semantic distance from the original trajectory until the compression rate is satisfied. Experimental results show these methods closely resemble each other in maintaining geometric similarity of trajectories, but our method significantly outperforms DP, TD-TR, and CascadeSync methods in preserving semantic similarity of trajectories. This indicates that our method considers both geometric and semantic features during compression, resulting in the compressed trajectory becoming more interpretable.
23

Li, Meijing, Xianhe Zhou, Keun Ho Ryu, and Nipon Theera-Umpon. "An Ensemble Semantic Textual Similarity Measure Based on Multiple Evidences for Biomedical Documents." Computational and Mathematical Methods in Medicine 2022 (August 27, 2022): 1–14. http://dx.doi.org/10.1155/2022/8238432.

Abstract:
With the increasing volume of the published biomedical literature, the fast and effective retrieval of the literature on the sequence, structure, and function of biological entities is an essential task for the rapid development of biology and medicine. To capture the semantic information in biomedical literature more effectively when biomedical documents are clustered, we propose a new multi-evidence-based semantic text similarity calculation method. Two semantic similarities and one content similarity are used, in which two semantic similarities include MeSH-based semantic similarity and word embedding-based semantic similarity. To fuse three different similarities more effectively, after, respectively, calculating two semantic and one content similarities between biomedical documents, feedforward neural network is applied to integrate the two semantic similarities. Finally, weighted linear combination method is used to integrate the semantic and content similarities. To evaluate the effectiveness, the proposed method is compared with the existing basic methods, and the proposed method outperforms the existing related methods. Based on the proven results of this study, this method can be used not only in actual biological or medical experiments such as protein sequence or function analysis but also in biological and medical research fields, which will help to provide, use, and understand thematically consistent documents.
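The fusion step described above reads as follows in outline: the two semantic similarities are merged first (the paper trains a feedforward network for this; a fixed average stands in below), and the result is combined with the content similarity by a weighted linear combination. The weights here are illustrative, not the paper's learned values:

```python
def fuse(mesh_sim, embed_sim, content_sim, w_semantic=0.6):
    # Stand-in for the feedforward fusion of the two semantic similarities.
    semantic = 0.5 * mesh_sim + 0.5 * embed_sim
    # Weighted linear combination of semantic and content similarity.
    return w_semantic * semantic + (1 - w_semantic) * content_sim

combined = fuse(0.8, 0.6, 0.5)
```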
24

Mansoor, Muhammad, Zahoor ur Rehman, Muhammad Shaheen, Muhammad Attique Khan, and Mohamed Habib. "Deep Learning based Semantic Similarity Detection using Text Data." Information Technology And Control 49, no. 4 (2020): 495–510. http://dx.doi.org/10.5755/j01.itc.49.4.27118.

Abstract:
Similarity detection in the text is the main task for a number of Natural Language Processing (NLP) applications. As textual data is comparatively large in quantity and huge in volume than the numeric data, therefore measuring textual similarity is one of the important problems. Most of the similarity detection algorithms are based upon word to word matching, sentence/paragraph matching, and matching of the whole document. In this research, a novel approach is proposed using deep learning models, combining Long Short Term Memory network (LSTM) with Convolutional Neural Network (CNN) for measuring semantics similarity between two questions. The proposed model takes sentence pairs as input to measure the similarity between them. The model is tested on publicly available Quora’s dataset. The model in comparison to the existing techniques gave 87.50 % accuracy which is better than the previous approaches.
25

Hendre, Manik, Prasenjit Mukherjee, Raman Preet, and Manish Godse. "Efficacy of Deep Neural Embeddings-Based Semantic Similarity in Automatic Essay Evaluation." International Journal of Cognitive Informatics and Natural Intelligence 17, no. 1 (2023): 1–14. http://dx.doi.org/10.4018/ijcini.323190.

Abstract:
Semantic similarity is used extensively for understanding the context and meaning of the text data. In this paper, use of the semantic similarity in an automatic essay evaluation system is proposed. Different text embedding methods are used to compute the semantic similarity. Recent neural embedding methods including Google sentence encoder (GSE), embeddings for language models (ELMo), and global vectors (GloVe) are employed for computing the semantic similarity. Traditional methods of textual data representation such as TF-IDF and Jaccard index are also used in finding the semantic similarity. Experimental analysis of an intra-class and inter-class semantic similarity score distributions shows that the GSE outperforms other methods by accurately distinguishing essays from the same or different set/topic. Semantic similarity calculated using the GSE method is further used for finding the correlation with human rated essay scores, which shows high correlation with the human-rated scores on various essay traits.
26

Dietze, Stefan, Alessio Gugliotta, John Domingue, and Michael Mrissa. "Mediation Spaces for Similarity-Based Semantic Web Services Selection." International Journal of Web Services Research 8, no. 1 (2011): 1–20. http://dx.doi.org/10.4018/jwsr.2011010101.

Abstract:
Semantic Web Services (SWS) aim at the automated discovery, selection and orchestration of Web services based on comprehensive, machine-interpretable semantic descriptions. The latter are, in principle, deployed by multiple possible actors (i.e., service providers and service consumers); thus, a high level of heterogeneity between distinct SWS annotations is expected. Therefore, mediation between concurrent semantic representations of services is a key requirement to fully implement the SWS vision. In this paper, the authors argue that “semantic-level mediation” is necessary to identify semantic similarities across distinct SWS representations. The authors formalize and implement a mediation approach based on “Mediation Spaces” (MS), which enables the implicit representation of semantic similarities among distinct SWS descriptions. As a result, given a specific SWS approach and the proposed MS, a general purpose algorithm is implemented to empower SWS selection with the automatic computation of semantic similarities between a given SWS request and a set of SWS offers. A prototypical application illustrates the approach and highlights the benefits w.r.t. current mediation approaches.
27

Silva, Vivian S., André Freitas, and Siegfried Handschuh. "Exploring Knowledge Graphs in an Interpretable Composite Approach for Text Entailment." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 7023–30. http://dx.doi.org/10.1609/aaai.v33i01.33017023.

Abstract:
Recognizing textual entailment is a key task for many semantic applications, such as Question Answering, Text Summarization, and Information Extraction, among others. Entailment scenarios can range from a simple syntactic variation to more complex semantic relationships between pieces of text, but most approaches try a one-size-fits-all solution that usually favors some scenario to the detriment of another. We propose a composite approach for recognizing text entailment which analyzes the entailment pair to decide whether it must be resolved syntactically or semantically. We also make the answer interpretable: whenever an entailment is solved semantically, we explore a knowledge base composed of structured lexical definitions to generate natural language humanlike justifications, explaining the semantic relationship holding between the pieces of text. Besides outperforming well-established entailment algorithms, our composite approach gives an important step towards Explainable AI, using world knowledge to make the semantic reasoning process explicit and understandable.
28

Han, Jin, and Liang Yang. "Sentence Embedding Generation Framework Based on Kullback–Leibler Divergence Optimization and RoBERTa Knowledge Distillation." Mathematics 12, no. 24 (2024): 3990. https://doi.org/10.3390/math12243990.

Full text
Abstract:
In natural language processing (NLP) tasks, computing semantic textual similarity (STS) is crucial for capturing nuanced semantic differences in text. Traditional word vector methods, such as Word2Vec and GloVe, as well as deep learning models like BERT, face limitations in handling context dependency and polysemy and present challenges in computational resources and real-time processing. To address these issues, this paper introduces two novel methods. First, a sentence embedding generation method based on Kullback–Leibler Divergence (KLD) optimization is proposed, which enhances semantic differentiation between sentence vectors, thereby improving the accuracy of textual similarity computation. Second, this study proposes a framework incorporating RoBERTa knowledge distillation, which integrates the deep semantic insights of the RoBERTa model with prior methodologies to enhance sentence embeddings while preserving computational efficiency. Additionally, the study extends its contributions to sentiment analysis tasks by leveraging the enhanced embeddings for classification. The sentiment analysis experiments, conducted using a Stochastic Gradient Descent (SGD) classifier on the ACL IMDB dataset, demonstrate the effectiveness of the proposed methods, achieving high precision, recall, and F1 score metrics. To further augment model accuracy and efficacy, a feature selection approach is introduced, specifically through the Dynamic Principal Component Selection (DPCS) algorithm. The DPCS method autonomously identifies and prioritizes critical features, thus enriching the expressive capacity of sentence vectors and significantly advancing the accuracy of similarity computations. Experimental results demonstrate that our method outperforms existing methods in semantic similarity computation on the SemEval-2016 dataset. 
When evaluated using cosine similarity of average vectors, our model achieved a Pearson correlation coefficient (τ) of 0.470, a Spearman correlation coefficient (ρ) of 0.481, and a mean absolute error (MAE) of 2.100. Compared to traditional methods such as Word2Vec, GloVe, and FastText, our method significantly enhances similarity computation accuracy. Using TF-IDF-weighted cosine similarity evaluation, our model achieved a τ of 0.528, ρ of 0.518, and an MAE of 1.343. Additionally, in the cosine similarity assessment leveraging the Dynamic Principal Component Smoothing (DPCS) algorithm, our model achieved a τ of 0.530, ρ of 0.518, and an MAE of 1.320, further demonstrating the method’s effectiveness and precision in handling semantic similarity. These results indicate that our proposed method has high relevance and low error in semantic textual similarity tasks, thereby better capturing subtle semantic differences between texts.
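The averaged-word-vector cosine baseline that this abstract evaluates against can be sketched in a few lines of plain Python; the tiny 3-dimensional embeddings below are purely illustrative and are not taken from any of the cited papers:

```python
import math

def sentence_vector(tokens, embeddings):
    # Average the word vectors of the tokens present in the embedding table.
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    dim = len(next(iter(embeddings.values())))
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def cosine_similarity(a, b):
    # Cosine of the angle between two sentence vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Toy 3-dimensional embeddings (made up for illustration only).
emb = {
    "cat":  [0.9, 0.1, 0.0],
    "dog":  [0.8, 0.2, 0.1],
    "sits": [0.1, 0.9, 0.2],
    "runs": [0.0, 0.8, 0.3],
}

s1 = sentence_vector(["cat", "sits"], emb)
s2 = sentence_vector(["dog", "runs"], emb)
sim = cosine_similarity(s1, s2)  # high for these two related toy sentences
```

Real systems substitute pretrained Word2Vec, GloVe, or transformer embeddings for the toy table, and may weight the average by TF-IDF as in the abstract above.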
29

Tang, Zhuo, Qi Xiao, Li Zhu, Kenli Li, and Keqin Li. "A semantic textual similarity measurement model based on the syntactic-semantic representation." Intelligent Data Analysis 23, no. 4 (2019): 933–50. http://dx.doi.org/10.3233/ida-183947.

Full text
30

Saidi, Rakia, Fethi Jarray, and Didier Schwab. "A BERT-GRU Model for Measuring the Similarity of Arabic Text." JUCS - Journal of Universal Computer Science 30, no. 6 (2024): 779–90. http://dx.doi.org/10.3897/jucs.111217.

Full text
Abstract:
Semantic Textual Similarity (STS) aims to assess the semantic similarity between two pieces of text. As a challenging task in natural language processing, various approaches for STS in high-resource languages, such as English, have been proposed. In this paper, we are concerned with STS in low-resource languages such as Arabic. A baseline approach for STS is based on vector embedding of the input text and application of a similarity metric on the embedding space. In this contribution, we propose a cross-encoder neural network (Cross-BERT-GRU) to handle semantic similarity of Arabic sentences that benefits from both the strong contextual understanding of BERT and the sequential modeling capabilities of GRU. The architecture begins by inputting the BERT word embeddings for each word into a GRU cell to model long-term dependencies. Then, max pooling and average pooling are applied to the hidden outputs of the GRU cell, serving as the sentence-pair encoder. Finally, a softmax layer is utilized to predict the degree of similarity. The experimental results show a Spearman correlation coefficient of around 0.9 and that Cross-BERT-GRU outperforms the other BERT models in predicting the semantic textual similarity of Arabic sentences. The experimentation results also indicate that the performance improves by integrating data augmentation techniques.
32

Devarajan, Viji, and Revathy Subramanian. "Analyzing semantic similarity amongst textual documents to suggest near duplicates." Indonesian Journal of Electrical Engineering and Computer Science 25, no. 3 (2022): 1703–11. https://doi.org/10.11591/ijeecs.v25.i3.pp1703-1711.

Full text
Abstract:
Data deduplication techniques remove repeated or redundant data from storage. In recent days, ever more data has been generated and stored, and as redundant and semantically similar content accumulates in the storage environment, storage efficiency falls and storage cost rises. To overcome this problem, we proposed a hybrid bidirectional encoder representations from transformers (BERT) model for text semantics using a graph convolutional network (HBTSG), a word embedding-based deep learning model that identifies near duplicates based on the semantic relationship between text documents. In this paper we hybridize the concepts of chunking and semantic analysis: the chunking process splits the documents into blocks, and in the next stage we identify the semantic relationship between documents using word embedding techniques. The approach combines the advantages of chunking, feature extraction, and semantic relations to provide better results.
33

Wang, Yanshan, Sunyang Fu, Feichen Shen, Sam Henry, Ozlem Uzuner, and Hongfang Liu. "The 2019 n2c2/OHNLP Track on Clinical Semantic Textual Similarity: Overview." JMIR Medical Informatics 8, no. 11 (2020): e23375. http://dx.doi.org/10.2196/23375.

Full text
Abstract:
Background Semantic textual similarity is a common task in the general English domain to assess the degree to which the underlying semantics of 2 text segments are equivalent to each other. Clinical Semantic Textual Similarity (ClinicalSTS) is the semantic textual similarity task in the clinical domain that attempts to measure the degree of semantic equivalence between 2 snippets of clinical text. Due to the frequent use of templates in the Electronic Health Record system, a large amount of redundant text exists in clinical notes, making ClinicalSTS crucial for the secondary use of clinical text in downstream clinical natural language processing applications, such as clinical text summarization, clinical semantics extraction, and clinical information retrieval. Objective Our objective was to release ClinicalSTS data sets and to motivate natural language processing and biomedical informatics communities to tackle semantic text similarity tasks in the clinical domain. Methods We organized the first BioCreative/OHNLP ClinicalSTS shared task in 2018 by making available a real-world ClinicalSTS data set. We continued the shared task in 2019 in collaboration with National NLP Clinical Challenges (n2c2) and the Open Health Natural Language Processing (OHNLP) consortium and organized the 2019 n2c2/OHNLP ClinicalSTS track. We released a larger ClinicalSTS data set comprising 1642 clinical sentence pairs, including 1068 pairs from the 2018 shared task and 1006 new pairs from 2 electronic health record systems, GE and Epic. We released 80% (1642/2054) of the data to participating teams to develop and fine-tune the semantic textual similarity systems and used the remaining 20% (412/2054) as blind testing to evaluate their systems. The workshop was held in conjunction with the American Medical Informatics Association 2019 Annual Symposium. 
Results Of the 78 international teams that signed on to the n2c2/OHNLP ClinicalSTS shared task, 33 produced a total of 87 valid system submissions. The top 3 systems were generated by IBM Research, the National Center for Biotechnology Information, and the University of Florida, with Pearson correlations of r=.9010, r=.8967, and r=.8864, respectively. Most top-performing systems used state-of-the-art neural language models, such as BERT and XLNet, and state-of-the-art training schemas in deep learning, such as pretraining and fine-tuning schema, and multitask learning. Overall, the participating systems performed better on the Epic sentence pairs than on the GE sentence pairs, despite a much larger portion of the training data being GE sentence pairs. Conclusions The 2019 n2c2/OHNLP ClinicalSTS shared task focused on computing semantic similarity for clinical text sentences generated from clinical notes in the real world. It attracted a large number of international teams. The ClinicalSTS shared task could continue to serve as a venue for researchers in natural language processing and medical informatics communities to develop and improve semantic textual similarity techniques for clinical text.
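Shared tasks like this one rank systems by Pearson's r between predicted and gold similarity scores. A minimal, dependency-free implementation is sketched below; the gold and predicted scores are made up for illustration and are not the shared-task data:

```python
import math

def pearson_r(xs, ys):
    # Pearson correlation between predicted and gold similarity scores.
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

gold = [0.0, 1.5, 3.0, 4.5, 5.0]  # hypothetical gold STS scores on the [0, 5] scale
pred = [0.2, 1.2, 2.8, 4.9, 4.6]  # hypothetical system predictions
r = pearson_r(pred, gold)         # close to 1.0 when predictions track the gold scores
```

A system scoring r=.90, as the top team did here, produces predictions that track the gold ordering and magnitudes very closely.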
34

Jan, Rafiya, and Afaq Alam Khan. "Emotion Mining Using Semantic Similarity." International Journal of Synthetic Emotions 9, no. 2 (2018): 1–22. http://dx.doi.org/10.4018/ijse.2018070101.

Full text
Abstract:
Social networks are considered as the most abundant sources of affective information for sentiment and emotion classification. Emotion classification is the challenging task of classifying emotions into different types. Emotions being universal, the automatic exploration of emotion is considered a difficult task to perform. A lot of research is being conducted in the field of automatic emotion detection in textual data streams. However, very little attention is paid towards capturing semantic features of the text. In this article, the authors present the technique of semantic relatedness for automatic classification of emotion in the text using distributional semantic models. This approach uses semantic similarity for measuring the coherence between the two emotionally related entities. Before classification, data is pre-processed to remove the irrelevant fields and inconsistencies and to improve the performance. The proposed approach achieved an accuracy of 71.795%, which is competitive considering that no training or annotation of data is required.
35

Fialho, Pedro, Luísa Coheur, and Paulo Quaresma. "Benchmarking Natural Language Inference and Semantic Textual Similarity for Portuguese." Information 11, no. 10 (2020): 484. http://dx.doi.org/10.3390/info11100484.

Full text
Abstract:
Two sentences can be related in many different ways. Distinct tasks in natural language processing aim to identify different semantic relations between sentences. We developed several models for natural language inference and semantic textual similarity for the Portuguese language. We took advantage of pre-trained models (BERT); additionally, we studied the roles of lexical features. We tested our models in several datasets—ASSIN, SICK-BR and ASSIN2—and the best results were usually achieved with ptBERT-Large, trained in a Brazilian corpus and tuned in the latter datasets. Besides obtaining state-of-the-art results, this is, to the best of our knowledge, the most all-inclusive study about natural language inference and semantic textual similarity for the Portuguese language.
36

Zhenzhen Qi, Zhenzhen Qi. "English Sentence Semantic Feature Extraction Method Based on Fuzzy Logic Algorithm." Journal of Electrical Systems 20, no. 1 (2024): 262–75. http://dx.doi.org/10.52783/jes.681.

Full text
Abstract:
Semantic features play a pivotal role in natural language processing, providing a deeper understanding of the meaning and context within textual data. In the realm of machine learning and artificial intelligence, semantic feature extraction involves translating linguistic elements into numerical representations, often utilizing advanced techniques like word embeddings and deep learning models. The integration of semantic features enhances the precision and context-awareness of language models, enabling applications such as sentiment analysis, document categorization, and information retrieval to operate with greater accuracy and relevance. The paper introduces a novel approach, Hierarchical Mandhami Optimized Semantic Feature Extraction (HMOSFE), designed to enhance semantic feature extraction from English sentences. The proposed HMOSFE model fuses hierarchical clustering with fuzzy-based feature extraction, aiming to capture intricate semantic relationships within sentences and provide nuanced insights into the underlying meaning of textual content. The model employs pre-trained word embeddings for term representation, calculates a similarity matrix using cosine similarity, and utilizes hierarchical clustering for document grouping. Fuzzy logic contributes to assigning weights to features, enabling a more refined understanding of semantic significance. The paper presents comprehensive results, including semantic similarity estimations, clustering distances, and fuzzy memberships, demonstrating the effectiveness of HMOSFE across diverse documents.
38

Iqbal, MD Asif, Omar Sharif, Mohammed Moshiul Hoque, and Iqbal H. Sarker. "Word Embedding based Textual Semantic Similarity Measure in Bengali." Procedia Computer Science 193 (2021): 92–101. http://dx.doi.org/10.1016/j.procs.2021.10.010.

Full text
39

Blanco, Eduardo, and Dan Moldovan. "A Semantic Logic-Based Approach to Determine Textual Similarity." IEEE/ACM Transactions on Audio, Speech, and Language Processing 23, no. 4 (2015): 683–93. http://dx.doi.org/10.1109/taslp.2015.2403613.

Full text
40

Shajalal, Md, and Masaki Aono. "Semantic textual similarity between sentences using bilingual word semantics." Progress in Artificial Intelligence 8, no. 2 (2019): 263–72. http://dx.doi.org/10.1007/s13748-019-00180-4.

Full text
41

Majumder, Goutam, Partha Pakray, and David Pinto. "Measuring interpretable semantic similarity of sentences using a multi chunk aligner." Journal of Intelligent & Fuzzy Systems 36, no. 5 (2019): 4797–808. http://dx.doi.org/10.3233/jifs-179028.

Full text
42

Krisnawati, Lucia D., Aditya W. Mahastama, Su-Cheng Haw, Kok-Why Ng, and Palanichamy Naveen. "Indonesian-English Textual Similarity Detection Using Universal Sentence Encoder (USE) and Facebook AI Similarity Search (FAISS)." CommIT (Communication and Information Technology) Journal 18, no. 2 (2024): 183–95. http://dx.doi.org/10.21512/commit.v18i2.11274.

Full text
Abstract:
The tremendous development in Natural Language Processing (NLP) has enabled the detection of bilingual and multilingual textual similarity. One of the main challenges of the Textual Similarity Detection (TSD) system lies in learning effective text representation. The research focuses on identifying similar texts between Indonesian and English across a broad range of semantic similarity spectrums. The primary challenge is generating English and Indonesian dense vector representation, a.k.a. embeddings that share a single vector space. Through trial and error, the research proposes using the Universal Sentence Encoder (USE) model to construct bilingual embeddings and FAISS to index the bilingual dataset. The comparison between query vectors and index vectors is done using two approaches: the heuristic comparison with Euclidian distance and a clustering algorithm, Approximate Nearest Neighbors (ANN). The system is tested with four different semantic granularities, two text granularities, and evaluation metrics with a cutoff value of k={2,10}. Four semantic granularities used are highly similar or near duplicate, Semantic Entailment (SE), Topically Related (TR), and Out of Topic (OOT), while the text granularities take on the sentence and paragraph levels. The experimental results demonstrate that the proposed system successfully ranks similar texts in different languages within the top ten. It has been proven by the highest F1@2 score of 0.96 for the near duplicate category on the sentence level. Unlike the near-duplicate category, the highest F1 scores of 0.77 and 0.89 are shown by the SE and TR categories, respectively. The experiment results also show a high correlation between text and semantic granularity.
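The FAISS index comparison described above boils down to exact nearest-neighbour search under Euclidean distance. The sketch below uses plain Python standing in for FAISS's IndexFlatL2, with made-up embedding vectors; it is an illustration of the search step only, not the authors' pipeline:

```python
import math

def euclidean(a, b):
    # Euclidean (L2) distance between two embedding vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def search(index_vectors, query, k=2):
    """Return the ids of the k index vectors closest to the query --
    the exact search that FAISS's IndexFlatL2 performs over an index."""
    ranked = sorted(range(len(index_vectors)),
                    key=lambda i: euclidean(index_vectors[i], query))
    return ranked[:k]

# Hypothetical bilingual sentence embeddings sharing one vector space,
# as a USE-style encoder would produce (vectors are invented).
index = [
    [0.90, 0.10, 0.00],  # 0: an English sentence
    [0.85, 0.15, 0.05],  # 1: its Indonesian near duplicate
    [0.10, 0.90, 0.40],  # 2: an out-of-topic sentence
]
query = [0.88, 0.12, 0.02]
top = search(index, query, k=2)  # the two near-duplicate sentences rank first
```

FAISS replaces this brute-force loop with optimized, optionally approximate (ANN) search, which is what makes ranking against large bilingual indexes practical.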
43

Rudskyi, Oleksandr, Andrii Kopp, Tetiana Goncharenko, and Igor Gamayun. "INTELLIGENT TECHNOLOGY FOR SEMANTIC COMPLETENESS ASSESSMENT OF BUSINESS PROCESS MODELS." Bulletin of National Technical University "KhPI". Series: System Analysis, Control and Information Technologies, no. 2 (12) (December 24, 2024): 56–65. https://doi.org/10.20998/2079-0023.2024.02.09.

Full text
Abstract:
In this paper, we present a method for comparing business process models with their textual descriptions, using a semantic approach based on the SBERT (Sentence-Bidirectional Encoder Representations from Transformers) model. Business process models, especially those created with the BPMN (Business Process Model and Notation) standard, are crucial for optimizing organizational activities. Ensuring the alignment between these models and their textual descriptions is essential for improving business process accuracy and clarity. Traditional set similarity methods, which rely on tokenization and basic word matching, fail to capture deeper semantic relationships, leading to lower accuracy in comparison. Our approach addresses this issue by leveraging the SBERT model to evaluate the semantic similarity between the text description and the BPMN business process model. The experimental results demonstrate that the SBERT-based method outperforms traditional similarity-measure-based methods by an average of 31%, offering more reliable and contextually relevant comparisons. The ability of SBERT to capture semantic similarity, including identifying synonyms and contextually relevant terms, provides a significant advantage over simple token-based approaches, which often overlook nuanced language variations. The experimental results demonstrate that the SBERT-based approach proposed in this study improves the alignment between textual descriptions and the corresponding business process models. This advancement improves the overall quality and accuracy of business process documentation, leading to fewer errors, clearer business process descriptions, and better communication among all stakeholders. The overall results obtained in this study contribute to enhancing the quality and consistency of BPMN business process models and related documentation.
44

Ormerod, Mark, Jesús Martínez del Rincón, and Barry Devereux. "Predicting Semantic Similarity Between Clinical Sentence Pairs Using Transformer Models: Evaluation and Representational Analysis." JMIR Medical Informatics 9, no. 5 (2021): e23099. http://dx.doi.org/10.2196/23099.

Full text
Abstract:
Background Semantic textual similarity (STS) is a natural language processing (NLP) task that involves assigning a similarity score to 2 snippets of text based on their meaning. This task is particularly difficult in the domain of clinical text, which often features specialized language and the frequent use of abbreviations. Objective We created an NLP system to predict similarity scores for sentence pairs as part of the Clinical Semantic Textual Similarity track in the 2019 n2c2/OHNLP Shared Task on Challenges in Natural Language Processing for Clinical Data. We subsequently sought to analyze the intermediary token vectors extracted from our models while processing a pair of clinical sentences to identify where and how representations of semantic similarity are built in transformer models. Methods Given a clinical sentence pair, we take the average predicted similarity score across several independently fine-tuned transformers. In our model analysis we investigated the relationship between the final model’s loss and surface features of the sentence pairs and assessed the decodability and representational similarity of the token vectors generated by each model. Results Our model achieved a correlation of 0.87 with the ground-truth similarity score, reaching 6th place out of 33 teams (with a first-place score of 0.90). In detailed qualitative and quantitative analyses of the model’s loss, we identified the system’s failure to correctly model semantic similarity when both sentence pairs contain details of medical prescriptions, as well as its general tendency to overpredict semantic similarity given significant token overlap. The token vector analysis revealed divergent representational strategies for predicting textual similarity between bidirectional encoder representations from transformers (BERT)–style models and XLNet. 
We also found that a large amount of information relevant to predicting STS can be captured using a combination of a classification token and the cosine distance between sentence-pair representations in the first layer of a transformer model that did not produce the best predictions on the test set. Conclusions We designed and trained a system that uses state-of-the-art NLP models to achieve very competitive results on a new clinical STS data set. As our approach uses no hand-crafted rules, it serves as a strong deep learning baseline for this task. Our key contribution is a detailed analysis of the model’s outputs and an investigation of the heuristic biases learned by transformer models. We suggest future improvements based on these findings. In our representational analysis we explore how different transformer models converge or diverge in their representation of semantic signals as the tokens of the sentences are augmented by successive layers. This analysis sheds light on how these “black box” models integrate semantic similarity information in intermediate layers, and points to new research directions in model distillation and sentence embedding extraction for applications in clinical NLP.
45

Wu, Hao, Degen Huang, and Xiaohui Lin. "Semantic Textual Similarity with Constituent Parsing Heterogeneous Graph Attention Networks." Symmetry 17, no. 4 (2025): 486. https://doi.org/10.3390/sym17040486.

Full text
Abstract:
Semantic Textual Similarity (STS) serves as a metric for evaluating the semantic symmetry between texts, playing a pivotal role in various natural language processing (NLP) tasks. To facilitate the accurate measurement of semantic symmetry, high-quality text representation is essential. This paper studies how to utilize constituent parsing for text representation in STS. Unlike most existing syntax models, we propose a heterogeneous graph attention network that integrates constituent parsing (HGAT-CP). The heterogeneous graph contains meaningfully connected sentences, verb phrase (VP), noun phrase (NP), phrase, and word nodes, which are derived from the constituent parsing tree. This graph is fed to a graph attention network for context propagation among relevant nodes, which effectively captures the relations of inter-sentence components. In addition, we leverage the relationships between verb phrases (VPs) and noun phrases (NPs) across sentence pairs for data augmentation, which is denoted as HGAT_CP(NP, VP). We extensively evaluate our method on three datasets, and experimental results demonstrate that our proposed HGAT_CP(NP, VP) achieves significant improvements on the majority of the datasets. Notably, on the SICK dataset, HGAT_CP(NP, VP) achieved improvements of 0.39 and 1.84 compared to SimCSE-ROBERTa_large and SimCSE-ROBERTa_base, respectively.
46

Zhou, Ya, Cheng Li, Guimin Huang, Qingkai Guo, Hui Li, and Xiong Wei. "A Short-Text Similarity Model Combining Semantic and Syntactic Information." Electronics 12, no. 14 (2023): 3126. http://dx.doi.org/10.3390/electronics12143126.

Full text
Abstract:
As one of the prominent research directions in the field of natural language processing (NLP), short-text similarity has been widely used in search recommendation and question-and-answer systems. Most of the existing short textual similarity models focus on considering semantic similarity while overlooking the importance of syntactic similarity. In this paper, we first propose an enhanced knowledge language representation model based on graph convolutional networks (KEBERT-GCN), which effectively uses fine-grained word relations in the knowledge base to assess semantic similarity and model the relationship between knowledge structure and text structure. To fully leverage the syntactic information of sentences, we also propose a computational model of constituency parse trees based on tree kernels (CPT-TK), which combines syntactic information, semantic features, and attentional weighting mechanisms to evaluate syntactic similarity. Finally, we propose a comprehensive model that integrates both semantic and syntactic information to comprehensively evaluate short-text similarity. The experimental results demonstrate that our proposed short-text similarity model outperforms the models proposed in recent years, achieving a Pearson correlation coefficient of 0.8805 on the STS-B dataset.
47

Liu, Haoyan, Lei Fang, Jian-Guang Lou, and Zhoujun Li. "Leveraging Web Semantic Knowledge in Word Representation Learning." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 6746–53. http://dx.doi.org/10.1609/aaai.v33i01.33016746.

Full text
Abstract:
Much recent work focuses on leveraging semantic lexicons like WordNet to enhance word representation learning (WRL) and achieves promising performance on many NLP tasks. However, most existing methods might have limitations because they require high-quality, manually created, semantic lexicons or linguistic structures. In this paper, we propose to leverage semantic knowledge automatically mined from web structured data to enhance WRL. We first construct a semantic similarity graph, which is referred to as semantic knowledge, based on a large collection of semantic lists extracted from the web using several pre-defined HTML tag patterns. Then we introduce an efficient joint word representation learning model to capture semantics from both semantic knowledge and text corpora. Compared with recent work on improving WRL with semantic resources, our approach is more general, and can be easily scaled with no additional effort. Extensive experimental results show that our approach outperforms the state-of-the-art methods on word similarity, word sense disambiguation, text classification and textual similarity tasks.
48

Wagenpfeil, Stefan. "Multimedia Graph Codes for Fast and Semantic Retrieval-Augmented Generation." Electronics 14, no. 12 (2025): 2472. https://doi.org/10.3390/electronics14122472.

Full text
Abstract:
Retrieval-Augmented Generation (RAG) has become a central approach to enhance the factual consistency and domain specificity of large language models (LLMs) by incorporating external context at inference time. However, most existing RAG systems rely on dense vector-based similarity, which fails to capture complex semantic structures, relational dependencies, and multimodal content. In this paper, we introduce Graph Codes—a matrix-based encoding of Multimedia Feature Graphs—as an alternative retrieval paradigm. Graph Codes preserve semantic topology by explicitly encoding entities and their typed relationships from multimodal documents, enabling structure-aware and interpretable retrieval. We evaluate our system in two domains: multimodal scene understanding (200 annotated image-question pairs) and clinical question answering (150 real-world medical queries with 10,000 structured knowledge snippets). Results show that our method outperforms dense retrieval baselines in precision (+9–15%), reduces hallucination rates by over 30%, and yields higher expert-rated answer quality. Theoretically, this work demonstrates that symbolic similarity over typed semantic graphs provides a more faithful alignment mechanism than latent embeddings. Practically, it enables interpretable, modality-agnostic retrieval pipelines deployable in high-stakes domains such as medicine or law. We conclude that Graph Code-based RAG bridges the gap between structured knowledge representation and neural generation, offering a robust and explainable alternative to existing approaches.

49

Pungulescu, Crina. "Using Textual Analysis to Diversify Portfolios." Economics and Finance Letters 9, no. 1 (2022): 87–98. http://dx.doi.org/10.18488/29.v9i1.3028.

Full text
Abstract:
Semantic fingerprinting is a leading AI solution that combines recent developments from cognitive neuroscience and psycholinguistics to analyze text with human-level accuracy. As an efficient method of quantifying text, it has already found its application in finance, where the semantic fingerprints of company descriptions have been shown to successfully predict stock return correlations of Dow Jones Industrial Average (DJIA) constituents. By extension, it has been suggested that diversified portfolios could be constructed to exploit the fundamental (dis)similarity between companies’ core activities (measured by the semantic overlap of company descriptions). This paper follows the performance of two portfolios made of the same DJIA constituent companies: the “minimum semantic concentration” portfolio (constructed with text-based portfolio weights) and the traditional “minimum variance” portfolio, over a time span of 16 years including two high volatility events: the 2007–2009 financial crisis and the COVID pandemic. The results confirm that textual analysis using semantic fingerprinting is consistently successful in predicting stock return correlations and is valuable as a portfolio selection criterion. However, in times of high market volatility the fundamental information given by the companies’ core activities, while still relevant, might carry less weight.
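The "minimum semantic concentration" idea can be sketched as the standard minimum-variance construction with a semantic similarity matrix substituted for the return covariance matrix. This is a hedged illustration under that assumption: the tickers and pairwise overlap values below are invented, and the paper's exact fingerprint-overlap measure and weighting scheme may differ.

```python
import numpy as np

def min_concentration_weights(S):
    # Analogue of minimum-variance weights: minimize w' S w subject to
    # sum(w) = 1, with similarity matrix S in place of the covariance matrix.
    raw = np.linalg.solve(S, np.ones(S.shape[0]))
    return raw / raw.sum()

# Hypothetical pairwise semantic-overlap matrix for three companies.
S = np.array([[1.0, 0.6, 0.2],
              [0.6, 1.0, 0.3],
              [0.2, 0.3, 1.0]])
w = min_concentration_weights(S)  # the least-similar company gets the most weight
```

Intuitively, companies whose descriptions overlap little with the rest of the portfolio are treated as fundamentally dissimilar and therefore receive larger weights, mirroring how minimum-variance portfolios favor assets with low return correlations.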
50

Glavaš, Goran, Marc Franco-Salvador, Simone P. Ponzetto, and Paolo Rosso. "A resource-light method for cross-lingual semantic textual similarity." Knowledge-Based Systems 143 (March 2018): 1–9. http://dx.doi.org/10.1016/j.knosys.2017.11.041.

Full text