Journal articles on the topic 'Semantic search algorithms'

Consult the top 50 journal articles for your research on the topic 'Semantic search algorithms.'

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse journal articles on a wide variety of disciplines and organise your bibliography correctly.

1

Mammadov, Eshgin. "MATHEMATICAL FOUNDATIONS OF SEMANTIC SEARCH IN INTERNET ENGINES." Deutsche internationale Zeitschrift für zeitgenössische Wissenschaft 77 (April 4, 2024): 47–54. https://doi.org/10.5281/zenodo.10929008.

Full text
Abstract:
The advancement of semantic search algorithms relies heavily on the integration of sophisticated mathematical frameworks to decipher and interpret the semantics of user queries and web documents. This article provides an in-depth exploration of three key mathematical models utilized in semantic search: Vector Space Models (VSM), Latent Semantic Analysis (LSA), and Word Embeddings. Each model is meticulously examined, elucidating their mathematical foundations, operational principles, and integration into semantic search algorithms. From the mathematical representation of documents and queries in vector space to the application of Singular Value Decomposition (SVD) in uncovering latent semantic structures, the article delves into the intricacies of these models. Furthermore, it explores how Word Embeddings, exemplified by Word2Vec and GloVe, revolutionize semantic understanding through dense vector representations of words. By synthesizing these mathematical frameworks into semantic search algorithms, search engines can bridge the semantic gap between user intent and search results, ultimately enhancing the accuracy, relevance, and user experience of information retrieval. Through this nuanced analysis, the article underscores the indispensable role of mathematics in propelling the evolution of semantic search technology towards more intuitive and efficient information retrieval systems in the digital age.
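To make the two classic models concrete, here is a minimal sketch, assuming a toy corpus and NumPy only: documents as term-count vectors compared by cosine similarity (VSM), and LSA folding the query into a truncated SVD space. The corpus and query are illustrative, not from the article.

```python
import numpy as np

docs = ["the car drives on the road",
        "the truck drives on the highway",
        "a cat sat on the mat"]
vocab = sorted({w for d in docs for w in d.split()})
A = np.array([[d.split().count(w) for w in vocab] for d in docs], dtype=float)

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# VSM: score the query directly in term space
query = np.array([(w == "car") + (w == "road") for w in vocab], dtype=float)
vsm_scores = [cosine(query, row) for row in A]

# LSA: project documents and query into a k-dimensional latent space via SVD
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
docs_k = U[:, :k] * s[:k]        # document coordinates (rows of A @ V_k)
query_k = query @ Vt[:k].T       # fold the query into the same latent space
lsa_scores = [cosine(query_k, row) for row in docs_k]
```

Under LSA the second document can score well for the query even without sharing the exact terms, which is the semantic-gap point the abstract makes.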
APA, Harvard, Vancouver, ISO, and other styles
2

Yin, Ying, Longfei Ma, Yuqi Gong, Yucen Shi, Fazal Wahab, and Yuhai Zhao. "Deep Semantics-Enhanced Neural Code Search." Electronics 13, no. 23 (2024): 4704. http://dx.doi.org/10.3390/electronics13234704.

Full text
Abstract:
Code search uses natural language queries to retrieve code snippets from a vast database, identifying those that are semantically similar to the query. This enables developers to reuse code and enhance software development efficiency. Most existing code search algorithms focus on capturing semantic and structural features by learning from both text and code graph structures. However, these algorithms often struggle to capture deeper semantic and structural features within these sources, leading to lower accuracy in code search results. To address this issue, this paper proposes a novel semantics-enhanced neural code search algorithm called SENCS, which employs graph serialization and a two-stage attention mechanism. First, the code program dependency graph is transformed into a unique serialized encoding, and a bidirectional long short-term memory (LSTM) model is used to learn the structural information of the code in the graph sequence to generate code vectors rich in structural features. Second, a two-stage attention mechanism enhances the embedded vectors by assigning different weight information to various code features during the code feature fusion phase, capturing significant feature information from different code feature sequences and producing code vectors rich in semantic and structural information. To validate the performance of the proposed code search algorithm, extensive experiments were conducted on two widely used code search datasets, CodeSearchNet and JavaNet. The experimental results show that the proposed SENCS algorithm improves the average code search accuracy metrics by 8.30% (MRR) and 17.85% (DCG) compared to the best baseline code search model in the literature, with an average improvement of 14.86% in the SR@1 metric. Experiments with two open-source datasets demonstrate that SENCS achieves a better search effect than state-of-the-art models.
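As an illustration of the structural-encoding step the abstract describes, the following is a hedged PyTorch sketch, not the paper's actual SENCS model: a serialized graph (here just random token ids) is embedded, run through a bidirectional LSTM, and mean-pooled into a code vector. All names and sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GraphSeqEncoder(nn.Module):
    def __init__(self, vocab_size=1000, emb_dim=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                            bidirectional=True)

    def forward(self, token_ids):          # (batch, seq_len)
        x = self.embed(token_ids)
        out, _ = self.lstm(x)              # (batch, seq_len, 2 * hidden)
        return out.mean(dim=1)             # mean-pooled code vector

encoder = GraphSeqEncoder()
fake_graph_sequence = torch.randint(0, 1000, (2, 30))  # two serialized graphs
code_vectors = encoder(fake_graph_sequence)            # shape (2, 256)
```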
APA, Harvard, Vancouver, ISO, and other styles
3

Paiva, Sara. "A Fuzzy Algorithm for Optimizing Semantic Documental Searches." International Journal of Web Portals 6, no. 1 (2014): 50–63. http://dx.doi.org/10.4018/ijwp.2014010104.

Full text
Abstract:
Searching for documents is a common and pertinent task that many organizations face every day, as do ordinary Internet users in their daily searches. One specific case is scientific paper search in reference-manager systems such as Mendeley or IEEE Xplore. Given the difficulty that finding documents can sometimes represent, semantic search is currently being applied to improve this type of search. Because the act of deciding whether a document is a good result for a given search expression is vague, fuzziness becomes an important aspect when defining search algorithms. In this paper, the author presents a fuzzy algorithm for improving documental searches, optimized for scenarios where we want to find a document but do not remember the exact words used, whether plural or singular forms were used, or whether a synonym was used. The author also presents the application of this algorithm to a real scenario, comparing results against Mendeley and IEEE Xplore.
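A minimal sketch of the graded, fuzzy term matching the abstract motivates, using Python's difflib ratio as a stand-in fuzzy similarity; the paper's actual membership functions are not reproduced here, and the threshold is an assumption.

```python
from difflib import SequenceMatcher

def fuzzy_match(term: str, word: str) -> float:
    """Similarity in [0, 1]; 1.0 is an exact match."""
    return SequenceMatcher(None, term.lower(), word.lower()).ratio()

def score_document(query_terms, doc_text, threshold=0.8):
    words = doc_text.lower().split()
    # each query term contributes its best fuzzy match within the document
    best = [max(fuzzy_match(t, w) for w in words) for t in query_terms]
    return sum(s for s in best if s >= threshold) / len(query_terms)

print(score_document(["algorithm", "searches"], "a fuzzy search algorithm"))
# "searches" still matches "search" with a high ratio despite the plural
```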
APA, Harvard, Vancouver, ISO, and other styles
4

Hao, Liang Liang. "A Web Service Composition Algorithm Based on Graph Search and Semantic Web." Applied Mechanics and Materials 687-691 (November 2014): 1637–40. http://dx.doi.org/10.4028/www.scientific.net/amm.687-691.1637.

Full text
Abstract:
With the development of web service technology, a single web service often cannot fulfill different users' diverse requirements. Adding semantic information to the input-output messages of web services provides a method to implement web service composition automatically. After reviewing existing algorithms for web service composition, this article proposes a QoS-oriented web service composition algorithm based on graph search with semantic information.
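The idea of input/output-driven composition can be sketched as a search over sets of known concepts: starting from the user's inputs, apply any service whose semantic inputs are covered until the requested outputs are reachable. The service catalogue below is hypothetical, and the paper's QoS weighting is omitted.

```python
from collections import deque

# each service: (required input concepts, produced output concepts)
services = {
    "geocode": ({"address"}, {"lat", "lon"}),
    "weather": ({"lat", "lon"}, {"forecast"}),
    "traffic": ({"lat", "lon"}, {"congestion"}),
}

def compose(have: set, want: set):
    queue = deque([(frozenset(have), [])])
    seen = {frozenset(have)}
    while queue:
        known, plan = queue.popleft()
        if want <= known:
            return plan                      # ordered list of services
        for name, (inputs, outputs) in services.items():
            if inputs <= known and not outputs <= known:
                nxt = frozenset(known | outputs)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, plan + [name]))
    return None

print(compose({"address"}, {"forecast"}))    # ['geocode', 'weather']
```

A QoS-oriented variant would replace the BFS with a cost-aware search (e.g., Dijkstra over aggregated response time or reliability).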
APA, Harvard, Vancouver, ISO, and other styles
5

Boushaki, Saida Ishak, Omar Bendjeghaba, and Nadjet Kamel. "Biomedical Document Clustering Based on Accelerated Symbiotic Organisms Search Algorithm." International Journal of Swarm Intelligence Research 12, no. 4 (2021): 169–85. http://dx.doi.org/10.4018/ijsir.2021100109.

Full text
Abstract:
Clustering is an important unsupervised analysis technique for big data mining. It finds application in several domains, including the biomedical documents of the MEDLINE database. Document clustering algorithms based on metaheuristics are an active research area. However, these algorithms suffer from getting trapped in local optima, require many parameters to adjust, and demand that documents be indexed by a high-dimensionality matrix under the traditional vector space model. In order to overcome these limitations, this paper proposes a new parameter-free document clustering algorithm (ASOS-LSI). It is based on the recent symbiotic organisms search (SOS) metaheuristic, enhanced by an acceleration technique. Furthermore, the documents are represented by semantic indexing based on the well-known latent semantic indexing (LSI). Experiments conducted on well-known biomedical document datasets show the significant superiority of ASOS-LSI over five famous algorithms in terms of compactness, f-measure, purity, misclassified documents, entropy, and runtime.
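For reference, the LSI representation step the abstract relies on can be sketched with scikit-learn: TF-IDF vectors reduced by truncated SVD. The SOS metaheuristic itself is omitted, and the three-document corpus is a toy stand-in.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

abstracts = ["gene expression in tumor cells",
             "protein folding and gene regulation",
             "clinical trial of a new drug"]
tfidf = TfidfVectorizer().fit_transform(abstracts)   # high-dimensional, sparse
lsi = TruncatedSVD(n_components=2).fit_transform(tfidf)
print(lsi.shape)  # (3, 2): low-dimensional semantic coordinates for clustering
```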
APA, Harvard, Vancouver, ISO, and other styles
6

Shelke, Priya, Chaitali Shewale, Riddhi Mirajkar, Suruchi Dedgoankar, Pawan Wawage, and Riddhi Pawar. "A Systematic and Comparative Analysis of Semantic Search Algorithms." International Journal on Recent and Innovation Trends in Computing and Communication 11, no. 11s (2023): 222–29. http://dx.doi.org/10.17762/ijritcc.v11i11s.8094.

Full text
Abstract:
Users often struggle to discover the information they need online because of the massive volume of data that is readily available and generated every day in today's digital age. Traditional keyword-based search engines may not be able to handle complex queries, which can result in irrelevant or insufficient search results. This issue can be addressed by semantic search, which utilises machine learning and natural language processing to interpret the meaning and context of a user's query. In this paper, we analyze the BM25 algorithm, the mean-of-word-vectors approach, the Universal Sentence Encoder model, and the Sentence-BERT model on the CISI dataset for the semantic search task. The results indicate that the fine-tuned SBERT model performs best.
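The lexical baseline in this comparison, BM25, is compact enough to sketch directly; the scorer below uses the common k1 and b defaults and a whitespace tokenizer, which are assumptions rather than the paper's exact setup.

```python
import math

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    tokenized = [d.lower().split() for d in docs]
    N = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / N
    scores = []
    for doc in tokenized:
        s = 0.0
        for t in query_terms:
            df = sum(1 for d in tokenized if t in d)         # document frequency
            idf = math.log((N - df + 0.5) / (df + 0.5) + 1)  # smoothed idf
            tf = doc.count(t)
            s += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(s)
    return scores

docs = ["semantic search with transformers",
        "keyword search engines",
        "cooking recipes"]
print(bm25_scores(["search", "semantic"], docs))
```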
APA, Harvard, Vancouver, ISO, and other styles
7

Gomathi, Ramalingam, and Dhandapani Sharmila. "A Novel Adaptive Cuckoo Search for Optimal Query Plan Generation." Scientific World Journal 2014 (2014): 1–7. http://dx.doi.org/10.1155/2014/727658.

Full text
Abstract:
The day-by-day emergence of new web pages has driven the development of Semantic Web technology. A World Wide Web Consortium (W3C) standard for storing Semantic Web data is the Resource Description Framework (RDF). To enhance efficiency in execution time when querying large RDF graphs, evolving metaheuristic algorithms have become an alternative to traditional query optimization methods. This paper focuses on the problem of query optimization of Semantic Web data. An efficient algorithm called Adaptive Cuckoo Search (ACS) for querying and generating optimal query plans for large RDF graphs is designed in this research. Experiments were conducted on different datasets with varying numbers of predicates. The experimental results show that the proposed approach provides significant improvements in query execution time. The extent to which the algorithm is efficient is tested and the results are documented.
APA, Harvard, Vancouver, ISO, and other styles
8

Stanchev, Lubomir. "Fine-Tuning an Algorithm for Semantic Search Using a Similarity Graph." International Journal of Semantic Computing 09, no. 03 (2015): 283–306. http://dx.doi.org/10.1142/s1793351x15400073.

Full text
Abstract:
Given a set of documents and an input query that is expressed in a natural language, the problem of document search is retrieving the most relevant documents. Unlike most existing systems that perform document search based on keyword matching, we propose a method that considers the meaning of the words in the queries and documents. As a result, our algorithm can return documents that have no words in common with the input query as long as the documents are relevant. For example, a document that contains the words "Ford", "Chrysler" and "General Motors" multiple times is surely relevant for the query "car" even if the word "car" never appears in the document. Our information retrieval algorithm is based on a similarity graph that contains the degree of semantic closeness between terms, where a term can be a word or a phrase. Since the algorithm that constructs the similarity graph takes a myriad of parameters as input, in this paper we fine-tune the part of the algorithm that constructs the Wikipedia part of the graph. Specifically, we experimentally fine-tune the algorithm on the Miller and Charles study benchmark that contains 30 pairs of terms and their similarity scores as determined by human users. We then evaluate the performance of the fine-tuned algorithm on the Cranfield benchmark that contains 1400 documents and 225 natural language queries. The benchmark also contains the relevant documents for every query as determined by human judgment. The results show that the fine-tuned algorithm produces a higher mean average precision (MAP) score than traditional keyword-based search algorithms because our algorithm considers not only the words and phrases in the query and documents, but also their meaning.
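For readers unfamiliar with the evaluation metric, MAP over a set of queries can be computed as follows; the document ids are toy values.

```python
def average_precision(ranked_ids, relevant_ids):
    hits, precision_sum = 0, 0.0
    for i, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / i     # precision at each relevant hit
    return precision_sum / len(relevant_ids) if relevant_ids else 0.0

def mean_average_precision(runs):
    # runs: list of (ranked result list, set of relevant ids), one per query
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

print(mean_average_precision([(["d3", "d1", "d7"], {"d1", "d7"})]))  # ~0.583
```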
APA, Harvard, Vancouver, ISO, and other styles
9

Gomathi, R., and D. Sharmila. "Application of Harmony Search Algorithm to Optimize SPARQL Protocol and Resource Description Framework Query Language Queries in Healthcare Data." Journal of Medical Imaging and Health Informatics 11, no. 11 (2021): 2862–67. http://dx.doi.org/10.1166/jmihi.2021.3877.

Full text
Abstract:
In the rapidly developing world of the Internet, the Semantic Web has become a platform for intelligent agents, especially in the healthcare sector. In the past few years the volume of Semantic Web data in the healthcare industry has widened considerably. With this growth come several challenges to be resolved. One such challenge is to provide an efficient querying mechanism that can handle large amounts of Semantic Web data. Among the many query languages, SPARQL (SPARQL Protocol and RDF Query Language) is the most popular. Each of these query languages has its own design strategy, and research has identified that it is difficult to handle and query large quantities of RDF data efficiently using these languages. In the proposed approach, the Harmony Search metaheuristic algorithm is applied to optimize SPARQL queries over healthcare data. The application of the Harmony Search algorithm is evaluated with large Resource Description Framework (RDF) datasets and SPARQL queries. To assess performance, the algorithm's implementation is compared to existing nature-inspired algorithms. The performance analysis shows that the proposed application performs well for large RDF datasets.
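A hedged skeleton of harmony search as it might apply to query-plan optimization: each "harmony" is a candidate ordering of a query's triple patterns, and the cost function below is a toy stand-in for an estimated execution cost. The parameter values (HMS, HMCR, PAR) are illustrative, not the paper's.

```python
import random

def harmony_search(n_patterns, cost, iters=500, hms=10, hmcr=0.9, par=0.3):
    # harmony memory: hms random orderings of the triple patterns
    memory = [random.sample(range(n_patterns), n_patterns) for _ in range(hms)]
    for _ in range(iters):
        if random.random() < hmcr:
            new = random.choice(memory)[:]       # recall from memory
            if random.random() < par:            # pitch adjustment: swap two
                i, j = random.sample(range(n_patterns), 2)
                new[i], new[j] = new[j], new[i]
        else:
            new = random.sample(range(n_patterns), n_patterns)
        worst = max(memory, key=cost)
        if cost(new) < cost(worst):
            memory[memory.index(worst)] = new    # replace the worst harmony
    return min(memory, key=cost)

# toy cost: prefer ascending order, standing in for a real cardinality model
print(harmony_search(5, cost=lambda p: sum(abs(v - i) for i, v in enumerate(p))))
```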
APA, Harvard, Vancouver, ISO, and other styles
10

He, Weinan, Zilei Wang, and Yixin Zhang. "Target Semantics Clustering via Text Representations for Robust Universal Domain Adaptation." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 16 (2025): 17132–40. https://doi.org/10.1609/aaai.v39i16.33883.

Full text
Abstract:
Universal Domain Adaptation (UniDA) focuses on transferring source domain knowledge to the target domain under both domain shift and unknown category shift. Its main challenge lies in identifying common class samples and aligning them. Current methods typically obtain target domain semantic centers from an unconstrained continuous image representation space. Due to domain shift and the unknown number of clusters, these centers often result in complex and less robust alignment algorithms. In this paper, based on vision-language models, we search for semantic centers in a semantically meaningful and discrete text representation space. The constrained space ensures almost no domain bias and appropriate semantic granularity for these centers, enabling a simple and robust adaptation algorithm. Specifically, we propose TArget Semantics Clustering (TASC) via Text Representations, which leverages information maximization as a unified objective and involves two stages. First, with the frozen encoders, a greedy search-based framework is used to search for an optimal set of text embeddings to represent target semantics. Second, with the search results fixed, encoders are refined based on gradient descent, simultaneously achieving robust domain alignment and private class clustering. Additionally, we propose Universal Maximum Similarity (UniMS), a scoring function tailored for detecting open-set samples in UniDA. Experimentally, we evaluate the universality of UniDA algorithms under four category shift scenarios. Extensive experiments on four benchmarks demonstrate the effectiveness and robustness of our method, which has achieved state-of-the-art performance.
APA, Harvard, Vancouver, ISO, and other styles
11

Xu, Ming, and Yun Ke. "The Research of Information Retrieval Technology Based on Semantic Analysis." Advanced Materials Research 926-930 (May 2014): 2160–63. http://dx.doi.org/10.4028/www.scientific.net/amr.926-930.2160.

Full text
Abstract:
Common information retrieval technology is mainly based on keyword matching, and this kind of method focuses only on optimizing the matching algorithm while ignoring semantics. It therefore fails to solve the fundamental problems of semantic ambiguity, poor retrieval diversity, relevant pages going undetected, and non-standardized ranking. To address these problems, this paper proposes the MIRSA information retrieval model based on semantic analysis. The model consists of four main components: a disambiguation method, a semantic expansion algorithm, a search-term matching strategy, and a web-page ranking algorithm. The model can effectively resolve semantic ambiguity, avoid missing relevant pages, and reasonably improve the ranking of related pages.
APA, Harvard, Vancouver, ISO, and other styles
12

Yuan, Xiao Yan. "Research on Search Sorting Algorithm Based on Multi-Dimensional Matching." Advanced Materials Research 926-930 (May 2014): 3195–99. http://dx.doi.org/10.4028/www.scientific.net/amr.926-930.3195.

Full text
Abstract:
Since current search sorting algorithms cannot find the desired webpages quickly and accurately, a novel search sorting algorithm based on multi-dimensional matching is proposed in this study. The algorithm computes the semantic similarity of search terms based on ontology concepts, and then the relevance of the search terms' temporal information to the time of the webpage. From these, the relevance of the search term to the content of the webpage is calculated, realizing a more appropriate webpage ranking. Finally, several methods are compared in terms of their average precision and average recall ratios.
APA, Harvard, Vancouver, ISO, and other styles
13

Tekli, Joe, Gilbert Tekli, and Richard Chbeir. "Combining offline and on-the-fly disambiguation to perform semantic-aware XML querying." Computer Science and Information Systems, no. 00 (2022): 63. http://dx.doi.org/10.2298/csis220228063t.

Full text
Abstract:
Many efforts have been deployed by the IR community to extend free-text query processing toward semi-structured XML search. Most methods rely on the concept of the Lowest Common Ancestor (LCA) between two or multiple structural nodes to identify the most specific XML elements containing the query keywords posted by the user. Yet few of the existing approaches consider XML semantics, and the methods that process semantics generally rely on computationally expensive word sense disambiguation (WSD) techniques, or apply semantic analysis in one stage only, performing query relaxation/refinement over the bag-of-words retrieval model to reduce processing time. In this paper, we describe a new approach for XML keyword search aiming to solve the limitations mentioned above. Our solution first transforms the XML document collection (offline) and the keyword query (on-the-fly) into meaningful semantic representations using context-based and global disambiguation methods, specially designed to allow almost linear computation efficiency. We use a semantic-aware inverted index to allow semantic-aware search, result selection, and result ranking functionality. The semantically augmented XML data tree is processed for structural node clustering, based on semantic query concepts (i.e., key-concepts), in order to identify and rank candidate answer sub-trees containing related occurrences of query key-concepts. Dedicated weighting functions and various search algorithms have been developed for that purpose and are presented here. Experimental results highlight the quality and potential of our approach.
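The LCA computation at the heart of most XML keyword search methods is simple to sketch; the element names below are hypothetical.

```python
def lca(parent, a, b):
    """parent maps each node to its parent; the root maps to None."""
    ancestors = set()
    while a is not None:
        ancestors.add(a)
        a = parent[a]
    while b not in ancestors:
        b = parent[b]
    return b

# /article/section/title and /article/section/para hit different keywords
parent = {"article": None, "section": "article",
          "title": "section", "para": "section"}
print(lca(parent, "title", "para"))  # 'section' is the most specific answer
```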
APA, Harvard, Vancouver, ISO, and other styles
14

Kumar, Pilli Suresh. "Core Technologies in Semantic Search Engines." International Journal of Research and Innovation in Applied Science X, no. IV (2025): 287–97. https://doi.org/10.51584/ijrias.2025.10040023.

Full text
Abstract:
Semantic search engines have revolutionized the way we retrieve information from the web by focusing on user intent and contextual meaning rather than relying solely on keyword matching. This is enabled by core technologies like NLP, Knowledge Graphs, AI, and ML. NLP helps search engines make sense of human language, allowing them to understand how words and phrases relate to each other. Knowledge Graphs improve search results by organizing data into relations, giving the search engine the ability to return more precise and contextual results. AI and ML algorithms work within search engines to improve the quality of outputs, learning from interactions and helping to continuously improve ranking models. Further factors such as ontologies and entity recognition contribute to contextual awareness, allowing for more accurate responses to complex queries as well. Vector search with encoders moves us away from naive keyword search toward a deeper, more semantically related search that builds a stronger connection between the user and the data. Semantic search engines are becoming more sophisticated as the digital landscape evolves, enabling such innovations as voice search, conversational AI, and recommendation systems. This review article describes these key pillars, their interdependencies, and their implications for the future of information retrieval, conveying how semantic search is shaping next-generation intelligent search systems.
APA, Harvard, Vancouver, ISO, and other styles
15

Goncharova, Oksana V., Zaur A. Zavrumov, and Svetlana Khaleeva. "Search algorithms of verbal identity markers in modern scientific discourse." Current Issues in Philology and Pedagogical Linguistics, no. 2 (June 25, 2024): 18–29. http://dx.doi.org/10.29025/2079-6021-2024-2-18-29.

Full text
Abstract:
The article is devoted to the study of identity verbalization specifics via Data Mining. The research material consists of English texts from Internet scientific repositories and e-libraries devoted to various concepts of youth identity. A methodology based on the use of modern natural language processing and machine learning tools was developed and tested as part of the research. The analysis was carried out using the Natural Language Toolkit library for tokenization and POS-tagging procedures for calculating the frequency of tokens from the «identity» environment. Word Embeddings, pre-trained Word2Vec model and K-means algorithm were used for the subsequent analysis and clustering of words based on their semantic proximity. Gensim library and Scikit-learn library were used to work with the Word2Vec model. As a result, it was proved that in English scientific discourse young person’s identity is verbalized within 9 semantic categories: behavior, communities, communication, education, identity, language, practice, complexity, science, the most common of which are education (33%), language (21%) and communities (18%). N-grams analysis made it possible to identify semantic fields, establish their attributes, and evaluate texts’ similarity, which provided the most accurate vector space search for semantically close n-grams. Optimization made it possible to establish a similarity measure to rank phrases according to the query, as well as assign each n-gram a certain ranking weight. Improvements can be achieved by adding statistical word weighting, such as TF-IDF. The proposed system is capable of searching in a large text array of related phrases with a similar meaning.
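A minimal sketch of the pipeline the abstract names (Gensim's Word2Vec for embeddings, scikit-learn's KMeans for clustering semantically close words); the three-sentence corpus is a toy stand-in for the scientific texts used in the study.

```python
from gensim.models import Word2Vec
from sklearn.cluster import KMeans

sentences = [["identity", "education", "language"],
             ["students", "education", "school"],
             ["community", "language", "communication"]]
w2v = Word2Vec(sentences, vector_size=50, window=2, min_count=1, seed=1)

words = list(w2v.wv.index_to_key)
X = w2v.wv[words]                                   # one vector per word
labels = KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(X)
for word, cluster in zip(words, labels):
    print(cluster, word)                            # words grouped by cluster
```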
APA, Harvard, Vancouver, ISO, and other styles
16

Атаева, Ольга Муратовна, Владимир Алексеевич Серебряков, and Наталия Павловна Тучкова. "On the Synonym Search Model." Russian Digital Libraries Journal 24, no. 6 (2022): 1006–22. http://dx.doi.org/10.26907/1562-5419-2021-24-6-1006-1022.

Full text
Abstract:
The problem of finding the most relevant documents in response to an extended and refined query is considered. For this, a search model and a text preprocessing mechanism are proposed, together with the joint use of a search engine and a neural network model built on top of an index using word2vec algorithms, to generate an extended query with synonyms and to refine search results based on a selection of similar documents in a digital semantic library. The paper investigates the construction of a vector representation of documents based on paragraphs, applied to the data array of the digital semantic library LibMeta. Each piece of text is labeled; both the whole document and its separate parts can be marked. To enrich user queries with synonyms, an "index first, then train" approach was used when building the search model together with the word2vec algorithms, in order to cover more information and give more accurate search results. The model was trained on the library's mathematical content. Examples of training, extended queries, and search-quality assessment using training and synonyms are given.
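Query enrichment with word2vec "synonyms", as described above, might look like the following sketch: each query term is expanded with its nearest neighbors in the embedding space before the query reaches the search engine. The model path and thresholds are assumptions, not details from the paper.

```python
from gensim.models import Word2Vec

model = Word2Vec.load("libmeta_word2vec.model")   # assumed pre-trained model

def expand_query(terms, topn=3, min_sim=0.6):
    expanded = list(terms)
    for t in terms:
        if t in model.wv:
            expanded += [w for w, sim in model.wv.most_similar(t, topn=topn)
                         if sim >= min_sim]
    return expanded

print(expand_query(["derivative", "equation"]))
```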
APA, Harvard, Vancouver, ISO, and other styles
17

Dhingra, Vandana, and Komal Bhatia. "Comparative Analysis of Ontology Ranking Algorithms." International Journal of Information Technology and Web Engineering 7, no. 3 (2012): 55–66. http://dx.doi.org/10.4018/jitwe.2012070104.

Full text
Abstract:
Ontologies are the backbone of knowledge representation on the Semantic Web. The challenges involved in building ontologies lie in time, effort, skill, and domain-specific knowledge. One major advantage of ontologies that minimizes these challenges is their potential for "reuse", currently supported by search engines such as Swoogle and OntoKhoj. As the number of ontologies that search engines like Swoogle, OntoKhoj, and Falcon can find increases, so does the need for a proper ranking method to order the returned lists of ontologies by their relevance to the query, which can save a lot of time and effort. This paper deals with the analysis of various ontology ranking algorithms. Based on this analysis, a comparative study is done to find their relative strengths and limitations across various parameters, providing a significant research direction in the ranking of ontologies on the Semantic Web.
APA, Harvard, Vancouver, ISO, and other styles
18

Jia, Zheshu, and Deyun Chen. "A Segmentation Algorithm of Image Semantic Sequence Data Based on Graph Convolution Network." Security and Communication Networks 2021 (April 22, 2021): 1–11. http://dx.doi.org/10.1155/2021/5596820.

Full text
Abstract:
Image semantic data have multilevel feature information. In actual segmentation, existing segmentation algorithms have some limitations, resulting in low final segmentation accuracy. To solve this problem, a segmentation algorithm for image semantic sequence data based on a graph convolution network is constructed. The graph convolution network is used to construct the image search process, and the semantic sequence data are extracted. After the qualified data points are accumulated, the gradient amplitude forms a complete rotation field and no scatter field in the diffusion process, which broadens the applicability of the algorithm, controls the accuracy of the segmentation algorithm, and completes the construction of the data segmentation algorithm. After the experimental dataset is prepared and the semantic segmentation direction is defined, we compare our method with four other methods. The results show that the segmentation algorithm designed in this paper has the highest accuracy.
APA, Harvard, Vancouver, ISO, and other styles
19

Fedorov, Alexander, and Alexey Nikolaevich Shikov. "SEMANTIC NETWORK TRANSFORMATION METHOD FOR AUTOMATION OF PROGRAMMING PROBLEMS SOLUTIONS EVALUATION IN E-LEARNING." Vestnik of Astrakhan State Technical University. Series: Management, computer science and informatics 2020, no. 4 (2020): 7–17. http://dx.doi.org/10.24143/2072-9502-2020-4-7-17.

Full text
Abstract:
The article presents a semantic network transformation method for converting program code into an N-dimensional vector. The proposed method allows automating the quality assessment of solutions to programming problems in e-learning. The method includes original algorithms for building and converting the network. To detect an algorithm in program code, a template of this algorithm is used, presented as a subgraph of abstract language concepts in the semantic network built from that code. The search for the algorithm, by comparing a subgraph of the network with the template network, made it possible to identify the BFS algorithm with a given accuracy: the cutoff threshold for the perceptron outputs is 0.85, based on the 88% classification accuracy of a single-layer perceptron on the MNIST base. This confirms the effectiveness of the developed method and motivates further research using machine learning methods to find the optimal coordinates of semantic network nodes and algorithm templates.
APA, Harvard, Vancouver, ISO, and other styles
20

NIEMANN, H., G. SAGERER, and W. EICHHORN. "CONTROL STRATEGIES IN A HIERARCHICAL KNOWLEDGE STRUCTURE." International Journal of Pattern Recognition and Artificial Intelligence 02, no. 03 (1988): 557–72. http://dx.doi.org/10.1142/s0218001488000327.

Full text
Abstract:
Two control strategies are presented that operate on a hierarchical knowledge structure based on a semantic network. The control algorithms cover strict top-down control and a bidirectional control which is a mixture of top-down (model-driven) and bottom-up (data-driven) analysis. The knowledge used by the algorithm is represented in a semantic network. Besides the network, some other knowledge sources may be generated automatically to direct the analysis and limit the search space. The approach was used successfully in image and speech understanding.
APA, Harvard, Vancouver, ISO, and other styles
21

Du, Jie, and Roy Rada. "Knowledge in Memetic Algorithms for Stock Classification." International Journal of Artificial Life Research 4, no. 1 (2014): 13–29. http://dx.doi.org/10.4018/ijalr.2014010102.

Full text
Abstract:
This paper introduces a framework for a knowledge-based memetic algorithm, called KBMA. The problem of stock classification is the test bed for the performance of KBMA. Domain knowledge is incorporated into the initialization and reproduction phases of evolutionary computation. In particular, the structure of financial statements is used to sort the attributes, which contributed to a faster convergence on near optimal solutions. A semantic net is used to measure the distance between parents and offspring. Two case studies were implemented, in which domain knowledge is used to constrain the reproductive operators so that the offspring is semantically dissimilar (or similar) to the parent. The results show that KBMA outperformed the random memetic algorithm in the former case but did not in the latter case. The interpretation of the results is that when the search algorithm is distant from its goal, making large steps as defined by the semantic knowledge is helpful to the search.
APA, Harvard, Vancouver, ISO, and other styles
22

Gulyamova, Shakhnoza Kakhramonovna. "SEMANTIC ANALYSIS AND SYNTHESIS IN THE AUTOMATIC ANALYSIS OF THE TEXT." Scientific Reports of Bukhara State University 5, no. 1 (2021): 112–24. http://dx.doi.org/10.52297/2181-1466/2021/5/1/9.

Full text
Abstract:
Introduction. In information search engines, semantic analysis and synthesis occupy a leading place. By automatic semantic analysis we understand a set of methods and techniques, implemented on a computer with specially developed linguistic algorithms, that can express with sufficient accuracy the meaning of arbitrary natural-language speech by means of a rigorous, precise tool. Highlighting the importance of the semantic analyzer in an information search engine is, first of all, associated with studying the process of semantic analysis and synthesis in the automatic analysis of text and with eliminating its problems. Research methods. The direct semantic analysis and synthesis method was used to cover the importance of semantic analysis and synthesis in the automatic analysis of text. Through this, their leading position in the automatic analysis of text was demonstrated: first the morphological and syntactic analysis of the text is carried out, and then the semantic analysis is performed. Semantic analysis works with meaning. Moreover, semantics is closely related to philosophy, psychology, and other sciences, in addition to knowledge of the structure of the language. In semantic analysis, it is necessary to take into account both the social and cultural features of the native language. Human thinking and its means of expressing ideas make language a difficult process to formalize. Results and discussions.
APA, Harvard, Vancouver, ISO, and other styles
23

Xu, Hao. "Semantic Relationships of Scientific Discourses." Applied Mechanics and Materials 380-384 (August 2013): 2242–45. http://dx.doi.org/10.4028/www.scientific.net/amm.380-384.2242.

Full text
Abstract:
Scientific discourses have markedly enhanced their accessibility and reusability in response to the development of Semantic Web technologies. A handful of discourse representation models have been proposed in recent years for semantic search and strategic reading. In this paper, we delineate the relationships that operate between entities or specific instances of entities, i.e., semantic relationships. Such definitions and demonstrations of relationships will serve semantic algorithms and applications.
APA, Harvard, Vancouver, ISO, and other styles
24

Cheng, Quanying, Yunqiang Zhu, Hongyun Zeng, et al. "A Method for Identifying Geospatial Data Sharing Websites by Combining Multi-Source Semantic Information and Machine Learning." Applied Sciences 11, no. 18 (2021): 8705. http://dx.doi.org/10.3390/app11188705.

Full text
Abstract:
Geospatial data sharing is an inevitable requirement for scientific and technological innovation and economic and social development decisions in the era of big data. With the development of modern information technology, especially Web 2.0, a large number of geospatial data sharing websites (GDSW) have been developed on the Internet. GDSW is a point of access to geospatial data, which is able to provide a geospatial data inventory. How to precisely identify these data websites is the foundation and prerequisite of sharing and utilizing web geospatial data and is also the main challenge of data sharing at this stage. GDSW identification can be regarded as a binary website classification problem, which can be solved by the current popular machine learning method. However, the websites obtained from the Internet contain a large number of blogs, companies, institutions, etc. If GDSW is directly used as the sample data of machine learning, it will greatly affect the classification precision. For this reason, this paper proposes a method to precisely identify GDSW by combining multi-source semantic information and machine learning. Firstly, based on the keyword set, we used the Baidu search engine to find the websites that may be related to geospatial data in the open web environment. Then, we used the multi-source semantic information of geospatial data content, morphology, sources, and shared websites to filter out a large number of websites that contained geospatial keywords but were not related to geospatial data in the search results through the calculation of comprehensive similarity. Finally, the filtered geospatial data websites were used as the sample data of machine learning, and the GDSWs were identified and evaluated. In this paper, training sets are extracted from the original search data and the data filtered by multi-source semantics, the two datasets are trained by machine learning classification algorithms (KNN, LR, RF, and SVM), and the same test datasets are predicted. The results show that: (1) compared with the four classification algorithms, the classification precision of RF and SVM on the original data is higher than that of the other two algorithms. (2) Taking the data filtered by multi-source semantic information as the sample data for machine learning, the precision of all classification algorithms has been greatly improved. The SVM algorithm has the highest precision among the four classification algorithms. (3) In order to verify the robustness of this method, different initial sample data mentioned above are selected for classification using the same method. The results show that, among the four classification algorithms, the classification precision of SVM is still the highest, which shows that the proposed method is robust and scalable. Therefore, taking the data filtered by multi-source semantic information as the sample data to train through machine learning can effectively improve the classification precision of GDSW, and comparing the four classification algorithms, SVM has the best classification effect. In addition, this method has good robustness, which is of great significance to promote and facilitate the sharing and utilization of open geospatial data.
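The final classification step (TF-IDF features plus an SVM, which the paper found most precise) can be sketched with scikit-learn; the sample texts and labels below are toys, not the study's data, and the multi-source semantic filtering step is assumed to have already been applied.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

texts = ["download landsat geotiff digital elevation model",
         "company blog about marketing trends",
         "shapefile and raster data catalogue for rivers",
         "login to the institutional intranet portal"]
labels = [1, 0, 1, 0]            # 1 = geospatial data sharing website

clf = make_pipeline(TfidfVectorizer(), SVC(kernel="linear"))
clf.fit(texts, labels)
print(clf.predict(["open geospatial raster download portal"]))  # expected [1]
```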
APA, Harvard, Vancouver, ISO, and other styles
25

Murugesan, R., and K. Devaki. "Liver Lesion Detection Using Semantic Segmentation and Chaotic Cuckoo Search Algorithm." Information Technology and Control 52, no. 3 (2023): 761–75. http://dx.doi.org/10.5755/j01.itc.52.3.34032.

Full text
Abstract:
The classic feature extraction techniques used in recent research on computer-aided diagnosis (CAD) of liver cancer have several disadvantages, including duplicated features and substantial computational expense. Modern deep learning methods solve these issues by implicitly detecting complex structures in massive quantities of healthcare image data. This study suggests a novel bio-inspired deep-learning approach for improving liver cancer prediction outcomes. First, a semantic segmentation technique known as UNet++ is proposed to extract liver lesions from computed tomography (CT) images. Second, a hybrid approach that combines the Chaotic Cuckoo Search algorithm and AlexNet is introduced as a feature extractor and classifier for liver lesions. LiTS, a freely accessible database that contains abdominal CT images, was employed for liver tumor diagnosis and investigation. The segmentation results were evaluated using the Dice similarity coefficient and the correlation coefficient. The classification results were assessed using accuracy, precision, recall, F1 score, and specificity. On performance metrics such as accuracy, precision, and recall, the recommended method performs better than existing algorithms, producing the highest values of 99.2%, 98.6%, and 98.8%, respectively.
APA, Harvard, Vancouver, ISO, and other styles
26

Edrees, Zahir, and Henda Juma. "Comparative Analysis of Page Ranking Algorithms for Efficient Information Retrieval." American Journal of Information Science and Technology 9, no. 1 (2025): 15–23. https://doi.org/10.11648/j.ajist.20250901.12.

Full text
Abstract:
Search engines have become crucial tools today, providing users with access to vast amounts of information. At the core of search engine functionality lies the ranking algorithm, which is responsible for determining the relevance and order of web pages returned in response to user queries. Ranking algorithms play a critical role in ensuring that users receive the most relevant and useful results, particularly in the face of exponentially growing web content. This paper provides an in-depth analysis of PageRank algorithms, focusing on their significance in information retrieval systems. The study begins with an overview of the foundational PageRank algorithm developed by Google, detailing its reliance on hyperlink structures to rank web pages. The limitations of the original algorithm, such as its inability to consider page content relevance and dynamic updates, are explored. In response to these limitations, the paper examines advanced ranking methods, including Weighted PageRank (WPR), Hyperlink-Induced Topic Search (HITS), and the Stochastic Approach for Link-Structure Analysis (SALSA). Each of these algorithms is analyzed in terms of efficiency, response time, scalability, and effectiveness. Additionally, the paper investigates recent enhancements in ranking methods that address the evolving needs of modern search engines, such as personalized search and semantic relevance. Experimental comparisons are conducted to evaluate the performance of these algorithms on large-scale datasets. Key metrics, including response time, computational efficiency, and relevance accuracy, are used to compare and rank the algorithms. The findings provide valuable insights into the strengths and weaknesses of different PageRank methods, contributing to the development of more efficient and effective information retrieval systems.
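The foundational PageRank iteration the survey starts from is short enough to sketch: rank flows along hyperlinks, damped by a teleport factor d. The three-page graph is illustrative, and pages without outlinks are assumed absent.

```python
def pagerank(links, d=0.85, iters=50):
    pages = list(links)
    pr = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iters):
        new = {}
        for p in pages:
            # rank received from every page q that links to p
            inbound = sum(pr[q] / len(links[q]) for q in pages if p in links[q])
            new[p] = (1 - d) / len(pages) + d * inbound
        pr = new
    return pr

links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}   # tiny web graph
print(pagerank(links))                               # 'c' ranks highest
```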
APA, Harvard, Vancouver, ISO, and other styles
27

Akerke, Akanova, Ospanova Nazira, Kukharenko Yevgeniya, and Abildinova Gulmira. "DEVELOPMENT OF THE ALGORITHM OF KEYWORD SEARCH IN THE KAZAKH LANGUAGE TEXT CORPUS." Eastern-European Journal of Enterprise Technologies 5, no. 2 (101) (2019): 26–32. https://doi.org/10.15587/1729-4061.2019.179036.

Full text
Abstract:
The issue of semantic text analysis occupies a special place in computational linguistics. Researchers in this field have an increased interest in developing an algorithm that will improve the quality of text corpus processing and the probabilistic determination of text content. The results of the study on the application of methods, approaches, and algorithms for semantic text analysis in international and Kazakhstan computational linguistics led to the development of an algorithm of keyword search in Kazakh texts. The first step of the algorithm was to compile a reference dictionary of keywords for the Kazakh language text corpus. The solution to this problem was to apply the Porter (stemmer) algorithm to the Kazakh language text corpus. The implementation of the stemmer allowed highlighting unique word stems and obtaining a reference dictionary, which was subsequently indexed. The next step is to collect learning data from the text corpus. To calculate the degree of semantic proximity between words, each word is assigned a vector of the corresponding word forms of the reference dictionary, which results in a pair of a keyword and a vector. The last step of the algorithm is neural network learning. During learning, the error backpropagation method is used, which allows semantic analysis of the text corpus and yields a probabilistic set of words close to the expected set of keywords. This process automates the processing of text material by creating digital learning models of keywords. The algorithm is used to develop a neurocomputer system that will automatically check the text works of online learners. The uniqueness of the keyword search algorithm is the use of neural network learning for texts in the Kazakh language. In Kazakhstan, scientists in the field of computational linguistics have conducted a number of studies based on morphological analysis, lemmatization, and other approaches, and have implemented linguistic tools (mainly translation dictionaries). The scope of neural network learning for parsing of the Kazakh language remains an open issue in Kazakhstan science. The developed algorithm contributes to solving one of the problems of effective semantic analysis of text in the Kazakh language.
APA, Harvard, Vancouver, ISO, and other styles
28

Luo, Mao, Ningning He, Xinyun Wu, Caiquan Xiong, and Wanghao Xu. "A Four-Label-Based Algorithm for Solving Stable Extension Enumeration in Abstract Argumentation Frameworks." Applied Sciences 14, no. 17 (2024): 7656. http://dx.doi.org/10.3390/app14177656.

Full text
Abstract:
In abstract argumentation frameworks, the computation of stable extensions is an important semantic task for evaluating the acceptability of arguments. The current approaches for the computation of stable extensions are typically conducted through methodologies that are either label-based or extension-based. Label-based algorithms operate by assigning labels to each argument, thus reducing the attack relations between arguments to constraint relations among the labels. This paper analyzes the existing two-label and three-label enumeration algorithms for stable extensions through case studies. It is found that both the two-label and three-label algorithms are not precise enough in defining types of arguments. To address these issues, this paper proposes a four-label enumeration algorithm for stable extensions. This method introduces a must_in label to pre-mark certain in-type arguments, thereby achieving a finer classification of in-type arguments. This enhances the labelings' propagation ability and reduces the algorithm's search space. Our proposed four-label algorithm was tested on authoritative benchmark sets of abstract argumentation framework problems: ICCMA 2019, ICCMA 2021, and ICCMA 2023. Experimental results show that the four-label algorithm significantly improves solving efficiency compared to existing two-label and three-label algorithms. Additionally, ablation experiments confirm that both the four-label transition strategy and the preprocessing strategy enhance the algorithm's performance.
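To make the semantics concrete, here is a naive brute-force enumerator (not the paper's four-label algorithm): a set S is a stable extension iff it is conflict-free and attacks every argument outside S. The label-based algorithms above prune exactly this exponential search space.

```python
from itertools import combinations

def stable_extensions(args, attacks):
    attacks = set(attacks)
    result = []
    for r in range(len(args) + 1):
        for S in map(set, combinations(args, r)):
            conflict_free = not any((a, b) in attacks for a in S for b in S)
            attacks_rest = all(any((a, o) in attacks for a in S)
                               for o in args - S)
            if conflict_free and attacks_rest:
                result.append(S)
    return result

args = {"a", "b", "c"}
attacks = [("a", "b"), ("b", "a"), ("b", "c")]   # a <-> b, b -> c
print(stable_extensions(args, attacks))          # [{'b'}, {'a', 'c'}]
```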
APA, Harvard, Vancouver, ISO, and other styles
29

Sirakov, Nikolay Metodiev, Sang C. Suh, and Salvatore Attardo. "AUTOMATIC OBJECT IDENTIFICATION USING VISUAL LOW LEVEL FEATURE EXTRACTION AND ONTOLOGICAL KNOWLEDGE." Journal of Integrated Design and Process Science: Transactions of the SDPS, Official Journal of the Society for Design and Process Science 14, no. 2 (2010): 13–26. http://dx.doi.org/10.3233/jid-2010-14202.

Full text
Abstract:
The present work is a part of research study aiming to develop an algorithm and a software system capable of quick identification of weapons and relations between human and a weapon in a scene. Bridging the semantic gap between the low level knowledge extracted from an image and the high level semantics needed to negotiate the weapon domain ontology is connected to the features extraction algorithms. Also, the ontology is anticipated to help facilitate the recognition part of the work. To accelerate the search process a hierarchy of attributes and concepts will be applied to cluster the ontology using a set of extracted features. The ontological structure, the clustering ideas and the feature extraction approaches and algorithms are introduced in the paper. Experimental results for boundary and convex hull extraction are shown as well. The paper ends with discussion and the future directions of the present work.
APA, Harvard, Vancouver, ISO, and other styles
30

Mussiraliyeva, Shynar, Milana Bolatbek, Kuralay Azanbay, and Zhastay Yeltay. "Development of an error correction algorithm for Kazakh language." Journal of Mathematics, Mechanics and Computer Science 123, no. 3 (2024): 81–97. http://dx.doi.org/10.26577/jmmcs2024-v123-i3-8.

Full text
Abstract:
This article discusses a method for correcting spelling errors in the Kazakh language using the advantages of morphological analysis and a model based on noisy channels. To achieve this goal, modern problems of automatic processing of Kazakh textual information were analyzed, existing linguistic resources and processing systems of the Kazakh language were systematized, the basic requirements for the development of a system for analyzing Kazakh textual information based on machine learning were determined, and models and algorithms for extracting facts from unstructured and poorly structured text arrays were developed. The search function, an enhanced spelling correction algorithm, was utilized in this work and has the ability to recommend the proper spelling of the input text. The maximum editing distance, whether to include the original word when near matches are not found, and how to handle case sensitivity and exclusion based on regular expressions are all easily adjustable features of this functionality. Because of their adaptability, the algorithms can be applied to a wide range of problems, from straightforward spell checks in user interfaces to intricate natural language processing assignments. Because of the way it is designed, the search function finds possible corrections and verifies the context of words while accounting for user preferences like verbosity and ignore markers. Most modern multilingual natural language processing programs use only the graphical stage of text processing. On the other hand, semantic text analysis, or analysis of the meaning of natural language, is still an important problem in the theory of artificial intelligence and computational linguistics. But in order to process the grammar and semantics of multilingual information, pre-created semantic and grammatical corpora of each natural language are necessary. To solve this problem, several tasks were considered and solved. These tasks included the analysis of research in the field of machine learning methods used in the processing of textual information, the existing problems of formalization and modeling of the Kazakh language, as well as the development and implementation of models, methods, and algorithms for morphological and semantic analysis of texts of the Kazakh language.
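The noisy-channel idea the article combines with morphological analysis can be sketched Norvig-style: generate candidates within one edit of the input and pick the most frequent one in a corpus. The alphabet and the frequency table below are toy stand-ins for a real Kazakh alphabet and corpus.

```python
def edits1(word, alphabet="abcdefghijklmnopqrstuvwxyz"):
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    replaces = [L + c + R[1:] for L, R in splits if R for c in alphabet]
    inserts = [L + c + R for L, R in splits for c in alphabet]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    return set(deletes + replaces + inserts + transposes)

freq = {"search": 120, "speech": 40, "sea": 15}   # toy language model

def correct(word):
    if word in freq:
        return word
    candidates = edits1(word) & freq.keys()
    return max(candidates, key=freq.get) if candidates else word

print(correct("serch"))   # 'search'
```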
APA, Harvard, Vancouver, ISO, and other styles
31

Uthayan, K. R., and G. S. Anandha Mala. "Hybrid Ontology for Semantic Information Retrieval Model Using Keyword Matching Indexing System." Scientific World Journal 2015 (2015): 1–9. http://dx.doi.org/10.1155/2015/414910.

Full text
Abstract:
Ontology is the process of growth and elucidation of concepts of an information domain common to a group of users. Establishing ontology in information retrieval is a natural method for improving the search results for the relevant information users require. Matching keywords against a historical or information domain is significant in recent systems for finding the best match for specific input queries. This research presents a better querying mechanism for information retrieval which integrates ontology queries with keyword search. The ontology-based query is translated into a first-order predicate logic form, which is used for routing the query to the appropriate servers. Matching algorithms represent an active area of research in computer science and artificial intelligence. In text matching, it is more reliable to study the semantic model and query for the conditions of semantic matching. This research develops semantic matching between input queries and information in the ontology field. The contributed algorithm is a hybrid method based on matching extracted instances from the queries and the information field. The queries and information domain are focused on semantic matching, to discover the best match and to improve the execution process. In conclusion, the hybrid ontology in the semantic web is sufficient to retrieve the documents when compared to standard ontology.
APA, Harvard, Vancouver, ISO, and other styles
32

Badenes-Olmedo, Carlos, José Luis Redondo-García, and Oscar Corcho. "Large-scale semantic exploration of scientific literature using topic-based hashing algorithms." Semantic Web 11, no. 5 (2020): 735–50. http://dx.doi.org/10.3233/sw-200373.

Full text
Abstract:
Searching for similar documents and exploring major themes covered across groups of documents are common activities when browsing collections of scientific papers. This manual knowledge-intensive task can become less tedious and even lead to unexpected relevant findings if unsupervised algorithms are applied to help researchers. Most text mining algorithms represent documents in a common feature space that abstract them away from the specific sequence of words used in them. Probabilistic Topic Models reduce that feature space by annotating documents with thematic information. Over this low-dimensional latent space some locality-sensitive hashing algorithms have been proposed to perform document similarity search. However, thematic information gets hidden behind hash codes, preventing thematic exploration and limiting the explanatory capability of topics to justify content-based similarities. This paper presents a novel hashing algorithm based on approximate nearest-neighbor techniques that uses hierarchical sets of topics as hash codes. It not only performs efficient similarity searches, but also allows extending those queries with thematic restrictions explaining the similarity score from the most relevant topics. Extensive evaluations on both scientific and industrial text datasets validate the proposed algorithm in terms of accuracy and efficiency.
APA, Harvard, Vancouver, ISO, and other styles
33

Potaraev, V. V., and L. V. Serebryanaya. "Automatic generation of semantic network for question answering." Doklady BGUIR 18, no. 4 (2020): 44–52. http://dx.doi.org/10.35596/1729-7648-2020-18-4-44-52.

Full text
Abstract:
The semantic network model for representing data and knowledge was analysed, and the selection of this model for working with text information was justified. The objective of automatic semantic network generation based on an arbitrary Russian-language text was formulated. The initial data, conditions, and constraints necessary for the network generation algorithm are listed. As a result of part-of-speech analysis for each word and of word order in a sentence, semantic relations between words are determined. The Lexeme dictionary was created to determine the part of speech of words in sentences. A set of question types used in the semantic network was selected. The number of relations in the network is regulated by the possibility of using only the necessary relation types when resolving a specific task. Since the relations in the semantic network can have very different types, it is a universal model for representing data and knowledge. An algorithm was developed which allows one to get answers to the questions asked. The semantic network model was generated automatically for the sentences considered. In the proposed algorithm the semantic network is interpreted as an undirected graph on which a breadth-first search algorithm is used to find an answer. The proposed algorithms were implemented in a software tool which automatically generates the semantic network for an arbitrary text. The created software tool allows asking questions and getting answers to them based on the information stored in the semantic network. The experiments have shown that the generated semantic network gives correct answers to the questions posed. The network is modified by adding and removing information. There is the possibility to choose the complexity of the network structure depending on the specific task being resolved. The proposed approach to building and working with the semantic network allows one to process texts in various languages, to use it in information systems with a natural-language interface, and to resolve such tasks as text classification and text search.
APA, Harvard, Vancouver, ISO, and other styles
34

Abdurrosyiid amrullah and Indra Gita Anugrah. "Implementation of Weighted Tree Similarity and Cosine Sorensen-Dice Algorithms for Semantic Search in Document Repository Information System." Journal of Development Research 5, no. 1 (2021): 21–27. http://dx.doi.org/10.28926/jdr.v5i1.143.

Full text
Abstract:
As we manage more and more documents, the search process becomes more difficult, and the use of information retrieval becomes important. An information retrieval system can help in searching for documents that match the similarity of keywords. Document searches usually consider only the name of the document (file) the user is looking for, without paying attention to the content or metadata of the document, and so cannot meet users' information needs. Document search has several approaches, including full-text search, plain metadata search, and semantic search. This study uses the Weighted Tree Similarity algorithm with the Cosine Sorensen-Dice algorithm to calculate semantic search similarity. Document metadata are represented in the form of a tree with labeled nodes, labeled branches, and weighted branches. The similarity calculation on the subtree edge labels uses Cosine Sorensen-Dice, while the total similarity of a document uses the weighted tree similarity. The metadata structure of the document uses the taxonomy of owner, description, title, disposition content, and type. The result of this research is a document search application with taxonomic weighting on file storage.
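For reference, a sketch of the Sørensen-Dice coefficient on token sets, blended here with a cosine over term counts; the blend weight is an assumption, and the full weighted-tree aggregation over the metadata taxonomy is omitted.

```python
from collections import Counter
import math

def dice(a: str, b: str) -> float:
    A, B = set(a.lower().split()), set(b.lower().split())
    return 2 * len(A & B) / (len(A) + len(B)) if A or B else 1.0

def cosine_counts(a: str, b: str) -> float:
    A, B = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(A[t] * B[t] for t in A)
    na = math.sqrt(sum(v * v for v in A.values()))
    nb = math.sqrt(sum(v * v for v in B.values()))
    return dot / (na * nb) if na and nb else 0.0

def blended(a, b, w=0.5):
    # per-edge-label similarity; tree weights would combine these scores
    return w * dice(a, b) + (1 - w) * cosine_counts(a, b)

print(blended("annual budget report", "budget report 2020"))
```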
APA, Harvard, Vancouver, ISO, and other styles
35

Li, Pu, Yuncheng Jiang, Ju Wang, and Zhilei Yin. "Semantic Extension of Query for the Linked Data." International Journal on Semantic Web and Information Systems 13, no. 4 (2017): 109–33. http://dx.doi.org/10.4018/ijswis.2017100106.

Full text
Abstract:
With the advent of the Big Data era, users prefer to get knowledge rather than pages from the Web. Linked Data, a new form of knowledge representation and publishing described by RDF, can provide a more precise and comprehensible semantic structure to satisfy this requirement. Further, the SPARQL query language for RDF is the foundation of much current research on Linked Data querying. However, these SPARQL-based methods cannot fully express the semantics of the query, so they cannot unleash the potential of Linked Data. To fill this gap, this paper designs a new querying method that extends the SPARQL pattern. Firstly, the authors present some new semantic properties for predicates in RDF triples and design a Semantic Matrix for Predicates (SMP). They then establish a well-defined framework for the notion of a Semantically-Extended Query Model for the Linked Data (SEQMLD). Moreover, the authors propose novel algorithms for executing queries by integrating semantic extension into the SPARQL pattern. Lastly, experimental results show that the authors' proposal has good generality and performs better than some of the most representative similarity search methods.
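As an illustration of what such a semantic extension of the SPARQL pattern might look like, the sketch below expands a query predicate with related predicates before matching; the relatedness table stands in for the paper's Semantic Matrix for Predicates, and all URIs and scores are invented for the example:

```python
# Toy stand-in for a Semantic Matrix for Predicates: each predicate maps
# to related predicates with a relatedness score.
related_predicates = {
    "ex:worksFor": [("ex:employedBy", 0.9), ("ex:memberOf", 0.6)],
}

def extend_query(subject, predicate, obj, threshold=0.5):
    """Rewrite a single triple pattern into a UNION over the original
    predicate and its sufficiently related predicates."""
    candidates = [predicate] + [p for p, score in
                                related_predicates.get(predicate, [])
                                if score >= threshold]
    branches = [f"{{ {subject} {p} {obj} }}" for p in candidates]
    return "SELECT * WHERE { " + " UNION ".join(branches) + " }"

print(extend_query("?person", "ex:worksFor", "ex:AcmeCorp"))
```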
APA, Harvard, Vancouver, ISO, and other styles
36

Wieser, Christoph, François Bry, Alexandre Bérard, and Richard Lagrange. "ARTigo: Building an Artwork Search Engine With Games and Higher-Order Latent Semantic Analysis." Proceedings of the AAAI Conference on Human Computation and Crowdsourcing 1 (November 3, 2013): 15–20. http://dx.doi.org/10.1609/hcomp.v1i1.13060.

Full text
Abstract:
This article describes how a semantic search engine has been built from, and is still continuously improved by, a semantic analysis of the "footprints" left by players on the gaming Web platform ARTigo. The Web platform offers several Games With a Purpose (GWAPs), some of which have been specifically designed to collect the data needed for building the artwork search engine. ARTigo is a "tagging ecosystem" of games that cooperate so as to gather a wide range of information on artworks. The ARTigo ecosystem generates a folksonomy saved as a 3rd-order tensor, that is, a generalization of a matrix, the three orders or dimensions of which represent (1) who (2) tagged (3) an artwork. The semantic search engine is built using a non-trivial generalization of the well-known, matrix-based Latent Semantic Analysis (LSA) methods and algorithms. ARTigo has been in service for five years and is the subject of active research constantly resulting in new developments, some of which are reported for the first time in this article.
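One simple way to generalize matrix LSA to such a 3rd-order tensor, sketched below under the assumption of a mode-3 unfolding followed by a truncated SVD, gives each artwork a latent vector; the paper's actual higher-order method is more involved, and the data here is synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)
tensor = rng.poisson(0.2, size=(50, 200, 120))  # players x tags x artworks

# Mode-3 unfolding: one row per artwork, columns enumerate (player, tag).
unfolded = tensor.transpose(2, 0, 1).reshape(120, -1)

# Truncated SVD keeps k latent dimensions, exactly as in matrix LSA.
k = 20
u, s, vt = np.linalg.svd(unfolded.astype(float), full_matrices=False)
artwork_vectors = u[:, :k] * s[:k]

# Artworks can now be compared in the latent space by cosine similarity.
norms = np.linalg.norm(artwork_vectors, axis=1, keepdims=True)
similarities = (artwork_vectors / norms) @ (artwork_vectors / norms).T
print(similarities.shape)  # (120, 120)
```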
APA, Harvard, Vancouver, ISO, and other styles
37

Li, Kai, Guo-Jun Qi, Jun Ye, Tuoerhongjiang Yusuph, and Kien A. Hua. "Semantic Image Retrieval with Feature Space Rankings." International Journal of Semantic Computing 11, no. 02 (2017): 171–92. http://dx.doi.org/10.1142/s1793351x17400074.

Full text
Abstract:
Learning to hash is receiving increasing research attention due to its effectiveness in addressing the large-scale similarity search problem. Most existing hashing algorithms focus on learning hash functions in the form of numeric quantization of some projected feature space. In this work, we propose a novel hash learning method that encodes features' relative ordering, instead of quantizing their numeric values, in a set of low-dimensional ranking subspaces. We formulate the ranking-based hash learning problem as the optimization of a continuous probabilistic error function using a softmax approximation and present an efficient learning algorithm to solve the problem. As a generalization of Winner-Take-All (WTA) hashing, the proposed algorithm naturally enjoys the numeric stability benefits of rank correlation measures while being optimized to achieve high precision with very compact codes. Additionally, the proposed method can be easily extended to nonlinear kernel spaces to discover ranking structures that cannot be revealed in linear subspaces. We demonstrate through extensive experiments that the proposed method achieves competitive performance compared to a number of state-of-the-art hashing methods.
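For reference, here is a minimal sketch of the Winner-Take-All hashing scheme the method generalizes; the window size, code count, and data are illustrative:

```python
import numpy as np

def wta_hash(x, permutations, window):
    """One code per permutation: index of the maximum among the first
    `window` permuted features, so codes depend only on relative order."""
    return np.array([int(np.argmax(x[p[:window]])) for p in permutations])

rng = np.random.default_rng(0)
dim, n_codes, window = 128, 16, 4
perms = [rng.permutation(dim) for _ in range(n_codes)]

a = rng.normal(size=dim)
b = a + rng.normal(scale=0.1, size=dim)   # a mild perturbation of a
c = rng.normal(size=dim)                  # an unrelated vector

# Similar vectors agree on many codes; unrelated ones agree near chance.
print(np.mean(wta_hash(a, perms, window) == wta_hash(b, perms, window)))
print(np.mean(wta_hash(a, perms, window) == wta_hash(c, perms, window)))
```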
APA, Harvard, Vancouver, ISO, and other styles
38

Rani Manukonda, Sumathi, et al. "Efficient Document Clustering for Web Search Result." International Journal of Engineering & Technology 7, no. 3.3 (2018): 90. http://dx.doi.org/10.14419/ijet.v7i3.3.14494.

Full text
Abstract:
Clustering documents in data mining is a traditional approach in which documents that are more relevant to each other are grouped together. Document clustering helps achieve accuracy in retrieving information for systems that identify the nearest neighbors of a document. Day by day a massive quantity of data is being generated and clustered. Although different clustering methods have been introduced to improve cluster quality, many challenges still exist for the improvement of document clustering. For web search purposes, documents in a group are efficiently arranged for result retrieval, so that users' search queries are answered in an organized way. Hierarchical clustering is one way of attaining document clustering; however, most grouping algorithms do not concentrate on the semantic approach, hence resulting in unsatisfactory clustering output. The automatic approach to organizing web documents, as in Google or Yahoo, is often considered as a reference. A distinct method identifies the existing groups of similar items in previously organized documents and retrieves an effective document classifier for new documents. In this paper, the main concentration is on hierarchical clustering and k-means algorithms; it is shown that k-means and its variants are more efficient than hierarchical clustering, and a greedy fast k-means algorithm (GFA) for clustering documents efficiently is considered.
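As a baseline illustration of the k-means side of this comparison, the sketch below clusters TF-IDF vectors of toy search-result snippets with scikit-learn's standard KMeans; the paper's greedy fast k-means (GFA) variant is not reproduced here:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

snippets = [
    "python pandas dataframe tutorial",
    "dataframe groupby examples in pandas",
    "best hiking trails national parks",
    "national park camping guide",
]

# TF-IDF vectors, then standard k-means with two clusters.
vectors = TfidfVectorizer(stop_words="english").fit_transform(snippets)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)
print(labels)  # snippets about the same topic share a cluster id
```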
APA, Harvard, Vancouver, ISO, and other styles
39

Jha, Radhika. "Xperia Search Engine." International Journal for Research in Applied Science and Engineering Technology 11, no. 5 (2023): 6331–40. http://dx.doi.org/10.22214/ijraset.2023.52996.

Full text
Abstract:
This research paper delves into the inner workings of search engines and introduces Xperia, a personalized search engine aimed at enhancing information retrieval. The paper explores the fundamental technologies employed by search engines, tracing their evolution and growth over time. It presents the development and implementation of Xperia, highlighting its unique feature set, which goes beyond traditional search engines by providing users not only with relevant resource links but also with information extracted from various web sources. The paper begins with an introduction providing the background and motivation for the research, as well as outlining the research objectives and scope. It then delves into the fundamentals of search engines, discussing their components, crawling and indexing processes, ranking algorithms, and user interfaces. The evolution of search engine technologies is examined, from the early stages to the current advancements in semantic search, natural language processing, and the incorporation of machine learning and artificial intelligence.
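As a toy illustration of the indexing process such papers survey, the following sketch builds an inverted index, the core structure behind keyword retrieval; the documents and queries are invented for the example:

```python
from collections import defaultdict

docs = {1: "semantic search engines", 2: "web crawling and indexing",
        3: "ranking algorithms for search"}

# Inverted index: each term maps to the set of documents containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

def search(query):
    """Return ids of documents containing every query term."""
    sets = [index.get(t, set()) for t in query.split()]
    return set.intersection(*sets) if sets else set()

print(search("search"))          # {1, 3}
print(search("ranking search"))  # {3}
```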
APA, Harvard, Vancouver, ISO, and other styles
40

Researcher. "EVOLUTION AND FUTURE OF SEARCH: HOW AI IS TRANSFORMING INFORMATION RETRIEVAL." International Journal of Computer Engineering and Technology (IJCET) 15, no. 4 (2024): 107–17. https://doi.org/10.5281/zenodo.13134112.

Full text
Abstract:
This article examines the transformative impact of artificial intelligence on search engines, where AI enhances query processing and information retrieval and addresses the limitations of traditional keyword-based algorithms. It traces the evolution of search engines from early keyword-based models to the integration of AI, enabling semantic understanding and context-aware search. The article delves into crucial AI techniques such as Natural Language Processing, deep learning, and reinforcement learning, highlighting their impact on query processing and retrieval accuracy. It further explores how AI facilitates semantic search, leverages knowledge graphs, and enables personalized search results. Real-world applications are illustrated through examples such as Google's BERT model and AI-driven enhancements in e-commerce. Finally, the article addresses challenges such as data privacy, bias in AI models, and computational demands, while exploring future directions like multimodal search, explainable AI, and continual learning. Ultimately, the article underscores the profound impact of AI in shaping the future of search engines and their crucial role in navigating the digital age.
APA, Harvard, Vancouver, ISO, and other styles
41

Deng, Hui, Kejie Fu, Binglin Yu, et al. "Enabling High-Level Worker-Centric Semantic Understanding of Onsite Images Using Visual Language Models with Attention Mechanism and Beam Search Strategy." Buildings 15, no. 6 (2025): 959. https://doi.org/10.3390/buildings15060959.

Full text
Abstract:
Visual information is becoming increasingly essential in construction management. However, a significant portion of this information remains underutilized by construction managers due to the limitations of existing image processing algorithms. These algorithms primarily rely on low-level visual features and struggle to capture high-order semantic information, leading to a gap between computer-generated image semantics and human interpretation. Moreover, current research lacks a comprehensive justification for the necessity of employing scene understanding algorithms to address this issue, and the absence of large-scale, high-quality open-source datasets remains a major obstacle, hindering further research progress and algorithmic optimization in this field. To address this, this paper proposes a construction scene visual language model based on an attention mechanism and an encoder-decoder architecture, with the encoder built on ResNet101 and the decoder on LSTM (long short-term memory). The addition of the attention mechanism and a beam search strategy improves the model, making it more accurate and generalizable. To verify the effectiveness of the proposed method, a publicly available construction scene visual-language dataset containing 16 common construction scenes, SODA-ktsh, is built and verified. The experimental results demonstrate that the proposed model achieves a BLEU-4 score of 0.7464, a CIDEr score of 5.0255, and a ROUGE_L score of 0.8106 on the validation set. These results indicate that the model effectively captures and accurately describes the complex semantic information present in construction images. Moreover, the model exhibits strong generalization, perceptual, and recognition capabilities, making it well suited for interpreting and analyzing intricate construction scenes.
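To make the decoding strategy concrete, here is a minimal sketch of beam search over a toy next-word distribution; the dummy probability table stands in for the paper's LSTM decoder and is purely illustrative:

```python
import math

def next_word_probs(sequence):
    """Dummy language model: fixed continuations for demonstration."""
    table = {
        ("<s>",): {"worker": 0.5, "crane": 0.4, "</s>": 0.1},
        ("<s>", "worker"): {"wearing": 0.6, "</s>": 0.4},
        ("<s>", "crane"): {"lifting": 0.7, "</s>": 0.3},
        ("<s>", "worker", "wearing"): {"</s>": 1.0},
        ("<s>", "crane", "lifting"): {"</s>": 1.0},
    }
    return table.get(sequence, {"</s>": 1.0})

def beam_search(k=2, max_len=5):
    """Keep the k highest-scoring partial captions at each step,
    instead of greedily taking only the single best next word."""
    beams = [(("<s>",), 0.0)]  # (sequence, log-probability)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == "</s>":        # finished captions carry over
                candidates.append((seq, score))
                continue
            for word, p in next_word_probs(seq).items():
                candidates.append((seq + (word,), score + math.log(p)))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
    return beams

for seq, score in beam_search():
    print(" ".join(seq), round(score, 3))
```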
APA, Harvard, Vancouver, ISO, and other styles
42

Wang, Fan, Afeng Wang, Minghao Pan, et al. "Recognizing Large‐Scale AIGC on Search Engine Websites Based on Knowledge Integration and Feature Pyramid Network." Proceedings of the Association for Information Science and Technology 61, no. 1 (2024): 679–84. http://dx.doi.org/10.1002/pra2.1079.

Full text
Abstract:
The proliferation of Artificial Intelligence Generated Content (AIGC) poses significant challenges to user experience and information accuracy, especially on search engine websites (Guo et al., 2023). The current solution is to identify AIGC with machine learning algorithms or publicly available AI detection tools; however, machine learning algorithms (Wang & Wang, 2022) degrade in accuracy as more data becomes available, and tools such as GPTZero perform poorly in the task of AIGC detection on social media. In this paper, we propose an EPCNN model to identify AIGC on search engine websites, which maintains good performance on large-scale samples. The ERNIE model integrates cross-domain knowledge and improves language understanding and generalization. We use ERNIE to extract text features, then use a feature pyramid network to capture semantic information at different levels, and finally use an end-to-end structure to connect ERNIE and the feature pyramid network to construct the EPCNN. Experimental results show that our proposed algorithm has high accuracy and the ability to handle large-scale data compared with machine learning algorithms and AI detection tools.
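As a rough illustration of the pyramid idea applied to text, the sketch below (in PyTorch, with random embeddings standing in for ERNIE token states and all layer sizes invented) pools convolutions of several widths over a token sequence before classification; it is not the EPCNN architecture itself:

```python
import torch
import torch.nn as nn

class TextPyramidHead(nn.Module):
    """Convolutions of several widths capture semantics at different
    granularities; their pooled outputs feed a binary classifier."""
    def __init__(self, hidden=128, channels=64, kernel_sizes=(2, 3, 5)):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(hidden, channels, k, padding=k // 2)
            for k in kernel_sizes)
        self.classifier = nn.Linear(channels * len(kernel_sizes), 2)

    def forward(self, token_states):           # (batch, seq, hidden)
        x = token_states.transpose(1, 2)       # (batch, hidden, seq)
        pooled = [conv(x).amax(dim=2) for conv in self.convs]
        return self.classifier(torch.cat(pooled, dim=1))  # AI vs human

encoder_output = torch.randn(4, 32, 128)  # stand-in for encoder token states
print(TextPyramidHead()(encoder_output).shape)  # torch.Size([4, 2])
```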
APA, Harvard, Vancouver, ISO, and other styles
43

Rogushina, J. V. "A three-dimensional model of semantic search: queries, resources, and results." PROBLEMS IN PROGRAMMING, no. 4 (December 2023): 39–55. http://dx.doi.org/10.15407/pp2023.04.039.

Full text
Abstract:
We propose a three-dimensional model of semantic search that analyzes search requests, information resources (IRs), and search results. This model is proposed as an additional tool for describing and comparing information retrieval systems (IRSs) that use various elements of artificial intelligence and knowledge management for more effective and relevant satisfaction of user information needs. In this work, we analyze existing approaches to the semanticization of search queries and the use of external knowledge sources in the retrieval process. The values of the parameters analyzed by this model are not mutually exclusive; that is, the same IRS can support several search options. Moreover, the representation means of queries and resources are not always comparable. The model makes it possible to identify IRSs with intersecting «request-IR-result» triads and to compare them precisely on these subclasses of search problems. This approach allows one to select search algorithms that are more pertinent for specific user tasks and, on the basis of this selection, to choose appropriate retrieval services that provide information for further processing. An important feature of the proposed model is that it uses only those IRS characteristics that can be directly evaluated by retrieval users.
APA, Harvard, Vancouver, ISO, and other styles
44

Chilingaryan, Kamo Pavelovich, and Lyudmila Stanislavovna Sorokina. "In search of an optimal method for analyzing deep structures: frame semantics and classification of argumentative structures." Филология: научные исследования, no. 3 (March 2024): 137–54. http://dx.doi.org/10.7256/2454-0749.2024.3.70155.

Full text
Abstract:
The subject of the research is the search for an optimal method of analyzing deep structures using frame semantics. The study of semantic roles, and of the similarities and differences between the approaches of C. Fillmore and of B. Levin and M. Rappaport Hovav, makes it possible to analyze the structure of a sentence in more detail and more accurately, to identify deep cases, and to determine semantic relations between words. The study of these aspects is key to understanding language constructions and their interpretation, and examining the various approaches makes it possible to identify both common features and unique ones, which is essential for a complete understanding of language constructions. Given the interest in text analysis in the fields of artificial intelligence, machine learning, and computational linguistics, an understanding of the semantic relationships between words will help create more accurate and efficient text processing algorithms. One of the research methods is the semantic analysis of sentences based on corpus data. This method includes the study of various linguistic constructions in the context of their use in real texts, which allows common patterns and rules for the use of these linguistic units in different situations to be identified. The scientific novelty of the study lies in the authors' demonstration of the similarity between the approaches of C. Fillmore and of B. Levin and M. Rappaport Hovav to understanding the surface and deep structures of language. Their work, despite differences in methodology and terminology, together allows in-depth investigation of the relationship between the meanings of verbs and the structure of arguments. As a result of the study, the natural relationships between deep cases and semantic roles in sentences of various types are revealed, and the key points to be taken into account when analyzing deep structures for a more accurate definition of the semantic roles of arguments are highlighted: frame semantics and thematic grids. Disagreements and alternative points of view contribute to the constant development and improvement of linguistic theories; such debates eventually lead to a deeper understanding of argument realization and open up opportunities for further research in this area. Both C. Fillmore and B. Levin and M. Rappaport Hovav have made significant contributions to understanding the surface and deep structures of language, although their approaches and terminology may differ.
APA, Harvard, Vancouver, ISO, and other styles
45

Bocharova, Maiia Y., Eugene V. Malakhov, and Vitaliy I. Mezhuyev. "VacancySBERT: the approach for representation of titles and skills for semantic similarity search in the recruitment domain." Applied Aspects of Information Technology 6, no. 1 (2023): 52–59. http://dx.doi.org/10.15276/aait.06.2023.4.

Full text
Abstract:
The paper focuses on deep learning semantic search algorithms applied in the HR domain. The aim of the article is to develop a novel approach to training a Siamese network to link the skills mentioned in a job ad with its title. It is shown that the title normalization process can be based either on classification or on similarity comparison approaches. While classification algorithms strive to classify a sample into a predefined set of categories, similarity search algorithms take a more flexible approach, since they are designed to find samples that are similar to a given query sample without requiring predefined classes and labels. In this article, semantic similarity search is used to find candidates for title normalization. A pre-trained language model is adapted by teaching it to match titles and skills based on co-occurrence information. For the purposes of this research, fifty billion title-description pairs were collected for training the model, and thirty-three thousand title-description-normalized-title triplets, where the normalized job title was picked manually by the job ad creator, were collected for testing. FastText, BERT, SentenceBERT, and JobBERT were used as baselines. The accuracy metric for the designed algorithm is Recall among the model's top one, five, and ten suggestions. It is shown that the novel training objective achieves a significant improvement in comparison with other generic and specific text encoders. Two settings, treating titles as standalone strings and including skills as additional features during inference, are used and their results compared: improvements of 10% and 21.5% were achieved using VacancySBERT and VacancySBERT (with skills), respectively. The benchmark has been released as open source to foster further research in the area.
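The similarity-search side of title normalization can be sketched with a generic bi-encoder; in the snippet below, a public sentence-transformers model stands in for VacancySBERT, and appending skills to the query mirrors the paper's "with skills" setting:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in encoder

normalized_titles = ["software engineer", "data scientist", "sales manager"]
query = "backend developer; skills: python, postgresql, docker"

# Embed both sides and return the nearest normalized title by cosine.
corpus_emb = model.encode(normalized_titles, convert_to_tensor=True)
query_emb = model.encode(query, convert_to_tensor=True)

scores = util.cos_sim(query_emb, corpus_emb)[0]
best = scores.argmax().item()
print(normalized_titles[best])  # expected: software engineer
```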
APA, Harvard, Vancouver, ISO, and other styles
46

Ishak Boushaki, Saida, Nadjet Kamel, and Omar Bendjeghaba. "High-Dimensional Text Datasets Clustering Algorithm Based on Cuckoo Search and Latent Semantic Indexing." Journal of Information & Knowledge Management 17, no. 03 (2018): 1850033. http://dx.doi.org/10.1142/s0219649218500338.

Full text
Abstract:
Clustering is an important data analysis technique. However, clustering high-dimensional data like documents requires more effort in order to extract the rich relevant information hidden in the multidimensional space. Recently, document clustering algorithms based on metaheuristics have demonstrated their efficiency in exploring the search area and achieving the global best solution rather than a local one. However, most of these algorithms are not practical and suffer from some limitations, including requiring the number of clusters to be known in advance, being neither incremental nor extensible, and indexing documents by a high-dimensional and sparse matrix. In order to overcome these limitations, we propose in this paper a new dynamic and incremental approach (CS_LSI) for document clustering based on the recent cuckoo search (CS) optimization and latent semantic indexing (LSI). Experiments conducted on four well-known high-dimensional text datasets show the efficiency of the LSI model in reducing the dimensionality of the space with more precision and less computational time. Also, the proposed CS_LSI determines the number of clusters automatically by employing a newly proposed index based on a significant distance measure. The latter is also used in the incremental mode and to detect outlier documents, thus maintaining more coherent clusters. Furthermore, comparison with conventional document clustering algorithms shows the superiority of CS_LSI in achieving a high quality of clustering.
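The LSI step that CS_LSI builds on can be sketched as follows; KMeans stands in for the cuckoo-search optimizer, and the corpus is illustrative:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans

docs = ["solar power plant output", "wind turbine energy yield",
        "football league final score", "championship match highlights"]

# TF-IDF vectors are projected into a low-dimensional latent semantic
# space (LSI) before clustering, reducing sparsity and dimensionality.
tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)
lsi = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)
print(KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(lsi))
```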
APA, Harvard, Vancouver, ISO, and other styles
47

Bedkowski, Janusz, Timo Röhling, Frank Hoeller, Dirk Shulz, and Frank E. Schneider. "Benchmark of 6D SLAM (6D Simultaneous Localisation and Mapping) Algorithms with Robotic Mobile Mapping Systems." Foundations of Computing and Decision Sciences 42, no. 3 (2017): 275–95. http://dx.doi.org/10.1515/fcds-2017-0014.

Full text
Abstract:
This work concerns the study of 6D SLAM algorithms with an application to robotic mobile mapping systems. The architecture of the 6D SLAM algorithm is designed for the evaluation of different data registration strategies. The algorithm is composed of an iterative registration component, for which ICP (Iterative Closest Point), ICP (point to projection), ICP with semantic discrimination of points, LS3D (Least Squares Surface Matching), or NDT (Normal Distribution Transform) can be chosen. Loop closing is based on LUM and LS3D. The main research goal was to investigate the semantic discrimination of measured points, which improves the accuracy of the final map, especially in demanding scenarios such as multi-level maps (e.g., climbing stairs). Parallel implementations of nearest neighborhood search (point to point, point to projection, and with semantic discrimination of points) are used. The 6D SLAM framework is based on modified 3DTK and PCL open source libraries and parallel programming techniques using NVIDIA CUDA. The paper presents experiments demonstrating the advantages of the proposed approach in practical applications. The major added value of the presented research is the qualitative and quantitative evaluation based on realistic scenarios, including ground truth data obtained by geodetic survey. The research novelty, from the standpoint of mobile robotics, is the evaluation of the LS3D algorithm, well known in geodesy.
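For orientation, here is a minimal sketch of one point-to-point ICP iteration, the baseline registration strategy among those evaluated (a semantic variant would additionally restrict matches to points of the same class); the point clouds are synthetic:

```python
import numpy as np
from scipy.spatial import cKDTree

def icp_step(source, target):
    """Match each source point to its nearest target point, then solve
    for the best-fit rigid transform (Kabsch algorithm) and apply it."""
    _, idx = cKDTree(target).query(source)
    matched = target[idx]
    mu_s, mu_t = source.mean(axis=0), matched.mean(axis=0)
    h = (source - mu_s).T @ (matched - mu_t)
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))   # guard against reflections
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = mu_t - r @ mu_s
    return source @ r.T + t

rng = np.random.default_rng(0)
target = rng.normal(size=(200, 3))
source = target + np.array([0.3, -0.2, 0.1])   # translated copy

for _ in range(10):
    source = icp_step(source, target)
print(np.abs(source - target).max())  # error shrinks as the clouds align
```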
APA, Harvard, Vancouver, ISO, and other styles
48

S, Subi, Shanthini B, SilpaRaj M, Shekar K, Keerthana G, and Anitha R. "Natural Language Processing Techniques for Information Retrieval Enhancing Search Engines with Semantic Understanding." ITM Web of Conferences 76 (2025): 05013. https://doi.org/10.1051/itmconf/20257605013.

Full text
Abstract:
This paper investigates new Natural Language Processing (NLP) methods that seek to improve information retrieval systems via semantic knowledge, focusing on enhancing search engines. The proposed ideas concern reducing model size (one of the biggest problems with large models), training on domain-specific knowledge (the right knowledge is important for real applications), and efficient ways of dealing with unstructured data (also a key issue for NLP frameworks). The study highlights the need for hybrid models that combine generalization and specificity, fast algorithms for big data sets, and automated knowledge extraction. Further directions include cross-lingual approaches, rapid learning in out-of-distribution domains, and human-centered design of AI systems. The end objective of this work is to create a semantic search engine that is adaptive, scalable, and flexible; intent-aware and tolerant of query ambiguity; and that improves the semantic richness of results tailored to datasets of varying size, hence promising complementary applications of Natural Language Processing to information retrieval.
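One way to picture the hybrid models called for here is a score that mixes keyword overlap with dense-vector cosine similarity, as in the sketch below; the toy embeddings are random stand-ins for a real encoder, and the mixing weight is illustrative:

```python
import numpy as np

docs = ["cheap flights to rome", "budget air travel italy",
        "rome ancient history tour"]
rng = np.random.default_rng(0)
emb = {text: rng.normal(size=16) for text in docs}
# Pretend the encoder maps the paraphrased query near its match.
emb["affordable plane tickets rome"] = emb["cheap flights to rome"] + 0.1

def keyword_score(query, doc):
    """Fraction of query terms appearing verbatim in the document."""
    q, d = set(query.split()), set(doc.split())
    return len(q & d) / len(q)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def hybrid_rank(query, alpha=0.5):
    """Blend exact-match and semantic scores so both contribute."""
    q_vec = emb[query]
    scored = [(alpha * keyword_score(query, d)
               + (1 - alpha) * cosine(q_vec, emb[d]), d) for d in docs]
    return sorted(scored, reverse=True)

for score, doc in hybrid_rank("affordable plane tickets rome"):
    print(round(score, 3), doc)
```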
APA, Harvard, Vancouver, ISO, and other styles
49

Sytovа, S. N., V. V. Haurylavets, A. P. Dunets, A. N. Kavalenka, and S. V. Charapitsa. "Basics of the semantic portal of nuclear knowledge BelNET functioning." Informatics 21, no. 2 (2024): 7–23. http://dx.doi.org/10.37661/1816-0301-2024-21-2-7-23.

Full text
Abstract:
Objectives. The possibility of using semantic technologies for the development and improvement of the content management system of the scientific and educational portal eLab-Science, and of the Belarusian nuclear knowledge portal BelNET (Belarusian Nuclear Education and Training Portal, https://belnet.by/) created on its basis, is considered. Methods. Original algorithms for automatic systematization have been developed, such as placing content records in the portal taxonomy based on semantic technologies and generating a list of keywords. The following concepts of semantic technologies are used: taxonomy (the hierarchical structure of the portal), thesaurus, and glossary. Results. The developed algorithms were implemented and tested using a full-text search tool and the original Belarusian glossary on nuclear and radiation safety. Conclusion. The described basic principles of organization and the algorithms based on semantic technologies, which underlie the functioning of the content management system of the scientific and educational portal eLab-Science and of the Belarusian nuclear knowledge portal BelNET created on its basis, make it possible to effectively place content records in the portal taxonomy, as well as to automatically generate a set of keywords for the resource being created.
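The glossary-driven placement of content records can be pictured with a small sketch: a record is assigned to the taxonomy node whose glossary terms it mentions most, and the matches double as keywords. The taxonomy, glossary, and record text below are invented, not BelNET's:

```python
taxonomy_glossary = {
    "nuclear-safety": {"reactor", "containment", "radiation", "dosimetry"},
    "education": {"curriculum", "training", "course", "lecture"},
    "regulation": {"license", "inspection", "standard", "compliance"},
}

def place_record(text):
    """Return (best taxonomy node, matched glossary terms); the matched
    terms double as an automatically generated keyword list."""
    words = set(text.lower().split())
    best, terms = max(((node, words & glossary)
                       for node, glossary in taxonomy_glossary.items()),
                      key=lambda item: len(item[1]))
    return best, sorted(terms)

node, keywords = place_record(
    "Training course on reactor radiation monitoring and dosimetry")
print(node, keywords)  # nuclear-safety ['dosimetry', 'radiation', 'reactor']
```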
APA, Harvard, Vancouver, ISO, and other styles
50

Kulkarni, D. M., and Swapnaja S. Kulkarni. "Measure term similarity using a semantic network approach." BOHR International Journal of Computer Science 1, no. 1 (2022): 01–05. http://dx.doi.org/10.54646/bijcs.2022.01.

Full text
Abstract:
Computing the semantic similarity between two words can be approached in a variety of ways. It is essential mainly for applications such as text analysis and text understanding. In traditional systems, search engines are used to compute the similarity between words; in that sense, search engines are keyword based, with the drawback that users must know exactly what they are looking for. There are two main approaches to the computation, namely knowledge-based and corpus-based approaches; however, neither is suitable for computing similarity between multiword expressions. This system provides an efficient and effective approach for computing term similarity using a semantic network. A clustering approach is used in order to improve the accuracy of the semantic similarity. This approach is more efficient than other computing algorithms, and the technique can also be applied to large-scale datasets to compute term similarity.
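The knowledge-based side of such term similarity can be sketched with WordNet, a widely used semantic network; the snippet below takes the best path similarity over noun senses and requires nltk.download("wordnet") before first use:

```python
from nltk.corpus import wordnet as wn

def term_similarity(word_a, word_b):
    """Best path similarity over all noun-sense pairs (1.0 = identical)."""
    scores = [s1.path_similarity(s2)
              for s1 in wn.synsets(word_a, pos=wn.NOUN)
              for s2 in wn.synsets(word_b, pos=wn.NOUN)]
    return max((s for s in scores if s is not None), default=0.0)

print(term_similarity("car", "automobile"))  # 1.0, same synset
print(term_similarity("car", "banana"))      # much lower
```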
APA, Harvard, Vancouver, ISO, and other styles