Journal articles on the topic 'Improvement of Search Queries'

Author: Grafiati

Published: 7 June 2025

Last updated: 16 July 2025

Consult the top 50 journal articles for your research on the topic 'Improvement of Search Queries.'

1

Mohamadi, Fakhrosadat. "Optimized Expansion of Search Queries Using ERIC Online Thesaurus and its Impact on Precision of Outcome and User's Search Time." Information Science and Technology 24, no. 4 (2009): 29–52. https://doi.org/10.5281/zenodo.13963162.

Abstract:

The development of search queries aims to improve relevant results or obtain suitable resources for current or future research, and it often occurs when users are dissatisfied with their search results or have not obtained sufficient results. Language control tools, particularly continuous thesauri, are considered resources for developing search queries. Accordingly, this article attempts to empirically demonstrate the impact of using terms from the ERIC thesaurus on the optimal development of search queries.  For implementation, the "Initial Assessment of Internet Search Skills" questionnaire, the Track4Win software, and the ERIC thesaurus (associated with the ERIC database) were used as data collection tools. A comparison of the pre-test and post-test search stages between the experimental and control groups showed that the use of the ERIC continuous thesaurus plays a significant role in improving queries and search time, and the use of broader, narrower, and related terms from the thesaurus alters retrieval accuracy. Additionally, the use of broader and related terms from the thesaurus increases search time.

2

Shishir Biyyala, Sai Charan Tokachichu, and Sudheer Chennuri. "AI-Powered Search Systems : Integrating Machine Learning with Search Technology for High-Scalability Applications." International Journal of Scientific Research in Computer Science, Engineering and Information Technology 10, no. 6 (2024): 74–81. http://dx.doi.org/10.32628/cseit24106155.

Abstract:

This article comprehensively analyzes integrating Artificial Intelligence (AI) and machine learning techniques into high-scalability search systems. We explore AI-powered search's theoretical foundations and practical implementations, focusing on advanced ranking algorithms, natural language processing for query understanding, and optimized distributed architectures. We demonstrate significant improvements in search relevance and efficiency through experiments conducted on a large-scale dataset comprising 100 million web pages and 1 million real-world queries. Our AI-powered system showed a 15% increase in Normalized Discounted Cumulative Gain (NDCG) for complex queries and a 12% improvement in Mean Reciprocal Rank (MRR) for navigational queries compared to traditional keyword-based approaches. We also address critical challenges in maintaining system scalability and performance, including data synchronization, real-time model updates, and resource management in distributed environments. The article further discusses emerging trends, such as graph neural networks and multimodal search capabilities, alongside ethical considerations and data privacy concerns. Our findings provide valuable insights for researchers and practitioners aiming to develop next-generation search platforms capable of handling the increasing complexity and volume of digital information while ensuring responsible AI integration.
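The two ranking metrics named in this abstract can be illustrated with a minimal sketch. It assumes graded relevance labels listed in ranked order and uses the linear-gain form of DCG; the function names are hypothetical and are not taken from the article, whose exact metric definitions are not given here.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with linear gain: rel / log2(rank + 1)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances, k=None):
    """NDCG of a ranked list of graded relevance labels, optionally cut at k."""
    ranked = relevances[:k] if k else relevances
    # The ideal ranking sorts all known labels from most to least relevant.
    ideal = sorted(relevances, reverse=True)[:len(ranked)]
    ideal_dcg = dcg(ideal)
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


def mrr(first_relevant_ranks):
    """Mean reciprocal rank; each entry is the 1-based rank of the first
    relevant result for one query, or None if nothing relevant was returned."""
    total = sum(1.0 / rank for rank in first_relevant_ranks if rank)
    return total / len(first_relevant_ranks)
```

A perfectly ordered list scores an NDCG of 1.0, and any reordering that pushes a more relevant item down scores lower, which is why NDCG suits multi-result queries while MRR suits navigational queries with a single target.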

3

Baştan, Muhammet, and Özgür Yılmaz. "Multi-View Product Image Search with Deep ConvNets Representations." International Journal on Artificial Intelligence Tools 27, no. 08 (2018): 1850032. http://dx.doi.org/10.1142/s021821301850032x.

Abstract:

Multi-view product image queries can improve retrieval performance over single view queries significantly. In this paper, we investigated the performance of deep convolutional neural networks (ConvNets) on multi-view product image search. First, we trained a VGG-like network to learn deep ConvNets representations of product images. Then, we computed the deep ConvNets representations of database and query images and performed single view queries, and multi-view queries using several early and late fusion approaches. We performed extensive experiments on the publicly available Multi-View Object Image Dataset (MVOD 5K) with both clean background queries from the internet and cluttered background queries from a mobile phone. We compared the performance of ConvNets to the classical bag-of-visual-words (BoWs). We concluded that (1) multi-view queries with deep ConvNets representations perform significantly better than single view queries, (2) ConvNets perform much better than BoWs and have room for further improvement, (3) pre-training of ConvNets on a different image dataset with background clutter is needed to obtain good performance on cluttered product image queries obtained with a mobile phone.

4

Takama, Yasufumi, Takuya Tezuka, Hiroki Shibata, and Lieu-Hen Chen. "Estimation of Search Intents from Query to Context Search Engine." Journal of Advanced Computational Intelligence and Intelligent Informatics 24, no. 3 (2020): 316–25. http://dx.doi.org/10.20965/jaciii.2020.p0316.

Abstract:

This paper estimates users’ search intents when using the context search engine (CSE) by analyzing submitted queries. Recently, due to the increase in the amount of information on the Web and the diversification of information needs, the gap between users’ information needs and the basic search function provided by existing web search engines has grown larger. As a solution to this problem, the CSE, which limits its tasks to answering questions about temporal trends, has been proposed. It provides three primitive search functions, which users can use in accordance with their purposes. Furthermore, if the system can estimate users’ search intents, it can provide more user-friendly services that contribute to the improvement of search efficiency. Aiming at estimating users’ search intents only from submitted queries, this paper analyzes the characteristics of queries in terms of typical search intents when using CSE, and defines classification rules. To show the potential use of the estimated search intents, this paper introduces learning to rank into CSE. Experimental results show that MAP (mean average precision) is improved by learning ranking models separately for different search intents.

5

Bondarenko, Yulia, Solomiya Ohinok, Artur Kisiołek, and Oleh Karyy. "Interest in universities based on search queries on the Internet." Innovative Marketing 17, no. 3 (2021): 179–90. http://dx.doi.org/10.21511/im.17(3).2021.15.

Abstract:

The improvement of global Internet access and the COVID-19 pandemic, which necessitated mass testing of online teaching methods, have forwarded the competition between higher education institutions from the regional level and the struggle for the rich student into the competition for students in all countries. The paper aims to determine the influence of the rating of higher education institutions on the interest of Internet users by conducting a comparative analysis of the popularity of the official names of higher education institutions in search queries in Ukraine and Poland. To do this, a comparative analysis of the change in the interest in leading higher education institutions in Ukraine and Poland in search queries in the Google search engine is carried out. The analysis is performed using the Google Trends web application. As a result, it is found that a high position of the university in the national ranking does not guarantee more search queries about it on the Internet by both national Internet users and users from the neighboring country. In general, Internet users continue to be most interested in universities located in their region at the time of the search.

6

Zhao, Qiwen, Zhongwen Zhou, and Yibang Liu. "PALM: Personalized Attention-based Language Model for Long-tail Query Understanding in Enterprise Search Systems." Journal of AI-Powered Medical Innovations (International online ISSN 3078-1930) 2, no. 1 (2024): 44–59. https://doi.org/10.60087/vol2iisue1.p009.

Abstract:

Enterprise search systems face significant challenges in handling long-tail queries, which constitute a substantial portion of search traffic but often receive inadequate attention in traditional systems. This paper introduces PALM (Personalized Attention-based Language Model), a novel framework designed to enhance long-tail query understanding in enterprise search environments. PALM integrates personalization capabilities with an advanced attention mechanism to improve search accuracy for infrequent queries while maintaining high performance on common queries. The framework employs a unique hierarchical architecture that combines user context, query semantics, and organizational knowledge through a sophisticated attention mechanism. The system features an innovative query embedding approach that adapts to individual user contexts while leveraging collective organizational knowledge. Extensive experiments on a large-scale enterprise dataset, comprising over 5 million queries from 50,000 users, demonstrate PALM's superior performance compared to state-of-the-art baselines. The results show significant improvements across multiple metrics, with a 17.5% increase in MAP for ultra-rare queries and a 10.4% overall improvement in NDCG@10. The framework exhibits robust performance across different organizational units and query types, making it particularly valuable for enterprise environments where query patterns are highly diverse and context-dependent. Our ablation studies confirm the effectiveness of each component in the PALM architecture, while case analyses provide insights into the framework's practical applications.

7

Corral, Karen, David Schuff, Gregory Schymik, and Robert St. Louis. "Strategies for Document Management." International Journal of Business Intelligence Research 1, no. 1 (2010): 64–83. http://dx.doi.org/10.4018/jbir.2010071705.

Abstract:

Keyword search has failed to adequately meet the needs of enterprise users. This is largely due to the size of document stores, the distribution of word frequencies, and the indeterminate nature of languages. The authors argue a different approach needs to be taken, and draw on the successes of dimensional data modeling and subject indexing to propose a solution. They test their solution by performing search queries on a large research database. By incorporating readily available subject indexes into the search process, they obtain order-of-magnitude improvements in the performance of search queries. Their performance measure is the ratio of the number of documents returned without using subject indexes to the number of documents returned when subject indexes are used. The authors explain why the observed tenfold improvement in search performance on their research database can be expected to occur for searches on a wide variety of enterprise document stores.

8

Mohoney, Jason, Anil Pacaci, Shihabur Rahman Chowdhury, et al. "High-Throughput Vector Similarity Search in Knowledge Graphs." Proceedings of the ACM on Management of Data 1, no. 2 (2023): 1–25. http://dx.doi.org/10.1145/3589777.

Abstract:

There is an increasing adoption of machine learning for encoding data into vectors to serve online recommendation and search use cases. As a result, recent data management systems propose augmenting query processing with online vector similarity search. In this work, we explore vector similarity search in the context of Knowledge Graphs (KGs). Motivated by the tasks of finding related KG queries and entities for past KG query workloads, we focus on hybrid vector similarity search (hybrid queries for short) where part of the query corresponds to vector similarity search and part of the query corresponds to predicates over relational attributes associated with the underlying data vectors. For example, given past KG queries for a song entity, we want to construct new queries for new song entities whose vector representations are close to the vector representation of the entity in the past KG query. But entities in a KG also have non-vector attributes such as a song associated with an artist, a genre, and a release date. Therefore, suggested entities must also satisfy query predicates over non-vector attributes beyond a vector-based similarity predicate. While these tasks are central to KGs, our contributions are generally applicable to hybrid queries. In contrast to prior works that optimize online queries, we focus on enabling efficient batch processing of past hybrid query workloads. We present our system, HQI, for high-throughput batch processing of hybrid queries. We introduce a workload-aware vector data partitioning scheme to tailor the vector index layout to the given workload and describe a multi-query optimization technique to reduce the overhead of vector similarity computations. We evaluate our methods on industrial workloads and demonstrate that HQI yields a 31× improvement in throughput for finding related KG queries compared to existing hybrid query processing approaches.

9

Chawla, Suruchi. "Application of Genetic Algorithm and Back Propagation Neural Network for Effective Personalize Web Search-Based on Clustered Query Sessions." International Journal of Applied Evolutionary Computation 7, no. 1 (2016): 33–49. http://dx.doi.org/10.4018/ijaec.2016010103.

Abstract:

In this paper, a novel method is proposed that uses a hybrid of a Genetic Algorithm (GA) and a Back Propagation (BP) Artificial Neural Network (ANN) to learn the classification of user queries into clusters for effective Personalized Web Search. The GA-BP ANN is trained offline to classify input queries and user query session profiles into a specific cluster based on clustered web query sessions. During online web search, the trained GA-BP ANN classifies new user queries into a cluster, and the selected cluster is used for web page recommendations. This process of classification and recommendation continues until the search is effectively personalized to the information need of the user. An experiment was conducted on a data set of web user query sessions to evaluate the effectiveness of Personalized Web Search using the GA-optimized BP ANN, and the results confirm the improvement in the precision of search results.

10

Liu, Han, Jiaqing Zhan, and Qin Zhang. "Uncertainty-Aware Contrastive Learning with Hard Negative Sampling for Code Search Tasks." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 18 (2025): 18807–15. https://doi.org/10.1609/aaai.v39i18.34070.

Abstract:

Code search is a highly required technique for software development. In recent years, the rapid development of transformer-based language models has made it increasingly more popular to adapt a pre-trained language model to a code search task, where contrastive learning is typically adopted to semantically align user queries and codes in an embedding space. Considering that the same semantic meaning can be presented using diverse language styles in user queries and codes, the representation of queries and codes in an embedding space may thus be non-deterministic. To address the above-specified point, this paper proposes an uncertainty-aware contrastive learning approach for code search. Specifically, for both queries and codes, we design an uncertainty learning strategy to produce diverse embeddings by learning to transform the original inputs into Gaussian distributions and then taking a reparameterization trick. We also design a hard negative sampling strategy to construct query-code pairs for improving the effectiveness of uncertainty-aware contrastive learning. The experimental results indicate that our approach outperforms 10 baseline methods on a large code search dataset with six programming languages. The results also show that our strategies of uncertainty learning and hard negative sampling can really help enhance the representation of queries and codes leading to an improvement of the code search performance.

11

Wildannissa Pinasti and Lya Hulliyyatus Suadaa. "Named Entity Recognition in Statistical Dataset Search Queries." Jurnal Nasional Teknik Elektro dan Teknologi Informasi 13, no. 3 (2024): 171–77. http://dx.doi.org/10.22146/jnteti.v13i3.11580.

Abstract:

Search engines must understand user queries to provide relevant search results. Search engines can enhance their understanding of user intent by employing named entity recognition (NER) to identify the entity in the query. Knowing the types of entities in the query can be the initial step in helping search engines better understand search intent. In this research, a dataset was constructed using search query history from the Statistics Indonesia (Badan Pusat Statistik, BPS) website, and NER in query modeling was employed to extract entities from search queries related to statistical datasets. The research stages included query data collection, query data preprocessing, query data labeling, NER in query modeling, and model evaluation. The conditional random field (CRF) model was employed for NER in query modeling with two scenarios: CRF with basic features and CRF with basic features plus part of speech (POS) features. The CRF model was used due to its well-known effectiveness in natural language processing (NLP), particularly for tasks like NER with sequence labeling. In this research, the basic CRF and the CRF model with POS feature achieved an F1-score of 0.9139 and 0.9110, respectively. A case study on a Linked Open Data (LOD) statistical dataset indicated that searches with synonym query expansion on entities from NER in query produced better search results than regular searches without query expansion. The model's performance incorporating additional POS tagging features did not result in a significant improvement. Therefore, it is recommended that future research will elaborate on deep learning.

12

Mohammed, Hussein, and Ayad Abdulsada. "Secure Multi-keyword Similarity Search Over Encrypted Data With Security Improvement." Iraqi Journal for Electrical and Electronic Engineering 17, no. 2 (2021): 1–10. http://dx.doi.org/10.37917/ijeee.17.2.1.

Abstract:

Searchable encryption (SE) is an interesting tool that enables clients to outsource their encrypted data into external cloud servers with unlimited storage and computing power and gives them the ability to search their data without decryption. The current solutions of SE support single-keyword search making them impractical in real-world scenarios. In this paper, we design and implement a multi-keyword similarity search scheme over encrypted data by using locality-sensitive hashing functions and Bloom filter. The proposed scheme can recover common spelling mistakes and enjoys enhanced security properties such as hiding the access and search patterns but with costly latency. To support similarity search, we utilize an efficient bi-gram-based method for keyword transformation. Such a method improves the search results accuracy. Our scheme employs two non-colluding servers to break the correlation between search queries and search results. Experiments using real-world data illustrate that our scheme is practically efficient, secure, and retains high accuracy.
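The bi-gram keyword transformation mentioned in this abstract can be illustrated in plaintext. The sketch below is not the authors' encrypted scheme: it omits the locality-sensitive hashing, the Bloom filter, and the two-server protocol, and only shows why bi-gram sets tolerate common spelling mistakes. The function names and the choice of Jaccard similarity are assumptions made for illustration.

```python
def bigrams(word):
    """Set of adjacent character pairs in a keyword (case-insensitive)."""
    lowered = word.lower()
    return {lowered[i:i + 2] for i in range(len(lowered) - 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# A transposition typo still shares several bi-grams with the intended word,
# while an unrelated word of similar length shares none.
typo_score = jaccard(bigrams("network"), bigrams("netwrok"))
other_score = jaccard(bigrams("network"), bigrams("notebook"))
```

Because a misspelling changes only a few bi-grams, the misspelled and correct keywords stay close under a set-similarity measure, which is the property a similarity-preserving hash can then exploit.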

13

Chao, Zoe. "From query analysis to user information needs: a study of campus map searches." Library Hi Tech 34, no. 1 (2016): 104–29. http://dx.doi.org/10.1108/lht-12-2014-0110.

Abstract:

Purpose – Search engines and web applications have evolved to be more tailored toward individual users’ needs, including the individual’s personal preferences and geographic location. By integrating the free Google Maps Application Program Interface with locally stored metadata, the author created an interactive map search for users to locate, and navigate to, destinations on the University of New Mexico (UNM) campus. The purpose of this paper is to identify the characteristics of UNM map search queries, the options and prioritization of the metadata augmentation, and the usefulness and possible improvement of the interface. Design/methodology/approach – Queries, search date/time, and the number of results found were logged and examined. Queries’ search frequency and characteristics were analyzed and categorized. Findings – From November 1, 2012 to September 15, 2013, the author had a total of 14,097 visits to the SearchUNM Maps page (http://search.unm.edu/maps/). There were a total of 5,868 searches (41 percent of all the page visits), and out of all the search instances, 2,297 of them (39 percent) did not retrieve any results. By analyzing the failed queries, the author was able to develop a strategy to increase successful searches. Originality/value – Many academic institutions have implemented interactive map searches for users to find locations and navigate on campus. However, to date there is no related research on how users conduct their searches in such a scope. Based on the query analysis, this paper identifies users’ search behavior and discusses strategies for improving the search results of campus interactive maps.

14

Madan, Kapil, and Rajesh K. Bhatia. "Ranked Deep Web Page Detection Using Reinforcement Learning and Query Optimization." International Journal on Semantic Web and Information Systems 17, no. 4 (2021): 99–121. http://dx.doi.org/10.4018/ijswis.2021100106.

Abstract:

This paper proposes a novel algorithm based on reinforcement learning-entitled asynchronous advantage actor-critic (A3C). Overflow queries are optimized to crawl the ranked deep web. A3C assigns the reward and penalty to the various queries. Queries are derived from the domain-based taxonomy that helps to fill the search forms. Overflow queries are the collection of queries that match with more than k number of results and only top k matched results are retrieved. Low ranked documents beyond k results are not accessible and lead to low coverage. Overflow queries are optimized to convert into non-overflow queries based on the proposed technique and lead to more coverage. As of yet, no research work has been explored by using A3C with taxonomy in the domain of ranked deep web. The experimental results show that the proposed technique outperforms the three other techniques (i.e., document frequency, random query, and high frequency) in terms of average improvement metric by 26%, 69%, and 92%, respectively.

15

Pratibha Waghale. "Beam Search for Boolean Query Optimization: A Heuristic Approach for Patent Information Retrieval." Advances in Nonlinear Variational Inequalities 28, no. 6s (2025): 749–61. https://doi.org/10.52783/anvi.v28.4423.

Abstract:

A patent gives the inventors exclusive rights to their innovations, thus stimulating economic growth, technological advancement, and creativity. Valuable patent document retrieval is necessary for establishing novelty and patentability. Boolean search queries are built to perfectly fetch relevant patents but control complexity at the same time. The Brute Force method treats all keyword combinations and is complete but very expensive in terms of computation. Large-scale manual refining is needed for traditional Boolean methods, whereas AI-based approaches usually lack transparency. This work optimizes Boolean logic with beam search that enhances patent search queries by reducing the solution space to the best promising options. Precisely through the refining of terms in combination, Beam Search enhances precision and recall but does not lose its interpretability. Experimental results demonstrate a 0.94923 score in patent retrieval efficiency. Despite the difficulties such as noisy data and complexity in processing, the approach exhibits a consistent improvement in retrieval performance, thus being a potential solution for patent search optimization.
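The general idea of pruning the space of keyword combinations with beam search can be sketched as follows. This is an illustration under strong assumptions, not the method evaluated in the article: queries are restricted to conjunctions (AND) of terms, candidates are scored by F1 against a small hand-labeled set of relevant documents, and all names and the toy data are hypothetical.

```python
def f1(retrieved, relevant):
    """F1 score of a retrieved document set against the relevant set."""
    true_positives = len(retrieved & relevant)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(retrieved)
    recall = true_positives / len(relevant)
    return 2 * precision * recall / (precision + recall)


def matches(query, docs):
    """IDs of documents containing every term of a conjunctive query."""
    return {doc_id for doc_id, terms in docs.items() if query <= terms}


def beam_search(vocab, docs, relevant, beam_width=2, max_terms=3):
    """Grow AND-queries one term at a time, keeping only the best few."""
    beam = [frozenset()]
    best_query = frozenset()
    best_score = f1(matches(best_query, docs), relevant)
    for _ in range(max_terms):
        candidates = {q | {t} for q in beam for t in vocab if t not in q}
        if not candidates:
            break
        # Highest F1 first; ties broken alphabetically for determinism.
        ranked = sorted(
            candidates,
            key=lambda q: (-f1(matches(q, docs), relevant), sorted(q)),
        )
        beam = ranked[:beam_width]
        top_score = f1(matches(beam[0], docs), relevant)
        if top_score > best_score:
            best_query, best_score = beam[0], top_score
    return best_query, best_score


# Toy corpus: document ID -> set of index terms (hypothetical data).
docs = {
    1: {"battery", "lithium", "anode"},
    2: {"battery", "lithium"},
    3: {"battery", "solar"},
    4: {"solar", "panel"},
}
result = beam_search(["battery", "lithium", "solar", "anode"], docs, {1, 2})
```

A brute-force search would score every subset of the vocabulary, whereas the beam keeps at most `beam_width` partial queries per step, trading completeness for a much smaller search space while every candidate remains a readable Boolean expression.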

16

Chenaina, Tarek, Sameh Neji, and Abdullah Shoeb. "Query Sense Discovery Approach to Realize the User's Search Intent." International Journal of Information Retrieval Research 12, no. 1 (2022): 1–18. http://dx.doi.org/10.4018/ijirr.289609.

Abstract:

The main goal of information retrieval is getting the most relevant documents to a user’s query. So, a search engine must not only understand the meaning of each keyword in the query but also their relative senses in the context of the query. Discovering the query meaning is a comprehensive and evolutionary process; the precise meaning of the query is established by developing the associations between concepts. The meaning determination process is modeled by a dynamic system operating in the semantic space of WordNet. To capture the meaning of a user query, the original query is reformulated into candidate queries by combining the concepts and their synonyms. A semantic score characterizing the overall meaning of each such query is calculated, and the one with the highest score is used to perform the search. The results confirm that the proposed "Query Sense Discovery" approach provides a significant improvement in several performance measures.

17

Garba, Adamu, Shah Khalid, Irfan Ullah, Shah Khusro, and Diyawu Mumin. "Embedding based learning for collection selection in federated search." Data Technologies and Applications 54, no. 5 (2020): 703–17. http://dx.doi.org/10.1108/dta-01-2019-0005.

Abstract:

Purpose – There have been many challenges in crawling the deep web by search engines due to its proprietary nature or dynamic content. Distributed Information Retrieval (DIR) tries to solve these problems by providing a unified searchable interface to these databases. Since a DIR must search across many databases, selecting a specific database to search against the user query is challenging. The challenge can be solved if the past queries of the users are considered in selecting collections to search in combination with word embedding techniques. Combining these would aid the best performing collection selection method to speed up retrieval performance of DIR solutions. Design/methodology/approach – The authors propose a collection selection model based on word embedding using the Word2Vec approach that learns the similarity between the current and past queries. They used the cosine and transformed cosine similarity models in computing the similarities among queries. The experiment is conducted using three standard TREC testbeds created for federated search. Findings – The results show significant improvements over the baseline models. Originality/value – Although lexical matching models for collection selection using similarity based on past queries exist, to the best of our knowledge, the proposed work is the first of its kind that uses word embedding for collection selection by learning from past queries.
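How similarity to past queries can drive collection selection is sketched below. The sketch assumes each query has already been mapped to a dense vector (for example by averaging word embeddings) and that each past query is paired with the collection that answered it. The voting rule, the function names, and the toy vectors are hypothetical, and the transformed cosine similarity mentioned in the abstract is not reproduced.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_collections(query_vec, past_queries):
    """Rank collections by summed similarity between the new query and the
    past queries each collection answered (negative similarities ignored)."""
    scores = defaultdict(float)
    for past_vec, collection in past_queries:
        scores[collection] += max(0.0, cosine(query_vec, past_vec))
    return sorted(scores, key=scores.get, reverse=True)


# Toy 2-d embeddings standing in for real query vectors.
past = [([1.0, 0.0], "A"), ([0.9, 0.1], "A"), ([0.0, 1.0], "B")]
ranking = rank_collections([1.0, 0.05], past)
```

The new query is routed first to the collection whose past queries it most resembles, so only the top-ranked collections need to be searched.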

18

Pan, Yaoxin, Shangsong Liang, Jiaxin Ren, Zaiqiao Meng, and Qiang Zhang. "Personalized, Sequential, Attentive, Metric-Aware Product Search." ACM Transactions on Information Systems 40, no. 2 (2022): 1–29. http://dx.doi.org/10.1145/3473337.

Abstract:

The task of personalized product search aims at retrieving a ranked list of products given a user’s input query and his/her purchase history. To address this task, we propose the PSAM model, a Personalized, Sequential, Attentive and Metric-aware (PSAM) model, that learns the semantic representations of three different categories of entities, i.e., users, queries, and products, based on user sequential purchase historical data and the corresponding sequential queries. Specifically, a query-based attentive LSTM (QA-LSTM) model and an attention mechanism are designed to infer users’ dynamic embeddings, which are able to capture their short-term and long-term preferences. To obtain more fine-grained embeddings of the three categories of entities, a metric-aware objective is deployed in our model to force the inferred embeddings to be subject to the triangle inequality, which is a more realistic distance measurement for product search. Experiments conducted on four benchmark datasets show that our PSAM model significantly outperforms the state-of-the-art product search baselines in terms of effectiveness by up to 50.9% improvement under NDCG@20. Our visualization experiments further illustrate that the learned product embeddings are able to distinguish different types of products.

19

RAKHSHAN, ARASH, LAWRENCE B. HOLDER, and DIANE J. COOK. "STRUCTURAL WEB SEARCH ENGINE." International Journal on Artificial Intelligence Tools 13, no. 01 (2004): 27–44. http://dx.doi.org/10.1142/s0218213004001405.

Abstract:

We present a new approach in web search engines. The web creates new challenges for information retrieval. The vast improvement in information access is not the only advantage resulting from the keyword search. Additionally, much potential exists for analyzing interests and relationships within the structure of the web. The creation of a hyperlink by the author of a web page explicitly represents a relationship between the source and destination pages which demonstrates the hyperlink structure between web pages. Our web search engine searches not only for the keywords in the web pages, but also for the hyperlink structure between them. Comparing the results of structural web search versus keyword-based search indicates an improved ability to access desired information. We also discuss steps toward mining the queries input to the structural web search engine.

20

Phan, Cong-Phuoc, Hong-Quang Nguyen, and Tan-Tai Nguyen. "Ontology-based heuristic patent search." International Journal of Web Information Systems 15, no. 3 (2019): 258–84. http://dx.doi.org/10.1108/ijwis-06-2018-0053.

Abstract:

Purpose – Large collections of patent documents disclosing novel, non-obvious technologies are publicly available and beneficial to academia and industries. To maximally exploit their potential, searching these patent documents has increasingly become an important topic. Although much research has processed large collections, few studies have attempted to integrate both patent classifications and specifications for analyzing user queries. Consequently, the queries are often insufficiently analyzed for improving the accuracy of search results. This paper aims to address this limitation by exploiting semantic relationships between patent contents and their classification. Design/methodology/approach – The contributions are fourfold. First, the authors enhance similarity measurement between two short sentences and make it 20 per cent more accurate. Second, the Graph-embedded Tree ontology is enriched by integrating both patent documents and the classification scheme. Third, the ontology does not rely on a rule-based method or text matching; instead, a heuristic meaning comparison is applied to extract semantic relationships between concepts. Finally, the patent search approach uses the ontology effectively with the results sorted based on their most common order. Findings – The experiment on searching for 600 patent documents in the field of Logistics yields a 15 per cent improvement in F-Measure compared with traditional approaches. Research limitations/implications – The research, however, still requires improvement: the terms and phrases extracted by Noun and Noun phrases make less sense in some aspects and thus might not result in high accuracy. The large collection of extracted relationships could be further optimized for conciseness. In addition, parallel processing such as Map-Reduce could be used to improve the search processing performance. Practical implications – The experimental results could be used by scientists and technologists to search for novel, non-obvious technologies in the patents. Social implications – High-quality patent search results will reduce patent infringement. Originality/value – The proposed ontology is semantically enriched by integrating both patent documents and their classification. This ontology facilitates the analysis of user queries for enhancing the accuracy of the patent search results.

21

Simian, Dana, and Marin-Eusebiu Șerban. "Improving Search Query Accuracy for Specialized Websites Through Intelligent Text Correction and Reconstruction Models." Information 15, no. 11 (2024): 683. http://dx.doi.org/10.3390/info15110683.

Abstract:

In the digital era, the need for precise and efficient search operations is paramount as users increasingly rely on online resources to access specific information. However, search accuracy is often hindered by errors in user queries, such as incomplete or degraded input. Errors in search queries can reduce both the precision and speed of search results, making error correction a key factor in enhancing the user experience. This paper addresses the challenge of improving search performance through query error correction. We propose a novel methodology and architecture aimed at optimizing search results across thematic websites, such as those for universities, hospitals, or tourism agencies. The proposed solution leverages an intelligent model based on Gated Recurrent Units (GRUs) and Bahdanau Attention mechanisms to reconstruct erroneous or incomplete text in search queries. To validate our approach, we embedded the model in a prototype website consolidating data from multiple universities, demonstrating significant improvements in search accuracy and efficiency.

22

Gudmundsson, Joachim, Michael Horton, John Pfeifer, and Martin P. Seybold. "A Practical Index Structure Supporting Fréchet Proximity Queries among Trajectories." ACM Transactions on Spatial Algorithms and Systems 7, no. 3 (2021): 1–33. http://dx.doi.org/10.1145/3460121.

Abstract:

We present a scalable approach for range and k nearest neighbor queries under computationally expensive metrics, like the continuous Fréchet distance on trajectory data. Based on clustering for metric indexes, we obtain a dynamic tree structure whose size is linear in the number of trajectories, regardless of the trajectories’ individual sizes or the spatial dimension, which allows one to exploit low “intrinsic dimensionality” of datasets for effective search space pruning. Since the distance computation is expensive, generic metric indexing methods are rendered impractical. We present strategies that (i) improve on known upper and lower bound computations, (ii) build cluster trees without any or very few distance calls, and (iii) search using bounds for metric pruning, interval orderings for reduction, and randomized pivoting for reporting the final results. We analyze the efficiency and effectiveness of our methods with extensive experiments on diverse synthetic and real-world datasets. The results show improvement over state-of-the-art methods for exact queries, and even further speedups are achieved for queries that may return approximate results. Surprisingly, the majority of exact nearest-neighbor queries on real datasets are answered without any distance computations.
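
The idea of using bounds to avoid expensive distance calls can be illustrated with a generic pivot-based range query. This is a minimal sketch of standard triangle-inequality pruning, not the authors' index structure; the function name `range_query`, the single-pivot layout, and the toy one-dimensional distance are assumptions made for illustration only.

```python
def range_query(query, radius, items, pivot, dist, pivot_dists):
    """Return (hits, distance_calls) for a metric range query.

    pivot_dists[i] must hold dist(pivot, items[i]), precomputed offline.
    Hypothetical helper: one pivot only, whereas real indexes use a tree.
    """
    d_qp = dist(query, pivot)  # the only unavoidable distance call
    calls = 1
    hits = []
    for item, d_pi in zip(items, pivot_dists):
        lower = abs(d_qp - d_pi)  # triangle inequality: d(q, x) >= |d(q, p) - d(p, x)|
        upper = d_qp + d_pi       # triangle inequality: d(q, x) <= d(q, p) + d(p, x)
        if lower > radius:
            continue              # pruned without computing d(q, x)
        if upper <= radius:
            hits.append(item)     # accepted without computing d(q, x)
            continue
        calls += 1                # bounds are inconclusive: pay for the real distance
        if dist(query, item) <= radius:
            hits.append(item)
    return hits, calls


# Toy usage with points on a line; a stand-in for an expensive metric.
points = [0, 1, 5, 10]
abs_dist = lambda a, b: abs(a - b)
to_pivot = [abs_dist(0, x) for x in points]
hits, calls = range_query(1, 1.5, points, 0, abs_dist, to_pivot)
```

In this toy run, two of the four candidates are pruned by the lower bound and one is accepted by the upper bound, so only one candidate needs an actual distance computation beyond the query-to-pivot call.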

23

Lin, Sheng-Chieh, Jheng-Hong Yang, Rodrigo Nogueira, Ming-Feng Tsai, Chuan-Ju Wang, and Jimmy Lin. "Multi-Stage Conversational Passage Retrieval: An Approach to Fusing Term Importance Estimation and Neural Query Rewriting." ACM Transactions on Information Systems 39, no. 4 (2021): 1–29. http://dx.doi.org/10.1145/3446426.

Abstract:

Conversational search plays a vital role in conversational information seeking. As queries in information seeking dialogues are ambiguous for traditional ad hoc information retrieval (IR) systems due to the coreference and omission resolution problems inherent in natural language dialogue, resolving these ambiguities is crucial. In this article, we tackle conversational passage retrieval, an important component of conversational search, by addressing query ambiguities with query reformulation integrated into a multi-stage ad hoc IR system. Specifically, we propose two conversational query reformulation (CQR) methods: (1) term importance estimation and (2) neural query rewriting. For the former, we expand conversational queries using important terms extracted from the conversational context with frequency-based signals. For the latter, we reformulate conversational queries into natural, stand-alone, human-understandable queries with a pretrained sequence-to-sequence model. Detailed analyses of the two CQR methods are provided quantitatively and qualitatively, explaining their advantages, disadvantages, and distinct behaviors. Moreover, to leverage the strengths of both CQR methods, we propose combining their output with reciprocal rank fusion, yielding state-of-the-art retrieval effectiveness, 30% improvement in terms of NDCG@3 compared to the best submission of Text REtrieval Conference (TREC) Conversational Assistant Track (CAsT) 2019.
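
Reciprocal rank fusion, which the abstract uses to combine the two CQR outputs, can be sketched as follows. This is an illustrative sketch, not the authors' code: the function name, the passage IDs, and the smoothing constant `k=60` (a commonly used default) are assumptions, since the abstract does not state them.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists that contain it,
    with ranks starting at 1. k=60 is an assumed default, not from the paper.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical runs from the two reformulation methods.
term_expansion_run = ["p3", "p1", "p2"]
neural_rewrite_run = ["p1", "p4", "p3"]
fused = reciprocal_rank_fusion([term_expansion_run, neural_rewrite_run])
```

A passage that both runs rank highly (here `p1` and `p3`) rises above passages that appear in only one run, which is why fusion can combine the strengths of two different reformulation methods.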

24

Maree, Mohammed, Saadat M. Alhashmi, and Mohammed Belkhatir. "Towards Improving Meta-Search through Exploiting an Integrated Search Model." Journal of Information & Knowledge Management 10, no. 04 (2011): 379–91. http://dx.doi.org/10.1142/s0219649211003073.

Abstract:

Meta-search engines are created to reduce the burden on the user by dispatching queries to multiple search engines in parallel. Decisions on how to rank the returned results are made based on the query's keywords. Although the keyword-based search model produces good results, better results can be obtained by integrating semantic and statistical relatedness measures into this model. Such integration allows the meta-search engine to search by meaning rather than only by literal strings. In this article, we present Multi-Search+, the next generation of the Multi-Search general-purpose meta-search engine. The extended version of the system employs additional knowledge represented by multiple domain-specific ontologies to enhance both query processing and the merging of returned results. In addition, new general-purpose search engines are plugged into its architecture. Experimental results demonstrate that our integrated search model obtained a significant improvement in the quality of the search results produced.

25

Chawla, Suruchi. "Intelligent Information Retrieval Using Hybrid of Fuzzy Set and Trust." Oriental journal of computer science and technology 10, no. 2 (2017): 311–25. http://dx.doi.org/10.13005/ojcst/10.02.09.

Abstract:

The main challenge for effective web Information Retrieval (IR) is to infer the information need from the user’s query and retrieve relevant documents. The precision of search results is low because user queries are vague and imprecise, so too few relevant documents are retrieved. Fuzzy-set-based query expansion deals with imprecise and vague queries to infer the user’s information need. Trust-based web page recommendations retrieve search results according to the user’s information need. In this paper, an algorithm is designed for Intelligent Information Retrieval using a hybrid of fuzzy sets and trust in web query session mining: fuzzy query expansion infers the user’s information need, and trust is used to recommend web pages according to that need. An experiment was performed on a data set collected in the domains Academics, Entertainment and Sports, and the search results confirm the improvement in precision.

26

Agyapong, Kwame, J. B. Hayfron Acquah, and M. Asante. "AN OPTIMIZED PAGE RANK ALGORITHM WITH WEB MINING, WEB CONTENT MINING AND WEB STRUCTURE MINING." International Journal of Engineering Technologies and Management Research 4, no. 8 (2020): 22–27. http://dx.doi.org/10.29121/ijetmr.v4.i8.2017.91.

Abstract:

With the rapid growth of internet technology, users are easily confused in large hypertext structures. The primary goal of a web site owner is to provide relevant information to users to fulfill their needs. To achieve this goal, they use the concept of web mining. Web mining is used to categorize users and pages by analyzing the users' behaviour, the content of the pages, and the order in which URLs tend to be accessed. Most search engines rank their search results in response to users' queries to make search navigation easier. With a web browser, one can view web pages that may contain text, images, videos, and other multimedia, and navigate between them via hyperlinks. It is very difficult for a user to find the high-quality information they want. A page ranking algorithm is needed that gives higher ranking to important pages. In this paper, we discuss the improvement of the page ranking algorithm to provide higher ranking to important pages.

27

Chen, Wanyu, Zepeng Hao, Taihua Shao, and Honghui Chen. "Personalized query suggestion based on user behavior." International Journal of Modern Physics C 29, no. 04 (2018): 1850036. http://dx.doi.org/10.1142/s0129183118500365.

Abstract:

Query suggestions help users refine their queries after they input an initial query. Previous work mainly concentrated on similarity-based and context-based query suggestion approaches. However, models that focus on adapting to a specific user (personalization) can help to improve the probability of the user being satisfied. In this paper, we propose a personalized query suggestion model based on users’ search behavior (UB model), where we inject relevance between queries and users’ search behavior into a basic probabilistic model. For the relevance between queries, we consider their semantical similarity and co-occurrence which indicates the behavior information from other users in web search. Regarding the current user’s preference to a query, we combine the user’s short-term and long-term search behavior in a linear fashion and deal with the data sparse problem with Bayesian probabilistic matrix factorization (BPMF). In particular, we also investigate the impact of different personalization strategies (the combination of the user’s short-term and long-term search behavior) on the performance of query suggestion reranking. We quantify the improvement of our proposed UB model against a state-of-the-art baseline using the public AOL query logs and show that it beats the baseline in terms of metrics used in query suggestion reranking. The experimental results show that: (i) for personalized ranking, users’ behavioral information helps to improve query suggestion effectiveness; and (ii) given a query, merging information inferred from the short-term and long-term search behavior of a particular user can result in a better performance than both plain approaches.
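
The linear combination of short-term and long-term behaviour described in the abstract can be sketched as a simple interpolation. This is a hedged illustration only: the names `personalized_score`, `rerank`, and `weight` are hypothetical, the preference values are made up, and the BPMF step that the authors use to handle sparse data is not modelled here.

```python
def personalized_score(short_term, long_term, weight=0.5):
    """Linearly interpolate two preference estimates (weight is an assumed parameter)."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * short_term + (1.0 - weight) * long_term


def rerank(candidates, short_term_pref, long_term_pref, weight=0.5):
    """Order candidate query suggestions by their interpolated preference score.

    Suggestions missing from a preference table are treated as preference 0.0.
    """
    def score(suggestion):
        return personalized_score(
            short_term_pref.get(suggestion, 0.0),
            long_term_pref.get(suggestion, 0.0),
            weight,
        )
    return sorted(candidates, key=score, reverse=True)


# Hypothetical preferences: the session favours "b", the long-term history favours "a".
session_pref = {"a": 0.2, "b": 0.9}
history_pref = {"a": 0.8, "b": 0.1}
mostly_history = rerank(["a", "b"], session_pref, history_pref, weight=0.25)
mostly_session = rerank(["a", "b"], session_pref, history_pref, weight=0.75)
```

Changing `weight` shifts the ranking between the two behaviour signals, which is the kind of personalization strategy the abstract says the authors compare.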

28

Zhou, Xuanhe, Guoliang Li, Chengliang Chai, and Jianhua Feng. "A learned query rewrite system using Monte Carlo tree search." Proceedings of the VLDB Endowment 15, no. 1 (2021): 46–58. http://dx.doi.org/10.14778/3485450.3485456.

Abstract:

Query rewrite transforms a SQL query into an equivalent one but with higher performance. However, SQL rewrite is an NP-hard problem, and existing approaches adopt heuristics to rewrite the queries. These heuristics have two main limitations. First, the order of applying different rewrite rules significantly affects the query performance. However, the search space of all possible rewrite orders grows exponentially with the number of query operators and rules and it is rather hard to find the optimal rewrite order. Existing methods apply a pre-defined order to rewrite queries and will fall in a local optimum. Second, different rewrite rules have different benefits for different queries. Existing methods work on single plans but cannot effectively estimate the benefits of rewriting a query. To address these challenges, we propose a policy tree based query rewrite framework, where the root is the input query and each node is a rewritten query from its parent. We aim to explore the tree nodes in the policy tree to find the optimal rewrite query. We propose to use Monte Carlo Tree Search to explore the policy tree, which navigates the policy tree to efficiently get the optimal node. Moreover, we propose a learning-based model to estimate the expected performance improvement of each rewritten query, which guides the tree search more accurately. We also propose a parallel algorithm that can explore the tree search in parallel in order to improve the performance. Experimental results showed that our method significantly outperformed existing approaches.

29

Zhang, Xinyi, Qichen Wang, Cheng Xu, Yun Peng, and Jianliang Xu. "FedKNN: Secure Federated k-Nearest Neighbor Search." Proceedings of the ACM on Management of Data 2, no. 1 (2024): 1–26. http://dx.doi.org/10.1145/3639266.

Abstract:

Nearest neighbor search is a fundamental task in various domains, such as federated learning, data mining, information retrieval, and biomedicine. With the increasing need to utilize data from different organizations while respecting privacy regulations, private data federation has emerged as a promising solution. However, it is costly to directly apply existing approaches to federated k-nearest neighbor (kNN) search with difficult-to-compute distance functions, like graph or sequence similarity. To address this challenge, we propose FedKNN, a system that supports secure federated kNN search queries with a wide range of similarity measurements. Our system is equipped with a new Distribution-Aware kNN (DANN) algorithm to minimize unnecessary local computations while protecting data privacy. We further develop DANN*, a secure version of DANN that satisfies differential obliviousness. Extensive evaluations show that FedKNN outperforms state-of-the-art solutions, achieving up to 4.8× improvement on federated graph kNN search and up to 2.7× improvement on federated sequence kNN search. Additionally, our approach offers a trade-off between privacy and efficiency, providing strong privacy guarantees with minimal overhead.

30

Kwame, Boakye Agyapong, J.B.Hayfron-Acquah Dr., and M. Asante Dr. "AN OPTIMIZED PAGE RANK ALGORITHM WITH WEB MINING, WEB CONTENT MINING AND WEB STRUCTURE MINING." International Journal of Engineering Technologies and Management Research 4, no. 8 (2017): 22–27. https://doi.org/10.5281/zenodo.914660.

Abstract:

With the rapid growth of internet technology, users are easily confused in large hypertext structures. The primary goal of a web site owner is to provide relevant information to users to fulfill their needs. To achieve this goal, they use the concept of web mining. Web mining is used to categorize users and pages by analyzing the users' behaviour, the content of the pages, and the order in which URLs tend to be accessed. Most search engines rank their search results in response to users' queries to make search navigation easier. With a web browser, one can view web pages that may contain text, images, videos, and other multimedia, and navigate between them via hyperlinks. It is very difficult for a user to find the high-quality information they want. A page ranking algorithm is needed that gives higher ranking to important pages. In this paper, we discuss the improvement of the page ranking algorithm to provide higher ranking to important pages.

31

Abo Khamis, Mahmoud, Ahmet Kara, Dan Olteanu, and Dan Suciu. "Output-Sensitive Evaluation of Regular Path Queries." Proceedings of the ACM on Management of Data 3, no. 2 (2025): 1–20. https://doi.org/10.1145/3725242.

Abstract:

We study the classical evaluation problem for regular path queries: given an edge-labeled graph and a regular path query, compute the set of pairs of vertices that are connected by paths that match the query. The Product Graph (PG) is the established evaluation approach for regular path queries. PG first constructs the product automaton of the data graph and the query and then uses breadth-first search to find the accepting states reachable from each initial state in the product automaton. Its data complexity is O(|V|⋅|E|), where V and E are the sets of vertices and edges, respectively, in the data graph. This complexity cannot be improved by combinatorial algorithms. In this paper, we introduce OSPG, an output-sensitive refinement of PG, whose data complexity is O(|E|^{3/2} + min(OUT⋅√|E|, |V|⋅|E|)), where OUT is the number of distinct vertex pairs in the query output. OSPG's complexity is at most that of PG and can be asymptotically smaller for small output and sparse input. The improvement of OSPG over PG comes from avoiding the time PG wastes in the breadth-first search phase when only a few output pairs are eventually discovered. For queries without Kleene star, the complexity of OSPG can be further improved to O(|E| + |E|⋅√OUT).
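
The Product Graph baseline described in the abstract can be sketched in a few lines. This illustrates the PG idea only, not OSPG, and is not the authors' implementation: the function name `evaluate_rpq`, the edge-triple encoding, and the dictionary-based automaton without epsilon transitions are all assumptions.

```python
from collections import defaultdict, deque


def evaluate_rpq(edges, delta, start_state, accepting):
    """Return all vertex pairs (u, v) joined by a path whose labels the automaton accepts.

    edges:     iterable of (source, label, target) triples (assumed encoding)
    delta:     dict mapping (state, label) -> set of next states (NFA, no epsilon moves)
    accepting: set of accepting automaton states
    """
    adjacency = defaultdict(list)
    vertices = set()
    for source, label, target in edges:
        adjacency[source].append((label, target))
        vertices.update((source, target))

    answers = set()
    for origin in vertices:
        # Breadth-first search over product nodes (graph vertex, automaton state).
        first = (origin, start_state)
        seen = {first}
        queue = deque([first])
        while queue:
            vertex, state = queue.popleft()
            if state in accepting:
                answers.add((origin, vertex))
            for label, next_vertex in adjacency[vertex]:
                for next_state in delta.get((state, label), ()):
                    node = (next_vertex, next_state)
                    if node not in seen:
                        seen.add(node)
                        queue.append(node)
    return answers


# Toy graph a -x-> b -y-> c -y-> d, with the query x y* (state 0 is initial, 1 is accepting).
toy_edges = [("a", "x", "b"), ("b", "y", "c"), ("c", "y", "d")]
toy_delta = {(0, "x"): {1}, (1, "y"): {1}}
pairs = evaluate_rpq(toy_edges, toy_delta, 0, {1})
```

One search is launched per vertex even when it contributes no output pairs, which is the wasted work the abstract attributes to PG when the output is small.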

32

Moll, Oscar, Manuel Favela, Samuel Madden, Vijay Gadepally, and Michael Cafarella. "SeeSaw: Interactive Ad-hoc Search Over Image Databases." Proceedings of the ACM on Management of Data 1, no. 4 (2023): 1–26. http://dx.doi.org/10.1145/3626754.

Abstract:

As image datasets become ubiquitous, the problem of ad-hoc searches over image data is increasingly important. Many high-level data tasks in machine learning, such as constructing datasets for training and testing object detectors, imply finding ad-hoc objects or scenes within large image datasets as a key sub-problem. New foundational visual-semantic embeddings trained on massive web datasets such as Contrastive Language-Image Pre-Training (CLIP) can help users start searches on their own data, but we find there is a long tail of queries where these models fall short in practice. Seesaw is a system for interactive ad-hoc searches on image datasets that integrates state-of-the-art embeddings like CLIP with user feedback in the form of box annotations to help users quickly locate images of interest in their data even in the long tail of harder queries. One key challenge for Seesaw is that, in practice, many sensible approaches to incorporating feedback into future results, including state-of-the-art active-learning algorithms, can worsen results compared to introducing no feedback, partly due to CLIP's high average performance. Therefore, Seesaw includes several algorithms that empirically result in larger and also more consistent improvements. We compare Seesaw's accuracy to both using CLIP alone and to a state-of-the-art active-learning baseline and find Seesaw consistently helps improve results for users across four datasets and more than a thousand queries. Seesaw increases Average Precision (AP) on search tasks by an average of .08 on a wide benchmark (from a base of .72), and by .27 on a subset of more difficult queries where CLIP alone performs poorly.

33

Rogushina, J. V. "Classification of means and methods of the Web semantic retrieval." PROBLEMS IN PROGRAMMING, no. 1 (2017): 030–50. http://dx.doi.org/10.15407/pp2017.01.030.

Abstract:

Problems associated with the improvement of information retrieval for open environments are considered, and the need for its semantization is substantiated. The current state and prospects of development of semantic search engines focused on processing Web information resources are analysed, and the criteria for classifying such systems are reviewed. In this analysis, significant attention is paid to the use of ontologies in semantic search that contain knowledge about the subject area and the search users. The sources of ontological knowledge and methods of processing them to improve search procedures are considered. Examples of semantic search systems that use structured query languages (e.g., SPARQL), lists of keywords, and natural-language queries are given. Classification criteria for semantic search engines such as architecture, coupling, transparency, user context, query modification, and ontology structure are considered. Different ways of supporting semantic and ontology-based modification of user queries that improve the completeness and accuracy of the search are analysed. Based on an analysis of the properties of existing semantic search engines in terms of these criteria, areas for further improvement of these systems are identified: the development of metasearch systems, semantic modification of user queries, the determination of a user-acceptable transparency level of the search procedures, flexibility of domain knowledge management tools, and increased productivity and scalability. In addition, the development of semantic Web search tools requires the use of an external knowledge base containing knowledge about the domain of the user's information needs, and requires giving users the ability to select independently the knowledge used in the search process. It is necessary to take into account the history of user interaction with the retrieval system and the search context in order to personalize the query results and order them according to the user's information needs. All these aspects were taken into account in the design and implementation of the semantic search engine "MAIPS", which is based on an ontological model of the interaction of users and resources on the Web.

34

Uyar, Ahmet, and Rabia Karapinar. "Investigating the precision of Web image search engines for popular and less popular entities." Journal of Information Science 43, no. 3 (2016): 378–92. http://dx.doi.org/10.1177/0165551516642929.

Abstract:

Image search is the second most frequently used search service on the Web. However, there are very few studies investigating any aspect of it. In this study, we investigate the precision of Web image search engines of Google and Bing for popular and less popular entities using text-based queries. Furthermore, we investigate four additional aspects of Web image search engines that have not been studied before. We used 60 different queries in total from three different domains for popular and less popular categories. We examined the relevancy of the top 100 images for each query. Our results indicate that image search is a solved problem for popular entities. They deliver 97% precision on the average for popular entities. However, precision values are much lower for less popular entities. For the top 100 results, average precision is 48% for Google and 33% for Bing. The most important problem seems to be the worst cases in which the precision can be less than 10%. The results show that significant improvement is needed to better identify relevant images for less popular entities. One of the main issues is the association problem. When a Web page has query words and multiple images, both Google and Bing are having difficulty determining the relevant images.
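
The precision figures quoted above follow from judging the top-ranked results of each query and averaging across queries. A minimal sketch of that arithmetic is given below; the function names and the boolean-list encoding of relevance judgments are assumptions, and the sample judgments are invented for illustration rather than taken from the study.

```python
def precision_at_k(judgments, k=100):
    """Fraction of the first k results judged relevant.

    judgments is a list of booleans in rank order. If fewer than k results
    were returned, this sketch divides by the number actually examined.
    """
    examined = judgments[:k]
    if not examined:
        return 0.0
    return sum(1 for is_relevant in examined if is_relevant) / len(examined)


def mean_precision(per_query_judgments, k=100):
    """Average precision-at-k over a set of queries."""
    values = [precision_at_k(judgments, k) for judgments in per_query_judgments]
    return sum(values) / len(values) if values else 0.0


# Invented judgments for two queries, cut off at k = 4.
popular_entity = [True, True, True, True]
less_popular_entity = [True, False, False, True]
average = mean_precision([popular_entity, less_popular_entity], k=4)
```

Averaging hides the worst cases the abstract highlights, so per-query values are worth inspecting alongside the mean.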

35

Chang, Yi, Ruiqiang Zhang, Srihari Reddy, and Yan Liu. "Detecting Multilingual and Multi-Regional Query Intent in Web Search." Proceedings of the AAAI Conference on Artificial Intelligence 25, no. 1 (2011): 1134–39. http://dx.doi.org/10.1609/aaai.v25i1.8074.

Abstract:

With the rapid growth of commercial search engines, detecting the multilingual and multi-regional intent underlying search queries has become a critical challenge in serving international users with diverse language and region requirements. We introduce a query intent probabilistic model whose input is the number of clicks on documents from different regions and in different languages, and whose output is a smoothed probabilistic distribution of multilingual and multi-regional query intent. Based on an editorial test to evaluate the accuracy of the intent classifier, our probabilistic model could improve the accuracy of multilingual intent detection by 15% and of multi-regional intent detection by 18%. To improve web search quality, we propose a set of new ranking features that combine multilingual and multi-regional query intent with document language/region attributes, and apply different approaches to integrating intent information so that it directly affects ranking. The experiments show that the novel features could provide a 2.31% NDCG@1 improvement and a 1.81% NDCG@5 improvement.
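
For readers unfamiliar with the NDCG@k metric reported above, a minimal sketch follows. It assumes the linear-gain formulation with a log2 rank discount; the abstract does not say which gain variant the authors used, and the function names and relevance grades here are illustrative only.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain of the first k graded results (linear gain assumed)."""
    return sum(
        gain / math.log2(rank + 1)
        for rank, gain in enumerate(gains[:k], start=1)
    )


def ndcg_at_k(gains, k):
    """DCG normalized by the DCG of the ideal (descending-gain) ordering."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0


# Invented relevance grades for one ranked result list.
ranked_grades = [1, 3, 2]
ndcg_1 = ndcg_at_k(ranked_grades, 1)
ndcg_3 = ndcg_at_k(ranked_grades, 3)
```

NDCG@1 depends only on the first result, which makes it sensitive to changes at the very top of the ranking, while NDCG@5 also rewards improvements further down the list.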

36

Siddiqui, Tarique, Wentao Wu, Vivek Narasayya, and Surajit Chaudhuri. "DISTILL." Proceedings of the VLDB Endowment 15, no. 10 (2022): 2019–31. http://dx.doi.org/10.14778/3547305.3547309.

Abstract:

Many database systems offer index tuning tools that help automatically select appropriate indexes for improving the performance of an input workload. Index tuning is a resource-intensive and time-consuming task requiring expensive optimizer calls for estimating the cost of queries over potential index configurations. In this work, we develop low-overhead techniques that can be leveraged by index tuning tools for reducing a large number of optimizer calls without making changes to the tuning algorithm or to the query optimizer. First, index tuning tools use rule-based techniques to generate a large number of syntactically-relevant indexes; however, a large proportion of such indexes are spurious and do not lead to a significant improvement in the performance of queries. We eliminate such indexes much earlier in the search by leveraging patterns in the workload, without making optimizer calls. Second, we learn cost models that exploit the similarity between query and index configuration pairs in the workload to efficiently estimate the cost of queries over a large number of index configurations using fewer optimizer calls. We perform an extensive evaluation over both real-world and synthetic benchmarks, and show that given the same set of input queries, indexes, and the search algorithm for exploration, our proposed techniques can lead to a median reduction in tuning time of 3X and a maximum of 12X compared to state-of-the-art tuning tools with similar quality of recommended indexes.

37

Xiong, Wei, Michael Recce, and Brook Wu. "Intent-Based User Segmentation with Query Enhancement." International Journal of Information Retrieval Research 3, no. 4 (2013): 1–17. http://dx.doi.org/10.4018/ijirr.2013100101.

Abstract:

With the rapid advancement of the internet, accurate prediction of users' online intent underlying their search queries has received increasing attention from the online advertising community. This paper aims to address the major challenges with user queries in the context of behavioral targeting advertising by proposing a query enhancement mechanism that augments users' queries by leveraging a user query log. The empirical evaluation demonstrates that the authors' methodology for query enhancement achieves greater improvement than the baseline models in both intent-based user classification and user segmentation. Unlike traditional user segmentation methods, which take little of the semantics of user behaviors into consideration, the authors propose a novel user segmentation strategy that combines the query enhancement mechanism with a topic model to mine the relationships between users and their behaviors in order to segment users in a semantic manner. Compared with a classical clustering algorithm, K-means, the experimental results indicate that the proposed user segmentation strategy helps improve behavioral targeting effectiveness significantly. This paper also proposes an alternative way to define users' search intent for evaluation purposes when the dataset is sanitized. This approach automatically labels users in a click graph, and the labels are then used to train an intent-based user classifier.

38

Wu, Dan, Daqing He, and Xiaomei Xu. "A study of relevance feedback techniques in interactive multilingual information access." Library Hi Tech 30, no. 3 (2012): 523–44. http://dx.doi.org/10.1108/07378831211266645.

Abstract:

Purpose: With the vast amount of multilingual information available online, it becomes increasingly critical for libraries to use various multilingual information access techniques in order to effectively support patrons' online information requests. However, this is still a relatively under‐explored area. This paper aims to study the effectiveness and the adoptability of query expansion and translation enhancement in the context of interactive multilingual information access. Design/methodology/approach: Relying on an interactive multilingual information access system called ICE‐TEA, the authors conducted a controlled experiment (English‐to‐Chinese translation) involving human subjects to assess the retrieval effectiveness, analyzed the collected search logs to examine users' behavior, and employed pre‐ and post‐questionnaires to obtain users' opinions about the system. Findings: The results confirm that significant improvement in retrieval effectiveness can be achieved by combining query expansion with translation enhancement (as compared to a case when there is no relevance feedback). However, users' ability to understand, interact with and even perceive the complex process of searches involving the combination of query expansion and translation enhancement may greatly impact the effectiveness of the techniques. The results also confirm that human‐generated queries were short queries, which calls for careful consideration of how longer queries perform in real search because many search engines rely on longer and more complex queries. Originality/value: This study examines two important relevance feedback techniques in the context of human‐involved multilingual information access. This study is a valuable addition to the information seeking behaviour literature.

39

N, Poornima, Shivam Agrawal, and Saleena B. "PRIOR ONTOLOGY SELECTION AND QUERY TRANSLATION FOR INFORMATION SEARCH." Asian Journal of Pharmaceutical and Clinical Research 10, no. 13 (2017): 499. http://dx.doi.org/10.22159/ajpcr.2017.v10s1.23490.

Abstract:

Objective: Most current search engines follow informal keyword-based search. Finding the user's intention and improving the relevancy of results are the major issues faced by traditional keyword-based search. To solve the problems of traditional search and to boost the retrieval process, a framework for semantic-based information retrieval is proposed. Methods: Social and wine ontologies are used to find the user's intention and retrieve matching items. Users' natural language queries are translated into SPARQL (SPARQL Protocol and Resource Description Framework query language) queries for finding related items from those ontologies. Results: The proposed method makes a significant improvement over traditional search in terms of the number of searches required to retrieve a given number of pages, as shown by a performance graph. Conclusion: Semantic-based search can understand the user's intention and gives better results than traditional search.


40

Tejedor, Javier, Doroteo T. Toledano, Jose M. Ramirez, Ana R. Montalvo, and Juan Ignacio Alvarez-Trejos. "The Multi-Domain International Search on Speech 2020 ALBAYZIN Evaluation: Overview, Systems, Results, Discussion and Post-Evaluation Analyses." Applied Sciences 11, no. 18 (2021): 8519. http://dx.doi.org/10.3390/app11188519.


Abstract:

The large amount of information stored in audio and video repositories makes search on speech (SoS) a challenging area that is continuously receiving much interest. Within SoS, spoken term detection (STD) aims to retrieve speech data given a text-based representation of a search query (which can include one or more words). On the other hand, query-by-example spoken term detection (QbE STD) aims to retrieve speech data given an acoustic representation of a search query. This is the first paper that presents an internationally open multi-domain evaluation for SoS in Spanish that includes both STD and QbE STD tasks. The evaluation was carefully designed so that several post-evaluation analyses of the main results could be carried out. The evaluation tasks aim to retrieve the speech files that contain the queries, providing their start and end times and a score that reflects how likely the detection within the given time intervals and speech file is. Three different speech databases in Spanish that comprise different domains were employed in the evaluation: the MAVIR database, which comprises a set of talks from workshops; the RTVE database, which includes broadcast news programs; and the SPARL20 database, which contains Spanish parliament sessions. We present the evaluation itself, the three databases, the evaluation metric, the systems submitted to the evaluation, the evaluation results and some detailed post-evaluation analyses based on specific query properties (in-vocabulary/out-of-vocabulary queries, single-word/multi-word queries and native/foreign queries). The most novel features of the submitted systems are a data augmentation technique for the STD task and an end-to-end system for the QbE STD task. The obtained results suggest that there is clearly room for improvement in the SoS task and that performance is highly sensitive to changes in the data domain.


41

Verberne, Suzan, Emiel Krahmer, Sander Wubben, and Antal van den Bosch. "Query-based summarization of discussion threads." Natural Language Engineering 26, no. 1 (2019): 3–29. http://dx.doi.org/10.1017/s1351324919000123.


Abstract:

In this paper, we address query-based summarization of discussion threads. New users can profit from the information shared in the forum if they can retrieve the previously posted information. However, discussion threads on a single topic can easily comprise dozens or hundreds of individual posts. Our aim is to summarize forum threads given real web search queries. We created a data set with search queries from a discussion forum’s search engine log and the discussion threads that were clicked by the user who entered the query. For 120 thread–query combinations, a reference summary was made by five different human raters. We compared two methods for automatic summarization of the threads: a query-independent method based on post features, and Maximum Marginal Relevance (MMR), a method that takes the query into account. We also compared four different word embedding representations as alternatives to standard word vectors in extractive summarization. We find (1) that the agreement between human summarizers does not improve when a query is provided; (2) that the query-independent post features as well as a centroid-based baseline outperform MMR by a large margin; (3) that combining the post features with query similarity gives a small improvement over the use of post features alone; and (4) that for the word embeddings, a match in domain appears to be more important than corpus size and dimensionality. However, the differences between the models were not reflected by differences in quality of the summaries created with help of these models. We conclude that query-based summarization with web queries is challenging because the queries are short, and a click on a result is not a direct indicator for the relevance of the result.
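As an illustration of the MMR method mentioned in this abstract, the following is a minimal Python sketch of greedy MMR selection. It is not the authors' implementation: the bag-of-words cosine similarity, the function names `cosine` and `mmr_select`, the trade-off parameter `lam`, and the example posts are all assumptions made here for illustration.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def mmr_select(query: str, posts: list[str], k: int = 2, lam: float = 0.7) -> list[str]:
    """Greedily pick k posts, trading query relevance against redundancy.

    score(post) = lam * sim(post, query)
                  - (1 - lam) * max(sim(post, already selected post))
    """
    q = Counter(query.lower().split())
    vecs = [Counter(p.lower().split()) for p in posts]
    selected: list[int] = []
    candidates = list(range(len(posts)))
    while candidates and len(selected) < k:

        def score(i: int) -> float:
            redundancy = max(
                (cosine(vecs[i], vecs[j]) for j in selected), default=0.0
            )
            return lam * cosine(vecs[i], q) - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return [posts[i] for i in selected]


# Hypothetical thread: two identical posts and one related post.
posts = [
    "reset router password",
    "reset router password",
    "factory reset steps for router",
]
summary = mmr_select("reset router password", posts, k=2, lam=0.3)
```

With a low `lam`, the redundancy penalty dominates, so the second selected post is the related one rather than the exact duplicate; with `lam=1.0` the selection reduces to plain ranking by query similarity.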


42

Purwita, Naila Iffah, Moch Arif Bijaksana, Kemas Muslim Lhaksmana, and Muhammad Zidny Naf’an. "Typo handling in searching of Quran verse based on phonetic similarities." Register: Jurnal Ilmiah Teknologi Sistem Informasi 6, no. 2 (2020): 130. http://dx.doi.org/10.26594/register.v6i2.2065.


Abstract:

The Quran search system was built to make it easier for Indonesians to find a verse by typing text according to Indonesian pronunciation; this is a solution for users who have difficulty writing or typing Arabic characters. A Quran search system based on phonetic similarity can make it easier for Indonesian Muslims to find a particular verse. Lafzi was one of the systems that developed this kind of search, and it was further developed under the name Lafzi+. The Lafzi+ system can handle searches with typo queries, but it still covers only a few variations of typing error types. This research presents Lafzi++, an improvement on the previous development that handles typographical error types by applying typo correction using the autocomplete method to correct incorrect queries and the Damerau-Levenshtein distance to calculate the edit distance, so that the system can provide query suggestions when a user mistypes a search, whether in the form of substitution, insertion, deletion, or transposition. Users can also search easily because they use Latin characters according to Indonesian pronunciation. The evaluation results show that the system improves on its predecessor: the accuracy for each tested query surpasses that of the previous system, with the highest recall reaching 96.20% and the highest Mean Average Precision (MAP) reaching 90.69%.
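The edit-distance step described in this abstract can be sketched as follows. This is a minimal Python illustration, not the Lafzi++ code: it uses the optimal string alignment variant of the Damerau-Levenshtein distance (a common restricted form), and the function names `osa_distance` and `suggest`, the `max_distance` cutoff, and the sample vocabulary are assumptions introduced here.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein).

    Counts substitutions, insertions, deletions, and transpositions of
    two adjacent characters, each at cost 1.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if (
                i > 1
                and j > 1
                and a[i - 1] == b[j - 2]
                and a[i - 2] == b[j - 1]
            ):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[rows - 1][cols - 1]


def suggest(query: str, vocabulary: list[str], max_distance: int = 2) -> list[str]:
    """Return vocabulary entries within max_distance of the query, nearest first."""
    scored = sorted((osa_distance(query, word), word) for word in vocabulary)
    return [word for dist, word in scored if dist <= max_distance]


# Hypothetical transliterated vocabulary and a query with two letters swapped.
candidates = suggest("alhmadu", ["alhamdu", "arrahman", "bismillah"])
```

A transposition such as "alhmadu" for "alhamdu" costs one edit here, whereas plain Levenshtein distance would count it as two substitutions.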


43

Kelly, Kate. "Applying the Narrow Forms of PubMed Methods-based and Topic-based Filters Increases Nephrologists’ Search Efficiency." Evidence Based Library and Information Practice 7, no. 3 (2012): 95. http://dx.doi.org/10.18438/b8cg7n.


Abstract:

Objective – To determine whether the use of PubMed methods-based filters and topic-based filters, alone or in combination, improves physician searching.  Design – Mixed methods, survey questionnaire, comparative.  Setting – Canada.  Subjects – Random sample of Canadian nephrologists (n=153), responses (n=115), excluded (n=15), total (n=100).  Methods – The methods are described in detail in a previously published study protocol by a subset of the authors (Shariff et al., 2010).   One hundred systematic reviews on renal therapy were identified using the EvidenceUpdates service (http://plus.mcmaster.ca/EvidenceUpdates) and a clinical question was derived from each review. Randomly-selected Canadian nephrologists were randomly assigned a unique clinical question derived from the reviews and asked, by survey, to provide the search query they would use to search PubMed. The survey was administered until one valid search query for each of the one hundred questions was received.  The physician search was re-executed and compared to searches where either or both methods-based and topic-based filters were applied. Nine searches for each question were conducted: the original physician search, a broad and narrow form of the clinical queries therapy filter, a broad and narrow form of the nephrology topic filter and combinations of broad and narrow forms of both filters.   Significance tests of comprehensiveness (proportion of relevant articles found) and efficiency (ratio of relevant to non-relevant articles) of the filtered and unfiltered searches were conducted. The primary studies included in the systematic reviews were set as the reference standard for relevant articles.   As physicians indicated they did not scan beyond two pages of default PubMed results, primary analysis was also repeated on search results restricted to the first 40 records.   
The ability of the filters to retrieve highly-relevant or highly-cited articles was also tested, with an article being considered highly-relevant if referenced by UpToDate and highly-cited if its citation count was greater than the median citation count of all relevant articles for that question – there was an average of eight highly-cited articles per question. To reduce the risk of type I error, the conservative method of Bonferroni was applied so that tests with p < 0.003 were interpreted as statistically significant. Main Results – Response rate 75%. Physician-provided search terms retrieved 46% of relevant articles and a ratio of relevant to non-relevant articles of 1:16 (p < 0.003). Applying the narrow forms of both the nephrology and clinical queries filters together produced the greatest overall improvement, with efficiency improving by 16% and comprehensiveness remaining unchanged. Applying a narrow form of the clinical queries filter increased efficiency by 17% (p < 0.003) but decreased comprehensiveness by 8% (p < 0.003). No combination of search filters produced improvements in both comprehensiveness and efficiency. When results were restricted to the first 40 citations, the use of the narrow form of the clinical queries filter alone improved overall search performance – comprehensiveness improved from 13% to 26% and efficiency from 5.5% to 23%. For highly-cited or highly-relevant articles the combined use of the narrow forms of both filters produced the greatest overall improvement in efficiency but no significant change in comprehensiveness. Conclusion – The use of PubMed search filters improves the efficiency of physician searches and saves time and frustration. Applying clinical filters for quick clinical searches can significantly improve the efficiency of physician searching. Improved search performance has the potential to enhance the transfer of research into practice and improve patient care.


44

Veretennikov, A. B. "Relevance ranking for proximity full-text search based on additional indexes with multi-component keys." Vestnik Udmurtskogo Universiteta. Matematika. Mekhanika. Komp'yuternye Nauki 31, no. 1 (2021): 132–48. http://dx.doi.org/10.35634/vm210110.


Abstract:

The problem of proximity full-text search is considered. If a search query contains frequently occurring words, then multi-component key indexes improve the search speed in comparison with ordinary inverted indexes. It was shown that the search speed can be increased by up to 130 times when queries consist of frequently occurring words. In this paper, we investigate how the multi-component key index architecture affects the quality of the search. We consider several well-known methods of relevance ranking by different authors. Using these methods, we perform the search in the ordinary inverted index and then in the index that is enhanced with multi-component key indexes. The results show that with multi-component key indexes we obtain search results that are very close, in terms of relevance ranking, to the search results obtained by means of ordinary inverted indexes.


45

RYU, TAE W. "A COMMON CHARACTERISTIC KNOWLEDGE DISCOVERY SYSTEM IN DISTRIBUTED COMPUTING ENVIRONMENT." International Journal on Artificial Intelligence Tools 14, no. 03 (2005): 425–43. http://dx.doi.org/10.1142/s0218213005002181.


Abstract:

This paper describes an automated query discovery system for retrieving common characteristic knowledge from a database in a distributed computing environment. The paper particularly centers on the problem of discovering the common characteristics that are shared by a set of objects in a database. Such commonalities can be useful in finding a typical profile for the given object set or outstanding features for a group of objects in a database. In our approach, commonalities within a set of objects are described by database queries that compute the given set of objects. We use genetic programming as the main search engine to discover such queries. The paper discusses the architecture and the techniques used in our system, and presents some experimental results to evaluate the system. In addition, to improve performance, we built a distributed computing environment for our system with clustered computers using the Common Object Request Broker Architecture (CORBA). The paper briefly discusses our clustered computer architecture and the implementation of the distributed computing environment, and shows the overall performance improvement.


46

Park, Chang-Sup, and Sungchae Lim. "Effective keyword query processing with an extended answer structure in large graph databases." International Journal of Web Information Systems 10, no. 1 (2014): 65–84. http://dx.doi.org/10.1108/ijwis-11-2013-0030.


Abstract:

Purpose – The paper aims to propose an effective method to process keyword-based queries over graph-structured databases, which are widely used in various applications such as XML, semantic web, and social network services. To satisfy users' information needs, it proposes an extended answer structure for keyword queries, inverted list indexes on keywords and nodes, and query processing algorithms exploiting the inverted lists. The study aims to provide more effective and relevant answers to a given query than the previous approaches in an efficient way. Design/methodology/approach – A new relevance measure for nodes to a given keyword query is defined in the paper and, according to the relevance metric, a new answer tree structure is proposed which has no constraint on the number of keyword nodes chosen for each query keyword. For efficient query processing, an inverted list-style index is suggested which pre-computes connectivity and relevance information on the nodes in the graph. Then, a query processing algorithm based on the pre-constructed inverted lists is designed, which aggregates list entries for each graph node relevant to given keywords and identifies top-k root nodes of answer trees most relevant to the given query. The basic search method is also enhanced by using extended inverted lists, which store additional relevance information of the related entries in the lists in order to estimate the relevance score of a node more closely and to find top-k answers more efficiently. Findings – Experiments with real datasets and various test queries were conducted to evaluate the effectiveness and performance of the proposed methods in comparison with one of the previous approaches. The experimental results show that the proposed methods with an extended answer structure produce more effective top-k results than the compared previous method for most of the queries, especially for those with OR semantics. An extended inverted list and enhanced search algorithm are shown to substantially improve execution performance compared to the basic search method. Originality/value – This paper proposes a new extended answer structure and query processing scheme for keyword queries on graph databases which can satisfy the users' information need represented by a keyword set having various semantics.


47

Fang, Shuheng, Kangfei Zhao, Yu Rong, Zhixun Li, and Jeffrey Xu Yu. "Inductive Attributed Community Search: To Learn Communities Across Graphs." Proceedings of the VLDB Endowment 17, no. 10 (2024): 2576–89. http://dx.doi.org/10.14778/3675034.3675048.


Abstract:

Attributed community search (ACS) aims to identify subgraphs satisfying both structure cohesiveness and attribute homogeneity in attributed graphs, for a given query that contains query nodes and query attributes. Previously, algorithmic approaches dealt with ACS in a two-stage paradigm, which suffers from structural inflexibility and attribute irrelevance. To overcome this problem, learning-based approaches have recently been proposed to learn both structures and attributes simultaneously in a one-stage paradigm. However, these approaches train a transductive model, which assumes that the graph on which unseen queries are inferred is the same as the graph used for training. That limits the generalization and adaptation of these approaches to different heterogeneous graphs. In this paper, we propose a new framework, Inductive Attributed Community Search (IACS), based on inductive learning, which can be used to infer new queries for different communities/graphs. Specifically, IACS employs an encoder-decoder neural architecture to handle one ACS task at a time, where a task consists of a graph with only a few queries and the corresponding ground truth. We design a three-phase workflow, "training-adaptation-inference", which learns a shared model to absorb and induce prior effective common knowledge about ACS across different tasks. The shared model can swiftly adapt to a new task with a small amount of ground truth. We conduct substantial experiments on 7 real-world datasets to verify the effectiveness of IACS for CS/ACS. Our approach IACS achieves 28.97% and 25.60% improvements in F1-score on average in CS and ACS, respectively.


48

Delgado-de-la-Garza, Luis Alberto, Gonzalo Adolfo Garza-Rodríguez, Daniel Alejandro Jacques-Osuna, Alejandro Múgica-Lara, and Carlos Alberto Carrasco. "Does the use of a big data variable improve monetary policy estimates? Evidence from Mexico." Economics and Business Letters 10, no. 4 (2021): 383–93. http://dx.doi.org/10.17811/ebl.10.4.2021.383-393.


Abstract:

We analyse the performance improvement on a monetary policy model of introducing non-conventional market attention (NCMA) indices generated using big data. To address this aim, we extracted top keywords by text mining Banco de Mexico’s minutes. Then, we used Google search information according to the top keywords and related queries to generate NCMA indices. Finally, we introduce as covariates the NCMA indices into a bivariate probit model of monetary policy and contrast several specifications to examine the improvement in the model estimates. Our results show evidence of the statistical significance of the NCMA indices where the expanded model performed better than models only including conventional economic and financial variables.


49

Chawla, Suruchi. "Application of Fuzzy C-Means Clustering and Semantic Ontology in Web Query Session Mining for Intelligent Information Retrieval." International Journal of Fuzzy System Applications 10, no. 1 (2021): 1–19. http://dx.doi.org/10.4018/ijfsa.2021010101.


Abstract:

Information retrieval based on keyword search retrieves irrelevant documents because of the vocabulary gap between document content and search queries. The keyword vector representation of web documents is very high dimensional, and keyword terms are unable to capture the semantics of document content. Ontologies have been built in various domains to represent the semantics of documents based on concepts relevant to the document subject. Web documents often contain multiple topics; therefore, fuzzy c-means document clustering has been used for discovering clusters with overlapping boundaries. In this paper, a method is proposed for intelligent information retrieval using a hybrid of fuzzy c-means clustering and ontology in query session mining. The use of fuzzy clusters of web query session concept vectors improves the quality of clusters for effective web search. The proposed method was evaluated experimentally, and the results show an improvement in the precision of search results.
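To illustrate why fuzzy c-means yields clusters with overlapping boundaries, the sketch below computes the standard fuzzy c-means membership of one point in each cluster, given fixed cluster centers. It is a minimal Python illustration, not the method of the paper: the function name `fcm_memberships`, the fuzzifier value `m=2.0`, and the two-dimensional example vectors are assumptions made here.

```python
import math


def fcm_memberships(
    point: tuple[float, ...],
    centers: list[tuple[float, ...]],
    m: float = 2.0,
) -> list[float]:
    """Membership of `point` in each cluster under fuzzy c-means.

    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)), where d_i is the Euclidean
    distance from the point to center i and m > 1 is the fuzzifier.
    """
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on one or more centers belongs fully to them.
    if any(d == 0 for d in dists):
        hits = [1.0 if d == 0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** exponent for dk in dists) for di in dists]


# Hypothetical concept vectors: two cluster centers and two documents.
centers = [(0.0, 0.0), (2.0, 0.0)]
between = fcm_memberships((1.0, 0.0), centers)  # equidistant from both centers
nearer = fcm_memberships((0.5, 0.0), centers)   # closer to the first center
```

A document halfway between two centers receives equal membership in both, while one closer to a center receives a larger share there; the memberships always sum to 1.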


50

Zhu, Xiuqi. "A Study on Theoretical Modified Interactive Graph Search Algorithm." Theoretical and Natural Science 83, no. 1 (2025): 244–57. https://doi.org/10.54254/2753-8818/2025.20191.


Abstract:

Interactive search on hierarchical structures, such as trees and Directed Acyclic Graphs (DAGs), is a crucial problem with applications in various domains, including data retrieval, recommendation systems, and biological networks. Existing methods for interactive search exhibit significant limitations in both theoretical and practical aspects, particularly in handling high-degree nodes and minimizing the number of queries. In this paper, we present a novel approach to interactive graph search that reduces the gap between the current performance bounds and the theoretical optimum. We introduce new algorithms, including an improved Golden Search for binary trees and a method for Equivalence Tree Rewrite, that efficiently manage high-degree nodes and enhance the overall retrieval performance. Our theoretical analysis establishes tighter lower bounds for the number of queries, and our experimental results on multiple real-world datasets demonstrate significant improvements over state-of-the-art methods. The proposed approach not only achieves better performance in terms of query reduction but also provides a more robust framework for practical applications in complex hierarchical datasets.

