
Journal articles on the topic 'TEXT RETRIEVAL METHODS'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 journal articles for your research on the topic 'TEXT RETRIEVAL METHODS.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse journal articles from a wide variety of disciplines and organise your bibliography correctly.

1

Lee, Dik Lun, Young Man Kim, and Gaurav Patel. "Efficient signature file methods for text retrieval." IEEE Transactions on Knowledge and Data Engineering 7, no. 3 (June 1995): 423–35. http://dx.doi.org/10.1109/69.390248.

2

Kando, Noriko, Kyo Kageura, Masaharu Yoshioka, and Keizo Oyama. "Phrase processing methods for Japanese text retrieval." ACM SIGIR Forum 32, no. 2 (September 1998): 23–28. http://dx.doi.org/10.1145/305110.305120.

3

Chute, C. G., and Y. Yang. "An Overview of Statistical Methods for the Classification and Retrieval of Patient Events." Methods of Information in Medicine 34, no. 01/02 (1995): 104–10. http://dx.doi.org/10.1055/s-0038-1634570.

Abstract:
Statistical methods that can support text retrieval are becoming an increasing focus of medical informatics activities. We overview our adaptation of existing knowledge sources to create pseudo-documents for concept-based latent semantic indexing. Experience demonstrated this tack to be of limited practical value, since retrieval performance was invariably unsatisfactory. We discovered this was due in part to the introduction of a vocabulary gap between the queries and the cases we sought to retrieve. In part to address this problem, and to use our large body of humanly coded text as a knowledge source, we developed a least squares fit alternative for the computer-assisted indexing and retrieval of biomedical texts. This technique demonstrates equivalent or superior retrieval performance when compared to all other textual retrieval techniques. It does not depend upon elaborate knowledge bases, lexicons, or thesauri. It is a promising technique for classifying and retrieving large volumes of clinical text.
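The latent semantic indexing at the core of this line of work can be sketched with a truncated SVD of a term-document matrix. This is generic LSI on a toy corpus, not the authors' pseudo-document construction or their least squares alternative:

```python
# Minimal latent semantic indexing (LSI) sketch with NumPy; toy corpus only.
import numpy as np

docs = ["cardiac arrest treatment", "myocardial infarction therapy",
        "knee fracture surgery", "cardiac therapy outcome"]
vocab = sorted({w for d in docs for w in d.split()})
A = np.array([[d.split().count(w) for d in docs] for w in vocab], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                                        # number of latent concepts
Uk, sk, Vk = U[:, :k], s[:k], Vt[:k, :]      # truncated factors
doc_vecs = (sk[:, None] * Vk).T              # one row per document, in concept space

def fold_in(text):
    """Project a query into the k-dimensional concept space."""
    tf = np.array([text.split().count(w) for w in vocab], dtype=float)
    return (tf @ Uk) / sk

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

q = fold_in("cardiac treatment")
for j in sorted(range(len(docs)), key=lambda j: -cosine(q, doc_vecs[j])):
    print(round(cosine(q, doc_vecs[j]), 3), docs[j])
```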
4

Rautray, Rasmita, Lopamudra Swain, Rasmita Dash, and Rajashree Dash. "A brief review on text summarization methods." International Journal of Engineering & Technology 7, no. 4.5 (September 22, 2018): 728. http://dx.doi.org/10.14419/ijet.v7i4.5.25070.

Abstract:
In the present scenario, text summarization is a popular and active field of research in both the Information Retrieval (IR) and Natural Language Processing (NLP) communities. Summarization is important for IR since it is a means to identify useful information by condensing documents from a large corpus of data in an efficient way. In this study, different text summarization methods are presented, together with the strengths, limitations, and gaps within the methods.
5

Srinivasa Reddy, K., R. Anandan, K. Kalaivani, and P. Swaminathan. "A comprehensive survey on content based image retrieval system and its application in medical domain." International Journal of Engineering & Technology 7, no. 2.31 (May 29, 2018): 181. http://dx.doi.org/10.14419/ijet.v7i2.31.13436.

Abstract:
Content Based Image Retrieval (CBIR) is an important and widely used technique for retrieving different kinds of images from large databases. Information in databases is available in different formats such as text, image, graph, chart, etc. Here, our focus is on information that is available in the form of images. Searching for and retrieving an image from a large database is a difficult problem because it uses the image's visual information, such as shape, text, and color, for indexing and representation. For an efficient CBIR system, different kinds of retrieval methods need to be developed using feature extraction, similarity matching, etc. Text Based Image Retrieval systems are used in many hospitals, but for large databases these are inefficient. To solve this problem, CBIR systems are proposed that retrieve matching images from the database using automated feature extraction. At present, the medical imaging field is seeing extensive growth in the generation and evaluation of various types of medical images, which are highly inconsistent, usually fused, and a combination of various minor composition structures. For easy retrieval, feature extraction and image classification methods need to be developed, and different methods are used for different kinds of medical images. The Radiology and Cardiology departments are the largest producers of medical images, and patients' abnormal images can be stored alongside the normal ones. CBIR takes a query image as input and retrieves the images that are similar to the query efficiently and effectively. This paper provides a comprehensive survey of CBIR systems and one of their major applications in the medical domain.
6

Suhartono, Didit, and Khodirun Khodirun. "System of Information Feedback on Archive Using Term Frequency-Inverse Document Frequency and Vector Space Model Methods." IJIIS: International Journal of Informatics and Information Systems 3, no. 1 (March 1, 2020): 36–42. http://dx.doi.org/10.47738/ijiis.v3i1.6.

Abstract:
Archives are an example of important documents. They are stored systematically with a view to helping and simplifying storage and retrieval. Information retrieval (IR) is the process of retrieving relevant documents while leaving irrelevant ones unretrieved, and doing so requires a suitable method. Using the Term Frequency-Inverse Document Frequency and Vector Space Model methods, relevant documents can be found according to their level of closeness or similarity to the query; in addition, applying the Nazief-Adriani stemming algorithm can improve retrieval performance by transforming the words in a document or text into their base forms. The system then indexes the documents to simplify and speed up the search process. Relevance is determined by computing similarity values between the existing documents and the query, both represented in a suitable form. The retrieved documents are then sorted by their level of relevance to the query.
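A minimal sketch of the TF-IDF-plus-vector-space pipeline described above (toy corpus; the Nazief-Adriani stemming step, which is specific to Indonesian, is omitted):

```python
# TF-IDF weighting + vector space model cosine ranking, stemming omitted.
import math
from collections import Counter

docs = {"d1": "archive storage and retrieval of archive documents",
        "d2": "systematic storage simplifies document retrieval",
        "d3": "weather report for the archive city"}

tokenized = {d: text.split() for d, text in docs.items()}
N = len(docs)
df = Counter(w for toks in tokenized.values() for w in set(toks))
idf = {w: math.log(N / df[w]) for w in df}   # words in every doc get weight 0

def tfidf(tokens):
    tf = Counter(tokens)
    return {w: tf[w] * idf.get(w, 0.0) for w in tf}

def cosine(u, v):
    dot = sum(u[w] * v.get(w, 0.0) for w in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

doc_vecs = {d: tfidf(t) for d, t in tokenized.items()}
query = tfidf("archive document retrieval".split())
for d, score in sorted(((d, cosine(query, v)) for d, v in doc_vecs.items()),
                       key=lambda x: -x[1]):
    print(d, round(score, 3))
```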
7

Hui, Fan, Guo Jie, and Jin Jiang Li. "New Research Progress in Image Retrieval." Applied Mechanics and Materials 333-335 (July 2013): 1076–79. http://dx.doi.org/10.4028/www.scientific.net/amm.333-335.1076.

Abstract:
Image retrieval is generally divided into two categories: text-based image retrieval and content-based image retrieval. Early image retrieval technology was mainly text-based; after the 1990s, content-based image retrieval (CBIR) emerged. So far, image retrieval technology has mainly relied on color, texture, and layout for analysis and retrieval, i.e., CBIR. This paper reviews the two kinds of image retrieval methods, introduces a variety of techniques in content-based image retrieval, and considers the prospects for research that fuses text and content.
8

Liu, Zhiqiang, Jingkun Feng, Zhihao Yang, and Lei Wang. "Document Retrieval for Precision Medicine Using a Deep Learning Ensemble Method." JMIR Medical Informatics 9, no. 6 (June 29, 2021): e28272. http://dx.doi.org/10.2196/28272.

Abstract:
Background: With the development of biomedicine, the number of biomedical documents has increased rapidly, bringing a great challenge for researchers trying to retrieve the information they need. Information retrieval aims to meet this challenge by searching relevant documents from abundant documents based on the given query. However, sometimes the relevance of search results needs to be evaluated from multiple aspects in specific retrieval tasks, thereby increasing the difficulty of biomedical information retrieval. Objective: This study aimed to find a more systematic method for retrieving relevant scientific literature for a given patient. Methods: In the initial retrieval stage, we supplemented query terms through query expansion strategies and applied query boosting to obtain an initial ranking list of relevant documents. In the re-ranking phase, we employed a text classification model and a relevance matching model to evaluate documents from different dimensions and then combined the outputs through logistic regression to re-rank all the documents from the initial ranking list. Results: The proposed ensemble method contributed to the improvement of biomedical retrieval performance. Compared with the existing deep learning–based methods, experimental results showed that our method achieved state-of-the-art performance on the data collection provided by the Text Retrieval Conference 2019 Precision Medicine Track. Conclusions: In this paper, we proposed a novel ensemble method based on deep learning. As shown in the experiments, the strategies we used in the initial retrieval phase, such as query expansion and query boosting, are effective. The application of the text classification model and relevance matching model better captured semantic context information and improved retrieval performance.
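The re-ranking step, combining a text-classification score and a relevance-matching score through logistic regression, can be sketched as follows. The feature values and labels below are invented placeholders, not TREC data, and the two-feature setup is only a schematic of the described ensemble:

```python
# Sketch of score fusion for re-ranking via logistic regression.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Per-document features: [text-classifier score, relevance-matching score].
X_train = np.array([[0.9, 0.8], [0.2, 0.3], [0.7, 0.4], [0.1, 0.9],
                    [0.8, 0.7], [0.3, 0.2]])
y_train = np.array([1, 0, 1, 0, 1, 0])     # 1 = judged relevant (toy labels)

reranker = LogisticRegression().fit(X_train, y_train)

# Re-rank an initial retrieval list by the fused probability of relevance.
initial_list = ["doc_a", "doc_b", "doc_c"]
X_new = np.array([[0.6, 0.9], [0.8, 0.2], [0.4, 0.5]])
fused = reranker.predict_proba(X_new)[:, 1]
for doc, p in sorted(zip(initial_list, fused), key=lambda t: -t[1]):
    print(doc, round(float(p), 3))
```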
9

Kikuchi, Hirosato. "Progress in Literature Retrieval Methods and Appearance of Full Text Electronic Journal." Igaku Toshokan 50, no. 3 (2003): 226–29. http://dx.doi.org/10.7142/igakutoshokan.50.226.

10

Ayyavaraiah, Monelli, and Bondu Venkateswarlu. "Joint graph regularization based semantic analysis for cross-media retrieval: a systematic review." International Journal of Engineering & Technology 7, no. 2.7 (March 18, 2018): 257. http://dx.doi.org/10.14419/ijet.v7i2.7.10592.

Abstract:
Large amounts of heterogeneous data are rapidly accumulating on the internet, and most of the data consist of audio, video, text, and images. Searching the required data out of a large database is a difficult and time-consuming process. Single-media retrieval can fetch the needed data from a large dataset, but it has the drawback that it retrieves a single medium only: if the query is given as text, the results obtained are also text. Users therefore demand cross-media retrieval for their queries, which is far more consistent in providing results and helps users get more information related to their queries. Finding the similarities between heterogeneous data is very complex. Much research has been done on cross-media retrieval with different methods, providing different results. The aim here is to analyse different cross-media retrieval approaches built on joint graph regularization (JGR) in order to understand the various techniques. Most of the studies use MAP, precision, and recall as their evaluation parameters.
11

Periyasamy, A. R. Pon. "Reversible N-grams Stemming Stripping Algorithm for Classification of Text Data." International Journal of Advanced Research in Computer Science and Software Engineering 7, no. 7 (July 30, 2017): 465. http://dx.doi.org/10.23956/ijarcsse/v7i4/0210.

Abstract:
Stemming methods trace the root or stem of a word, which can be used in Information Retrieval (IR) tasks to increase the recall rate and return more relevant results. There are numerous methods available for performing stemming, ranging from manual to automatic and from language-dependent to language-independent; each algorithm is designed to overcome the challenges of existing methods. This paper presents a comparative study of the various stemming algorithms widely used to enhance the effectiveness and efficiency of information retrieval.
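A quick way to see how such algorithms differ is to run a few standard stemmers side by side, for example with NLTK. This is an illustration, not the paper's test setup:

```python
# Comparing common stemming algorithms with NLTK (pip install nltk).
from nltk.stem import PorterStemmer, LancasterStemmer, SnowballStemmer

words = ["retrieval", "retrieving", "retrieved", "studies", "relational"]
stemmers = {"porter": PorterStemmer(),
            "lancaster": LancasterStemmer(),
            "snowball": SnowballStemmer("english")}

# Each stemmer trades over- vs. under-stemming differently.
for name, stemmer in stemmers.items():
    print(name, [stemmer.stem(w) for w in words])
```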
12

Hardoon, David R., Sandor Szedmak, and John Shawe-Taylor. "Canonical Correlation Analysis: An Overview with Application to Learning Methods." Neural Computation 16, no. 12 (December 1, 2004): 2639–64. http://dx.doi.org/10.1162/0899766042321814.

Abstract:
We present a general method using kernel canonical correlation analysis to learn a semantic representation of web images and their associated text. The semantic space provides a common representation and enables a comparison between text and images. In the experiments, we look at two approaches to retrieving images based only on their content from a text query. We compare orthogonalization approaches against a standard cross-representation retrieval technique known as the generalized vector space model.
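A linear CCA sketch conveys the shared semantic space idea, even though the paper uses the kernelised variant; the random matrices below stand in for real image and text features:

```python
# Linear CCA sketch with scikit-learn: project two views into a shared
# space, then retrieve images near a text query's projection.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 2))                  # shared "semantics"
X_img = latent @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(100, 6))
X_txt = latent @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(100, 4))

cca = CCA(n_components=2).fit(X_img, X_txt)
img_c, txt_c = cca.transform(X_img, X_txt)

# Retrieve the image whose projection is closest to a text query's projection.
query = txt_c[0]
dists = np.linalg.norm(img_c - query, axis=1)
print("best matching image index:", int(dists.argmin()))
```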
13

Boban, Ivan, Alen Doko, and Sven Gotovac. "Improving Sentence Retrieval Using Sequence Similarity." Applied Sciences 10, no. 12 (June 23, 2020): 4316. http://dx.doi.org/10.3390/app10124316.

Abstract:
Sentence retrieval is an information retrieval technique that aims to find sentences corresponding to an information need. It is used for tasks like question answering (QA) or novelty detection. Since it is similar to document retrieval but with a smaller unit of retrieval, methods for document retrieval, such as term frequency-inverse document frequency (TF-IDF), BM25, and language modeling-based methods, are also used for sentence retrieval. The effect of partial matching of words on sentence retrieval is an issue that has not been analyzed, and we think there is substantial potential for improving sentence retrieval methods by considering it. We adapted TF-ISF, BM25, and language modeling-based methods to test the partial matching of terms by combining sentence retrieval with sequence similarity, which allows matching of words that are similar but not identical. All tests were conducted using data from the novelty tracks of the Text Retrieval Conference (TREC). The scope of this paper was to find out whether such an approach is generally beneficial to sentence retrieval; we did not examine in depth how partial matching helps or hinders the finding of relevant sentences.
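The partial-matching idea can be sketched with Python's difflib: replace the exact 0/1 term match inside a retrieval score with a string-similarity ratio. This is a generic illustration of soft term matching, not the authors' exact adaptation of TF-ISF or BM25, and the 0.7 threshold is an arbitrary choice:

```python
# Soft term matching via sequence similarity (difflib).
from difflib import SequenceMatcher

def sim(a, b):
    return SequenceMatcher(None, a, b).ratio()

def soft_overlap(query_terms, sentence_terms, threshold=0.7):
    """Sum, over query terms, of the best partial match in the sentence."""
    score = 0.0
    for q in query_terms:
        best = max((sim(q, t) for t in sentence_terms), default=0.0)
        if best >= threshold:   # similar-but-not-identical words still count
            score += best
    return score

query = "retrieval methods".split()
sentence = "new method for retrieving documents".split()
print(soft_overlap(query, sentence))  # retrieval~retrieving, methods~method
```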
14

Dewan, Jaya H., and Sudeep D. Thepade. "Image Retrieval Using Low Level and Local Features Contents: A Comprehensive Review." Applied Computational Intelligence and Soft Computing 2020 (October 22, 2020): 1–20. http://dx.doi.org/10.1155/2020/8851931.

Abstract:
Billions of multimedia data files are created and shared on the web, mainly on social media websites. The explosive increase in multimedia data, especially images and videos, has created the problem of searching for and retrieving relevant data from the archived collection. In the last few decades, the complexity of image data has increased exponentially. Text-based image retrieval techniques do not meet users' needs because of the difference between image contents and the text annotations associated with an image. Various methods have been proposed in recent years to tackle the problem of the semantic gap and retrieve images similar to the query specified by the user. Image retrieval based on image contents has attracted many researchers as it uses visual content such as color, texture, and shape features. The low-level image features represent the image contents as feature vectors, and the query image's feature vector is compared with the dataset images' feature vectors to retrieve similar images. The main aim of this article is to appraise the various image retrieval methods based on feature extraction, description, and matching of content that have been presented in the last 10–15 years using low-level feature contents and local features, and to propose a promising future research direction for researchers.
15

Kishida, Kazuaki. "Empirical examination on performance of some statistical methods for Japanese text retrieval by using large test collection." Proceedings of Annual Conference, Japan Society of Information and Knowledge 8 (2000): 61–64. http://dx.doi.org/10.2964/jsikproc.8.0_61.

16

Veisi, Hadi, and Hamed Fakour Shandi. "A Persian Medical Question Answering System." International Journal on Artificial Intelligence Tools 29, no. 06 (September 2020): 2050019. http://dx.doi.org/10.1142/s0218213020500190.

Abstract:
A question answering system is a type of information retrieval that takes a question from a user in natural language as the input and returns the best answer to it as the output. In this paper, a medical question answering system in the Persian language is designed and implemented. During this research, a dataset of diseases and drugs is collected and structured. The proposed system includes three main modules: question processing, document retrieval, and answer extraction. For the question processing module, a sequential architecture is designed which retrieves the main concept of a question by using different components. In these components, rule-based methods, natural language processing, and dictionary-based techniques are used. In the document retrieval module, the documents are indexed and searched using the Lucene library. The retrieved documents are ranked using similarity detection algorithms and the highest-ranked document is selected to be used by the answer extraction module. This module is responsible for extracting the most relevant section of the text in the retrieved document. During this research, different customized language processing tools such as part of speech tagger and lemmatizer are also developed for Persian. Evaluation results show that this system performs well for answering different questions about diseases and drugs. The accuracy of the system for 500 sample questions is 83.6%.
17

Ormeño, Pablo, Marcelo Mendoza, and Carlos Valle. "Topic Models Ensembles for AD-HOC Information Retrieval." Information 12, no. 9 (September 1, 2021): 360. http://dx.doi.org/10.3390/info12090360.

Abstract:
Ad hoc information retrieval (ad hoc IR) is a challenging task consisting of ranking text documents for bag-of-words (BOW) queries. Classic approaches based on query and document text vectors use term-weighting functions to rank the documents. One limitation of these methods is their inability to work with polysemic concepts; they also introduce fake orthogonalities between semantically related words. To address these limitations, model-based IR approaches based on topics have been explored. Specifically, topic models based on Latent Dirichlet Allocation (LDA) allow building representations of text documents in the latent space of topics, better modeling polysemy and avoiding the generation of orthogonal representations between related terms. We extend LDA-based IR strategies using different ensemble strategies. Model selection obeys the ensemble learning paradigm, for which we test two successful approaches widely used in supervised learning. We study Boosting and Bagging techniques for topic models, using each model as a weak IR expert. Then, we merge the ranking lists obtained from each model using a simple but effective top-k list fusion approach. We show that our proposal strengthens the results in precision and recall, outperforming classic IR models and strong baselines based on topic models.
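The top-k list fusion step can be illustrated with reciprocal rank fusion (RRF), a standard fusion rule that stands in here for whatever exact formula the authors use; each ranked list plays the role of one topic-model "weak expert":

```python
# Top-k ranked-list fusion sketch using reciprocal rank fusion (RRF).
from collections import defaultdict

def rrf(ranked_lists, k=60):
    """Fuse several ranked lists of doc ids into one ranking."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)   # high ranks contribute most
    return sorted(scores, key=scores.get, reverse=True)

# Toy rankings from three topic-model experts.
runs = [["d1", "d3", "d2"], ["d3", "d1", "d4"], ["d1", "d4", "d3"]]
print(rrf(runs))   # documents ranked highly by several experts come first
```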
18

Ye, Shuyun, John A. Dawson, and Christina Kendziorski. "Extending Information Retrieval Methods to Personalized Genomic-Based Studies of Disease." Cancer Informatics 13s7 (January 2014): CIN.S16354. http://dx.doi.org/10.4137/cin.s16354.

Abstract:
Genomic-based studies of disease now involve diverse types of data collected on large groups of patients. A major challenge facing statistical scientists is how best to combine the data, extract important features, and comprehensively characterize the ways in which they affect an individual's disease course and likelihood of response to treatment. We have developed a survival-supervised latent Dirichlet allocation (survLDA) modeling framework to address these challenges. Latent Dirichlet allocation (LDA) models have proven extremely effective at identifying themes common across large collections of text, but applications to genomics have been limited. Our framework extends LDA to the genome by considering each patient as a “document” with “text” detailing his/her clinical events and genomic state. We then further extend the framework to allow for supervision by a time-to-event response. The model enables the efficient identification of collections of clinical and genomic features that co-occur within patient subgroups, and then characterizes each patient by those features. An application of survLDA to The Cancer Genome Atlas ovarian project identifies informative patient subgroups showing differential response to treatment, and validation in an independent cohort demonstrates the potential for patient-specific inference.
19

Ghanem, Khadoudja. "Local and Global Latent Semantic Analysis for Text Categorization." International Journal of Information Retrieval Research 4, no. 3 (July 2014): 1–13. http://dx.doi.org/10.4018/ijirr.2014070101.

Abstract:
In this paper the authors propose a semantic approach to document categorization. The idea is to create for each category a semantic index (a representative term vector) by performing a local Latent Semantic Analysis (LSA) followed by a clustering process. A second use of LSA (global LSA) is adopted on a term-class matrix in order to retrieve the class most similar to the query (the document to classify), in the same way that LSA is used in Information Retrieval to retrieve the documents most similar to a query. The proposed system is evaluated on a popular dataset, the 20 Newsgroups corpus. The obtained results show the effectiveness of the method compared with the classic KNN and SVM classifiers as well as with methods presented in the literature. Experimental results show that the new method has high precision and recall rates, and classification accuracy is significantly improved.
20

Brandt, Cynthia, and Prakash Nadkarni. "Web-based UMLS concept retrieval by automatic text scanning: a comparison of two methods." Computer Methods and Programs in Biomedicine 64, no. 1 (January 2001): 37–43. http://dx.doi.org/10.1016/s0169-2607(00)00092-4.

21

Liu, Chang. "Research on Words Segmentation Technology in Chinese Full Text Retrieval System." Applied Mechanics and Materials 411-414 (September 2013): 313–16. http://dx.doi.org/10.4028/www.scientific.net/amm.411-414.313.

Abstract:
In order to improve the speed of Chinese full-text retrieval while handling Chinese ambiguity inclusion and length limitations, this paper introduces the application of Chinese full-text retrieval systems and the current state of Chinese word segmentation technology. Building on existing word segmentation algorithms, it proposes an improved Chinese word segmentation algorithm in which indexing constructs a mapping between related words in the context and the dictionary. The dictionary is improved to realize a better mapping with related words, and thereby better Chinese word segmentation. Experiments demonstrate that the proposed Chinese full-text word segmentation algorithm is more effective than existing methods.
22

Marton, Christine F. "Salton and Buckley’s Landmark Research in Experimental Text Information Retrieval." Evidence Based Library and Information Practice 6, no. 4 (December 15, 2011): 169. http://dx.doi.org/10.18438/b87032.

Abstract:
Objectives – To compare the performance of the vector space model and the probabilistic weighting model of relevance feedback for the overall purpose of determining the most useful relevance feedback procedures. The amount of improvement that can be obtained from searching several test document collections with only one feedback iteration of each relevance feedback model was measured. Design – The experimental design consisted of 72 different tests: 2 different relevance feedback methods, each with 6 permutations, on 6 test document collections of various sizes. A residual collection method was utilized to ascertain the “true advantage provided by the relevance feedback process.” (Salton & Buckley, 1990, p. 293) Setting – Department of Computer Science at Cornell University. Subjects – Six test document collections. Methods – Relevance feedback is an effective technique for query modification that provides significant improvement in search performance. Relevance feedback entails both “term reweighting,” the modification of term weights based on term use in retrieved relevant and non-relevant documents, and “query expansion,” which is the addition of new terms from relevant documents retrieved (Harman, 1992). Salton and Buckley (1990) evaluated two established relevance feedback models based on the vector space model (a spatial model) and the probabilistic model, respectively. Harman (1992) describes the two key differences between these competing models of relevance feedback. [The vector space model merges] document vectors and original query vectors. This automatically reweights query terms by adding the weights from the actual occurrence of those query terms in the relevant documents, and subtracting the weights of those terms occurring in the non-relevant documents. Queries are automatically expanded by adding all the terms not in the original query that are in the relevant documents and non-relevant documents. They are expanded using both positive and negative weights based on whether the terms are coming from relevant or non-relevant documents. Yet, no new terms are actually added with negative weights; the contribution of non-relevant document terms is to modify the weighting of new terms coming from relevant documents. . . . The probabilistic model . . . is based on the distribution of query terms in relevant and non-relevant documents, This is expressed as a term weight, with the rank of each retrieved document then being the sum of the term weights for terms contained in the document that match query terms. (pp. 1-2) Second, while the vector space model “has an inherent relationship between term reweighting and query expansion” (p. 2), the probabilistic model does not. Thus, query expansion is optional, but given its usefulness, various schemes have been proposed for expanding queries using terms from retrieved relevant documents. In the Salton and Buckley study 3 versions of each of the two relevance feedback methods were utilized, with two different levels of query expansion, and run on 6 different test collections. More specifically, they queried test collections that ranged in size from small to large, and that represented different domains of knowledge, including medicine and engineering with 72 experimental runs in total. Salton and Buckley examined 3 variants of the vector space model, the second and third of which were based on the first. The first model was the classic Rocchio algorithm (1971), which uses reduced document weights to modify the queries. 
The second model was the “Ide regular” algorithm, which reweights both relevant and non-relevant query terms (Ide, 1971). And the third model was the “Ide dec-hi” algorithm, which reweights all identified relevant items but only one retrieved nonrelevant item, the one retrieved first in the initial set of search results (Ide & Salton, 1971). As well, 3 variants of the probabilistic model developed by S.E. Robertson (Robertson, 1986; Robertson & Sparck Jones, 1976; Robertson, van Rijsbergen, & Porter, 1981; Yu, Buckley, Lam, & Salton, 1983) were examined: the conventional probabilistic approach with a 0.5 adjustment factor, the adjusted probabilistic derivation with a different adjustment factor, and finally an adjusted derivation with enhanced query term weights. The 6 vector space model and probabilistic model relevance feedback techniques are described in Table 3 (p. 293). The performance of the first iteration feedback searches was compared solely with the results of the initial searches performed with the original query statements. The first 15 documents retrieved from the initial searches were judged for relevance by the researchers, and the terms contained in these relevant and non-relevant retrieved items were used to construct the feedback queries. The authors utilized the residual collection system, which entails the removal of all items previously seen by the searcher (whether relevant or not) and the evaluation of both the initial and any subsequent queries for the reduced collection only. Both multi-valued (partial) and binary weights (1=relevant, 0=non-relevant) were used on the document terms (Table 6, p. 296). Also, two types of query expansion method were applied: expansion by the most common terms and expansion by all terms (Table 4, p. 294). While not using any query expansion and relying solely on reweighting relevant and non-relevant query terms is possible, this option was not examined. Three measures were calculated to assess relative relevance feedback performance: the rank order (recall-precision value); search precision (with respect to the average precision at 3 particular recall points of 0.75, 0.50, and 0.25); and the percentage improvement in the 3-point precision between feedback and original searches. Main Results – The best results are produced by the same relevance feedback models for all test collections examined, and conversely, the poorest results are produced by the same relevance feedback models (Tables 4, 5, and 6, pp. 294-296). In other words, all 3 relevance feedback algorithms based on the vector space retrieval model outperformed the 3 relevance feedback algorithms based on the probabilistic retrieval model, with the best relevance feedback results obtained for the “Ide dec-hi” model. This finding suggests that improvements in relevance from term reweighting are attributable primarily to reweighting relevant terms. However, the probabilistic method with adjusted derivation, specifically considering the extra weight assignments for query terms, was almost as effective as the vector space model relevance feedback algorithms. Paired comparisons between full query expansion (all terms from the initial search are utilized in the feedback query) and partial query expansion by the most common terms from the relevant items demonstrate that full expansion is better; however, the difference between expansion methods is small.
Conclusions – Relevance feedback methods that reformulate the initial query by reweighting existing query terms and adding new terms (query expansion) can greatly improve the relevance of search results after only one feedback iteration. The amount of improvement achieved was highly variable across the 6 test collections, from 50% to 150% in the 3-point precision. Other variables thought to influence relevance feedback performance were initial query length, characteristics of the collection, including the specificity of the terms in the collection, the size of the collection (number of documents), and average term frequency in documents. The authors recommend that the relevance feedback process be incorporated into operational text retrieval systems.
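The classic Rocchio update that anchors the vector space feedback models reviewed above can be written out directly: the new query is the old one moved toward the centroid of relevant documents and away from the centroid of non-relevant ones, with negative weights clipped to zero as the review notes. A minimal sketch with toy vectors; the alpha/beta/gamma values are conventional defaults, not Salton and Buckley's settings:

```python
# Rocchio relevance feedback update for the vector space model:
#   q' = alpha*q + beta*mean(relevant) - gamma*mean(non-relevant)
import numpy as np

def rocchio(q, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    q_new = alpha * np.asarray(q, dtype=float)
    if len(relevant):
        q_new += beta * np.mean(relevant, axis=0)
    if len(non_relevant):
        q_new -= gamma * np.mean(non_relevant, axis=0)
    return np.maximum(q_new, 0.0)   # no terms are added with negative weights

q = [1.0, 0.0, 0.5, 0.0]                               # original query weights
rel = np.array([[0.9, 0.1, 0.8, 0.0], [0.7, 0.0, 0.9, 0.1]])
nonrel = np.array([[0.0, 0.8, 0.1, 0.9]])
print(rocchio(q, rel, nonrel))   # expanded, reweighted feedback query
```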
23

Sheng, Shurong, Katrien Laenen, Luc Van Gool, and Marie-Francine Moens. "Fine-Grained Cross-Modal Retrieval for Cultural Items with Focal Attention and Hierarchical Encodings." Computers 10, no. 9 (August 25, 2021): 105. http://dx.doi.org/10.3390/computers10090105.

Abstract:
In this paper, we target the tasks of fine-grained image–text alignment and cross-modal retrieval in the cultural heritage domain as follows: (1) given an image fragment of an artwork, we retrieve the noun phrases that describe it; (2) given a noun phrase artifact attribute, we retrieve the corresponding image fragment it specifies. To this end, we propose a weakly supervised alignment model where the correspondence between the input training visual and textual fragments is not known but their corresponding units that refer to the same artwork are treated as a positive pair. The model exploits the latent alignment between fragments across modalities using attention mechanisms by first projecting them into a shared common semantic space; the model is then trained by increasing the image–text similarity of the positive pair in the common space. During this process, we encode the inputs of our model with hierarchical encodings and remove irrelevant fragments with different indicator functions. We also study techniques to augment the limited training data with synthetic relevant textual fragments and transformed image fragments. The model is later fine-tuned by a limited set of small-scale image–text fragment pairs. We rank the test image fragments and noun phrases by their intermodal similarity in the learned common space. Extensive experiments demonstrate that our proposed models outperform two state-of-the-art methods adapted to fine-grained cross-modal retrieval of cultural items for two benchmark datasets.
24

Naveena, A. K., and N. K. Narayanan. "Improving Image Search through MKFCM Clustering Strategy-Based Re-ranking Measure." Journal of Intelligent Systems 29, no. 1 (April 14, 2018): 497–514. http://dx.doi.org/10.1515/jisys-2017-0227.

Abstract:
The main intention of this research is to develop a novel ranking measure for a content-based image retrieval system. Owing to the success of data retrieval, most commercial search engines still utilize a text-based approach for image search, relying on surrounding textual information. As this text information is in some cases noisy or even unavailable, the drawback of such a retrieval strategy is that it cannot depict the contents of images precisely, which hampers image search performance. In order to improve the performance of image search, we propose in this work a novel algorithm for improving image search through a multi-kernel fuzzy c-means (MKFCM) algorithm. In the initial step of our method, images are retrieved using four-level discrete wavelet transform-based features and the MKFCM clustering algorithm. Next, the retrieved images are analyzed using fuzzy c-means clustering, and the rank of the results is adjusted according to the distance of a cluster from the query. To improve ranking performance, we combine the retrieved result and the ranking result, finally obtaining the ranked retrieved images. In addition, we analyze the effects of different clustering methods. The effectiveness of the proposed methodology is analyzed with the help of precision, recall, and F-measures.
25

Jia, Shi Jie, Yan Ping Yang, Jian Ying Zhao, and Nan Xiao. "Pyramid Histograms of Orientated Gradients for Product Image Retrieval." Advanced Materials Research 383-390 (November 2011): 5712–16. http://dx.doi.org/10.4028/www.scientific.net/amr.383-390.5712.

Abstract:
Traditional text-based image retrieval methods can hardly meet the requirements of on-line product search. This paper applies Content Based Image Retrieval (CBIR) technologies to the e-commerce field and designs a product image retrieval algorithm based on the Pyramid Histogram of Orientated Gradients (PHOG) descriptor and the chi-square distance. Using the resulting image retrieval system, we ran retrieval tests on the PI100 dataset from Microsoft Research Asia; the experimental results demonstrated the efficiency of this algorithm.
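The distance function is compact enough to show directly; a sketch of the chi-square distance between two PHOG-like orientation histograms (toy values, not real descriptors):

```python
# Chi-square distance between two (PHOG-like) histograms.
import numpy as np

def chi_square(h1, h2, eps=1e-10):
    h1, h2 = np.asarray(h1, float), np.asarray(h2, float)
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

a = np.array([0.2, 0.3, 0.1, 0.4])    # toy normalised orientation histograms
b = np.array([0.25, 0.25, 0.2, 0.3])
print(chi_square(a, b))               # smaller = more similar product images
```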
26

Starling, Daniela Siqueira Veloso, Bruna Fernanda Tolentino Moreira, and Antônio Jaeger. "Retrieval practice as a learning strategy for individuals with Down syndrome: A preliminary study." Dementia & Neuropsychologia 13, no. 1 (March 2019): 104–10. http://dx.doi.org/10.1590/1980-57642018dn13-010012.

Abstract:
Remembering recently studied materials (i.e., retrieval practice) is more beneficial for learning than restudying these materials. Objective: To investigate whether retrieval practice benefits learning for individuals with Down syndrome. Methods: Eighteen individuals with Down syndrome (mean age=21.61 years, SD=5.93) performed a task entailing a first read of an encyclopedic text covering a series of target words. After reading the text twice, participants recalled half of the target words (retrieval practice), and reread the other half (restudy). After 48 hours, participants answered a multiple-choice test including all target words. Subsequently, WASI’s Vocabulary and Matrix reasoning subtests were administered to estimate intelligence. Results: The benefit of retrieval practice for learning was numerically greater than the benefit of restudy, although this advantage did not reach statistical significance. Inspection of individual data suggested that the benefit of retrieval practice was greater than the benefit of restudy for the majority of the participants, independently of the participants’ vocabulary or reasoning abilities. Conclusion: Although more research is needed before retrieval practice can be recommended as a learning strategy for individuals with Down syndrome, the data suggest that retrieval practice can be a useful teaching tool for at least part of this population.
27

Atlam, Elsayed. "A New Approach for Text Similarity Using Articles." International Journal of Information Technology & Decision Making 07, no. 01 (March 2008): 23–34. http://dx.doi.org/10.1142/s021962200800279x.

Abstract:
Conventional approaches to text analysis and information retrieval that measure document similarity by considering all the information in texts are relatively inefficient for processing large text collections in heterogeneous subject areas. Previous research showed that evidence from passages can improve retrieval results, but it also raised questions about how a passage is defined, how passages can be ranked efficiently, and what their proper role is in long structured documents. Moreover, the frequency of the article "the" in important sentences can be exploited to summarize a text efficiently. We previously proposed an approach for extracting sentences that include the article "the", using some restrictive rules, to obtain effective passages. Building on that work, this paper presents a new Passage SIMilarity (P-SIM) measure between documents based on the effective passages extracted using the article "the". Experiments show that this method is more efficient than traditional methods: Recall and Precision reach 92.6% and 97.5%, respectively, depending on the extracted passages, significant improvements of 38.3% and 44.2% over the traditional method. The proposed methods are applied to 3,990 articles from a large tagged corpus.
28

Hersh, William R., and David H. Hickam. "A Comparison of Two Methods for Indexing and Retrieval from a Full-text Medical Database." Medical Decision Making 13, no. 3 (August 1993): 220–26. http://dx.doi.org/10.1177/0272989x9301300308.

29

Patil, Sheetal Deepak. "Content Based Image and Video Retrieval A Compressive Review." International Journal of Engineering and Advanced Technology 10, no. 5 (June 30, 2021): 243–47. http://dx.doi.org/10.35940/ijeat.e2783.0610521.

Abstract:
Content-based image retrieval is quickly becoming the most common method of searching vast databases for images, giving researchers a lot of room to develop new techniques and systems. Likewise, another common application in the field of computer vision is content-based visual information retrieval. For image and video retrieval, text-based search and Web-based image reranking have been the most common methods. Though Content Based Video Systems have improved in accuracy over time, they still fall short in interactive search. The use of these approaches has exposed shortcomings such as noisy data and inaccuracy, which often result in the display of irrelevant images or videos. The authors of the proposed study integrate image and visual data to improve the precision of the retrieved results for both photographs and videos. In response to a user's query, this study investigates alternative ways of fetching high-quality photos and related videos.
30

Hussain, D. Mansoor, D. Surendran, and A. Benazir Begum. "Feature Extraction in JPEG domain along with SVM for Content Based Image Retrieval." International Journal of Engineering & Technology 7, no. 2.19 (April 17, 2018): 1. http://dx.doi.org/10.14419/ijet.v7i2.19.11656.

Abstract:
Content Based Image Retrieval (CBIR) applies computer vision methods to retrieve images from databases. It is based mainly on a user query that is visual rather than in the traditional text form. CBIR is applied in fields ranging from surveillance to remote sensing, e-purchasing, medical image processing, security systems, and historical research, among others. JPEG, a very commonly used lossy compression method, reduces the size of an image before it is stored or transmitted, and almost every digital camera on the market stores captured images in JPEG format. The storage industry has seen many major transformations in the past decades while retrieval technologies are still developing: though there have been breakthroughs in text retrieval, the same is not true for image and other multimedia retrieval. Specifically, image retrieval has seen many algorithms in the spatial (raw) domain, but since the majority of images are stored in JPEG format, it takes time to decode a compressed image before extracting features and retrieving. Hence, in this research work, we focus on extracting features from the compressed domain itself and then utilize support vector machines (SVM) to improve the retrieval results. Our proof of concept shows that features extracted in the compressed domain help retrieve images 43% faster than for the same set of images in the spatial domain, and accuracy is improved to 93.4% through an SVM-based feedback mechanism.
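The compressed-domain idea can be sketched generically: treat low-frequency block DCT coefficients, the kind of data a JPEG file stores, as the feature vector and train an SVM on them. Synthetic 8x8 blocks stand in for real images here, and this is not the authors' exact feature set or feedback mechanism:

```python
# Sketch: 2-D DCT coefficients as compressed-domain features + SVM.
import numpy as np
from scipy.fft import dctn
from sklearn.svm import SVC

rng = np.random.default_rng(1)

def features(block, n=4):
    """Top-left n*n low-frequency DCT coefficients, flattened."""
    return dctn(block, norm="ortho")[:n, :n].ravel()

# Two toy classes: smooth blocks vs. heavily textured blocks.
smooth = [rng.normal(0.5, 0.02, (8, 8)) for _ in range(20)]
textured = [rng.normal(0.5, 0.4, (8, 8)) for _ in range(20)]
X = np.array([features(b) for b in smooth + textured])
y = np.array([0] * 20 + [1] * 20)

clf = SVC(kernel="rbf").fit(X, y)
print(clf.predict([features(rng.normal(0.5, 0.4, (8, 8)))]))  # expect [1]
```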
31

Deepa, K., and K. Priyanka. "Image salvage based on visual courtesy model using ROI." International Journal of Engineering & Technology 7, no. 2.26 (May 7, 2018): 63. http://dx.doi.org/10.14419/ijet.v7i2.26.12536.

Abstract:
Content Based Image Retrieval (CBIR) is the field concerned with demonstrating, organizing, and evaluating pictures by the information in the pictures themselves rather than by textual annotation. Here we work on the retrieval of images based not on keywords or annotations but on features extracted directly from the image data; well-organized retrieval algorithms have already been proposed, and Content Based Image Retrieval has replaced Text Based Image Retrieval. CBIR is approached with many methods, and research scientists are working to improve the accuracy of the technique. This project retrieves the region of interest (ROI) from an image and ranks images using the Teacher Learning Based Optimization genetic algorithm. The retrieval improves efficiency on two metrics, precision and recall, which is the main advantage of the project. A central issue for Content Based Image Retrieval systems is the semantic gap: the variation between the structure of visual objects and the definition of their semantics. Visual attention, as exhibited by the human visual system, is therefore well suited to Content Based Image Retrieval. A new similarity-based matching method is described based on the saliency map, which retains the attention values, and the regions of interest are extracted from it.
32

Hodhod, Rania, Brian Magerko, and Mohamed Gawish. "Pharaoh." International Journal of Information Retrieval Research 2, no. 3 (July 2012): 58–71. http://dx.doi.org/10.4018/ijirr.2012070104.

Abstract:
Cognitive scripts can act as a basis for representing behavioral tasks and domain knowledge in cognitive systems. Each event in a cognitive script is either temporally or causally linked with preceding and succeeding events. This temporal progression of events is what provides context to a particular cognitive script. In other words, it is this linking that provides a deeper explanation of a key event by defining the settings in which this event occurs (i.e. preceding and succeeding events). Contextual information plays a significant role in the retrieval process of cognitive scripts and needs to be considered in the retrieving process of cognitive scripts from large search spaces. Standard retrieval methods have been used on various unstructured data objects, such as text documents, images, audio, mind maps or videos. Other representations appear in logic-based languages that provide a structure that supports information retrieval based on logical reasoning, such as the Web Ontology Language. However, the application of these methods to structured cognitive scripts is not ideal because of the type of contextual information in cognitive scripts. This article presents Pharaoh, a novel context-based retrieval algorithm for cognitive scripts that can be employed in cognitive systems. Pharaoh relies on semantic structure and keyword-based retrieval to retrieve similar cognitive scripts based on a novel similarity measure between a structured query cognitive script and registered cognitive scripts.
33

Ruotsalo, Tuukka, and Matias Frosterus. "Diversifying Semantic Entity Search: Independent Component Analysis Approach." International Journal of Semantic Computing 07, no. 04 (December 2013): 407–26. http://dx.doi.org/10.1142/s1793351x13400138.

Abstract:
Structured Web data are increasingly accessed using information retrieval methods and information retrieval increasingly relies on structured background knowledge. As users' searches are often directed towards finding information about entities rather than text documents, a key affordance of semantic search is the ability to retrieve relevant information about entities more precisely by utilizing the rich structured descriptions and background knowledge. Entity search also poses challenges for information retrieval methods. Entity descriptions are often short and conventional search term matching alone can be insufficient. As a consequence, the search engine should be able to increase the recall of the returned results and select a representative set of entities for a user; to diversify search results. This paper presents an approach to diversify entity search by using semantics present and inferred from the initial entity search results. Our approach utilizes ontologies as a source of background knowledge to improve recall of entity retrieval and independent component analysis to detect independent latent components shared by the entities. The search results are then diversified by selecting a representative set of entities based on their membership in the independent components. We demonstrate the performance of our approach through retrieval experiments conducted by using a real-world dataset composed from four entity databases. The results suggest that our approach can significantly improve effectiveness and diversity of entity search.
34

Firoozeh, Nazanin, Adeline Nazarenko, Fabrice Alizon, and Béatrice Daille. "Keyword extraction: Issues and methods." Natural Language Engineering 26, no. 3 (November 11, 2019): 259–91. http://dx.doi.org/10.1017/s1351324919000457.

Abstract:
Due to the considerable growth of the volume of text documents on the Internet and in digital libraries, manual analysis of these documents is no longer feasible. Having efficient approaches to keyword extraction in order to retrieve the ‘key’ elements of the studied documents is now a necessity. Keyword extraction has been an active research field for many years, covering various applications in Text Mining, Information Retrieval, and Natural Language Processing, and meeting different requirements. However, it is not a unified domain of research. In spite of the existence of many approaches in the field, there is no single approach that effectively extracts keywords from different data sources. This shows the importance of having a comprehensive review, which discusses the complexity of the task and categorizes the main approaches of the field based on the features and methods of extraction that they use. This paper presents a general introduction to the field of keyword/keyphrase extraction. Unlike the existing surveys, different aspects of the problem along with the main challenges in the field are discussed. This mainly includes the unclear definition of ‘keyness’, complexities of targeting proper features for capturing desired keyness properties and selecting efficient extraction methods, and also the evaluation issues. By classifying a broad range of state-of-the-art approaches and analysing the benefits and drawbacks of different features and methods, we provide a clearer picture of them. This review is intended to help readers find their way around all the works related to keyword extraction and guide them in choosing or designing a method that is appropriate for the application they are targeting.
35

Budikova, Petra, Jan Sedmidubsky, Jan Horvath, and Pavel Zezula. "Efficient Retrieval of Human Motion Episodes Based on Indexed Motion-Word Representations." International Journal of Semantic Computing 15, no. 02 (June 2021): 189–213. http://dx.doi.org/10.1142/s1793351x21400031.

Abstract:
With the increasing availability of human motion data captured in the form of 2D or 3D skeleton sequences, more complex motion recordings need to be processed. In this paper, we focus on similarity-based indexing and efficient retrieval of motion episodes — medium-sized skeleton sequences that consist of multiple semantic actions and correspond to some logical motion unit (e.g. a figure skating performance). As a first step toward efficient retrieval, we apply the motion-word technique to transform spatio-temporal skeleton sequences into compact text-like documents. Based on these documents, we introduce a two-phase retrieval scheme that first finds a set of candidate query results and then re-ranks these candidates with more expensive application-specific methods. We further index the motion-word documents using inverted files, which allows us to retrieve the candidate documents in an efficient and scalable manner. We also propose additional query-reduction techniques that accelerate both the retrieval phases by removing semantically irrelevant parts of the motion query. Experimental evaluation is used to analyze the effects of the individual proposed techniques on the retrieval efficiency and effectiveness.
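The first, candidate-generation phase over motion-word documents behaves like a standard inverted file. A minimal sketch with a toy motion-word vocabulary; the union semantics for candidate selection is an assumption, not necessarily the authors' rule:

```python
# Inverted-file sketch for motion-word documents: each episode is a
# text-like document of motion words; candidates share words with the query.
from collections import defaultdict

episodes = {"ep1": "mw3 mw7 mw3 mw9".split(),
            "ep2": "mw1 mw7 mw2".split(),
            "ep3": "mw9 mw9 mw5".split()}

inverted = defaultdict(set)
for ep, words in episodes.items():
    for w in words:
        inverted[w].add(ep)

def candidates(query_words):
    """Phase 1: cheap candidate retrieval; expensive re-ranking is phase 2."""
    found = [inverted[w] for w in query_words if w in inverted]
    return set.union(*found) if found else set()

print(candidates("mw7 mw9".split()))   # {'ep1', 'ep2', 'ep3'}
```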
36

Wang, Yongyue, Beitong Yao, Tianbo Wang, Chunhe Xia, and Xianghui Zhao. "A Cognitive Method for Automatically Retrieving Complex Information on a Large Scale." Sensors 20, no. 11 (May 28, 2020): 3057. http://dx.doi.org/10.3390/s20113057.

Abstract:
Modern retrieval systems tend to deteriorate because of their large output of useless and even misleading information, especially for complex search requests on a large scale. Complex information retrieval (IR) tasks requiring multi-hop reasoning need to fuse multiple scattered text across two or more documents. However, there are two challenges for multi-hop retrieval. To be specific, the first challenge is that since some important supporting facts have little lexical or semantic relationship with the retrieval query, the retriever often omits them; the second challenge is that once a retriever chooses misinformation related to the query as the entities of cognitive graphs, the retriever will fail. In this study, in order to improve the performance of retrievers in complex tasks, an intelligent sensor technique was proposed based on a sub-scope with cognitive reasoning (2SCR-IR), a novel method of retrieving reasoning paths over the cognitive graph to provide users with verified multi-hop reasoning chains. Inspired by the users’ process of step-by-step searching online, 2SCR-IR includes a dynamic fusion layer that starts from the entities mentioned in the given query, explores the cognitive graph dynamically built from the query and contexts, gradually finds relevant supporting entities mentioned in the given documents, and verifies the rationality of the retrieval facts. Our experimental results show that 2SCR-IR achieves competitive results on the HotpotQA full wiki and distractor settings, and outperforms the previous state-of-the-art methods by a more than two points absolute gain on the full wiki setting.
37

Tulowiecki, Stephen J. "Information retrieval in physical geography." Progress in Physical Geography: Earth and Environment 42, no. 3 (April 29, 2018): 369–90. http://dx.doi.org/10.1177/0309133318770972.

Abstract:
Information retrieval (IR) methods seek to locate meaningful documents in large collections of textual and other data. Few studies apply these techniques to discover descriptions in historical documents for physical geography applications. This absence is noteworthy given the use of qualitative historical descriptions in physical geography and the amount of historical documentation online. This study, therefore, introduces an IR approach for finding meaningful and geographically resolved historical descriptions in large digital collections of historical documents. Presenting a biogeography application, it develops a ‘search engine’ using a boosted regression trees (BRT) model to assist in finding forest compositional descriptions (FCDs) based on textual features in a collection of county histories. The study then investigates whether FCDs corroborate existing estimates of relative abundances and spatial distributions of tree taxa from presettlement land survey records (PLSRs) and existing range maps. The BRT model is trained using portions of text from 458 US county histories. Evaluating the model’s performance upon a spatially independent test dataset, the model helps discover 97.5% of FCDs while reducing the amount of text to search through to 0.3% of total. The prevalence rank of taxa in FCDs (i.e. the number of times a taxon is mentioned at least once in an FCD, divided by the total number of FCDs, then ranked) is strongly related to the abundance rank in PLSRs. Patterns in species mentions from FCDs generally match relative abundance patterns from PLSRs. However, analyses suggest that FCDs contain biases towards large and economically valuable tree taxa and against smaller taxa. In the end, the study demonstrates the potential of IR approaches for developing novel datasets over large geographic areas, corroborating existing historical datasets, and providing spatial coverage of historic phenomena.
38

Hua, Yan, Yingyun Yang, and Jianhe Du. "Deep Multi-Modal Metric Learning with Multi-Scale Correlation for Image-Text Retrieval." Electronics 9, no. 3 (March 10, 2020): 466. http://dx.doi.org/10.3390/electronics9030466.

Full text
Abstract:
Multi-modal retrieval is challenging due to the heterogeneity gap and the complex semantic relationships between data of different modalities. Typical research maps different modalities into a common subspace using a one-to-one correspondence or a similarity/dissimilarity relationship between inter-modal data, so that the distances between heterogeneous data can be compared directly; inter-modal retrieval can then be achieved by nearest-neighbor search. However, most such methods ignore intra-modal relations and the complicated semantics between multi-modal data. In this paper, we propose a deep multi-modal metric learning method with multi-scale semantic correlation to handle retrieval tasks between the image and text modalities. A deep model with two branches is designed to nonlinearly map raw heterogeneous data into comparable representations. In contrast to binary similarity, we formulate the semantic relationship with multi-scale similarity to learn fine-grained multi-modal distances. Inter-modal and intra-modal correlations constructed on multi-scale semantic similarity are incorporated to train the deep model end-to-end. Experiments validate the effectiveness of the proposed method on multi-modal retrieval tasks, and our method outperforms state-of-the-art methods on the NUS-WIDE, MIR Flickr, and Wikipedia datasets.
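One simple way to realize a multi-scale (rather than binary) similarity is to grade pairs by how many semantic labels they share; the Jaccard form below is a sketch of that idea, not necessarily the paper's exact formulation.

```python
# Sketch: multi-scale semantic similarity graded by shared labels
# (Jaccard over label sets). A binary similarity would return only
# 0 or 1, discarding this fine-grained signal.
def multiscale_similarity(labels_a, labels_b):
    a, b = set(labels_a), set(labels_b)
    return len(a & b) / len(a | b) if a | b else 0.0

# e.g. an image tagged {sky, beach} vs. a text tagged {beach, people}
# scores 1/3 instead of a flat 'dissimilar'.
```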
APA, Harvard, Vancouver, ISO, and other styles
39

Ravana, Sri Devi, Prabha Rajagopal, and Vimala Balakrishnan. "Ranking retrieval systems using pseudo relevance judgments." Aslib Journal of Information Management 67, no. 6 (November 16, 2015): 700–714. http://dx.doi.org/10.1108/ajim-03-2015-0046.

Full text
Abstract:
Purpose – In a system-based approach, replicating the web would require large test collections, and judging the relevance of all documents per topic to create relevance judgments with human assessors is infeasible. Because of the large number of documents requiring judgment, human assessors may also introduce errors through disagreements. The paper aims to discuss these issues. Design/methodology/approach – This study explores exponential variation and document ranking methods that generate a reliable set of relevance judgments (pseudo relevance judgments) to reduce human effort. These methods cope with large numbers of documents for judgment while avoiding human disagreement errors during the judgment process. The study utilizes two key factors: the number of occurrences of each document per topic across all system runs, and document rankings. Findings – The effectiveness of the proposed method is evaluated using the correlation coefficient of systems ranked by mean average precision under the original Text REtrieval Conference (TREC) relevance judgments and under the pseudo relevance judgments. The results suggest that the proposed document ranking method with a pool depth of 100 could be a reliable alternative that reduces the human effort and disagreement errors involved in generating TREC-like relevance judgments. Originality/value – The simple methods proposed in this study improve the correlation coefficient when generating alternative relevance judgments without human assessors, contributing to information retrieval evaluation.
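A minimal sketch of the two key factors, occurrence counts across runs and document ranks, might look like the following; the exact weighting is an assumption, not the paper's formula.

```python
# Sketch: pseudo relevance judgments from pooled runs. Each document
# accumulates credit for every run that retrieves it, discounted by
# its rank; the top of the pooled list is treated as pseudo-relevant.
from collections import defaultdict

def pseudo_qrels(runs, pool_depth=100, top_k=50):
    """runs: list of ranked document-ID lists for one topic."""
    score = defaultdict(float)
    for run in runs:
        for rank, doc in enumerate(run[:pool_depth], start=1):
            score[doc] += 1.0 / rank   # occurrence credit, rank-discounted
    ranked = sorted(score, key=score.get, reverse=True)
    return set(ranked[:top_k])
```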
APA, Harvard, Vancouver, ISO, and other styles
40

Liu, Sijia, Yanshan Wang, Andrew Wen, Liwei Wang, Na Hong, Feichen Shen, Steven Bedrick, William Hersh, and Hongfang Liu. "Implementation of a Cohort Retrieval System for Clinical Data Repositories Using the Observational Medical Outcomes Partnership Common Data Model: Proof-of-Concept System Validation." JMIR Medical Informatics 8, no. 10 (October 6, 2020): e17376. http://dx.doi.org/10.2196/17376.

Full text
Abstract:
Background: Widespread adoption of electronic health records has enabled the secondary use of electronic health record data for clinical research and health care delivery. Natural language processing techniques have shown promise in their capability to extract the information embedded in unstructured clinical data, and information retrieval techniques provide flexible and scalable solutions that can augment natural language processing systems for retrieving and ranking relevant records. Objective: In this paper, we present the implementation of a cohort retrieval system that can execute textual cohort selection queries on both structured data and unstructured text—Cohort Retrieval Enhanced by Analysis of Text from Electronic Health Records (CREATE). Methods: CREATE is a proof-of-concept system that leverages a combination of structured queries and information retrieval techniques on natural language processing results to improve cohort retrieval performance, using the Observational Medical Outcomes Partnership Common Data Model to enhance model portability. The natural language processing component is used to extract common data model concepts from textual queries, and a hierarchical index supports common data model concept search using information retrieval techniques and frameworks. Results: Our case study on 5 cohort identification queries, evaluated with the precision-at-5 metric at both the patient level and the document level, demonstrates that CREATE achieves a mean precision at 5 of 0.90, outperforming systems that use only structured data (0.54) or only unstructured text (0.74). Conclusions: The implementation and evaluation on Mayo Clinic Biobank data demonstrated that CREATE outperforms cohort retrieval systems that use only structured data or only unstructured text on complex textual cohort queries.
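The precision-at-5 metric reported above is straightforward to compute; a minimal version is shown below.

```python
# Precision at 5: the fraction of the top 5 retrieved records that
# are relevant; the mean over queries gives the reported 0.90.
def precision_at_5(retrieved, relevant):
    top = retrieved[:5]
    return sum(doc in relevant for doc in top) / 5.0
```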
APA, Harvard, Vancouver, ISO, and other styles
41

Roy, Anurag, Shalmoli Ghosh, Kripabandhu Ghosh, and Saptarshi Ghosh. "An Unsupervised Normalization Algorithm for Noisy Text: A Case Study for Information Retrieval and Stance Detection." Journal of Data and Information Quality 13, no. 3 (April 27, 2021): 1–25. http://dx.doi.org/10.1145/3418036.

Full text
Abstract:
A large fraction of textual data available today contains various types of “noise,” such as OCR noise in digitized documents, noise due to informal writing style of users on microblogging sites, and so on. To enable tasks such as search/retrieval and classification over all the available data, we need robust algorithms for text normalization, i.e., for cleaning different kinds of noise in the text. There have been several efforts towards cleaning or normalizing noisy text; however, many of the existing text normalization methods are supervised and require language-dependent resources or large amounts of training data that is difficult to obtain. We propose an unsupervised algorithm for text normalization that does not need any training data/human intervention. The proposed algorithm is applicable to text over different languages and can handle both machine-generated and human-generated noise. Experiments over several standard datasets show that text normalization through the proposed algorithm enables better retrieval and stance detection, as compared to that using several baseline text normalization methods.
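As a stand-in for the flavor of unsupervised normalization (the paper's actual algorithm is different and more involved), the sketch below maps a noisy token to its closest match in a clean vocabulary by string similarity.

```python
# Sketch only: string-similarity normalization of noisy tokens
# against a clean vocabulary. The cutoff is an illustrative value,
# not a parameter from the paper.
import difflib

def normalize_token(token, vocabulary, cutoff=0.8):
    matches = difflib.get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else token

# normalize_token("retrievel", ["retrieval", "review"]) -> "retrieval"
```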
APA, Harvard, Vancouver, ISO, and other styles
42

Wang, Xiao Bo, Fan Zhao, Xiao Li, and Rong Hui Zhang. "Cross Language Query Expansion Approach for CIMS Based on Weighted D-S Evidence Theory." Key Engineering Materials 620 (August 2014): 534–43. http://dx.doi.org/10.4028/www.scientific.net/kem.620.534.

Full text
Abstract:
With the rapid development of Computer Integrated Manufacturing Systems (CIMS) and information technology, rapid multilingual retrieval has become one of the hot spots in machine translation. Cross-language information retrieval (CLIR) provides a convenient way for users to submit queries in their own familiar language and retrieve documents in another language. Query expansion is one of the effective methods for improving the recall of information retrieval. Many expansion methods have been proposed, but most simply add expansion terms to the query; if the original query words and expansion words are not distinguished, the expanded query may deviate from the original semantics, which is inconvenient for mechanical engineers and programmers. Based on Dempster-Shafer evidence theory, we propose a query expansion computing model that treats the original query terms as the primary evidence and the expansion terms as secondary evidence. The method uses semantic dictionaries (a Chinese synonym forest and a Uyghur-Chinese bilingual dictionary) to obtain synonyms, near-synonyms, and hypernyms of the query words, and Latent Semantic Analysis over large-scale text to obtain words semantically related to the query words. These two types of evidence are combined through a weighted Dempster-Shafer combination rule. Experimental results show that this method can effectively improve retrieval efficiency in mechanical engineering and information technology, and the results can serve as a reference for rapid multilingual retrieval in CIMS.
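For reference, the unweighted Dempster rule of combination over singleton hypotheses (here, candidate expansion terms) is shown below; the paper's weighted variant and its mass assignments are not reproduced.

```python
# Dempster's rule for two mass functions m1, m2 over singleton
# hypotheses: agreeing masses multiply, conflicting mass is
# normalized away. Dicts map hypothesis -> mass, each summing to 1;
# assumes the two sources are not totally conflicting (k > 0).
def dempster_combine(m1, m2):
    combined, conflict = {}, 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            if a == b:
                combined[a] = combined.get(a, 0.0) + ma * mb
            else:
                conflict += ma * mb
    k = 1.0 - conflict                 # normalization constant
    return {h: v / k for h, v in combined.items()}

# e.g. dempster_combine({"gear": 0.7, "cog": 0.3},
#                       {"gear": 0.6, "cog": 0.4})
```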
APA, Harvard, Vancouver, ISO, and other styles
43

Dhar, Dibyajyoti, Neelotpal Chakraborty, Sayan Choudhury, Ashis Paul, Ayatullah Faruk Mollah, Subhadip Basu, and Ram Sarkar. "Multilingual Scene Text Detection Using Gradient Morphology." International Journal of Computer Vision and Image Processing 10, no. 3 (July 2020): 31–43. http://dx.doi.org/10.4018/ijcvip.2020070103.

Full text
Abstract:
Text detection in natural scene images is an interesting problem in the field of information retrieval. Several methods have been proposed over the past few decades for scene text detection. However, the robustness and efficiency of these methods are degraded by their high sensitivity to various image complexities. Moreover, in a multilingual environment where texts may occur in multiple languages, a method may not be suitable for detecting scene texts in certain languages. To counter these challenges, this paper proposes a gradient morphology-based method that proves robust against image complexities and efficiently detects scene texts irrespective of their languages. The method is validated using low-quality images from standard multilingual datasets such as MSRA-TD500 and MLe2e. Its performance is compared with that of several state-of-the-art methods, and comparably better results are observed.
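The core operation behind gradient morphology, a morphological gradient (dilation minus erosion) that highlights character boundaries, can be sketched with OpenCV as below; the kernel size and thresholding are illustrative choices, not the paper's pipeline.

```python
# Sketch: morphological gradient for text-region enhancement.
# OpenCV is assumed; parameters are illustrative.
import cv2

img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
grad = cv2.morphologyEx(img, cv2.MORPH_GRADIENT, kernel)  # dilation - erosion
_, mask = cv2.threshold(grad, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
```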
APA, Harvard, Vancouver, ISO, and other styles
44

VILLATORO, ESAÚ, ANTONIO JUÁREZ, MANUEL MONTES, LUIS VILLASEÑOR, and L. ENRIQUE SUCAR. "Document ranking refinement using a Markov random field model." Natural Language Engineering 18, no. 2 (March 14, 2012): 155–85. http://dx.doi.org/10.1017/s1351324912000010.

Full text
Abstract:
This paper introduces a novel ranking refinement approach based on relevance feedback for the task of document retrieval. We focus on the problem of ranking refinement because recent evaluation results from Information Retrieval (IR) systems indicate that current methods are effective at retrieving most of the relevant documents for different sets of queries, but have severe difficulties in generating a pertinent ranking of them. Motivated by these results, we propose a novel method to re-rank the list of documents returned by an IR system. The proposed method is based on a Markov Random Field (MRF) model that classifies the retrieved documents as relevant or irrelevant. The proposed MRF combines: (i) information provided by the base IR system, (ii) similarities among documents in the retrieved list, and (iii) relevance feedback information. Thus, the problem of ranking refinement is reduced to minimising an energy function that represents a trade-off between document relevance and inter-document similarity. Experiments were conducted using resources from four tasks of the Cross Language Evaluation Forum (CLEF) as well as one task of the Text Retrieval Conference (TREC). The results show the feasibility of the method for re-ranking documents in IR and an improvement in mean average precision over a state-of-the-art retrieval engine.
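The energy trade-off described above can be written down directly; the sketch below evaluates (not minimises) the energy of a candidate relevance labeling, with the balance weight lam as an assumed parameter rather than the paper's.

```python
# Sketch of the MRF energy: unary terms reward labeling high-scoring
# documents as relevant; pairwise terms penalize giving different
# labels to similar documents. Inference (minimisation) is not shown.
def energy(labels, ir_score, sim, lam=0.5):
    """labels: doc -> 0/1; ir_score: doc -> float;
    sim: dict (doc_i, doc_j) -> similarity in [0, 1]."""
    unary = -sum(ir_score[d] for d, y in labels.items() if y == 1)
    pairwise = sum(s for (i, j), s in sim.items() if labels[i] != labels[j])
    return unary + lam * pairwise
```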
APA, Harvard, Vancouver, ISO, and other styles
45

Dong, Bin, Songlei Jian, and Kai Lu. "Learning Multimodal Representations by Symmetrically Transferring Local Structures." Symmetry 12, no. 9 (September 13, 2020): 1504. http://dx.doi.org/10.3390/sym12091504.

Full text
Abstract:
Multimodal representations play an important role in multimodal learning tasks, including cross-modal retrieval and intra-modal clustering. However, existing multimodal representation learning approaches focus on building one common space by aligning different modalities and ignore the complementary information across the modalities, such as the intra-modal local structures. In other words, they focus only on object-level alignment and ignore structure-level alignment. To tackle this problem, we propose a novel symmetric multimodal representation learning framework, namely MTLS, that transfers local structures across different modalities. A customized soft metric learning strategy and an iterative parameter learning process are designed to symmetrically transfer local structures and enhance the cluster structures in intra-modal representations. A bidirectional retrieval loss based on multi-layer neural networks is utilized to align the two modalities. MTLS is instantiated with image and text data and shows superior performance on image-text retrieval and image clustering, outperforming the state-of-the-art multimodal learning methods by up to 32% in terms of R@1 on text-image retrieval and 16.4% in terms of AMI on clustering.
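The bidirectional retrieval loss mentioned above is commonly realized as a margin-based loss applied in both directions; the PyTorch sketch below shows that general form under the assumption of L2-normalized embeddings, not MTLS's full objective with its structure-transfer terms.

```python
# Sketch: bidirectional margin (triplet-style) retrieval loss.
# img, txt: (n, d) L2-normalized embeddings with matched rows.
import torch
import torch.nn.functional as F

def bidirectional_loss(img, txt, margin=0.2):
    sim = img @ txt.t()                  # pairwise cosine similarities
    pos = sim.diag().unsqueeze(1)        # similarity of matched pairs
    i2t = F.relu(margin + sim - pos).fill_diagonal_(0).sum()
    t2i = F.relu(margin + sim.t() - pos).fill_diagonal_(0).sum()
    return i2t + t2i
```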
APA, Harvard, Vancouver, ISO, and other styles
46

Bhatt, Nikita, and Amit Ganatra. "Improvement of deep cross-modal retrieval by generating real-valued representation." PeerJ Computer Science 7 (April 27, 2021): e491. http://dx.doi.org/10.7717/peerj-cs.491.

Full text
Abstract:
Cross-modal retrieval (CMR) has attracted much attention in the research community due to its flexible and comprehensive retrieval. The core challenge in CMR is the heterogeneity gap, which arises from the different statistical properties of multi-modal data. The most common solution for bridging the heterogeneity gap is representation learning, which generates a common sub-space. In this work, we propose a framework called "Improvement of Deep Cross-Modal Retrieval (IDCMR)", which generates real-valued representations. IDCMR preserves both intra-modal and inter-modal similarity: intra-modal similarity is preserved by selecting an appropriate training model for the text and image modalities, and inter-modal similarity is preserved by reducing a modality-invariance loss. Mean average precision (mAP) is used as the performance measure for the CMR system. Extensive experiments show that IDCMR outperforms state-of-the-art methods by margins of 4% and 2% in mAP on the text-to-image and image-to-text retrieval tasks on the MSCOCO and Xmedia datasets, respectively.
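The mAP measure used above averages, over queries, the average precision of each ranked list; a compact version is given below.

```python
# Mean average precision: AP rewards relevant items appearing early
# in the ranking; mAP averages AP over all queries.
def average_precision(ranked, relevant):
    hits, total = 0, 0.0
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / i
    return total / max(len(relevant), 1)

def mean_average_precision(rankings, relevants):
    return sum(average_precision(r, rel)
               for r, rel in zip(rankings, relevants)) / len(rankings)
```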
APA, Harvard, Vancouver, ISO, and other styles
47

Revathy, N. P., S. Janarthanam, and S. Sukumaran. "Boosted Edge Detection Algorithm for Unstructured Environment in Document Using Optimized Text Region Detection." Asian Journal of Computer Science and Technology 8, S1 (February 5, 2019): 50–53. http://dx.doi.org/10.51983/ajcst-2019.8.s1.1959.

Full text
Abstract:
Document images are increasingly common and are made available over the internet for information retrieval. Retrieval from document images is more difficult than from digital text, and edge detection is an important task in document image retrieval: it refers to the process of finding sharp discontinuities of characters in document images. Single edge-detection methods suffer from weak gradients and missing edges, so the proposed approach combines global and local edge detection to extract edges. The global edge detection obtains the overall edges and uses an improved adaptive smoothing filter algorithm based on the Canny operator. This combination increases detection efficiency and reduces computational time. In addition, the proposed algorithm has been tested in a real-time document retrieval system to detect edges in unstructured environments and generate 2D maps; these maps contain the starting and destination points in addition to the current positions of the objects. The proposed work enhances the search capability over documents, moving towards the optimal solution, and its capability is verified in terms of detection efficiency.
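The global step, adaptive smoothing followed by Canny edge detection, can be sketched with OpenCV as below; the bilateral filter stands in for the paper's improved adaptive smoothing filter, and all parameters are illustrative.

```python
# Sketch: edge-preserving smoothing then Canny edges on a document
# image. OpenCV is assumed; thresholds are illustrative values.
import cv2

img = cv2.imread("document.png", cv2.IMREAD_GRAYSCALE)
smooth = cv2.bilateralFilter(img, 9, 75, 75)   # adaptive-style smoothing
edges = cv2.Canny(smooth, 50, 150)             # global edge map
```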
APA, Harvard, Vancouver, ISO, and other styles
48

Chen, Hong Ye, and Phil Vines. "Multi Queries Methods of the Chinese-English Bilingual Plagiarism Detection." Applied Mechanics and Materials 462-463 (November 2013): 1158–62. http://dx.doi.org/10.4028/www.scientific.net/amm.462-463.1158.

Full text
Abstract:
Cross-language plagiarism detection identifies and extracts plagiarized text in a multilingual environment. In recent years, a significant amount of work has involved English and European texts, but somewhat less attention has been paid to Asian languages. We compared a number of different strategies for Chinese-English bilingual plagiarism detection, presenting and comparing four methods for candidate document retrieval: (i) document-keyword-based queries, (ii) intrinsic-plagiarism-based queries, (iii) header-based queries, and (iv) machine-translation queries. The results of our evaluation indicate that keyword-based queries, the simplest and most efficient approach, give acceptable results for newspaper articles. We also compared different percentages of keywords per query; the results indicate that putting 50% of the keywords into queries yields a satisfactory candidate document set.
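A keyword-based query of the kind evaluated above can be built by ranking a document's terms with TF-IDF and keeping the top 50%; the scikit-learn sketch below shows this pattern, with all function and variable names being ours, not the paper's.

```python
# Sketch: TF-IDF keyword query using the top 50% of a document's
# weighted terms, mirroring the best-performing setting above.
from sklearn.feature_extraction.text import TfidfVectorizer

def keyword_query(doc, corpus, fraction=0.5):
    vec = TfidfVectorizer()
    tfidf = vec.fit_transform(corpus + [doc])
    terms = vec.get_feature_names_out()
    weights = tfidf[len(corpus)].toarray().ravel()   # weights for `doc`
    ranked = sorted(zip(weights, terms), reverse=True)
    keep = [t for w, t in ranked if w > 0]
    return " ".join(keep[: max(1, int(len(keep) * fraction))])
```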
APA, Harvard, Vancouver, ISO, and other styles
49

Guda, Vanitha, and SureshKumar Sanampudi. "Event Time Relationship in Natural Language Text." International Journal of Recent Contributions from Engineering, Science & IT (iJES) 7, no. 3 (September 25, 2019): 4. http://dx.doi.org/10.3991/ijes.v7i3.10985.

Full text
Abstract:
Due to numerous information needs, retrieval of events from a given natural language text is inevitable. From a natural language processing (NLP) perspective, "events" are situations, occurrences, real-world entities, or facts. Extracting events and arranging them on a timeline is helpful in various NLP applications such as summarizing news articles, processing health records, and question answering (QA) systems. This paper presents a framework for identifying the events and times in a given document and representing them using a graph data structure. As a result, a graph is derived that shows event-time relationships in the given text: events form the nodes, and edges represent the temporal relations among the nodes. The time of an event occurrence exists in two forms, namely qualitative (such as before, after, during) and quantitative (exact time points or periods). To build the event-time-event structure, quantitative time is normalized to the qualitative form, and the resulting temporal information is used to label the edges between events. The dataset released in the EvTExtract shared task of the FIRE (Forum for Information Retrieval Evaluation) 2018 conference is used to evaluate the framework. Precision and recall are used as evaluation metrics to assess the performance of the proposed framework against other methods in the state of the art, achieving 85% accuracy and 90% precision.
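The event-time graph can be represented directly with a standard graph library; the networkx sketch below uses invented example events, with quantitative times normalized to a qualitative relation as the abstract describes.

```python
# Sketch: events as nodes, qualitative temporal relations as edge
# labels. A quantitative pair of dates normalizes to "before".
import networkx as nx

g = nx.DiGraph()
g.add_edge("earthquake", "rescue operation", relation="before")
g.add_edge("rescue operation", "relief distribution", relation="during")
# e.g. ("2018-09-25", "2018-09-28") -> "before" after normalization
print(g["earthquake"]["rescue operation"]["relation"])
```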
APA, Harvard, Vancouver, ISO, and other styles
50

Chen, Yen-Wei, Xinyin Huang, Dingye Chen, and Xian-Hua Han. "Generic and Specific Impressions Estimation and Their Application to KANSEI-Based Clothing Fabric Image Retrieval." International Journal of Pattern Recognition and Artificial Intelligence 32, no. 10 (June 20, 2018): 1854024. http://dx.doi.org/10.1142/s0218001418540241.

Full text
Abstract:
Current image retrieval techniques are mainly based on text or visual contents. However, both text-based and content-based methods lack the capability to utilize human intuition and KANSEI (impression). In this paper, we propose an impression-based image retrieval method that retrieves images according to impressions expressed as impression keywords. We first propose a machine-learning method for estimating generic and specific impressions and then apply it to impression-based clothing fabric image retrieval. We use the semantic differential (SD) method to measure users' impressions, such as brightness and warmth, while they view a clothing fabric image. We also extract global and local features of clothing fabric images, such as color and texture, using computer vision techniques. Support vector regression is then used to model the mapping functions between the generic (or specific) impression and the image features. The learnt mapping functions are used to estimate the generic and specific impressions of clothing fabric images, and retrieval is done by comparing the query impression with the estimated impressions of images in the database.
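The regression step, mapping image features to an SD-scale impression score, can be sketched with scikit-learn; feature values and ratings below are toy numbers, not the paper's data.

```python
# Sketch: support vector regression from image features (e.g. color
# and texture statistics) to an SD-scale impression such as 'warmth'.
import numpy as np
from sklearn.svm import SVR

X = np.array([[0.8, 0.1], [0.2, 0.7], [0.5, 0.4]])  # toy feature vectors
y = np.array([1.5, -0.8, 0.3])                      # toy SD warmth ratings
model = SVR(kernel="rbf").fit(X, y)
estimate = model.predict([[0.6, 0.3]])              # impression of a new fabric
```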
APA, Harvard, Vancouver, ISO, and other styles