To see the other types of publications on this topic, follow the link: Information retrieval keyword extraction.

Journal articles on the topic 'Information retrieval keyword extraction'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Select a source type:

Consult the top 50 journal articles for your research on the topic 'Information retrieval keyword extraction.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles across a wide variety of disciplines and organise your bibliography correctly.

1

Chien, Lee-Feng. "PAT-tree-based keyword extraction for Chinese information retrieval." ACM SIGIR Forum 31, SI (1997): 50–58. http://dx.doi.org/10.1145/278459.258534.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Li, Ya Min, and Xian Huan Zhang. "The Research and Implementation of Keyword Extraction Technology." Applied Mechanics and Materials 644-650 (September 2014): 2003–8. http://dx.doi.org/10.4028/www.scientific.net/amm.644-650.2003.

Abstract:
Keyword extraction plays an important role in abstracting, information retrieval, data mining, text clustering, etc. Extracting the keywords from a document increases the efficiency of retrieval and thus provides great help in organizing resources efficiently. Few writers on the Internet supply the keywords of a document, and extracting keywords by hand is a great deal of work, so a method of extracting keywords automatically is needed. From the perspective of Chinese parts of speech, the paper constructs a small library of verbs, function words, stop words, etc.; realizes rapid word segmentation based on research into, analysis of, and improvement of the traditional maximum-matching segmentation method; and on that basis implements keyword extraction based on TF-IDF (Term Frequency-Inverse Document Frequency).
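The TF-IDF weighting that this entry's extractor relies on can be sketched in a few lines of Python; the toy corpus below is hypothetical, and the Chinese word segmentation step that the paper focuses on is assumed to have been done already:

```python
import math
from collections import Counter

def tfidf_keywords(docs, doc_index, top_k=3):
    """Rank the terms of one document by TF-IDF against a small corpus."""
    n_docs = len(docs)
    # document frequency: number of documents containing each term
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    tf = Counter(docs[doc_index])
    total = sum(tf.values())
    scores = {
        term: (count / total) * math.log(n_docs / df[term])
        for term, count in tf.items()
    }
    return [t for t, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]]

# hypothetical pre-segmented documents
corpus = [
    ["keyword", "extraction", "chinese", "segmentation"],
    ["keyword", "retrieval", "index", "retrieval"],
    ["chinese", "index", "corpus"],
]
print(tfidf_keywords(corpus, 0, top_k=2))  # ['extraction', 'segmentation']
```

Terms that occur often in the target document but rarely elsewhere in the corpus score highest, which is exactly the property the abstract exploits for keyword selection.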
3

Hashemzadeh, Bahareh, and Majid Abdolrazzagh-Nezhad. "Improving keyword extraction in multilingual texts." International Journal of Electrical and Computer Engineering (IJECE) 10, no. 6 (2020): 5909–16. https://doi.org/10.11591/ijece.v10i6.pp5909-5916.

Abstract:
The accuracy of keyword extraction is a leading factor in information retrieval systems and marketing. In the real world, text is produced in a variety of languages, and the ability to extract keywords based on information from different languages improves the accuracy of keyword extraction. In this paper, the available information from all languages is applied to improve a traditional keyword extraction algorithm on multilingual text. The proposed keyword extraction procedure is an unsupervised algorithm designed to select a word as a keyword of a given text only if, in addition to ranking highly in that language, it holds a high rank according to the keyword criteria in the other languages as well. To achieve this aim, the average TF-IDF of the candidate words was calculated for both the same language and the other languages, and the words with the higher average TF-IDF were chosen as the extracted keywords. The obtained results indicate that the accuracies of the term frequency-inverse document frequency (TF-IDF) algorithm, a graph-based algorithm, and the improved proposed algorithm on multilingual texts are 80%, 60.65%, and 91.3%, respectively.
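The cross-language averaging at the heart of this paper can be sketched as follows; the per-language TF-IDF score dictionaries and the alignment of translated candidate terms are assumed to be given, and the names and numbers below are hypothetical:

```python
def multilingual_keywords(scores_by_language, top_k=2):
    """Average each candidate's TF-IDF over all languages and keep the
    highest-averaged candidates (a sketch of the paper's idea; aligning
    translated terms across languages is assumed to be done already)."""
    averaged = {}
    for term in set().union(*scores_by_language):
        vals = [s[term] for s in scores_by_language if term in s]
        # dividing by the number of languages penalizes a candidate that
        # ranks highly in only one language
        averaged[term] = sum(vals) / len(scores_by_language)
    return sorted(averaged, key=averaged.get, reverse=True)[:top_k]

# hypothetical per-language TF-IDF scores for aligned candidate terms
english = {"retrieval": 0.42, "keyword": 0.38, "system": 0.05}
persian = {"retrieval": 0.40, "keyword": 0.12, "system": 0.30}
print(multilingual_keywords([english, persian]))  # ['retrieval', 'keyword']
```

Only 'retrieval' scores highly in both languages, so it wins even though 'keyword' and 'system' each dominate in one language alone.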
4

Tadse, Roshni S., L. H. Patil, and C. U. Chauhan. "CONTENT BASED INFORMATION RETRIEVAL FOR DIGITAL LIBRARY USING DOCUMENT IMAGE." INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY 5, no. 7 (2016): 632–38. https://doi.org/10.5281/zenodo.57052.

Abstract:
In recent years, the use of mobile devices has created an emerging need to improve the user experience of digital library search, with applications such as education, location search, and product retrieval. Existing systems simply compare the query against the database images and retrieve those that match, so search speed and response time remain challenging issues in mobile document search. Much previous work on search engines retrieves documents from the database without analyzing the image. The proposed method is a mobile document information retrieval framework that handles image-based queries automatically and uses FP-growth to find frequent patterns in the retrieved documents to optimize the results.
5

Hashemzahde, Bahare, and Majid Abdolrazzagh-Nezhad. "Improving keyword extraction in multilingual texts." International Journal of Electrical and Computer Engineering (IJECE) 10, no. 6 (2020): 5909. http://dx.doi.org/10.11591/ijece.v10i6.pp5909-5916.

Abstract:
The accuracy of keyword extraction is a leading factor in information retrieval systems and marketing. In the real world, text is produced in a variety of languages, and the ability to extract keywords based on information from different languages improves the accuracy of keyword extraction. In this paper, the available information from all languages is applied to improve a traditional keyword extraction algorithm on multilingual text. The proposed keyword extraction procedure is an unsupervised algorithm designed to select a word as a keyword of a given text only if, in addition to ranking highly in that language, it holds a high rank according to the keyword criteria in the other languages as well. To achieve this aim, the average TF-IDF of the candidate words was calculated for both the same language and the other languages, and the words with the higher average TF-IDF were chosen as the extracted keywords. The obtained results indicate that the accuracies of the term frequency-inverse document frequency (TF-IDF) algorithm, a graph-based algorithm, and the improved proposed algorithm on multilingual texts are 80%, 60.65%, and 91.3%, respectively.
6

Zhang, Shu Dong, and Jing Wang. "The Search Technology of Query Expansion Based on the Agriculture Information." Advanced Materials Research 712-715 (June 2013): 2550–55. http://dx.doi.org/10.4028/www.scientific.net/amr.712-715.2550.

Abstract:
With the rapid development of the Internet and information retrieval technology, and the massive amount of complex agricultural information resources, retrieving the required information conveniently and accurately is an important research problem that must be overcome. In this paper, we build an information database through an initial retrieval from existing agricultural search sites. We then expand the query keyword by combining feature extraction with similarity computation based on HowNet. With the expanded keywords we perform a second retrieval and use its result as the final search output, improving the recall ratio and precision rate of agricultural information queries.
7

Ni, Pin, Yuming Li, and Victor Chang. "Research on Text Classification Based on Automatically Extracted Keywords." International Journal of Enterprise Information Systems 16, no. 4 (2020): 1–16. http://dx.doi.org/10.4018/ijeis.2020100101.

Abstract:
Automatic keyword extraction and classification are important research directions in the domains of NLP (natural language processing), information retrieval, and text mining. As the fine granularity abstracted from text data, keywords are also the most important feature of text data, with great practical and potential value in document classification, topic modeling, information retrieval, and other areas. A compact representation of documents, containing a great deal of significant information, can be achieved through keywords; this may be quite advantageous for realizing text classification over a high-dimensional feature space. For this reason, this study designed a supervised keyword classification method based on TextRank automatic keyword extraction technology and optimized the model with a genetic algorithm, contributing to modeling topic keywords for text classification.
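TextRank, the automatic extraction technique this study builds on, ranks words by running a PageRank-style iteration over a word co-occurrence graph. A minimal sketch, where the window size, damping factor, iteration count, and token list are illustrative choices rather than the paper's settings:

```python
def textrank_keywords(words, window=2, damping=0.85, iters=30, top_k=2):
    """Minimal TextRank over a co-occurrence graph of a token sequence.
    Assumes every token co-occurs with at least one other token."""
    # build an undirected co-occurrence graph within a sliding window
    neighbors = {w: set() for w in words}
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            if words[j] != w:
                neighbors[w].add(words[j])
                neighbors[words[j]].add(w)
    # PageRank-style iteration: a word is important if important words
    # co-occur with it
    score = {w: 1.0 for w in neighbors}
    for _ in range(iters):
        score = {
            w: (1 - damping) + damping * sum(
                score[u] / len(neighbors[u]) for u in neighbors[w])
            for w in neighbors
        }
    return sorted(score, key=score.get, reverse=True)[:top_k]

tokens = ["keyword", "extraction", "keyword", "classification",
          "text", "classification", "text", "mining"]
print(textrank_keywords(tokens))  # 'classification', the hub word, ranks first
```

Unlike TF-IDF, this needs no background corpus: a single document's co-occurrence structure is enough, which is why TextRank is a common starting point for supervised pipelines like the one in this entry.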
8

Tao, Liufeng, Kai Ma, Miao Tian, et al. "Developing a Base Domain Ontology from Geoscience Report Collection to Aid in Information Retrieval towards Spatiotemporal and Topic Association." ISPRS International Journal of Geo-Information 13, no. 1 (2023): 14. http://dx.doi.org/10.3390/ijgi13010014.

Abstract:
The efficient and precise retrieval of desired information from extensive geological databases is a prominent and pivotal focus within the realm of geological information services. Conventional information retrieval methods primarily rely on keyword matching approaches, which often overlook the contextual and semantic aspects of the keywords, consequently impeding the retrieval system’s ability to accurately comprehend user query requirements. To tackle this challenge, this study proposes an ontology-driven information-retrieval framework for geological data that integrates spatiotemporal and topic associations. The framework encompasses the development of a geological domain ontology, extraction of key information, establishment of a multi-feature association and retrieval framework, and validation through a comprehensive case study. By employing the proposed framework, users are empowered to actively and automatically retrieve pertinent information, simplifying the information access process, mitigating the burden of comprehending information organization and software application models, and ultimately enhancing retrieval efficiency.
9

Pawar, Shaila. "BRIEFLY Video Transcript Summarizer." INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT 08, no. 04 (2024): 1–13. http://dx.doi.org/10.55041/ijsrem31136.

Abstract:
In an age where video content proliferates across digital platforms, the need for efficient video transcript summarization tools has become increasingly crucial. This abstract introduces a novel Video Transcript Summarizer designed to automatically generate concise and informative summaries from lengthy video transcripts. By utilizing techniques such as text summarization, keyword extraction, and sentiment analysis, the summarizer seeks to capture the essence of the video while preserving its context and relevance. The abstract concludes by highlighting the potential applications of such a tool in various domains, including education, journalism, content creation, and information retrieval, underscoring its significance in facilitating accessibility and enhancing user engagement with video content.
Keywords: Video Transcript Summarization, Natural Language Processing, Machine Learning, Text Summarization, Keyword Extraction, Information Retrieval, User Engagement, Accessibility, Information Extraction, Context Preservation.
10

Yang, Zhen, Haiyang Yu, Jiliang Tang, and Huan Liu. "Toward Keyword Extraction in Constrained Information Retrieval in Vehicle Social Network." IEEE Transactions on Vehicular Technology 68, no. 5 (2019): 4285–94. http://dx.doi.org/10.1109/tvt.2019.2906799.

11

LI, ZHANJUN, and KARTHIK RAMANI. "Ontology-based design information extraction and retrieval." Artificial Intelligence for Engineering Design, Analysis and Manufacturing 21, no. 2 (2007): 137–54. http://dx.doi.org/10.1017/s0890060407070199.

Abstract:
Because of the increasing complexity of products and the design process, as well as the popularity of computer-aided documentation tools, the number of electronic and textual design documents being generated has exploded. The availability of such extensive document resources has created new challenges and opportunities for research. These include improving design information retrieval to achieve a more coherent environment for design exploration, learning, and reuse. One critical issue is related to the construction of a structured representation for indexing design documents that record engineers' ideas and reasoning processes for a specific design. This representation should explicitly and accurately capture the important design concepts as well as the relationships between these concepts so that engineers can locate their documents of interest with less effort. For design information retrieval, we propose to use shallow natural language processing and domain-specific design ontology to automatically construct a structured and semantics-based representation from unstructured design documents. The design concepts and relationships of the representation are recognized from the document based on the identified linguistic patterns. The recognized concepts and relationships are joined to form a concept graph. The integration of these concept graphs builds an application-specific design ontology, which can be seen as the structured representation of the content of the corporate document repository, as well as an automatically populated knowledge base from previous designs. To improve the performance of design information retrieval, we have developed ontology-based query processing, where users' requests are interpreted based on their domain-specific meanings. Our approach contrasts with the traditionally used keyword-based search. An experiment to test the retrieval performance is conducted by using the design documents from a product design scenario. 
The results demonstrate that our method outperforms the keyword-based search techniques. This research contributes to the development and use of engineering ontology for design information retrieval.
12

Firoozeh, Nazanin, Adeline Nazarenko, Fabrice Alizon, and Béatrice Daille. "Keyword extraction: Issues and methods." Natural Language Engineering 26, no. 3 (2019): 259–91. http://dx.doi.org/10.1017/s1351324919000457.

Abstract:
Due to the considerable growth of the volume of text documents on the Internet and in digital libraries, manual analysis of these documents is no longer feasible. Having efficient approaches to keyword extraction in order to retrieve the 'key' elements of the studied documents is now a necessity. Keyword extraction has been an active research field for many years, covering various applications in Text Mining, Information Retrieval, and Natural Language Processing, and meeting different requirements. However, it is not a unified domain of research. In spite of the existence of many approaches in the field, there is no single approach that effectively extracts keywords from different data sources. This shows the importance of having a comprehensive review, which discusses the complexity of the task and categorizes the main approaches of the field based on the features and methods of extraction that they use. This paper presents a general introduction to the field of keyword/keyphrase extraction. Unlike the existing surveys, different aspects of the problem along with the main challenges in the field are discussed. This mainly includes the unclear definition of 'keyness', complexities of targeting proper features for capturing desired keyness properties and selecting efficient extraction methods, and also the evaluation issues. By classifying a broad range of state-of-the-art approaches and analysing the benefits and drawbacks of different features and methods, we provide a clearer picture of them. This review is intended to help readers find their way around all the works related to keyword extraction and guide them in choosing or designing a method that is appropriate for the application they are targeting.
13

Bisht, Raj Kishor. "A Comparative Evaluation of Different Keyword Extraction Techniques." International Journal of Information Retrieval Research 12, no. 1 (2022): 1–17. http://dx.doi.org/10.4018/ijirr.289573.

Abstract:
Retrieving keywords from a text has attracted researchers for a long time, as it forms a base for many natural language applications such as information retrieval, text summarization, and document categorization. A text is a collection of words that naturally represent its theme, and bringing this naturalness under certain rules is itself a challenging task. In the present paper, the authors evaluate different spatial-distribution-based keyword extraction methods available in the literature on three standard scientific texts. The authors choose the first few high-frequency words for evaluation to reduce the complexity, as all the methods are in some way based on frequency. They find that the methods do not provide good results, particularly for the first few retrieved words. Thus, the authors propose a new measure based on frequency, inverse document frequency, variance, and Tsallis entropy. The different methods are evaluated on the basis of precision, recall, and F-measure, and the results show that the proposed method provides improved results.
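The precision, recall, and F-measure used for the evaluation reduce to simple set arithmetic over extracted versus gold-standard keywords; a minimal sketch with hypothetical keyword sets:

```python
def prf(extracted, gold):
    """Precision, recall, and F-measure of an extracted keyword set
    against a gold-standard set, as used to compare extractors."""
    extracted, gold = set(extracted), set(gold)
    tp = len(extracted & gold)  # true positives: correctly extracted keywords
    precision = tp / len(extracted) if extracted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f

# hypothetical extractor output vs. hypothetical gold standard
p, r, f = prf(["entropy", "frequency", "variance", "noise"],
              ["entropy", "frequency", "tfidf"])
print(round(p, 2), round(r, 2), round(f, 2))  # 0.5 0.67 0.57
```

Precision punishes spurious extractions, recall punishes missed gold keywords, and the F-measure is their harmonic mean, so an extractor must balance both to score well.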
14

Wang, Su, Ming Ya Wang, Jun Zheng, and Kai Zheng. "A Hybrid Keyword Extraction Method Based on TF and Semantic Strategies for Chinese Document." Applied Mechanics and Materials 635-637 (September 2014): 1476–79. http://dx.doi.org/10.4028/www.scientific.net/amm.635-637.1476.

Abstract:
Keyword extraction is important for information retrieval. This paper presents a hybrid keyword extraction method based on TF and semantic strategies for Chinese documents. A new-word-finding method is proposed to identify words that do not exist in the dictionary. Moreover, semantic strategies are introduced to filter dependent words and remove synonyms. Experimental results show that the proposed method can improve the accuracy and performance of keyword extraction.
15

YUE, KUN, and WEI-YI LIU. "SEMANTIC FIELD: A THEORETICAL PERSPECTIVE OF MODELING INFORMATION RETRIEVAL." International Journal on Artificial Intelligence Tools 18, no. 06 (2009): 825–51. http://dx.doi.org/10.1142/s0218213009000421.

Abstract:
Information retrieval has received much attention and is widely studied and applied in real-world paradigms. For the various aspects of information retrieval, many approaches have been proposed from many perspectives, so it is necessary to provide a formally unified and physically interpretable model for classical problems in information retrieval (e.g., document classification, authority-page selection, and keyword extraction). In this paper we propose a theoretical model, called the semantic field, inspired by the theories of lexical semantics and the electrostatic field. Based on this physical model, information retrieval can be viewed from a theoretical perspective and interpreted through people's physical intuitions and natural heuristics. Centered on the concept of the semantic field, we give some relevant properties, including semantic affinity, semantic coacervation degree, and radiation of a semantic source. As a representative application of the proposed semantic field model, a novel method for automatic keyword extraction is discussed, and its feasibility is verified by corresponding experiments.
16

Shinohara, Shuji, Yurie Iribe, Hiroaki Kawashima, Kouichi Katsurada, and Tsuneo Nitta. "A Method for Keyword Extraction Using Retrieval Information from Students in Lectures." Transactions of the Japanese Society for Artificial Intelligence 22 (2007): 604–11. http://dx.doi.org/10.1527/tjsai.22.604.

17

Khan, Osama A., Shaukat Wasi, Muhammad Shoaib Siddiqui, and Asim Karim. "Keyword Extraction for Medium-Sized Documents Using Corpus-Based Contextual Semantic Smoothing." Complexity 2022 (September 29, 2022): 1–8. http://dx.doi.org/10.1155/2022/7015764.

Abstract:
Keyword extraction refers to the process of selecting the most significant, relevant, and descriptive terms present inside a single document as keywords. Keyword extraction has major applications in the information retrieval domain, such as the analysis, summarization, indexing, and search of documents. In this paper, we present a novel supervised technique for extracting keywords from medium-sized documents, namely Corpus-based Contextual Semantic Smoothing (CCSS). CCSS extends the concept of Contextual Semantic Smoothing (CSS), which considers term usage patterns in similar texts to improve term relevance information. We introduce four more features beyond CSS as our novel contributions in this work. We systematically compare the performance of CCSS with other techniques on the INSPEC dataset, where CCSS outperforms all state-of-the-art keyphrase extraction techniques presented in the literature.
18

Shen, Yuqi, Cheng Chen, Yifan Dai, Jinfang Cai, and Liangyu Chen. "A Hybrid Model Combining Formulae with Keywords for Mathematical Information Retrieval." International Journal of Software Engineering and Knowledge Engineering 31, no. 11n12 (2021): 1583–602. http://dx.doi.org/10.1142/s0218194021400131.

Abstract:
Formula retrieval is an important research topic in Mathematical Information Retrieval (MIR). Most studies have focused on formula comparison to determine the similarity between mathematical documents. However, two similar formulae may appear in entirely different knowledge domains and have different meanings. Based on the N-ary Tree-based Formula Embedding Model (NTFEM, our previous work in [Y. Dai, L. Chen, and Z. Zhang, An N-ary tree-based model for similarity evaluation on mathematical formulae, in Proc. 2020 IEEE Int. Conf. Systems, Man, and Cybernetics, 2020, pp. 2578–2584]), we introduce a new hybrid retrieval model, NTFEM-K, which combines formulae with their surrounding keywords for more accurate retrieval. Using keyword extraction technology, we extract keywords from the context, which supplement the semantic information of the formula. We then obtain vector representations of the keywords with the FastText N-gram embedding model and vector representations of the formulae with NTFEM. Finally, documents are sorted according to keyword similarity, and the ranking results are optimized by formula similarity. For performance evaluation, NTFEM-K is compared not only with NTFEM but also with hybrid models that combine formulae with long text and with hybrid models that combine formulae with keywords obtained by other keyword extraction algorithms. Experimental results show that the accuracy of the top-10 results of NTFEM-K is at least 20% higher than that of NTFEM, and up to 50% higher on some specific topics.
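The final ranking step described here (order documents by keyword-embedding similarity, then refine with formula similarity) can be approximated by a weighted combination of cosine similarities; the document names, vectors, and equal weighting below are illustrative assumptions, not NTFEM-K's actual scheme:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rerank(docs, query_kw_vec, query_formula_vec, alpha=0.5):
    """Score documents by keyword-embedding similarity, refined by
    formula-embedding similarity (illustrative weighting)."""
    scored = []
    for name, kw_vec, formula_vec in docs:
        score = ((1 - alpha) * cosine(kw_vec, query_kw_vec)
                 + alpha * cosine(formula_vec, query_formula_vec))
        scored.append((score, name))
    return [name for _, name in sorted(scored, reverse=True)]

# (document, keyword-context vector, formula vector) - all hypothetical
docs = [
    ("doc_integral", [0.9, 0.1], [0.2, 0.9]),
    ("doc_matrix",   [0.8, 0.3], [0.9, 0.1]),
]
print(rerank(docs, query_kw_vec=[1.0, 0.0], query_formula_vec=[1.0, 0.0]))
```

The formula term breaks ties between documents whose surrounding keywords look alike, which is the disambiguation this paper is after: similar formulae in different knowledge domains get separated by their context vectors, and vice versa.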
19

Sharma, Hemendra Shanker. "Query Expansion Information Retrieval using Customized Ontology Technique." Journal of Electrical Systems 20, no. 9s (2024): 2426–40. http://dx.doi.org/10.52783/jes.4904.

Abstract:
Information from online archives is now much more widely used and accessible than before. As a result, searching becomes more challenging and time-consuming. This vast data utilization is the focus of a significant area of research: information retrieval (IR) systems. The goal is to reduce retrieval time while also maintaining and improving answer relevancy. To address these issues, an IR model is provided. The method handles indexing, similarity keyword extraction, semantic similarity, and the updating of historical data. The effectiveness of the suggested strategy is compared with that of current methods across several performance parameters. A novel similarity estimation approach is used to group the texts and determine how similar works are based on the obtained score; it is compared with existing methods on the scales of accuracy, mean absolute error, precision, recall, sensitivity, and specificity, showing the usefulness of the suggested model. One of the main goals of this effort is to improve personalization performance through updates based on historical knowledge. The Impact Score Estimation technique is used to improve data extraction using semantic keyword extraction and indexing. To cluster documents based on computed scores, the algorithm evaluates similarity estimates, which can improve searching by speeding up information retrieval and processing. Decision tree classifiers give their best result, 0.93, for class 3, while the micro-average ROC curve yields an accuracy of 0.87.
20

Wang, Juan, and Qun Ding. "Dynamic Rounds Chaotic Block Cipher Based on Keyword Abstract Extraction." Entropy 20, no. 9 (2018): 693. http://dx.doi.org/10.3390/e20090693.

Abstract:
Based on the keyword abstract extraction function of the Natural Language Processing and Information Retrieval Sharing Platform (NLPIR), this paper presents the design of a dynamic-rounds chaotic block cipher that takes both security and efficiency into account. The cipher combines chaos theory with a Feistel-structure block cipher, and uses the randomness of a chaotic sequence and the nonlinearity of a chaotic S-box to dynamically generate the number of encryption rounds: more rounds for the important information marked by NLPIR, and fewer rounds for the unmarked, non-important information. Through linear and differential cryptanalysis, ciphertext information entropy, '0-1' balance, and National Institute of Standards and Technology (NIST) tests, and through comparison with other traditional and lightweight block ciphers, the results indicate that dynamically varying the number of rounds achieves different levels of encryption for different information, enhancing resistance to attack while reducing the number of encryption rounds. The dynamic-rounds chaotic block cipher can therefore guarantee the security of information transmission while keeping the cryptographic algorithm lightweight.
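The dynamic-rounds idea (more Feistel rounds for information marked important by NLPIR, fewer for the rest) can be illustrated with a toy balanced Feistel network; the round function, key values, and round counts are purely illustrative and bear no relation to the cipher's real chaotic components:

```python
def feistel(block, keys):
    """Toy balanced Feistel network over a pair of 8-bit halves.
    One round: (L, R) -> (R, L xor F(R, k))."""
    left, right = block
    for k in keys:
        left, right = right, left ^ ((right * 31 + k) & 0xFF)  # toy round function
    return left, right

def feistel_decrypt(block, keys):
    """Invert the toy Feistel network by running rounds in reverse."""
    left, right = block
    for k in reversed(keys):
        left, right = right ^ ((left * 31 + k) & 0xFF), left
    return left, right

def encrypt(block, keys, important):
    """Use more rounds for blocks flagged important, fewer otherwise."""
    rounds = 8 if important else 4
    return feistel(block, keys[:rounds])

keys = [0x3A, 0x7C, 0x15, 0xF0, 0x9B, 0x42, 0xD1, 0x68]  # hypothetical schedule
ct = encrypt((0x12, 0x34), keys, important=True)
assert feistel_decrypt(ct, keys[:8]) == (0x12, 0x34)  # round-trips exactly
```

The Feistel structure decrypts with the same round function run in reverse key order, which is what lets the round count vary per block without needing a separate decryption algorithm.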
21

Gupta, Vaibhav. "Keyword-Based Exploration of Library Resources." INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT 09, no. 01 (2025): 1–9. https://doi.org/10.55041/ijsrem40835.

Abstract:
The project "Keyword-Based Exploration of Library Resources" addresses the challenges associated with accessing and discovering academic resources efficiently. Traditional systems often suffer from limitations such as inadequate multilingual support, poor metadata utilization, and restricted filtering capabilities, which hinder users from locating relevant research materials effectively. This project proposes an innovative solution leveraging Artificial Intelligence (AI) and Natural Language Processing (NLP) techniques to enhance search capabilities and inclusivity. The system incorporates:
• Multilingual Search: enabling users to perform queries in various languages using translation APIs.
• Advanced Filtering Options: allowing searches to be refined by author, publication year, journal, and more.
• AI-Powered Metadata Extraction: utilizing Optical Character Recognition (OCR) and NLP to extract and catalogue metadata such as keywords, authors, and publication years.
The proposed system is built on a Python backend using Flask for API integration and MyAWS CLOUD for secure data storage. By integrating robust search mechanisms and user-friendly design, the project contributes to Sustainable Development Goal 4 (Quality Education), fostering global accessibility to knowledge and academic research. The outcomes of this project are anticipated to significantly improve resource discoverability, inclusivity, and precision, addressing the needs of diverse academic communities.
Index terms: Keyword Search, Library Resource Management, Information Retrieval, Digital Libraries, Metadata Extraction, Search Optimization, Natural Language Processing (NLP), Database Searching, Search Algorithms, Document Retrieval Systems, Academic Research Tools.
22

Liu, Feng, Xiaodi Huang, Weidong Huang, and Sophia Xiaoxia Duan. "Performance Evaluation of Keyword Extraction Methods and Visualization for Student Online Comments." Symmetry 12, no. 11 (2020): 1923. http://dx.doi.org/10.3390/sym12111923.

Abstract:
Topic keyword extraction, a typical task in information retrieval, refers to extracting the core keywords of document topics. In an online environment, students often post comments in subject forums, and the automatic and accurate extraction of keywords from these comments is beneficial to lecturers (particularly for repeatedly delivered subjects). In this paper, we compare the performance of traditional machine learning algorithms and two deep learning methods in extracting topic keywords from student comments posted in subject forums. For this purpose, we collected student comment data over a period of two years, manually tagging part of the raw data for our experiments. Based on this dataset, we comprehensively compared five typical algorithms: naïve Bayes, logistic regression, support vector machine, convolutional neural networks, and Long Short-Term Memory with Attention (Att-LSTM). Performance was measured by four evaluation metrics, and we further examined the keywords by visualization. From the results of our experiment and visualization, we conclude that the Att-LSTM method is the best approach for topic keyword extraction from student comments. Further, the results from the algorithms and the visualization show symmetry to some degree; in particular, the topics extracted from comments posted at the same stages of different teaching sessions are almost reflection-symmetric.
23

Amur, Zaira Hassan, Yew Kwang Hooi, Gul Muhammad Soomro, Hina Bhanbhro, Said Karyem, and Najamudin Sohu. "Unlocking the Potential of Keyword Extraction: The Need for Access to High-Quality Datasets." Applied Sciences 13, no. 12 (2023): 7228. http://dx.doi.org/10.3390/app13127228.

Abstract:
Keyword extraction is a critical task that enables various applications, including text classification, sentiment analysis, and information retrieval. However, the lack of a suitable dataset for semantic analysis of keyword extraction remains a serious problem that hinders progress in this field. Although some datasets exist for this task, they may not be representative, diverse, or of high quality, leading to suboptimal performance, inaccurate results, and reduced efficiency. To address this issue, we conducted a study to identify a suitable dataset for keyword extraction based on three key factors: dataset structure, complexity, and quality. The structure of a dataset should contain real-time data that is easily accessible and readable. The complexity should also reflect the diversity of sentences and their distribution in real-world scenarios. Finally, the quality of the dataset is a crucial factor in selecting a suitable dataset for keyword extraction. The quality depends on its accuracy, consistency, and completeness. The dataset should be annotated with high-quality labels that accurately reflect the keywords in the text. It should also be complete, with enough examples to accurately evaluate the performance of keyword extraction algorithms. Consistency in annotations is also essential, ensuring that the dataset is reliable and useful for further research.
APA, Harvard, Vancouver, ISO, and other styles
24

Hou, Yong. "Mathematical formula information retrieval system." Journal of Computational Methods in Sciences and Engineering 23, no. 6 (2023): 2949–73. http://dx.doi.org/10.3233/jcm-226961.

Full text
Abstract:
This paper presents the design and implementation of MFIRS, a system for retrieving information about mathematical formulas. The system is organized into modules for input normalization, mathematical formula unification, formula encoding, text feature extraction, formula feature extraction, formula indexing, retrieval, and ranking. A method for extracting mathematical formulas and keywords based on FastText word embedding technology is proposed. This method captures the structural features of a formula, and its vector output also makes formula similarity straightforward to compute. At the same time, the model incorporates the context-rich semantic features of mathematical formulas to improve the domain relevance of search results. The MathRetEval dataset was created from about 7.9 × 10^5 arXiv documents and about 1.5 × 10^8 mathematical formulas, and the scalability of the system is verified on this dataset. Formulas can be written in TeX or MathML; a TeX query is converted to a tree form of the MathML representation and then indexed. MFIRS is thus a math-aware information retrieval system for mathematical formulas that supports similarity search over partial formulas.
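The similarity step at the core of such embedding-based formula retrieval can be sketched minimally as cosine similarity over precomputed vectors. This is an illustration, not the paper's implementation: the `rank_formulas` helper and the toy vectors are assumptions, and real vectors would come from a trained FastText model.

```python
from math import sqrt

def cosine(u, v):
    # Cosine similarity between two dense embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_formulas(query_vec, formula_vecs):
    # Rank candidate formulas by embedding similarity to the query vector.
    scored = [(fid, cosine(query_vec, vec)) for fid, vec in formula_vecs.items()]
    return sorted(scored, key=lambda p: p[1], reverse=True)
```

A query vector is compared against every indexed formula vector, and the highest-scoring formula identifiers are returned first.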
APA, Harvard, Vancouver, ISO, and other styles
25

Suhas D. Pachpande and Parag U. Bhalchandra. "MarQO: A query optimizer in multilingual environment for information retrieval in Marathi language." International Journal of Science and Research Archive 9, no. 2 (2023): 986–96. http://dx.doi.org/10.30574/ijsra.2023.9.2.0712.

Full text
Abstract:
Information retrieval is a crucial component of modern information systems. A significant portion of the vast amount of information stored worldwide is in local languages. While most information retrieval systems are designed primarily for English, there is a growing need for these systems to work with data in languages other than English. Cross Language Information Retrieval (CLIR) systems play a pivotal role in enabling information retrieval across multiple languages. However, these systems often face challenges due to ambiguities in query translation, impacting retrieval accuracy. This paper introduces "MarQO," a query optimizer designed to address these challenges in the context of Marathi language. MarQO employs a multi-stage approach, including lexical processing, extraction of multi-word terms, synonym addition, phrasal translations, utilization of word co-occurrence statistics, and more. By disambiguating query keyword translations, MarQO significantly improves the accuracy of translations, thereby leading to more relevant document retrieval results.
APA, Harvard, Vancouver, ISO, and other styles
26

Shruti, Bhavsar, Khairnar Sanjana, Nagarkar Pauravi, Raina Sonali, and Dumbare Amol. "On Time Document Retrieval using Speech Conversation and Diverse Keyword Clustering During Presentations." International Journal of Recent Technology and Engineering (IJRTE) 9, no. 3 (2020): 529–32. https://doi.org/10.35940/ijrte.C4544.099320.

Full text
Abstract:
In this paper we present the idea of extracting keywords from conversations, with the aim of using these words to retrieve, for each short conversation fragment, documents to recommend to the participants. However, even a short fragment contains a mixture of words that can relate to several topics, and using an automatic speech recognition (ASR) system introduces errors among them. It is therefore hard to accurately summarize the information needs of the conversation participants. We first propose an algorithm to extract salient words from the output of an ASR system; it makes use of topic modeling techniques and of a submodular reward function that favors diversity in the word set, in order to match the potential diversity of topics and reduce ASR noise. We then put forward a method to derive several topically separated queries from this keyword set, with the goal of maximizing the chance of making at least one relevant recommendation when using these queries to search the English Wikipedia. The results show that our method improves over previous approaches that consider only word frequency or topic similarity, and represents a sound basis for a document recommender system to be used in conversations.
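The diversity-favoring selection described in this abstract can be sketched as greedy maximization of a submodular reward. This is a hedged illustration only: the word-to-topic weights and the square-root reward form are assumptions standing in for the paper's actual topic model.

```python
from math import sqrt

def select_diverse_keywords(word_topics, k):
    """Greedy selection under a submodular reward: sum over topics of
    sqrt(accumulated topic weight). The square root gives diminishing
    returns per topic, so later picks favor not-yet-covered topics.
    word_topics: {word: {topic: weight}} (illustrative structure)."""
    covered = {}   # topic -> accumulated weight over chosen words
    chosen = []

    def reward(extra):
        totals = dict(covered)
        for t, w in extra.items():
            totals[t] = totals.get(t, 0.0) + w
        return sum(sqrt(v) for v in totals.values())

    base = reward({})
    for _ in range(min(k, len(word_topics))):
        best, best_gain = None, 0.0
        for w in set(word_topics) - set(chosen):
            gain = reward(word_topics[w]) - base
            if gain > best_gain:
                best, best_gain = w, gain
        if best is None:
            break
        chosen.append(best)
        for t, wt in word_topics[best].items():
            covered[t] = covered.get(t, 0.0) + wt
        base = reward({})
    return chosen
```

After a sports word is picked, a second sports word gains little marginal reward, so a word from a different topic wins the next round.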
APA, Harvard, Vancouver, ISO, and other styles
27

Shaikh, Zaffar Ahmed. "Keyword Detection Techniques: A Comprehensive Study." Engineering, Technology & Applied Science Research 8, no. 1 (2018): 2590–94. https://doi.org/10.5281/zenodo.1207421.

Full text
Abstract:
Automatic identification of influential segments from a large amount of data is an important part of topic detection and tracking (TDT). This can be done through keyword identification via collocation techniques, word co-occurrence networks, topic modeling, and other machine learning techniques. This paper reviews existing traditional keyword extraction techniques and analyzes them to draw useful insights and to give directions for future research toward automatic, unsupervised, and language-independent approaches. The paper surveys extant literature on traditional TDT approaches for automatically identifying influential segments in the keyword detection task. The keyword detection techniques currently used by researchers are discussed, inferences are drawn about their advantages and disadvantages relative to previous studies, and the analysis results are provided in tabular form. Although keyword detection has been widely explored, there is still large scope and need for identifying topics from uncertain user-generated data.
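One family of techniques surveyed here, keyword detection via word co-occurrence networks, can be illustrated minimally: link words that share a sentence and rank them by weighted degree centrality. The toy tokenization (whitespace split, tiny stopword set) is an assumption for illustration.

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence_keywords(sentences, top_n=3, stopwords=frozenset()):
    # Build a word co-occurrence network (an edge between every pair of
    # words sharing a sentence) and rank words by weighted degree.
    degree = defaultdict(int)
    for sent in sentences:
        words = [w.lower() for w in sent.split() if w.lower() not in stopwords]
        for a, b in combinations(sorted(set(words)), 2):
            degree[a] += 1
            degree[b] += 1
    # Break degree ties alphabetically for deterministic output.
    ranked = sorted(degree, key=lambda w: (-degree[w], w))
    return ranked[:top_n]
```

Words that co-occur with many distinct words across sentences accumulate degree and surface as keywords.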
APA, Harvard, Vancouver, ISO, and other styles
28

Guo, Xiaoping. "Intelligent Sports Video Classification Based on Deep Neural Network (DNN) Algorithm and Transfer Learning." Computational Intelligence and Neuroscience 2021 (November 24, 2021): 1–9. http://dx.doi.org/10.1155/2021/1825273.

Full text
Abstract:
Traditional text-annotation-based video retrieval is done by manually labeling videos with text, which is inefficient, highly subjective, and generally cannot accurately describe the meaning of a video. Traditional content-based video retrieval uses convolutional neural networks to extract the underlying feature information of images to build indexes, and achieves similarity retrieval of video feature vectors according to certain similarity measures. In this paper, by studying the characteristics of sports videos, we propose a histogram difference method based on transfer learning for abrupt shot-cut detection and a block-matching four-step method for fade detection. Through adaptive thresholding, regions with large frame-difference changes are marked as candidate shot regions, and shot boundaries are then determined by the cut-detection algorithm. Combined with the characteristics of sports video, this paper proposes a key frame extraction method based on clustering and optical flow analysis and compares it experimentally with the traditional clustering method. The algorithm effectively removes redundant frames, and the extracted key frames are more representative.
Extensive experiments show that the proposed keyword fuzzy-finding algorithm, based on an improved deep neural network and ontology-based semantic expansion, achieves desirable retrieval performance, and the method is feasible for extracting, annotating, and searching low-level video features. A notable strength of the algorithm is that it can quickly and effectively retrieve the desired video from a large number of Internet video resources, reducing the false detection and miss rates while improving fidelity, which basically meets everyday needs.
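The histogram-difference cut detection with adaptive thresholding mentioned in this abstract can be sketched as follows. This is an illustrative simplification, not the paper's method: frames are flat grayscale pixel lists, and the threshold (mean plus a multiple of the standard deviation of the differences) is an assumed adaptive rule.

```python
def gray_histogram(frame, bins=8, max_val=256):
    # frame: flat list of grayscale pixel values in [0, max_val).
    hist = [0] * bins
    step = max_val // bins
    for p in frame:
        hist[min(p // step, bins - 1)] += 1
    return hist

def detect_cuts(frames, alpha=2.0):
    """Flag abrupt shot changes where the histogram difference between
    consecutive frames exceeds an adaptive threshold (mean + alpha*std)."""
    diffs = []
    for a, b in zip(frames, frames[1:]):
        ha, hb = gray_histogram(a), gray_histogram(b)
        diffs.append(sum(abs(x - y) for x, y in zip(ha, hb)))
    if not diffs:
        return []
    mean = sum(diffs) / len(diffs)
    var = sum((d - mean) ** 2 for d in diffs) / len(diffs)
    thresh = mean + alpha * var ** 0.5
    # Report the index of the first frame after each detected cut.
    return [i + 1 for i, d in enumerate(diffs) if d > thresh]
```

A sequence of dark frames followed by bright frames produces one large histogram difference, which the adaptive threshold isolates as a cut.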
APA, Harvard, Vancouver, ISO, and other styles
29

Sheng, Jianqiang, Fei Wang, Baoquan Zhao, Junkun Jiang, Yu Yang, and Tie Cai. "Sketch-Based Image Retrieval Using Novel Edge Detector and Feature Descriptor." Wireless Communications and Mobile Computing 2022 (February 1, 2022): 1–12. http://dx.doi.org/10.1155/2022/4554911.

Full text
Abstract:
With the explosive increase of digital images, intelligent information retrieval systems have become an indispensable tool to facilitate users’ information seeking process. Although various kinds of techniques like keyword-/content-based methods have been extensively investigated, how to effectively retrieve relevant images from a large-scale database remains a very challenging task. Recently, with the wide availability of touch screen devices and their associated human-computer interaction technology, sketch-based image retrieval (SBIR) methods have attracted more and more attention. In contrast to keyword-based methods, SBIR allows users to flexibly manifest their information needs into sketches by drawing abstract outlines of an object/scene. Despite its ease and intuitiveness, it is still a nontrivial task to accurately extract and interpret the semantic information from sketches, largely because of the diverse drawing styles of different users. As a consequence, the performance of existing SBIR systems is still far from being satisfactory. In this paper, we introduce a novel sketch image edge feature extraction algorithm to tackle the challenges. Firstly, we propose a Gaussian blur-based multiscale edge extraction (GBME) algorithm to capture more comprehensive and detailed features by continuously superimposing the edge filtering results after Gaussian blur processing. Secondly, we devise a hybrid barycentric feature descriptor (RSB-HOG) that extracts HOG features by randomly sampling points on the edges of a sketch. In addition, we integrate the directional distribution of the barycenters of all sampling points into the feature descriptor and thus improve its representational capability in capturing the semantic information of contours. To examine the efficiency of our method, we carry out extensive experiments on the public Flickr15K dataset. 
The experimental results indicate that the proposed method is superior to existing peer SBIR systems in terms of retrieval accuracy.
APA, Harvard, Vancouver, ISO, and other styles
30

SIDDIQUI, TANVEER J., and UMA SHANKER TIWARY. "UTILIZING LOCAL CONTEXT FOR EFFECTIVE INFORMATION RETRIEVAL." International Journal of Information Technology & Decision Making 07, no. 01 (2008): 5–21. http://dx.doi.org/10.1142/s0219622008002788.

Full text
Abstract:
Our research focuses on the use of local context through relation matching to improve retrieval effectiveness. An information retrieval (IR) model that integrates relation and keyword matching has been used in this work. The model takes advantage of any existing relational similarity between documents and query to improve retrieval effectiveness. It gives high rank to a document in which the query concepts are involved in similar relationships as in the query, as compared to those in which they are related differently. A conceptual graph (CG) representation has been used to capture relationship between concepts. A simplified form of graph matching has been used to keep our model computationally tractable. Structural variations have been captured during matching through simple heuristics. Four different CG similarity measures have been proposed and used to evaluate performance of our model. We observed a maximum improvement of 7.37% in precision with the second CG similarity measure. The document collection used in this study is CACM-3204. CG similarity measure proposed by us is simple, flexible and scalable and can find application in many IR related tasks like information filtering, information extraction, question answering, document summarization, etc.
APA, Harvard, Vancouver, ISO, and other styles
31

Mohebi, Azadeh, Mehri Sedighi, and Zahra Zargaran. "Subject-based retrieval of scientific documents, case study: Retrieval of Information Technology scientific articles." Library Review 66, no. 6 and 7 (2017): 549–69. https://doi.org/10.5281/zenodo.14000976.

Full text
Abstract:
Purpose – The purpose of this paper is to introduce an approach for retrieving a set of scientific articles in the field of Information Technology (IT) from a scientific database such as Web of Science (WoS), in order to apply scientometric indices and compare them with other fields. Design/methodology/approach – The authors propose a statistical classification-based approach for extracting IT-related articles. In this approach, first, a probabilistic model of the subject IT is built using keyphrase extraction techniques. Then, IT-related articles are retrieved from all Iranian papers in WoS based on a Bayesian classification scheme: using the probabilistic IT model, each article in the database is assigned an IT membership probability, and the articles with the highest probabilities are retrieved. Findings – The authors extracted a set of 1,497 IT keyphrases through the keyphrase extraction process for the probabilistic model. They evaluated the proposed retrieval approach against two alternatives: a query-based approach, in which articles are retrieved from WoS using queries composed of a limited set of IT keywords, and a research-area-based approach, which retrieves articles using WoS categorizations and research areas. The evaluation and comparison results show that the proposed approach generates more accurate results while retrieving more IT-related articles. Research limitations/implications – Although this research is limited to the IT subject, it can be generalized to any subject. However, for multidisciplinary topics such as IT, special attention should be given to the keyphrase extraction phase. This research uses a bigram model; it could be extended to trigrams as well. Originality/value – This paper introduces an integrated approach for retrieving IT-related documents from a collection of scientific documents. The approach has two main phases:
building a model to represent the topic IT, and retrieving documents based on that model. The model is based on a set of keyphrases extracted from a collection of IT articles; notably, the extraction technique does not rely on Term Frequency–Inverse Document Frequency, since almost all of the articles in the collection share a set of the same keyphrases. In addition, a probabilistic membership score is defined to retrieve the IT articles from a collection of scientific articles.
APA, Harvard, Vancouver, ISO, and other styles
32

Ma, Jingxia. "Research on Keyword Extraction Algorithm in English Text Based on Cluster Analysis." Computational Intelligence and Neuroscience 2022 (March 28, 2022): 1–8. http://dx.doi.org/10.1155/2022/4293102.

Full text
Abstract:
Helping users quickly and accurately find the text information they need is a current research hotspot. Text clustering can improve the efficiency of information search and is an effective text retrieval method. Keyword extraction and cluster center selection are key issues in text clustering research. Common keyword extraction algorithms fall into three categories: semantic-based, machine learning-based, and statistical model-based algorithms. There are three common methods for selecting cluster centers: randomly selecting the initial cluster center points, manually specifying the cluster center points, and selecting the cluster center points according to the similarity between the points to be clustered. Randomly selected initial cluster centers may contain outliers, making the clustering results only locally optimal. Manually specifying cluster centers is highly subjective, because each person's understanding of the text set differs, and it is unsuitable for large text sets. Selecting cluster centers according to the similarity between the points to be clustered distributes the selected centers across the classes and places them as close as possible to the class centers, but computing them takes a long time. Aiming at this problem, this paper proposes a keyword extraction algorithm based on cluster analysis. The results show that the algorithm does not rely on background knowledge bases, dictionaries, etc., instead obtaining statistical parameters and building models through training. Experiments show that the keyword extraction algorithm has high accuracy and can quickly extract the subject content of an English translation.
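The similarity-based center selection the abstract favors can be sketched as follows: pick the document most similar to the whole collection first, then repeatedly add the document least similar to the centers chosen so far, so centers spread across the classes. This is an illustrative sketch with toy term-frequency dicts, not the paper's exact procedure.

```python
def cosine(u, v):
    # Cosine similarity between sparse term-frequency dicts.
    dot = sum(u[t] * v.get(t, 0) for t in u)
    nu = sum(x * x for x in u.values()) ** 0.5
    nv = sum(x * x for x in v.values()) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

def pick_centers(docs, k):
    """Choose k initial cluster centers by similarity: start from the
    doc most similar to all others, then repeatedly add the doc least
    similar to the chosen centers, spreading centers over the classes."""
    ids = list(docs)
    first = max(ids, key=lambda i: sum(cosine(docs[i], docs[j]) for j in ids))
    centers = [first]
    while len(centers) < k:
        nxt = min((i for i in ids if i not in centers),
                  key=lambda i: max(cosine(docs[i], docs[c]) for c in centers))
        centers.append(nxt)
    return centers
```

Two documents about pets and one about finance yield one center from each group rather than two near-duplicates.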
APA, Harvard, Vancouver, ISO, and other styles
33

Mehrabi, Elahe, Azadeh Mohebi, and Abbas Ahmadi. "Improving the RAKE Algorithm for Keyword Extraction from Persian Scientific Texts: A Case Study of Persian Theses and Dissertations." Journal of Information Processing and Management 37, no. 1 (2021): 197–228. https://doi.org/10.5281/zenodo.14035093.

Full text
Abstract:
Keywords are subsets of words or phrases within a document that can describe the meaning of the document and play a crucial role in the information retrieval process. Since the extraction of keywords or key phrases from specialized and scientific texts is a specialized and time-consuming task, and the volume of scientific documents requiring keywords is increasing, various algorithms have been designed and implemented for the specialized and automatic extraction of keywords and key phrases from documents.  RAKE (Rapid Automatic Keyword Extraction) is a widely used algorithm for extracting keywords from texts. The RAKE algorithm primarily focuses on keywords that typically consist of multiple words (i.e., key phrases) but do not include punctuation marks, meaningless words, or stop words. In this algorithm, part-of-speech tagging is used as a tool to determine the importance of words in sentences. Keywords are scored based on specific criteria, resulting in a set of multi-word or single-word sequences. In this research, an improved version of the RAKE algorithm for automatic keyword extraction is presented. The enhanced version aims to increase the precision and recall of the extracted key phrases by making changes to the scoring criteria for candidate phrases. The proposed solution for improving the RAKE algorithm takes into account the existing weaknesses in the weighting approaches of this algorithm, particularly for the Persian language and scientific documents.  To investigate the weaknesses of the RAKE algorithm and provide a proposed solution, a collection of metadata from Persian theses and dissertations was utilized. The proposed solution was tested and evaluated on this dataset, resulting in increased precision, recall, and F-measure metrics.
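The baseline RAKE scoring that this paper improves on can be sketched compactly: candidate phrases are maximal runs of non-stopwords, each word scores degree/frequency, and a phrase scores the sum of its word scores. This is a minimal generic illustration; the paper's modified Persian-specific weighting is not reproduced here.

```python
import re
from collections import defaultdict

def rake(text, stopwords):
    """Minimal RAKE sketch: split on stopwords to form candidate
    phrases, score words by degree/frequency, score phrases by the
    sum of their word scores."""
    words = re.findall(r"[a-zA-Z]+", text.lower())
    phrases, current = [], []
    for w in words:
        if w in stopwords:
            if current:
                phrases.append(current)
            current = []
        else:
            current.append(w)
    if current:
        phrases.append(current)
    freq, degree = defaultdict(int), defaultdict(int)
    for ph in phrases:
        for w in ph:
            freq[w] += 1
            degree[w] += len(ph)   # co-occurrence degree within the phrase
    scores = {w: degree[w] / freq[w] for w in freq}
    ranked = [(" ".join(ph), sum(scores[w] for w in ph)) for ph in phrases]
    return sorted(set(ranked), key=lambda p: -p[1])
```

Longer multi-word phrases accumulate higher degree and so rank above their single-word constituents, which is exactly the behavior the abstract attributes to RAKE.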
APA, Harvard, Vancouver, ISO, and other styles
34

Le, Tuyen, H. David Jeong, Stephen B. Gilbert, and Evgeny Chukharev-Hudilainen. "Generating partial civil information model views using a semantic information retrieval approach." Journal of Information Technology in Construction 25 (January 29, 2020): 41–54. http://dx.doi.org/10.36680/j.itcon.2020.002.

Full text
Abstract:
Open data standards (e.g. LandXML, TransXML, CityGML) are a key to addressing the interoperability issue in exchanging civil information modeling (CIM) data throughout the project life-cycle. Since these schemas include rich sets of data types covering a wide range of assets and disciplines, model view definitions (MVDs) which define subsets of a schema are required to specify what types of data to be shared in accordance with a specific exchange scenario. The traditional procedure for generating and implementing MVDs is time-consuming and laborious as entities and attributes relevant to a particular data exchange context are manually identified by domain experts. This paper presents a method that can locate relevant information from a source XML data schema for a specific domain based on the user's keyword. The study employs a semantic resource of civil engineering terms to understand the semantics of a keyword-based query. The study also introduces a novel context-based search technique for retrieving related entities and their referenced objects. The developed method was tested on a gold standard of several LandXML subschemas. The experiment results show that the semantic MVD retrieval algorithm achieves a mean average precision of nearly 90%. The research is original, being a novel method for extracting partial civil information models given a keyword from the end user. The method is expected to become a fundamental tool assisting professionals in extracting data from complex digital datasets.
APA, Harvard, Vancouver, ISO, and other styles
35

Hadeel, Qasem Gheni, Mohammed Hussein Ahmed, and Kadhim Oleiwi Wed. "Suggesting new words to extract keywords from title and abstract." International Journal of Electrical and Computer Engineering (IJECE) 9, no. 5 (2019): 4441–45. https://doi.org/10.11591/ijece.v9i5.pp4441-4445.

Full text
Abstract:
When discussing the fundamentals of writing research papers, we find that keywords are present in most research papers, though not in all of them; some papers do not contain keywords. Keywords are the words or phrases that accurately reflect the content of a research paper; they are a precise abbreviation of what the research carries in its content. The right keywords may increase the chance of the article or research paper being found and of it reaching more of the people it should reach. Keywords matter because highly specialized and influential readers in a field cannot read everything, and instead select what to read based on whether it carries the appropriate characteristics. In this paper, we extract new keywords by suggesting a set of words that occur frequently in research papers across multiple computer science disciplines. Our system takes a number of words (as specified in the program) that come before the suggested words and treats them as new keywords. The system proved effective in finding keywords that correspond to a large extent with the keywords supplied by authors in their own research.
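The idea of taking the words that precede a suggested trigger word as keyword candidates can be sketched as follows. The trigger set and window size are illustrative assumptions, not the paper's actual suggested-word list.

```python
import re

def keywords_before_triggers(text, triggers, n=2):
    """For each occurrence of a suggested trigger word, collect the n
    words immediately preceding it as a candidate keyword phrase."""
    words = re.findall(r"\w+", text.lower())
    candidates = []
    for i, w in enumerate(words):
        if w in triggers and i >= n:
            candidates.append(" ".join(words[i - n:i]))
    return candidates
```

For a sentence like "... support vector machines using term weighting", the trigger "using" yields the candidate keyword "support vector machines".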
APA, Harvard, Vancouver, ISO, and other styles
36

MORE, MAHADEV A. "CONTENT BASED IMAGE RETRIVAL USING DIFFERENT CLUSTERING TECHNIQUES." INTERANTIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT 07, no. 09 (2023): 1–11. http://dx.doi.org/10.55041/ijsrem25835.

Full text
Abstract:
CBIR (content-based image retrieval) is a software technique for retrieving images from a database by their features. In CBIR, images are retrieved from the dataset using features such as color, text, shape, texture, and similarity, and object recognition techniques are applied. Research on multimedia systems and content-based image retrieval has been given tremendous importance during the last decade, because multimedia databases handle text, audio, video, and image information, which are of prime interest in web and other high-end user applications. Content-based image retrieval deals with the extraction of knowledge, image-data relationships, and other patterns not explicitly stored within the images. It uses methods from computer vision, image processing, image retrieval, data retrieval, machine learning, databases, and artificial intelligence. Rule retrieval has been applied to large image databases. The proposed system gives an average accuracy of 90%. Keywords: CBIR, color feature, shape feature, texture feature, feature extraction, clustering, image retrieval.
APA, Harvard, Vancouver, ISO, and other styles
37

Kuang, Hailan, Haoran Chen, Xiaolin Ma, and Xinhua Liu. "A Keyword Detection and Context Filtering Method for Document Level Relation Extraction." Applied Sciences 12, no. 3 (2022): 1599. http://dx.doi.org/10.3390/app12031599.

Full text
Abstract:
Relation extraction (RE) is the core link of downstream tasks, such as information retrieval, question answering systems, and knowledge graphs. Most of the current mainstream RE technologies focus on the sentence-level corpus, which has great limitations in practical applications. Moreover, the previously proposed models based on graph neural networks or transformers try to obtain context features from the global text, ignoring the importance of local features. In practice, the relation between entity pairs can usually be inferred just through a few keywords. This paper proposes a keyword detection and context filtering method based on the Self-Attention mechanism for document-level RE. In addition, a Self-Attention Memory (SAM) module in ConvLSTM is introduced to process the document context and capture keyword features. By searching for word embeddings with high cross-attention of entity pairs, we update and record critical local features to enhance the performance of the final classification model. The experimental results on three benchmark datasets (DocRED, CDR, and GBA) show that our model achieves advanced performance within open and specialized domain relationship extraction tasks, with up to 0.87% F1 value improvement compared to the state-of-the-art methods. We have also designed experiments to demonstrate that our model can achieve superior results by its stronger contextual filtering capability compared to other methods.
APA, Harvard, Vancouver, ISO, and other styles
38

Mohebi, Azadeh, Mehri Sedighi, and Zahra Zargaran. "Subject-based retrieval of scientific documents, case study." Library Review 66, no. 6/7 (2017): 549–69. http://dx.doi.org/10.1108/lr-10-2016-0090.

Full text
Abstract:
Purpose The purpose of this paper is to introduce an approach for retrieving a set of scientific articles in the field of Information Technology (IT) from a scientific database such as Web of Science (WoS), to apply scientometrics indices and compare them with other fields. Design/methodology/approach The authors propose to apply a statistical classification-based approach for extracting IT-related articles. In this approach, first, a probabilistic model is introduced to model the subject IT, using keyphrase extraction techniques. Then, they retrieve IT-related articles from all Iranian papers in WoS, based on a Bayesian classification scheme. Based on the probabilistic IT model, they assign an IT membership probability for each article in the database, and then they retrieve the articles with highest probabilities. Findings The authors have extracted a set of IT keyphrases, with 1,497 terms through the keyphrase extraction process, for the probabilistic model. They have evaluated the proposed retrieval approach with two approaches: the query-based approach in which the articles are retrieved from WoS using a set of queries composed of limited IT keywords, and the research area-based approach which is based on retrieving the articles using WoS categorizations and research areas. The evaluation and comparison results show that the proposed approach is able to generate more accurate results while retrieving more articles related to IT. Research limitations/implications Although this research is limited to the IT subject, it can be generalized for any subject as well. However, for multidisciplinary topics such as IT, special attention should be given to the keyphrase extraction phase. In this research, bigram model is used; however, one can extend it to tri-gram as well. Originality/value This paper introduces an integrated approach for retrieving IT-related documents from a collection of scientific documents. 
The approach has two main phases: building a model for representing the topic IT, and retrieving documents based on that model. The model is based on a set of keyphrases extracted from a collection of IT articles; notably, the extraction technique does not rely on Term Frequency–Inverse Document Frequency, since almost all of the articles in the collection share a set of the same keyphrases. In addition, a probabilistic membership score is defined to retrieve the IT articles from a collection of scientific articles.
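A probabilistic membership score of the kind described here can be sketched as a naive-Bayes-style combination of per-keyphrase likelihoods. This is an illustration under assumed probability tables, not the authors' trained model.

```python
from math import exp, log

def it_membership(doc_phrases, p_given_it, p_given_other, prior_it=0.5):
    """Naive-Bayes-style membership probability: accumulate log-odds
    from per-keyphrase likelihoods P(phrase|IT) and P(phrase|other)
    for phrases observed in the document. Tables are illustrative."""
    log_odds = log(prior_it / (1 - prior_it))
    for ph in doc_phrases:
        if ph in p_given_it and ph in p_given_other:
            log_odds += log(p_given_it[ph] / p_given_other[ph])
    return 1 / (1 + exp(-log_odds))  # convert log-odds to probability
```

Articles whose observed keyphrases are much more likely under the IT model receive membership scores near 1, and the highest-scoring articles are retrieved.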
APA, Harvard, Vancouver, ISO, and other styles
39

Perez Daniel, Karina Ruby, Enrique Escamilla Hernandez, Takayuki Nagai, and Mariko Nakano Miyatake. "Unknown objects drawing using image retrieval." Revista Facultad de Ingeniería Universidad de Antioquia, no. 61 (November 15, 2012): 146–57. http://dx.doi.org/10.17533/udea.redin.13546.

Full text
Abstract:
In this paper a model for drawing unknown objects is proposed. We describe a technique that lets us build a visual model of a word from images retrieved from the Internet, enabling the system to learn any object at any time without prior knowledge of the object's appearance. This information must be filtered to obtain the most meaningful image for the keyword, keeping the visual association between words and images as unsupervised as possible, in the manner of human understanding. For this purpose, Pyramid of Histograms of Oriented Gradients (PHOG) feature extraction, K-means clustering, and color segmentation are performed, and the final image is then drawn as an application of the learning process. The proposed model is implemented on a robot platform, and experiments are carried out to evaluate the accuracy of the algorithm.
APA, Harvard, Vancouver, ISO, and other styles
40

Karan, Pahlani*. "PRACTICABILITY INVESTIGATION & DESIGNING OF FILE AND WEB BASED INFORMATION EXTRACTION ALGORITHM." INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY 6, no. 10 (2017): 180–83. https://doi.org/10.5281/zenodo.1002697.

Full text
Abstract:
Information extraction (IE) aims at extracting specific information from a huge number of documents. Nowadays the Internet has become a great source of information, containing an immeasurable amount of data, which makes it tedious for ordinary users to retrieve relevant data. There is therefore a present-day demand for efficient information extraction systems that convert web pages and their data into user-friendly structures; many extraction systems with varying performance have been developed for this purpose, and this paper examines one such IE system. This research paper introduces a method that uses a rule-based technique to induce extraction rules. It enables the user to gather more relevant pieces of information and helps refine the search keyword so as to extract the knowledge the end user desires.
APA, Harvard, Vancouver, ISO, and other styles
41

J, Nafeesa Begum. "An Automated Daily News Reports Generating Application Involving Keyword-Based News Scraping, Summarization, and Sentimental Analysis Leveraging NLP Models." International Journal for Research in Applied Science and Engineering Technology 12, no. 11 (2024): 1556–66. http://dx.doi.org/10.22214/ijraset.2024.65443.

Full text
Abstract:
In the era of information overload, obtaining relevant, concise, and sentiment-analyzed news content is essential for effective decision-making. This paper introduces an automated daily news reporting system that streamlines the process of collecting, summarizing, and analyzing the sentiment of news articles fetched based on the user's keyword. By leveraging state-of-the-art Natural Language Processing (NLP) models like GPT-3.5 for two-level summarization and BERTweet for sentiment analysis, the system provides users with concise, sentiment-labeled reports, enhancing their understanding of news trends and emotional tone. The architecture integrates data scraping, text extraction, and sentiment classification within a cloud-based Python microservice, supported by Flask. The system incorporates user localization options, allowing users to customize news retrieval by region and time preferences. Finalized reports are formatted into HTML and delivered directly to authenticated user emails through Gmail API, ensuring seamless and secure distribution. This research underscores the significance of automated news summarization and sentiment analysis in modern information retrieval. It provides a scalable and personalized solution that enables real-time synthesis of large volumes of news content, making it accessible and relevant for diverse audiences worldwide.
APA, Harvard, Vancouver, ISO, and other styles
42

Vileiniskis, Tomas, and Rita Butkiene. "Applying Semantic Role Labeling and Spreading Activation Techniques for Semantic Information Retrieval." Information Technology And Control 49, no. 2 (2020): 275–88. http://dx.doi.org/10.5755/j01.itc.49.2.24985.

Full text
Abstract:
Semantically enhanced information retrieval (IR) is aimed at improving classical IR methods and goes way beyond plain Boolean keyword matching with the main goal of better serving implicit and ambiguous information needs. As a de-facto pre-requisite to semantic IR, different information extraction (IE) techniques are used to mine unstructured text for underlying knowledge. In this paper we present a method that combines both IE and IR to enable semantic search in natural language texts. First, we apply semantic role labeling (SRL) to automatically extract event-oriented information found in natural language texts to an RDF knowledge graph leveraging semantic web technology. Second, we investigate how a custom flavored graph traversal spreading activation algorithm can be employed to interpret user’s information needs on top of the prior-extracted knowledge base. Finally, we present an assessment on the applicability of our method for semantically enhanced IR. An experimental evaluation on partial WikiQA dataset shows the strengths of our approach and also unveils common pitfalls that we use as guidelines to draw further work directions in the open-domain semantic search field.
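The graph-traversal spreading activation idea described above can be sketched in a few lines: activation starts at seed nodes matched from the user's query, then propagates along graph edges with a decay factor until it falls below a threshold. This is a minimal stdlib-only illustration of the generic technique, not the authors' custom-flavored algorithm; the graph, node names, and parameters are hypothetical.

```python
def spreading_activation(graph, seeds, decay=0.5, threshold=0.05, max_steps=3):
    """Propagate activation from seed nodes through a directed graph.

    graph:  adjacency dict mapping node -> list of neighbor nodes
    seeds:  dict mapping seed node -> initial activation (e.g. from query matching)
    decay:  fraction of activation passed to each neighbor per hop
    """
    activation = dict(seeds)
    frontier = dict(seeds)
    for _ in range(max_steps):
        nxt = {}
        for node, act in frontier.items():
            spread = act * decay
            if spread < threshold:      # prune weakly activated paths
                continue
            for neighbor in graph.get(node, []):
                nxt[neighbor] = max(nxt.get(neighbor, 0.0), spread)
        for node, act in nxt.items():   # keep the strongest activation seen
            if act > activation.get(node, 0.0):
                activation[node] = act
        frontier = nxt
        if not frontier:
            break
    return activation

# Hypothetical fragment of an event-oriented RDF-style knowledge graph:
graph = {
    "query:flood": ["event:flood_1993"],
    "event:flood_1993": ["place:mississippi", "year:1993"],
    "place:mississippi": ["event:great_river_road"],
}
act = spreading_activation(graph, {"query:flood": 1.0})
```

Nodes one hop from the seed end up with activation 0.5, two hops 0.25, and so on; ranking nodes by final activation yields candidate answers for the query.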
APA, Harvard, Vancouver, ISO, and other styles
43

Chen, Shihn-Yuarn, Chia-Ning Chang, Yi-Hsiang Nien, and Hao-Ren Ke. "Concept extraction and clustering for search result organization and virtual community construction." Computer Science and Information Systems 9, no. 1 (2012): 323–55. http://dx.doi.org/10.2298/csis101124020c.

Full text
Abstract:
This study proposes a concept extraction and clustering method, which improves Topic Keyword Clustering by using Log Likelihood Ratio for semantic correlation and Bisection K-Means for document clustering. Two value-added services are proposed to show how this approach can benefit information retrieval (IR) systems. The first service focuses on the organization and visual presentation of search results by clustering and bibliographic coupling. The second one aims at constructing virtual research communities and recommending significant papers to researchers. In addition to the two services, this study conducts quantitative and qualitative evaluations to show the feasibility of the proposed method; moreover, comparison with the previous approach is also performed. The experimental results show that the accuracy of the proposed method for search result organization reaches 80%, outperforming Topic Keyword Clustering. Both the precision and recall of virtual community construction are higher than 70%, and the accuracy of paper recommendation is almost 90%.
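The Log Likelihood Ratio used above for semantic correlation is commonly computed as Dunning's G-squared statistic over a 2x2 contingency table of term co-occurrence counts. A minimal stdlib sketch of that statistic follows; the counts and variable names are illustrative, not taken from the paper.

```python
import math

def llr(k11, k12, k21, k22):
    """Dunning's log-likelihood ratio (G^2) for a 2x2 contingency table.

    k11: documents containing both terms
    k12: documents containing term A but not B
    k21: documents containing term B but not A
    k22: documents containing neither term
    """
    def entropy_sum(*ks):
        n = sum(ks)
        return sum(k * math.log(k / n) for k in ks if k > 0)

    row = entropy_sum(k11 + k12, k21 + k22)   # row marginals
    col = entropy_sum(k11 + k21, k12 + k22)   # column marginals
    mat = entropy_sum(k11, k12, k21, k22)     # full table
    return 2.0 * (mat - row - col)

# Terms that co-occur far more often than chance score much higher:
strong = llr(100, 10, 10, 1000)   # heavily associated pair
weak = llr(1, 100, 100, 1000)     # nearly independent pair
```

A high LLR marks a keyword pair as semantically correlated; pairs whose counts match the independence expectation score exactly zero.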
APA, Harvard, Vancouver, ISO, and other styles
44

Rahman, Mohammed Mahmudur. "Development of a semantic search method for retrieving food related verses & concepts from the holy Quran using ontology." IIUC Studies 18, no. 1 (2022): 101–22. http://dx.doi.org/10.3329/iiucs.v18i1.61277.

Full text
Abstract:
The Qur’an is Allah (SWT)’s greatest miracle and a source of all knowledge and information. A Muslim’s obligation is not only to recite the Qur’an but also to gain knowledge from it. Ontology offers an effective way to retrieve Quranic knowledge computationally. Many search engines on the Internet can be used to discover information from the Qur’an, but most of these search engines and information retrieval systems are keyword-based, which can often lead to irrelevant results. Semantic search is most useful for overcoming this problem. The aim of this research is to develop an ontological, semantics-based method that retrieves food-related verses and concepts from the holy Qur’an using natural language queries. In this work, a triplet extraction algorithm was used to generate triples, the Protégé OWL editor (version 4.3) was used to create the food ontology, and the Apache Jena Fuseki server was used for SPARQL querying. The Quranic data were collected from an English translation of the holy Qur’an. IIUC Studies Vol. 18, December 2021: 101-122
APA, Harvard, Vancouver, ISO, and other styles
45

Lin, Jin Hui, and Ji Xiang Zhang. "Based on Wavelet Multi-Resolution Analysis of Image Retrieval and Reviewed." Advanced Materials Research 546-547 (July 2012): 595–98. http://dx.doi.org/10.4028/www.scientific.net/amr.546-547.595.

Full text
Abstract:
The development of multimedia technology and the spread of Internet technology have led to the appearance of large amounts of image information. Traditional keyword-based text retrieval methods cannot meet the requirements of image information retrieval, so content-based image retrieval (CBIR) has gradually become a research focus. A key step in content-based retrieval is the extraction of image features such as color, texture, and shape. However, because each image feature captures only one aspect of image similarity, how best to represent an image by its content remains an important research direction. This article reviews and compares several content-based image retrieval techniques, such as color features and texture features and the methods for extracting them, each of which has its own advantages.
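The color-feature extraction surveyed above is often done with a quantized color histogram compared by histogram intersection. The sketch below is a generic stdlib illustration under the assumption that an image is a flat list of RGB tuples; it is not code from the reviewed works.

```python
def color_histogram(pixels, bins=4):
    """Quantize RGB pixels (0-255 per channel) into a normalized joint histogram."""
    hist = [0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in pixels:
        idx = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[idx] += 1
    total = len(pixels)
    return [h / total for h in hist]

def histogram_intersection(h1, h2):
    """Swain-Ballard histogram intersection: 1.0 means identical color distributions."""
    return sum(min(a, b) for a, b in zip(h1, h2))

# Toy "images": mostly-red with some white, a lighter red mix, and pure blue.
red_heavy = [(250, 10, 10)] * 80 + [(255, 255, 255)] * 20
red_light = [(250, 10, 10)] * 60 + [(255, 255, 255)] * 40
blue_only = [(10, 10, 250)] * 100
h1, h2, h3 = (color_histogram(im) for im in (red_heavy, red_light, blue_only))
```

The two reddish images overlap strongly while the blue image shares no bins with them, which is exactly the similarity signal a color-feature CBIR system ranks on.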
APA, Harvard, Vancouver, ISO, and other styles
46

Fika Hastarita Rachman, Rike Ayu Arista, Ika Oktavia Suzanti, Yonathan Ferry Hendrawan, and Aryono Yerey Wibowo. "Medical Information Retrieval with Weighting Critical Score for Acute Respiratory Infection (ARI) Desease Detection." Technium: Romanian Journal of Applied Sciences and Technology 17 (November 1, 2023): 457–62. http://dx.doi.org/10.47577/technium.v17i.10124.

Full text
Abstract:
Medical Information Retrieval (Med-IR) is a part of computer science that deals with searching for medical documents. Medical information retrieval is needed by patients to obtain an initial prediction from the symptoms they are experiencing. ARI (Acute Respiratory Infection) is a disease that almost everyone has experienced and that can cause death. This study uses a dataset of ARI sufferers and user queries that describe symptoms in text form. The query data are processed by the Med-IR application using bi-grams and TF-IDF for feature extraction and cosine similarity as the similarity method, producing a set of returned documents intended to serve as an early prediction of ARI in patients. The research also applies a critical-disease weighting process, so that the Med-IR results are complemented by predictions of the severity of the disease. From the research conducted at the Assyafi'u Sentosa Lengkong Clinic, Nganjuk, the best results obtained were a precision of 85.5% and a recall of 52.9%. The evaluation of disease severity with Mean Absolute Percentage Error (MAPE) yielded a low error of 2.529%. Keywords: Medical Information Retrieval, ARI, weighting critical disease, Bi-Gram, TF-IDF, Cosine Similarity.
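The bi-gram TF-IDF plus cosine similarity pipeline described above can be sketched end to end in plain Python. The symptom documents and query below are made up for illustration and are not the clinic's data.

```python
import math
from collections import Counter

def bigrams(text):
    """Tokenize text into word bi-grams."""
    words = text.lower().split()
    return [" ".join(words[i:i + 2]) for i in range(len(words) - 1)]

def tfidf_vectors(docs):
    """Build TF-IDF weighted bi-gram vectors for a small document set."""
    tokenized = [bigrams(d) for d in docs]
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    n = len(docs)
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}   # smoothed IDF
    vectors = [{t: tf[t] * idf[t] for t in tf}
               for tf in (Counter(toks) for toks in tokenized)]
    return vectors, idf

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

docs = [
    "fever dry cough sore throat runny nose",
    "shortness of breath chest pain dry cough",
    "headache stomach ache nausea",
]
vectors, idf = tfidf_vectors(docs)
qtf = Counter(bigrams("dry cough sore throat"))
qvec = {t: qtf[t] * idf.get(t, 0.0) for t in qtf}
ranked = sorted(range(len(docs)), key=lambda i: cosine(qvec, vectors[i]), reverse=True)
```

The first document, which shares three bi-grams with the query, ranks highest; a severity weight per critical disease could then be applied to the returned documents, as the paper does.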
APA, Harvard, Vancouver, ISO, and other styles
47

Guan, Bo, Lichun Gong, and Yanzhao Shen. "A Novel Coverless Text Steganographic Algorithm Based on Polynomial Encryption." Security and Communication Networks 2022 (March 7, 2022): 1–12. http://dx.doi.org/10.1155/2022/1153704.

Full text
Abstract:
Aiming at the problems of low text utilization and ambiguity in extracting the secret information that affect "tag + keyword" coverless information hiding methods, we propose a coverless information hiding method for Chinese text based on polynomial encryption. Secret data communication is realized by transmitting analogously forged URLs, which improves the security of mobile computing and avoids the risk of carrier texts being attacked maliciously. The text utilization rate and retrieval success rate are improved by using tags to select multiple keywords, expanding the number of keywords in the index table. The keyword's location in the secret message is obtained through text vocabulary matching, and the tag index and location information are encrypted by polynomial for transmission of the secret information. The split keywords of the secret message can be transmitted out of order. The experimental results show that the secret message can be successfully extracted by sorting the extracted keywords according to the frequency and relevance of their combined tags. In our experiments with different test databases, both the hiding success rate and the average hiding capacity are improved, with higher test values than other existing methods.
APA, Harvard, Vancouver, ISO, and other styles
48

Varghese, Nisha, and Shafi Shereef. "DOMAIN-SPECIFIC TOKEN RECOGNITION USING BIDIRECTIONAL ENCODER REPRESENTATIONS FROM TRANSFORMERS AND SCIBERT." ICTACT Journal on Microelectronics 10, no. 2 (2024): 1817–21. https://doi.org/10.21917/ijme.2024.0314.

Full text
Abstract:
Making machines read and comprehend information from natural language documents is not an easy task. Machine reading comprehension alleviates this issue by extracting the relevant information from a corpus in response to a question posed about the context. The problem associated with this knowledge retrieval lies in extracting the correct answer from the context with language understanding. Traditional rule-based, keyword-search, and deep learning approaches are inadequate for inferring the right answer from the input context. Transformer-based methodologies are used to extract the most accurate answer from the context document. This article utilizes one of the exceptional transformer models, BERT (Bidirectional Encoder Representations from Transformers), for an empirical analysis of neural machine reading comprehension. The article aims to reveal the differences between BERT and domain-specific models such as SciBERT, and furthermore explores the need for domain-specific models and how these models outperform BERT.
APA, Harvard, Vancouver, ISO, and other styles
49

Nijat Babayev, Nijat Babayev. "IMPLEMENTATION AND RESEARCH OF INTELLIGENT INFORMATION RETRIEVAL AND DATA ANALYSIS IN WEB APPLICATIONS." ETM - Equipment, Technologies, Materials 13, no. 01 (2023): 65–73. http://dx.doi.org/10.36962/etm13012023-65.

Full text
Abstract:
The main stages of Web Mining were described, and the main difference between information extraction and information retrieval was highlighted in the article. The contents of Web Content Mining (extraction of Web content), Web Structure Mining (extraction of Web structures), and Web Usage Mining (analysis of the use of Web resources) were clarified separately. In the section entitled "Information systems for Data Mining and processes in Web Applications", it was shown that developers face several types of challenges when using Web Mining. The main approaches to conducting intelligent information search and data analysis in web applications were also discussed. The components of the Data Mining model and the stages of its application were considered. As a practical example of a software product implemented for research, the Russian-developed program PolyAnalyst was considered. It was also noted that the professional version of the system, PolyAnalyst Pro, allows various data formats to be analyzed and business processes to be optimized. The system performs a full range of data analysis tasks: loading, combining, cleaning, and transforming data; deep text analysis; information extraction; visualization of results; and report creation. In conclusion, it was noted that creating new software tools using Web Mining can bring maximum benefit by improving the efficiency of real-time work. Building such intelligent analyzers can help analyze log files efficiently. Keywords: Web Mining, Data Mining, Web applications, information retrieval, Web resources.
APA, Harvard, Vancouver, ISO, and other styles
50

Hakdağlı, Özlem. "Hybrid Question-Answering System: A FAISS and BM25 Approach for Extracting Information from Technical Document." Orclever Proceedings of Research and Development 5, no. 1 (2024): 226–37. https://doi.org/10.56038/oprd.v5i1.535.

Full text
Abstract:
In this study, a hybrid question-answering system was developed to accelerate access to information contained in corporate technical documents and to generate appropriate responses to user queries. The system combines dense vector-based retrieval (FAISS) and sparse text-based retrieval (BM25) methods, integrated with the XLM-RoBERTa Large model. Evaluations conducted on a dataset consisting of 23 technical documents demonstrated the system's effectiveness in responding to both semantic and keyword-based queries. This study presents an innovative approach that enables fast and accurate access to information from technical documents, enhancing the efficiency of corporate knowledge management processes.
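Combining a dense retriever (FAISS) with a sparse one (BM25) requires fusing their two ranked lists. The paper's exact fusion scheme is not specified in the abstract; reciprocal rank fusion (RRF) is one common, score-scale-free choice, sketched below with hypothetical document ids.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into one list.

    Each document scores 1 / (k + rank) per list it appears in;
    k=60 is the conventional RRF constant.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical per-query outputs of the two retrievers:
dense_ranking = ["d3", "d1", "d7", "d2"]    # FAISS (semantic) order
sparse_ranking = ["d3", "d5", "d1", "d7"]   # BM25 (keyword) order
fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
```

Documents ranked well by both retrievers rise to the top, which is how a hybrid system answers both semantic and keyword-based queries from one fused list before passing candidates to the reader model.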
APA, Harvard, Vancouver, ISO, and other styles