Journal articles on the topic 'Web document clustering (WDC)'

Consult the top 50 journal articles for your research on the topic 'Web document clustering (WDC).'

1

Rani Manukonda, Sumathi, et al. "Efficient Document Clustering for Web Search Result." International Journal of Engineering & Technology 7, no. 3.3 (2018): 90. http://dx.doi.org/10.14419/ijet.v7i3.3.14494.

Abstract:
Clustering the document in data mining is one of the traditional approaches in which the same documents that are more relevant are grouped together. Document clustering takes part in achieving accuracy that retrieve information for systems that identifies the nearest neighbors of the document. Day to day the massive quantity of data is being generated and it is clustered. According to particular sequence to improve the cluster quality even though different clustering methods have been introduced, still many challenges exist for the improvement of document clustering. For web search purpose a docume
2

Im, Yeong-Hui. "A Post Web Document Clustering Algorithm." KIPS Transactions:PartB 9B, no. 1 (2002): 7–16. http://dx.doi.org/10.3745/kipstb.2002.9b.1.007.

3

He, Xiaofeng, Hongyuan Zha, Chris H.Q. Ding, and Horst D. Simon. "Web document clustering using hyperlink structures." Computational Statistics & Data Analysis 41, no. 1 (2002): 19–45. http://dx.doi.org/10.1016/s0167-9473(02)00070-1.

4

Creţulescu, Radu G., Daniel I. Morariu, Macarie Breazu, and Daniel Volovici. "DBSCAN Algorithm for Document Clustering." International Journal of Advanced Statistics and IT&C for Economics and Life Sciences 9, no. 1 (2019): 58–66. http://dx.doi.org/10.2478/ijasitels-2019-0007.

Abstract:
Document clustering is a problem of automatically grouping similar documents into categories based on some similarity metrics. Almost all available data, usually on the web, are unclassified so we need powerful clustering algorithms that work with these types of data. All common search engines return a list of pages relevant to the user query. This list needs to be generated fast and as correct as possible. For this type of problems, because the web pages are unclassified, we need powerful clustering algorithms. In this paper we present a clustering algorithm called DBSCAN – Density-Bas
5

Hammouda, K. M., and M. S. Kamel. "Efficient phrase-based document indexing for Web document clustering." IEEE Transactions on Knowledge and Data Engineering 16, no. 10 (2004): 1279–96. http://dx.doi.org/10.1109/tkde.2004.58.

6

Chawla, Suruchi. "Application of Convolution Neural Networks in Web Search Log Mining for Effective Web Document Clustering." International Journal of Information Retrieval Research 12, no. 1 (2022): 1–14. http://dx.doi.org/10.4018/ijirr.300367.

Abstract:
The volume of web search data stored in search engine log is increasing and has become big search log data. The web search log has been the source of data for mining based on web document clustering techniques to improve the efficiency and effectiveness of information retrieval. In this paper Deep Learning Model Convolution Neural Network (CNN) is used in big web search log data mining to learn the semantic representation of a document. These semantic document vectors are clustered using K-means to group relevant documents for effective web document clustering. Experiment was done on the data
7

Huang, Shen, Zheng Chen, Yong Yu, and Wei-Ying Ma. "Multitype features coselection for Web document clustering." IEEE Transactions on Knowledge and Data Engineering 18, no. 4 (2006): 448–59. http://dx.doi.org/10.1109/tkde.2006.1599384.

8

Chan, Samuel W. K., and Mickey W. C. Chong. "Unsupervised clustering for nontextual web document classification." Decision Support Systems 37, no. 3 (2004): 377–96. http://dx.doi.org/10.1016/s0167-9236(03)00035-6.

9

Boley, Daniel, Maria Gini, Robert Gross, et al. "Partitioning-based clustering for Web document categorization." Decision Support Systems 27, no. 3 (1999): 329–41. http://dx.doi.org/10.1016/s0167-9236(99)00055-x.

10

Su, Zhong, Qiang Yang, Hongjiang Zhang, Xiaowei Xu, Yu-Hen Hu, and Shaoping Ma. "Correlation-Based Web Document Clustering for Adaptive Web Interface Design." Knowledge and Information Systems 4, no. 2 (2002): 151–67. http://dx.doi.org/10.1007/s101150200002.

11

Li, Zhao, and Xindong Wu. "A Phrase-Based Method for Hierarchical Clustering of Web Snippets." Proceedings of the AAAI Conference on Artificial Intelligence 24, no. 1 (2010): 1947–48. http://dx.doi.org/10.1609/aaai.v24i1.7773.

Abstract:
Document clustering has been applied in web information retrieval, which facilitates users’ quick browsing by organizing retrieved results into different groups. Meanwhile, a tree-like hierarchical structure is well-suited for organizing the retrieved results in favor of web users. In this regard, we introduce a new method for hierarchical clustering of web snippets by exploiting a phrase-based document index. In our method, a hierarchy of web snippets is built based on phrases instead of all snippets, and the snippets are then assigned to the corresponding clusters consisting of phrases. We sh
12

Kaushik, Kishore Phukon. "Incorporation of contextual information through Graph Modeling in Web content mining." Indian Journal of Science and Technology 13, no. 46 (2020): 4573–78. https://doi.org/10.17485/IJST/v13i46.1660.

Abstract:
Objectives: The objective of this research article is to deal with the problem of web document clustering by modeling the web documents as directed completely labeled graphs that incorporate contextual information in the computation process to the extent required. The computational complexity of the MCS algorithm based on this graph model is O(n²), n being the number of nodes. As graph similarity using MCS is an NP-complete problem, this is an important result that allows us to forgo sub-optimal approximation approaches and find the exact solution in polynomi
13

Sawalkar, Abhishek, Mohit Mandlecha, Dnyanesh Kulkarni, and Ratnamala S. Paswan. "Comparing the Performance of SOM with Traditional Methods for Document Clustering Using Wordnet Ontologies." International Journal for Research in Applied Science and Engineering Technology 10, no. 4 (2022): 1512–18. http://dx.doi.org/10.22214/ijraset.2022.41554.

Abstract:
Retrieving useful information has become challenging due to the rapid expansion of web material. To improve the retrieval outcomes, efficient clustering methods are required. Document clustering is the process of identifying similarities and differences among given objects and grouping them into clusters with comparable features. We used WordNet lexical as an addition to compare several document clustering techniques in this article. The suggested method employs WordNet to determine the relevance of the concepts in the text, and then clusters the content using several document cluste
14

Tarczynski, Tomasz. "Document Clustering - Concepts, Metrics and Algorithms." International Journal of Electronics and Telecommunications 57, no. 3 (2011): 271–77. http://dx.doi.org/10.2478/v10177-011-0036-5.

Abstract:
Document clustering, which is also referred to as text clustering, is a technique of unsupervised document organisation. Text clustering is used to group documents into subsets that consist of texts that are similar to each other. These subsets are called clusters. Document clustering algorithms are widely used in web searching engines to produce results relevant to a query. An example of practical use of those techniques is Yahoo! hierarchies of documents [1]. Another application of document clustering is browsing which is defined as search
15

Jinarat, Supakpong, Choochart Haruechaiyasak, and Arnon Rungsawang. "Graph-Based Concept Clustering for Web Search Results." International Journal of Electrical and Computer Engineering (IJECE) 5, no. 6 (2015): 1536. http://dx.doi.org/10.11591/ijece.v5i6.pp1536-1544.

Abstract:
A search engine usually returns a long list of web search results corresponding to a query from the user. Users must spend a lot of time for browsing and navigating the search results for the relevant results. Many research works applied the text clustering techniques, called web search results clustering, to handle the problem. Unfortunately, search result document returned from search engine is a very short text. It is difficult to cluster related documents into the same group because a short document has low informative content. In this paper, we proposed a method to cluster the web search
16

Sung, Ki-Youn, and Bo-Hyun Yun. "Topic based Web Document Clustering using Named Entities." Journal of the Korea Contents Association 10, no. 5 (2010): 29–36. http://dx.doi.org/10.5392/jkca.2010.10.5.029.

17

He, Y., S. C. Hui, and A. C. M. Fong. "Mining a web citation database for document clustering." Applied Artificial Intelligence 16, no. 4 (2002): 283–302. http://dx.doi.org/10.1080/08839510252906462.

18

Khan, M. Shamim, and Sebastian W. Khor. "Web document clustering using a hybrid neural network." Applied Soft Computing 4, no. 4 (2004): 423–32. http://dx.doi.org/10.1016/j.asoc.2004.02.003.

19

Fersini, E., E. Messina, and F. Archetti. "A probabilistic relational approach for web document clustering." Information Processing & Management 46, no. 2 (2010): 117–30. http://dx.doi.org/10.1016/j.ipm.2009.08.003.

20

Kaneko, Masaya, Shusuke Okamoto, Masaki Kohana, and You Inayoshi. "Document clustering based on web search hit counts." International Journal of Business Intelligence and Data Mining 8, no. 1 (2013): 61. http://dx.doi.org/10.1504/ijbidm.2013.055787.

21

Dhashanamoorthi, Balaji. "Construction of suffix tree using key phrases for document using down-top incremental conceptual hierarchical text clustering approach." International Journal of Science and Research Archive 6, no. 1 (2022): 294–307. http://dx.doi.org/10.30574/ijsra.2022.6.1.0143.

Abstract:
With development of technologies in the World Wide Web, usage of document increases day by day. In order to access the document easily, document clustering technique is introduced. In the field of data mining, document clustering plays a vital role. Organizing the unstructured and unlabeled document is one of the major problems and it is ever growing and complex. Handling of such unorganized documents causes more expensive. Hence, challenges raised by the continuing growth of unstructured and unlabeled documents are handled in this proposed work. Document clustering is one of the most powerful
22

Subhashini, R., and V. Jawahar Senthil Kumar. "A Roadmap to Integrate Document Clustering in Information Retrieval." International Journal of Information Retrieval Research 1, no. 1 (2011): 31–44. http://dx.doi.org/10.4018/ijirr.2011010103.

Abstract:
The World Wide Web is a large distributed digital information space. The ability to search and retrieve information from the Web efficiently and effectively is an enabling technology for realizing its full potential. Information Retrieval (IR) plays an important role in search engines. Today’s most advanced engines use the keyword-based (“bag of words”) paradigm, which has inherent disadvantages. Organizing web search results into clusters facilitates the user’s quick browsing of search results. Traditional clustering techniques are inadequate because they do not generate clusters with highly
23

Takale, Sheetal A., Prakash J. Kulkarni, and Sahil K. Shah. "An Intelligent Web Search Using Multi-Document Summarization." International Journal of Information Retrieval Research 6, no. 2 (2016): 41–65. http://dx.doi.org/10.4018/ijirr.2016040103.

Abstract:
Information available on the internet is huge, diverse and dynamic. Current Search Engine is doing the task of intelligent help to the users of the internet. For a query, it provides a listing of best matching or relevant web pages. However, information for the query is often spread across multiple pages which are returned by the search engine. This degrades the quality of search results. So, the search engines are drowning in information, but starving for knowledge. Here, we present a query focused extractive summarization of search engine results. We propose a two level summarization process
24

Obidallah, Waeal J., Bijan Raahemi, and Waleed Rashideh. "Multi-Layer Web Services Discovery Using Word Embedding and Clustering Techniques." Data 7, no. 5 (2022): 57. http://dx.doi.org/10.3390/data7050057.

Abstract:
We propose a multi-layer data mining architecture for web services discovery using word embedding and clustering techniques to improve the web service discovery process. The proposed architecture consists of five layers: web services description and data preprocessing; word embedding and representation; syntactic similarity; semantic similarity; and clustering. In the first layer, we identify the steps to parse and preprocess the web services documents. In the second layer, Bag of Words with Term Frequency–Inverse Document Frequency and three word-embedding models are employed for web services
25

Chawla, Suruchi. "Application of Fuzzy C-Means Clustering and Semantic Ontology in Web Query Session Mining for Intelligent Information Retrieval." International Journal of Fuzzy System Applications 10, no. 1 (2021): 1–19. http://dx.doi.org/10.4018/ijfsa.2021010101.

Abstract:
Information retrieval based on keywords search retrieves irrelevant documents because of vocabulary gap between document content and search queries. The keyword vector representation of web documents is very high dimensional, and keyword terms are unable to capture the semantic of document content. Ontology has been built in various domains for representing the semantics of documents based on concepts relevant to document subject. The web documents often contain multiple topics; therefore, fuzzy c-means document clustering has been used for discovering clusters with overlapping boundaries. In
26

Avanija, J., and K. Ramar. "Semantic Clustering of Web Documents." International Journal of Information Technology and Web Engineering 7, no. 4 (2012): 20–33. http://dx.doi.org/10.4018/jitwe.2012100102.

Abstract:
With the massive growth and large volume of the web it is very difficult to recover results based on the user preferences. The next generation web architecture, semantic web reduces the burden of the user by performing search based on semantics instead of keywords. Even in the context of semantic technologies optimization problem occurs but rarely considered. In this paper document clustering is applied to recover relevant documents. The authors propose an ontology based clustering algorithm using semantic similarity measure and Particle Swarm Optimization (PSO), which is applied to the annota
27

Subhashini, R., and V. Jawahar Senthil Kumar. "A Novel Document Clustering for Organizing the Web Pages." International Journal on Information Sciences and Computing 4, no. 2 (2010): 49–54. http://dx.doi.org/10.18000/ijisac.50079.

28

Carullo, Moreno, Elisabetta Binaghi, and Ignazio Gallo. "An online document clustering technique for short web contents." Pattern Recognition Letters 30, no. 10 (2009): 870–76. http://dx.doi.org/10.1016/j.patrec.2009.04.001.

29

Hammouda, Khaled, and Mohamed Kamel. "Distributed collaborative Web document clustering using cluster keyphrase summaries." Information Fusion 9, no. 4 (2008): 465–80. http://dx.doi.org/10.1016/j.inffus.2006.12.001.

30

Fadllullah, Arif, Dasrit Debora Kamudi, Muhamad Nasir, Agus Zainal Arifin, and Diana Purwitasari. "Web News Documents Clustering in Indonesian Language Using Singular Value Decomposition-Principal Component Analysis (SVDPCA) and Ant Algorithms." Jurnal Ilmu Komputer dan Informasi 9, no. 1 (2016): 17. http://dx.doi.org/10.21609/jiki.v9i1.362.

Abstract:
Ant-based document clustering is a cluster method of measuring text documents similarity based on the shortest path between nodes (trial phase) and determines the optimal clusters of sequence document similarity (dividing phase). The processing time of trial phase Ant algorithms to make document vectors is very long because of high dimensional Document-Term Matrix (DTM). In this paper, we proposed a document clustering method for optimizing dimension reduction using Singular Value Decomposition-Principal Component Analysis (SVDPCA) and Ant algorithms. SVDPCA reduces size of the DTM dimensions
31

Al-Mofareji, Hanan, Mahmoud Kamel, and Mohamed Y. Dahab. "WeDoCWT: A New Method for Web Document Clustering Using Discrete Wavelet Transforms." Journal of Information & Knowledge Management 16, no. 01 (2017): 1750004. http://dx.doi.org/10.1142/s0219649217500046.

Abstract:
Organizing web information is an important aspect of finding information in the easiest and most efficient way. We present a new method for web document clustering called WeDoCWT, which exploits the discrete wavelet transform and term signal, to improve the document representation. We studied different methods for document segmentation to construct the term signals. We used two datasets, UW-CAN and WebKB, to evaluate the proposed method. The experimental results indicated that dividing the documents into fixed segments is preferable to dividing them into logical segments based on HTML features
32

Zhao, Ying, Ya Jun Du, and Qiang Qiang Peng. "Clustering Chinese Web Search Results Based on Association Calculation." Applied Mechanics and Materials 55-57 (May 2011): 1418–23. http://dx.doi.org/10.4028/www.scientific.net/amm.55-57.1418.

Abstract:
Clustering web search results is a kind of solution which help user to find the interested topic by grouping the search results. This paper presents an improved method for clustering search results focused on Chinese web pages. The main contributions of this paper are the following: First, in this paper, a method which identifies the complete semantic information phrase by comparing the attributes of base clusters in the suffix tree document model and the overlap of their document sets is presented. Second, by analyzing the content and structure of title and snippet of Chinese web search resul
33

Tsekouras, George E., and Damianos Gavalas. "An Effective Fuzzy Clustering Algorithm for Web Document Classification: A Case Study in Cultural Content Mining." International Journal of Software Engineering and Knowledge Engineering 23, no. 06 (2013): 869–86. http://dx.doi.org/10.1142/s021819401350023x.

Abstract:
This article presents a novel crawling and clustering method for extracting and processing cultural data from the web in a fully automated fashion. Our architecture relies upon a focused web crawler to download web documents relevant to culture. The focused crawler is a web crawler that searches and processes only those web pages that are relevant to a particular topic. After downloading the pages, we extract from each document a number of words for each thematic cultural area, filtering the documents with non-cultural content; we then create multidimensional document vectors comprising the mo
34

Ping, Deng Li, Guo Bing, and Zheng Wen. "Web Service Clustering Approach Based on Network and Fused Document-Based and Tag-Based Topics Similarity." International Journal of Web Services Research 18, no. 3 (2021): 63–81. http://dx.doi.org/10.4018/ijwsr.2021070104.

Abstract:
To produce a web services clustering with values that satisfy many requirements is a challenging focus. In this article, the authors proposed a new approach with two models, which are helpful to the service clustering problem. Firstly, a document-tag LDA model (DTag-LDA) is proposed that considers the tag information of web services, and the tag can describe the effective information of documents accurately. Based on the first model, this article further proposes an efficient document weight and tag weight-LDA model (DTw-LDA), which fused multi-modal data network. To further improve the cluste
35

Nishina, Tomoya, and Akira Utsumi. "Web Document Clustering Based on the Clusters of Topic Words." Journal of Natural Language Processing 17, no. 4 (2010): 23–41. http://dx.doi.org/10.5715/jnlp.17.4_23.

36

Krishnaraj, N., P. Kumar, and Sri K. Bhagavan. "Conceptual Semantic Model for Web Document Clustering Using Term Frequency." EAI Endorsed Transactions on Energy Web 5, no. 20 (2018): 155744. http://dx.doi.org/10.4108/eai.12-9-2018.155744.

37

Srikanth, D., and S. Sakthivel. "Time and Space Efficient Web Document Clustering Using Rayleigh Distribution." Wireless Personal Communications 102, no. 4 (2018): 3255–68. http://dx.doi.org/10.1007/s11277-018-5366-5.

38

Luzon, Christine, Luisito Lacatan, Harold Bangalisan, and Jayvee Osapdin. "Web-Based File Clustering and Indexing for Mindoro State University." International Journal of Computing Sciences Research 6 (January 31, 2022): 951–61. http://dx.doi.org/10.25147/ijcsr.2017.001.1.82.

Abstract:
Purpose – The Web-Based File Clustering and Indexing for Mindoro State University aim to organize data circulated over the Web into groups/collections to facilitate data availability and access, and at the same time meet user preferences. The main benefits include: increasing Web information accessibility, understanding users’ navigation behavior, improving information retrieval and content delivery on the Web. Web-based file clustering could help in reaching the required documents that the user is searching for. Method – In this paper a novel approach has been introduced for search results clu
39

Elavarasi, T. "Spectral Clustering-Based Particle Swarm Optimization Algorithm for Document Clustering." Journal of Information Systems Engineering and Management 10, no. 4s (2025): 134–46. https://doi.org/10.52783/jisem.v10i4s.487.

Abstract:
The process of automatically grouping documents into clusters such that the documents in one cluster are very comparable to the documents in the remaining clusters have been known as document clustering. Due to its broad application in a number of fields, including search engines, web mining, and information retrieval, it has been the subject of much research. It involves clustering documents that are identical to one another and calculating how identical they are. It facilitates simple navigation by offering effective document representation as well as visualization. Hence, this research pape
40

Feng, Jian, Ying Zhang, and Yuqiang Qiao. "A Detection Method for Phishing Web Page Using DOM-Based Doc2Vec Model." Journal of Computing and Information Technology 28, no. 1 (2020): 19–31. http://dx.doi.org/10.20532/cit.2020.1004899.

Abstract:
Detecting phishing web pages is a challenging task. The existing detection method for phishing web page based on DOM (Document Object Model) is mainly aiming at obtaining structural characteristics but ignores the overall representation of web pages and the semantic information that HTML tags may have. This paper regards DOMs as a natural language with Doc2Vec model and learns the structural semantics automatically to detect phishing web pages. Firstly, the DOM structure of the obtained web page is parsed to construct the DOM tree, then the Doc2Vec model is used to vectorize the DOM tree, and
41

Chahal, Poonam, and Manjeet Singh. "An Efficient Approach for Ranking of Semantic Web Documents by Computing Semantic Similarity and Using HCS Clustering." International Journal of Semiotics and Visual Rhetoric 5, no. 1 (2021): 45–56. http://dx.doi.org/10.4018/ijsvr.2021010104.

Abstract:
In today's era, with the availability of a huge amount of dynamic information available in world wide web (WWW), it is complex for the user to retrieve or search the relevant information. One of the techniques used in information retrieval is clustering, and then the ranking of the web documents is done to provide user the information as per their query. In this paper, semantic similarity score of Semantic Web documents is computed by using the semantic-based similarity feature combining the latent semantic analysis (LSA) and latent relational analysis (LRA). The LSA and LRA help to determine
42

Li, Gui, Cheng Chen, Zheng Yu Li, Zi Yang Han, and Ping Sun. "Web Data Extraction Based on Tag Path Clustering." Advanced Materials Research 756-759 (September 2013): 1590–94. http://dx.doi.org/10.4028/www.scientific.net/amr.756-759.1590.

Abstract:
Fully automatic methods that extract structured data from the Web have been studied extensively. The existing methods suffice for simple extraction, but they often fail to handle more complicated Web pages. This paper introduces a method based on tag path clustering to extract structured data. The method gets complete tag path collection by parsing the DOM tree of the Web document. Clustering of tag paths is performed based on introduced similarity measure and the data area can be targeted, then taking advantage of features of tag position, we can separate and filter record, finally complete d
43

Reka, M., and N. Shanthi. "An Efficient Multi-Dimensional Level based Semantic Relational Depthness Clustering for Enhancing Web Document Clustering." Asian Journal of Research in Social Sciences and Humanities 6, cs1 (2016): 343. http://dx.doi.org/10.5958/2249-7315.2016.00968.0.

44

VarmaPamba, Raja, and Elizabeth Sherly. "EEWDCO: The Efficient way of Enhancing Web Document Clustering using Ontologies." International Journal of Computer Applications 86, no. 3 (2014): 23–25. http://dx.doi.org/10.5120/14966-3144.

45

Ko, Suc-Bum, and Sung-Dae Youn. "A performance improvement methodology of web document clustering using FDC-TCT." KIPS Transactions:PartD 12D, no. 4 (2005): 637–46. http://dx.doi.org/10.3745/kipstd.2005.12d.4.637.

46

Sinka, Mark P., and David W. Corne. "The BankSearch web document dataset: investigating unsupervised clustering and category similarity." Journal of Network and Computer Applications 28, no. 2 (2005): 129–46. http://dx.doi.org/10.1016/j.jnca.2004.01.002.

47

Lee, Ingyu, and Byung-Won On. "An effective web document clustering algorithm based on bisection and merge." Artificial Intelligence Review 36, no. 1 (2011): 69–85. http://dx.doi.org/10.1007/s10462-011-9203-4.

48

Xu, Shuting, and Jun Zhang. "A Parallel Hybrid Web Document Clustering Algorithm and its Performance Study." Journal of Supercomputing 30, no. 2 (2004): 117–31. http://dx.doi.org/10.1023/b:supe.0000040611.25862.d9.

49

Li, Peng, Bin Wang, and Wei Jin. "Improving Web Document Clustering through Employing User-Related Tag Expansion Techniques." Journal of Computer Science and Technology 27, no. 3 (2012): 554–66. http://dx.doi.org/10.1007/s11390-012-1243-y.

50

Haji, Saad Hikmat, Karwan Jacksi, and Razwan Mohmed Salah. "A Semantics-Based Clustering Approach for Online Laboratories Using K-Means and HAC Algorithms." Mathematics 11, no. 3 (2023): 548. http://dx.doi.org/10.3390/math11030548.

Abstract:
Due to the availability of a vast amount of unstructured data in various forms (e.g., the web, social networks, etc.), the clustering of text documents has become increasingly important. Traditional clustering algorithms have not been able to solve this problem because the semantic relationships between words could not accurately represent the meaning of the documents. Thus, semantic document clustering has been extensively utilized to enhance the quality of text clustering. This method is called unsupervised learning and it involves grouping documents based on their meaning, not on common key