Academic literature on the topic 'Incremental Clustering'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Incremental Clustering.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Press it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "Incremental Clustering"

1

Ling Ping, Rong Xiangsheng, and Dong Yongquan. "Incremental Spectral Clustering." Journal of Convergence Information Technology 7, no. 15 (2012): 286–93. http://dx.doi.org/10.4156/jcit.vol7.issue15.34.

2

Chaudhari, Archana Yashodip, and Preeti Mulay. "Cloud4NFICA-Nearness Factor-Based Incremental Clustering Algorithm Using Microsoft Azure for the Analysis of Intelligent Meter Data." International Journal of Information Retrieval Research 10, no. 2 (2020): 21–39. http://dx.doi.org/10.4018/ijirr.2020040102.

Abstract:
Intelligent electricity meters (IEMs) form a key infrastructure necessary for the growth of smart grids. IEMs generate a considerable amount of electricity data incrementally. However, on an influx of new data, traditional clustering methods re-cluster all of the data from scratch. Incremental clustering is an essential way to solve the problem of clustering dynamic data. Given the volume of IEM data and the number of data types involved, an incremental clustering method is highly complex. Microsoft Azure provides the processing power necessary to handle incremental clustering analytics. The proposed Cloud4NFICA is a scalable platform for a nearness-factor-based incremental clustering algorithm. This research uses a real dataset of Irish households collected by IEMs, together with related socioeconomic data. Cloud4NFICA is incremental in nature and hence accommodates the influx of new data. Cloud4NFICA was designed as infrastructure as a service. The study shows that the developed system performs well in terms of scalability.
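The contrast drawn in this abstract, assigning each newly arriving record to an existing cluster instead of re-clustering everything from scratch, can be illustrated with a minimal sketch. The paper's nearness factor and Azure deployment are not reproduced here; the Euclidean distance threshold, class name and toy data below are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the incremental idea described above: instead of
# re-clustering all meter data when new readings arrive, each new record is
# either absorbed by the nearest existing cluster or starts a new one.
# A plain Euclidean distance threshold stands in for the paper's nearness factor.
import numpy as np

class IncrementalClusterer:
    def __init__(self, threshold: float):
        self.threshold = threshold
        self.centroids = []   # running centroid per cluster
        self.counts = []      # number of members per cluster

    def add(self, x: np.ndarray) -> int:
        """Assign one new record; return the index of its cluster."""
        if self.centroids:
            dists = [np.linalg.norm(x - c) for c in self.centroids]
            k = int(np.argmin(dists))
            if dists[k] <= self.threshold:
                # update the centroid incrementally, no re-clustering
                self.counts[k] += 1
                self.centroids[k] += (x - self.centroids[k]) / self.counts[k]
                return k
        self.centroids.append(x.astype(float).copy())
        self.counts.append(1)
        return len(self.centroids) - 1

# usage: stream daily consumption profiles one at a time
clusterer = IncrementalClusterer(threshold=2.0)
for profile in np.random.rand(100, 24):
    clusterer.add(profile)
print(len(clusterer.centroids), "clusters after the stream")
```
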
3

NamAnh, Dao. "Segmentation by Incremental Clustering." International Journal of Computer Applications 111, no. 12 (2015): 23–30. http://dx.doi.org/10.5120/19591-1360.

4

Vijaya Saradhi, V., and P. Charly Abraham. "Incremental maximum margin clustering." Pattern Analysis and Applications 19, no. 4 (2015): 1057–67. http://dx.doi.org/10.1007/s10044-015-0447-5.

5

Liu, Yongli, Yuanxin Ouyang, and Zhang Xiong. "Incremental Clustering Using Information Bottleneck Theory." International Journal of Pattern Recognition and Artificial Intelligence 25, no. 5 (2011): 695–712. http://dx.doi.org/10.1142/s0218001411008622.

Abstract:
Document clustering is one of the most effective techniques to organize documents in an unsupervised manner. In this paper, an Incremental method for document Clustering based on Information Bottleneck theory (ICIB) is presented. The ICIB is designed to improve the accuracy and efficiency of document clustering, and resolve the issue that an arbitrary choice of document similarity measure could produce an inaccurate clustering result. In our approach, document similarity is calculated using information bottleneck theory and documents are grouped incrementally. A first document is selected randomly and classified as one cluster, then each remaining document is processed incrementally according to the mutual information loss introduced by the merger of the document and each existing cluster. If the minimum value of mutual information loss is below a certain threshold, the document will be added to its closest cluster; otherwise it will be classified as a new cluster. The incremental clustering process is low-precision and order-dependent, which cannot guarantee accurate clustering results. Therefore, an improved sequential clustering algorithm (SIB) is proposed to adjust the intermediate clustering results. In order to test the effectiveness of ICIB method, ten independent document subsets are constructed based on the 20NewsGroup and Reuters-21578 corpora. Experimental results show that our ICIB method achieves higher accuracy and time performance than K-Means, AIB and SIB algorithms.
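The assignment rule described in this abstract, merge each incoming document into the cluster whose merger costs the least unless that cost exceeds a threshold, in which case start a new cluster, can be sketched as follows. The true cost in ICIB is a mutual-information loss; the Jensen-Shannon divergence between term distributions is used below only as a stand-in, and the threshold value is illustrative.

```python
# Hedged sketch of the incremental assignment rule described in the abstract:
# each new document joins the cluster whose merger costs least, or starts a
# new cluster when that cost exceeds a threshold.  Jensen-Shannon divergence
# between word distributions replaces the paper's mutual-information loss.
import numpy as np
from scipy.spatial.distance import jensenshannon

def incremental_cluster(docs, threshold=0.5):
    """docs: iterable of term-frequency vectors (1-D numpy arrays)."""
    clusters = []                     # each cluster keeps a summed term vector
    labels = []
    for tf in docs:
        p = tf / tf.sum()
        best, best_cost = None, np.inf
        for k, total in enumerate(clusters):
            q = total / total.sum()
            cost = jensenshannon(p, q)        # stand-in for MI loss
            if cost < best_cost:
                best, best_cost = k, cost
        if best is not None and best_cost <= threshold:
            clusters[best] = clusters[best] + tf
            labels.append(best)
        else:
            clusters.append(tf.astype(float))
            labels.append(len(clusters) - 1)
    return labels

# usage with toy term-frequency vectors
rng = np.random.default_rng(0)
toy_docs = [rng.integers(0, 5, size=20).astype(float) + 1e-9 for _ in range(10)]
print(incremental_cluster(toy_docs, threshold=0.4))
```
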
6

Kettani, Omar, and Faical Ramdani. "FICA: Fast Incremental Clustering Algorithm." International Journal of Computer Applications 179, no. 33 (2018): 35–38. http://dx.doi.org/10.5120/ijca2018916747.

7

Pradeep, Lanka, and A. M. Sowjanya. "Multi-Density based Incremental Clustering." International Journal of Computer Applications 116, no. 17 (2015): 6–9. http://dx.doi.org/10.5120/20426-2742.

8

Azzopardi, Joel, and Christopher Staff. "Incremental Clustering of News Reports." Algorithms 5, no. 3 (2012): 364–78. http://dx.doi.org/10.3390/a5030364.

9

Seong, Dong Su, Ho Sung Kim, and Kyu Ho Park. "Incremental clustering of attributed graphs." IEEE Transactions on Systems, Man, and Cybernetics 23, no. 5 (1993): 1399–411. http://dx.doi.org/10.1109/21.260671.

10

Tan, Qingzhao, and Prasenjit Mitra. "Clustering-based incremental web crawling." ACM Transactions on Information Systems 28, no. 4 (2010): 1–27. http://dx.doi.org/10.1145/1852102.1852103.


Dissertations / Theses on the topic "Incremental Clustering"

1

Khy, Sophoin, Yoshiharu Ishikawa, and Hiroyuki Kitagawa. "Novelty-based Incremental Document Clustering for On-line Documents." IEEE, 2006. http://hdl.handle.net/2237/7520.

2

Bigdeli, Elnaz. "Incremental Anomaly Detection Using Two-Layer Cluster-based Structure." Thesis, Université d'Ottawa / University of Ottawa, 2016. http://hdl.handle.net/10393/34299.

Abstract:
Anomaly detection algorithms face several challenges, including processing speed and dealing with noise in data. In this thesis, a two-layer cluster-based anomaly detection structure is presented which is fast, noise-resilient and incremental. In this structure, each normal pattern is considered as a cluster, and each cluster is represented using a Gaussian Mixture Model (GMM). Then, new instances are presented to the GMM to be labeled as normal or abnormal. The proposed structure comprises three main steps. In the first step, the data are clustered. The second step is to represent each cluster in a way that enables the model to classify new instances. The Summarization based on Gaussian Mixture Model (SGMM) proposed in this thesis represents each cluster as a GMM. In the third step, a two-layer structure efficiently updates clusters using the GMM representation while detecting and ignoring redundant instances. A new approach, called Collective Probabilistic Labeling (CPL), is presented to update clusters in a batch mode. This approach makes the updating phase noise-resistant and fast. The collective approach also introduces a new concept called 'rag bag' used to store new instances. The new instances collected in the rag bag are clustered and summarized by GMMs. This enables online systems to identify nearby clusters among the existing and new clusters, and merge them quickly, despite the presence of noise, to update the model. An important step in the updating is the merging of new clusters with existing ones. To this end, a new distance measure is proposed, which is a modified Kullback-Leibler distance between two GMMs. This modified distance allows accurate identification of nearby clusters. After finding neighboring clusters, they are merged quickly and accurately. One of the reasons that GMM is chosen to represent clusters is to have a clear and valid mathematical representation for clusters, which eases further cluster analysis. In most real-time anomaly detection applications, incoming instances are often similar to previous ones. In these cases, there is no need to update clusters based on duplicates, since they have already been modeled in the cluster distribution. The two-layer structure is responsible for identifying redundant instances. In this structure, redundant instances are ignored, and the remaining new instances are used to update clusters. Ignoring redundant instances, which are typically in the majority, makes the detection phase fast. Each part of the general structure is validated in this thesis. The experiments include detection rates, clustering goodness, time, memory usage and the complexity of the algorithms. The accuracy of the clustering and summarization of clusters using GMMs is evaluated and compared to that of other methods. Using the Davies-Bouldin (DB) and Dunn indexes, the distance between original and regenerated clusters using GMMs is almost zero with the SGMM method, while this value for ABACUS is around 0.01. Moreover, the results show that the SGMM algorithm is 3 times faster than ABACUS in running time, using one-third of the memory used by ABACUS. The CPL method, used to label new instances, is found to collectively remove the effect of noise, while increasing the accuracy of labeling new instances. In a noisy environment, the detection rate of the CPL method is 5% higher than other algorithms such as one-class SVM. The false alarm rate is decreased by 10% on average. Memory use is 20 times less than that of the one-class SVM.
The proposed method is found to lower the false alarm rate, which is one of the basic problems for the one-class SVM. Experiments show the false alarm rate is decreased from 5% to 15% among different datasets, while the detection rate is increased from 5% to 10% in different datasets with the two-layer structure. The memory usage of the two-layer structure is 20 to 50 times less than that of the one-class SVM. One-class SVM uses support vectors in labeling new instances, while the labeling of the two-layer structure depends on the number of GMMs. The experiments show that the two-layer structure is 20 to 50 times faster than the one-class SVM in labeling new instances. Moreover, the updating time of the two-layer structure is 2 to 3 times less than that of a one-layer structure. This reduction is the direct result of ignoring redundant instances and using the two-layer structure.
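The labelling step described in this abstract, scoring a new instance against per-cluster Gaussian mixture models and flagging it when no model explains it well, can be sketched briefly. The sketch omits the thesis's SGMM summarization, CPL batch updating and the 'rag bag'; the scikit-learn mixtures, threshold and toy data below are illustrative assumptions, not the author's implementation.

```python
# Simplified sketch of the labelling step described above: each normal
# cluster is represented by a Gaussian mixture, and a new instance is flagged
# as anomalous when no cluster model assigns it a sufficiently high likelihood.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_cluster_models(clusters, components=2):
    """clusters: list of (n_i, d) arrays of normal training data."""
    return [GaussianMixture(n_components=components, random_state=0).fit(c)
            for c in clusters]

def label(x, models, log_density_threshold=-10.0):
    """Return 'normal' if some cluster GMM scores x above the threshold."""
    scores = [m.score_samples(x.reshape(1, -1))[0] for m in models]
    return "normal" if max(scores) >= log_density_threshold else "anomaly"

rng = np.random.default_rng(1)
normal_clusters = [rng.normal(loc=0.0, scale=1.0, size=(200, 3)),
                   rng.normal(loc=5.0, scale=1.0, size=(200, 3))]
models = fit_cluster_models(normal_clusters)
print(label(np.array([0.1, -0.2, 0.3]), models))   # likely 'normal'
print(label(np.array([20.0, 20.0, 20.0]), models)) # likely 'anomaly'
```
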
3

Keysermann, Matthias Ulrich. "An incremental clustering and associative learning architecture for intelligent robotics." Thesis, Heriot-Watt University, 2015. http://hdl.handle.net/10399/2961.

Abstract:
The ability to learn from the environment and memorise the acquired knowledge is essential for robots to become autonomous and versatile artificial companions. This thesis proposes a novel learning and memory architecture for robots, which performs associative learning and recall of sensory and actuator patterns. The approach avoids the inclusion of task-specific expert knowledge and can deal with any kind of multi-dimensional real-valued data, apart from being tolerant to noise and supporting incremental learning. The proposed architecture integrates two machine learning methods: a topology learning algorithm that performs incremental clustering, and an associative memory model that learns relationship information based on the co-occurrence of inputs. The evaluations of both the topology learning algorithm and the associative memory model involved the memorisation of high-dimensional visual data as well as the association of symbolic data, presented simultaneously and sequentially. Moreover, the document analyses the results of two experiments in which the entire architecture was evaluated regarding its associative and incremental learning capabilities. One experiment comprised an incremental learning task with visual patterns and text labels, which was performed both in a simulated scenario and with a real robot. In a second experiment a robot learned to recognise visual patterns in the form of road signs and associated them with different configurations of its arm joints. The thesis also discusses several learning-related aspects of the architecture and highlights strengths and weaknesses of the proposed approach. The developed architecture and corresponding findings contribute to the domains of machine learning and intelligent robotics.
4

Khy, Sophoin, Yoshiharu Ishikawa, and Hiroyuki Kitagawa. "A Novelty-based Clustering Method for On-line Documents." Springer, 2007. http://hdl.handle.net/2237/7739.

5

Heinen, Milton Roberto. "A connectionist approach for incremental function approximation and on-line tasks." Biblioteca Digital de Teses e Dissertações da UFRGS, 2011. http://hdl.handle.net/10183/29015.

Abstract:
This work proposes IGMN (standing for Incremental Gaussian Mixture Network), a new connectionist approach for incremental function approximation and real-time tasks. It is inspired by recent theories about the brain, especially the Memory-Prediction Framework and Constructivist Artificial Intelligence, which endow it with some unique features that are not present in most ANN models such as MLP, RBF and GRNN. Moreover, IGMN is based on strong statistical principles (Gaussian mixture models) and asymptotically converges to the optimal regression surface as more training data arrive. The main advantages of IGMN over other ANN models are: (i) IGMN learns incrementally using a single scan over the training data (each training pattern can be immediately used and discarded); (ii) it can produce reasonable estimates based on few training data; (iii) the learning process can proceed perpetually as new training data arrive (there are no separate phases for learning and recalling); (iv) IGMN can handle the stability-plasticity dilemma and does not suffer from catastrophic interference; (v) the neural network topology is defined automatically and incrementally (new units are added whenever necessary); (vi) IGMN is not sensitive to initialization conditions (in fact there is no random initialization/decision in IGMN); (vii) the same neural network can be used to solve both forward and inverse problems (the information flow is bidirectional) even in regions where the target data are multi-valued; and (viii) IGMN can provide the confidence levels of its estimates. Another relevant contribution of this thesis is the use of IGMN in some important state-of-the-art machine learning and robotic tasks such as model identification, incremental concept formation, reinforcement learning, robotic mapping and time series prediction. In fact, the efficiency of IGMN and its representational power expand the set of potential tasks in which neural networks can be applied, thus opening new research directions in which important contributions can be made. Through several experiments using the proposed model it is demonstrated that IGMN is also robust to overfitting, does not require fine-tuning of its configuration parameters and has very good computational performance, thus allowing its use in real-time control applications. Therefore, IGMN is a very useful machine learning tool for incremental function approximation and on-line prediction.
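The single-pass learning behaviour described in this abstract, where each sample either updates the Gaussian component that best explains it or spawns a new component, can be roughly sketched as below. This is not the published IGMN update rule: the novelty test, diagonal covariances and parameter values are simplified, illustrative choices.

```python
# Minimal single-pass sketch of an incremental Gaussian mixture: each sample
# either updates the closest component or spawns a new one when none is close
# enough (a simplified novelty test; the full IGMN rules and its regression
# mode are not reproduced here, and tau/init_sigma are illustrative).
import numpy as np

class IncrementalGaussianMixture:
    def __init__(self, dim, tau=3.0, init_sigma=1.0):
        self.tau = tau                       # novelty threshold (in std units)
        self.init_var = init_sigma ** 2
        self.means, self.vars, self.counts = [], [], []
        self.dim = dim

    def learn(self, x):
        x = np.asarray(x, dtype=float)
        if self.means:
            # squared Mahalanobis distance with diagonal covariance
            d2 = [np.sum((x - m) ** 2 / v) for m, v in zip(self.means, self.vars)]
            j = int(np.argmin(d2))
            if d2[j] <= self.tau ** 2 * self.dim:
                self.counts[j] += 1
                n = self.counts[j]
                delta = x - self.means[j]
                self.means[j] += delta / n            # incremental mean update
                self.vars[j] += (delta * (x - self.means[j]) - self.vars[j]) / n
                self.vars[j] = np.maximum(self.vars[j], 1e-6)
                return j
        self.means.append(x.copy())
        self.vars.append(np.full(self.dim, self.init_var))
        self.counts.append(1)
        return len(self.means) - 1

# usage: learn from a stream in a single pass, with no separate training phase
igm = IncrementalGaussianMixture(dim=2)
stream = np.vstack([np.random.normal(0, 1, (100, 2)),
                    np.random.normal(8, 1, (100, 2))])
for sample in stream:
    igm.learn(sample)
print(len(igm.means), "components")
```
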
6

Li, Yanrong. "Techniques for improving clustering and association rules mining from very large transactional databases." Thesis, Curtin University, 2009. http://hdl.handle.net/20.500.11937/907.

Abstract:
Clustering and association rules mining are two core data mining tasks that have been actively studied by the data mining community for nearly two decades. Though many clustering and association rules mining algorithms have been developed, no algorithm is better than the others on all aspects, such as accuracy, efficiency, scalability, adaptability and memory usage. While more efficient and effective algorithms need to be developed for handling large-scale and complex stored datasets, emerging applications where data takes the form of streams pose new challenges for the data mining community. The existing techniques and algorithms for static stored databases cannot be applied to data streams directly. They need to be extended or modified, or new methods need to be developed to process the data streams. In this thesis, algorithms have been developed for improving the efficiency and accuracy of clustering and association rules mining on very large, high-dimensional, high-cardinality, sparse transactional databases and data streams. A new similarity measure suitable for clustering transactional data is defined and an incremental clustering algorithm, INCLUS, is proposed using this similarity measure. The algorithm only scans the database once and produces clusters based on the user's expectations of similarities between transactions in a cluster, which is controlled by the user input parameters: a similarity threshold and a support threshold. Intensive testing has been performed to evaluate the effectiveness, efficiency, scalability and order insensitiveness of the algorithm. To extend INCLUS to transactional data streams, an equal-width time window model and an elastic time window model are proposed that allow mining of clustering changes in evolving data streams. The minimal width of the window is determined by the minimum clustering granularity for a particular application. Two algorithms, CluStream_EQ and CluStream_EL, based on the equal-width window model and the elastic window model respectively, are developed by incorporating these models into INCLUS. Each algorithm consists of an online micro-clustering component and an offline macro-clustering component. The online component writes summary statistics of a data stream to the disk, and the offline component uses those summaries and other user input to discover changes in the data stream. The effectiveness and scalability of the algorithms are evaluated by experiments. This thesis also looks into sampling techniques that can improve the efficiency of mining association rules in a very large transactional database. The sample size is derived based on the binomial distribution and the central limit theorem. The sample size used is smaller than that based on Chernoff bounds, but still provides the same approximation guarantees. The accuracy of the proposed sampling approach is theoretically analyzed and its effectiveness is experimentally evaluated on both dense and sparse datasets. The application of stratified sampling to association rules mining is also explored in this thesis. The database is first partitioned into strata based on the length of transactions, and simple random sampling is then performed on each stratum. The total sample size is determined by a formula derived in this thesis and the sample size for each stratum is proportionate to the size of the stratum. The accuracy of transaction-size-based stratified sampling is experimentally compared with that of random sampling. The thesis concludes with a summary of significant contributions and some pointers for further work.
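The single-scan behaviour attributed to INCLUS above, where a transaction joins an existing cluster only if it is similar enough and otherwise opens a new cluster, can be sketched as follows. The thesis defines its own similarity measure for transactional data; the Jaccard-style comparison against each cluster's frequent items and the two threshold values below are stand-ins, not the thesis's definitions.

```python
# Hedged sketch of a single-scan clustering of transactions in the spirit of
# the description above: a transaction joins an existing cluster when it is
# similar enough to that cluster's frequently occurring items, otherwise it
# opens a new cluster.  Jaccard similarity stands in for the INCLUS measure.
from collections import Counter

def cluster_transactions(transactions, sim_threshold=0.4, support_threshold=0.5):
    clusters = []          # each: {"items": Counter, "size": int}
    labels = []
    for t in map(set, transactions):
        best, best_sim = None, 0.0
        for k, c in enumerate(clusters):
            frequent = {i for i, n in c["items"].items()
                        if n / c["size"] >= support_threshold}
            union = t | frequent
            sim = len(t & frequent) / len(union) if union else 0.0
            if sim > best_sim:
                best, best_sim = k, sim
        if best is not None and best_sim >= sim_threshold:
            clusters[best]["items"].update(t)
            clusters[best]["size"] += 1
            labels.append(best)
        else:
            clusters.append({"items": Counter(t), "size": 1})
            labels.append(len(clusters) - 1)
    return labels

baskets = [{"milk", "bread"}, {"milk", "bread", "eggs"},
           {"beer", "chips"}, {"beer", "chips", "salsa"}]
print(cluster_transactions(baskets))   # e.g. [0, 0, 1, 1]
```
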
7

Mitchell, Logan Adam. "INCREMENT - Interactive Cluster Refinement." BYU ScholarsArchive, 2016. https://scholarsarchive.byu.edu/etd/5795.

Abstract:
We present INCREMENT, a cluster refinement algorithm which utilizes user feedback to refine clusterings. INCREMENT is capable of improving clusterings produced by arbitrary clustering algorithms. The initial clustering provided is first sub-clustered to improve query efficiency. A small set of select instances from each of these sub-clusters are presented to a user for labelling. Utilizing the user feedback, INCREMENT trains a feature embedder to map the input features to a new feature space. This space is learned such that spatial distance is inversely correlated with semantic similarity, determined from the user feedback. A final clustering is then formed in the embedded space. INCREMENT is tested on 9 datasets initially clustered with 4 distinct clustering algorithms. INCREMENT improved the accuracy of 71% of the initial clusterings with respect to a target clustering. For all the experiments the median percent improvement is 27.3% for V-Measure and is 6.08% for accuracy.
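The refinement loop summarized above, sub-cluster the initial clustering, query a user about representative instances, and regroup using the feedback, can be caricatured in a few lines. The sketch below skips the feature-embedding step that INCREMENT learns from the feedback and simply merges sub-clusters whose representatives receive the same user label; the ask_user callback and toy data are hypothetical.

```python
# Greatly simplified sketch of the refinement loop described above: query a
# user about one representative (medoid) per sub-cluster and merge sub-clusters
# whose representatives get the same label.  INCREMENT additionally learns a
# feature embedding from the feedback before re-clustering; that step is omitted.
import numpy as np

def medoid(points):
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    return points[np.argmin(d.sum(axis=1))]

def refine(points, sub_labels, ask_user):
    """points: (n, d) array; sub_labels: initial sub-cluster id per point."""
    reps = {k: medoid(points[sub_labels == k]) for k in np.unique(sub_labels)}
    user_label = {k: ask_user(rep) for k, rep in reps.items()}   # user feedback
    # map every sub-cluster to the user-provided concept it belongs to
    return np.array([user_label[k] for k in sub_labels])

# usage with a simulated user who labels by which side of x=2.5 a point lies on
rng = np.random.default_rng(2)
pts = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(5, 0.5, (20, 2))])
initial = np.array([0] * 10 + [1] * 10 + [2] * 10 + [3] * 10)  # over-segmented
refined = refine(pts, initial, ask_user=lambda p: int(p[0] > 2.5))
print(np.unique(refined))   # the four sub-clusters collapse to two concepts
```
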
8

Moscoso Sotelo, Kevin Vincent John, and Oscar Manuel Torre Saenz. "Diseño de aplicación de realidad virtual para la promoción del turismo e incremento de la intención de visita de turistas a Perú." Bachelor's thesis, Universidad Peruana de Ciencias Aplicadas (UPC), 2020. http://hdl.handle.net/10757/653957.

Abstract:
In this project we aim to design a serious virtual reality game to help disseminate the history and heritage of Perú and, in this way, support the growth of tourism. To achieve this, we want to motivate and encourage potential tourists to visit Perú through a game that will be developed for smartphones and will require the use of virtual reality headsets. Inside the game, the player will be able to visit 3 tourist destinations where they will have to collect the pieces of a puzzle before time runs out, and they will be rewarded with a video, images and/or an interesting fact about the destination. The game was designed to take advantage of the immersion offered by virtual reality, gamification techniques and the exposure of users to information about the tourist places presented in an attractive and entertaining way. Additionally, the game will have a system that recommends the best tourist place for a specific player using the clustering technique from machine learning. At the end of the game, the player will be asked to fill out a questionnaire to determine whether their intention to visit has increased or remained the same. Finally, it was confirmed with the support of several papers that virtual reality and learning about new tourist destinations increase the desire to visit and get to know these places.
9

Huang, Chiao-Wei (黃僑偉). "Incremental Clustering Malware from Honeypots." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/94716365525085078382.

Abstract:
Master's thesis, National Sun Yat-sen University, Department of Information Management, academic year 101 (2012–2013).
In recent years, cybercriminals have used new malware or variants of existing malware to effectively evade inspection by security mechanisms. Honeypots are able to capture the malware that cybercriminals are using. With the increasing number of malware samples captured from honeypots, if IT security staff cannot distinguish old, variant and new malware for further analysis, government organizations and enterprises cannot quickly defend against new types of attack. Although many scholars have proposed approaches for analyzing malware, most of them focus on a single file type, which is not suitable for honeypot malware that is mostly a mix of source code and binary files. An effective and quick analysis tool for honeypot malware is therefore still lacking. We propose a honeypot malware analysis system that combines source files and binary files. We use the syntax structure of source code files, the image vectors of binary files, file names and file structure as features to measure malware similarity. We adopt incremental clustering as our clustering algorithm to quickly separate known malware from new types of malware. Several experimental evaluations show that our system can effectively and quickly cluster honeypot malware. Finally, we compare its performance with VirusTotal and other research, and the results confirm that our system achieves better clustering efficiency.
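The clustering step described in this abstract, comparing each captured sample to existing clusters through a combination of per-feature similarities and opening a new cluster when nothing matches, can be sketched as follows. The concrete features, weights and threshold used in the thesis are not specified here, so every name and value in the sketch is an illustrative stand-in.

```python
# Illustrative sketch of the incremental assignment idea described above:
# each captured sample is compared to existing cluster representatives using
# a weighted combination of per-feature similarities and either joins the
# most similar cluster or becomes a new one.  Features, weights and the
# threshold are stand-ins, not the thesis's actual definitions.
def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def combined_similarity(sample, rep, weights=(0.5, 0.3, 0.2)):
    w_code, w_struct, w_name = weights
    return (w_code * jaccard(sample["code_tokens"], rep["code_tokens"])
            + w_struct * jaccard(sample["files"], rep["files"])
            + w_name * jaccard(sample["name_ngrams"], rep["name_ngrams"]))

def incremental_malware_clustering(samples, threshold=0.6):
    clusters, labels = [], []        # clusters store one representative each
    for s in samples:
        sims = [combined_similarity(s, rep) for rep in clusters]
        if sims and max(sims) >= threshold:
            labels.append(sims.index(max(sims)))    # known family / variant
        else:
            clusters.append(s)                      # previously unseen family
            labels.append(len(clusters) - 1)
    return labels

sample_a = {"code_tokens": {"eval", "base64_decode"}, "files": {"bot.php"},
            "name_ngrams": {"bot", "ot."}}
sample_b = {"code_tokens": {"eval", "base64_decode", "fsockopen"},
            "files": {"bot.php", "conf.php"}, "name_ngrams": {"bot", "ot."}}
print(incremental_malware_clustering([sample_a, sample_b]))
```
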
10

"Incremental document clustering for web page classification." 2000. http://library.cuhk.edu.hk/record=b5890417.

Abstract:
by Wong, Wai-Chiu.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2000.
Includes bibliographical references (leaves 89-94).
Abstracts in English and Chinese.
The record's abstract field reproduces the thesis table of contents rather than a prose summary. The chapters cover an introduction to document clustering, the DC-tree and feature extraction; related work on partitional, hierarchical and example-based classification and clustering methods; background on document preprocessing (stopword elimination and stemming), the vector model, feature selection, similarity and evaluation; feature extraction and weighting for web documents, including Zipf's law, a proposed weighting method and experiments on synthetic and real data; web document clustering using the DC-tree (document cluster definition, tree definition, insertion, node splitting, deletion and merging, and the overall strategy), with experiments comparing the DC-tree against a B+-tree-based approach on synthetic and real web page collections and on incremental updates; and a conclusion, with appendices on the stopword list, Porter's stemming algorithm, and the insertion and node-splitting algorithms.

Books on the topic "Incremental Clustering"

1

Chakraborty, Sanjay, Sk Hafizul Islam, and Debabrata Samanta. Data Classification and Incremental Clustering in Data Mining and Machine Learning. Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-93088-2.

2

Data Classification and Incremental Clustering in Data Mining and Machine Learning. Springer International Publishing AG, 2022.


Book chapters on the topic "Incremental Clustering"

1

Bagirov, Adil M., Napsu Karmitsa, and Sona Taheri. "Incremental Clustering Algorithms." In Unsupervised and Semi-Supervised Learning. Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-37826-4_7.

2

Bouchachia, Abdelhamid, and Markus Prossegger. "Incremental Spectral Clustering." In Learning in Non-Stationary Environments. Springer New York, 2012. http://dx.doi.org/10.1007/978-1-4419-8020-5_4.

3

Li, Zhenhui, Jae-Gil Lee, Xiaolei Li, and Jiawei Han. "Incremental Clustering for Trajectories." In Database Systems for Advanced Applications. Springer Berlin Heidelberg, 2010. http://dx.doi.org/10.1007/978-3-642-12098-5_3.

4

He, Ping, Tianyu Jing, Xiaohua Xu, Huihui Lin, Zheng Liao, and Baichuan Fan. "Incremental Constrained Random Walk Clustering." In Advances in Intelligent Systems and Computing. Springer Singapore, 2018. http://dx.doi.org/10.1007/978-981-13-0344-9_21.

5

Hennig, Sascha, and Michael Wurst. "Incremental Clustering of Newsgroup Articles." In Advances in Applied Artificial Intelligence. Springer Berlin Heidelberg, 2006. http://dx.doi.org/10.1007/11779568_37.

6

Patino, Luis, François Bremond, and Monique Thonnat. "Incremental Learning on Trajectory Clustering." In Innovations in Defence Support Systems – 3. Springer Berlin Heidelberg, 2011. http://dx.doi.org/10.1007/978-3-642-18278-5_3.

7

Chakraborty, Sanjay, SK Hafizul Islam, and Debabrata Samanta. "Research Intention Towards Incremental Clustering." In Data Classification and Incremental Clustering in Data Mining and Machine Learning. Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-93088-2_5.

8

Bresson, Xavier, Huiyi Hu, Thomas Laurent, Arthur Szlam, and James von Brecht. "An Incremental Reseeding Strategy for Clustering." In Mathematics and Visualization. Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-91274-5_9.

9

Lin, Jessica, Michail Vlachos, Eamonn Keogh, and Dimitrios Gunopulos. "Iterative Incremental Clustering of Time Series." In Advances in Database Technology - EDBT 2004. Springer Berlin Heidelberg, 2004. http://dx.doi.org/10.1007/978-3-540-24741-8_8.

10

Liu, Bo, Jiuhui Pan, and R. I. (Bob) McKay. "Incremental Clustering Based on Swarm Intelligence." In Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2006. http://dx.doi.org/10.1007/11903697_25.


Conference papers on the topic "Incremental Clustering"

1

Narita, Kakeru, Teruhisa Hochin, and Hiroki Nomiya. "Incremental Clustering for Hierarchical Clustering." In 2018 5th International Conference on Computational Science/Intelligence and Applied Informatics (CSII). IEEE, 2018. http://dx.doi.org/10.1109/csii.2018.00025.

2

Chemchem, A., Y. Djenouri, and H. Drias. "Incremental induction rules clustering." In 2013 8th International Workshop on Systems, Signal Processing and their Applications (WoSSPA). IEEE, 2013. http://dx.doi.org/10.1109/wosspa.2013.6602413.

3

Wang, Chang-Dong, Jian-Huang Lai, and Dong Huang. "Incremental Support Vector Clustering." In 2011 IEEE International Conference on Data Mining Workshops (ICDMW). IEEE, 2011. http://dx.doi.org/10.1109/icdmw.2011.100.

4

Liu, Yongli, Qianqian Guo, Lishen Yang, and Yingying Li. "Research on incremental clustering." In 2012 2nd International Conference on Consumer Electronics, Communications and Networks (CECNet). IEEE, 2012. http://dx.doi.org/10.1109/cecnet.2012.6202079.

5

Bettoumi, Safa. "Incremental Multi-view Clustering." In 2022 2nd International Conference on Computers and Automation (CompAuto). IEEE, 2022. http://dx.doi.org/10.1109/compauto55930.2022.00033.

6

Davidson, Ian, S. S. Ravi, and Martin Ester. "Efficient incremental constrained clustering." In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM Press, 2007. http://dx.doi.org/10.1145/1281192.1281221.

7

Elnekave, Sigal, Mark Last, and Oded Maimon. "Incremental Clustering of Mobile Objects." In 2007 IEEE 23rd International Conference on Data Engineering Workshop. IEEE, 2007. http://dx.doi.org/10.1109/icdew.2007.4401044.

8

Nentwig, Markus, and Erhard Rahm. "Incremental Clustering on Linked Data." In 2018 IEEE International Conference on Data Mining Workshops (ICDMW). IEEE, 2018. http://dx.doi.org/10.1109/icdmw.2018.00084.

9

Aaron, Bryant, Dan E. Tamir, Naphtali D. Rishe, and Abraham Kandel. "Dynamic Incremental K-means Clustering." In 2014 International Conference on Computational Science and Computational Intelligence (CSCI). IEEE, 2014. http://dx.doi.org/10.1109/csci.2014.60.

10

John, Johney, and S. Asharaf. "Incremental multi-document summarization: An incremental clustering based approach." In 2014 International Conference on Data Science & Engineering (ICDSE). IEEE, 2014. http://dx.doi.org/10.1109/icdse.2014.6974625.


Reports on the topic "Incremental Clustering"

1

Fraley, Chris, Adrian Raftery, and Ron Wehrens. Incremental Model-Based Clustering for Large Datasets With Small Clusters. Defense Technical Information Center, 2003. http://dx.doi.org/10.21236/ada459790.

2

Engel, Bernard, Yael Edan, James Simon, Hanoch Pasternak, and Shimon Edelman. Neural Networks for Quality Sorting of Agricultural Produce. United States Department of Agriculture, 1996. http://dx.doi.org/10.32747/1996.7613033.bard.

Abstract:
The objectives of this project were to develop procedures and models, based on neural networks, for quality sorting of agricultural produce. Two research teams, one in Purdue University and the other in Israel, coordinated their research efforts on different aspects of each objective utilizing both melons and tomatoes as case studies. At Purdue: An expert system was developed to measure variances in human grading. Data were acquired from eight sensors: vision, two firmness sensors (destructive and nondestructive), chlorophyll from fluorescence, color sensor, electronic sniffer for odor detection, refractometer and a scale (mass). Data were analyzed and provided input for five classification models. Chlorophyll from fluorescence was found to give the best estimation for ripeness stage while the combination of machine vision and firmness from impact performed best for quality sorting. A new algorithm was developed to estimate and minimize training size for supervised classification. A new criteria was established to choose a training set such that a recurrent auto-associative memory neural network is stabilized. Moreover, this method provides for rapid and accurate updating of the classifier over growing seasons, production environments and cultivars. Different classification approaches (parametric and non-parametric) for grading were examined. Statistical methods were found to be as accurate as neural networks in grading. Classification models by voting did not enhance the classification significantly. A hybrid model that incorporated heuristic rules and either a numerical classifier or neural network was found to be superior in classification accuracy with half the required processing of solely the numerical classifier or neural network. In Israel: A multi-sensing approach utilizing non-destructive sensors was developed. Shape, color, stem identification, surface defects and bruises were measured using a color image processing system. Flavor parameters (sugar, acidity, volatiles) and ripeness were measured using a near-infrared system and an electronic sniffer. Mechanical properties were measured using three sensors: drop impact, resonance frequency and cyclic deformation. Classification algorithms for quality sorting of fruit based on multi-sensory data were developed and implemented. The algorithms included a dynamic artificial neural network, a back propagation neural network and multiple linear regression. Results indicated that classification based on multiple sensors may be applied in real-time sorting and can improve overall classification. Advanced image processing algorithms were developed for shape determination, bruise and stem identification and general color and color homogeneity. An unsupervised method was developed to extract necessary vision features. The primary advantage of the algorithms developed is their ability to learn to determine the visual quality of almost any fruit or vegetable with no need for specific modification and no a-priori knowledge. Moreover, since there is no assumption as to the type of blemish to be characterized, the algorithm is capable of distinguishing between stems and bruises. This enables sorting of fruit without knowing the fruits' orientation. A new algorithm for on-line clustering of data was developed. The algorithm's adaptability is designed to overcome some of the difficulties encountered when incrementally clustering sparse data and preserves information even with memory constraints. 
Large quantities of data (many images) of high dimensionality (due to multiple sensors) and new information arriving incrementally (a function of the temporal dynamics of any natural process) can now be processed. Furthermore, since the learning is done on-line, it can be implemented in real-time. The methodology developed was tested to determine external quality of tomatoes based on visual information. An improved model for color sorting which is stable and does not require recalibration for each season was developed for color determination. Excellent classification results were obtained for both color and firmness classification. Results indicated that maturity classification can be obtained using a drop-impact and a vision sensor in order to predict the storability and marketing of harvested fruits. In conclusion: We have been able to define quantitatively the critical parameters in the quality sorting and grading of both fresh market cantaloupes and tomatoes. We have been able to accomplish this using nondestructive measurements and in a manner consistent with expert human grading and in accordance with market acceptance. This research constructed and used large databases of both commodities, for comparative evaluation and optimization of expert system, statistical and/or neural network models. The models developed in this research were successfully tested, and should be applicable to a wide range of other fruits and vegetables. These findings are valuable for the development of on-line grading and sorting of agricultural produce through the incorporation of multiple measurement inputs that rapidly define quality in an automated manner, and in a manner consistent with the human graders and inspectors.
