Dissertations / Theses: 'Graph classification'

1

Wainer, L. J. "Online graph-based learning for classification." Thesis, University College London (University of London), 2008. http://discovery.ucl.ac.uk/1446151/.

Abstract:

The aim of this thesis is to develop online kernel-based algorithms for learning classification functions over a graph. An important question in machine learning is how to learn functions in high dimensions. One of the benefits of using a graphical representation of data is that it can provide a dimensionality reduction of the data to the number of nodes plus edges in the graph. Graphs are useful discrete representations of data that have already been used successfully to incorporate structural information in data to aid semi-supervised learning techniques. In this thesis, an online learning framework is used to provide guarantees on the performance of the algorithms developed. The first step in developing these algorithms required motivating the idea of a "natural" kernel defined on a graph. This natural kernel turns out to be the Laplacian operator associated with the graph. The next step was to look at a well-known online algorithm (the perceptron algorithm) with its associated bound, and formulate it for online learning with this kernel. This was a matter of using the Laplacian kernel with the kernel perceptron algorithm. For a binary classification problem, the bound on the performance of this algorithm can be interpreted in terms of natural properties of the graph, such as the graph diameter. Further algorithms were developed, motivated by the idea of a series of alternate projections, which also share this bound interpretation. The minimum norm interpolation algorithm was developed in batch mode and then transformed into an online algorithm. These algorithms were tested and compared with other proposed algorithms on toy and real data sets. The main comparison algorithm used was k-nearest neighbour along the graph. Once the kernel has been calculated, the new algorithms perform well and offer some advantages over other approaches in terms of computational complexity.
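
The online kernel perceptron mentioned in this abstract can be sketched as follows. This is only an illustration, not the thesis's implementation: the function name, the tie-breaking rule, and the small Gram matrix are assumptions made for the example. In the thesis's setting the kernel would be derived from the graph Laplacian rather than written by hand.

```python
def online_kernel_perceptron(gram, labels, order):
    """Run one online pass of the kernel perceptron over the nodes in `order`.

    gram[i][j] is the kernel value between nodes i and j; labels are +1 or -1.
    Returns the dual coefficients and the number of mistakes made.
    """
    alpha = [0.0] * len(labels)
    mistakes = 0
    for i in order:
        # Score of node i under the current dual coefficients.
        score = sum(a * gram[j][i] for j, a in enumerate(alpha) if a)
        prediction = 1 if score > 0 else -1  # assumption: ties are predicted as -1
        if prediction != labels[i]:
            # Perceptron update: only mistakes change the hypothesis.
            alpha[i] += labels[i]
            mistakes += 1
    return alpha, mistakes


# Hypothetical positive semi-definite Gram matrix for four nodes forming two tight pairs.
GRAM = [
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
]
LABELS = [1, 1, -1, -1]

alpha, mistakes = online_kernel_perceptron(GRAM, LABELS, order=[0, 1, 2, 3])
print(alpha, mistakes)
```

The mistake count returned by the function is the quantity that perceptron-style bounds control, which is why the sketch tracks it explicitly.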

2

Saldanha, Richard A. "Graph-theoretic methods in discrimination and classification." Thesis, University of Oxford, 1998. https://ora.ox.ac.uk/objects/uuid:3a06dee1-00e9-4b56-be8e-e991a570ced6.

Abstract:

This thesis is concerned with the graphical modelling of multivariate data. The main aim of graphical modelling is to provide an easy-to-understand visual representation of often complex data relationships by fitting graphs to data. The graphs consist of nodes denoting random variables, and connecting lines or edges are used to depict variable dependencies. Equivalently, the absence of particular edges in a graph describes conditional independencies between random variables. The resulting structure is called a conditional independence graph. The use of conditional independence graphs as a guide to discrete (mainly binary), normal and mixed conditional Gaussian model building is described. The problem of parameter estimation in fitting conditional Gaussian models is considered. A FORTRAN 77 program called CGM is developed and used to fit conditional Gaussian models. Submodel specification, model selection criteria and goodness-of-fit are explored. A procedure for discriminating between groups is constructed using fitted conditional Gaussian models. A Bayesian classification procedure is considered and is used to compute posterior classification probabilities. Standard bias-correcting error rates are used to test the performance of estimated classification rules. The graph-theoretic methodology described in this thesis is applied to a Scandinavian study of intrauterine foetal growth retardation, also known as small-for-gestational-age (SGA) birth. Possible pre-pregnancy risk factors associated with SGA births are investigated using conditional independence graphs, and an attempt is made to classify SGA births using fitted conditional Gaussian models.

3

Ketkar, Nikhil S. "Empirical comparison of graph classification and regression algorithms." Pullman, Wash. : Washington State University, 2009. http://www.dissertations.wsu.edu/Dissertations/Spring2009/n_ketkar_042409.pdf.

Abstract:

Thesis (Ph. D.)--Washington State University, May 2009.
Title from PDF title page (viewed on June 3, 2009). "School of Electrical Engineering and Computer Science." Includes bibliographical references (p. 101-108).

4

Ferrer, Sumsi Miquel. "Theory and Algorithms on the Median Graph. Application to Graph-based Classification and Clustering." Doctoral thesis, Universitat Autònoma de Barcelona, 2008. http://hdl.handle.net/10803/5788.

Abstract:

Given a set of objects, the generic concept of median is defined as the object with the smallest sum of distances to all the objects in the set. It has often been used as a good way to obtain a representative of the set.
In structural pattern recognition, graphs are normally used to represent structured objects. In the graph domain, the concept analogous to the median is known as the median graph. By extension, it has the same potential applications as the generic median and can be used as the representative of a set of graphs.
Despite its simple definition and potential applications, its computation has been shown to be an extremely complex task. All the existing algorithms can only deal with small sets of graphs, and its application has been constrained in most cases to the use of synthetic data with no real meaning. Thus, it has remained a mainly theoretical concept.
The main objective of this work is to further investigate both the theory and the algorithms underlying the concept of the median graph, with the final objective of extending its applicability and bringing all its potential to the world of real applications. To this end, new theory and new algorithms for its computation are reported. From a theoretical point of view, this thesis makes two main contributions. On the one hand, it introduces the new concept of the spectral median graph. On the other hand, it shows that some of the existing theoretical properties of the median graph can be improved under some specific conditions. In addition to these theoretical contributions, we propose five new ways to compute the median graph. One of them is a direct consequence of the spectral median graph concept. In addition, we provide two new algorithms based on the new theoretical properties. Finally, we present a novel technique for the median graph computation based on graph embedding into vector spaces. With this technique two more new algorithms are presented.
The experimental evaluation of the proposed methods on one semi-artificial and two real-world datasets, representing graphical symbols, molecules and webpages, shows that these methods are much more efficient than the existing ones. In addition, we have been able to show for the first time that the median graph can be a good representative of a class in large datasets. We have performed classification and clustering experiments that validate this hypothesis and suggest a successful application of the median graph to a variety of machine learning algorithms.
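
The definition of the median given in this abstract (the object with the smallest sum of distances to all objects in the set) can be illustrated with a short sketch. It makes two simplifying assumptions that are not taken from the thesis: candidates are restricted to the given set itself (usually called the set median, as opposed to a median searched over all possible graphs), and the distance is a hypothetical one that counts edges present in exactly one of two graphs over shared vertex labels. The names `edge_distance`, `set_median`, and `edges` are made up for this example.

```python
def edge_distance(g1, g2):
    """Hypothetical distance: number of edges present in exactly one graph."""
    return len(g1 ^ g2)


def set_median(graphs, distance=edge_distance):
    """Return the graph in `graphs` with the smallest sum of distances to the set."""
    return min(graphs, key=lambda g: sum(distance(g, h) for h in graphs))


def edges(*pairs):
    """Build an undirected graph as a frozenset of two-element frozensets."""
    return frozenset(frozenset(p) for p in pairs)


TRIANGLE = edges(("a", "b"), ("b", "c"), ("a", "c"))
PATH = edges(("a", "b"), ("b", "c"))
SINGLE = edges(("a", "b"))

# Sums of distances: TRIANGLE 1 + 2 = 3, PATH 1 + 1 = 2, SINGLE 2 + 1 = 3.
median = set_median([TRIANGLE, PATH, SINGLE])
print(median == PATH)
```

Restricting the search to the input set keeps the cost at a quadratic number of distance evaluations, which sidesteps the hard search problem the abstract describes at the price of a possibly worse representative.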

5

Childs, Liam, Zoran Nikoloski, Patrick May, and Dirk Walther. "Identification and classification of ncRNA molecules using graph properties." Universität Potsdam, 2009. http://opus.kobv.de/ubp/volltexte/2010/4519/.

Abstract:

The study of non-coding RNA genes has received increased attention in recent years fuelled by accumulating evidence that larger portions of genomes than previously acknowledged are transcribed into RNA molecules of mostly unknown function, as well as the discovery of novel non-coding RNA types and functional RNA elements. Here, we demonstrate that specific properties of graphs that represent the predicted RNA secondary structure reflect functional information. We introduce a computational algorithm and an associated web-based tool (GraPPLE) for classifying non-coding RNA molecules as functional and, furthermore, into Rfam families based on their graph properties. Unlike sequence-similarity-based methods and covariance models, GraPPLE is demonstrated to be more robust with regard to increasing sequence divergence, and when combined with existing methods, leads to a significant improvement of prediction accuracy. Furthermore, graph properties identified as most informative are shown to provide an understanding as to what particular structural features render RNA molecules functional. Thus, GraPPLE may offer a valuable computational filtering tool to identify potentially interesting RNA molecules among large candidate datasets.

6

Ersahin, Kaan. "Segmentation and classification of polarimetric SAR data using spectral graph partitioning." Thesis, University of British Columbia, 2009. http://hdl.handle.net/2429/14607.

Abstract:

Polarimetric Synthetic Aperture Radar (POLSAR) data have been commercially available for the last few years, which has increased demand for their operational use in remote sensing applications. Segmentation and classification of image data are important tasks for POLSAR data analysis and interpretation, and they often require human interaction. Existing strategies for automated POLSAR data analysis have utilized the polarimetric attributes of pixels, which involve target decompositions based on physical, mathematical or statistical models. A well-established and widely used technique is the Wishart classifier, which is used as the benchmark in this work. In this thesis, a new methodology is used that exploits both the polarimetric attributes of pixels and the visual aspect of the image data through computer vision principles. In this process, the performance level of humans is desired, and several features or cues inspired by perceptual organization are utilized, namely patch-based similarity of intensity, contour, spatial proximity, and the polarimetric cue. The pair-wise grouping technique of Spectral Graph Partitioning (SGP) is employed to perform the segmentation and classification tasks based on graph cuts. A new classification algorithm is developed for POLSAR data, where segmentation based on the contour and spatial proximity cues is followed by classification based on the polarimetric cue (i.e., similarity of coherency matrices). It offers a way to utilize the complete polarimetric information through the coherency matrix representation in the SGP framework. The proposed unsupervised technique aims to automate the data analysis process for the mapping of distributed targets. Two fully polarimetric data sets in L- and C-bands acquired by AIRSAR and the Convair-580, both containing agricultural fields, were used to obtain the experimental results and analysis. The results suggest quantitative and qualitative improvements over the Wishart classifier.
This method is suitable for applications where homogeneity within each separated region is desirable, such as mapping crops or other types of terrain. The SGP methodology used in the developed scheme is flexible in the definition of affinity functions and will likely allow further improvements through the addition of different image features and data sources.

7

Lee, Zed Heeje. "A graph representation of event intervals for efficient clustering and classification." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-281947.

Abstract:

Sequences of event intervals occur in several application domains, but their inherent complexity hinders scalable solutions to tasks such as clustering and classification. In this thesis, we propose a novel spectral embedding representation of event interval sequences that relies on bipartite graphs. More concretely, each event interval sequence is represented by a bipartite graph by following three main steps: (1) creating a hash table that can quickly convert a collection of event interval sequences into a bipartite graph representation, (2) creating and regularizing a bi-adjacency matrix corresponding to the bipartite graph, (3) defining a spectral embedding mapping on the bi-adjacency matrix. In addition, we show that substantial improvements can be achieved with regard to classification performance through pruning parameters that capture the nature of the relations formed by the event intervals. We demonstrate through extensive experimental evaluation on five real-world datasets that our approach can obtain runtime speedups of up to two orders of magnitude compared to other state-of-the-art methods, with similar or better clustering and classification performance.

8

Wu, Jindong. "Pooling strategies for graph convolution neural networks and their effect on classification." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-288953.

Abstract:

With the development of graph neural networks, this new kind of neural network has been applied in an increasingly broad range of fields. One of the thorny problems researchers face in this field is selecting a suitable pooling method for a specific research task from the various existing pooling methods. In this work, based on the existing mainstream graph pooling methods, we develop a benchmark neural network framework that can be used to compare these different graph pooling methods. Using the framework, we compare four mainstream graph pooling methods and explore their characteristics. Furthermore, we extend two methods for explaining neural network decisions from convolutional neural networks to graph neural networks and compare them with the existing GNNExplainer. We run experiments on standard graph classification tasks using the developed framework and discuss the distinctive characteristics of the different pooling methods. Furthermore, we verify the correctness of the proposed extensions of the explanation methods and measure the agreement among the produced explanations. Finally, we explore the characteristics of different methods for explaining neural network decisions and the insights they give into different pooling methods.

9

Chandra, Nagasai. "Node Classification on Relational Graphs using Deep-RGCNs." DigitalCommons@CalPoly, 2021. https://digitalcommons.calpoly.edu/theses/2265.

Abstract:

Knowledge Graphs are fascinating concepts in machine learning as they can hold usefully structured information in the form of entities and their relations. Despite the valuable applications of such graphs, most knowledge bases remain incomplete. This missing information harms downstream applications such as information retrieval and opens a window for research in statistical relational learning tasks such as node classification and link prediction. This work proposes a deep learning framework based on existing relational convolutional (R-GCN) layers to learn on highly multi-relational data characteristic of realistic knowledge graphs for node property classification tasks. We propose a deep and improved variant, Deep-RGCNs, with dense and residual skip connections between layers. These skip connections are known to be very successful with popular deep CNN-architectures such as ResNet and DenseNet. In our experiments, we investigate and compare the performance of Deep-RGCN with different baselines on multi-relational graph benchmark datasets, AIFB and MUTAG, and show how the deep architecture boosts the performance in the task of node property classification. We also study the training performance of Deep-RGCNs (with N layers) and discuss the gradient vanishing and over-smoothing problems common to deeper GCN architectures.

10

Lamont, Morné Michael Connell. "Binary classification trees : a comparison with popular classification methods in statistics using different software." Thesis, Stellenbosch : Stellenbosch University, 2002. http://hdl.handle.net/10019.1/52718.

Abstract:

Thesis (MComm) -- Stellenbosch University, 2002.
Consider a data set with a categorical response variable and a set of explanatory variables. The response variable can have two or more categories and the explanatory variables can be numerical or categorical. This is a typical setup for a classification analysis, where we want to model the response based on the explanatory variables. Traditional statistical methods have been developed under certain assumptions, such as that the explanatory variables are numeric only and/or that the data follow a multivariate normal distribution. In practice such assumptions are not always met. Different research fields generate data that have a mixed structure (categorical and numeric) and researchers are often interested in using all these data in the analysis. In recent years robust methods such as classification trees have become the substitute for traditional statistical methods when the above assumptions are violated. Classification trees are not only an effective classification method, but offer many other advantages. The aim of this thesis is to highlight the advantages of classification trees. In the chapters that follow, the theory of and further developments on classification trees are discussed. This forms the foundation for the CART software, which is discussed in Chapter 5, as well as other software in which classification tree modelling is possible. We compare classification trees to parametric, kernel and k-nearest-neighbour discriminant analyses. A neural network is also compared to classification trees, and finally we draw some conclusions on classification trees and how they compare with other methods.

11

Altun, Gulsah. "Machine Learning and Graph Theory Approaches for Classification and Prediction of Protein Structure." Digital Archive @ GSU, 2008. http://digitalarchive.gsu.edu/cs_diss/31.

Abstract:

Recently, many methods have been proposed for classification and prediction problems in bioinformatics. One of these problems is protein structure prediction. Machine learning approaches and new algorithms have been proposed to solve this problem. Among the machine learning approaches, Support Vector Machines (SVM) have attracted a lot of attention due to their high prediction accuracy. Since protein data consist of sequence and structural information, another widely used approach for modelling this structured data is to use graphs. In computer science, graph theory has been widely studied; however, it has only recently been applied to bioinformatics. In this work, we introduce new algorithms based on statistical methods, graph theory concepts and machine learning for the protein structure prediction problem. A new statistical method based on z-scores is introduced for seed selection in proteins. A new method based on finding common cliques in protein data for feature selection is also introduced, which reduces noise in the data. We also introduce new binary classifiers for the prediction of structural transitions in proteins. These new binary classifiers achieve much higher accuracy than the current traditional binary classifiers.

12

Jackson, Eugenie Marie. "Explorations in the classification of vertices as good or bad." [Johnson City, Tenn. : East Tennessee State University], 2001. http://etd-submit.etsu.edu/etd/theses/available/etd-0310101-153932/unrestricted/jacksone.pdf.

13

Seeland, Madeleine [author], Burkhard [academic supervisor] Rost, and Stefan [academic supervisor] Kramer. "Structural Graph Clustering: Scalable Methods and Applications for Graph Classification and Regression / Madeleine Seeland. Reviewers: Burkhard Rost ; Stefan Kramer. Supervisor: Burkhard Rost." München : Universitätsbibliothek der TU München, 2014. http://d-nb.info/1058434500/34.

14

Rinke, Sebastian. "Analysis and Adaption of Graph Mapping Algorithms for Regular Graph Topologies." Master's thesis, Universitätsbibliothek Chemnitz, 2009. http://nbn-resolving.de/urn:nbn:de:bsz:ch1-200901453.

Abstract:

The Message Passing Interface (MPI) standard defines virtual topologies that can be applied to systems of cooperating processes. Besides providing a more convenient namespace, these may be used to optimize the placement of MPI processes in order to reduce communication time. That is, the processes with their main communication paths form a graph that has to be mapped cost-efficiently onto the graph representing the actual communication network. In this context, this work analyses and compares state-of-the-art task mapping strategies with respect to running time and the quality of their solutions to the MPI mapping problem. In particular, the focus is on generic strategies that can be used for arbitrary process/network topologies, although here the topologies of interest are regular ones, where the number of processes is greater than the number of processors in the underlying physical network. Additionally, different measures of mapping quality are discussed, and a close correspondence between the most appropriate one, the weighted edge cut, and program execution time is shown. In order to investigate how mapping quality affects MPI program execution time, some mapping strategies have been incorporated into Open MPI. Finally, benchmark results show that optimized process-to-processor mappings can improve program execution time by up to 60% compared to the default mapping in many MPI implementations (linear mapping). The findings in this work can serve as a reference not only for MPI implementors, but also for researchers investigating static process-to-processor mappings in general.
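
The weighted edge cut named in this abstract as a measure of mapping quality can be sketched as below, assuming the usual definition: the total weight of communication edges whose two processes are placed on different processors. The ring of four processes, its weights, and the two mappings are hypothetical and are not taken from the thesis.

```python
def weighted_edge_cut(comm_weights, mapping):
    """Sum the weights of communication edges whose endpoints sit on different processors.

    comm_weights maps a (process, process) pair to a communication volume;
    mapping maps each process to the processor it is placed on.
    """
    return sum(
        weight
        for (p, q), weight in comm_weights.items()
        if mapping[p] != mapping[q]
    )


# Hypothetical ring of four processes; pairs (0, 1) and (2, 3) communicate heavily.
COMM = {(0, 1): 10, (1, 2): 1, (2, 3): 10, (3, 0): 1}

# Two processors "A" and "B", two processes each.
linear = {0: "A", 1: "A", 2: "B", 3: "B"}
interleaved = {0: "A", 1: "B", 2: "A", 3: "B"}

print(weighted_edge_cut(COMM, linear))       # only the two light edges are cut
print(weighted_edge_cut(COMM, interleaved))  # every edge is cut
```

A lower cut means less traffic crosses the network, which is the intuition behind the correspondence with execution time that the abstract reports.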

15

Pavuluri, Manoj Kumar. "Fuzzy decision tree classification for high-resolution satellite imagery /." free to MU campus, to others for purchase, 2003. http://wwwlib.umi.com/cr/mo/fullcit?p1418056.

16

Marcusanu, Mihaela C. "The classification of l₁-embeddable fullerenes." Bowling Green State University / OhioLINK, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1180115123.

17

Schenker, Adam. "Graph-Theoretic Techniques for Web Content Mining." [Tampa, Fla.] : University of South Florida, 2003. http://purl.fcla.edu/fcla/etd/SFE0000143.

18

Elsner, Ulrich. "Graph partitioning - a survey." Universitätsbibliothek Chemnitz, 2005. http://nbn-resolving.de/urn:nbn:de:swb:ch1-200501047.

Abstract:

Many problems appearing in scientific computing and other areas can be formulated as graph partitioning problems. Examples include data distribution for parallel computers, decomposition of sparse matrices and VLSI design. In this survey we present the graph partitioning problem, describe some applications and introduce many of the algorithms used to solve the problem.

19

Neggaz, Mohammed Yessin. "Automatic classification of dynamic graphs." Thesis, Bordeaux, 2016. http://www.theses.fr/2016BORD0169/document.

Abstract:

Dynamic networks consist of entities making contact with one another over time. A major challenge in dynamic networks is to predict mobility patterns and decide whether the evolution of the topology satisfies the requirements for the success of a given algorithm. The types of dynamics resulting from these networks are varied in scale and nature. For instance, some of these networks remain connected at all times; others are always disconnected but still offer some kind of connectivity over time and space (temporal connectivity); others are recurrently connected, periodic, etc. All of these contexts can be represented as dynamic graph classes corresponding to necessary and/or sufficient conditions for given distributed problems or algorithms. Given a dynamic graph, a natural question is which of these classes the graph belongs to. In this work we contribute to the automation of dynamic graph classification. We provide strategies for testing the membership of a dynamic graph in a given class and a generic framework for testing properties of dynamic graphs. We also explore the case where no property of the graph is guaranteed, through the distributed problem of maintaining a spanning forest in highly dynamic graphs.
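
To make the idea of testing class membership concrete, here is a minimal sketch for one class named in the abstract, temporal connectivity. It is not the thesis's algorithm. It assumes the dynamic graph is given as a chronological list of edge snapshots, that contacts are undirected, and that journeys are strict (at most one hop per snapshot). The function names `reachable_by_journey` and `is_temporally_connected` are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[int, int]


def reachable_by_journey(source: int, snapshots: List[List[Edge]]) -> Set[int]:
    """Nodes reachable from `source` by a strict journey.

    Assumption: a journey uses at most one edge per snapshot, and
    snapshots are traversed in chronological order.
    """
    reached = {source}
    for edges in snapshots:
        newly_reached = set()
        for u, v in edges:  # each edge is treated as an undirected contact
            if u in reached:
                newly_reached.add(v)
            if v in reached:
                newly_reached.add(u)
        # Merge only after scanning the whole snapshot, so that a single
        # snapshot contributes at most one hop to any journey.
        reached |= newly_reached
    return reached


def is_temporally_connected(nodes: Iterable[int], snapshots: List[List[Edge]]) -> bool:
    """True if every node can reach every other node by a strict journey."""
    node_set = set(nodes)
    return all(reachable_by_journey(s, snapshots) >= node_set for s in node_set)


nodes = [0, 1, 2]
print(is_temporally_connected(nodes, [[(0, 1)], [(1, 2)]]))            # False
print(is_temporally_connected(nodes, [[(0, 1)], [(1, 2)], [(0, 1)]]))  # True
```

In the first example node 2 never reaches node 0, because the only contact between 0 and 1 happens before node 2 meets node 1. Repeating that contact afterwards makes the graph temporally connected, although no single snapshot is connected.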

20

Trinks, Martin. "Graph polynomials and their representations." Doctoral thesis, Technische Universitaet Bergakademie Freiberg Universitaetsbibliothek "Georgius Agricola", 2012. http://nbn-resolving.de/urn:nbn:de:bsz:105-qucosa-94991.

Abstract:

Graph polynomials are polynomials associated with graphs that encode the number of subgraphs with given properties. We list different frameworks used to define graph polynomials in the literature. We present the edge elimination polynomial and introduce several graph polynomials equivalent to it. Thereby, we connect a recursive definition to the counting of colorings and to the counting of (spanning) subgraphs. Furthermore, we define a graph polynomial that generalizes not only those mentioned but also many of the well-known graph polynomials, including the Potts model, the matching polynomial, the trivariate chromatic polynomial and the subgraph component polynomial. We prove a recurrence relation for this graph polynomial using edge and vertex operations. The definitions and statements are given in such a way that most of them are also valid in the case of hypergraphs.
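
As an illustration of how a recursive definition connects to counting colorings, the sketch below evaluates the classical chromatic polynomial by deletion-contraction, P(G, k) = P(G - e, k) - P(G / e, k). This is the textbook recurrence, not the edge elimination polynomial of the thesis. It assumes a simple graph (no loops or parallel edges) on vertices 0..n-1, and the function name `count_colorings` is hypothetical.

```python
from typing import FrozenSet, Iterable, Tuple


def count_colorings(n_vertices: int, edges: Iterable[Tuple[int, int]], k: int) -> int:
    """Number of proper k-colorings of a simple graph, by deletion-contraction."""
    edge_set = frozenset(frozenset(e) for e in edges)
    return _count(frozenset(range(n_vertices)), edge_set, k)


def _count(vertices: FrozenSet[int], edges: FrozenSet[FrozenSet[int]], k: int) -> int:
    if not edges:
        # Base case: with no edges, every vertex is colored independently.
        return k ** len(vertices)
    e = next(iter(edges))
    u, v = tuple(e)
    deleted = edges - {e}
    # Contraction merges v into u; storing edges in a set collapses any
    # parallel edges that the merge creates.
    contracted = frozenset(
        frozenset(u if x == v else x for x in f) for f in deleted
    )
    return _count(vertices, deleted, k) - _count(vertices - {v}, contracted, k)


print(count_colorings(3, [(0, 1), (1, 2), (0, 2)], 3))  # triangle: 3 * 2 * 1 = 6
print(count_colorings(3, [(0, 1), (1, 2)], 3))          # path: 3 * 2 * 2 = 12
```

The recursion takes exponential time in the number of edges, so it is only suitable for small graphs.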

21

Karunaratne, Thashmee M. "Learning predictive models from graph data using pattern mining." Doctoral thesis, Stockholms universitet, Institutionen för data- och systemvetenskap, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-100713.

Abstract:

Learning from graphs has become a popular research area due to the ubiquity of graph data representing web pages, molecules, social networks, protein interaction networks, etc. However, standard graph learning approaches are often challenged by the computational cost involved in the learning process, due to the richness of the representation. Attempts made to improve their efficiency are often associated with the risk of degrading the performance of the predictive models, creating tradeoffs between the efficiency and effectiveness of the learning. Such a situation is analogous to an optimization problem with two objectives, efficiency and effectiveness, where improving one objective without the other objective being worse off is a better solution, called a Pareto improvement. In this thesis, it is investigated how to improve the efficiency and effectiveness of learning from graph data using pattern mining methods. Two objectives are set, where one concerns how to improve the efficiency of pattern mining without reducing the predictive performance of the learning models, and the other concerns how to improve predictive performance without increasing the complexity of pattern mining. The employed research method mainly follows a design science approach, including the development and evaluation of artifacts. The contributions of this thesis include a data representation language that can be characterized as a form in between sequences and itemsets, where the graph information is embedded within items. Several studies, each of which looks for Pareto improvements in efficiency and effectiveness, are conducted using sets of small graphs. Summarizing the findings, some of the proposed methods, namely maximal frequent itemset mining and constraint-based itemset mining, result in a dramatically increased efficiency of learning, without decreasing the predictive performance of the resulting models. It is also shown that additional background knowledge can be used to enhance the performance of the predictive models, without increasing the complexity of the graphs.

22

Vasilyeva, Elena, Maik Thiele, Christof Bornhövd, and Wolfgang Lehner. "Considering User Intention in Differential Graph Queries." IGI Global, 2015. https://tud.qucosa.de/id/qucosa%3A72931.

Abstract:

Empty answers are a major problem when processing pattern matching queries in graph databases. In particular, there can be multiple reasons why a query failed. To support users in such situations, differential queries can be used that deliver the missing parts of a graph query. Multiple heuristics have been proposed for differential queries, which reduce the search space. Although they are successful in increasing performance, they can discard query subgraphs relevant to a user. To address this issue, the authors extend the concept of differential queries and introduce top-k differential queries that calculate the ranking based on users’ preferences and significantly support the users’ understanding of query database management systems. A user assigns relevance weights to elements of a graph query that steer the search and are used for the ranking. In this paper the authors propose different strategies for the selection of relevance weights and their propagation. As a result, the search is modelled along the most relevant paths. The authors evaluate their solution and both strategies on the DBpedia data graph.

23

Demco, Anthony A. "Graph kernel extensions and experiments with application to molecule classification, lead hopping and multiple targets." Thesis, University of Southampton, 2009. https://eprints.soton.ac.uk/66209/.

Abstract:

The discovery of drugs that can effectively treat disease and alleviate pain is one of the core challenges facing modern medicine. The tools and techniques of machine learning have perhaps the greatest potential to provide a fast and efficient route toward the fabrication of novel and effective drugs. In particular, modern structured kernel methods have been successfully applied to a range of problem domains and have recently been adapted for graph structures, making them directly applicable to pharmaceutical drug discovery. Specifically, graph structures have a natural fit with molecular data, in that a graph consists of a set of nodes that represent atoms that are connected by bonds. In this thesis we use graph kernels that utilize three different graph representations: molecular, topological pharmacophore and reduced graphs. We introduce a set of novel graph kernels which are based on a measure of the number of finite walks within a graph. To calculate this measure we employ a dynamic programming framework which allows us to extend graph kernels so that they can deal with non-tottering and soft matching and allow the inclusion of gaps. In addition, we review several graph colouring methods and subsequently incorporate colour into our graph kernel models. These kernels are designed for molecule classification in general, although we show how they can be adapted to other areas in drug discovery. We conduct three sets of experiments and discuss how our augmented graph kernels are designed and adapted for these areas. First, we classify molecules based on their activity in comparison to a biological target. Second, we explore the related problem of lead hopping. Here one set of chemicals is used to predict another that is structurally dissimilar. We discuss the problems that arise due to the fact that some patterns are filtered from the dataset. By analyzing lead hopping we are able to go beyond the typical cross-validation approach and construct a dataset that more accurately reflects real-world tasks. Lastly, we explore methods of integrating information from multiple targets. We test our models as a multi-response problem and later introduce a new approach that employs Kernel Canonical Correlation Analysis (KCCA) to predict the best molecules for an unseen target. Overall, we show that graph kernels achieve good results in classification, lead hopping and multiple target experiments.

24

Hernández, Pérez Bernard. "Multi-View Object Recognition and Classification. Graph-Based Representation of Visual Features and Structured Learning and Prediction." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-142486.

Abstract:

Computer Vision is a subfield within artificial intelligence that includes methods for acquisition, processing, analysis and understanding of images to get results in numerical or symbolic form. The information provided by the results is used to make decisions. Computer Vision cannot be considered in isolation: interaction with other fields is inevitable, and image processing, pattern recognition and Machine Learning deserve particular attention. The main objective of this project is to analyze the behavior of visual feature extraction algorithms and their effectiveness in decision making. The detection of an object in an image, its classification and its recognition are the types of decisions that are studied. Feature extraction algorithms are applied to attempt multi-view object recognition. To tackle this problem a new approach is proposed. This approach creates a graph-based representation of the object using cluster analysis recursively. The nodes of the graph represent the main physical components that make up the object. Support Vector Machines (SVMs) are used to classify the nodes, so the classes are assigned independently. Finally, the graph-based representation of the object is exploited to drop the assumption of independence and find relations between classes using Structured Output Support Vector Machines (SO-SVMs).

25

Biyikoglu, Türker, Josef Leydold, and Peter F. Stadler. "Nodal Domain Theorems and Bipartite Subgraphs." Department of Statistics and Mathematics, Abt. f. Angewandte Statistik u. Datenverarbeitung, WU Vienna University of Economics and Business, 2005. http://epub.wu.ac.at/626/1/document.pdf.

Abstract:

The Discrete Nodal Domain Theorem states that an eigenfunction of the k-th largest eigenvalue of a generalized graph Laplacian has at most k (weak) nodal domains. We show that the number of strong nodal domains cannot exceed the size of a maximal induced bipartite subgraph and that this bound is sharp for generalized graph Laplacians. Similarly, the number of weak nodal domains is bounded by the size of a maximal bipartite minor. (author's abstract)
Series: Preprint Series / Department of Applied Statistics and Data Processing

26

Wappler, Markus. "On Graph Embeddings and a new Minor Monotone Graph Parameter associated with the Algebraic Connectivity of a Graph." Doctoral thesis, Universitätsbibliothek Chemnitz, 2013. http://nbn-resolving.de/urn:nbn:de:bsz:ch1-qucosa-115518.

Abstract:

We consider the problem of maximizing the second smallest eigenvalue of the weighted Laplacian of a (simple) graph over all nonnegative edge weightings with bounded total weight. We generalize this problem by introducing node significances and edge lengths. We give a formulation of this generalized problem as a semidefinite program. The dual program can be equivalently written as an embedding problem: finding an embedding of the n nodes of the graph in n-space so that their barycenter is at the origin, the distance between adjacent nodes is bounded by the respective edge length, and the embedded nodes are spread as much as possible (the sum of the squared norms is maximized). We prove the following necessary condition for optimal embeddings. For any separator of the graph, at least one of the components fulfills the following property: each straight-line segment between the origin and an embedded node of the component intersects the convex hull of the embedded nodes of the separator. There always exists an optimal embedding of the graph whose dimension is bounded by the tree-width of the graph plus one. We define the rotational dimension of a graph. This is the minimal dimension k such that for all choices of the node significances and edge lengths an optimal embedding of the graph can be found in k-space. The rotational dimension of a graph is a minor monotone graph parameter. We characterize the graphs with rotational dimension up to two.

27

Reiß, Susanna. "Optimizing Extremal Eigenvalues of Weighted Graph Laplacians and Associated Graph Realizations." Doctoral thesis, Universitätsbibliothek Chemnitz, 2012. http://nbn-resolving.de/urn:nbn:de:bsz:ch1-qucosa-93599.

Abstract:

This thesis deals with optimizing extremal eigenvalues of weighted graph Laplacian matrices. In general, the Laplacian matrix of a (weighted) graph is of particular importance in spectral graph theory and combinatorial optimization (e.g., graph partition problems such as max-cut and graph bipartition). In particular, the pioneering work of M. Fiedler investigates extremal eigenvalues of weighted graph Laplacians and provides close connections to the node- and edge-connectivity of a graph. Motivated by Fiedler, Göring et al. were interested in further connections between structural properties of the graph and the eigenspace of the second smallest eigenvalue of weighted graph Laplacians using a semidefinite optimization approach. By redistributing the edge weights of a graph, the following three optimization problems are studied in this thesis: maximizing the second smallest eigenvalue (based on the mentioned work of Göring et al.), minimizing the maximum eigenvalue, and minimizing the difference between the maximum and second smallest eigenvalues of the weighted Laplacian. In all three problems a semidefinite optimization formulation allows the corresponding semidefinite dual to be interpreted as a graph realization problem. That is, to each node of the graph a vector in Euclidean space is assigned, fulfilling some constraints depending on the considered problem. Optimal realizations are investigated and connections to the eigenspaces of the corresponding optimized eigenvalues are established. Furthermore, optimal realizations are closely linked to the separator structure of the graph. Depending on this structure, on the one hand folding properties of optimal realizations are characterized, and on the other hand the existence of optimal realizations of bounded dimension is proven. The general bounds depend on the tree-width of the graph. In the case of minimizing the maximum eigenvalue, an important family of graphs is the bipartite graphs, as an optimal one-dimensional realization may be constructed. Taking the symmetry of the graph into account, a particular optimal edge weighting exists. Considering the coupled problem, i.e., minimizing the difference between the maximum and second smallest eigenvalues, and the single problems, i.e., minimizing the maximum and maximizing the second smallest eigenvalue, connections between the feasible (optimal) sets are established.

28

Bocancea, Andreea. "Supervised Classification Leveraging Refined Unlabeled Data." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-119320.

Abstract:

This thesis focuses on how unlabeled data can improve supervised learning classifiers in all contexts, from scarce to abundant label situations. This is meant to address the limitations within supervised learning with regard to label availability. Extending the training set with unlabeled data can overcome issues such as selection bias, noise and insufficient data. Based on the overall data distribution and the initial set of labels, semi-supervised methods provide labels for additional data points. The semi-supervised approaches considered in this thesis belong to one of the following categories: transductive SVMs, Cluster-then-Label and graph-based techniques. Further, we evaluate the behavior of logistic regression, single-layer perceptron, SVM and decision trees. By learning on the extended training set, supervised classifiers are able to generalize better. Based on the results, this thesis recommends data-processing and algorithmic solutions appropriate to real-world situations.

29

Kerracher, Natalie. "Tasks and visual techniques for the exploration of temporal graph data." Thesis, Edinburgh Napier University, 2017. http://researchrepository.napier.ac.uk/Output/977758.

Abstract:

This thesis considers the tasks involved in exploratory analysis of temporal graph data, and the visual techniques which are able to support these tasks. There has been an enormous increase in the amount and availability of graph (network) data, and in particular, graph data that is changing over time. Understanding the mechanisms involved in temporal change in a graph is of interest to a wide range of disciplines. While the application domain may differ, many of the underlying questions regarding the properties of the graph and mechanism of change are the same. The research area of temporal graph visualisation seeks to address the challenges involved in visually representing change in a graph over time. While most graph visualisation tools focus on static networks, recent research has been directed toward the development of temporal visualisation systems. By representing data using computer-generated graphical forms, Information Visualisation techniques harness human perceptual capabilities to recognise patterns, spot anomalies and outliers, and find relationships within the data. Interacting with these graphical representations allows individuals to explore large datasets and gain further insight into the relationships between different aspects of the data. Visual approaches are particularly relevant for Exploratory Data Analysis (EDA), where the person performing the analysis may be unfamiliar with the data set, and their goal is to make new discoveries and gain insight through its exploration. However, designing visual systems for EDA can be difficult, as the tasks which a person may wish to carry out during their analysis are not always known at the outset. Identifying and understanding the tasks involved in such a process has given rise to a number of task taxonomies which seek to elucidate the tasks and structure them in a useful way. While task taxonomies for static graph analysis exist, no suitable temporal graph taxonomy has yet been developed.
The first part of this thesis focusses on the development of such a taxonomy. Through the extension and instantiation of an existing formal task framework for general EDA, a task taxonomy and a task design space are developed specifically for exploration of temporal graph data. The resultant task framework is evaluated with respect to extant classifications and is shown to address a number of deficiencies in task coverage in existing works. Its usefulness in both the design and evaluation processes is also demonstrated. Much research currently surrounds the development of systems and techniques for visual exploration of temporal graphs, but little is known about how the different types of techniques relate to one another and which tasks they are able to support. The second part of this thesis focusses on the possibilities in this area: a design space of the possible visual encodings for temporal graph data is developed, and extant techniques are classified into this space, revealing potential combinations of encodings which have not yet been employed. These may prove interesting opportunities for further research and the development of novel techniques. The third part of this work addresses the need to understand the types of analysis the different visual techniques support, and indeed whether new techniques are required. The techniques which are able to support the different task dimensions are considered. This task-technique mapping reveals that visual exploration of temporal graph data requires techniques not only from temporal graph visualisation, but also from static graph visualisation and comparison, and temporal visualisation. A number of tasks which are unsupported or less-well supported, which could prove interesting opportunities for future research, are identified.
The taxonomies, design spaces, and mappings in this work bring order to the range of potential tasks of interest when exploring temporal graph data and the assortment of techniques developed to visualise this type of data, and are designed to be of use in both the design and evaluation of temporal graph visualisation systems.

30

Cichocki, Radoslaw. "Classification of objects in images based on various object representations." Thesis, Blekinge Tekniska Högskola, Avdelningen för programvarusystem, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-5774.

Abstract:

Object recognition is a heavily researched domain that employs methods derived from mathematics, physics and biology. This thesis combines approaches to object classification that are based on two features: color and shape. Color is represented by color histograms and shape by skeletal graphs. Four hybrids are proposed which combine those approaches in different manners, and the hybrids are then tested to find out which of them gives the best results.
Mail the author at radoslaw.cichocki(at)gmail.com

31

Mathieu, Bérangère. "Segmentation interactive multiclasse d'images par classification de superpixels et optimisation dans un graphe de facteurs." Thesis, Toulouse 3, 2017. http://www.theses.fr/2017TOU30290/document.

Abstract:

Image segmentation is one of the main research topics in image analysis. It is the task of finding a partition into regions, i.e., into sets of connected pixels, meeting a given uniformity criterion. The goal of image segmentation is to find regions corresponding to the objects or the object parts appearing in the image. The choice of which objects are relevant depends on the application context. Manually locating these objects is a tedious but quite simple task. Designing an automatic algorithm able to achieve the same result is, on the contrary, a difficult problem. Interactive segmentation methods are semi-automatic approaches where a user guides the search for a specific segmentation of an image by giving some indications. There are two kinds of methods: boundary-based and region-based interactive segmentation methods. Boundary-based methods extract a single object corresponding to a unique region without any holes. The user guides the method by selecting some boundary points of the object. The algorithm searches for a curve linking all the points given by the user, following the boundary of the object and having some intrinsic properties (regular curves are encouraged). Region-based methods group the pixels of an image into sets, by maximizing the similarity of pixels inside each set and the dissimilarity between pixels belonging to different sets. Each set can be composed of one or several connected components and can contain holes. The user guides the method by drawing colored strokes, giving, for each set, some pixels belonging to it. While the majority of region-based methods extract a single object from the background, some algorithms proposed during the last decade are able to solve multi-class interactive segmentation problems, i.e., to extract more than two sets of pixels. The main contribution of this work is the design of a new multi-class interactive segmentation method. This algorithm is based on the minimization of a cost function that can be represented by a factor graph. It integrates a supervised learning classification method checking that the produced segmentation is consistent with the indications given by the user, a new regularization term, and a preprocessing step grouping pixels into small homogeneous regions called superpixels. The use of an over-segmentation method to produce these superpixels is a key step in the proposed interactive segmentation method: it significantly reduces the computational complexity and handles the segmentation of images containing several million pixels, while keeping the execution time small enough to ensure comfortable use of the method. The second contribution of our work is an evaluation of over-segmentation algorithms. We provide a new dataset with images of different sizes, from a few thousand to several million pixels. This review has also allowed us to design a new over-segmentation algorithm and to evaluate it.

32

Srinivaasan, Gayathri. "Malicious Entity Categorization using Graph modelling." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-202980.

Abstract:

Today, malware authors not only write malicious software but also employ obfuscation, polymorphism, packing and endless such evasive techniques to escape detection by Anti-Virus Products (AVP). Besides the individual behavior of malware, the relations that exist among them play an important role for improving malware detection. This work aims to enable malware analysts at F-Secure Labs to explore various such relationships between malicious URLs and file samples in addition to their individual behavior and activity. The current detection methods at F-Secure Labs analyze unknown URLs and file samples independently without taking into account the correlations that might exist between them. Such traditional classification methods perform well but are not efficient at identifying complex multi-stage malware that hide their activity. The interactions between malware may include any type of network activity, dropping, downloading, etc. For instance, an unknown downloader that connects to a malicious website which in turn drops a malicious payload, should indeed be blacklisted. Such analysis can help block the malware infection at its source and also comprehend the whole infection chain. The outcome of this proof-of-concept study is a system that detects new malware using graph modelling to infer their relationship to known malware as part of the malware classification services at F-Secure.

33

Mantrach, Amin. "Novel measures on directed graphs and applications to large-scale within-network classification." Doctoral thesis, Universite Libre de Bruxelles, 2010. http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/210033.

Abstract:

In recent years, networks have become a major data source in various fields ranging from social sciences to mathematical and physical sciences. Moreover, the size of available networks has grown substantially as well. This has brought with it a number of new challenges, like the need for precise and intuitive measures to characterize and analyze large-scale networks in a reasonable time.

The first part of this thesis introduces a novel measure between two nodes of a weighted directed graph: the sum-over-paths covariance. It has a clear and intuitive interpretation: two nodes are considered highly correlated if they often co-occur on the same (preferably short) paths. This measure depends on a probability distribution over the (usually infinite) countable set of paths through the graph, which is obtained by minimizing the total expected cost between all pairs of nodes while fixing the total relative entropy spread in the graph. The entropy parameter allows the probability distribution to be biased over a wide spectrum, going from natural random walks (where all paths are equiprobable) to walks biased towards shortest paths. This measure is then applied to semi-supervised classification problems on medium-size networks and compared to state-of-the-art techniques.

The second part introduces three novel algorithms for within-network classification in large-scale networks, i.e. classification of nodes in partially labeled graphs. The algorithms have a computing time that is linear in the number of edges, classes and steps, and hence can be applied to large-scale networks. They obtained competitive results in comparison to state-of-the-art techniques on the large-scale U.S. patents citation network and on eight other data sets. Furthermore, during the thesis, we collected a novel benchmark data set: the U.S. patents citation network. This data set is now available to the community for benchmarking purposes.
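To illustrate why a within-network classifier can run in time linear in the number of edges, classes and steps, here is a minimal sketch of a generic label-propagation scheme. It is not one of the three algorithms proposed in the thesis; the function name `propagate_labels`, the edge-list format and the clamping rule are assumptions made for illustration only.

```python
def propagate_labels(edges, seeds, n_classes, steps=10):
    """edges: list of (u, v, w) directed weighted edges; seeds: {node: class}."""
    nodes = {u for u, _, _ in edges} | {v for _, v, _ in edges} | set(seeds)
    # scores[node][c] is the current evidence that node belongs to class c.
    scores = {n: [0.0] * n_classes for n in nodes}
    for n, c in seeds.items():
        scores[n][c] = 1.0
    for _ in range(steps):
        new = {n: [0.0] * n_classes for n in nodes}
        # One pass over the edge list: O(|E| * n_classes) work per step.
        for u, v, w in edges:
            for c in range(n_classes):
                new[v][c] += w * scores[u][c]
        # Clamp the labeled nodes back to their known class.
        for n, c in seeds.items():
            new[n] = [0.0] * n_classes
            new[n][c] = 1.0
        scores = new
    # Predict the class with the highest accumulated evidence.
    return {n: max(range(n_classes), key=lambda c: scores[n][c]) for n in nodes}
```

Each step touches every edge once per class, so the total cost is proportional to edges × classes × steps, which matches the complexity stated in the abstract.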

The final part of the thesis concerns the combination of a citation graph with information on its nodes. We show that citation-based data provide better results for classification than content-based data. We also show empirically that combining both sources of information (content-based and citation-based) should be considered when facing a text categorization problem. For instance, when classifying journal papers, extracting an external citation graph may considerably boost performance. However, in another context, when we have to directly classify the nodes of the citation network, features on the nodes will not necessarily improve the results.

The theory, algorithms and applications presented in this thesis provide interesting perspectives in various fields.

Doctorate in Sciences

34

Günther, Manuel. "Statistical Gabor Graph Based Techniques for the Detection, Recognition, Classification, and Visualization of Human Faces." Aachen: Shaker, 2012. http://d-nb.info/1069046140/34.

35

Gullstrand, Mattias, and Stefan Maraš. "Using Graph Neural Networks for Track Classification and Time Determination of Primary Vertices in the ATLAS Experiment." Thesis, KTH, Matematisk statistik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-288505.

Abstract:

Starting in 2027, the high-luminosity Large Hadron Collider (HL-LHC) will begin operation and allow higher-precision measurements and searches for new physics processes between elementary particles. One central problem that arises in the ATLAS detector when reconstructing event information is to separate the rare and interesting hard scatter (HS) interactions from uninteresting pileup (PU) interactions in a spatially compact environment. This problem becomes even harder to solve at higher luminosities. This project relies on leveraging the time dimension and determining a time of the HS interactions to separate them from PU interactions by using information measured by the upcoming High-Granularity Timing Detector (HGTD). The current method relies on using a boosted decision tree (BDT) together with the timing information from the HGTD to determine a time. We suggest a novel approach of utilizing a graph attentional network (GAT) where each bunch-crossing is represented as a graph of tracks and the properties of the GAT are applied on a track level to inspect if such a model can outperform the current BDT. Our results show that we are able to replicate the results of the BDT and even improve some metrics at the expense of increasing the uncertainty of the time determination. We conclude that although there is potential for GATs to outperform the BDT, a more complex model should be applied. Finally, we provide some suggestions for improvement and hope to inspire further study and advancements in this direction which shows promising potential.
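As a rough illustration of how a graph attention layer weighs neighbouring tracks, the sketch below implements the standard single-head attention rule in plain Python. It is not the model used in the thesis: the function names, the dictionary-based graph format, and the omission of multi-head attention and the output nonlinearity are simplifying assumptions.

```python
import math

def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def gat_layer(features, neighbors, W, a):
    """features: {node: vector}; neighbors: {node: list of nodes, itself included}."""
    # Shared linear transform applied to every node (track) feature vector.
    z = {i: matvec(W, h) for i, h in features.items()}
    out = {}
    for i, nbrs in neighbors.items():
        # Attention logits e_ij = LeakyReLU(a . [z_i || z_j]).
        logits = [leaky_relu(sum(p * q for p, q in zip(a, z[i] + z[j]))) for j in nbrs]
        # Numerically stable softmax over the neighbourhood of i.
        m = max(logits)
        weights = [math.exp(e - m) for e in logits]
        total = sum(weights)
        alphas = [w / total for w in weights]
        # New embedding: attention-weighted sum of neighbour embeddings.
        out[i] = [sum(al * z[j][k] for al, j in zip(alphas, nbrs))
                  for k in range(len(z[i]))]
    return out
```

In the setting described above, each node would be a track within one bunch-crossing graph, and the learned attention weights decide how much each neighbouring track contributes to a track's updated representation.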

36

Guan, Xiao. "Deterministic and Flexible Parallel Latent Feature Models Learning Framework for Probabilistic Knowledge Graph." Thesis, Mittuniversitetet, Avdelningen för informationssystem och -teknologi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-35788.

Abstract:

Knowledge graphs are a rising topic in the field of Artificial Intelligence. As the current trend in knowledge representation, knowledge graph research makes use of the large knowledge bases freely available on the internet. Knowledge graphs also allow inspection, analysis, and reasoning over the knowledge they contain. To enable the ambitious idea of modeling the knowledge of the world, different theories and implementations have emerged. Nowadays, we have the opportunity to use freely available information from Wikipedia and Wikidata. The thesis investigates and formulates a theory of learning from knowledge graphs. It studies probabilistic knowledge graphs and focuses only on the branch called latent feature models. These models aim to predict possible relationships between connected entities and relations. There are many models for such a task. The metrics and training process are described in detail and improved in the thesis work. The resulting efficiency and correctness enable us to build a more complex model with confidence. The thesis also covers possible problems in the findings and proposes future work.
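To make the idea of a latent feature model concrete, here is a minimal sketch of a DistMult-style scoring function, one well-known member of that family. The abstract does not say which models the thesis uses, so the choice of DistMult, the function names and the toy embeddings are all assumptions for illustration.

```python
import math

def distmult_score(e_s, w_r, e_o):
    """Trilinear score of a (subject, relation, object) triple from latent vectors."""
    return sum(s * r * o for s, r, o in zip(e_s, w_r, e_o))

def triple_probability(e_s, w_r, e_o):
    """Squash the score through a sigmoid to read it as the probability of the triple."""
    return 1.0 / (1.0 + math.exp(-distmult_score(e_s, w_r, e_o)))
```

Training such a model amounts to adjusting the entity and relation vectors so that observed triples receive a high probability and corrupted triples a low one.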

37

Long, Yangjing. "Graph Relations and Constrained Homomorphism Partial Orders." Doctoral thesis, Universitätsbibliothek Leipzig, 2014. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-154281.

Abstract:

We consider constrained variants of graph homomorphisms such as embeddings, monomorphisms, full homomorphisms, surjective homomorphisms, and locally constrained homomorphisms. We also introduce a new variation on this theme which derives from relations between graphs and is related to multihomomorphisms. This gives a generalization of surjective homomorphisms and naturally leads to notions of R-retractions, R-cores, and R-cocores of graphs. Both R-cores and R-cocores of graphs are unique up to isomorphism and can be computed in polynomial time. The theory of the graph homomorphism order is well developed, and from it we consider analogous notions defined for orders induced by constrained homomorphisms. We identify corresponding cores, prove or disprove universality, and characterize gaps and dualities. We give a new and significantly easier proof of the universality of the homomorphism order by showing that even the class of oriented cycles is universal. We provide a systematic approach to simplify the proofs of several earlier results in this area. We explore in greater detail locally injective homomorphisms on connected graphs, characterize gaps and show universality. We also prove that for every $d \geq 3$ the homomorphism order on the class of line graphs of graphs with maximum degree $d$ is universal.
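For readers unfamiliar with the basic notion, the following sketch checks by brute force whether a graph homomorphism (an edge-preserving vertex map) exists between two small graphs. It only illustrates the plain, unconstrained definition; the helper names are hypothetical and the exhaustive search is exponential, so it is suitable for toy examples only.

```python
from itertools import product

def undirected(pairs):
    """Turn a list of undirected edges into a symmetric set of ordered pairs."""
    return {(u, v) for u, v in pairs} | {(v, u) for u, v in pairs}

def is_homomorphism(f, edges_g, edges_h):
    """f maps vertices of G to vertices of H; every edge of G must land on an edge of H."""
    return all((f[u], f[v]) in edges_h for u, v in edges_g)

def find_homomorphism(nodes_g, edges_g, nodes_h, edges_h):
    """Try every vertex map G -> H and return the first homomorphism, or None."""
    for images in product(nodes_h, repeat=len(nodes_g)):
        f = dict(zip(nodes_g, images))
        if is_homomorphism(f, edges_g, edges_h):
            return f
    return None
```

For example, the 5-cycle maps homomorphically to the triangle K3 but not to the single edge K2, which is one way to see that it is not bipartite.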

38

Domke, Jens. "Routing on the Channel Dependency Graph:." Doctoral thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2017. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-225902.

Abstract:

In the pursuit of ever-increasing compute power, and with Moore's law slowly coming to an end, high-performance computing started to scale out to larger systems. Alongside the increasing system size, the interconnection network is growing to accommodate and connect tens of thousands of compute nodes. These networks have a large influence on total cost, application performance, energy consumption, and overall system efficiency of the supercomputer. Unfortunately, state-of-the-art routing algorithms, which define the packet paths through the network, do not utilize this important resource efficiently. Topology-aware routing algorithms become increasingly inapplicable due to irregular topologies, which are either irregular by design or, most often, a result of hardware failures. Exchanging faulty network components potentially requires whole-system downtime, further increasing the cost of the failure. This management approach becomes more and more impractical due to the scale of today's networks and the accompanying steady decrease of the mean time between failures. Alternative methods of operating and maintaining these high-performance interconnects, both in terms of hardware and software management, are necessary to mitigate negative effects experienced by scientific applications executed on the supercomputer. However, existing topology-agnostic routing algorithms either suffer from poor load balancing or are not bounded in the number of virtual channels needed to resolve deadlocks in the routing tables. Using the fail-in-place strategy, a well-established method for storage systems to repair only critical component failures, is a feasible solution for current and future HPC interconnects as well as other large-scale installations such as data center networks. However, an appropriate combination of topology and routing algorithm is required to minimize the throughput degradation for the entire system.
This thesis contributes a network simulation toolchain to facilitate the process of finding a suitable combination, either during system design or while the system is in operation. On top of this foundation, a key contribution is a novel scheduling-aware routing, which reduces fault-induced throughput degradation while improving overall network utilization. The scheduling-aware routing performs frequent property-preserving routing updates to optimize the path balancing for simultaneously running batch jobs. The increased deployment of lossless interconnection networks, in conjunction with fail-in-place modes of operation and topology-agnostic, scheduling-aware routing algorithms, necessitates new solutions to solve the routing-deadlock problem. Therefore, this thesis further advances the state of the art by introducing a novel concept of routing on the channel dependency graph, which allows the design of a universally applicable destination-based routing capable of optimizing the path balancing without exceeding a given number of virtual channels, which are a common hardware limitation. This disruptive innovation enables implicit deadlock avoidance during path calculation, instead of solving both problems separately as all previous solutions do.
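The channel dependency graph mentioned above can be sketched in a few lines. The code below builds the dependency graph from a set of routed paths and tests it for cycles, following the classical criterion that an acyclic channel dependency graph implies deadlock-free routing on a single virtual channel. It is an after-the-fact check, not the routing algorithm of the thesis (which computes paths directly on this graph); the function names and the path format are assumptions.

```python
def channel_dependency_graph(paths):
    """paths: list of node sequences. A channel is a directed link (u, v)."""
    cdg = {}
    for path in paths:
        channels = list(zip(path, path[1:]))
        for ch in channels:
            cdg.setdefault(ch, set())
        # A packet holding one channel while requesting the next creates a dependency.
        for held, wanted in zip(channels, channels[1:]):
            cdg[held].add(wanted)
    return cdg

def has_cycle(graph):
    """Depth-first search with three colours; a grey-to-grey edge closes a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {n: WHITE for n in graph}

    def visit(n):
        color[n] = GREY
        for m in graph[n]:
            if color[m] == GREY or (color[m] == WHITE and visit(m)):
                return True
        color[n] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in graph)
```

A cycle found this way is usually broken by moving some paths to another virtual channel, which is why the number of available virtual channels acts as a hard constraint.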

39

Karam, Zahi Nadim. "Subspace and graph methods to leverage auxiliary data for limited target data multi-class classification, applied to speaker verification." Thesis, Massachusetts Institute of Technology, 2011. http://hdl.handle.net/1721.1/66009.

Abstract:

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011.
Cataloged from PDF version of thesis.
Includes bibliographical references (p. 127-130).
Multi-class classification can be adversely affected by the absence of sufficient target (in-class) instances for training. Such cases arise in face recognition, speaker verification, and document classification, among others. Auxiliary data-sets, which contain a diverse sampling of non-target instances, are leveraged in this thesis using subspace and graph methods to improve classification where target data is limited. The auxiliary data is used to define a compact representation that maps instances into a vector space where inner products quantify class similarity. Within this space, an estimate of the subspace that constitutes within-class variability (e.g. the recording channel in speaker verification or the illumination conditions in face recognition) can be obtained using class-labeled auxiliary data. This thesis proposes a way to incorporate this estimate into the SVM framework to perform nuisance compensation, thus improving classification performance. Another contribution is a framework that combines mapping and compensation into a single linear comparison, which motivates computationally inexpensive and accurate comparison functions. A key aspect of the work takes advantage of efficient pairwise comparisons between the training, test, and auxiliary instances to characterize their interaction within the vector space, and exploits it for improved classification in three ways. The first uses the local variability around the train and test instances to reduce false-alarms. The second assumes the instances lie on a low-dimensional manifold and uses the distances along the manifold. The third extracts relational features from a similarity graph where nodes correspond to the training, test and auxiliary instances. To quantify the merit of the proposed techniques, results of experiments in speaker verification are presented where only a single target recording is provided to train the classifier. 
Experiments are performed on standard NIST corpora and methods are compared using standard evaluation metrics: detection error trade-off curves, minimum decision costs, and equal error rates.
by Zahi Nadim Karam.
Ph.D.
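One of the evaluation metrics named in the abstract above, the equal error rate, can be approximated from two lists of verification scores. The sketch below is a simple threshold sweep, not the NIST scoring tool; the function name and the convention that higher scores mean "target" are assumptions.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER: the operating point where miss and false-alarm rates meet."""
    best_gap, eer = None, None
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        # Miss: a target trial scored below the threshold.
        miss = sum(s < t for s in target_scores) / len(target_scores)
        # False alarm: a non-target trial scored at or above the threshold.
        false_alarm = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(miss - false_alarm)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (miss + false_alarm) / 2
    return eer
```

Sweeping the same thresholds and recording every (false-alarm, miss) pair instead of only the closest one yields the points of a detection error trade-off curve.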

40

Gross, Brandi Nicole. "Input of Factor Graphs into the Detection, Classification, and Localization Chain and Continuous Active SONAR in Undersea Vehicles." Thesis, Virginia Tech, 2015. http://hdl.handle.net/10919/56609.

Abstract:

The focus of this thesis is to implement factor graphs into the problem of detection, classification, and localization (DCL) of underwater objects using active SOund Navigation And Ranging (SONAR). A factor graph is a bipartite graphical representation of the decomposition of a particular function. Messages are passed along the edges connecting factor and variable nodes, on which a message passing algorithm is applied to compute the posterior probabilities at a particular node. This thesis addresses two issues. In the first section, the formulation of factor graphs for each section of the DCL chain is required, followed by their closed-form solutions. For the detector, the factor graph determines if the signal is a detection or simply noise. In the classifier, it outputs the probability for the elements in the class. Last, when using a factor graph for the tracker, it gives the estimated state of the object being tracked. The second part concentrates on the application to Continuous Active SONAR (CAS). When using CAS, a bistatic configuration is used, allowing for a more rapid update rate, where two unmanned underwater vehicles (UUVs) are used as the receiver and transmitter. The goal is to evaluate CAS's effectiveness to determine if the tracking accuracy improves as the transmit interval decreases. If CAS proves to be more efficient in target tracking, the next objective is to determine which messages sent between the two UUVs are most beneficial. To test this, a particle filter simulation is used.
Master of Science
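The message-passing idea described in the abstract above can be illustrated on the smallest non-trivial factor graph: two discrete variables joined by one pairwise factor, with a prior on the first and an evidence factor on the second. This is a generic sum-product sketch, not the detector, classifier or tracker graphs of the thesis; the function names and the numbers in the usage note are hypothetical.

```python
def normalize(v):
    s = sum(v)
    return [x / s for x in v]

def marginal_x2(prior_x1, pairwise, evidence_x2):
    """Posterior of X2 in the chain  prior -- X1 -- f12 -- X2 -- evidence."""
    # Message from factor f12 to X2: sum over x1 of prior(x1) * f12(x1, x2).
    msg = [sum(prior_x1[i] * pairwise[i][j] for i in range(len(prior_x1)))
           for j in range(len(evidence_x2))]
    # The belief at X2 is the normalised product of all incoming messages.
    return normalize([m * e for m, e in zip(msg, evidence_x2)])
```

With a uniform prior, a symmetric pairwise factor `[[0.9, 0.1], [0.1, 0.9]]` and evidence `[0.2, 0.8]`, the posterior at X2 simply follows the evidence, which is a quick sanity check for an implementation.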

41

Raveaux, Romain. "Fouille de graphes et classification de graphes : application à l’analyse de plans cadastraux." Thesis, La Rochelle, 2010. http://www.theses.fr/2010LAROS311/document.

Abstract:

The work presented in this thesis approaches, from several very interesting angles, a vast and ambitious subject: the interpretation of color cadastral maps. In this context, our approach lies at the confluence of several research areas such as signal and image processing, pattern recognition, artificial intelligence, and knowledge engineering. Indeed, although these scientific fields differ in their foundations, they are complementary, and their respective contributions are indispensable to the design of an interpretation system. The core of the work is the automatic processing of 19th-century cadastral documents. The problem is addressed within a project bringing together historians, geomatics specialists, and computer scientists. On the one hand, we considered the problem from a systemic angle, looking at every stage of the processing chain, but also with a clear concern for developing methodologies applicable in other contexts. Cadastral documents have been the subject of numerous studies, but we have shown definite originality by emphasizing document interpretation and basing our study on graph-based models. Appropriate processing steps and methodologies have been proposed. The concern of bridging the semantic gap between the image and its interpretation has received, in the case of the cadastral maps studied, a response
This thesis tackles the problem of technical document interpretation applied to ancient and colored cadastral maps. This subject is at the crossroads of different fields like signal or image processing, pattern recognition, artificial intelligence, man-machine interaction and knowledge engineering. Indeed, each of these different fields can contribute to building a reliable and efficient document interpretation device. This thesis points out the necessity and importance of dedicated services oriented to historical documents and a related project named ALPAGE. Subsequently, the main focus of this work, Content-Based Map Retrieval within an ancient collection of color cadastral maps, is introduced

42

Lima, Mendez Gipsi. "Towards in silico detection and classification of prokaryotic Mobile Genetic Elements." Doctoral thesis, Universite Libre de Bruxelles, 2008. http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/210578.

Abstract:

Bacteriophage genomes show pervasive mosaicism, indicating that horizontal gene exchange plays a crucial role in their evolution. Phage genomes represent unique combinations of modules, each of them with a different phylogenetic history. Thus, a web-like, rather than a hierarchical scheme is needed for an appropriate representation of phage evolutionary relationships. Part of the virology community has long recognized this fact and calls for changing the traditional taxonomy that classifies tailed phages according to the type of genetic materials and phage tail and head/capsid morphologies. Moreover, based on morphological features, the current system depends on inspection of phage virions under the electron microscope and cannot directly classify prophages. With the genomic era, many phages have been sequenced that are not classified, calling for development of an automatic classification procedure that can cope with the sequencing pace. The ACLAME database provides a classification of phage proteins into families and assigns the families with at least 3 members to one or several functions.

In the first contribution of this work, the relative contribution of those different protein families to the similarities between the phages is assessed using pair-wise similarity matrices. The modular character of phage genomes is readily visualized using heatmaps, which differ depending on the function of the proteins used to measure the similarity.

Next, I propose a framework that allows for a reticulate classification of phages based on gene content (with statistical assessment of the significance of the number of shared genes). Starting from gene/protein families, we built a weighted graph, where nodes represent phages and edges represent phage-phage similarities in terms of shared families. The topology of the network shows that most dsDNA phages form an interconnected group, confirming that dsDNA phages share a common gene pool, as proposed earlier. Differences are observed between temperate and virulent phages in the values of several centrality measures, which may correlate with different constraints on rampant recombination dictated by the phage lifestyle, and thus with a distinct evolutionary role in the phage population.
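A weighted phage-phage graph of this kind can be sketched as follows. The abstract does not name the statistical test used, so the hypergeometric tail probability, the significance threshold `alpha`, the edge weight `-log10(p)` and all function names below are assumptions chosen for illustration.

```python
import math
from itertools import combinations

def shared_pvalue(n_total, n_a, n_b, n_shared):
    """P(at least n_shared common families) if n_b families were drawn at random."""
    upper = min(n_a, n_b)
    denom = math.comb(n_total, n_b)
    return sum(math.comb(n_a, k) * math.comb(n_total - n_a, n_b - k)
               for k in range(n_shared, upper + 1)) / denom

def similarity_graph(phage_families, n_total, alpha=0.05):
    """phage_families: {phage: set of family ids}. Returns {(p, q): edge weight}."""
    edges = {}
    for p, q in combinations(sorted(phage_families), 2):
        shared = len(phage_families[p] & phage_families[q])
        pval = shared_pvalue(n_total, len(phage_families[p]),
                             len(phage_families[q]), shared)
        # Keep an edge only when the overlap is unlikely to arise by chance.
        if shared and pval < alpha:
            edges[(p, q)] = -math.log10(pval)
    return edges
```

Centrality measures or clustering can then be computed on the resulting weighted edge list with any graph library.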

To this graph I applied a two-step clustering method to generate a fuzzy classification of phages. Using this methodology, each phage is associated with a membership vector, which quantitatively characterizes the membership of the phage to the clusters. Alternatively, genes were clustered based on their ‘phylogenetic profiles’ to define ‘evolutionary cohesive modules’. Phages can then be described as composites of a set of modules from the collection of modules of the whole phage population. The relationships between phages define a network based on module sharing. Unlike the first network, built from statistically significant numbers of shared genes, this second network allows for a direct exploration of the nature of the functions shared between the connected phages. This functionality of the module-based network comes at the expense of missing links due to genes that are not part of modules, but which are encoded in the first network.

These approaches can easily focus on pre-defined modules for tracing one or several traits across the population. They provide an automatic and dynamic way to study relationships within the phage population. Moreover, they can be extended to the representation of populations of other mobile genetic elements or even to the entire mobilome.

Finally, to enrich the phage sequence space, which in turn allows for a better assessment of phage diversity and evolution, I devise a prophage prediction tool. With this methodology, approximately 800 prophages are predicted in 266 of the 800 replicons screened. The comparison of a subset of these predictions with a manually annotated set shows a sensitivity of 79% and a positive predictive value of 91%, the latter value suggesting that the procedure makes few false predictions. The preliminary analysis of the predicted prophages indicates that many may constitute novel phage types.

This work makes it possible to draw up guidelines for the classification and analysis of other mobile genetic elements. One can foresee that a pool of putative mobile genetic element sequences can be extracted from the prokaryotic genomes and be further broken down into groups of related elements and evolutionarily conserved modules. This would widen the picture of the evolutionary and functional relationships between these elements.

Doctorate in Sciences

43

Ezzeddine, Diala. "A contribution to topological learning and its application in Social Networks." Thesis, Lyon 2, 2014. http://www.theses.fr/2014LYO22011/document.

Abstract:

Supervised Learning is a popular field of Machine Learning that has progressed steadily for several years. Many techniques have been developed to solve the classification problem, but in most cases these methods rely on the presence and number of points of a given class in regions of the space that the classifier must define. As a result, the construction of the classifier depends on the density of the point cloud of the initial data. In this thesis, we show that using the topology of the data can be a good alternative when constructing classifiers. To this end, we propose using topological graphs such as the Gabriel Graph (GG) or the Relative Neighborhood Graph (RNG). These represent the topology of the data because they are based on the notion of neighborhood and do not depend on density. To apply this concept, we create a new method called Random Neighborhood Classification (RNC). This method uses topological graphs to construct classifiers. Moreover, as an Ensemble Method (EM), it uses several classifiers to extract all relevant information from the data. EMs are well known in Machine Learning. They generate many classifiers from the data and then aggregate these classifiers into one. The resulting global classifier is recognized as very effective, as many studies have shown. This is possible because it relies on information obtained from each of the classifiers that compose it. We compared RNC to other known supervised classification methods on data from the UCI Irvine repository. We find that RNC performs well compared to the best of them, such as Random Forests (RF) and Support Vector Machines (SVM).
Most of the time, RNC ranks among the top three methods in terms of effectiveness. This result encouraged us to study RNC on real data such as tweets. Twitter is a microblogging social network. It is particularly useful for studying opinion on current affairs and on any subject, politics in particular. However, extracting political opinion from Twitter poses particular challenges. Indeed, the length of the messages, the level of language used, and the ambiguity of the messages make it very difficult to use classical text analysis tools based on word frequency counts or in-depth sentence analysis. This is what motivated this study. We propose studying author/subject couples in order to classify a tweet according to its author's opinion of a politician (a subject of the tweet). We propose a procedure that focuses on identifying these opinions. We believe that tweets rarely express an objective opinion on a particular action of a politician, but more often a deep conviction of the author about a political movement. Detecting the opinion of a few authors then allows us to use the similarity in the terms employed by others to recover these convictions on a larger scale. This procedure has two steps: first, identify the opinion of a few couples semi-automatically in order to build a reference set; then, use all the tweets of a couple (all the tweets by an author mentioning a politician) to compare them with those of the reference set. Topological Learning appears to be a very interesting field to study, in particular for solving classification problems.
Supervised Learning is a popular field of Machine Learning that has made recent progress. In particular, many methods and procedures have been developed to solve the classification problem. Most classical methods in Supervised Learning use the density estimation of data to construct their classifiers.In this dissertation, we show that the topology of data can be a good alternative in constructing classifiers. We propose using topological graphs like Gabriel graphs (GG) and Relative Neighborhood Graphs (RNG) that can build the topology of data based on its neighborhood structure. To apply this concept, we create a new method called Random Neighborhood Classification (RNC).In this method, we use topological graphs to construct classifiers and then apply Ensemble Methods (EM) to get all relevant information from the data. EM is well known in Machine Learning, generates many classifiers from data and then aggregates these classifiers into one. Aggregate classifiers have been shown to be very efficient in many studies, because it leverages relevant and effective information from each generated classifier. We first compare RNC to other known classification methods using data from the UCI Irvine repository. We find that RNC works very well compared to very efficient methods such as Random Forests and Support Vector Machines. Most of the time, it ranks in the top three methods in efficiency. This result has encouraged us to study the efficiency of RNC on real data like tweets. Twitter, a microblogging Social Network, is especially useful to mine opinion on current affairs and topics that span the range of human interest, including politics. Mining political opinion from Twitter poses peculiar challenges such as the versatility of the authors when they express their political view, that motivate this study. We define a new attribute, called couple, that will be very helpful in the process to study the tweets opinion. A couple is an author that talk about a politician. 
We propose a new procedure that focuses on identifying the opinion on tweet using couples. We think that focusing on the couples's opinion expressed by several tweets can overcome the problems of analysing each single tweet. This approach can be useful to avoid the versatility, language ambiguity and many other artifacts that are easy to understand for a human being but not automatically for a machine.We use classical Machine Learning techniques like KNN, Random Forests (RF) and also our method RNC. We proceed in two steps : First, we build a reference set of classified couples using Naive Bayes. We also apply a second alternative method to Naive method, sampling plan procedure, to compare and evaluate the results of Naive method. Second, we evaluate the performance of this approach using proximity measures in order to use RNC, RF and KNN. The expirements used are based on real data of tweets from the French presidential election in 2012. The results show that this approach works well and that RNC performs very good in order to classify opinion in tweets.Topological Learning seems to be very intersting field to study, in particular to address the classification problem. Many concepts to get informations from topological graphs need to analyse like the ones described by Aupetit, M. in his work (2005). Our work show that Topological Learning can be an effective way to perform classification problem
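
The two topological graphs named in this abstract can be illustrated with a brute-force construction. The following is only a minimal sketch using the standard textbook definitions of the Gabriel graph and the Relative Neighborhood Graph; it is not the RNC method itself, and the function names `gabriel_edges` and `rng_edges` are hypothetical.

```python
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def gabriel_edges(points):
    """Edges (i, j) such that no third point lies strictly inside the
    ball whose diameter is the segment from points[i] to points[j]."""
    edges = []
    for i, j in combinations(range(len(points)), 2):
        d_ij = sq_dist(points[i], points[j])
        if all(
            sq_dist(points[i], points[k]) + sq_dist(points[j], points[k]) >= d_ij
            for k in range(len(points))
            if k not in (i, j)
        ):
            edges.append((i, j))
    return edges


def rng_edges(points):
    """Edges (i, j) such that no third point is strictly closer to both
    endpoints than the endpoints are to each other."""
    edges = []
    for i, j in combinations(range(len(points)), 2):
        d_ij = sq_dist(points[i], points[j])
        if all(
            max(sq_dist(points[i], points[k]), sq_dist(points[j], points[k])) >= d_ij
            for k in range(len(points))
            if k not in (i, j)
        ):
            edges.append((i, j))
    return edges


if __name__ == "__main__":
    pts = [(0.0, 0.0), (2.0, 0.0), (1.0, 1.5)]
    print(gabriel_edges(pts))
    print(rng_edges(pts))
```

Both constructions test every pair against every other point, so they run in cubic time and are meant for small examples only. Neither test uses a density estimate or a neighbor count, which is the property the abstract relies on.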

44

Dunn, Sarah, and Sean M. Wilkinson. "Increasing the resilience of air traffic networks using a network graph theory approach." Elsevier, 2015. https://publish.fid-move.qucosa.de/id/qucosa%3A72825.

Abstract:

Air traffic networks are essential to today’s global society. They are the fastest means of transporting physical goods and people and are a major contributor to the globalisation of the world’s economy. This increasing reliance requires these networks to have high resilience; however, previous events show that they can be susceptible to natural hazards. We assess two strategies to improve the resilience of air traffic networks and show an adaptive reconfiguration strategy is superior to a permanent re-routing solution. We find that, if traffic networks have fixed air routes, the geographical location of airports leaves them vulnerable to spatial hazard.

45

Petzold, Maria. "Maximale Kantengewichte zusammenhängender Graphen." Doctoral thesis, Technische Universitaet Bergakademie Freiberg Universitaetsbibliothek "Georgius Agricola", 2012. http://nbn-resolving.de/urn:nbn:de:bsz:105-qucosa-89030.

Abstract:

The weight of an edge e = xy of a graph G = (V, E) is defined as the sum of the degrees of its endpoints, and the weight of the graph as the minimum over all edge weights. For positive integers n, m and a graph property P, we seek the value w(n, m, P) := max{w(G) : |V(G)| = n, |E(G)| = m, G in P}. In 1990, at the Czechoslovak Symposium on Combinatorics, Graphs and Complexity, the Hungarian mathematician Erdős posed the problem of determining w(n, m, I) for the most general graph class I. This problem was first solved in part by Ivančo and Jendrol' and then completely by Jendrol' and Schiermeyer. Let G belong to the graph class C if and only if G is connected. This thesis presents approaches to determining w(n, m, C). In particular, we consider graphs with up to 3n − 6 edges as well as very dense graphs. We also discuss some generalised problems.
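
The definitions in this abstract translate directly into a small brute-force computation. The sketch below is only an illustration for tiny graphs, assuming simple undirected graphs on the vertices 0..n-1 with at least one edge; the function names are hypothetical and the exhaustive search is not the approach of the thesis.

```python
from collections import Counter
from itertools import combinations


def graph_weight(edges):
    """w(G): minimum over all edges xy of deg(x) + deg(y).
    Assumes the edge list is non-empty."""
    degree = Counter()
    for x, y in edges:
        degree[x] += 1
        degree[y] += 1
    return min(degree[x] + degree[y] for x, y in edges)


def is_connected(n, edges):
    """Depth-first search over vertices 0..n-1."""
    adjacency = {v: [] for v in range(n)}
    for x, y in edges:
        adjacency[x].append(y)
        adjacency[y].append(x)
    seen = {0}
    stack = [0]
    while stack:
        v = stack.pop()
        for u in adjacency[v]:
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return len(seen) == n


def max_weight_connected(n, m):
    """w(n, m, C) by exhaustive search over all connected graphs with
    n vertices and m >= 1 edges; returns None if no such graph exists."""
    best = None
    all_pairs = list(combinations(range(n), 2))
    for edges in combinations(all_pairs, m):
        if is_connected(n, edges):
            weight = graph_weight(edges)
            if best is None or weight > best:
                best = weight
    return best


if __name__ == "__main__":
    print(graph_weight([(0, 1), (1, 2), (2, 3)]))  # path on four vertices
    print(max_weight_connected(4, 3))              # best tree on four vertices
```

The number of candidate edge sets grows combinatorially, so the search is only practical for very small n. It can still be used to check small cases by hand, for example that the star beats the path among trees on four vertices.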

46

Behmo, Régis. "Visual feature graphs and image recognition." Phd thesis, Ecole Centrale Paris, 2010. http://tel.archives-ouvertes.fr/tel-00545419.

Abstract:

This thesis addresses the automatic classification of two-dimensional images and the detection of generic objects in images. Advances in this research field contribute to the development of intelligent systems such as autonomous robots and to the creation of a semantic web. In this context, designing appropriate image representations and classifiers is an ambitious problem. Our research provides solutions to both problems, image representation and image classification. To build our image representation, we extract visual features from the image and construct a graph structure based on the proximity relations between the associated interest points. We show that certain spectral properties of these graphs are good invariants under classes of rigid geometric transformations. Our image representation is based on these properties. Experimental results show that this representation improves on similar representations that do not incorporate information about the spatial organisation of the interest points. One drawback of the method, however, is that it relies on a (lossy) quantisation of the visual feature space in order to be combined with an efficient Support Vector Machine (SVM) classifier. We solve this problem by creating a new classifier, based on nearest-neighbour distance, that can classify objects represented as sets of points. The linearity of this classifier also allows us to perform object detection in addition to image classification. Another interesting property of this classifier is its ability to combine different types of visual features optimally. We use this property to formulate the graph classification problem in a different way. Experiments conducted on a wide variety of datasets show the quantitative benefits of our approach.

47

Reinwardt, Manja. "Combinatorial and graph theoretical aspects of two-edge connected reliability." Doctoral thesis, Technische Universitaet Bergakademie Freiberg Universitaetsbibliothek "Georgius Agricola", 2015. http://nbn-resolving.de/urn:nbn:de:bsz:105-qucosa-184297.

Abstract:

The study of reliability networks goes back to the early 20th century. This thesis is mainly concerned with two-edge connected reliability. First, simple algorithms are presented, which are not efficient for general graphs, together with reductions. Characterisations of edges with respect to path sets are then given, and new structural conditions for them are introduced. New results are also obtained for graphs of high density and symmetry, more precisely for complete and complete bipartite graphs. Graphs of low density are naturally easier to study here. The thesis presents results for cycles, wheels and ladder structures. Graphs of bounded pathwidth or treewidth admit polynomial algorithms and, in special cases, simple formulas, which are also presented. The final part deals with bounds and approximations.
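
The abstract mentions simple algorithms that are not efficient for general graphs. One such algorithm is complete enumeration of edge states, sketched below. The sketch assumes the usual model in which every edge operates independently with the same probability p and vertices never fail, and it takes two-edge connected reliability to mean the probability that the surviving graph is connected and has no bridge. The function names are hypothetical and the thesis may define or compute the quantity differently.

```python
from itertools import combinations


def is_connected(vertices, edges):
    """Depth-first search on the graph (vertices, edges)."""
    vertices = list(vertices)
    if not vertices:
        return True
    adjacency = {v: [] for v in vertices}
    for x, y in edges:
        adjacency[x].append(y)
        adjacency[y].append(x)
    seen = {vertices[0]}
    stack = [vertices[0]]
    while stack:
        v = stack.pop()
        for u in adjacency[v]:
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return len(seen) == len(vertices)


def is_two_edge_connected(vertices, edges):
    """Connected, and still connected after deleting any single edge."""
    edges = list(edges)
    if not is_connected(vertices, edges):
        return False
    return all(
        is_connected(vertices, edges[:i] + edges[i + 1:])
        for i in range(len(edges))
    )


def two_edge_connected_reliability(vertices, edges, p):
    """Sum the probability of every edge subset whose surviving graph is
    two-edge connected. Exponential in the number of edges."""
    edges = list(edges)
    m = len(edges)
    total = 0.0
    for k in range(m + 1):
        for surviving in combinations(edges, k):
            if is_two_edge_connected(vertices, surviving):
                total += p ** k * (1 - p) ** (m - k)
    return total


if __name__ == "__main__":
    triangle = [(0, 1), (1, 2), (0, 2)]
    print(two_edge_connected_reliability(range(3), triangle, 0.5))
```

For a cycle, the only two-edge connected state is the one in which every edge survives, so under these assumptions the sketch returns p raised to the number of edges. The enumeration visits all 2^m edge subsets, which is why reductions and structure-specific formulas matter in practice.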

48

Luqman, Muhammad Muzzamil. "Fuzzy multilevel graph embedding for recognition, indexing and retrieval of graphic document images." Thesis, Tours, 2012. http://www.theses.fr/2012TOUR4005/document.

Abstract:

This thesis addresses the lack of efficient computational tools for graph-based structural pattern recognition and proposes to exploit the computational strength of statistical pattern recognition, combining the richness of structural methods with the speed of statistical ones. It makes two main contributions. The first is a new method of explicit graph embedding based on multilevel analysis of graphs. The method characterises a graph at the levels that, in our view, correspond to the key points of graph-based representations: graph-level (global) information, structural-level information and elementary (local) information. It embeds this information into a numeric feature vector using fuzzy histograms. The method employs fuzzy overlapping trapezoidal intervals to address the noise sensitivity of graph representations and to minimise information loss when mapping from continuous graph space to discrete vector space. It also has unsupervised learning abilities and automatically adapts its parameters to the underlying graph dataset, without requiring a prior labeled training phase. The second contribution is a framework for automatic indexing of graph repositories for graph retrieval and subgraph spotting. This framework applies the explicit graph embedding to all cliques of order 2 that can be extracted from the graphs in the repository, representing them as numeric feature vectors, and uses classification and clustering tools to build the index automatically. It does not require a labeled learning set and can be easily deployed to a range of application domains, offering the ease of query by example (QBE) and the granularity of focused retrieval.
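
The fuzzy overlapping trapezoidal intervals mentioned in this abstract can be illustrated with a generic fuzzy histogram. The sketch below uses the standard trapezoidal membership function with corner points a <= b <= c <= d; the interval boundaries and the function names are assumptions made for illustration and are not taken from the thesis.

```python
def trapezoid_membership(x, a, b, c, d):
    """Degree in [0, 1] to which x belongs to the trapezoid (a, b, c, d):
    1 on the plateau [b, c], linear on the two slopes, 0 elsewhere."""
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0


def fuzzy_histogram(values, intervals):
    """One bin per trapezoid; each value adds its membership degree to
    every bin it overlaps instead of a hard count of 1 to a single bin."""
    return [
        sum(trapezoid_membership(v, *interval) for v in values)
        for interval in intervals
    ]


if __name__ == "__main__":
    # Two overlapping intervals chosen by hand for the example.
    intervals = [(0.0, 0.0, 1.0, 2.0), (1.0, 2.0, 3.0, 4.0)]
    print(fuzzy_histogram([0.5, 1.5, 2.5], intervals))
```

Because neighbouring trapezoids overlap, a value near a boundary contributes partly to both bins, so a small perturbation of the value changes the histogram only slightly. This is one way to read the noise-sensitivity argument in the abstract.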

49

Douar, Brahim. "Fouille de sous-graphes fréquents à base d'arc consistance." Thesis, Montpellier 2, 2012. http://www.theses.fr/2012MON20108/document.

Abstract:

With the growing need to analyse large amounts of structured data such as chemical compounds, protein structures and social networks, frequent subgraph mining has become a real challenge in the data mining field. The difficulty is closely tied to the exponential number of subgraphs and to the NP-completeness of the general subgraph isomorphism problem, which make frequent subgraph miners exponential in runtime and/or memory use. To cope with this complexity and the size of the search space, existing graph mining methods have explored search heuristics based on the minimal support threshold, the description language of the examples (restricted to paths, trees, etc.) or the hypotheses (search for shared subtrees, common paths, etc.). In this thesis, we rely on a graph matching operator from the constraint programming field, named AC-projection, which has the advantage of polynomial complexity, in contrast to the subgraph isomorphism test. We propose two algorithms based on this operator, FGMAC and AC-miner, for finding frequent subgraphs in a graph database. The two algorithms exploit the properties of AC-projection in different ways. FGMAC explores the search space breadth-first and uses the levelwise strategy introduced in Apriori. AC-miner is a pattern-growth approach that explores the search space depth-first and uses powerful pruning techniques to reduce it considerably, which gives better scalability on large graphs. Both approaches extract a particular kind of subgraph, the AC-reduced frequent subgraphs. We first prove theoretically that the search space of these subgraphs is smaller than that of frequent subgraphs up to isomorphism. We then carry out a series of experiments showing that FGMAC and AC-miner are more efficient than state-of-the-art algorithms. At the same time, we show that AC-reduced frequent subgraphs, despite being much fewer in number, have the same discriminative power as frequent subgraphs up to isomorphism. This study rests on an experimental evaluation of the quality of AC-reduced frequent subgraphs in a supervised graph classification process, from which we conclude that an important performance gain can be achieved with no loss, or no significant loss, in the quality of the discovered patterns.

50

Grieshaber, Frank. "GODOT: graph of dated objects and texts: building a chronological gazetteer for antiquity." Epigraphy Edit-a-thon : editing chronological and geographic data in ancient inscriptions ; April 20-22, 2016 / edited by Monica Berti. Leipzig, 2016. Beitrag 6, 2016. https://ul.qucosa.de/id/qucosa%3A15468.
