Dissertations / Theses on the topic 'Metric machine'


1

Hagen, Erling. "Using machine learning to balance metric trees." Thesis, Norwegian University of Science and Technology, Department of Computer and Information Science, 2006. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-10118.

Abstract:
The emergence of complex data objects that must be indexed and accessed in databases has created a need for access methods that are both dynamic and efficient. Lately, metric tree structures have become a popular way of handling this because of the advantages they have compared to traditional methods based on spatial indexing. The most common way to handle indexing is to build tree structures and then prune out branches of the trees during search, and for a dynamic indexing structure it is important that these trees stay balanced in order to keep the worst-case search time as low as possible. Normally, this is done based on complex criteria and reshuffling operations. Another way to handle balancing is General Balanced Trees (GBT), proposed by Arne Andersson (Journal of Algorithms 30, 1999), which use simple, global criteria for rebalancing binary search trees by means of total and partial rebuilding. This thesis explores whether this can be applied to metric tree structures, in particular two static metric tree structures called the Vantage Point Tree and the Multiple Vantage Point Tree. It discusses how best to make these into dynamic tree structures and how to balance them using GBT paradigms. The performance of the new tree structures is analyzed and compared against already existing structures. The results show that this approach works for balancing the trees, and that the structures perform reasonably well compared to existing structures.
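As a rough illustration of the idea explored in this thesis, the sketch below combines a vantage point tree with a GBT-style global rebalancing criterion: whenever an insertion lands deeper than c·log2(n), the tree is rebuilt. It is a simplified, assumption-laden toy (random vantage points, and a total rebuild instead of Andersson's partial rebuilding), not the thesis's implementation.

```python
import math
import random

class Node:
    def __init__(self, point):
        self.point = point    # vantage point for this subtree
        self.radius = 0.0     # median distance used to split inner/outer
        self.inner = None     # subtree with d(point, x) <= radius
        self.outer = None     # subtree with d(point, x) >  radius
        self.size = 1         # number of points in this subtree

def build(points, dist):
    """Static VP-tree construction: pick a vantage point, split at the median."""
    if not points:
        return None
    vp = random.choice(points)
    rest = [p for p in points if p is not vp]
    node = Node(vp)
    node.size = len(points)
    if rest:
        ds = sorted(dist(vp, p) for p in rest)
        node.radius = ds[len(ds) // 2]
        node.inner = build([p for p in rest if dist(vp, p) <= node.radius], dist)
        node.outer = build([p for p in rest if dist(vp, p) > node.radius], dist)
    return node

def collect(node):
    """Gather all points stored in a subtree."""
    if node is None:
        return []
    return [node.point] + collect(node.inner) + collect(node.outer)

class GBTVPTree:
    """Dynamic VP-tree kept shallow by a GBT-style global depth criterion."""
    def __init__(self, dist, c=1.5):      # c > 1 tolerates some imbalance
        self.dist, self.c, self.root = dist, c, None

    def insert(self, point):
        if self.root is None:
            self.root = Node(point)
            return
        node, depth = self.root, 0
        while True:                       # walk down to a free child slot
            node.size += 1
            depth += 1
            side = 'inner' if self.dist(node.point, point) <= node.radius else 'outer'
            child = getattr(node, side)
            if child is None:
                setattr(node, side, Node(point))
                break
            node = child
        # GBT criterion: rebuild when the new leaf is deeper than c*log2(n).
        # (Andersson rebuilds only an offending subtree; a total rebuild
        # keeps this sketch short.)
        if depth + 1 > self.c * math.log2(self.root.size + 1):
            self.root = build(collect(self.root), self.dist)

# toy usage with 1-D points and absolute difference as the metric
tree = GBTVPTree(dist=lambda a, b: abs(a - b))
for value in range(100):
    tree.insert(value)
```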
2

Chang, Hong. "Semi-supervised distance metric learning /." View abstract or full-text, 2006. http://library.ust.hk/cgi/db/thesis.pl?COMP%202006%20CHANG.

3

Cao, Qiong. "Some topics on similarity metric learning." Thesis, University of Exeter, 2015. http://hdl.handle.net/10871/18662.

Abstract:
The success of many computer vision problems and machine learning algorithms critically depends on the quality of the chosen distance metrics or similarity functions. Because real data are inherently task- and data-dependent, learning an appropriate distance metric or similarity function from data for each specific task is usually superior to the default Euclidean distance or cosine similarity. This thesis mainly focuses on developing new metric and similarity learning models for three tasks: unconstrained face verification, person re-identification and kNN classification. Unconstrained face verification is a binary matching problem, the target of which is to predict whether two images/videos are from the same person or not. Person re-identification handles pedestrian matching and ranking across non-overlapping camera views. Both vision problems are very challenging because of the large transformation differences in images or videos caused by pose, expression, occlusion, problematic lighting and viewpoint. To address these concerns, two novel methods are proposed. Firstly, we introduce a new dimensionality reduction method called Intra-PCA that is robust to large transformation differences. We show that Intra-PCA significantly outperforms classic dimensionality reduction methods (e.g. PCA and LDA). Secondly, we propose a novel regularization framework called Sub-SML to learn distance metrics and similarity functions for unconstrained face verification and person re-identification. The main novelty of our formulation is to incorporate both the robustness of Intra-PCA to large transformation variations and the discriminative power of metric and similarity learning, a property that most existing methods do not hold. Turning to kNN classification, which relies on a distance metric to identify the nearest neighbors, we revisit some popular existing methods for metric learning and develop a general formulation called DMLp for learning a distance metric from data. To obtain the optimal solution, a gradient-based optimization algorithm is proposed which only needs the computation of the largest eigenvector of a matrix per iteration. Although a large number of studies have been devoted to metric/similarity learning based on different objective functions, few address the generalization analysis of such methods. We describe a novel approach for generalization analysis of metric/similarity learning which can deal with general matrix regularization terms including the Frobenius norm, sparse L1-norm, mixed (2,1)-norm and trace-norm. The novel models developed in this thesis are evaluated on four challenging databases: the Labeled Faces in the Wild dataset for unconstrained face verification in still images; the YouTube Faces database for video-based face verification in the wild; the Viewpoint Invariant Pedestrian Recognition database for person re-identification; and the UCI datasets for kNN classification. Experimental results show that the proposed methods yield competitive or state-of-the-art performance.
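The abstract's remark that the DMLp solver only needs the largest eigenvector of a matrix per iteration suggests a Frank-Wolfe-style update over trace-bounded positive semi-definite matrices. The sketch below shows that general flavour on a hinge loss over pair constraints; the objective, step sizes, and names are illustrative assumptions, not the thesis's exact DMLp formulation.

```python
import numpy as np

def mahalanobis_sq(M, x, y):
    """Squared Mahalanobis distance (x - y)^T M (x - y)."""
    d = x - y
    return d @ M @ d

def metric_learning_step(M, pairs, labels, tau=2.0, eta=0.1):
    """One Frank-Wolfe-style update over {M PSD, trace(M) <= tau}.
    Pair loss: max(0, s * (d_M(x, y)^2 - 1)) with s = +1 for similar pairs
    (pushes distances below 1) and s = -1 for dissimilar ones (above 1)."""
    G = np.zeros_like(M)
    for (x, y), s in zip(pairs, labels):
        d = x - y
        if s * (mahalanobis_sq(M, x, y) - 1.0) > 0:   # hinge is active
            G += s * np.outer(d, d)                   # subgradient term
    # Linear minimization over the PSD trace ball: the optimal atom is
    # tau * v v^T, with v the largest eigenvector of -G -- the single
    # eigenvector computation mentioned in the abstract.
    w, V = np.linalg.eigh(-G)
    v = V[:, np.argmax(w)]
    S = tau * np.outer(v, v)
    return (1 - eta) * M + eta * S    # convex combination stays PSD

# toy usage: two similar and two dissimilar pairs in R^2
rng = np.random.default_rng(0)
pairs = [(rng.normal(size=2), rng.normal(size=2)) for _ in range(4)]
labels = [1, 1, -1, -1]
M = np.eye(2)
for _ in range(20):
    M = metric_learning_step(M, pairs, labels)
print(M)
```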
4

Ruan, Yang. "Smooth and locally linear semi-supervised metric learning /." View abstract or full-text, 2009. http://library.ust.hk/cgi/db/thesis.pl?CSED%202009%20RUAN.

5

Shi, Bibo. "Diversification and Generalization for Metric Learning with Applications in Neuroimaging." Ohio University / OhioLINK, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1448980736.

6

Engström, Isak. "Automated Gait Analysis : Using Deep Metric Learning." Thesis, Linköpings universitet, Medie- och Informationsteknik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-178139.

Abstract:
Sectors of security, safety, and defence require methods for identifying people on the individual level. Automation of these tasks has the potential of outperforming manual labor, as well as relieving workloads. The ever-extending surveillance camera networks, advances in human pose estimation from monocular cameras, and the progress of deep learning techniques pave the way for automated walking-gait analysis as an identification method. This thesis investigates the use of 2D kinematic pose sequences to represent gait, monocularly extracted from a limited dataset containing walking individuals captured from five camera views. The sequential information of the gait is captured using recurrent neural networks. Techniques in deep metric learning are applied to evaluate two network models, with contrasting output dimensionalities, against deep-metric- and non-deep-metric-based embedding spaces. The results indicate that the gait representation, network designs, and network learning structure show promise when identifying individuals, scaling particularly well to unseen individuals. However, with the limited dataset, the network models performed best when the dataset included the labels from both the individuals and the camera views simultaneously, rather than the labels from the individuals alone. For further investigations, an extension of the data would be required to evaluate the accuracy and effectiveness of these methods for the re-identification task of each individual. (The thesis work was carried out at the Department of Science and Technology (ITN), Faculty of Science and Engineering, Linköping University.)
7

Witoszko, Izabela. "How and why robotics automate work : analyzing automation of tasks using machine learning suitability assessment metric." Thesis, Massachusetts Institute of Technology, 2018. http://hdl.handle.net/1721.1/118508.

Abstract:
Thesis: S.M. in Engineering and Management, Massachusetts Institute of Technology, System Design and Management Program, 2018. As we enter the Second Machine Age, in which AI, machine learning, and robotics technologies increasingly drive this revolution, we are experiencing significant automation changes in industries such as warehousing and distribution centers. Many of the jobs in these industries aren't just being transformed but also partially or fully automated, often replacing the lowest-skilled workers. Even though the core technologies driving automation today are improving exponentially, there are still many areas where human workers excel and thrive. Some jobs might be automated, but some tasks prove difficult for machines to perform. This research asks how technology is automating tasks within warehousing jobs right now. By applying rigorous metrics developed by Erik Brynjolfsson and Tom Mitchell to jobs within warehouses, the thesis aims to show which tasks within these jobs have the highest suitability for machine learning and robotics automation. The research includes an analysis of the tasks that have not been automated and the possible reasons and opportunities for automation.
8

Carriere, Mathieu. "On Metric and Statistical Properties of Topological Descriptors for geometric Data." Thesis, Université Paris-Saclay (ComUE), 2017. http://www.theses.fr/2017SACLS433/document.

Abstract:
In the context of supervised Machine Learning, finding alternate representations, or descriptors, for data is of primary interest since it can greatly enhance the performance of algorithms. Among them, topological descriptors focus on and encode the topological information contained in geometric data. One advantage of using these descriptors is that they enjoy many good and desirable properties, due to their topological nature. For instance, they are invariant to continuous deformations of data. However, their main drawback is that they often lack the structure and operations required by most Machine Learning algorithms, such as means or scalar products. In this thesis, we study the metric and statistical properties of the most common topological descriptors, persistence diagrams and the Mapper. In particular, we show that the Mapper, which is empirically unstable, can be stabilized with an appropriate metric, which we then use to compute confidence regions and to tune its parameters automatically. Concerning persistence diagrams, we show that scalar products can be defined with kernel methods by defining two kernels, or embeddings, into finite- and infinite-dimensional Hilbert spaces.
9

Neo, TohKoon. "A Direct Algorithm for the K-Nearest-Neighbor Classifier via Local Warping of the Distance Metric." Diss., Brigham Young University, 2007. http://contentdm.lib.byu.edu/ETD/image/etd2168.pdf.

10

Bellet, Aurélien. "Supervised metric learning with generalization guarantees." PhD thesis, Université Jean Monnet - Saint-Etienne, 2012. http://tel.archives-ouvertes.fr/tel-00770627.

Abstract:
In recent years, the crucial importance of metrics in machine learning algorithms has led to an increasing interest in optimizing distance and similarity functions using knowledge from training data to make them suitable for the problem at hand. This area of research is known as metric learning. Existing methods typically aim at optimizing the parameters of a given metric with respect to some local constraints over the training sample. The learned metrics are generally used in nearest-neighbor and clustering algorithms. When data consist of feature vectors, a large body of work has focused on learning a Mahalanobis distance, which is parameterized by a positive semi-definite matrix. Recent methods offer good scalability to large datasets. Less work has been devoted to metric learning from structured objects (such as strings or trees), because it often involves complex procedures. Most of this work has focused on optimizing a notion of edit distance, which measures (in terms of number of operations) the cost of turning one object into another. We identify two important limitations of current supervised metric learning approaches. First, they make it possible to improve the performance of local algorithms such as k-nearest neighbors, but metric learning for global algorithms (such as linear classifiers) has not really been studied so far. Second, and perhaps more importantly, the question of the generalization ability of metric learning methods has been largely ignored. In this thesis, we propose theoretical and algorithmic contributions that address these limitations. Our first contribution is the derivation of a new kernel function built from learned edit probabilities. Unlike other string kernels, it is guaranteed to be valid and parameter-free. Our second contribution is a novel framework for learning string and tree edit similarities inspired by the recent theory of (epsilon, gamma, tau)-good similarity functions and formulated as a convex optimization problem. Using uniform stability arguments, we establish theoretical guarantees for the learned similarity that give a bound on the generalization error of a linear classifier built from that similarity. In our third contribution, we extend the same ideas to metric learning from feature vectors by proposing a bilinear similarity learning method that efficiently optimizes the (epsilon, gamma, tau)-goodness. The similarity is learned based on global constraints that are more appropriate to linear classification. Generalization guarantees are derived for our approach, highlighting that our method minimizes a tighter bound on the generalization error of the classifier. Our last contribution is a framework for establishing generalization bounds for a large class of existing metric learning algorithms. It is based on a simple adaptation of the notion of algorithmic robustness and allows the derivation of bounds for various loss functions and regularizers.
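For a feel of the third contribution's setting, the toy sketch below builds a linear classifier from bilinear similarity scores K_A(x, x') = x^T A x' to a set of landmark ("reasonable") points, in the spirit of the (epsilon, gamma, tau)-good similarity framework. The optimization of A itself is omitted; the fixed identity A, the landmark choice, and the uniform weights are all illustrative assumptions.

```python
import numpy as np

def bilinear_similarity(A, x, y):
    """K_A(x, y) = x^T A y. A need not be PSD, so K_A is a similarity
    function rather than a proper metric."""
    return x @ A @ y

def similarity_classifier(A, landmarks, landmark_labels, alpha):
    """Linear classifier over similarity scores to landmark points,
    f(x) = sum_i alpha_i * y_i * K_A(x, x_i), predicting sign(f(x)),
    as suggested by the (epsilon, gamma, tau)-goodness framework."""
    def predict(x):
        score = sum(a * yl * bilinear_similarity(A, x, l)
                    for a, yl, l in zip(alpha, landmark_labels, landmarks))
        return 1 if score >= 0 else -1
    return predict

# toy usage with a fixed illustrative A (identity = plain dot product)
rng = np.random.default_rng(0)
pos = rng.normal(loc=+1.0, size=(10, 2))
neg = rng.normal(loc=-1.0, size=(10, 2))
landmarks = np.vstack([pos[:3], neg[:3]])   # the 'reasonable' points
labels = [1, 1, 1, -1, -1, -1]
alpha = [1 / 6] * 6                         # uniform landmark weights
clf = similarity_classifier(np.eye(2), landmarks, labels, alpha)
accuracy = np.mean([clf(x) == 1 for x in pos] + [clf(x) == -1 for x in neg])
print(f"toy accuracy: {accuracy:.2f}")
```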
11

Gennari, Riccardo. "End-to-end Deep Metric Learning con Vision-Language Model per il Fashion Image Captioning." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2022. http://amslaurea.unibo.it/25772/.

Abstract:
Image captioning is a machine learning task that consists of generating a caption describing the characteristics of an input image. This can be applied, for example, to describe in detail the products for sale on an e-commerce site, improving the accessibility of the website and enabling more informed purchases for customers with visual impairments. Generating accurate descriptions for online fashion items is important not only to improve customers' shopping experiences, but also to increase online sales. Beyond the need to present item attributes correctly, describing products with the right language can help capture customers' attention. In this thesis, we aim to develop a system able to generate a caption that describes in detail the input image of a fashion-industry product, be it a piece of clothing or some kind of accessory. In recent years, many studies have proposed solutions based on convolutional networks and LSTMs. In this project we instead propose an encoder-decoder architecture that uses the Vision Transformer model to encode images and GPT-2 to generate the text. We also study how deep metric learning techniques applied end-to-end during training affect the metrics and the quality of the captions generated by our model.
12

Zhang, Pin. "Nonlinear Semi-supervised and Unsupervised Metric Learning with Applications in Neuroimaging." Ohio University / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1525266545968548.

13

Berlemont, Samuel. "Automatic non linear metric learning : Application to gesture recognition." Thesis, Lyon, 2016. http://www.theses.fr/2016LYSEI014/document.

Abstract:
As consumer devices become more and more ubiquitous, new interaction solutions are required. In this thesis, we explore inertial-based gesture recognition on Smartphones, where gestures holding a semantic value are drawn in the air with the device in hand. In our research, the speed and delay constraints required by an application are critical, leading us to the choice of neural-based models. Thus, our work focuses on metric learning between gesture sample signatures using the "Siamese" architecture (Siamese Neural Network, SNN), which aims at modelling semantic relations between classes to extract discriminative features, applied to the MultiLayer Perceptron. Contrary to some popular versions of this algorithm, we opt for a strategy that does not require additional parameter fine-tuning during training, namely a set threshold on dissimilar outputs. Indeed, after a preprocessing step where the data is filtered and normalised spatially and temporally, the SNN is trained from sets of samples, composed of similar and dissimilar examples, to compute a higher-level representation of the gesture, where features are collinear for similar gestures and orthogonal for dissimilar ones. While the original model already works for classification, multiple mathematical problems which can impair its learning capabilities are identified. Consequently, as opposed to the classical input-set selection strategies of similar/dissimilar pairs, or of triplets made of a reference with one similar and one dissimilar sample, we propose to include samples from every available dissimilar class, resulting in a better structuring of the output space. Moreover, we apply a regularisation on the outputs to better determine the objective function. Furthermore, the notion of polar sine enables a redefinition of the angular problem by maximising a normalised volume induced by the outputs of the reference and dissimilar samples, which effectively results in a Supervised Non-Linear Independent Component Analysis. Finally, we assess the unexplored potential of the Siamese network and its higher-level representation for novelty and error detection and rejection. With the help of two real-world inertial datasets, the Multimodal Human Activity Dataset as well as the Orange Dataset, specifically gathered for the Smartphone inertial symbolic gesture interaction paradigm, we characterise the performance of each contribution and prove the higher novelty detection and rejection rate of our model, with protocols aiming at modelling unknown gestures and open-world configurations. To summarise, the proposed SNN allows for supervised non-linear similarity metric learning, which extracts discriminative features, improving both inertial gesture classification and rejection.
14

Wetzel, Dominikus Emanuel. "Entity-based coherence in statistical machine translation : a modelling and evaluation perspective." Thesis, University of Edinburgh, 2018. http://hdl.handle.net/1842/31522.

Abstract:
Natural language documents exhibit coherence and cohesion by means of interrelated structures both within and across sentences. Sentences do not stand in isolation from each other, and only a coherent structure makes them understandable and sound natural to humans. In Statistical Machine Translation (SMT) only little research exists on translating a document from a source language into a coherent document in the target language. The dominant paradigm is still one that considers sentences independently from each other. There is both a need for a deeper understanding of how to handle specific discourse phenomena and for automatic evaluation of how well these phenomena are handled in SMT. In this thesis we explore an approach to treating sentences as dependent on each other by focussing on the problem of pronoun translation as an instance of a discourse-related non-local phenomenon. We direct our attention to pronoun translation in the form of cross-lingual pronoun prediction (CLPP) and develop a model to tackle this problem. We obtain state-of-the-art results exhibiting the benefit of having access to the antecedent of a pronoun for predicting the right translation of that pronoun. Experiments also showed that features from the target side are more informative than features from the source side, confirming the linguistic knowledge that referential pronouns need to agree in gender and number with their target-side antecedent. We show our approach to be applicable across the two language pairs English-French and English-German. The experimental setting for CLPP is artificially restricted, both to enable automatic evaluation and to provide a controlled environment. This is a limitation which does not yet allow us to test the full potential of CLPP systems within a more realistic setting that is closer to a full SMT scenario. We provide an annotation scheme, a tool and a corpus that enable evaluation of pronoun prediction in a more realistic setting. The annotated corpus consists of parallel documents translated by a state-of-the-art neural machine translation (NMT) system, where the appropriate target-side pronouns have been chosen by annotators. With this corpus, we exhibit a weakness of our current CLPP systems in that they are outperformed by a state-of-the-art NMT system in this more realistic context. This corpus provides a basis for future CLPP shared tasks and allows the research community to further understand and test their methods. The lack of appropriate evaluation metrics that explicitly capture non-local phenomena is one of the main reasons why handling non-local phenomena has not yet been widely adopted in SMT. To overcome this obstacle and evaluate the coherence of translated documents, we define a bilingual model of entity-based coherence, inspired by work on monolingual coherence modelling, and frame it as a learning-to-rank problem. We first evaluate this model on a corpus where we artificially introduce coherence errors based on typical errors CLPP systems make. This allows us to assess the quality of the model in a controlled environment with automatically provided gold coherence rankings. Results show that this model can distinguish with high accuracy between a human-authored translation and one with coherence errors, that it can also distinguish between document pairs from two corpora with different degrees of coherence errors, and that the learnt model can be successfully applied when the test-set distribution of errors differs from that of the training data, showing its generalization potential. To test our bilingual model of coherence as a discourse-aware SMT evaluation metric, we apply it to more realistic data. We use it to evaluate a state-of-the-art NMT system against post-editing systems with pronouns corrected by our CLPP systems. For verifying our metric, we reuse our annotated parallel corpus and consider the pronoun annotations as a proxy for human document-level coherence judgements. Experiments show far lower accuracy in ranking translations according to their entity-based coherence than on the artificial corpus, suggesting that the metric has difficulties generalizing to a more realistic setting. Analysis reveals that the system translations in our test corpus do not differ in their pronoun translations in almost half of the document pairs. To circumvent this data sparsity issue, and to remove the need for parameter learning, we define a score-based SMT evaluation metric which directly uses features from our bilingual coherence model.
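The learning-to-rank framing of coherence can be sketched with a pairwise hinge objective: learn a weight vector that scores the coherent document of each pair above its error-injected counterpart. The two features and the synthetic data below are invented stand-ins; the thesis ranks with entity-based features of real translated documents.

```python
import numpy as np

def pairwise_rank_train(coherent, incoherent, epochs=100, lr=0.1):
    """Learn w so that w . x_coherent > w . x_incoherent + 1 (hinge),
    given one (coherent, incoherent) feature pair per document."""
    w = np.zeros(coherent.shape[1])
    for _ in range(epochs):
        for xc, xi in zip(coherent, incoherent):
            if w @ xc - w @ xi < 1.0:      # margin violated
                w += lr * (xc - xi)        # perceptron-style hinge update
    return w

# Toy features, e.g. [entity-transition consistency, pronoun-antecedent agreement]
rng = np.random.default_rng(1)
coherent = rng.normal(loc=[1.0, 1.0], scale=0.3, size=(50, 2))
incoherent = rng.normal(loc=[0.4, 0.2], scale=0.3, size=(50, 2))

w = pairwise_rank_train(coherent, incoherent)
ranking_acc = np.mean(coherent @ w > incoherent @ w)
print(f"pairs ranked correctly: {ranking_acc:.2f}")
```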
15

Kalaji, Abdul Salam. "Search-based software engineering : a search-based approach for testing from extended finite state machine (EFSM) models." Thesis, Brunel University, 2010. http://bura.brunel.ac.uk/handle/2438/4575.

Abstract:
The extended finite state machine (EFSM) is a powerful modelling approach that has been applied to represent a wide range of systems. Despite its popularity, testing from an EFSM is a substantial problem for two main reasons: path feasibility and path test case generation. The path feasibility problem concerns generating transition paths through an EFSM that are feasible and satisfy a given test criterion. In an EFSM, guards and assignments in a path's transitions may cause some selected paths to be infeasible. The problem of path test case generation is to find a sequence of inputs that can exercise the transitions in a given feasible path. However, the guards and assignments in a given path's transitions can make producing such data difficult, narrowing the range of acceptable inputs down to a possibly tiny range. While search-based approaches have proven efficient in automating aspects of testing, they have received little attention when testing from EFSMs. This thesis proposes an integrated search-based approach to automatically test from an EFSM. The proposed approach generates paths through an EFSM that are potentially feasible and satisfy a test criterion. Then, it generates test cases that can exercise the generated feasible paths. The approach is evaluated by using it to test from five EFSM case studies. The experimental results demonstrate the value of the proposed approach.
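Search-based test generation of this kind typically scores candidate inputs with a branch-distance-style fitness over a path's guards and searches for inputs that drive the distance to zero. The sketch below illustrates the idea on a single made-up guard with a simple hill climber; the thesis's EFSM encoding and fitness are considerably more elaborate.

```python
import random

def branch_distance(x, y):
    """Distance to satisfying the made-up guard 'x > 10 and x + y == 20'.
    Zero iff the guard holds; smaller means closer (Tracey-style distances)."""
    d_gt = max(0, 10 - x + 1)          # distance for x > 10 (integer inputs)
    d_eq = abs((x + y) - 20)           # distance for x + y == 20
    return d_gt + d_eq

def hill_climb(steps=10_000, seed=0):
    """Neighbourhood search with occasional random restarts."""
    rng = random.Random(seed)
    best = (rng.randint(-100, 100), rng.randint(-100, 100))
    best_f = branch_distance(*best)
    for _ in range(steps):
        if best_f == 0:                # guard satisfied: test data found
            break
        x, y = best
        cand = (x + rng.choice([-1, 1]), y + rng.choice([-1, 1]))
        f = branch_distance(*cand)
        if f < best_f:
            best, best_f = cand, f
        elif rng.random() < 0.01:      # restart to escape local optima
            best = (rng.randint(-100, 100), rng.randint(-100, 100))
            best_f = branch_distance(*best)
    return best, best_f

inputs, fitness = hill_climb()
print(f"inputs {inputs} reach branch distance {fitness}")
```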
16

Maltbie, Nicholas. "Integrating Explainability in Deep Learning Application Development: A Categorization and Case Study." University of Cincinnati / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1623169431719474.

17

Fabris, Anna. "Traduzione automatica dei documenti social condivisi da malati rari." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2020. http://amslaurea.unibo.it/21645/.

Abstract:
About 5% of the world's population is affected by one of the more than 6,000 rare diseases registered today. The term "rare" thus refers not so much to the overall percentage of affected individuals as, according to the European Union definition, to an incidence below one case per 2,000 inhabitants. The limited knowledge available for each disease forces patients to search for information on their own, often through social media. This has given rise to an enormous and continuously growing amount of textual content, which can be analyzed effectively with NLP (Natural Language Processing) solutions. The vast majority of currently available tools are designed for processing documents in English, which motivates both the development of new multilingual models and the translation of existing datasets for better implementation support. Commercial machine translators are frequently insufficient, since their translations of domain entities are imprecise, while proper handling of those entities is required at the preprocessing stage. This thesis aims to compare multiple machine translation techniques and to propose an approach that handles domain entities deliberately through the adoption of NER (Named Entity Recognition) systems, taking as a case study a corpus of posts and comments shared in a Facebook group of patients with esophageal achalasia.
18

Perrot, Michaël. "Theory and algorithms for learning metrics with controlled behaviour." Thesis, Lyon, 2016. http://www.theses.fr/2016LYSES072/document.

Abstract:
Many Machine Learning algorithms make use of a notion of distance or similarity between examples to solve various problems such as classification, clustering or domain adaptation. Depending on the task considered, these metrics should have different properties, but manually choosing an adapted comparison function can be tedious and difficult. A natural trend is then to automatically tailor such metrics to the task at hand. This is known as Metric Learning, and the goal is mainly to find the best parameters of a metric under some specific constraints. Standard approaches in this field usually focus on learning Mahalanobis distances or bilinear similarities, and one of the main limitations is that control over the behaviour of the learned metrics is often limited. Furthermore, while some theoretical works exist to justify the generalization ability of the learned models, most approaches do not come with such guarantees. In this thesis we propose new algorithms to learn metrics with a controlled behaviour, and we put a particular emphasis on the theoretical properties of these algorithms. We propose four distinct contributions which can be separated in two parts, namely (i) controlling the metric with respect to a reference metric and (ii) controlling the underlying transformation corresponding to the learned metric. Our first contribution is a local metric learning method where the goal is to regress a distance proportional to the human perception of colors. Our approach is backed up by theoretical guarantees on the generalization ability of the learned metrics. In our second contribution we are interested in theoretically studying the interest of using a reference metric in a biased regularization term to help during the learning process. We propose to use three different theoretical frameworks allowing us to derive three different measures of goodness for the reference metric. These measures give us some insight into the impact of the reference metric on the learned one. In our third contribution we propose a metric learning algorithm where the underlying transformation is controlled. The idea is that instead of using similarity and dissimilarity constraints we associate each learning example to a so-called virtual point belonging to the output space associated with the learned metric. We theoretically show that metrics learned in this way generalize well, but also that our approach is linked to a classic metric learning method based on pair constraints. In our fourth contribution we also try to control the underlying transformation of a learned metric. However, instead of considering a point-wise control we consider a global one by forcing the transformation to follow the geometrical transformation associated to an optimal transport problem. From a theoretical standpoint we propose a discussion on the link between the transformation associated with the learned metric and the transformation associated with the optimal transport problem. On a more practical side we show the interest of our approach for domain adaptation but also for a task of seamless copy in images.
19

Michel, Fabrice. "Multi-Modal Similarity Learning for 3D Deformable Registration of Medical Images." PhD thesis, Ecole Centrale Paris, 2013. http://tel.archives-ouvertes.fr/tel-01005141.

Abstract:
Even though the prospect of fusing images issued by different medical imaging systems is highly contemplated, its practical instantiation is subject to a theoretical hurdle: the definition of a similarity between images. Efforts in this field have proved successful for select pairs of images; however, defining a suitable similarity between images regardless of their origin is one of the biggest challenges in deformable registration. In this thesis, we chose to develop generic approaches that allow the comparison of any two given modalities. The recent advances in Machine Learning permitted us to provide innovative solutions to this very challenging problem. To tackle the problem of comparing incommensurable data we chose to view it as a data embedding problem, where one embeds all the data in a common space in which comparison is possible. To this end, we explored the projection of one image space onto the image space of the other, as well as the projection of both image spaces onto a common image space in which the comparison calculations are conducted. This was done by studying the correspondences between image features in a pre-aligned dataset. In the pursuit of these goals, new methods for image regression as well as multi-modal metric learning methods were developed. The resulting learned similarities are then incorporated into a discrete optimization framework that mitigates the need for a differentiable criterion. Lastly, we investigate a new method that discards the constraint of a database of pre-aligned images, requiring only data annotated (segmented) by a physician. Experiments are conducted on two challenging medical image datasets (pre-aligned MRI images and PET/CT images) to justify the benefits of our approach.
20

Walters, Craig M. "Application of the human-machine interaction model to Multiple Attribute Task Battery (MATB): Task component interaction and the strategy paradigm." Wright State University / OhioLINK, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=wright1347636464.

21

Kasianenko, Stanislav. "Predicting Software Defectiveness by Mining Software Repositories." Thesis, Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-78729.

Abstract:
One of the important aims of the continuous software development process is to localize and remove all existing program bugs as fast as possible. This goal is closely related to software engineering and defectiveness estimation. Many big companies started to store source code in software repositories as the latter grew in popularity. These repositories usually include static source code as well as detailed data for defects in software units. This allows analyzing all the data without interrupting the programming process. The main problem with large, complex software is the impossibility of controlling everything manually, while the price of an error can be very high. This might result in developers missing defects at the testing stage and an increase in maintenance cost. The general research goal is to find a way of predicting future software defectiveness with high precision. Reducing maintenance and development costs will contribute to reducing the time-to-market and increasing software quality. To address the problem of estimating residual defects, an approach was developed to predict the residual defectiveness of software by means of machine learning. As the primary machine learning algorithm, a regression decision tree was chosen as a simple and reliable solution. Data for this tree is extracted from a static source code repository and divided into two parts: software metrics and defect data. Software metrics are computed from the static code, and defect data is extracted from reported issues in the repository. In addition to already reported bugs, the data is augmented with unreported bugs found in the "discussions" section of the repository and parsed by a natural language processor. Metrics were filtered to remove those that were not related to the defect data by applying a correlation algorithm. The remaining metrics were weighted so that the most correlated combination could be used as a training set for the decision tree. As a result, the built decision tree model forecasts defectiveness with 89% accuracy for the particular product. This experiment was conducted on a Java project in a GitHub repository and predicted the number of possible bugs in a single file (Java class). The experiment resulted in a method for predicting possible defectiveness from the static code of a single large (more than 1,000 files) software version.
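The core of such a pipeline can be sketched with scikit-learn's regression decision tree trained on per-file static metrics to predict defect counts. The metric choices and the synthetic data below are illustrative assumptions; the thesis extracts both from a real GitHub repository.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split

# Toy per-file software metrics: [lines of code, cyclomatic complexity,
# number of methods, past commit count] -- an illustrative feature choice.
rng = np.random.default_rng(42)
n_files = 500
X = np.column_stack([
    rng.integers(20, 2000, n_files),     # LOC
    rng.integers(1, 50, n_files),        # cyclomatic complexity
    rng.integers(1, 40, n_files),        # methods per class
    rng.integers(0, 300, n_files),       # commits touching the file
])
# Synthetic ground truth: defect count loosely correlated with the metrics.
y = (0.002 * X[:, 0] + 0.1 * X[:, 1] + rng.poisson(1.0, n_files)).round()

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
tree = DecisionTreeRegressor(max_depth=6, min_samples_leaf=10, random_state=0)
tree.fit(X_train, y_train)

print("predicted defects for first 5 test files:", tree.predict(X_test[:5]).round())
print("R^2 on held-out files:", round(tree.score(X_test, y_test), 3))
```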
22

Birch, Alexandra. "Reordering metrics for statistical machine translation." Thesis, University of Edinburgh, 2011. http://hdl.handle.net/1842/5024.

Abstract:
Natural languages display a great variety of different word orders, and one of the major challenges facing statistical machine translation is in modelling these differences. This thesis is motivated by a survey of 110 different language pairs drawn from the Europarl project, which shows that word order differences account for more variation in translation performance than any other factor. This wide-ranging analysis provides compelling evidence for the importance of research into reordering. There has already been a great deal of research into improving the quality of the word order in machine translation output. However, there has been very little analysis of how best to evaluate this research. Current machine translation metrics are largely focused on evaluating the words used in translations, and their ability to measure the quality of word order has not been demonstrated. In this thesis we introduce novel metrics for quantitatively evaluating reordering. Our approach isolates the word order in translations by using word alignments. We reduce alignment information to permutations and apply standard distance metrics to compare the word order in the reference to that of the translation. We show that our metrics correlate more strongly with human judgements of word order quality than current machine translation metrics. We also show that a combined lexical and reordering metric, the LRscore, is useful for training translation model parameters. Humans prefer the output of models trained using the LRscore as the objective function over those trained with the de facto standard translation metric, the BLEU score. The LRscore thus provides researchers with a reliable metric for evaluating the impact of their research on the quality of word order.
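The permutation-based recipe can be made concrete in a few lines: reduce word alignments to a permutation of source positions and compare it to the monotone order with a standard permutation distance, here Kendall's tau. This is an illustrative reconstruction of the general approach, not the exact LRscore.

```python
from itertools import combinations

def kendall_tau_distance(perm):
    """Fraction of position pairs that are out of order in `perm` relative
    to the identity (monotone) order: 0 = same order, 1 = fully reversed."""
    n = len(perm)
    discordant = sum(1 for i, j in combinations(range(n), 2) if perm[i] > perm[j])
    return discordant / (n * (n - 1) / 2)

def reordering_score(perm):
    """Higher is better: 1 - normalized Kendall tau distance."""
    return 1.0 - kendall_tau_distance(perm)

# A permutation here maps target positions to the source positions they
# translate (already extracted from word alignments, one-to-one for simplicity).
monotone = [0, 1, 2, 3, 4]     # translation keeps the source order
swapped = [0, 2, 1, 3, 4]      # one local swap
reversed_ = [4, 3, 2, 1, 0]    # completely inverted order

for name, perm in [("monotone", monotone), ("swap", swapped), ("reversed", reversed_)]:
    print(f"{name}: reordering score = {reordering_score(perm):.2f}")
```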
23

Do, Cao Tri. "Apprentissage de métrique temporelle multi-modale et multi-échelle pour la classification robuste de séries temporelles par plus proches voisins." Thesis, Université Grenoble Alpes (ComUE), 2016. http://www.theses.fr/2016GREAM028/document.

Abstract:
The definition of a metric between time series is inherent to several data analysis and mining tasks, including clustering, classification and forecasting. Time series naturally present several characteristics, called modalities, covering their amplitude, behavior or frequency spectrum, that may be expressed with varying delays and at different temporal granularities and localizations, exhibited globally or locally. Combining several modalities at multiple temporal scales to learn a holistic metric is a key challenge for many real temporal data applications. This PhD proposes a Multi-modal and Multi-scale Temporal Metric Learning (M2TML) approach for robust time series nearest neighbors classification. The solution is based on the embedding of pairs of time series into a pairwise dissimilarity space, in which a large-margin optimization process is performed to learn the metric. The M2TML solution is proposed for both linear and non-linear contexts, and is studied for different regularizers. A sparse and interpretable variant of the solution shows the ability of the learned temporal metric to accurately localize discriminative modalities as well as their temporal scales. A wide range of 30 public and challenging datasets, encompassing images, traces and ECG data, that are linearly or non-linearly separable, are used to show the efficiency and the potential of M2TML for time series nearest neighbors classification.
24

Zantedeschi, Valentina. "A Unified View of Local Learning : Theory and Algorithms for Enhancing Linear Models." Thesis, Lyon, 2018. http://www.theses.fr/2018LYSES055/document.

Abstract:
In the field of Machine Learning, data characteristics usually vary over the space: the overall distribution might be multi-modal and contain non-linearities. In order to achieve good performance, the learning algorithm should then be able to capture and adapt to these changes. Even though linear models fail to describe complex distributions, they are renowned for their scalability, at training and at testing, to datasets that are large in terms of both the number of examples and the number of features. Several methods have been proposed to take advantage of the scalability and the simplicity of linear hypotheses to build models with great discriminatory capabilities. These methods empower linear models, in the sense that they enhance their expressive power through different techniques. This dissertation focuses on enhancing local learning approaches, a family of techniques that infers models by capturing the local characteristics of the space in which the observations are embedded. The founding assumption of these techniques is that the learned model should behave consistently on examples that are close, implying that its results should also change smoothly over the space. The locality can be defined on spatial criteria (e.g. closeness according to a selected metric) or on other provided relations, such as the association to the same category of examples or a shared attribute. Local learning approaches are known to be effective in capturing complex distributions of the data, avoiding the need to select a model specific to the task. However, state-of-the-art techniques suffer from three major drawbacks: they easily memorize the training set, resulting in poor performance on unseen data; their predictions lack smoothness in particular locations of the space; and they scale poorly with the size of the datasets. The contributions of this dissertation investigate the aforementioned pitfalls in two directions: we propose to introduce side information in the problem formulation to enforce smoothness in prediction and attenuate the memorization phenomenon; and we provide a new representation for the dataset which takes into account its local specificities and improves scalability. Thorough studies are conducted to highlight the effectiveness of these contributions and to confirm the soundness of their intuitions. We empirically study the performance of the proposed methods both on toy and real tasks, in terms of accuracy and execution time, and compare it to state-of-the-art results. We also analyze our approaches from a theoretical standpoint, by studying their computational and memory complexities and by deriving tight generalization bounds.
25

Tataru, Augustin. "Metrics for Evaluating Machine Learning Cloud Services." Thesis, Tekniska Högskolan, Högskolan i Jönköping, JTH, Datateknik och informatik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:hj:diva-37882.

Abstract:
Machine Learning (ML) is nowadays offered as a service by several cloud providers. Consumers require metrics to be able to evaluate and compare multiple ML cloud services. There are not many established metrics that can be used specifically for these types of services. In this paper, the Goal-Question-Metric paradigm is used to define a set of metrics applicable to ML cloud services. The metrics are created based on goals expressed by professionals who use or are interested in using these services. Finally, a questionnaire is used to evaluate the metrics based on two criteria: relevance and ease of use.
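As a reminder of how the Goal-Question-Metric paradigm structures such a derivation, the snippet below encodes one hypothetical GQM tree for an ML cloud service; the goal, questions, and metrics are invented placeholders, not the thesis's actual results.

```python
# One illustrative GQM tree: a goal is refined into questions, and each
# question is answered by concrete, measurable metrics.
gqm = {
    "goal": "Evaluate the predictive quality of an ML cloud service "
            "from the perspective of a consumer comparing providers",
    "questions": {
        "How accurate are the service's predictions?": [
            "classification accuracy on a held-out benchmark",
            "mean absolute error for regression endpoints",
        ],
        "How quickly does the service respond?": [
            "median prediction-request latency (ms)",
            "model (re)training turnaround time (min)",
        ],
    },
}

for question, metrics in gqm["questions"].items():
    print(question)
    for metric in metrics:
        print("  metric:", metric)
```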
APA, Harvard, Vancouver, ISO, and other styles
26

Lajugie, Rémi. "Prédiction structurée pour l’analyse de données séquentielles." Thesis, Paris, Ecole normale supérieure, 2015. http://www.theses.fr/2015ENSU0024/document.

Full text
Abstract:
In this manuscript, we consider structured machine learning problems, focusing on those involving sequential structure. In a first part, we consider the problem of similarity measure learning for two tasks where sequential structure is at stake: (i) multivariate change-point detection and (ii) time warping of pairs of time series. The methods generally used to solve these tasks rely on a similarity measure to compare timestamps. We propose to learn this similarity measure from fully labelled data, i.e., signals that are already segmented or pairs of signals for which the optimal time warping is known. Using standard structured prediction methods, we present algorithmically efficient ways of learning, with loss functions specifically designed for the tasks, and we validate our approach on real-world data. In a second part, we focus on the problem of weak supervision, in which sequential data are not totally labeled. We focus on the problem of aligning an audio recording with its score. We consider the score as a symbolic representation giving: (i) complete information about the order of events or notes played and (ii) an approximate idea of the expected shape of the alignment. We propose to learn a classifier for each note using this information. Our learning problem is based on the optimization of a convex function that takes advantage of the weak supervision and of the sequential structure of the data. Our approach is validated through experiments on the task of audio-to-score alignment on real musical data.
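As a rough illustration of the time-warping task above, the sketch below computes a dynamic-time-warping cost under a Mahalanobis-style ground metric M. In the thesis, M would be learned by structured prediction from labelled pairs; here it is simply an input, and M = I recovers standard DTW with squared Euclidean costs.

import numpy as np

def dtw_cost(X, Y, M):
    # Pairwise squared Mahalanobis-style ground costs under the PSD matrix M.
    diff = X[:, None, :] - Y[None, :, :]             # shape (n, m, d)
    C = np.einsum('nmd,de,nme->nm', diff, M, diff)
    n, m = C.shape
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Standard DTW recursion over match / insertion / deletion moves
            D[i, j] = C[i - 1, j - 1] + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

d = 4
X, Y = np.random.randn(50, d), np.random.randn(60, d)
print(dtw_cost(X, Y, np.eye(d)))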
APA, Harvard, Vancouver, ISO, and other styles
27

Forssell, Melker, and Gustav Janér. "Product Matching Using Image Similarity." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-413481.

Full text
Abstract:
PriceRunner is an online shopping comparison company. To maintain up-to-date prices, PriceRunner has to process large amounts of data every day. The processing of the data includes matching unknown products, referred to as offers, to known products. Offer data includes information about the product such as title, description, price and often one image of the product. PriceRunner has previously implemented a textual-based machine learning (ML) model, but is also looking for new approaches to complement the current product matching system. The objective of this master’s thesis is to investigate the potential of using an image-based ML model for product matching. Our method uses a similarity learning approach where the network learns to recognise the similarity between images. To achieve this, a siamese neural network was trained with the triplet loss function. The network is trained to map similar images closer together and dissimilar images further apart in a vector space. This approach is often used for face recognition, where there is an extensive number of classes and a limited number of images per class, and new classes are frequently added. This is also the case for the image data used in this thesis project. A general model was trained on images from the Clothing and Accessories hierarchy, one of the 16 top-level hierarchies at PriceRunner, consisting of 17 product categories. The results varied between product categories. Some categories proved to be less suitable for image-based classification while others excelled. The model handles new classes relatively well without any, or with briefer, retraining. It was concluded that there is potential in using images to complement the current product matching system at PriceRunner.
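A minimal sketch of the siamese/triplet setup described above, written in PyTorch. The thesis does not specify this architecture, so the small CNN, embedding size and margin are illustrative assumptions.

import torch
import torch.nn as nn

class EmbeddingNet(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, dim),
        )

    def forward(self, x):
        # Unit-length embeddings so distances live on a common scale
        return nn.functional.normalize(self.backbone(x), dim=1)

net = EmbeddingNet()
loss_fn = nn.TripletMarginLoss(margin=0.2)  # pulls positives in, pushes negatives out
anchor, positive, negative = (torch.randn(8, 3, 64, 64) for _ in range(3))
loss = loss_fn(net(anchor), net(positive), net(negative))
loss.backward()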
APA, Harvard, Vancouver, ISO, and other styles
28

Muzellec, Boris. "Leveraging regularization, projections and elliptical distributions in optimal transport." Electronic Thesis or Diss., Institut polytechnique de Paris, 2020. http://www.theses.fr/2020IPPAG009.

Full text
Abstract:
Comparing and matching probability distributions is a crucial task in numerous machine learning (ML) algorithms. Optimal transport (OT) defines divergences between distributions that are grounded in geometry: starting from a cost function on the underlying space, OT consists in finding a mapping or coupling between both measures that is optimal with respect to that cost. The fact that OT is deeply grounded in geometry makes it particularly well suited to ML. Further, OT is the object of a rich mathematical theory. Despite these advantages, the application of OT in data science has long been hindered by the mathematical and computational complexities of the underlying optimization problem. To circumvent these issues, one approach consists in focusing on particular cases that admit closed-form solutions or that can be solved efficiently. In particular, OT between elliptical distributions is one of the very few instances for which OT is available in closed form, defining the so-called Bures-Wasserstein (BW) geometry. This thesis builds extensively on the BW geometry, with the aim of using it as a basic tool in data science applications. To do so, we consider settings in which it is alternatively employed as a basic tool for representation learning, enhanced using subspace projections, and smoothed further using entropic regularization. In a first contribution, the BW geometry is used to define embeddings as elliptical probability distributions, extending the classical representation of data as vectors in R^d. In a second contribution, we prove the existence of transportation maps and plans that extrapolate maps restricted to lower-dimensional projections, and show that subspace-optimal plans admit closed forms in the case of Gaussian measures. Our third contribution consists in deriving closed forms for entropic OT between Gaussian measures scaled with a varying total mass, which constitute the first non-trivial closed forms for entropic OT and provide the first continuous test case for the study of entropic OT. Finally, in a last contribution, entropic OT is leveraged to tackle missing data imputation in a non-parametric and distribution-preserving way.
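The Bures-Wasserstein closed form mentioned above is the standard expression W2²(N(m1, Σ1), N(m2, Σ2)) = ||m1 − m2||² + tr(Σ1 + Σ2 − 2(Σ1^{1/2} Σ2 Σ1^{1/2})^{1/2}); a direct NumPy/SciPy implementation might look as follows (the test matrices are illustrative).

import numpy as np
from scipy.linalg import sqrtm

def bures_wasserstein_sq(m1, S1, m2, S2):
    # ||m1 - m2||^2 + tr(S1 + S2 - 2 (S1^{1/2} S2 S1^{1/2})^{1/2})
    r1 = sqrtm(S1)
    cross = sqrtm(r1 @ S2 @ r1)
    return float(np.sum((m1 - m2) ** 2)
                 + np.trace(S1 + S2 - 2 * np.real(cross)))  # real part: sqrtm noise

m1, m2 = np.zeros(3), np.ones(3)
S1, S2 = np.eye(3), 2 * np.eye(3)
print(bures_wasserstein_sq(m1, S1, m2, S2))  # 3 + 3*(3 - 2*sqrt(2)) ~ 3.515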
APA, Harvard, Vancouver, ISO, and other styles
29

Lew, Ning. "A testing metric for designs modelled as hierarchical finite-state machines." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2001. http://www.collectionscanada.ca/obj/s4/f2/dsk3/ftp04/MQ58476.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
30

Rossholm, Andreas. "On Enhancement and Quality Assessment of Audio and Video in Communication Systems." Doctoral thesis, Blekinge Tekniska Högskola, Institutionen för tillämpad signalbehandling, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-00604.

Full text
Abstract:
The use of audio and video communication has increased exponentially over the last decade and has gone from speech over GSM to HD-resolution video conferencing between continents on mobile devices. As the use becomes more widespread, the interest in delivering high-quality media increases, even on devices with limited resources. This includes development and enhancement of the communication chain, but also the topic of objective measurement of the perceived quality. The focus of this thesis work has been to perform enhancement within speech encoding and video decoding, to measure influence factors of audio and video performance, and to build methods to predict the perceived video quality. The audio enhancement part of this thesis addresses the well-known problem in the GSM system with an interfering signal generated by the switching nature of TDMA cellular telephony. Two different solutions are given to suppress such interference internally in the mobile handset. The first method involves the use of subtractive noise cancellation employing correlators; the second uses a structure of IIR notch filters. Both solutions use control algorithms based on the state of the communication between the mobile handset and the base station. The video enhancement part presents two post-filters. These two filters are designed to improve the visual quality of highly compressed video streams from standard, block-based video codecs by combating both blocking and ringing artifacts. The second post-filter also performs sharpening. The third part addresses the problem of measuring audio and video delay as well as the skewness between them, also known as synchronization. This method is a black-box technique, which enables it to be applied to any audiovisual application, proprietary as well as open standards, and it can be run on any platform and over any network connectivity. The last part addresses no-reference (NR) bitstream video quality prediction using features extracted from the coded video stream. Several methods have been used and evaluated: Multiple Linear Regression (MLR), Artificial Neural Networks (ANN), and Least Squares Support Vector Machines (LS-SVM), showing high correlation with both MOS and objective video assessment methods such as PSNR and PEVQ. The impact of temporal, spatial and quantization variations on perceptual video quality has also been addressed, together with the trade-off between these, and for this purpose a set of locally conducted subjective experiments was performed.
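As a hedged illustration of the notch-filter idea in the audio part, the sketch below suppresses a 217 Hz tone (the GSM TDMA frame rate) with a second-order IIR notch from SciPy. The thesis's adaptive control of the filters is not reproduced, and the sampling rate and Q factor are assumptions.

import numpy as np
from scipy.signal import iirnotch, lfilter

fs = 8000.0   # narrowband speech sampling rate (assumed)
f0 = 217.0    # GSM TDMA frame rate, the fundamental of the switching "buzz"
b, a = iirnotch(w0=f0, Q=30.0, fs=fs)   # narrow second-order IIR notch

t = np.arange(0, 1.0, 1.0 / fs)
speech_like = np.random.randn(t.size)          # stand-in for a speech signal
buzz = 0.5 * np.sin(2 * np.pi * f0 * t)        # synthetic switching interference
filtered = lfilter(b, a, speech_like + buzz)   # the 217 Hz component is attenuated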
APA, Harvard, Vancouver, ISO, and other styles
31

Morales, Aguirre Marco Antonio. "Metrics for sampling-based motion planning." [College Station, Tex. : Texas A&M University, 2007. http://hdl.handle.net/1969.1/ETD-TAMU-2462.

Full text
APA, Harvard, Vancouver, ISO, and other styles
32

Liljeson, Mattias, and Alexander Mohlin. "Software defect prediction using machine learning on test and source code metrics." Thesis, Blekinge Tekniska Högskola, Institutionen för kreativa teknologier, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-4162.

Full text
Abstract:
Context. Software testing is the process of finding faults in software while executing it. The results of the testing are used to find and correct faults. Software defect prediction estimates where faults are likely to occur in source code. The results from the defect prediction can be used to optimize testing and ultimately improve software quality. Machine learning, which concerns computer programs learning from data, is used to build prediction models which then can be used to classify data. Objectives. In this study we, in collaboration with Ericsson, investigated whether software metrics from source code files combined with metrics from their respective tests predict faults with better prediction performance compared to using only metrics from the source code files. Methods. A literature review was conducted to identify inputs for an experiment. The experiment was applied on one repository from Ericsson to identify the best performing set of metrics. Results. The prediction performance results of three metric sets are presented and compared with each other. Wilcoxon’s signed rank tests are performed on four different performance measures for each metric set and each machine learning algorithm to demonstrate significant differences in the results. Conclusions. We conclude that metrics from tests can be used to predict faults. However, the combination of source code metrics and test metrics does not outperform using only source code metrics. Moreover, we conclude that models built with metrics from the test metric set, with minimal information about the source code, can in fact predict faults in the source code.
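A minimal sketch of the Wilcoxon signed-rank comparison used above: per-fold scores for two metric sets are compared pairwise. The AUC values are made up for illustration; the thesis's actual measures and folds differ.

import numpy as np
from scipy.stats import wilcoxon

# Hypothetical per-fold AUC values for two metric sets, obtained with the same
# learner on identical cross-validation splits (numbers are illustrative only).
auc_source_only = np.array([0.71, 0.68, 0.74, 0.70, 0.69, 0.73, 0.72, 0.70, 0.71, 0.69])
auc_combined    = np.array([0.72, 0.67, 0.74, 0.71, 0.70, 0.72, 0.73, 0.70, 0.72, 0.70])

stat, p = wilcoxon(auc_source_only, auc_combined)
print(f"W={stat:.1f}, p={p:.3f}")  # p < 0.05 would indicate a significant difference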
APA, Harvard, Vancouver, ISO, and other styles
33

Radovanović, Miloš. "High-Dimensional Data Representations and Metrics for Machine Learning and Data Mining." Phd thesis, Univerzitet u Novom Sadu, Prirodno-matematički fakultet u Novom Sadu, 2011. https://www.cris.uns.ac.rs/record.jsf?recordId=77530&source=NDLTD&language=en.

Full text
Abstract:
In the current information age, massive amounts of data are gathered at a rate prohibiting their effective structuring, analysis, and conversion into useful knowledge. This information overload is manifested both in large numbers of data objects recorded in data sets, and in large numbers of attributes, also known as high dimensionality. This dissertation deals with problems originating from the high dimensionality of data representation, referred to as the "curse of dimensionality," in the context of machine learning, data mining, and information retrieval. The described research follows two angles: studying the behavior of (dis)similarity metrics with increasing dimensionality, and exploring feature-selection methods, primarily with regard to document representation schemes for text classification. The main results of the dissertation, relevant to the first research angle, include theoretical insights into the concentration behavior of cosine similarity, and a detailed analysis of the phenomenon of hubness, which refers to the tendency of some points in a data set to become hubs by being included in unexpectedly many k-nearest neighbor lists of other points. The mechanisms behind the phenomenon are studied in detail, both from a theoretical and an empirical perspective, linking hubness with the (intrinsic) dimensionality of data, describing its interaction with the cluster structure of data and the information provided by class labels, and demonstrating the interplay of the phenomenon with well-known algorithms for classification, semi-supervised learning, clustering, and outlier detection, with special consideration given to time-series classification and information retrieval. Results pertaining to the second research angle include quantification of the interaction between various transformations of high-dimensional document representations and feature selection, in the context of text classification.
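The hubness phenomenon described above is commonly quantified by the skewness of the k-occurrence distribution N_k; a small sketch on synthetic data (the parameters are illustrative) might look like this.

import numpy as np
from scipy.stats import skew
from sklearn.neighbors import NearestNeighbors

def hubness_skewness(X, k=10):
    # N_k(x): how many times x occurs in other points' k-nearest-neighbor lists;
    # strong positive skewness of N_k signals the emergence of hubs.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)                 # column 0 is each point itself
    counts = np.bincount(idx[:, 1:].ravel(), minlength=len(X))
    return skew(counts)

rng = np.random.default_rng(0)
print(hubness_skewness(rng.normal(size=(1000, 3))))    # low-dimensional: mild
print(hubness_skewness(rng.normal(size=(1000, 100))))  # high-dimensional: pronounced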
APA, Harvard, Vancouver, ISO, and other styles
34

Qamar, Ali Mustafa. "Mesures de similarité et cosinus généralisé : une approche d'apprentissage supervisé fondée sur les k plus proches voisins." Phd thesis, Grenoble, 2010. http://www.theses.fr/2010GRENM083.

Full text
Abstract:
Almost all machine learning problems depend heavily on the metric used. Many works have shown that it is a far better approach to learn the metric structure from the data rather than assuming a simple geometry based on the identity matrix. This has paved the way for a new research theme called metric learning. Most of the works in this domain have based their approaches on distance learning only. However, other works have shown that similarity should be preferred over distance metrics when dealing with textual datasets as well as with non-textual ones. Being able to efficiently learn appropriate similarity measures, as opposed to distances, is thus of high importance for various collections. While several works have partially addressed this problem for different applications, no previous work is known to have fully addressed it in the context of learning similarity metrics for kNN classification, which is exactly the focus of the current study. In the case of information filtering systems, where the aim is to filter an incoming stream of documents into a set of predefined topics with little supervision, cosine-based category-specific thresholds can be learned. Learning such thresholds can be seen as a first step towards learning a complete similarity measure. This strategy was used to develop online and batch algorithms for information filtering during the INFILE (Information Filtering) track of the CLEF (Cross Language Evaluation Forum) campaign in 2008 and 2009. However, provided enough supervised information is available, as is the case in classification settings, it is usually beneficial to learn a complete metric as opposed to learning thresholds. To this end, we developed several algorithms for learning complete similarity metrics for kNN classification. An unconstrained similarity learning algorithm called SiLA is developed, in which the normalization is independent of the similarity matrix. SiLA encompasses, among others, the standard cosine measure, as well as the Dice and Jaccard coefficients. SiLA is an extension of the voted perceptron algorithm and allows one to learn different types of similarity functions (based on diagonal, symmetric or asymmetric matrices). We then compare SiLA with RELIEF, a well-known feature re-weighting algorithm. It has recently been suggested by Sun and Wu that RELIEF can be seen as a distance metric learning algorithm optimizing a cost function which is an approximation of the 0-1 loss. We show here that this approximation is loose, and propose a stricter version closer to the 0-1 loss, leading to a new, and better, RELIEF-based algorithm for classification. We then focus on a direct extension of the cosine similarity measure, defined as a normalized scalar product in a projected space. The associated algorithm is called the generalized Cosine simiLarity Algorithm (gCosLA). All of the algorithms are tested on many different datasets. A statistical test, the s-test, is employed to assess whether the results are significantly different. gCosLA performed statistically much better than SiLA on many of the datasets. Furthermore, SiLA and gCosLA were compared with many state-of-the-art algorithms, illustrating their well-foundedness.
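A much-simplified sketch in the spirit of SiLA, not the thesis's exact voted-perceptron formulation: a bilinear similarity s_A(x, x') = xᵀAx' is adjusted with perceptron updates whenever the nearest same-class example is not ranked above the nearest other-class one by a margin. All parameters are illustrative.

import numpy as np

def train_similarity(X, y, epochs=10, margin=0.1):
    # The update A += x (x_p - x_q)^T raises the violated margin
    # s_A(x, x_p) - s_A(x, x_q) by ||x||^2 ||x_p - x_q||^2, as in a perceptron.
    n, d = X.shape
    A = np.eye(d)
    for _ in range(epochs):
        for i in range(n):
            same = np.where(y == y[i])[0]
            same = same[same != i]
            diff = np.where(y != y[i])[0]
            if same.size == 0 or diff.size == 0:
                continue
            p = same[np.argmax((X[i] @ A) @ X[same].T)]  # most similar same-class
            q = diff[np.argmax((X[i] @ A) @ X[diff].T)]  # most similar other-class
            if X[i] @ A @ X[p] - X[i] @ A @ X[q] < margin:
                A += np.outer(X[i], X[p] - X[q])
    return A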
APA, Harvard, Vancouver, ISO, and other styles
35

Boyapati, Sai Nikhil, and Ramesh Mummidi. "Predicting sales using Machine Learning Techniques." Thesis, Blekinge Tekniska Högskola, Institutionen för datavetenskap, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-20237.

Full text
APA, Harvard, Vancouver, ISO, and other styles
36

Damacharla, Praveen Lakshmi Venkata Naga. "Simulation Studies and Benchmarking of Synthetic Voice Assistant Based Human-Machine Teams (HMT)." University of Toledo / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=toledo1535119916261581.

Full text
APA, Harvard, Vancouver, ISO, and other styles
37

Alsterman, Marcus, and Maximilian Karlström. "Evaluation of Machine Learning Methods for Predicting Client Metrics for a Telecom Service." Thesis, KTH, Skolan för teknikvetenskap (SCI), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-214733.

Full text
Abstract:
A video streaming service faces several difficulties operating. Hardware is expensive and it is crucial to prioritize customers in a way that will make them content with the service provided. That is, deliver a sufficient frame rate and never allocate too many resources on a client, essentially wasting them. This allocation has to be done several times per second, so reading data from the client is out of the question, because the system would be adapting too slowly. This raises the question whether it is possible to predict the frame rate of a client using only variables measured on the server, and if it can be done efficiently. Which it can [1]. To further build on the work of Yanggratoke et al. [1], we evaluated several different machine learning methods on a data set in terms of performance, training time and dependence on the size of the data set. Neural networks, having the best adapting capabilities, resulted in the best performance, but training is more time-consuming than for the linear model. Using neural networks is a good idea when the relationship between input and output is not linear.
APA, Harvard, Vancouver, ISO, and other styles
38

Forte, Paolo. "Predicting Service Metrics from Device and Network Statistics." Thesis, KTH, Kommunikationsnät, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-175892.

Full text
Abstract:
For an IT company that provides a service over the Internet, like Facebook or Spotify, it is very important to provide a high quality of service; however, predicting the quality of service is generally a hard task. The goal of this thesis is to investigate whether an approach that makes use of statistical learning to predict the quality of service can obtain accurate predictions for a Voldemort key-value store [1] in the presence of dynamic load patterns and network statistics. The approach follows the idea that the service-level metrics associated with the quality of service can be estimated from server-side statistical observations, such as device and network statistics. The advantage of the approach analysed in this thesis is that it can work with virtually any kind of service, since it is based only on device and network statistics, which are unaware of the type of service provided. The approach is structured as follows. During service operation, a large amount of device statistics from the Linux kernel of the operating system (e.g. CPU usage level, disk activity, interrupt rate) and some basic end-to-end network statistics (e.g. average round-trip time, packet loss rate) are periodically collected on the service platform. At the same time, some service-level metrics (e.g. average reading time, average writing time, etc.) are collected on the client machine as indicators of the store’s quality of service. To emulate network statistics, such as dynamic delay and packet loss, all the traffic is redirected to flow through a network emulator. Then, different types of statistical learning methods, based on linear and tree-based regression algorithms, are applied to the data collections to obtain a learning model able to accurately predict the service-level metrics from the device and network statistics. The results, obtained for different traffic scenarios and configurations, show that this approach can find learning models that accurately predict the service-level metrics for a single-node store, with error rates lower than 20% (NMAE), even in the presence of network impairments.
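A hedged sketch of the statistical-learning step described above, with random stand-ins for the device statistics and the service-level metric, and NMAE computed as mean absolute error divided by the mean of the observed values (a common definition; the thesis may normalize differently).

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Random stand-ins for server-side device/network statistics (X) and a
# client-side service-level metric such as average read latency (y).
rng = np.random.default_rng(1)
X = rng.random((5000, 20))
y = X @ rng.random(20) + 0.1 * rng.standard_normal(5000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)

nmae = np.mean(np.abs(model.predict(X_te) - y_te)) / np.mean(y_te)  # normalized MAE
print(f"NMAE = {nmae:.1%}")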
APA, Harvard, Vancouver, ISO, and other styles
39

Jiang, Zuoying. "Predicting Service Metrics from Device Statistics in a Container-Based Environment." Thesis, KTH, Kommunikationsnät, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-175889.

Full text
Abstract:
Service assurance is critical for high-demand services running on telecom clouds. Since service performance metrics may not always be available in real time to telecom operators or service providers, service performance prediction becomes an important building block for such a system. However, it is generally hard to achieve.  In this master thesis, we propose a machine-learning based method that enables performance prediction for services running in virtualized environments with Docker containers. This method is service agnostic, and the prediction models built by this method use only device statistics collected from the server machine and from the containers hosted on it to predict the values of the service-level metrics experienced on the client side.  The evaluation results from the testbed, which runs a Video-on-Demand service using containerized servers, show that such a method can accurately predict different service-level metrics under various scenarios and that, by applying suitable preprocessing techniques, the performance of the prediction models can be further improved.  In this thesis, we also show the design of a proof-of-concept Real-Time Analytics Engine that uses online learning methods to predict the service-level metrics in real time in a container-based environment.
APA, Harvard, Vancouver, ISO, and other styles
40

François, Damien. "High-dimensional data analysis : optimal metrics and feature selection." Université catholique de Louvain, 2007. http://edoc.bib.ucl.ac.be:81/ETD-db/collection/available/BelnUcetd-01152007-162739/.

Full text
Abstract:
High-dimensional data are everywhere: texts, sounds, spectra, images, etc. are described by thousands of attributes. However, many of the data analysis tools at our disposal (coming from statistics, artificial intelligence, etc.) were designed for low-dimensional data. Many of the explicit or implicit assumptions made while developing the classical data analysis tools are not transposable to high-dimensional data. For instance, many tools rely on the Euclidean distance to compare data elements. But the Euclidean distance concentrates in high-dimensional spaces: all distances between data elements seem identical. The Euclidean distance is furthermore incapable of distinguishing important attributes from irrelevant ones. This thesis therefore focuses on the choice of a relevant distance function to compare high-dimensional data and on the selection of the relevant attributes. In Part One of the thesis, the phenomenon of the concentration of distances is considered, and its consequences on data analysis tools are studied. It is shown that for nearest neighbour search, the Euclidean distance and the Gaussian kernel, both heavily used, may not be appropriate; it is thus proposed to use fractional metrics and generalised Gaussian kernels. Part Two of this thesis focuses on the problem of feature selection in the case of a large number of initial features. Two methods are proposed to (1) reduce the computational burden of the feature selection process and (2) cope with the instability induced by the high correlation between features that often appears with high-dimensional data. Most of the concepts studied and presented in this thesis are illustrated on chemometric data, and more particularly on spectral data, with the objective of inferring a physical or chemical property of a material by analysing the spectrum of the light it reflects.
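The concentration phenomenon and the appeal of fractional metrics can be demonstrated in a few lines; the relative-contrast measure below and the uniform data are illustrative choices, not taken from the thesis.

import numpy as np

def relative_contrast(X, p):
    # (max - min) / min of distances to the origin under the L_p (quasi-)norm;
    # values near zero mean all distances look alike (concentration).
    d = np.sum(np.abs(X) ** p, axis=1) ** (1.0 / p)
    return (d.max() - d.min()) / d.min()

rng = np.random.default_rng(0)
for dim in (3, 100, 1000):
    X = rng.random((2000, dim))
    print(dim, {p: round(relative_contrast(X, p), 2) for p in (0.5, 1.0, 2.0)})
# Contrast collapses as dim grows, and more slowly for fractional p < 1.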
APA, Harvard, Vancouver, ISO, and other styles
41

Mao, Yida. "A metrics based detection of reusable object-oriented software components using machine learning algorithm." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1999. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape7/PQDD_0028/MQ50828.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
42

Mao, Yida 1972. "A metrics based detection of reusable object-oriented software components using machine learning algorithm /." Thesis, McGill University, 1999. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=21601.

Full text
Abstract:
Since the emergence of object technology, organizations have accumulated a tremendous amount of object-oriented (OO) code. Instead of continuing to recreate components similar to existing artifacts, and considering the rising costs of development, many organizations would like to decrease software development costs and cycle time by reusing existing OO components. The difficulty in finding reusable components is that reuse is a complex and thus less quantifiable measure. In this research, we first proposed three reuse hypotheses about the impact of three internal characteristics (inheritance, coupling, and complexity) of OO software artifacts on reusability. Corresponding metrics suites were then selected and extracted. We used C4.5, a machine learning algorithm, to build predictive models from a learning data set obtained from a medium-sized software system developed in C++. Each predictive model was then verified according to its completeness, correctness and global accuracy. The verification results showed that the proposed hypotheses were correct. The uniqueness of this research is that we have combined the state of the art of three different subjects (reuse detection and prediction, OO metrics and their extraction, and applied machine learning algorithms) to form a process for finding interesting properties of OO software components that affect reusability.
APA, Harvard, Vancouver, ISO, and other styles
43

Larsson, Martin, and Samuel Ljungberg. "Readability: Man and Machine : Using readability metrics to predict results from unsupervised sentiment analysis." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-301842.

Full text
Abstract:
Readability metrics assess the ease with which human beings read and understand written texts. With the advent of machine learning techniques that allow computers to also analyse text, this provides an interesting opportunity to investigate whether readability metrics can be used to inform on the ease with which machines understand texts. To that end, the specific machine analysed in this paper uses word embeddings to conduct unsupervised sentiment analysis. This specification minimises the need for labelling and human intervention, thus relying heavily on the machine instead of the human. Across two different datasets, sentiment predictions are made using Google’s Word2Vec word embedding algorithm, and are evaluated to produce a dichotomous output variable per sentiment. This variable, representing whether a prediction is correct or not, is then used as the dependent variable in a logistic regression with 17 readability metrics as independent variables. The resulting model has high explanatory power, and the effects of the readability metrics on the results from the sentiment analysis are mostly statistically significant. However, the metrics affect sentiment classification in the two datasets differently, indicating that the metrics are expressions of linguistic behaviour unique to the datasets. The implication of the findings is that readability metrics could be used directly in sentiment classification models to improve modelling accuracy. Moreover, the results also indicate that machines are able to pick up on information that human beings do not, for instance that certain words are associated with more positive or negative sentiments.
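A hedged sketch of the modelling setup above: readability features feeding a logistic regression whose target marks whether a sentiment prediction was correct. The two features are crude stand-ins for the thesis's 17 metrics, and the texts and labels are made up.

import numpy as np
from sklearn.linear_model import LogisticRegression

def readability_features(text):
    # Mean sentence length (in words) and mean word length (in characters):
    # crude stand-ins for established readability metrics.
    sents = [s for s in text.replace('!', '.').replace('?', '.').split('.') if s.strip()]
    words = text.split()
    return [len(words) / max(len(sents), 1),
            sum(len(w) for w in words) / max(len(words), 1)]

texts = ["Short words. Easy read.",
         "Nice film. We liked it.",
         "Interminable, convoluted formulations obfuscate comprehension.",
         "Protracted circumlocution systematically undermines interpretability."]
correct = [1, 1, 0, 0]  # 1 = sentiment model classified this text correctly (made up)

clf = LogisticRegression().fit([readability_features(t) for t in texts], correct)
print(clf.predict([readability_features("Plain text. Clear words.")]))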
APA, Harvard, Vancouver, ISO, and other styles
44

Tang, Chen. "Forecasting Service Metrics for Network Services." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-284505.

Full text
Abstract:
As the size and complexity of the internet have increased dramatically in recent years, the burden of network service management has also become heavier. The need for an intelligent way of doing data analysis and forecasting is becoming urgent. The wide implementation of machine learning and data analysis methods provides a new way to analyze large amounts of data. In this project, I study and evaluate data forecasting methods using machine learning techniques and time series analysis methods on data collected from the KTH testbed. Comparing different methods with respect to accuracy and computing overhead, I propose the best method for data forecasting for different scenarios. The results show that machine learning techniques using regression can achieve better performance, with higher accuracy and smaller computing overhead. Time series data analysis methods have relatively lower accuracy, and their computing overhead is much higher than that of the machine learning techniques on the datasets evaluated in this project.
APA, Harvard, Vancouver, ISO, and other styles
45

Bäck, Jesper. "Domain similarity metrics for predicting transfer learning performance." Thesis, Linköpings universitet, Interaktiva och kognitiva system, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-153747.

Full text
Abstract:
The lack of training data is a common problem in machine learning. One solution to this problem is to use transfer learning to remove or reduce the requirement of training data. Selecting datasets for transfer learning can be difficult, however. As a possible solution, this study proposes the domain similarity metrics document vector distance (DVD) and term frequency-inverse document frequency (TF-IDF) distance. DVD and TF-IDF could aid in selecting datasets for good transfer learning when there is no data from the target domain. The simple metric, shared vocabulary, is used as a baseline to check whether DVD or TF-IDF can indicate a better choice for a fine-tuning dataset. SQuAD is a popular question answering dataset which has been proven useful for pre-training models for transfer learning. The results were therefore measured by pre-training a model on the SQuAD dataset and fine-tuning on a selection of different datasets. The proposed metrics were used to measure the similarity between the datasets to see whether there was a correlation between transfer learning effect and similarity. The results found a clear relation between a small distance according to the DVD metric and good transfer learning. This could prove useful for a target domain without training data: a model could be trained on a big dataset and fine-tuned on a small dataset that is very similar to the target domain. It was also found that even small amounts of training data from the target domain can be used to fine-tune a model pre-trained on another domain of data, achieving better performance compared to only training on data from the target domain.
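One plausible reading of the TF-IDF distance proposed above, not necessarily the thesis's exact definition, is the cosine distance between the mean TF-IDF vectors of two corpora; the corpora below are toy examples.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_distances

def tfidf_domain_distance(corpus_a, corpus_b):
    # Fit one vocabulary over both domains, then compare mean TF-IDF vectors.
    vec = TfidfVectorizer().fit(corpus_a + corpus_b)
    A = np.asarray(vec.transform(corpus_a).mean(axis=0))
    B = np.asarray(vec.transform(corpus_b).mean(axis=0))
    return cosine_distances(A, B)[0, 0]

print(tfidf_domain_distance(
    ["the quick brown fox", "jumps over the lazy dog"],
    ["stock prices fell sharply", "markets reacted to earnings"]))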
APA, Harvard, Vancouver, ISO, and other styles
46

Lins, Isis Didier. "Models for quantifying risk and reliability metrics via metaheuristics and support vector machines." Universidade Federal de Pernambuco, 2013. https://repositorio.ufpe.br/handle/123456789/12936.

Full text
Abstract:
This work develops models for quantifying risk and reliability-related metrics of systems in different phases of their life cycle. For systems in the design phase, a Multi-Objective Genetic Algorithm (MOGA) is coupled with Discrete Event Simulation (DES) to provide non-dominated configurations with respect to availability and cost. The proposed MOGA + DES incorporates a Generalized Renewal Process to account for imperfect repairs, and it also indicates the optimal number of maintenance teams. For the operational phase, a hybridism between MOGA and Risk-Based Inspection is proposed for the elaboration of non-dominated inspection plans, in terms of risk and cost, that comply with local regulations. Regression via Support Vector Machines (SVR) is applied when the reliability-related metric (response variable) of an operational system is a function of a number of environmental and operational variables with an unknown analytical relationship. Particle Swarm Optimization is combined with SVR for the selection of the most relevant variables along with the tuning of the SVR hyperparameters that appear in its training problem. In order to assess the uncertainty related to the response variable, bootstrap methods are coupled with SVR to construct confidence and prediction intervals. Numerical experiments and application examples in the context of the oil industry are provided. The obtained results indicate that the proposed frameworks give valuable information for budget planning and for the implementation of proper actions to avoid undesired events.
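A simplified sketch of the bootstrap-around-SVR idea described above: a percentile bootstrap interval for an SVR prediction at one query point. The data, hyperparameters and number of resamples are illustrative assumptions, not the thesis's configuration.

import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(0, 5, (200, 1))
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(200)
x_new = np.array([[2.5]])

preds = []
for _ in range(200):
    idx = rng.integers(0, len(X), len(X))   # resample the data with replacement
    preds.append(SVR(C=10.0).fit(X[idx], y[idx]).predict(x_new)[0])

lo, hi = np.percentile(preds, [2.5, 97.5])  # percentile bootstrap interval
print(f"95% interval for f(2.5): [{lo:.3f}, {hi:.3f}]")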
APA, Harvard, Vancouver, ISO, and other styles
47

Gray, David Philip Harry. "Software defect prediction using static code metrics : formulating a methodology." Thesis, University of Hertfordshire, 2013. http://hdl.handle.net/2299/11067.

Full text
Abstract:
Software defect prediction is motivated by the huge costs incurred as a result of software failures. In an effort to reduce these costs, researchers have been utilising software metrics to try and build predictive models capable of locating the most defect-prone parts of a system. These areas can then be subject to some form of further analysis, such as a manual code review. It is hoped that such defect predictors will enable software to be produced more cost effectively, and/or be of higher quality. In this dissertation I identify many data quality and methodological issues in previous defect prediction studies. The main data source is the NASA Metrics Data Program Repository. The issues discovered with these well-utilised data sets include many examples of seemingly impossible values, and much redundant data. The redundant, or repeated data points are shown to be the cause of potentially serious data mining problems. Other methodological issues discovered include the violation of basic data mining principles, and the misleading reporting of classifier predictive performance. The issues discovered lead to a new proposed methodology for software defect prediction. The methodology is focused around data analysis, as this appears to have been overlooked in many prior studies. The aim of the methodology is to be able to obtain a realistic estimate of potential real-world predictive performance, and also to have simple performance baselines with which to compare against the actual performance achieved. This is important as quantifying predictive performance appropriately is a difficult task. The findings of this dissertation raise questions about the current defect prediction body of knowledge. So many data-related and/or methodological errors have previously occurred that it may now be time to revisit the fundamental aspects of this research area, to determine what we really know, and how we should proceed.
APA, Harvard, Vancouver, ISO, and other styles
48

Ikonomovski, Stefan V. "Detection of faulty components in Object-Oriented systems using design metrics and a machine learning algorithm." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1998. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape10/PQDD_0025/MQ50796.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

Furness, Jane. "The application of physiological metrics in validating user experience evaluation on automotive human machine interface systems." Thesis, Coventry University, 2016. http://curve.coventry.ac.uk/open/items/ca44bd4a-9c2f-4a72-b493-fef4e16874b5/1.

Full text
Abstract:
Automotive in-vehicle information systems have seen an era of continuous development within the industry and are recognised as a key differentiator for prospective customers. This presents a significant challenge for designers and engineers in producing effective next-generation systems which are helpful, novel, exciting, safe and easy to use. The usability of any new human machine interface (HMI) has an implicit cost in terms of perceived aesthetics and the associated user experience. Achieving the next engaging automotive interface not only has to address the user requirements but also has to incorporate established safety standards whilst considering new interaction technologies. An automotive HMI evaluation may combine a triad of physiological, subjective and performance-based measurements, which are employed to provide relevant and valuable data for product evaluation. However, there is also a growing appreciation that real-time quantitative measures of drivers' affective responses provide valuable user feedback. The aim of this research was to explore to what extent physiological metrics, such as heart rate variability, could be used to quantify or validate subjective testing of automotive HMIs. This research employed both objective and subjective metrics to assess user engagement during interactions with an automotive infotainment system. The mapping between physiological and self-report scales was examined over a series of studies in order to provide a greater understanding of users' responses. Analysing the collected data may provide guidance within the early stages of in-vehicle design evaluation in terms of usability and user satisfaction. This research explored these metrics as an objective, quantitative, diagnostic measure of affective response in the assessment of HMIs, and a robust methodology was constructed for the application and understanding of these metrics. Findings from the three studies point towards the value of using a combination of methods when examining user interaction with an in-car HMI. For the next generation of interface systems, physiological measures such as heart rate variability may offer an additional dimension of validity when examining the complexities of the driving tasks that drivers perform every day. There appear to be no boundaries on technology advancements, and with this comes extra pressure for car manufacturers to produce interactive and connective devices similar to those already in use in homes. A successful in-car HMI system will be intuitive to use, aesthetically pleasing and possess an element of pleasure; however, the design components needed for a highly usable HMI have to be considered within the constraints of the manufacturing process and the risks associated with interacting with an in-car HMI whilst driving. The findings from the studies conducted in this research are discussed in relation to the usability and benefits of incorporating physiological measures that can assist in our understanding of driver interaction with different automotive HMIs.
APA, Harvard, Vancouver, ISO, and other styles
50

Chatterley, James J. "Sound Quality Analysis of Sewing Machines." BYU ScholarsArchive, 2005. https://scholarsarchive.byu.edu/etd/424.

Full text
Abstract:
Sound quality analysis is a tool designed to help determine customer preferences, which can be used to help the designer improve product quality. Many industries desire to know how the consuming public perceives their product, as this affects the product's life and success. This research investigates which of the six sewing machines provided by Viking Sewing Machine Group (VSM Group) consumers find most acoustically appealing. The sound quality analysis methods used include both jury-based listening tests and quantitative sound quality metrics from empirical equations. The results from both methods are completely independent and are shown to have a very strong correlation. The procedures and results of both methods, jury listening tests and mathematical metrics, are presented. Near-field sound intensity scans identified acoustic hot spots and gave direction for possible design modifications to improve the acoustic signature of the two top-tier machines, the Designer 1 and Creative 2144 (Husqvarna Viking and Pfaff respectively). This research determined that the entry-level Pfaff Select 1530 has the most acoustically appealing sound of the six machines evaluated. In addition, it was determined that a reduction in the higher-frequency sounds produced by the machines is preferred over a reduction in the lower-frequency sounds. Further investigations, including an evaluation of machine isolation and startup sounds, were also performed. The machine isolation results are highly dependent on the individual machine being evaluated and would require independent evaluation. In the machine startup sound assessment, it was discovered that again the Pfaff Select 1530 has the preferred sound. Near-field acoustic intensity scans provided additional information on locations of strong acoustic radiation, yielding valuable design information. The acoustic "hot" spots were discovered to exist in the lower portions of the machines near the main stepper motor in the Designer 1, and radiating from the bottom plate of the machine in the Pfaff Creative 2144. This analysis has led to various design modifications that could be implemented to improve the sound quality of the machines, specifically the Designer 1 and the Creative 2144.
APA, Harvard, Vancouver, ISO, and other styles
