Dissertations / Theses on the topic 'Machine Learning,Customer Segmentation'

Consult the top 50 dissertations / theses for your research on the topic 'Machine Learning,Customer Segmentation.'


1

Armani, Luca. "Machine Learning: Customer Segmentation." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/24925/.

Abstract:
With the aim of saving capital and increasing profits through ever more targeted marketing, knowing a customer's preferences and supporting them in their purchases is turning from a choice into a necessity. Companies are therefore moving towards increasingly automated approaches to classifying their customers, so as to keep optimizing the purchasing experience. Machine Learning makes it possible to carry out several kinds of analysis that serve this goal. The objective of this project is, in a first phase, to give the reader an overview of the techniques and tools that ML provides. Next, the Customer Segmentation problem is described, together with the techniques and benefits this topic brings. Finally, the thesis describes the phases of an ML project aimed at classifying customers based on their total monetary spend and the quantity of items purchased.
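The segmentation task this thesis describes, grouping customers by total monetary spend and quantity of items purchased, is a natural fit for unsupervised clustering. A minimal sketch with scikit-learn's KMeans (the sample data and the three-cluster choice are invented for illustration, not taken from the thesis):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical customer records: [total spend in EUR, items purchased]
customers = np.array([
    [50.0, 3], [60.0, 4], [55.0, 2],        # low spenders
    [500.0, 25], [520.0, 30], [480.0, 28],  # mid spenders
    [2000.0, 90], [2100.0, 110],            # high spenders
])

# Standardize so spend (large scale) does not dominate item count
X = StandardScaler().fit_transform(customers)

# Segment into three clusters, as in a typical low/mid/high split
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_
print(labels)
```

Standardizing first matters because raw spend values are orders of magnitude larger than item counts and would otherwise dominate the Euclidean distances.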
2

Johansson, Axel, and Jonas Wikström. "Customer segmentation using machine learning." Thesis, Uppsala universitet, Avdelningen för systemteknik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-443868.

Abstract:
In this thesis, the process of developing an application for segmenting customers with the use of machine learning is described. The project was carried out at a company which provides a booking platform for beauty and health services. Data about customers were analyzed and processed in order to train two classification models able to segment customers into three different customer groups. The performance of the two models, a Logistic Regression model and a Support Vector Classifier, was evaluated with different numbers of features and compared to classifications made by human experts working at the company. The results show that the logistic regression model achieved an accuracy of 71% when classifying users into the three groups, which was more accurate than the experts' manual classification. A web API serving the model has been developed and presented to the company. The results of the study showed that machine learning is a useful technique for performing customer segmentation based on behavioral data. Even in the case where the classes are not naturally divisible, the application provides valuable insights into user behaviour that can help the company become more data-driven.
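As a rough illustration of the kind of model the thesis evaluates, the sketch below trains a logistic regression to assign users to three groups from behavioral features. The feature names, group definitions, and data are invented stand-ins, not the company's data or the thesis' actual pipeline:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Invented behavioral features: [bookings/month, avg spend, cancellation rate]
rng = np.random.default_rng(0)
n = 300
X = np.vstack([
    rng.normal([1, 20, 0.4], [0.5, 5, 0.1], size=(n, 3)),      # casual users
    rng.normal([4, 60, 0.2], [1.0, 10, 0.05], size=(n, 3)),    # regulars
    rng.normal([10, 150, 0.05], [2.0, 20, 0.02], size=(n, 3)), # power users
])
y = np.repeat([0, 1, 2], n)

# Hold out a test set and fit a multi-class logistic regression
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
print(f"accuracy: {acc:.2f}")
```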
3

Afzalan, Milad. "Data-driven customer energy behavior characterization for distributed energy management." Diss., Virginia Tech, 2020. http://hdl.handle.net/10919/99210.

Abstract:
With the ever-growing environmental and climate concerns around energy consumption in our society, it is crucial to develop novel solutions that improve the efficient utilization of distributed energy resources for energy efficiency and demand response (DR). As such, there is a need to develop targeted energy programs which not only meet the energy goals of a community but also take the energy use patterns of individual households into account. To this end, a sound understanding of the energy behavior of customers at the neighborhood level is needed, which requires operational analytics on the wealth of energy data from customers and devices. In this dissertation, we focus on data-driven solutions for customer energy behavior characterization with applications to distributed energy management and flexibility provision. To do so, the following problems were studied: (1) how different customers can be segmented for DR events based on their energy-saving potential and on balancing peak and off-peak demand, (2) what the opportunities are for extracting the time-of-use of specific loads for automated DR applications from whole-house energy data without in-situ training, and (3) how flexibility in customer demand and the adoption of renewable and distributed resources (e.g., solar panels, batteries, and smart loads) can improve demand-supply balance. In the first study, a segmentation methodology from historical energy data of households is proposed to estimate the energy-saving potential for DR programs at a community level. The proposed approach characterizes attributes of the time-series data such as frequency, consistency, and peak-time usage. The empirical evaluation on real energy data of 400 households shows the successful ranking of different subsets of consumers according to their peak energy reduction potential for the DR event.
Specifically, it was shown that the proposed approach could successfully identify the 20-30% of customers who could achieve 50-70% of the total possible demand reduction for DR. Furthermore, the rebound effect problem (creating undesired peak demand after a DR event) was studied, and it was shown that the proposed approach has the potential of identifying a subset of consumers (~5%-40%, with specific loads like AC and electric vehicles) who contribute to balancing the peak and off-peak demand. A projection on Austin, TX showed that a 16 MWh reduction during a 2-h event can be achieved by a justified selection of 20% of residential customers. In the second study, the feasibility of inferring the time-of-use (ToU) operation of flexible loads for DR applications was investigated. Unlike several efforts that required considerable model parameter selection or training, we sought to infer ToU from machine learning models without in-situ training. In the first part of this study, ToU inference from low-resolution 15-minute data (smart meter data) was investigated. A framework was introduced which leveraged the smart meter data from a set of neighboring buildings (equipped with plug meters) with similar energy use behavior for training. By identifying buildings with similar energy use behavior, machine learning classification models (including neural networks, SVM, and random forests) were employed to infer appliance ToU in buildings by accounting for resident behavior reflected in their energy load shapes from smart meter data. Investigation of electric vehicles (EV) and dryers in 10 buildings over 20 days showed average F-scores of 83% and 71%, respectively. In the second part of this study, ToU inference from high-resolution data (60 Hz) was investigated. A self-configuring framework, based on the concept of spectral clustering, was introduced that automatically extracts appliance signatures from historical data in the environment to avoid the problem of model parameter selection.
Using the framework, appliance signatures are matched with new events in the electricity signal to identify the ToU of major loads. The results on ~1500 events showed an F-score of >80% for major loads like AC, washing machines, and dishwashers. In the third study, the problem of demand-supply balance in the presence of varying levels of small-scale distributed resources (solar panels, batteries, and smart loads) was investigated. The concept of load complementarity between consumers and prosumers for load balancing among a community of ~250 households was explored. The impact of different scenarios, such as varying levels of solar penetration and battery integration, in addition to users' flexibility, on balancing supply and demand was quantitatively measured. It was shown that (1) even with 100% adoption of solar panels, the renewable supply cannot cover the demand of the network during afternoon times (e.g., after 3 pm), (2) integrating batteries for individual households could improve self-sufficiency by more than 15% during solar generation time, and (3) without any battery, smart loads are also capable of improving self-sufficiency as an alternative, providing ~60% of what commercial battery systems would offer. The contribution of this dissertation is in introducing data-driven solutions/investigations for characterizing the energy behavior of households, which could increase the flexibility of the aggregate daily energy load profiles for a community. When combined, the findings of this research can serve the field of utility-scale energy analytics for the integration of DR and the improved reshaping of network energy profiles (i.e., mitigating the peaks and valleys in daily demand profiles).

Doctor of Philosophy

Buildings account for more than 70% of electricity consumption in the U.S., of which more than 40% is associated with the residential sector. During recent years, with the advancement of Information and Communication Technologies (ICT) and the proliferation of data from consumers and devices, data-driven methods have received increasing attention for improving energy-efficiency initiatives. With the increased adoption of renewable and distributed resources in buildings (e.g., solar panels and storage systems), an important way to improve efficiency by matching demand and supply is to add flexibility to energy consumption patterns (e.g., trying to align the times of high energy demand from buildings with renewable generation). In this dissertation, we introduced data-driven solutions using the historical energy data of consumers with application to flexibility provision. Specific problems include: (1) introducing a ranking score for buildings in a community to detect the candidates that can provide higher energy savings in future events, (2) estimating the operation times of major energy-intensive appliances by analyzing whole-house energy data with machine learning models, and (3) investigating the potential of achieving demand-supply balance in communities of buildings under the impact of different levels of solar panels, battery systems, and occupants' energy consumption behavior. In the first study, a ranking score was introduced that analyzes the historical energy data from major loads, such as washing machines and dishwashers, in individual buildings and groups the buildings based on their potential for energy saving at different times of the day. The proposed approach was investigated on real data from 400 buildings. The results for EVs, washing machines, dishwashers, dryers, and AC show that the approach could successfully rank buildings by their demand reduction potential at critical times of the day. In the second study, machine learning (ML) frameworks were introduced to identify the times of the day at which major energy-intensive appliances are operated.
To do so, the input of the model was the main-circuit electricity information of the whole building, either as lower-resolution data (smart meter data) or higher-resolution data (60 Hz). Unlike previous studies that required considerable effort for training the model (e.g., defining specific parameters for a mathematical formulation of the appliance model), the aim was to develop data-driven approaches that learn the model either from the same building itself or from neighbors that have appliance-level metering devices. For the lower-resolution data, the objective was that, if a few sample buildings already have access to plug meters (i.e., appliance-level data), one could estimate the operation times of major appliances through ML models by matching the energy behavior of the buildings, reflected in their smart meter information, with those in the neighborhood that have similar behaviors. For the higher-resolution data, an algorithm was introduced that extracts the appliance signature (i.e., the change in the pattern of the electricity signal when an appliance is operated) to create a processed library and matches new events (i.e., times at which an appliance is operated) by investigating their similarity with those in the processed library. The investigation of major appliances like AC, EVs, dryers, and washing machines shows >80% accuracy on standard performance metrics. In the third study, the impact of adding small-scale distributed resources to individual buildings (solar panels, batteries, and users' practice of changing their energy consumption behavior) on matching demand and supply for communities was investigated. A community of ~250 buildings was considered to account for realistic, uncertain energy behavior across households. It was shown that even when all buildings have a solar panel, during the afternoon times (after 4 pm) in which ~30% of solar generation is still possible, the community could not supply its demand.
Furthermore, it was observed that including users' practice of changing their energy consumption behavior, together with batteries, could improve the utilization of solar energy by around 10%-15%. The results can serve as a guideline for utilities and decision-makers to understand the impact of such scenarios on improving the utilization of solar adoption. This series of studies contributes to the body of literature by introducing data-driven solutions/investigations for characterizing the energy behavior of households, which could increase the flexibility of energy consumption patterns.
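The first study's idea of ranking households by their peak-time reduction potential can be sketched with a toy scoring rule: score each household by the share of its daily energy consumed in the DR event window. This is a simplified stand-in for the frequency/consistency/peak-time-usage attributes the dissertation actually characterizes, with invented load data:

```python
import numpy as np

# Toy hourly load profiles for 5 households over one day (invented data)
rng = np.random.default_rng(1)
loads = rng.uniform(0.2, 1.0, size=(5, 24))
loads[1, 17:20] += 3.0   # household 1 has a strong evening peak
loads[3, 17:20] += 1.5   # household 3 has a moderate one

peak_hours = slice(17, 20)  # assumed DR event window, 5-8 pm

# Simple ranking score: share of daily energy consumed in the event window
score = loads[:, peak_hours].sum(axis=1) / loads.sum(axis=1)
ranking = np.argsort(score)[::-1]  # households, best DR candidates first
print(ranking)
```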
4

Havaei, Seyed Mohammad. "Machine learning methods for brain tumor segmentation." Thèse, Université de Sherbrooke, 2017. http://hdl.handle.net/11143/10260.

Abstract:
Malignant brain tumors are the second leading cause of cancer-related deaths in children under 20. There are nearly 700,000 people in the U.S. living with a brain tumor, and 17,000 people are likely to lose their lives to primary malignant brain and central nervous system tumors every year. To identify whether a patient has a brain tumor in a non-invasive way, an MRI scan of the brain is acquired, followed by a manual examination of the scan by an expert who looks for lesions (i.e. clusters of cells which deviate from healthy tissue). For treatment purposes, the tumor and its sub-regions are outlined in a procedure known as brain tumor segmentation. Although brain tumor segmentation is primarily done manually, it is very time consuming and the segmentation is subject to variations both between observers and within the same observer. To address these issues, a number of automatic and semi-automatic methods have been proposed over the years to help physicians in the decision-making process. Methods based on machine learning have been the subject of great interest in brain tumor segmentation. With the advent of deep learning methods and their success in many computer vision applications such as image classification, these methods have also started to gain popularity in medical image analysis. In this thesis, we explore different machine learning and deep learning methods applied to brain tumor segmentation.
5

Pomier, Romain. "Machine Learning to handle customer issues." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-153911.

Abstract:
Amadeus provides solutions to travel businesses. Today, messages from customers are received by one team, which dispatches them manually. This team makes many mistakes. The purpose of this thesis is to study, design and prototype a tool to analyse these messages automatically and send them to the right team. This paper studies different machine learning methods to learn the classification from previous messages. Besides, we use several text-processing optimisation methods to improve our results.
6

Shepherd, T. "Dynamical models and machine learning for supervised segmentation." Thesis, University College London (University of London), 2009. http://discovery.ucl.ac.uk/18729/.

Abstract:
This thesis is concerned with the problem of how to outline regions of interest in medical images when the boundaries are weak or ambiguous and the region shapes are irregular. The focus on machine learning and interactivity leads to a common theme of the need to balance conflicting requirements. First, any machine learning method must strike a balance between how much it can learn and how well it generalises. Second, interactive methods must balance minimal user demand with maximal user control. To address the problem of weak boundaries, methods of supervised texture classification are investigated that do not use explicit texture features. These methods enable prior knowledge about the image to benefit any segmentation framework. A chosen dynamic contour model, based on probabilistic boundary tracking, combines these image priors with efficient modes of interaction. We show the benefits of the texture classifiers over intensity- and gradient-based image models, in both classification and boundary extraction. To address the problem of irregular region shape, we devise a new type of statistical shape model (SSM) that does not use explicit boundary features or assume high-level similarity between region shapes. First, the models are used for shape discrimination, to constrain any segmentation framework by way of regularisation. Second, the SSMs are used for shape generation, allowing probabilistic segmentation frameworks to draw shapes from a prior distribution. The generative models also include novel methods to constrain shape generation according to information from both the image and user interactions. The shape models are first evaluated in terms of discrimination capability, and shown to outperform other shape descriptors. Experiments also show that the shape models can benefit a standard type of segmentation algorithm by providing shape regularisers. We finally show how to exploit the shape models in supervised segmentation frameworks, and evaluate their benefits in user trials.
7

Mahbod, Amirreza. "Structural Brain MRI Segmentation Using Machine Learning Technique." Thesis, KTH, Skolan för teknik och hälsa (STH), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-189985.

Abstract:
Segmenting brain MR scans could be highly beneficial for diagnosing, treating and evaluating the progress of specific diseases. Up to this point, manual segmentation, performed by experts, is the conventional method in hospitals and clinical environments. Although manual segmentation is accurate, it is time-consuming, expensive and might not be reliable. Many non-automatic and semi-automatic methods have been proposed in the literature in order to segment MR brain images, but the level of accuracy is not comparable with manual segmentation. The aim of this project is to implement and make a preliminary evaluation of a method based on machine learning techniques for segmenting gray matter (GM), white matter (WM) and cerebrospinal fluid (CSF) of brain MR scans, using images available within the open MICCAI grand challenge (MRBrainS13). The proposed method employs a supervised artificial neural network based auto-context algorithm, exploiting intensity-based, spatial-based and shape model-based level set segmentation results as features of the network. The obtained average results based on the Dice similarity index were 97.73%, 95.37%, 82.76%, 88.47% and 84.78% for intracranial volume, brain (WM + GM), CSF, WM and GM respectively. This method achieved competitive results with considerably shorter required training time in the MRBrainS13 challenge.
8

Thien, Bao Nguyen. "Machine Learning for Tract Segmentation in dMRI data." Doctoral thesis, Università degli studi di Trento, 2016. https://hdl.handle.net/11572/368335.

Abstract:
Diffusion MRI (dMRI) data allows the 3D pathways of axons within the white matter of the brain to be reconstructed as a set of streamlines, called a tractography. A streamline is a vectorial representation of thousands of neuronal axons expressing structural connectivity. An important task is to group the streamlines of the same functional structure into one tract, a process called tract segmentation. This work is extremely helpful for neurosurgery or diagnosing brain disease. However, the segmentation process is difficult and time consuming due to the large number of streamlines (about 3 × 10^5 in a normal brain) and the variability of brain anatomy among different subjects. In our project, the goal is to design an effective method for the tract segmentation task based on machine learning techniques and to develop an interactive tool to assist medical practitioners in performing this task more precisely, more easily, and faster. First, we propose a design for an interactive segmentation process that presents the user with a clustered version of the tractography, in which the user selects some of the clusters to identify a superset of the streamlines of interest. This superset is then re-clustered at a finer scale, and again the user is requested to select the relevant clusters. The process of re-clustering and manual selection is iterated until the remaining streamlines faithfully represent the desired anatomical structure of interest. To solve the computational issue of clustering a large number of streamlines under the strict time constraints of interactive use, we present a solution which consists in embedding the streamlines into a Euclidean space (called the dissimilarity representation), and then adopting a state-of-the-art scalable implementation of the k-means algorithm. The dissimilarity representation is defined by selecting a set of streamlines called prototypes and then mapping any new streamline to its vector of distances from the prototypes.
Second, an algorithm is proposed to find the correspondence/mapping between streamlines in the tractographies of two different subjects, without requiring any transformation as traditional tractography registration usually does. In other words, we try to find a mapping between the tractographies. Mapping is very useful for studying tractography data across subjects. Last but not least, by exploring the mapping in the context of the dissimilarity representation, we also propose an algorithmic solution to build a common vectorial representation of streamlines across subjects. The core of the proposed solution combines two state-of-the-art elements: first, using the recently proposed tractography mapping approach to align the prototypes across subjects; then, applying the dissimilarity representation to build the common vectorial representation for streamlines. Preliminary results of applying our methods in clinical use-cases show evidence that our proposed algorithm is greatly beneficial (in terms of time efficiency, ease of use, etc.) for the study of white matter tractography in clinical applications.
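The dissimilarity-representation step described above (map each streamline to its vector of distances from a few prototype streamlines, then run a scalable k-means) can be sketched as follows. The toy two-bundle "tractography", the random prototype selection, and the simple mean-direct-flip distance are illustrative assumptions, not the thesis' exact choices:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def mdf_distance(s1, s2):
    """Mean distance between two streamlines resampled to the same number
    of points, taking the minimum of direct and flipped point order (a
    common streamline distance; an assumed choice for this sketch)."""
    direct = np.linalg.norm(s1 - s2, axis=1).mean()
    flipped = np.linalg.norm(s1 - s2[::-1], axis=1).mean()
    return min(direct, flipped)

def dissimilarity_embed(streamlines, prototypes):
    """Embed each streamline as its vector of distances to the prototypes."""
    return np.array([[mdf_distance(s, p) for p in prototypes]
                     for s in streamlines])

# Toy 'tractography': 200 streamlines of 20 3D points each, two bundles
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 20)[:, None]
bundle_a = [np.hstack([t, t, 0 * t]) + rng.normal(0, 0.02, (20, 3))
            for _ in range(100)]
bundle_b = [np.hstack([t, -t, 1 + 0 * t]) + rng.normal(0, 0.02, (20, 3))
            for _ in range(100)]
streamlines = bundle_a + bundle_b

# Pick a few prototypes (random here; the thesis uses dedicated policies)
proto_idx = rng.choice(len(streamlines), size=10, replace=False)
prototypes = [streamlines[i] for i in proto_idx]

# Embed, then cluster with a scalable k-means variant
X = dissimilarity_embed(streamlines, prototypes)
labels = MiniBatchKMeans(n_clusters=2, n_init=5, random_state=0).fit_predict(X)
```

Because each streamline becomes a fixed-length Euclidean vector, any off-the-shelf scalable clustering algorithm can then be applied, which is the point of the embedding.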
9

Thien, Bao Nguyen. "Machine Learning for Tract Segmentation in dMRI data." Doctoral thesis, University of Trento, 2016. http://eprints-phd.biblio.unitn.it/1704/1/20163103_thesis_bao.pdf.

Abstract:
Diffusion MRI (dMRI) data allows the 3D pathways of axons within the white matter of the brain to be reconstructed as a set of streamlines, called a tractography. A streamline is a vectorial representation of thousands of neuronal axons expressing structural connectivity. An important task is to group the streamlines of the same functional structure into one tract, a process called tract segmentation. This work is extremely helpful for neurosurgery or diagnosing brain disease. However, the segmentation process is difficult and time consuming due to the large number of streamlines (about 3 × 10^5 in a normal brain) and the variability of brain anatomy among different subjects. In our project, the goal is to design an effective method for the tract segmentation task based on machine learning techniques and to develop an interactive tool to assist medical practitioners in performing this task more precisely, more easily, and faster. First, we propose a design for an interactive segmentation process that presents the user with a clustered version of the tractography, in which the user selects some of the clusters to identify a superset of the streamlines of interest. This superset is then re-clustered at a finer scale, and again the user is requested to select the relevant clusters. The process of re-clustering and manual selection is iterated until the remaining streamlines faithfully represent the desired anatomical structure of interest. To solve the computational issue of clustering a large number of streamlines under the strict time constraints of interactive use, we present a solution which consists in embedding the streamlines into a Euclidean space (called the dissimilarity representation), and then adopting a state-of-the-art scalable implementation of the k-means algorithm. The dissimilarity representation is defined by selecting a set of streamlines called prototypes and then mapping any new streamline to its vector of distances from the prototypes.
Second, an algorithm is proposed to find the correspondence/mapping between streamlines in the tractographies of two different subjects, without requiring any transformation as traditional tractography registration usually does. In other words, we try to find a mapping between the tractographies. Mapping is very useful for studying tractography data across subjects. Last but not least, by exploring the mapping in the context of the dissimilarity representation, we also propose an algorithmic solution to build a common vectorial representation of streamlines across subjects. The core of the proposed solution combines two state-of-the-art elements: first, using the recently proposed tractography mapping approach to align the prototypes across subjects; then, applying the dissimilarity representation to build the common vectorial representation for streamlines. Preliminary results of applying our methods in clinical use-cases show evidence that our proposed algorithm is greatly beneficial (in terms of time efficiency, ease of use, etc.) for the study of white matter tractography in clinical applications.
10

Le, Truc Duc. "Machine Learning Methods for 3D Object Classification and Segmentation." Thesis, University of Missouri - Columbia, 2019. http://pqdtopen.proquest.com/#viewpdf?dispub=13877153.

Abstract:
Object understanding is a fundamental problem in computer vision and has been extensively researched in recent years thanks to the availability of powerful GPUs and labelled data, especially in the context of images. However, 3D object understanding is still not on par with its 2D counterpart, and deep learning for 3D has not been fully explored yet. In this dissertation, I work on two approaches, both of which advance the state-of-the-art results in 3D classification and segmentation. The first approach, called MVRNN, is based on the multi-view paradigm. In contrast to MVCNN, which does not generate consistent results across different views, our MVRNN treats the multi-view images as a temporal sequence, correlates the features, and generates coherent segmentation across different views. MVRNN demonstrated state-of-the-art performance on the Princeton Segmentation Benchmark dataset. The second approach, called PointGrid, is a hybrid method which combines points with a regular grid structure. 3D points can retain fine details but are irregular, which is a challenge for deep learning methods. A volumetric grid is simple and has a regular structure, but does not scale well with data resolution. Our PointGrid, which is simple, allows the fine details to be consumed by normal convolutions under a coarser-resolution grid. PointGrid achieved state-of-the-art performance on the ModelNet40 and ShapeNet datasets in 3D classification and object part segmentation.
11

Bardolet, Pettersson Susana. "Managing imbalanced training data by sequential segmentation in machine learning." Thesis, Linköpings universitet, Avdelningen för medicinsk teknik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-155091.

Abstract:
Imbalanced training data is a common problem in machine learning applications. This problem refers to datasets in which the foreground pixels are significantly fewer than the background pixels. By training a machine learning model with imbalanced data, the result is typically a model that classifies all pixels as the background class. A result that indicates no presence of a specific condition when it is actually present is particularly undesired in medical imaging applications. This project proposes a sequential system of two fully convolutional neural networks to tackle the problem. Semantic segmentation of lung nodules in thoracic computed tomography images has been performed to evaluate the performance of the system. The imbalanced data problem is present in the training dataset used in this project, where the average percentage of pixels belonging to the foreground class is 0.0038%. The sequential system achieved a sensitivity of 83.1%, representing an increase of 34% compared to the single system. The system only missed 16.83% of the nodules but had a Dice score of 21.6% due to the detection of multiple false positives. This method shows considerable potential to be a solution to the imbalanced data problem with continued development.
APA, Harvard, Vancouver, ISO, and other styles
12

Verzellesi, Laura. "Machine Learning methods for hepatocellular malignancies segmentation and MVI prediction." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/23534/.

Full text
Abstract:
The aim of this thesis project is to automatically localize HCC tumors in the human liver and subsequently predict if the tumor will undergo microvascular infiltration (MVI), the initial stage of metastasis development. The input data for the work have been partially supplied by Sant'Orsola Hospital and partially downloaded from online medical databases. Two U-Net models have been implemented for the automatic segmentation of the livers and the HCC malignancies within them. The segmentation models have been evaluated with the Intersection-over-Union and Dice Coefficient metrics. The outcomes obtained for the automatic liver segmentation are quite good (IOU = 0.82; DC = 0.35); the outcomes obtained for the automatic tumor segmentation (IOU = 0.35; DC = 0.46) are, instead, affected by some limitations: it can be stated that the algorithm is almost always able to detect the location of the tumor, but it tends to underestimate its dimensions. The purpose is to obtain the CT images of the HCC tumors, necessary for feature extraction. The 14 Haralick features calculated from the 3D-GLCM, the 120 radiomic features and the patients' clinical information are collected to build a dataset of 153 features. The goal is then to build a model able to discriminate, based on the features given, the tumors that will undergo MVI from those that will not. This task can be seen as a classification problem: each tumor needs to be classified either as "MVI positive" or "MVI negative". Techniques for feature selection are implemented to identify the most descriptive features for the problem at hand, and then a set of classification models are trained and compared. Among all, the models with the best performances (around 80-84% ± 8-15%) turn out to be the XGBoost Classifier, the SGD Classifier and the Logistic Regression models (without penalization and with Lasso, Ridge or Elastic Net penalization).
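The feature-selection-plus-classifier-comparison step described above can be sketched with scikit-learn. The data below are synthetic stand-ins for the 153-feature MVI dataset (which is not public), and `GradientBoostingClassifier` stands in for XGBoost:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 150 patients x 153 features, binary MVI label
X, y = make_classification(n_samples=150, n_features=153, n_informative=10,
                           random_state=0)

models = {
    "logistic": LogisticRegression(max_iter=1000),
    "sgd": SGDClassifier(random_state=0),
    "boosting": GradientBoostingClassifier(random_state=0),  # stand-in for XGBoost
}
for name, clf in models.items():
    # Select the 20 most descriptive features, then cross-validate the classifier
    pipe = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=20), clf)
    scores = cross_val_score(pipe, X, y, cv=5)
    print(f"{name}: {scores.mean():.2f} +/- {scores.std():.2f}")
```

Putting the feature selector inside the pipeline keeps the selection inside each cross-validation fold, avoiding the leakage that selecting on the full dataset would cause.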
APA, Harvard, Vancouver, ISO, and other styles
13

Eslami, Seyed Mohammadali. "Generative probabilistic models for object segmentation." Thesis, University of Edinburgh, 2014. http://hdl.handle.net/1842/8898.

Full text
Abstract:
One of the long-standing open problems in machine vision has been the task of ‘object segmentation’, in which an image is partitioned into two sets of pixels: those that belong to the object of interest, and those that do not. A closely related task is that of ‘parts-based object segmentation’, where additionally each of the object’s pixels is labelled as belonging to one of several predetermined parts. There is broad agreement that segmentation is coupled to the task of object recognition. Knowledge of the object’s class can lead to more accurate segmentations, and in turn accurate segmentations can be used to obtain higher recognition rates. In this thesis we focus on one side of this relationship: given the object’s class and its bounding box, how accurately can we segment it? Segmentation is challenging primarily due to the huge amount of variability one sees in images of natural scenes. A large number of factors combine in complex ways to generate the pixel intensities that make up any given image. In this work we approach the problem by developing generative probabilistic models of the objects in question. Not only does this allow us to express notions of variability and uncertainty in a principled way, but also to separate the problems of model design and inference. The thesis makes the following contributions: First, we demonstrate an explicit probabilistic model of images of objects based on a latent Gaussian model of shape. This can be learned from images in an unsupervised fashion. Through experiments on a variety of datasets we demonstrate the advantages of explicitly modelling shape variability. We then focus on the task of constructing more accurate models of shape. We present a type of layered probabilistic model that we call a Shape Boltzmann Machine (SBM) for the task of modelling foreground/background (binary) and parts-based (categorical) shapes. 
We demonstrate that it constitutes the state-of-the-art and characterises a ‘strong’ model of shape, in that samples from the model look realistic and that it generalises to generate samples that differ from training examples. Finally, we demonstrate how the SBM can be used in conjunction with an appearance model to form a fully generative model of images of objects. We show how parts-based object segmentations can be obtained simply by performing probabilistic inference in this joint model. We apply the model to several challenging datasets and find that its performance is comparable to the state-of-the-art.
APA, Harvard, Vancouver, ISO, and other styles
14

Gonzalez, Munoz Mario, and Philip Hedström. "Predicting Customer Behavior in E-commerce using Machine Learning." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-260269.

Full text
Abstract:
E-handel har varit en snabbt växande sektor de senaste åren och förväntas fortsätta växa i samma takt under de närmsta. Detta har öppnat upp nya möjligheter för företag som försöker sälja sina produkter och tjänster, men det tvingar dem även att utnyttja dessa möjligheter för att vara konkurrenskraftiga. En intressant möjlighet som vi har valt att fokusera detta arbete på är förmågan att använda kunddata, som inte varit tillgänglig i fysiska butiker, till att identifiera mönster i kundbeteenden. Förhoppningsvis ger detta en ökad förståelse för kunderna och gör det möjligt att förutspå framtida beteenden. Vi fokuserade specifikt på att skilja mellan potentiella köpare och faktiska köpare, med avsikt att identifiera nyckelfaktorer som avgör ifall en kund genomför ett köp eller ej. Detta gjorde vi genom att använda Binary Logistic Regression, en algoritm som använder övervakad maskininlärning för att klassificera en observation mellan två klasser. Vi lyckades ta fram en modell som förutsåg om en kund skulle genomföra ett köp eller ej med en noggrannhet på 88%.

E-commerce has been a rapidly growing sector during the last years, and is predicted to continue to grow as fast during the next ones. This has opened up a lot of opportunities for companies trying to sell their products or services, but it is also forcing them to exploit these opportunities before their competitors in order to not fall behind. One interesting opportunity we have chosen to focus this thesis on is the ability to use customer data, which was not available in physical stores, to identify customer behaviour patterns and develop a better understanding of the customers. Hopefully this makes it possible to predict customer behaviour. We specifically focused on distinguishing possible-buyers from buyers, with the intent of identifying key factors that affect whether the customer performs a purchase or not. 
We did this using Binary Logistic Regression, a supervised machine learning algorithm that is trained to classify an input observation. We managed to create a model that predicted whether or not a customer was a possible-buyer or buyer with an accuracy of 88%.
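The binary logistic regression setup described above can be sketched with scikit-learn. The session features and synthetic labels below are invented for illustration; the thesis's actual variables differ:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
# Hypothetical e-commerce session features
df = pd.DataFrame({
    "pages_viewed": rng.poisson(5, n),
    "time_on_site_min": rng.exponential(4, n),
    "cart_adds": rng.poisson(1, n),
})
# Synthetic label: more engagement -> higher purchase probability
logit = 0.3 * df.cart_adds + 0.1 * df.pages_viewed + 0.05 * df.time_on_site_min - 1.5
df["bought"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    df.drop(columns="bought"), df["bought"], test_size=0.25, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
acc = accuracy_score(y_te, model.predict(X_te))
print("accuracy:", round(acc, 3))
# Coefficients indicate which behaviours drive purchase probability
print(dict(zip(X_tr.columns, model.coef_[0].round(2))))
```

A side benefit of logistic regression over black-box models is that the fitted coefficients directly expose the "key factors" the thesis is after.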
APA, Harvard, Vancouver, ISO, and other styles
15

Wolf, Steffen [Verfasser], and Fred [Akademischer Betreuer] Hamprecht. "Machine Learning for Instance Segmentation / Steffen Wolf ; Betreuer: Fred A. Hamprecht." Heidelberg : Universitätsbibliothek Heidelberg, 2020. http://nbn-resolving.de/urn:nbn:de:bsz:16-heidok-283535.

Full text
APA, Harvard, Vancouver, ISO, and other styles
16

Kantedal, Simon. "Evaluating Segmentation of MR Volumes Using Predictive Models and Machine Learning." Thesis, Linköpings universitet, Institutionen för medicinsk teknik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-171102.

Full text
Abstract:
A reliable evaluation system is essential for every automatic process. While techniques for automatic segmentation of images have been extensively researched in recent years, evaluation of the same has not received an equal amount of attention. Amra Medical AB has developed a system for automatic segmentation of magnetic resonance (MR) images of human bodies using an atlas-based approach. Through their software, Amra is able to derive body composition measurements, such as muscle and fat volumes, from the segmented MR images. As of now, the automatic segmentations are quality controlled by clinical experts to ensure their correctness. This thesis investigates the possibilities to leverage predictive modelling to reduce the need for a manual quality control (QC) step in an otherwise automatic process. Two different regression approaches have been implemented as a part of this study: body composition measurement prediction (BCMP) and manual correction prediction (MCP). BCMP aims at predicting the derived body composition measurements and comparing the predictions to actual measurements. The theory is that large deviations between the predictions and the measurements signify an erroneously segmented sample. MCP instead tries to directly predict the amount of manual correction needed for each sample. Several regression models have been implemented and evaluated for the two approaches. Comparison of the regression models shows that local linear regression (LLR) is the most performant model for both BCMP and MCP. The results show that the inaccuracies in the BCMP-models, in practice, renders this approach useless. MCP proved to be a far more viable approach; using MCP together with LLR achieves a high true positive rate with a reasonably low false positive rate for several body composition measurements. These results suggest that the type of system developed in this thesis has the potential to reduce the need for manual inspections of the automatic segmentation masks.
APA, Harvard, Vancouver, ISO, and other styles
17

Wolf, Steffen [Verfasser], and Fred A. [Akademischer Betreuer] Hamprecht. "Machine Learning for Instance Segmentation / Steffen Wolf ; Betreuer: Fred A. Hamprecht." Heidelberg : Universitätsbibliothek Heidelberg, 2020. http://d-nb.info/1211090493/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
18

Modaresi, Seyed Mohammad Reza. "Meta-decomposition and evaluation processes in machine learning applications." Electronic Thesis or Diss., Paris 13, 2023. http://www.theses.fr/2023PA131046.

Full text
Abstract:
La segmentation est une étape cruciale dans diverses applications du monde réel telles que l’analyse d’images médicales, la reconnaissance d’activités et la détection d’événements sonores. Elle implique de diviser les données d’entrée en segments plus petits, ce qui induit des modifications dans certaines caractéristiques des données d’entrée. Ce processus introduit au moins deux familles de biais incontrôlables. La première famille de biais est introduite dans le modèle en raison des changements dans l’espace du problème provoqués par la segmentation elle-même. La deuxième famille de biais est causée par le processus de segmentation lui-même, y compris la fixation de la méthode de segmentation et de ses paramètres. Cette thèse présente une nouvelle couche adaptative conçue pour améliorer les modèles de segmentation d’images médicales existants, améliorant ainsi leurs performances. Cette couche adaptative ajuste dynamiquement la taille du champ récepteur en fonction des informations des pixels et de leur voisinage. Ces concepts sont ensuite étendus à des scénarios plus complexes impliquant des types de données hétérogènes, présentant une nouvelle approche de méta-décomposition ou d’apprentissage de la décomposition pour la segmentation. Cette approche atténue les biais implicites tout en permettant une segmentation adaptative pour différents types de données, prenant en compte les variations et les hétérogénéités des données telles que les différences saisonnières dans les activités. Reconnaissant l’impact de la segmentation sur l’espace du problème, la recherche examine les inconvénients des méthodes d’évaluation de pointe, en mettant l’accent sur la nécessité de cadres plus complets qui se concentrent sur des méthodes d’évaluation basées sur des points, négligeant les relations spatiales ou temporelles entre les instances. 
Pour valider l’efficacité des techniques d’évaluation suggérées et de l’approche de méta-décomposition, des expérimentations approfondies sont menées sur divers ensembles de données réels concrets.

Segmentation is a crucial primary step in a variety of real-world applications such as medical image analysis, activity recognition, and sound event detection. It involves partitioning input data into smaller segments, thereby inducing alterations in certain characteristics of the input data. This process introduces at least two families of uncontrollable biases. The first family of biases is introduced to the model due to the changes in problem space made by the segmentation itself. The second family is caused by the segmentation process itself, including the fixation of the segmentation method and its parameters. This thesis presents a novel adaptive layer designed to augment existing deep models for medical image segmentation, enhancing their performance. This adaptive layer dynamically adjusts the receptive field size based on pixel and neighbouring information. These concepts are then extended to more intricate scenarios involving heterogeneous data types, presenting a novel meta-decomposition, or learning-to-decompose, approach for segmentation. This approach mitigates implicit biases while enabling adaptive segmentation for various data types, accommodating data variations and heterogeneities such as seasonal differences in activities. Recognizing the impact of segmentation on the problem space, the research scrutinizes the drawbacks of state-of-the-art evaluation methods, emphasizing the necessity for more comprehensive frameworks: the prevailing focus on point-based evaluation methods neglects spatial or temporal relationships between instances. To validate the efficacy of the suggested evaluation techniques and the meta-decomposition approach, extensive experimentation is conducted across diverse concrete real-world datasets.
APA, Harvard, Vancouver, ISO, and other styles
19

Ringqvist, Sanna. "Classification of terrain using superpixel segmentation and supervised learning." Thesis, Linköpings universitet, Datorseende, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-112511.

Full text
Abstract:
The usage of 3D-modeling is expanding rapidly. Modeling from aerial imagery has become very popular due to its increasing number of both civilian and military applications like urban planning, navigation and target acquisition. This master thesis project was carried out at Vricon Systems at SAAB. The Vricon system produces high resolution geospatial 3D data based on aerial imagery from manned aircraft, unmanned aerial vehicles (UAV) and satellites. The aim of this work was to investigate to what degree superpixel segmentation and supervised learning can be applied to a terrain classification problem using imagery and digital surface models (DSM). The aim was also to investigate how the height information from the digital surface model may contribute compared to the information from the grayscale values. The goal was to identify buildings, trees and ground. Another task was to evaluate existing methods and compare results. The approach for solving the stated goal was divided into several parts. The first part was to segment the image using superpixel segmentation; after that, features were extracted. Then the classifiers were created and trained, and finally the classifiers were evaluated. The classification method that obtained the best results in this thesis had approximately 90 % correctly labeled superpixels. The result was equal, if not better, compared to other solutions available on the market.
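The pipeline described above — segment, extract per-segment intensity and height features, train a classifier — can be sketched as follows. For simplicity this sketch uses square grid segments in place of SLIC-style superpixels, and synthetic intensity/height arrays in place of real imagery and DSMs:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Toy 64x64 scene: grayscale intensity plus a height channel (as from a DSM)
gray = rng.random((64, 64))
height = np.zeros((64, 64))
height[:32, :] += 10.0                            # "buildings": tall
height[32:48, :] += 5.0 + rng.random((16, 64))    # "trees": mid-height, rough
labels_px = np.zeros((64, 64), dtype=int)
labels_px[:32, :] = 1                             # building
labels_px[32:48, :] = 2                           # tree; the rest is 0 = ground

# Stand-in for superpixels: 8x8 grid segments (SLIC would give irregular ones)
feats, labels = [], []
for i in range(0, 64, 8):
    for j in range(0, 64, 8):
        g = gray[i:i+8, j:j+8]
        h = height[i:i+8, j:j+8]
        feats.append([g.mean(), g.std(), h.mean(), h.std()])  # per-segment features
        labels.append(np.bincount(labels_px[i:i+8, j:j+8].ravel()).argmax())

X, y = np.array(feats), np.array(labels)
clf = RandomForestClassifier(random_state=0).fit(X, y)
print("training accuracy:", clf.score(X, y))
```

On this toy scene the height features separate the three classes almost perfectly, which mirrors the thesis's question of how much the DSM contributes beyond grayscale alone.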
APA, Harvard, Vancouver, ISO, and other styles
20

BINDER, THOMAS. "Gland Segmentation with Convolutional Neural Networks : Validity of Stroma Segmentation as a General Approach." Thesis, KTH, Skolan för kemi, bioteknologi och hälsa (CBH), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-246134.

Full text
Abstract:
The analysis of glandular morphology within histopathology images is a crucial step in determining the stage of cancer. Manual annotation is a very laborious task. It is time consuming and suffers from the subjectivity of the specialists that label the glands. One of the aims of computational pathology is developing tools to automate gland segmentation. Such an algorithm would improve the efficiency of cancer diagnosis. This is a complex task, as there is a large variability in glandular morphologies and staining techniques. So far, specialised models have given promising results focusing on only one organ. This work investigated the idea of a cross-domain approximation. Unlike parenchyma, the stroma tissue that lies between the glands is similar throughout all organs in the body. Creating a model able to precisely segment the stroma would pave the way for a cross-organ model. It would be able to segment the tissue and therefore give access to gland morphologies of different organs. To address this issue, we investigated different new and former architectures such as MILD-Net, currently the best-performing algorithm of the GlaS challenge. New architectures were created based on the promising U-shaped network as well as Xception and the ResNet for feature extraction. These networks were trained on colon histopathology images focusing on glands and on the stroma. The comparison of the different results showed that this initial cross-domain approximation goes in the right direction and invites further development.
APA, Harvard, Vancouver, ISO, and other styles
21

Garcia, Gomez David. "Exploration of customer churn routes using machine learning probabilistic models." Doctoral thesis, Universitat Politècnica de Catalunya, 2014. http://hdl.handle.net/10803/144660.

Full text
Abstract:
The ongoing processes of globalization and deregulation are changing the competitive framework in the majority of economic sectors. The appearance of new competitors and technologies entails a sharp increase in competition and a growing preoccupation among service providing companies with creating stronger bonds with customers. Many of these companies are shifting resources away from the goal of capturing new customers and are instead focusing on retaining existing ones. In this context, anticipating the customer's intention to abandon, a phenomenon also known as churn, and facilitating the launch of retention-focused actions represent clear elements of competitive advantage. Data mining, as applied to market surveyed information, can provide assistance to churn management processes. In this thesis, we mine real market data for churn analysis, placing a strong emphasis on the applicability and interpretability of the results. Statistical Machine Learning models for simultaneous data clustering and visualization lay the foundations for the analyses, which yield an interpretable segmentation of the surveyed markets. To achieve interpretability, much attention is paid to the intuitive visualization of the experimental results. Given that the modelling techniques under consideration are nonlinear in nature, this represents a non-trivial challenge. Newly developed techniques for data visualization in nonlinear latent models are presented. They are inspired by geographical representation methods and suited to both static and dynamic data representation.
APA, Harvard, Vancouver, ISO, and other styles
22

Singh, Maneesha. "A machine learning approach for image enhancement and segmentation for aviation security." Thesis, University of Exeter, 2004. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.410826.

Full text
APA, Harvard, Vancouver, ISO, and other styles
23

Volkov, Mikhail Ph D. Massachusetts Institute of Technology. "Machine learning and coresets for automated real-time data segmentation and summarization." Thesis, Massachusetts Institute of Technology, 2016. http://hdl.handle.net/1721.1/107865.

Full text
Abstract:
Thesis: Ph. D. in Computer Science and Engineering, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016. Cataloged from PDF version of thesis. Includes bibliographical references (pages 160-174).

In this thesis, we develop a family of real-time data reduction algorithms for large data streams, by computing a compact and meaningful representation of the data called a coreset. This representation can then be used to enable efficient analysis such as segmentation, summarization, classification, and prediction. Our proposed algorithms support large streams and datasets that are too large to store in memory, allow easy parallelization, and generalize to different data types and analyses. We discuss some of the challenges that arise when dealing with real Big Data systems. Such systems are designed to routinely process unseen, possibly unbounded, data streams; are expected to perform reliably, online, in real-time, in the presence of noise, and under many performance and bandwidth limitations; and are required to produce results that are provably close to optimal. We will motivate the need for new data reduction techniques, in the form of theoretical and practical open problems in computer science, robotics, and medicine, and show how coresets can help to overcome these challenges and enable us to build several practical systems that meet these specifications. We propose a theoretical framework for constructing several coreset algorithms that efficiently compress the data while preserving its semantic content. We provide an efficient construction of our algorithms and present several systems that are capable of handling unbounded, real-time data streams, and are easily scalable and parallelizable. Finally, we demonstrate the performance of our systems with numerous experimental results on a variety of data sources, from financial price data to laparoscopic surgery video.

by Mikhail Volkov. Ph. D. in Computer Science and Engineering
APA, Harvard, Vancouver, ISO, and other styles
24

Wu, Qian. "Segmentation-based Retinal Image Analysis." Thesis, Blekinge Tekniska Högskola, Institutionen för datavetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-18524.

Full text
Abstract:
Context. Diabetic retinopathy is the most common cause of new cases of legal blindness in people of working age. Early diagnosis is the key to slowing the progression of the disease, thus preventing blindness. Retinal fundus images are an important basis for judging these retinal diseases. With the development of technology, computer-aided diagnosis is widely used. Objectives. The thesis investigates whether there exist specific regions that could assist in better prediction of retinopathy disease; that is, it aims to find the region in the fundus image that works best for retinopathy classification with the use of computer vision and machine learning techniques. Methods. An experimental method was used as the research method. With image segmentation techniques, the fundus image is divided into regions to obtain an optic disc dataset, a blood vessel dataset, and an other-regions (regions other than blood vessel and optic disc) dataset. These datasets and the original fundus image dataset were tested on Random Forest (RF), Support Vector Machine (SVM) and Convolutional Neural Network (CNN) models, respectively. Results. It is found that the results on different models are inconsistent. Compared to the original fundus image, the blood vessel region exhibits the best performance on the SVM model and the other regions perform best on the RF model, while the original fundus image has higher prediction accuracy on the CNN model. Conclusions. The other-regions dataset has more predictive power than the original fundus image dataset on the RF and SVM models. On the CNN model, extracting regions from the fundus image does not significantly improve predictive performance as compared to using the entire fundus image.
APA, Harvard, Vancouver, ISO, and other styles
25

Ngo, Quang Thanh. "Online perception with machine learning for automated driving." Technische Universität Chemnitz, 2019. https://monarch.qucosa.de/id/qucosa%3A73111.

Full text
Abstract:
The understanding of the environment is a critical ability not only for living creatures but also for automated systems such as robots, automated cars, and intelligent systems. Especially for essential tasks in the automotive domain, such as autonomous driving, path planning, localization, and object detection, the more information we gather, the better the results we get. Intelligent vehicle technology relies on sensorial perception to understand the surroundings of the vehicle. The objective of this research is to develop a cooperative online perception system with semantic segmentation for automated driving and to improve the current semantic segmentation framework to make it more robust and more suitable for our future projects.
APA, Harvard, Vancouver, ISO, and other styles
26

Choi, Chiyoung. "Predicting Customer Complaints in Mobile Telecom Industry Using Machine Learning Algorithms." Thesis, Purdue University, 2018. http://pqdtopen.proquest.com/#viewpdf?dispub=10791168.

Full text
Abstract:
Mobile telecom industry competition has been fierce for decades, increasing the importance of customer retention. Most mobile operators consider customer complaints a key factor in customer retention. We implement machine learning algorithms to predict the customer complaints of a Korean mobile telecom company. We used four machine learning algorithms: ANN (Artificial Neural Network), SVM (Support Vector Machine), KNN (K-Nearest Neighbors) and DT (Decision Tree). Our experiment utilized a database of 10,000 Korean mobile market subscribers and the variables of gender, age, device manufacturer, service quality, and complaint status. We found that ANN's prediction performance outperformed the other algorithms. We also propose a segmented-prediction model for better accuracy and practical usage. Segments of the customer group are examined by gender, age, and device manufacturer. Prediction power is better for the female, older-customer, and non-iPhone groups than for other segments. The highest accuracy is ANN's 87.3% prediction for the 60s group.
APA, Harvard, Vancouver, ISO, and other styles
27

Raina, Kevin. "Machine Learning Methods for Brain Lesion Delineation." Thesis, Université d'Ottawa / University of Ottawa, 2020. http://hdl.handle.net/10393/41156.

Full text
Abstract:
Brain lesions are regions of abnormal or damaged tissue in the brain, commonly due to stroke, cancer or other disease. They are diagnosed primarily using neuroimaging, the most common modalities being Magnetic Resonance Imaging (MRI) or Computed Tomography (CT). Brain lesions have a high degree of variability in terms of location, size, intensity and form, which makes diagnosis challenging. Traditionally, radiologists diagnose lesions by inspecting neuroimages directly by eye; however, this is time-consuming and subjective. For these reasons, many automated methods have been developed for lesion delineation (segmentation), lesion identification and diagnosis. The goal of this thesis is to improve and develop automated methods for delineating brain lesions from multimodal MRI scans. First, we propose an improvement to existing segmentation methods by exploiting the bilateral quasi-symmetry of healthy brains, which breaks down when lesions are present. We augment our data using nonlinear registration of a neuroimage to a reflected version of itself, leading to an improvement in Dice coefficient of 13 percent. Second, we model lesion volume in brain image patches with a modified Poisson regression method. The model accurately identified the lesion image with the larger lesion volume for 86 percent of paired sample patches. Both of these projects were published in the proceedings of the BIOSTEC 2020 conference. In the last two chapters, we propose a confidence-based approach to measure segmentation uncertainty, and apply an unsupervised segmentation method based on mutual information.
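The pairwise lesion-volume comparison can be illustrated with a Poisson regression on synthetic counts. This is a generic sketch using scikit-learn's `PoissonRegressor`, not the thesis's modified model, and the patch features are invented:

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(0)
n = 500
# Hypothetical image-patch features; the thesis's actual predictors differ
X = rng.normal(size=(n, 3))
true_rate = np.exp(0.8 * X[:, 0] - 0.5 * X[:, 1] + 1.0)  # lesion "volume" as a rate
y = rng.poisson(true_rate)                                # observed voxel counts

# Poisson regression models log(expected count) as a linear function of features
model = PoissonRegressor(alpha=1e-4).fit(X, y)
pred = model.predict(X)

# Pairwise evaluation: does the model rank the larger-volume patch first?
pairs = rng.choice(n, size=(200, 2))
pairs = pairs[y[pairs[:, 0]] != y[pairs[:, 1]]]  # keep pairs with distinct volumes
correct = ((pred[pairs[:, 0]] > pred[pairs[:, 1]])
           == (y[pairs[:, 0]] > y[pairs[:, 1]])).mean()
print(f"pairwise ranking accuracy: {correct:.2f}")
```

The pairwise-accuracy evaluation mirrors the abstract's "identified the image with the larger lesion volume" criterion; the log link keeps predicted volumes non-negative, which an ordinary linear regression would not guarantee.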
APA, Harvard, Vancouver, ISO, and other styles
28

Shan, Min. "Building Customer Churn Prediction Models in Fitness Industry with Machine Learning Methods." Thesis, Umeå universitet, Institutionen för datavetenskap, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-142515.

Full text
Abstract:
With the rapid growth of digital systems, churn management has become a major focus within customer relationship management in many industries. Ample research has been conducted on churn prediction in different industries with various machine learning methods. This thesis aims to combine feature selection and supervised machine learning methods for defining churn prediction models and to apply them to the fitness industry. Forward selection is chosen as the feature selection method. Support Vector Machine, Boosted Decision Tree and Artificial Neural Network are used and compared as learning algorithms. The experiment shows that the model trained by the Boosted Decision Tree delivers the best result in this project. Moreover, a discussion of the findings of the project is presented.
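The combination of forward feature selection with a boosted-tree learner can be sketched with scikit-learn's `SequentialFeatureSelector`. The data are synthetic, and `GradientBoostingClassifier` stands in for whichever boosted-tree implementation the thesis used:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for member data (the thesis's real features are proprietary)
X, y = make_classification(n_samples=200, n_features=10, n_informative=4,
                           random_state=0)

gbdt = GradientBoostingClassifier(n_estimators=50, random_state=0)  # boosted trees

# Forward selection: greedily add the feature that most improves CV accuracy
selector = SequentialFeatureSelector(gbdt, n_features_to_select=4,
                                     direction="forward", cv=3)
X_sel = selector.fit_transform(X, y)

score = cross_val_score(gbdt, X_sel, y, cv=3).mean()
print("selected feature indices:", selector.get_support(indices=True))
print(f"cv accuracy on selected features: {score:.2f}")
```

Wrapper-style forward selection like this is expensive (one model fit per candidate feature per step), which is why the number of candidate features and the tree count are kept small in the sketch.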
APA, Harvard, Vancouver, ISO, and other styles
29

Rosander, Oliver, and Jim Ahlstrand. "Email Classification with Machine Learning and Word Embeddings for Improved Customer Support." Thesis, Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-15946.

Full text
Abstract:
Classifying emails into distinct labels can have a great impact on customer support. By using machine learning to label emails, the system can set up queues containing emails of a specific category. This enables support personnel to handle requests more quickly and easily by selecting a queue that matches their expertise. This study aims to improve the manually defined rule-based algorithm, currently implemented at a large telecom company, by using machine learning. The proposed model should have a higher F1-score and classification rate. Integrating or migrating from a manually defined rule-based model to a machine learning model should also reduce the administrative and maintenance work, and it should make the model more flexible. By using the frameworks TensorFlow, Scikit-learn and Gensim, the authors conduct five experiments to test the performance of several common machine learning algorithms, text representations and word embeddings, and how they work together. In this article, a web-based interface was implemented which can classify emails into 33 different labels with a 0.91 F1-score using a Long Short-Term Memory network. The authors conclude that Long Short-Term Memory networks outperform non-sequential models such as Support Vector Machines and AdaBoost when predicting labels for emails.
APA, Harvard, Vancouver, ISO, and other styles
30

Grönros, Lovisa, and Ida Janér. "Predicting Customer Churn Rate in the iGaming Industry using Supervised Machine Learning." Thesis, KTH, Matematisk statistik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-228609.

Full text
Abstract:
Mr Green is one of the leading online game providers in the European market. Their mission is to o˙er entertainment and a superior user experience to their customers. To be able to better understand each individual customer and the entire customer life cycle the concept of churn rate is essential, which is also an important input value when calculating the return on marketing e˙orts. This thesis analyzes the feasibility to use 24 hours of initial data on player characteristics and behaviour to predict the probability of each customer churning or not. This is done by examining various supervised machine learning models to determine which model best captures the customer behaviour. The evaluated models are logistic regression, random forest and linear discriminant analysis, as well as two ensemble methods using stacking and voting classifiers. The main finding is that the best accuracy is obtained using a voting ensemble method with the three base models logistic regression, random forest and linear discriminant analysis weighted as w = (0.005, 0.80, 0.015). With this model the attained accuracy is 75.94 %.<br>Mr Green är en av de ledande onlinespelsleverantörerna på den europeiska mark-naden. Deras mission är att erbjuda underhållning och en överlägsen användarup-plevelse till sina kunder. För att bättre kunna förstå sina kunder och deras livs-cykel är kundbortfall ett ytterst viktigt koncept. Det är också ett viktigt mått för att kunna utvärdera resultaten av marknadsföring. Denna rapport analyserar möjligheten att, med 24 timmars data över kundbeteende, kunna avgöra vilka kun-der som kommer att lämna siten. Detta görs genom att undersöka olika modeller inom övervakad maskininlärning för att avgöra vilken som bäst fångar kundernas be-teende. Modellerna som undersöks är logistisk regression, random forest och en linjär diskriminantanalys, samt två olika sammansättningsmodeller som använder sig av stacking och voting. 
Resultatet av denna studie är att en sammansättningsmodell som väger modellerna logistisk regression, random forest och en linjär diskriminantanalys ger den högsta förklaringsgraden på 75.94 %.
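The weighted soft-voting ensemble described in this abstract can be sketched with scikit-learn (an illustrative reconstruction on synthetic data, not the thesis's actual pipeline; the synthetic features and hyperparameters are assumptions, while the three base models and weights are the ones reported above):

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for 24 hours of initial player features and a churn label
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Soft-voting ensemble over the three base models, weighted as reported
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("lda", LinearDiscriminantAnalysis()),
    ],
    voting="soft",
    weights=[0.005, 0.80, 0.015],
)
ensemble.fit(X_train, y_train)
print(f"accuracy: {ensemble.score(X_test, y_test):.3f}")
```

With `voting="soft"`, each model's predicted class probabilities are averaged with the given weights before the final decision, which is why the random forest dominates here.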
APA, Harvard, Vancouver, ISO, and other styles
31

Martinez, Matthew, and Phillip L. De Leon. "Unsupervised Segmentation and Labeling for Smartphone Acquired Gait Data." International Foundation for Telemetering, 2016. http://hdl.handle.net/10150/624183.

Full text
Abstract:
As the population ages, prediction of falls risk is becoming an increasingly important research area. Due to built-in inertial sensors and ubiquity, smartphones provide an attractive data collection and computing platform for falls risk prediction and continuous gait monitoring. One challenge in continuous gait monitoring is that significant signal variability exists between individuals with a high falls risk and those with low risk. This variability increases the difficulty of building a universal system which segments and labels changes in signal state. This paper presents a method which uses unsupervised learning techniques to automatically segment a gait signal by computing the dissimilarity between two consecutive windows of data, applying an adaptive threshold algorithm to detect changes in signal state, and using a rule-based gait recognition algorithm to label the data. Using inertial data, the segmentation algorithm is compared against manually segmented data and is capable of achieving recognition rates greater than 71.8%.
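The windowed-dissimilarity idea in this abstract can be sketched as follows (a simplified illustration on synthetic inertial data with a mean-plus-k-sigma adaptive threshold; the window size, features, and threshold rule are assumptions, not the paper's exact algorithm):

```python
import numpy as np

def detect_state_changes(signal, win=50, k=2.0):
    """Flag change points where the dissimilarity between two
    consecutive windows of data exceeds an adaptive threshold."""
    # Per-window summary features (mean and standard deviation)
    feats = []
    for start in range(0, len(signal) - win, win):
        w = signal[start:start + win]
        feats.append([w.mean(), w.std()])
    feats = np.array(feats)
    # Dissimilarity: Euclidean distance between consecutive window features
    dissim = np.linalg.norm(np.diff(feats, axis=0), axis=1)
    # Adaptive threshold: mean + k standard deviations of the dissimilarity
    thresh = dissim.mean() + k * dissim.std()
    return np.where(dissim > thresh)[0] + 1  # window index after the change

rng = np.random.default_rng(0)
# Synthetic gait signal: resting, then walking (oscillatory), then resting
signal = np.concatenate([
    rng.normal(0, 0.1, 500),
    np.sin(np.linspace(0, 60, 500)) + rng.normal(0, 0.1, 500),
    rng.normal(0, 0.1, 500),
])
print(detect_state_changes(signal))  # windows where the signal state changes
```

A rule-based labeling step, as described above, would then assign an activity label to each segment between detected change points.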
APA, Harvard, Vancouver, ISO, and other styles
32

Wohlfarth, Till. "Machine-learning pour la prédiction des prix dans le secteur du tourisme en ligne." Thesis, Paris, ENST, 2013. http://www.theses.fr/2013ENST0090/document.

Full text
Abstract:
Nous nous intéressons au problème de la prédiction de l’occurrence d’une baisse de prix pour fournir un conseil à l’achat immédiat ou reporté d’un voyage sur un site web de comparaison des prix. La méthodologie proposée repose sur l’apprentissage statistique d’un modèle d’évolution du prix à partir de l’information conjointe d’attributs du voyage considéré et d’observations passées du prix et de la "popularité" de celui-ci. L’originalité principale consiste à représenter l’évolution des prix par le processus ponctuel inhomogène des sauts de celui-ci. A partir d’une base de données constituée par liligo.com, nous mettons en oeuvre une méthode d’apprentissage d’un modèle d’évolution des prix. Ce modèle permet de fournir un prédicteur de l’occurrence d’une baisse du prix sur une période future donnée et donc de prodiguer un conseil d’achat ou d’attente au client<br>The goal of this paper is to consider the design of decision-making tools in the context of varying travel prices from the customer’s perspective. Based on vast streams of heterogeneous historical data collected through the internet, we describe here two approaches to forecasting travel price changes at a given horizon, taking as input variables a list of descriptive characteristics of the flight, together with possible features of the past evolution of the related price series. Though heterogeneous in many respects (e.g. sampling, scale), the collection of historical prices series is here represented in a unified manner, by marked point processes (MPP). State-of-the-art supervised learning algorithms, possibly combined with a preliminary clustering stage, grouping flights whose related price series exhibit similar behavior, can be next used in order to help the customer to decide when to purchase her/his ticket.
APA, Harvard, Vancouver, ISO, and other styles
33

Sjölund, Jens. "Dose planning from MRI using machine learning for automatic segmentation of skull and air." Thesis, KTH, Fysik, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-94833.

Full text
APA, Harvard, Vancouver, ISO, and other styles
34

Kim, Eun Young. "Machine-learning based automated segmentation tool development for large-scale multicenter MRI data analysis." Diss., University of Iowa, 2013. https://ir.uiowa.edu/etd/4998.

Full text
Abstract:
Background: Volumetric analysis of brain structures from structural Magnetic Resonance (MR) images advances the understanding of the brain by providing means to study brain morphometric changes quantitatively along aging, development, and disease status. Due to the recent increased emphasis on large-scale multicenter brain MR study design, the demand for an automated brain MRI processing tool has increased as well. This dissertation describes an automatic segmentation framework for subcortical structures of brain MRI that is robust for a wide variety of MR data. Method: The proposed segmentation framework, BRAINSCut, is an integration of robust data standardization techniques and machine-learning approaches. First, a robust multi-modal pre-processing tool for automated registration, bias correction, and tissue classification has been implemented for large-scale heterogeneous multi-site longitudinal MR data analysis. The segmentation framework was then constructed to achieve robustness for large-scale data via the following comparative experiments: 1) Find the best machine-learning algorithm among several available approaches in the field. 2) Find an efficient intensity normalization technique for the proposed region-specific localized normalization with a choice of robust statistics. 3) Find high-quality features that best characterize the MR brain subcortical structures. Our tool is built upon 32 handpicked multi-modal multicenter MR images with manual traces of six subcortical structures (nucleus accumbens, caudate nucleus, globus pallidus, putamen, thalamus, and hippocampus) from three experts. A fundamental task associated with brain MR image segmentation for research and clinical trials is the validation of segmentation accuracy. This dissertation evaluated the proposed segmentation framework in terms of validity and reliability. 
Three groups of data were employed for the various evaluation aspects: 1) traveling human phantom data for the multicenter reliability, 2) a set of repeated scans for the measurement stability across various disease statuses, and 3) large-scale data from a Huntington's disease (HD) study for software robustness as well as segmentation accuracy. Result: Segmentation accuracy of six subcortical structures was improved with 1) the bias-corrected inputs, 2) the two region-specific intensity normalization strategies, and 3) the random forest machine-learning algorithm with the selected feature-enhanced image. The analysis of traveling human phantom data showed no center-specific bias in volume measurements from BRAINSCut. The repeated-measure reliability of most of the structures also displayed no specific association with disease progression, except for the caudate nucleus in the group at high risk for HD. The constructed segmentation framework was successfully applied to multicenter MR data from the PREDICT-HD [133] study (< 10% failure rate over 3000 scan sessions processed). Conclusion: The random-forest-based segmentation method is effective and robust to large-scale multicenter data variation, especially with a proper choice of intensity normalization techniques. Benefits of proper normalization approaches are more apparent than those of the custom set of feature-enhanced images for the accuracy and robustness of the segmentation tool. BRAINSCut effectively produced subcortical volumetric measurements that are robust to center and disease status, with validity confirmed by human experts and a low failure rate on large-scale multicenter MR data. Sample size estimation, which is crucial for designing efficient clinical and research trials, is provided based on our experiments for six subcortical structures.
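The region-specific intensity normalization with robust statistics that this abstract credits can be sketched with median/IQR scaling (a generic illustration on synthetic data; the region mask and statistics chosen here are assumptions, not BRAINSCut's actual implementation):

```python
import numpy as np

def normalize_region(image, region_mask):
    """Normalize intensities using robust statistics (median and IQR)
    computed only inside the region of interest, so outliers and
    scanner-specific intensity ranges have limited influence."""
    vals = image[region_mask]
    median = np.median(vals)
    q1, q3 = np.percentile(vals, [25, 75])
    iqr = q3 - q1
    return (image - median) / (iqr if iqr > 0 else 1.0)

rng = np.random.default_rng(1)
image = rng.normal(100, 15, size=(64, 64))      # synthetic MR slice
mask = np.zeros((64, 64), dtype=bool)
mask[20:40, 20:40] = True                        # hypothetical subcortical region

normalized = normalize_region(image, mask)
print(np.median(normalized[mask]))  # region median maps to 0 by construction
```

Because the statistics are computed only within the region mask, each structure can be normalized independently, which is the essence of the localized strategy described above.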
APA, Harvard, Vancouver, ISO, and other styles
35

Abdelsamea, Ahmed Mohammed Mohammed. "Regional active contours based on variational level sets and machine learning for image segmentation." Thesis, IMT Alti Studi Lucca, 2015. http://e-theses.imtlucca.it/160/1/Abdelsamea_phdthesis.pdf.

Full text
Abstract:
Image segmentation is the problem of partitioning an image into different subsets, where each subset may have a different characterization in terms of color, intensity, texture, and/or other features. Segmentation is a fundamental component of image processing, and plays a significant role in computer vision, object recognition, and object tracking. Active Contour Models (ACMs) constitute a powerful energy-based minimization framework for image segmentation, which relies on the concept of contour evolution. Starting from an initial guess, the contour is evolved with the aim of approximating better and better the actual object boundary. Handling complex images in an efficient, effective, and robust way is a real challenge, especially in the presence of intensity inhomogeneity, overlap between the foreground/background intensity distributions, objects characterized by many different intensities, and/or additive noise. In this thesis, to deal with these challenges, we propose a number of image segmentation models relying on variational level set methods and specific kinds of neural networks, to handle complex images in both supervised and unsupervised ways. Experimental results demonstrate the high accuracy of the segmentation results, obtained by the proposed models on various benchmark synthetic and real images compared with state-of-the-art active contour models.
APA, Harvard, Vancouver, ISO, and other styles
36

Sarigul, Erol. "Interactive Machine Learning for Refinement and Analysis of Segmented CT/MRI Images." Diss., Virginia Tech, 2004. http://hdl.handle.net/10919/25954.

Full text
Abstract:
This dissertation concerns the development of an interactive machine learning method for refinement and analysis of segmented computed tomography (CT) images. This method uses higher-level domain-dependent knowledge to improve initial image segmentation results. A knowledge-based refinement and analysis system requires the formulation of domain knowledge. A serious problem faced by knowledge-based system designers is the knowledge acquisition bottleneck. Knowledge acquisition is very challenging and an active research topic in the field of machine learning and artificial intelligence. Commonly, a knowledge engineer needs to have a domain expert to formulate acquired knowledge for use in an expert system. That process is rather tedious and error-prone. The domain expert's verbal description can be inaccurate or incomplete, and the knowledge engineer may not correctly interpret the expert's intent. In many cases, the domain experts prefer to do actions instead of explaining their expertise. These problems motivate us to find another solution to make the knowledge acquisition process less challenging. Instead of trying to acquire expertise from a domain expert verbally, we can ask him/her to show expertise through actions that can be observed by the system. If the system can learn from those actions, this approach is called learning by demonstration. We have developed a system that can learn region refinement rules automatically. The system observes the steps taken as a human user interactively edits a processed image, and then infers rules from those actions. During the system's learn mode, the user views labeled images and makes refinements through the use of a keyboard and mouse. As the user manipulates the images, the system stores information related to those manual operations, and develops internal rules that can be used later for automatic postprocessing of other images. After one or more training sessions, the user places the system into its run mode. 
The system then accepts new images, and uses its rule set to apply postprocessing operations automatically in a manner that is modeled after those learned from the human user. At any time, the user can return to learn mode to introduce new training information, and this will be used by the system to update its internal rule set. The system does not simply memorize a particular sequence of postprocessing steps during a training session, but instead generalizes from the image data and from the actions of the human user so that new CT images can be refined appropriately. Experimental results have shown that IntelliPost improves the segmentation accuracy of the overall system by applying postprocessing rules. In tests with two different CT datasets of hardwood logs, the use of IntelliPost resulted in improvements of 1.92% and 9.45%, respectively. For two different medical datasets, the use of IntelliPost resulted in improvements of 4.22% and 0.33%, respectively.<br>Ph. D.
APA, Harvard, Vancouver, ISO, and other styles
37

Bergström, Sebastian. "Customer segmentation of retail chain customers using cluster analysis." Thesis, KTH, Matematisk statistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-252559.

Full text
Abstract:
In this thesis, cluster analysis was applied to data comprising of customer spending habits at a retail chain in order to perform customer segmentation. The method used was a two-step cluster procedure in which the first step consisted of feature engineering, a square root transformation of the data in order to handle big spenders in the data set and finally principal component analysis in order to reduce the dimensionality of the data set. This was done to reduce the effects of high dimensionality. The second step consisted of applying clustering algorithms to the transformed data. The methods used were K-means clustering, Gaussian mixture models in the MCLUST family, t-distributed mixture models in the tEIGEN family and non-negative matrix factorization (NMF). For the NMF clustering a slightly different data pre-processing step was taken, specifically no PCA was performed. Clustering partitions were compared on the basis of the Silhouette index, Davies-Bouldin index and subject matter knowledge, which revealed that K-means clustering with K = 3 produces the most reasonable clusters. This algorithm was able to separate the customer into different segments depending on how many purchases they made overall and in these clusters some minor differences in spending habits are also evident. In other words there is some support for the claim that the customer segments have some variation in their spending habits.<br>I denna uppsats har klusteranalys tillämpats på data bestående av kunders konsumtionsvanor hos en detaljhandelskedja för att utföra kundsegmentering. Metoden som använts bestod av en två-stegs klusterprocedur där det första steget bestod av att skapa variabler, tillämpa en kvadratrotstransformation av datan för att hantera kunder som spenderar långt mer än genomsnittet och slutligen principalkomponentanalys för att reducera datans dimension. Detta gjordes för att mildra effekterna av att använda en högdimensionell datamängd. 
Det andra steget bestod av att tillämpa klusteralgoritmer på den transformerade datan. Metoderna som användes var K-means klustring, gaussiska blandningsmodeller i MCLUST-familjen, t-fördelade blandningsmodeller från tEIGEN-familjen och icke-negativ matrisfaktorisering (NMF). För klustring med NMF användes en något annorlunda förbehandling av datan, mer specifikt genomfördes ingen PCA. Klusterpartitioner jämfördes baserat på silhuettvärden, Davies-Bouldin-indexet och ämneskunskap, som avslöjade att K-means klustring med K=3 producerar de rimligaste resultaten. Denna algoritm lyckades separera kunderna i olika segment beroende på hur många köp de gjort överlag och i dessa segment finns vissa skillnader i konsumtionsvanor. Med andra ord finns visst stöd för påståendet att kundsegmenten har en del variation i sina konsumtionsvanor.
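The two-step procedure this abstract describes can be sketched end to end with scikit-learn (a schematic illustration on synthetic spending data; the feature matrix, component count, and candidate K values are assumptions, not the thesis's actual choices):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import davies_bouldin_score, silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic customer-by-category spending matrix with heavy right skew
spending = rng.gamma(shape=2.0, scale=50.0, size=(500, 12))

# Step 1: square-root transform to damp big spenders, then PCA
X = np.sqrt(spending)
X = StandardScaler().fit_transform(X)
X = PCA(n_components=5, random_state=0).fit_transform(X)

# Step 2: K-means, comparing partitions with internal validity indices
for k in (2, 3, 4):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, silhouette_score(X, labels), davies_bouldin_score(X, labels))
```

Higher silhouette and lower Davies-Bouldin values indicate a better partition; the thesis additionally weighed subject-matter knowledge before settling on K = 3.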
APA, Harvard, Vancouver, ISO, and other styles
38

Espis, Andrea. "Object detection and semantic segmentation for assisted data labeling." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2022.

Find full text
Abstract:
The automation of data labeling tasks is a solution to the errors and time costs related to human labeling. In this thesis work, CenterNet, DeepLabV3, and K-Means applied to the RGB color space are deployed to build a pipeline for assisted data labeling: a semi-automatic process to iteratively improve the quality of the annotations. The proposed pipeline pointed out a total of 1,547 wrong and missing annotations when applied to a dataset originally containing 8,300 annotations. Moreover, the quality of each annotation has been drastically improved, and at the same time, more than 600 hours of work have been saved. The same models have also been used to address the real-time tire inspection task, regarding the detection of markers on the surface of tires. According to the experiments, the combination of DeepLabV3 output and post-processing based on the area and shape of the predicted blobs achieves a maximum mean Precision of 0.992, with mean Recall 0.982, and a maximum mean Recall of 0.998, with mean Precision 0.960.
APA, Harvard, Vancouver, ISO, and other styles
39

Gerard, Alex Michael. "Iterative cerebellar segmentation using convolutional neural networks." Thesis, University of Iowa, 2018. https://ir.uiowa.edu/etd/6579.

Full text
Abstract:
Convolutional neural networks (ConvNets) have quickly become the most widely used tool for image perception and interpretation tasks over the past several years. The single most important resource needed for training a ConvNet that will successfully generalize to unseen examples is an adequately sized labeled dataset. In many interesting medical imaging cases, the necessary size or quality of training data is not suitable for directly training a ConvNet. Furthermore, access to the expertise to manually label such datasets is often infeasible. To address these barriers, we investigate a method for iterative refinement of the ConvNet training. Initially, unlabeled images are attained, minimal labeling is performed, and a model is trained on the sparse manual labels. At the end of each training iteration, full images are predicted, and additional manual labels are identified to improve the training dataset. In this work, we show how to utilize patch-based ConvNets to iteratively build a training dataset for automatically segmenting MRI images of the human cerebellum. We construct this training dataset using a small collection of high-resolution 3D images and transfer the resulting model to a much larger, much lower resolution, collection of images. Both T1-weighted and T2-weighted MRI modalities are utilized to capture the additional features that arise from the differences in contrast between modalities. The objective is to perform tissue-level segmentation, classifying each volumetric pixel (voxel) in an image as white matter, gray matter, or cerebrospinal fluid (CSF). We will present performance results on the lower resolution dataset, and report achieving a 12.7% improvement in accuracy over the existing segmentation method, expectation maximization. 
Further, we will present example segmentations from our iterative approach that demonstrate its ability to detect white matter branching near the outer regions of the anatomy, which agrees with the known biological structure of the cerebellum and has typically eluded traditional segmentation algorithms.
APA, Harvard, Vancouver, ISO, and other styles
40

Giraldo, Zuluaga Jhony Heriberto. "Graph-based Algorithms in Computer Vision, Machine Learning, and Signal Processing." Electronic Thesis or Diss., La Rochelle, 2022. http://www.theses.fr/2022LAROS037.

Full text
Abstract:
L'apprentissage de la représentation graphique et ses applications ont suscité une attention considérable ces dernières années. En particulier, les Réseaux Neuronaux Graphiques (RNG) et le Traitement du Signal Graphique (TSG) ont été largement étudiés. Les RNGs étendent les concepts des réseaux neuronaux convolutionnels aux données non euclidiennes modélisées sous forme de graphes. De même, le TSG étend les concepts du traitement classique des signaux numériques aux signaux supportés par des graphes. Les RNGs et TSG ont de nombreuses applications telles que l'apprentissage semi-supervisé, la segmentation sémantique de nuages de points, la prédiction de relations individuelles dans les réseaux sociaux, la modélisation de protéines pour la découverte de médicaments, le traitement d'images et de vidéos. Dans cette thèse, nous proposons de nouvelles approches pour le traitement des images et des vidéos, les RNGs, et la récupération des signaux de graphes variant dans le temps. Notre principale motivation est d'utiliser l'information géométrique que nous pouvons capturer à partir des données pour éviter les méthodes avides de données, c'est-à-dire l'apprentissage avec une supervision minimale. Toutes nos contributions s'appuient fortement sur les développements de la TSG et de la théorie spectrale des graphes. En particulier, la théorie de l'échantillonnage et de la reconstruction des signaux de graphes joue un rôle central dans cette thèse. 
Les principales contributions de cette thèse sont résumées comme suit : 1) nous proposons de nouveaux algorithmes pour la segmentation d'objets en mouvement en utilisant les concepts de la TSG et des RNGs, 2) nous proposons un nouvel algorithme pour la segmentation sémantique faiblement supervisée en utilisant des réseaux de neurones hypergraphiques, 3) nous proposons et analysons les RNGs en utilisant les concepts de la TSG et de la théorie des graphes spectraux, et 4) nous introduisons un nouvel algorithme basé sur l'extension d'une fonction de lissage de Sobolev pour la reconstruction de signaux graphiques variant dans le temps à partir d'échantillons discrets<br>Graph representation learning and its applications have gained significant attention in recent years. Notably, Graph Neural Networks (GNNs) and Graph Signal Processing (GSP) have been extensively studied. GNNs extend the concepts of convolutional neural networks to non-Euclidean data modeled as graphs. Similarly, GSP extends the concepts of classical digital signal processing to signals supported on graphs. GNNs and GSP have numerous applications such as semi-supervised learning, point cloud semantic segmentation, prediction of individual relations in social networks, modeling proteins for drug discovery, image, and video processing. In this thesis, we propose novel approaches in video and image processing, GNNs, and recovery of time-varying graph signals. Our main motivation is to use the geometrical information that we can capture from the data to avoid data hungry methods, i.e., learning with minimal supervision. All our contributions rely heavily on the developments of GSP and spectral graph theory. In particular, the sampling and reconstruction theory of graph signals play a central role in this thesis. 
The main contributions of this thesis are summarized as follows: 1) we propose new algorithms for moving object segmentation using concepts of GSP and GNNs, 2) we propose a new algorithm for weakly-supervised semantic segmentation using hypergraph neural networks, 3) we propose and analyze GNNs using concepts from GSP and spectral graph theory, and 4) we introduce a novel algorithm based on the extension of a Sobolev smoothness function for the reconstruction of time-varying graph signals from discrete samples.
APA, Harvard, Vancouver, ISO, and other styles
41

Sergue, Marie. "Customer Churn Analysis and Prediction using Machine Learning for a B2B SaaS company." Thesis, KTH, Matematisk statistik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-269540.

Full text
Abstract:
This past decade, the majority of services have been digitalized, and data has become more and more available, easy to store and to process in order to understand customer behavior. In order to be leaders in their respective industries, subscription-based businesses must focus on their Customer Relationship Management and in particular churn management, that is, understanding customers cancelling their subscription. In this thesis, churn analysis is performed on real-life data from a Software as a Service (SaaS) company selling an advanced cloud-based business phone system, Aircall. This use case has the particularity that the available dataset gathers customer data on a monthly basis and has a very imbalanced distribution of the target: a large majority of customers do not churn. Therefore, several methods are tried in order to diminish the impact of the imbalance while remaining as close as possible to the real world and the temporal framework. These methods include oversampling and undersampling (SMOTE and Tomek's link) and time series cross-validation. Then logistic regression and random forest models are used with an aim to both predict and explain churn. The non-linear method performed better than logistic regression, suggesting the limitation of linear models for our use case. Moreover, mixing oversampling with undersampling gives better performances in terms of precision/recall trade-off. Time series cross-validation also proves to be an efficient method to improve performance of the model. Overall, the resulting model is more useful to explain churn than to predict it. It highlighted some features majorly influencing churn, mostly related to product usage.<br>Under det senaste decenniet har många tjänster digitaliserats och data blivit mer och mer tillgängliga, enkla att lagra och bearbeta med syftet att förstå kundbeteende. 
För att kunna vara ledande inom sina branscher måste prenumerationsbaserade företag fokusera på kundrelationshantering och i synnerhet churn management, det vill säga förståelse för hur kunder avbryter sin prenumeration. I denna uppsats utförs churnanalys på verkliga data från ett SaaS-företag (software as a service) som säljer ett avancerat molnbaserat företagstelefonsystem, Aircall. Denna fallstudie är speciell på så sätt att den tillgängliga datamängden består av månatlig kunddata med en mycket ojämn fördelning: en stor majoritet av kunderna avbryter inte sina prenumerationer. Därför undersöks flera metoder för att minska effekten av denna obalans, samtidigt som de förblir så nära den verkliga världen och den tidsmässiga ramen. Dessa metoder inkluderar översampling och undersampling (SMOTE och Tomeks länk) och korsvalidering av tidsserier. Sedan används logistisk regression och random forests i syfte att både förutsäga och förklara prenumerationsbortfall. Den icke-linjära metoden presterade bättre än logistisk regression, vilket tyder på en begränsning hos linjära modeller i vårt användningsfall. Dessutom ger blandning av översampling med undersampling bättre prestanda när det gäller precision och återkoppling. Korsvalidering av tidsserier är också en effektiv metod för att förbättra modellens prestanda. Sammantaget är den resulterande modellen mer användbar för att förklara bortfall än att förutsäga dessa. Med hjälp av modellen kunde vissa faktorer, främst relaterade till produktanvändning, som påverkar bortfallet identifieras.
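The temporal-validation setup this abstract relies on can be sketched as follows (a simplified illustration on synthetic imbalanced data; scikit-learn's `class_weight` stands in for the SMOTE/Tomek resampling, which lives in the separate imbalanced-learn package as `SMOTETomek`):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(0)
# Synthetic monthly customer data with a rare churn target (~10% positives)
X = rng.normal(size=(1200, 8))
y = (rng.random(1200) < 0.05 + 0.3 * (X[:, 0] > 1)).astype(int)

models = {
    "logreg": LogisticRegression(class_weight="balanced", max_iter=1000),
    "forest": RandomForestClassifier(class_weight="balanced", random_state=0),
}
# Time-ordered splits: always train on the past, validate on the future
for name, model in models.items():
    scores = []
    for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
        model.fit(X[train_idx], y[train_idx])
        scores.append(f1_score(y[test_idx], model.predict(X[test_idx])))
    print(name, np.mean(scores))
```

Evaluating with F1 rather than accuracy reflects the precision/recall trade-off the thesis focuses on, since accuracy is misleading when most customers do not churn.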
APA, Harvard, Vancouver, ISO, and other styles
42

Delissen, Johan. "Graph Based Machine Learning approaches and Clustering in a Customer Relationship Management Setting." Thesis, Luleå tekniska universitet, Institutionen för system- och rymdteknik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-81892.

Full text
Abstract:
This master thesis investigates the utilisation of various graph-based machine learning models for solving a customer segmentation problem, a task coupled to Customer Relationship Management, where the objective is to divide customers into different groups based on similar attributes. More specifically, a customer segmentation problem is solved via an unsupervised machine learning technique named clustering, using the k-means clustering algorithm. Three different representations of customers as a vector of attributes are created and then utilised by the k-means algorithm to divide users into different clusters. The first representation uses an elementary feature vector and the other two approaches use feature vectors produced by graph-based machine learning models. Results show that similar groupings are found but that results vary depending on what data is included in the instantiation and training of the various approaches and their corresponding models.
APA, Harvard, Vancouver, ISO, and other styles
43

Hast, Matteus. "Evaluation of machine learning algorithms for customer demand prediction of in-flight meals." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-255020.

Full text
Abstract:
This study aims to evaluate multiple Machine Learning Algorithms (MLAs) for estimating the customer demand of in-flight meals. As a result of the review of related works, four MLAs were selected, namely Linear Regression (LR), Support Vector Regression (SVR), Extreme Gradient Boosting (XGBoost) and a Multilayer Perceptron Neural Network (MLP). The study investigates which MLA is best suited for the problem at hand and which features are most influential for customer demand prediction of in-flight meals. Focus is put on finding applicable MLAs and on evaluating, comparing and tweaking the parameters of the MLAs to further optimise the selected models. The available data set comes from a single airline company and consists mainly of flights with a short to medium long flight duration time. The results show that the four evaluated models (LR, SVR, XGBoost and MLP) perform with no significant difference from one another, achieving comparable estimation accuracy. However, the SVR model underperforms in terms of model fitting and prediction time compared to the remaining three models. Furthermore, the most important feature for customer demand prediction of in-flight meals is the scheduled flight duration time.<br>Syftet med den här studien är att utvärdera ett flertal maskininlärningsalgoritmer för prediktering av konsumentefterfrågan för måltider under flygning. Undersökningen över tidigare arbeten utförda i liknande fält resulterade i att fyra maskininlärningsalgoritmer blev valda, nämligen linjär regression, stödvektormaskin för regression, Extreme Gradient Boosting och ett flerlagersperceptron-neuronnät. Studien utforskar vilken maskininlärningsalgoritm som är bäst anpassad för att prediktera problemet samt vilka egenskaper i datat som är mest inflytesrika när det handlar om att prediktera konsumentefterfrågan av måltider under flygning. 
Fokus ligger på att finna applicerbara maskininlärningsalgoritmer och på att utvärdera, jämföra samt på att justera parametrarna i syfte till att optimera modellerna. Den tillgängliga datan härstammar från ett enstaka flygbolag och består mestadels av korta och mediumlånga flyg. Resultatet påvisar att de fyra modellerna, linjär regression, en stödvektormaskin för regression, Extreme Gradient Boosting och ett flerlagersperceptron-neuronnät, presterar utan någon signifikant skillnad gentemot varandra och är jämförbara i deras prestation i avseende till predikteringsprecision med liknande resultat. I avseende till modellanpassnings- och predikteringstid underpresterar dock stödvektormaskinen avsevärt i jämförelse med de resterande tre modellerna. Resultatet visar även att den viktigaste egenskapen i datat för prediktering av konsumentefterfrågan av måltider under flygning är den schemalagda flygtiden.
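A model comparison of the kind described can be sketched with scikit-learn (synthetic data standing in for the airline's flight features; sklearn's `GradientBoostingRegressor` stands in for XGBoost, which is a separate package, and all hyperparameters are placeholder assumptions):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR

# Synthetic stand-in for flight features (duration, passenger count, route, ...)
X, y = make_regression(n_samples=1500, n_features=6, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "LR": LinearRegression(),
    "SVR": SVR(),
    "GBM": GradientBoostingRegressor(random_state=0),  # stand-in for XGBoost
    "MLP": MLPRegressor(max_iter=2000, random_state=0),
}
# Fit each candidate and compare held-out mean absolute error
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, mean_absolute_error(y_test, model.predict(X_test)))
```

Feature influence, the study's second question, could then be read off with a permutation-importance pass over the best model.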
APA, Harvard, Vancouver, ISO, and other styles
44

Pehrson, Jakob, and Sara Lindstrand. "Support Unit Classification through Supervised Machine Learning." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-281537.

Full text
Abstract:
The purpose of this article is to evaluate the impact a supervised machine learning classification model can have on the process of internal customer support within a large digitized company. Chatbots are becoming a frequently used utility among digital services, though the true general impact is not always clear. The research is separated into the following two questions: (1) Which of the supervised machine learning algorithms naïve Bayes, logistic regression, and neural networks can best predict the correct support a user needs, and with what accuracy? And (2) What is the effect on productivity and customer satisfaction of using machine learning to sort customer needs? The data was collected from the internal server database of a large digital company and was then used to train and test the three classification algorithms. Furthermore, a survey was conducted with questions focused on understanding how the current system affects the involved employees. A first finding indicates that neural networks are the best-suited model for the classification task. However, when the scope and complexity were limited, naïve Bayes and logistic regression also performed sufficiently well. A second finding of the study is that the classification model potentially improves productivity given that the baseline is met. However, a difficulty exists in drawing conclusions on the exact effects on customer satisfaction since there are many aspects to take into account. Nevertheless, there is a good potential to achieve a positive net effect.<br>Syftet med artikeln är att utvärdera den påverkan som en klassificeringsmodell kan ha på den interna processen av kundtjänst inom ett stort digitaliserat företag. Chatbotar används allt mer frekvent bland digitala tjänster, även om den generella effekten inte alltid är tydlig. 
Studien är uppdelad i följande två frågeställningar: (1) Vilken klassificeringsalgoritm bland naive Bayes, logistisk regression, och neurala nätverk kan bäst förutspå den korrekta hjälpen en användare är i behov av och med vilken noggrannhet? Och (2) Vad är effekten på produktivitet och kundnöjdhet för användandet av maskininlärning för sortering av kundbehov? Data samlades från ett stort, digitalt företags interna databas och används sedan i träning och testning med de tre klassificeringsalgoritmerna. Vidare, en enkät skickades ut med fokus på att förstå hur det nuvarande systemet påverkar de berörda arbetarna. Ett första fynd indikerar att neurala nätverk är den mest lämpade modellen för klassificeringen. Däremot, när omfånget och komplexiteten var begränsat presenterade även naive Bayes och logistisk regression tillräckligt. Ett andra fynd av studien är att klassificeringen potentiellt förbättrar produktiviteten givet att baslinjen är mött. Däremot existerar en svårighet i att dra slutsatser om den exakta effekten på kundnöjdhet eftersom det finns många olika aspekter att ta hänsyn till. Likväl finns en god potential i att uppnå en positiv nettoeffekt.
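The multinomial naïve Bayes baseline evaluated in this thesis can be sketched in a few lines of plain Python. This is a minimal sketch with add-one smoothing; the ticket texts and category names below are hypothetical stand-ins for the company's internal support data:

```python
import math
from collections import Counter, defaultdict

class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one smoothing for short support tickets."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.vocab = set()
        for text, label in zip(texts, labels):
            for word in text.lower().split():
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        self.total = sum(self.label_counts.values())
        return self

    def predict(self, text):
        best_label, best_score = None, float("-inf")
        for label in self.label_counts:
            # log prior + sum of smoothed log likelihoods
            score = math.log(self.label_counts[label] / self.total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in text.lower().split():
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical toy tickets; the real data came from the company's internal database.
texts = ["password reset login", "invoice billing error",
         "login locked out", "billing refund request"]
labels = ["account", "billing", "account", "billing"]
model = NaiveBayesClassifier().fit(texts, labels)
print(model.predict("reset my password"))  # -> 'account'
```

Logistic regression and neural networks would consume the same bag-of-words representation; naïve Bayes is shown here because it needs no numerical optimization.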
APA, Harvard, Vancouver, ISO, and other styles
45

Haupt, Johannes Sebastian. "Machine Learning for Marketing Decision Support." Doctoral thesis, Humboldt-Universität zu Berlin, 2020. http://dx.doi.org/10.18452/21554.

Full text
Abstract:
The digitization of the economy has fundamentally changed the way in which companies interact with customers and has made customer targeting a key intersection of marketing and information systems. Building models of customer behavior at scale requires the development of tools at the intersection of data management and statistical knowledge discovery. This dissertation widens the scope of research on predictive modeling by focusing on the intersections of model building with data collection and decision support. Its goals are 1) to develop and validate new machine learning methods explicitly designed to optimize customer targeting decisions in direct marketing and customer retention management, and 2) to study the implications of data collection for customer targeting from the perspective of the company and its customers. First, the thesis proposes methods that utilize the richness of e-commerce data, reduce the cost of data collection through efficient experiment design, and address the targeting decision setting during model building. The underlying state-of-the-art machine learning models scale to high-dimensional customer data and can be conveniently applied by practitioners. These models further address the problem of causal inference that arises when the causal attribution of customer behavior to a marketing incentive is difficult. Marketers can directly apply the model estimates to identify profitable targeting policies under complex cost structures. Second, the thesis quantifies the savings potential of efficient experiment design and the monetary cost of an internal principle of data privacy. An analysis of data collection practices in direct marketing emails reveals the ubiquity of tracking mechanisms employed without user consent in e-commerce communication. These results form the basis for a machine-learning-based system for the detection and deletion of tracking elements from emails.
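The causal-targeting idea at the core of the thesis, attributing customer behavior to a marketing incentive rather than to the customer's baseline propensity, can be illustrated with a minimal uplift estimate. The segments, coupon treatment, and conversion records below are invented for illustration; a segment is only worth targeting when its treated conversion rate exceeds its control rate:

```python
from collections import defaultdict

def uplift_by_segment(records):
    """Estimate per-segment treatment uplift as
    P(convert | treated) - P(convert | control)."""
    stats = defaultdict(lambda: {"t": [0, 0], "c": [0, 0]})  # arm -> [conversions, count]
    for segment, treated, converted in records:
        arm = "t" if treated else "c"
        stats[segment][arm][0] += converted
        stats[segment][arm][1] += 1
    uplift = {}
    for segment, s in stats.items():
        rate_t = s["t"][0] / s["t"][1] if s["t"][1] else 0.0
        rate_c = s["c"][0] / s["c"][1] if s["c"][1] else 0.0
        uplift[segment] = rate_t - rate_c
    return uplift

# Hypothetical data: (segment, received_coupon, converted)
records = [
    ("new", 1, 1), ("new", 1, 1), ("new", 1, 0),
    ("new", 0, 0), ("new", 0, 0), ("new", 0, 1),
    ("loyal", 1, 1), ("loyal", 1, 1), ("loyal", 0, 1), ("loyal", 0, 1),
]
print(uplift_by_segment(records))
```

Here the "loyal" segment converts regardless of treatment (zero uplift), so under a nonzero contact cost only the "new" segment would be a profitable targeting policy. The actual models in the dissertation estimate such effects at the individual-customer level from high-dimensional data.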
APA, Harvard, Vancouver, ISO, and other styles
46

Riviere, Jean-Philippe. "Capturing traces of the dance learning process." Electronic Thesis or Diss., université Paris-Saclay, 2020. http://www.theses.fr/2020UPASG054.

Full text
Abstract:
This thesis focuses on designing interactive tools to understand and support dance learning from videos. Dancers' learning practice represents a rich source of information for researchers interested in designing systems that support motor learning. Indeed, dancers embody a wide range of skills that they reuse when learning new dance sequences. However, these skills are partly the result of embodied, implicit knowledge, which is difficult for an individual to express or verbalize. In this thesis, I argue that we can capture and save traces of dancers' embodied knowledge and use them to design interactive tools that support dance learning. My approach is to study real-life dance learning tasks in individual and collaborative settings. Based on the findings from these studies, I discuss the challenge of capturing embodied knowledge to support dancers' learning practice. My thesis highlights that although dancers' learning processes are diverse, similar strategies emerge to structure their learning. Finally, I bring and discuss new perspectives on the design of movement-based learning tools.
APA, Harvard, Vancouver, ISO, and other styles
47

Persson, Peter. "Starved neural learning : Morpheme segmentation using low amounts of data." Thesis, Stockholms universitet, Avdelningen för datorlingvistik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-160953.

Full text
Abstract:
Automatic morpheme segmentation as a field has been dominated by unsupervised methods since its inception, partly due to theoretical motivations but also due to resource constraints. Given the success neural network methods have shown in a wide variety of fields in recent years, it seems compelling to apply these methods to morpheme segmentation. This study explores the efficacy of modern neural networks, specifically convolutional neural networks and bidirectional LSTM (BLSTM) networks, on the morpheme segmentation task in a low-resource setting, to determine their viability as contenders with previous unsupervised, minimally supervised, and semi-supervised systems in the field. One architecture of each type is implemented and trained on a new gold-standard data set, and the results are compared to previously established methods. A qualitative error analysis of the architectures' segmentations is also performed. The study demonstrates that a BLSTM system can be trained with minimal effort to produce a proof-of-concept solution at low levels of training data, and suggests that BLSTM methods may be a fruitful direction for further research in this field.
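The standard reduction of morpheme segmentation to per-character sequence labeling, which is the form a BLSTM tagger is trained on, can be sketched as follows. The `@@` morpheme separator is an assumed convention here, not necessarily the one used in the thesis' gold-standard data:

```python
def to_boundary_labels(segmented):
    """Convert a morpheme-segmented word, e.g. 'un@@break@@able', into
    per-character labels: 'B' starts a morpheme, 'I' continues it."""
    labels = []
    for morpheme in segmented.split("@@"):
        labels.extend(["B"] + ["I"] * (len(morpheme) - 1))
    return labels

def from_boundary_labels(word, labels):
    """Invert the labeling: rebuild the segmentation from predicted tags."""
    out = []
    for ch, tag in zip(word, labels):
        if tag == "B" and out:
            out.append("@@")
        out.append(ch)
    return "".join(out)

labels = to_boundary_labels("un@@break@@able")
print("".join(labels))                          # -> BIBIIIIBIII
print(from_boundary_labels("unbreakable", labels))  # -> un@@break@@able
```

A BLSTM (or CNN) tagger then predicts one B/I tag per character, and segmentation quality can be scored as boundary precision and recall against the gold labels.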
APA, Harvard, Vancouver, ISO, and other styles
48

Syed, Tahir Qasim. "Analysis of the migratory potential of cancerous cells by image preprocessing, segmentation and classification." Thesis, Evry-Val d'Essonne, 2011. http://www.theses.fr/2011EVRY0041/document.

Full text
Abstract:
This thesis is part of a broader research project whose aim is to analyze the migratory potential of cancer cells. As part of this doctorate, we are interested in the use of image processing to count and classify cells present in an image acquired using a microscope. The partner biologists of this project study the influence of the environment on the migratory behavior of cancer cells from cell cultures grown on different cancer cell lines. The processing of biological images has so far resulted in a significant number of publications, but in the case discussed here, since the protocol for image acquisition was not fixed, the challenge was to propose a chain of adaptive processing steps that would not constrain the biologists in their research. Four steps are detailed in this thesis. The first concerns the definition of pre-processing steps to homogenize the conditions of acquisition. The choice to use the image of standard deviations rather than the brightness is one of the results of this first part. The second step is to count the number of cells present in the image. An original filter, the so-called "halo" filter, which reinforces the centre of the cells in order to facilitate counting, has been proposed. A statistical validation step of the centres affords more reliability to the result. The stage of image segmentation, undoubtedly the most difficult, constitutes the third part of this work. This is a matter of extracting image patches, each containing a single cell. The segmentation algorithm chosen was the watershed, but it was necessary to adapt this algorithm to the context of the images included in this study. The proposal to use a map of probabilities as input yielded a segmentation closer to the edges of cells. On the other hand, this method leads to an over-segmentation that must be reduced in order to move towards the goal: "one region = one cell". For this, an algorithm using the concept of a cumulative hierarchy based on mathematical morphology has been developed. It allows the aggregation of adjacent regions by working on a tree representation of these regions and their associated levels. A comparison of the results obtained by this method with those of other approaches to limit over-segmentation has allowed us to prove the effectiveness of the proposed approach. The final step of this work consists in the classification of cells. Three classes were identified: spread cells (mesenchymal migration), "blebbing" round cells (amoeboid migration), and "smooth" round cells (an intermediate stage between the migration modes). On each patch obtained at the end of the segmentation step, intensity, morphological, and textural features were calculated. An initial analysis of these features allowed us to develop a classification strategy, namely to first separate the round cells from the spread cells, and then separate the "smooth" from the "blebbing" round cells. For this, we divide the parameters into two sets that are used successively in the two stages of classification. Several classification algorithms were tested; in the end, two neural networks were retained, yielding over 80% correct classification between spread cells and round cells, and nearly 90% correct classification between "smooth" and "blebbing" round cells.
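The region-merging idea behind such a hierarchy can be illustrated on a drastically simplified 1-D case: greedily merge the most similar adjacent watershed regions until no neighbouring pair is closer than a threshold. The intensities and threshold below are invented; the thesis' actual method operates on a morphological tree of 2-D regions:

```python
def merge_regions(means, threshold):
    """Greedy agglomeration of adjacent regions: repeatedly merge the
    neighbouring pair whose mean intensities are closest, stopping when
    no adjacent pair differs by less than `threshold`. A toy 1-D stand-in
    for hierarchy-based reduction of watershed over-segmentation."""
    regions = [(m, 1) for m in means]  # (mean intensity, pixel count)
    while len(regions) > 1:
        # find the most similar adjacent pair
        diffs = [abs(regions[i][0] - regions[i + 1][0]) for i in range(len(regions) - 1)]
        i = min(range(len(diffs)), key=diffs.__getitem__)
        if diffs[i] >= threshold:
            break
        (m1, n1), (m2, n2) = regions[i], regions[i + 1]
        merged = ((m1 * n1 + m2 * n2) / (n1 + n2), n1 + n2)  # area-weighted mean
        regions[i:i + 2] = [merged]
    return [round(m, 3) for m, _ in regions]

# Five over-segmented regions from a hypothetical watershed pass: the three
# similar dim regions collapse into one cell, the two bright ones into another.
print(merge_regions([10.0, 10.4, 10.2, 55.0, 54.6], threshold=1.0))  # -> [10.2, 54.8]
```

In the real 2-D setting the merge order comes from the tree of regions and their associated levels rather than from raw mean differences, but the "one region = one cell" stopping goal is the same.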
APA, Harvard, Vancouver, ISO, and other styles
49

Malmberg, Olle, and Bobby Zhou. "Using Machine Learning to Detect Customer Acquisition Opportunities and Evaluating the Required Organizational Prerequisites." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-263056.

Full text
Abstract:
This paper aims to investigate whether it is possible to identify users who are about to change their service provider using machine learning. It is believed that the Consumer Decision Journey is a better model than traditional funnel models when it comes to depicting the processes which consumers go through leading up to a purchase. Analytical and operational Customer Relationship Management are presented as possible fields where such implementations can be useful. Based on previous studies, Random Forest and XGBoost were chosen as the algorithms to be further evaluated because of their generally high performance. The final results were produced by an iterative process which began with data processing, followed by feature selection, training of the model, and testing of the model. A literature review and unstructured and semi-structured interviews with the employer, Growth Hackers Sthlm, were also used as complementary methods, with the purpose of gaining a wider perspective on the state of the art of machine learning implementations. The final results showed that Random Forest could identify the sought-after (positive) users, while XGBoost was inferior to Random Forest in terms of distinguishing between positive and negative classes, although it captured more of the positive users. An implementation of such a model could support and benefit an organization's customer acquisition operations. However, organizational prerequisites regarding the data infrastructure and the level of AI and machine learning integration in the organization's culture are the most important ones and need to be considered before such implementations.
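The reported contrast, Random Forest distinguishing the classes more cleanly while XGBoost captures more of the positive users, is exactly a precision/recall trade-off, which can be made concrete with a short sketch. The hold-out labels and predictions below are hypothetical, chosen only to mirror that pattern:

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for a binary churn label (1 = about to switch provider)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical hold-out set and model outputs illustrating the trade-off:
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
rf_pred = [1, 1, 0, 0, 0, 0, 0, 0]   # conservative: precise, but misses positives
xgb_pred = [1, 1, 1, 1, 1, 1, 0, 0]  # aggressive: catches all positives, more false alarms

print(precision_recall(y_true, rf_pred))   # -> (1.0, 0.5)
print(precision_recall(y_true, xgb_pred))  # -> (0.666..., 1.0)
```

Which profile is preferable depends on the cost structure of the acquisition campaign: cheap outreach favours the high-recall model, expensive outreach the high-precision one.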
APA, Harvard, Vancouver, ISO, and other styles
50

Sörsäter, Michael. "Active Learning for Road Segmentation using Convolutional Neural Networks." Thesis, Linköpings universitet, Datorseende, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-152286.

Full text
Abstract:
In recent years, the development of Convolutional Neural Networks has enabled high-performing semantic segmentation models. Generally, these deep learning based segmentation methods require a large amount of annotated data, and acquiring such data for semantic segmentation is a tedious and expensive task. Within machine learning, active learning involves the selection of new data in order to limit the usage of annotated data: the model is trained for several iterations, additional samples that the model is uncertain of are selected, the model is retrained on them, and the process is repeated. In this thesis, an active learning framework has been applied to road segmentation, that is, semantic segmentation of objects related to road scenes. The uncertainty in the samples is estimated with Monte Carlo dropout: several dropout masks are applied to the model and the variance of the outputs is captured, serving as an estimate of the model's uncertainty. Other metrics to rank the uncertainty evaluated in this work are a baseline method that selects samples randomly, the entropy of the default predictions, and three additional variations/extensions of Monte Carlo dropout. Both the active learning framework and the uncertainty estimation are implemented in the thesis. Monte Carlo dropout performs slightly better than the baseline in 3 out of 4 metrics. Entropy outperforms all other implemented methods in all metrics; the three additional methods do not perform better than Monte Carlo dropout. An analysis of what kind of uncertainty Monte Carlo dropout captures is performed, together with a comparison of the samples selected by the baseline and by Monte Carlo dropout. Future development and possible improvements are also discussed.
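Monte Carlo dropout as described, keeping dropout active at inference and reading uncertainty off the spread of repeated stochastic forward passes, can be sketched on a single linear unit. The weights, input, dropout rate, and sample count below are arbitrary illustration values, not anything from the thesis:

```python
import random
import statistics

def mc_dropout_predict(weights, x, p_drop=0.5, n_samples=100, seed=42):
    """Monte Carlo dropout on one linear unit: run several stochastic
    forward passes with dropout still enabled and use the spread of the
    outputs as an uncertainty estimate (a toy stand-in for doing the same
    over a full segmentation network's per-pixel outputs)."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_samples):
        # randomly zero each weight; rescale survivors to preserve the expectation
        masked = [w * (0.0 if rng.random() < p_drop else 1.0 / (1.0 - p_drop))
                  for w in weights]
        outputs.append(sum(w * xi for w, xi in zip(masked, x)))
    return statistics.mean(outputs), statistics.stdev(outputs)

weights = [0.5, -1.2, 2.0]
x = [1.0, 0.5, 1.5]
mean, std = mc_dropout_predict(weights, x)
print(round(mean, 2), round(std, 2))
```

In the active learning loop, images whose predictions show the largest such spread (or, per the thesis' best result, the highest predictive entropy) are the ones sent for annotation.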
APA, Harvard, Vancouver, ISO, and other styles