To see the other types of publications on this topic, follow the link: Feature and model selection.

Dissertations / Theses on the topic 'Feature and model selection'

Consult the top 50 dissertations / theses for your research on the topic 'Feature and model selection.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses from a wide variety of disciplines and organise your bibliography correctly.

1

Gustafsson, Robin. "Ordering Classifier Chains using filter model feature selection techniques." Thesis, Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-14817.

Full text
Abstract:
Context: Multi-label classification concerns classification with multi-dimensional output. The Classifier Chain breaks the multi-label problem into multiple binary classification problems, chaining the classifiers to exploit dependencies between labels. Consequently, its performance is influenced by the chain's order. Approaches to finding advantageous chain orders have been proposed, though they are typically costly. Objectives: This study explored the use of filter model feature selection techniques to order Classifier Chains. It examined how feature selection techniques can be adapted to evaluate label dependence, how such information can be used to select a chain order and how this affects the classifier's performance and execution time. Methods: An experiment was performed to evaluate the proposed approach. The two proposed algorithms, Forward-Oriented Chain Selection (FOCS) and Backward-Oriented Chain Selection (BOCS), were tested with three different feature evaluators. 10-fold cross-validation was performed on ten benchmark datasets. Performance was measured in accuracy, 0/1 subset accuracy and Hamming loss. Execution time was measured during chain selection, classifier training and testing. Results: Both proposed algorithms led to improved accuracy and 0/1 subset accuracy (Friedman & Hochberg, p < 0.05). FOCS also improved the Hamming loss while BOCS did not. Measured effect sizes ranged from 0.20 to 1.85 percentage points. Execution time was increased by less than 3 % in most cases. Conclusions: The results showed that the proposed approach can improve the Classifier Chain's performance at a low cost. The improvements appear similar to comparable techniques in magnitude but at a lower cost. It shows that feature selection techniques can be applied to chain ordering, demonstrates the viability of the approach and establishes FOCS and BOCS as alternatives worthy of further consideration.
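The ordering idea sketched in this abstract can be made concrete with off-the-shelf tools. The snippet below is a minimal sketch, not the thesis's FOCS or BOCS procedures: it greedily orders a scikit-learn ClassifierChain by mutual information between labels, on the assumption that a filter score computed over the labels is a reasonable proxy for label dependence; the dataset and base classifier are placeholders.

```python
# Minimal sketch: order a classifier chain by a filter score over labels.
# This illustrates the general idea only, not the FOCS/BOCS algorithms.
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multioutput import ClassifierChain

X, Y = make_multilabel_classification(n_samples=500, n_classes=6, n_labels=3, random_state=0)
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, random_state=0)

n_labels = Y_tr.shape[1]
remaining = list(range(n_labels))
order = []

# Seed the chain with the label that is, on average, most informative about the others.
avg_mi = [np.mean([mutual_info_classif(Y_tr[:, [j]], Y_tr[:, i], discrete_features=True)[0]
                   for j in remaining if j != i]) for i in remaining]
order.append(remaining.pop(int(np.argmax(avg_mi))))

# Forward-oriented ordering: repeatedly append the label most dependent on those already chosen.
while remaining:
    scores = [np.mean([mutual_info_classif(Y_tr[:, [j]], Y_tr[:, i], discrete_features=True)[0]
                       for j in order]) for i in remaining]
    order.append(remaining.pop(int(np.argmax(scores))))

chain = ClassifierChain(LogisticRegression(max_iter=1000), order=order, random_state=0)
chain.fit(X_tr, Y_tr)
print("chain order:", order, " subset accuracy:", chain.score(X_te, Y_te))
```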
APA, Harvard, Vancouver, ISO, and other styles
2

McCann, Michael. "A feature selection design model for business improvement in semiconductor process engineering." Thesis, Ulster University, 2011. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.538949.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Maboudi, Afkham Heydar. "Improving Image Classification Performance using Joint Feature Selection." Doctoral thesis, KTH, Datorseende och robotik, CVAP, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-144896.

Full text
Abstract:
In this thesis, we focus on the problem of image classification and investigate how its performance can be systematically improved. Improving the performance of different computer vision methods has been the subject of many studies. While different studies take different approaches to achieve this improvement, in this thesis we address this problem by investigating the relevance of the statistics collected from the image. We propose a framework for gradually improving the quality of an already existing image descriptor. In our studies, we employ a descriptor which is composed of the responses of a series of discriminative components for summarizing each image. As we will show, this descriptor has an ideal form in which all categories become linearly separable. While reaching this form is not possible, we argue that by replacing a small fraction of these components it is possible to obtain a descriptor which is, on average, closer to this ideal form. To do so, we initially identify which components do not contribute to the quality of the descriptor and replace them with more robust components. As we will show, this replacement has a positive effect on the quality of the descriptor. While there are many ways of obtaining more robust components, we introduce a joint feature selection problem to obtain image features that retain class-discriminative properties while simultaneously generalising over within-class variations. Our approach is based on the concept of a joint feature where several small features are combined in a spatial structure. The proposed framework automatically learns the structure of the joint constellations in a class-dependent manner, improving the generalisation and discrimination capabilities of the local descriptor while still retaining a low-dimensional representation. The joint feature selection problem discussed in this thesis belongs to a specific class of latent variable models that assumes each labeled sample is associated with a set of different features, with no prior knowledge of which feature is the most relevant one to use. Deformable Part Models (DPMs) can be seen as good examples of such models. These models are usually considered to be expensive to train and very sensitive to the initialization. Here, we focus on the learning of such models by introducing a topological framework and show how it is possible to both reduce the learning complexity and produce more robust decision boundaries. We also argue how our framework can be used for producing robust decision boundaries without exploiting the dataset bias or relying on accurate annotations. To examine the hypothesis of this thesis, we evaluate different parts of our framework on several challenging datasets and demonstrate how our framework is capable of gradually improving the performance of image classification by collecting more robust statistics from the image and improving the quality of the descriptor.
APA, Harvard, Vancouver, ISO, and other styles
4

Li, Qi. "Application of Improved Feature Selection Algorithm in SVM Based Market Trend Prediction Model." Thesis, Portland State University, 2019. http://pqdtopen.proquest.com/#viewpdf?dispub=10979352.

Full text
Abstract:
In this study, a Prediction Accuracy Based Hill Climbing Feature Selection Algorithm (AHCFS) is created and compared with an Error Rate Based Sequential Feature Selection Algorithm (ERFS), an existing Matlab algorithm. The goal of the study is to create a new algorithm that has the potential to outperform the existing Matlab sequential feature selection algorithm in predicting the movement of S&P 500 (GSPC) prices under certain circumstances. The two algorithms are tested on historical GSPC data, and a Support Vector Machine (SVM) is employed by both as the classifier. A prediction without any feature selection algorithm is carried out and used as a baseline for comparison between the two algorithms. The prediction horizon set in this study for both algorithms varies from one to 60 days. The study results show that AHCFS reaches higher prediction accuracy than ERFS in the majority of cases.
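As a rough illustration of the accuracy-driven hill-climbing idea, and not the exact AHCFS procedure (synthetic data stands in for the GSPC-derived features), the sketch below greedily adds whichever feature most improves cross-validated SVM accuracy and stops when no single addition helps:

```python
# Minimal sketch of accuracy-based hill-climbing feature selection with an SVM.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=20, n_informative=5, random_state=0)

selected, remaining, best_acc = [], list(range(X.shape[1])), 0.0
while remaining:
    # Try adding each remaining feature and keep the one with the best CV accuracy.
    trial = [(np.mean(cross_val_score(SVC(kernel="rbf"), X[:, selected + [f]], y, cv=5)), f)
             for f in remaining]
    acc, f = max(trial)
    if acc <= best_acc:          # stop climbing when no single feature improves accuracy
        break
    best_acc = acc
    selected.append(f)
    remaining.remove(f)

print("selected features:", selected, " CV accuracy: %.3f" % best_acc)
```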
APA, Harvard, Vancouver, ISO, and other styles
5

Butko, Taras. "Feature selection for multimodal: acoustic event detection." Doctoral thesis, Universitat Politècnica de Catalunya, 2011. http://hdl.handle.net/10803/32176.

Full text
Abstract:
The detection of the Acoustic Events (AEs) naturally produced in a meeting room may help to describe the human and social activity. The automatic description of interactions between humans and environment can be useful for providing: implicit assistance to the people inside the room, context-aware and content-aware information requiring a minimum of human attention or interruptions, support for high-level analysis of the underlying acoustic scene, etc. On the other hand, the recent fast growth of available audio or audiovisual content strongly demands tools for analyzing, indexing, searching and retrieving the available documents. Given an audio document, the first processing step usually is audio segmentation (AS), i.e. the partitioning of the input audio stream into acoustically homogeneous regions which are labelled according to a predefined broad set of classes like speech, music, noise, etc. Acoustic event detection (AED) is the objective of this thesis work. A variety of features coming not only from audio but also from the video modality is proposed to deal with that detection problem in meeting-room and broadcast news domains. Two basic detection approaches are investigated in this work: a joint segmentation and classification using Hidden Markov Models (HMMs) with Gaussian Mixture Densities (GMMs), and a detection-by-classification approach using discriminative Support Vector Machines (SVMs). For the first case, a fast one-pass-training feature selection algorithm is developed in this thesis to select, for each AE class, the subset of multimodal features that shows the best detection rate. AED in meeting-room environments aims at processing the signals collected by distant microphones and video cameras in order to obtain the temporal sequence of (possibly overlapped) AEs that have been produced in the room. When applied to interactive seminars with a certain degree of spontaneity, the detection of acoustic events from the audio modality alone shows a large amount of errors, which is mostly due to the temporal overlaps of sounds. This thesis includes several novelties regarding the task of multimodal AED. Firstly, the use of video features: since in the video modality the acoustic sources do not overlap (except for occlusions), the proposed features improve AED in such rather spontaneous scenario recordings. Secondly, the inclusion of acoustic localization features, which, in combination with the usual spectro-temporal audio features, yield a further improvement in recognition rate. Thirdly, the comparison of feature-level and decision-level fusion strategies for the combination of audio and video modalities; in the latter case, the system output scores are combined using two statistical approaches: weighted arithmetic mean and fuzzy integral. Moreover, due to the scarcity of annotated multimodal data, and, in particular, of data with temporal sound overlaps, a new multimodal database with a rich variety of meeting-room AEs has been recorded and manually annotated, and it has been made publicly available for research purposes.
For audio segmentation in the broadcast news domain, a hierarchical system architecture is proposed, which appropriately groups a set of detectors, each corresponding to one of the acoustic classes of interest. Two different AS systems were developed for two broadcast news databases: the first corresponds to audio recordings of the debate programme Àgora on the Catalan TV channel TV3, and the second includes several audio segments from the Catalan 3/24 news channel. The output of the first system was used as the first stage of the machine translation and subtitling systems of the Tecnoparla project, a project funded by the Generalitat in which several speech technologies were developed to extract all possible information from the audio signal. The second AS system, a hierarchical HMM-GMM-based detection system with feature selection, obtained competitive results in the Albayzín-2010 audio segmentation evaluation. Finally, some collateral outcomes of this thesis are worth mentioning. The author was responsible for the organisation of the audio segmentation evaluation within the aforementioned Albayzín-2010 campaign: the event classes, databases, metric and evaluation protocols were specified, and a subsequent analysis was carried out of the systems and results submitted by the eight participating research groups from Spanish and Portuguese universities. In addition, a real-time HMM-GMM-based acoustic event detection system for two simultaneous sources was implemented in the UPC multimodal room for testing and demonstration purposes.
APA, Harvard, Vancouver, ISO, and other styles
6

Tarca, Adi-Laurentiu. "Neural networks in multiphase reactors data mining: feature selection, prior knowledge, and model design." Thesis, Université Laval, 2004. http://www.theses.ulaval.ca/2004/21673/21673.pdf.

Full text
Abstract:
Artificial neural networks (ANN) have recently gained enormous popularity in many engineering fields, not only for their appealing "learning ability," but also for their versatility and superior performance with respect to classical approaches. Without supposing a particular equational form, ANNs mimic complex nonlinear relationships that might exist between an input feature vector x and a dependent (output) variable y. In the context of multiphase reactors the potential of neural networks is high, as modeling by resolution of first-principle equations to forecast the sought key hydrodynamic and transfer characteristics is intractable. The general-purpose applicability of neural networks in regression and classification, however, poses some subsidiary difficulties that can make their use inappropriate for certain modeling problems. Some of these problems are general to any empirical modeling technique, including the feature selection step, in which one has to decide which subset xs ⊂ x should constitute the inputs (regressors) of the model. Other weaknesses specific to neural networks are overfitting, model design ambiguity (architecture and parameter identification), and the lack of interpretability of the resulting models. This work addresses three issues in the application of neural networks: i) feature selection, ii) prior knowledge matching within the models (to address, to some extent, the overfitting and interpretability issues), and iii) model design. Feature selection was conducted with genetic algorithms (yet another companion from the artificial intelligence area), which allowed the identification of good combinations of dimensionless inputs to use in regression ANNs, or with sequential methods in a classification context. The types of a priori knowledge we wanted the resulting ANN models to match were monotonicity and/or concavity in regression, and class connectivity and different misclassification costs in classification. Even though the purpose of the study was rather methodological, some of the resulting ANN models might be considered contributions per se. These models, direct proofs of the underlying methodologies, are useful for predicting liquid hold-up and pressure drop in counter-current packed beds and flow regime type in trickle beds.
APA, Harvard, Vancouver, ISO, and other styles
7

Liang, Wen. "Integrated feature, neighbourhood, and model optimization for personalised modelling and knowledge discovery." Click here to access this resource online, 2009. http://hdl.handle.net/10292/749.

Full text
Abstract:
“Machine learning is the process of discovering and interpreting meaningful information, such as new correlations, patterns and trends by sifting through large amounts of data stored in repositories, using pattern recognition technologies as well as statistical and mathematical techniques” (Larose, 2005). From my understanding, machine learning is a process of using different analysis techniques to uncover previously unknown, potentially meaningful information, and to discover strong patterns and relationships in a large dataset. Professor Kasabov (2007b) classified computational models into three categories (global, local, and personalised), which have been widely used in the areas of data analysis and decision support in general, and in the areas of medicine and bioinformatics in particular. Most recently, the concept of personalised modelling has been widely applied to various disciplines such as personalised medicine and personalised drug design for known diseases (e.g. cancer, diabetes, brain disease, etc.), as well as to other modelling problems in ecology, business, finance, crime prevention, and so on. The philosophy behind the personalised modelling approach is that every person is different from others, thus he/she will benefit from having a personalised model and treatment. However, personalised modelling is not without issues, such as defining the correct number of neighbours or defining an appropriate number of features. As a result, the principal goal of this research is to study and address these issues and to create a novel framework and system for personalised modelling. The framework allows users to select and optimise the most important features and nearest neighbours for a new input sample in relation to a certain problem, based on a weighted variable distance measure, in order to obtain more precise prognostic accuracy and personalised knowledge, when compared with global modelling and local modelling approaches.
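A bare-bones version of the personalised-modelling idea is to fit a separate model for every test sample from its nearest neighbours under a feature-weighted distance. In the sketch below the weights are plain univariate F-scores and the local model is a logistic regression; the thesis's framework optimises features and neighbourhoods jointly, which this sketch does not attempt.

```python
# Minimal sketch of personalised modelling: for each new sample, fit a local model
# on its K nearest neighbours under a feature-weighted distance (weights here are
# illustrative univariate F-scores, not the thesis's optimised weights).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
scaler = StandardScaler().fit(X_tr)
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

F, _ = f_classif(X_tr, y_tr)
w = F / F.sum()                      # feature weights for the distance measure
K = 40                               # neighbourhood size (a tunable parameter)

preds = []
for x in X_te:
    d = np.sqrt(((X_tr - x) ** 2 * w).sum(axis=1))   # weighted Euclidean distance
    idx = np.argsort(d)[:K]                           # personalised neighbourhood
    if len(np.unique(y_tr[idx])) == 1:                # all neighbours share one class
        preds.append(y_tr[idx][0])
        continue
    local = LogisticRegression(max_iter=1000).fit(X_tr[idx], y_tr[idx])
    preds.append(local.predict(x.reshape(1, -1))[0])

print("personalised-model accuracy: %.3f" % np.mean(np.array(preds) == y_te))
```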
APA, Harvard, Vancouver, ISO, and other styles
8

Watts-Willis, Tristan A. "Autonomous model selection for surface classification via unmanned aerial vehicle." Scholarly Commons, 2017. https://scholarlycommons.pacific.edu/uop_etds/224.

Full text
Abstract:
In the pursuit of research in remote areas, robots may be employed to deploy sensor networks. These robots need a method of classifying a surface to determine if it is a suitable installation site. Developing surface classification models manually requires significant time and detracts from the goal of automating systems. We create a system that automatically collects the data using an Unmanned Aerial Vehicle (UAV), extracts features, trains a large number of classifiers, selects the best classifier, and programs the UAV with that classifier. We design this system with user configurable parameters for choosing a high accuracy, efficient classifier. In support of this system, we also develop an algorithm for evaluating the effectiveness of individual features as indicators of the variable of interest. Motivating our work is a prior project that manually developed a surface classifier using an accelerometer; we replicate those results with our new automated system and improve on those results, providing a four-surface classifier with a 75% classification rate and a hard/soft classifier with a 100% classification rate. We further verify our system through a field experiment that collects and classifies new data, proving its end-to-end functionality. The general form of our system provides a valuable tool for automation of classifier creation and is released as an open-source tool.
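The core of such an automated pipeline, training several candidate classifiers and keeping the one with the best cross-validated score, can be sketched in a few lines. The candidate models and synthetic data below are placeholders, not the UAV system's actual accelerometer features or classifier pool.

```python
# Minimal sketch of automatic classifier selection: train several candidates,
# score each with cross-validation, and keep the best one for deployment.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=12, n_informative=6, random_state=0)

candidates = {
    "svm": SVC(),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "logreg": LogisticRegression(max_iter=1000),
}

scores = {name: cross_val_score(clf, X, y, cv=5).mean() for name, clf in candidates.items()}
best_name = max(scores, key=scores.get)
best_model = candidates[best_name].fit(X, y)   # the model that would be programmed onto the UAV
print(scores, "-> selected:", best_name)
```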
APA, Harvard, Vancouver, ISO, and other styles
9

Algarni, Abdulmohsen. "Relevance feature discovery for text analysis." Thesis, Queensland University of Technology, 2011. https://eprints.qut.edu.au/48230/1/Abdulmohsen_Algarni_Thesis.pdf.

Full text
Abstract:
It is a big challenge to guarantee the quality of discovered relevance features in text documents for describing user preferences because of the large number of terms, patterns, and noise. Most existing popular text mining and classification methods have adopted term-based approaches. However, they have all suffered from the problems of polysemy and synonymy. Over the years, people have often held the hypothesis that pattern-based methods should perform better than term-based ones in describing user preferences, but many experiments do not support this hypothesis. This research presents a promising method, Relevance Feature Discovery (RFD), for solving this challenging issue. It discovers both positive and negative patterns in text documents as high-level features in order to accurately weight low-level features (terms) based on their specificity and their distributions in the high-level features. The thesis also introduces an adaptive model (called ARFD) to enhance the flexibility of using RFD in an adaptive environment. ARFD automatically updates the system's knowledge based on a sliding window over new incoming feedback documents. It can efficiently decide which incoming documents can bring new knowledge into the system. Substantial experiments using the proposed models on Reuters Corpus Volume 1 and TREC topics show that the proposed models significantly outperform both the state-of-the-art term-based methods underpinned by Okapi BM25, Rocchio or Support Vector Machine and other pattern-based methods.
APA, Harvard, Vancouver, ISO, and other styles
10

Alrjebi, Mustafa M. M. "Robust Face Recognition via Multi-channel Models and Feature Selection." Thesis, Curtin University, 2017. http://hdl.handle.net/20.500.11937/66074.

Full text
Abstract:
This thesis proposes various novel approaches to improve the performance of face recognition by using multiple channels for face image representation. The methods proposed in this thesis are designed to address different face recognition problems, including open mouth, occlusions, illumination variations, and pose variations. All proposed methods achieve significant improvements over state-of-the-art methods.
APA, Harvard, Vancouver, ISO, and other styles
11

Ala'raj, Maher A. "A credit scoring model based on classifiers consensus system approach." Thesis, Brunel University, 2016. http://bura.brunel.ac.uk/handle/2438/13669.

Full text
Abstract:
Managing customer credit is an important issue for every commercial bank; therefore, banks take great care when dealing with customer loans to avoid any improper decisions that can lead to loss of opportunity or financial losses. The manual estimation of customer creditworthiness has become both time- and resource-consuming. Moreover, a manual approach is subjective (dependent on the bank employee who gives the estimation), which is why devising and implementing programming models that provide loan estimations is the only way of eradicating the 'human factor' in this problem. Such a model should give recommendations to the bank in terms of whether or not a loan should be given, or otherwise give a probability that the loan will be returned. Nowadays, a number of models have been designed, but there is no ideal classifier amongst them, since each gives some percentage of incorrect outputs; this is a critical consideration when each percent of incorrect answers can mean millions of dollars of losses for large banks. Nevertheless, logistic regression (LR) remains the industry-standard tool for credit-scoring model development. For this purpose, an investigation is carried out on the combination of the most efficient classifiers in the credit-scoring scope in an attempt to produce a classifier that exceeds each of its components. In this work, a fusion model referred to as 'the Classifiers Consensus Approach' is developed, which gives considerably better performance than each of the single classifiers that constitute it. The difference between the consensus approach and the majority of other combiners lies in the fact that the consensus approach adopts a model of real expert-group behaviour during the process of finding the consensus (aggregate) answer. The consensus model is compared not only with single classifiers, but also with traditional combiners and a quite complex combiner model known as the 'Dynamic Ensemble Selection' approach. As pre-processing techniques, data filtering (selecting training entries that fit the input data well and removing outliers and noisy data) and feature selection (removing useless and statistically insignificant features whose values are weakly correlated with the real quality of the loan) are used. These techniques are valuable in significantly improving the consensus approach results. Results clearly show that the consensus approach is statistically better (with 95% confidence, according to the Friedman test) than any other single classifier or combiner analysed; this means that, for similar datasets, there is a 95% guarantee that the consensus approach will outperform all other classifiers. The consensus approach gives not only the best accuracy, but also better AUC value, Brier score and H-measure for almost all datasets investigated in this thesis. Moreover, it outperformed logistic regression. Thus, it has been shown that the use of the consensus approach for credit scoring is justified and recommended in commercial banks. Along with the consensus approach, the dynamic ensemble selection approach is analysed, the results of which show that, under some conditions, it can rival the consensus approach. The strengths of the dynamic ensemble selection approach include its stability and high accuracy on various datasets.
The consensus approach, which is improved in this work, may be considered by banks whose datasets share the characteristics of those used in this work, where its use could decrease the level of mistakenly rejected loans of solvent customers and the level of mistakenly accepted loans that will never be returned. Furthermore, the consensus approach is a notable step towards building a universal classifier that can fit data with any structure. Another advantage of the consensus approach is its flexibility; even if the input data changes for various reasons, the consensus approach can easily be re-trained and used with the same performance.
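For readers unfamiliar with combiner models, the sketch below shows the simplest form of score-level fusion: a weighted average of predicted probabilities, with weights taken from validation AUC. This is deliberately much simpler than the consensus (expert-group) approach developed in the thesis, and it uses synthetic data in place of a real credit dataset.

```python
# Minimal sketch of score-level classifier combination for credit scoring.
# A weighted average of predicted probabilities, NOT the consensus combiner.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for a credit dataset: class 1 = loan repaid, class 0 = default.
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.3, 0.7], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, test_size=0.3, random_state=0)
X_fit, X_val, y_fit, y_val = train_test_split(X_tr, y_tr, stratify=y_tr, test_size=0.3, random_state=0)

base = [LogisticRegression(max_iter=1000),
        RandomForestClassifier(n_estimators=300, random_state=0),
        MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)]

probas, weights = [], []
for clf in base:
    clf.fit(X_fit, y_fit)
    weights.append(roc_auc_score(y_val, clf.predict_proba(X_val)[:, 1]))  # weight = validation AUC
    probas.append(clf.predict_proba(X_te)[:, 1])

weights = np.array(weights) / np.sum(weights)
fused = np.average(np.vstack(probas), axis=0, weights=weights)
print("fused test AUC: %.3f" % roc_auc_score(y_te, fused))
```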
APA, Harvard, Vancouver, ISO, and other styles
12

Huynh, Bao Tuyen. "Estimation and feature selection in high-dimensional mixtures-of-experts models." Thesis, Normandie, 2019. http://www.theses.fr/2019NORMC237.

Full text
Abstract:
This thesis deals with the problem of modeling and estimation of high-dimensional mixtures-of-experts (MoE) models, towards effective density estimation, prediction and clustering of such heterogeneous and high-dimensional data. We propose new strategies based on regularized maximum-likelihood estimation (MLE) of MoE models to overcome the limitations of standard methods, including MLE with Expectation-Maximization (EM) algorithms, and to simultaneously perform feature selection so that sparse models are encouraged in such a high-dimensional setting. We first introduce a mixture-of-experts parameter estimation and variable selection methodology, based on l1 (lasso) regularizations and the EM framework, for regression and clustering suited to high-dimensional contexts. Then, we extend the method to regularized mixture-of-experts models for discrete data, including classification. We develop efficient algorithms to maximize the proposed l1-penalized observed-data log-likelihood function. Our proposed strategies enjoy the efficient monotone maximization of the optimized criterion and, unlike previous approaches, they do not rely on approximations of the penalty functions, avoid matrix inversion, and exploit the efficiency of the coordinate ascent algorithm, particularly within the proximal Newton-based approach.
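The l1 (lasso) penalty at the heart of these strategies can be illustrated in a much simpler setting than a mixture of experts: a plain l1-penalised logistic regression, shown below, exhibits the same sparsity mechanism, driving the coefficients of irrelevant features to exactly zero as the penalty grows.

```python
# Minimal sketch of l1-penalised (lasso) estimation for feature selection.
# Plain logistic regression stands in for the expert/gating models of a MoE;
# the point is only to show how the l1 penalty zeroes out coefficients.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# 100 features, only 8 of which are informative (a small high-dimensional stand-in).
X, y = make_classification(n_samples=300, n_features=100, n_informative=8,
                           n_redundant=0, random_state=0)

for C in [0.05, 0.5, 5.0]:   # smaller C = stronger l1 penalty
    model = LogisticRegression(penalty="l1", solver="liblinear", C=C).fit(X, y)
    n_nonzero = int(np.sum(model.coef_ != 0))
    print(f"C={C:<4}  non-zero coefficients: {n_nonzero} / {X.shape[1]}")
```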
APA, Harvard, Vancouver, ISO, and other styles
13

Jiang, Jinzhu. "Feature Screening for High-Dimensional Variable Selection In Generalized Linear Models." Bowling Green State University / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1626826068909307.

Full text
APA, Harvard, Vancouver, ISO, and other styles
14

Smith, Robert Anthony. "A General Model for Continuous Noninvasive Pulmonary Artery Pressure Estimation." BYU ScholarsArchive, 2011. https://scholarsarchive.byu.edu/etd/3189.

Full text
Abstract:
Elevated pulmonary artery pressure (PAP) is a significant healthcare risk. Continuous monitoring for patients with elevated PAP is crucial for effective treatment, yet the most accurate method is invasive and expensive, and cannot be performed repeatedly. Noninvasive methods exist but are inaccurate, expensive, and cannot be used for continuous monitoring. We present a machine learning model based on heart sounds that estimates pulmonary artery pressure with enough accuracy to exclude an invasive diagnostic operation, allowing for consistent monitoring of heart condition in suspect patients without the cost and risk of invasive monitoring. We conduct a greedy search through 38 possible features using a 109-patient cross-validation to find the most predictive features. Our best general model has a standard estimate of error (SEE) of 8.28 mmHg, which outperforms the previous best performance in the literature on a general set of unseen patient data.
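The greedy search over candidate features described above can be sketched with generic tools. The snippet below uses synthetic regression data and ridge regression in place of the heart-sound features and model of the thesis, scoring each candidate subset by cross-validated RMSE (the analogue of the SEE reported here) and keeping the best feature at every step.

```python
# Minimal sketch of a greedy forward feature search that minimises a
# cross-validated error estimate (synthetic data, not heart-sound features).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=38, n_informative=6, noise=8.0, random_state=0)

def cv_rmse(features):
    scores = cross_val_score(Ridge(), X[:, features], y, cv=5,
                             scoring="neg_root_mean_squared_error")
    return -scores.mean()

selected, remaining, best_err = [], list(range(X.shape[1])), np.inf
while remaining:
    err, f = min((cv_rmse(selected + [f]), f) for f in remaining)
    if err >= best_err:        # stop when adding any feature no longer reduces the error
        break
    best_err = err
    selected.append(f)
    remaining.remove(f)

print("selected features:", selected, " CV RMSE: %.2f" % best_err)
```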
APA, Harvard, Vancouver, ISO, and other styles
15

Bommert, Andrea Martina [Verfasser], Jörg [Akademischer Betreuer] Rahnenführer, and Claus [Gutachter] Weihs. "Integration of feature selection stability in model fitting / Andrea Martina Bommert ; Gutachter: Claus Weihs ; Betreuer: Jörg Rahnenführer." Dortmund : Universitätsbibliothek Dortmund, 2020. http://d-nb.info/1227040385/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
16

Cament, Riveros Leonardo. "Enhancements by weighted feature fusion, selection and active shape model for frontal and pose variation face recognition." Tesis, Universidad de Chile, 2015. http://repositorio.uchile.cl/handle/2250/132854.

Full text
Abstract:
Doctor en Ingeniería Eléctrica

Face recognition is one of the most active areas of research in computer vision because of its wide range of possible applications in person identification, access control, human-computer interfaces, and video search, among many others. Face identification is a one-to-n matching problem where a captured face is compared to n samples in a database. In this work a new method for robust face recognition is proposed. The methodology is divided in two parts: the first one focuses on face recognition robust to illumination, expression and small age variation, and the second part focuses on pose variation. The proposed algorithm is based on Gabor features, which have been widely studied in face identification because of their good results and robustness. In the first part, a new method for face identification is proposed that combines local normalization for an illumination compensation stage, entropy-like weighted Gabor features for a feature extraction stage, and improvements in the Borda count classification through a threshold to eliminate low-score Gabor jets from the voting process. The FERET, AR, and FRGC 2.0 databases were used to test and compare the proposed method results with those previously published. Results on these databases show significant improvements relative to previously published results, reaching the best performance on the FERET and AR databases. Our proposed method also showed significant robustness to slight pose variations. The method was tested assuming noisy eye detection to check its robustness to inexact face alignment. Results show that the proposed method is robust to errors of up to three pixels in eye detection. However, face identification is strongly affected when the test images are very different from those of the gallery, as is the case in varying face pose. The second part of this work proposes a new 2D Gabor-based method which modifies the grid from which the Gabor features are extracted, using a mesh to model face deformations produced by varying pose. Also, a statistical model of the Borda count scores computed by using the Gabor features is used to improve recognition performance across pose. The method was tested on the FERET and CMU-PIE databases, and the performance improvement provided by each block was assessed. The proposed method achieved the highest classification accuracy ever published on the FERET database with 2D face recognition methods. The performance obtained on the CMU-PIE database is among those obtained by the best published methods. Extensive experimental results are provided for different combinations of the proposed method, including results with two poses enrolled as a gallery.
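The thresholded Borda count used in the voting stage can be pictured with a small numeric example: each Gabor jet ranks the gallery identities, ranks are converted to Borda scores, and jets whose best match score falls below a threshold are excluded from the vote. The scores below are made up purely for illustration.

```python
# Minimal sketch of Borda-count classification with a threshold that drops
# low-confidence voters (hypothetical Gabor-jet match scores).
import numpy as np

# match_scores[j, g]: similarity of jet j against gallery identity g (made-up numbers).
match_scores = np.array([
    [0.91, 0.40, 0.33, 0.10],
    [0.22, 0.25, 0.21, 0.20],   # weak jet: its best score is low, so it will not vote
    [0.70, 0.15, 0.60, 0.05],
    [0.55, 0.80, 0.30, 0.20],
])
threshold = 0.35                 # jets whose best score is below this are eliminated

n_jets, n_gallery = match_scores.shape
borda = np.zeros(n_gallery)
for scores in match_scores:
    if scores.max() < threshold:              # eliminate low-score jets from the voting
        continue
    ranks = np.argsort(np.argsort(scores))    # 0 = worst match ... n_gallery-1 = best match
    borda += ranks

print("Borda scores per identity:", borda, "-> predicted identity:", int(np.argmax(borda)))
```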
APA, Harvard, Vancouver, ISO, and other styles
17

Ke, Yuan. "Feature selection and structure specification in ultra-high dimensional semi-parametric model with an application in medical science." Thesis, University of York, 2015. http://etheses.whiterose.ac.uk/8842/.

Full text
Abstract:
In this thesis, we consider the feature selection, model specification and estimation of generalised semi-varying coefficient models (GSVCMs), where the number of potential covariates is allowed to diverge with the sample size. Based on the penalised likelihood approach and the kernel smoothing method, we propose a penalised weighted least squares procedure to select the significant covariates, identify constant coefficients among the coefficients of the selected covariates, and estimate the functional or constant coefficients in GSVCMs. A computational algorithm is also proposed to implement the procedure. Our approach not only inherits many desirable statistical properties from the local maximum likelihood estimation and nonconcave penalised likelihood methods, but is also computationally attractive thanks to the proposed computational algorithm. Under some mild conditions, we establish the theoretical properties of the proposed procedure, such as sparsity, the oracle property and the uniform convergence rates of the proposed estimators. We also provide simulation studies to show that the proposed procedure works very well when the sample size is finite. We then use the proposed procedure to analyse a real environmental data set, which leads to some interesting findings. Finally, we establish a classification method and show it can be used to improve predictive modelling for classifying patients with early inflammatory arthritis at baseline into different risk groups for future disease progression.
APA, Harvard, Vancouver, ISO, and other styles
18

Tran, The Truyen. "On conditional random fields: applications, feature selection, parameter estimation and hierarchical modelling." Thesis, Curtin University, 2008. http://hdl.handle.net/20.500.11937/436.

Full text
Abstract:
There has been a growing interest in stochastic modelling and learning with complex data, whose elements are structured and interdependent. One of the most successful methods to model data dependencies is graphical models, which combine graph theory and probability theory. This thesis focuses on a special type of graphical model known as Conditional Random Fields (CRFs) (Lafferty et al., 2001), in which the output state spaces, when conditioned on some observational input data, are represented by undirected graphical models. The contributions of this thesis involve both (a) broadening the current applicability of CRFs in the real world and (b) deepening the understanding of theoretical aspects of CRFs. On the application side, we empirically investigate the applications of CRFs in two real-world settings. The first application is on a novel domain of Vietnamese accent restoration, in which we need to restore the accents of an accent-less Vietnamese sentence. Experiments on half a million sentences of news articles show that the CRF-based approach is highly accurate. In the second application, we develop a new CRF-based movie recommendation system called Preference Network (PN). The PN jointly integrates various sources of domain knowledge into a large and densely connected Markov network. We obtained competitive results against well-established methods in the recommendation field. On the theory side, the thesis addresses three important theoretical issues of CRFs: feature selection, parameter estimation and modelling recursive sequential data. These issues are all addressed under a general setting of partial supervision, in that training labels are not fully available. For feature selection, we introduce a novel learning algorithm called AdaBoost.CRF that incrementally selects features out of a large feature pool as learning proceeds. AdaBoost.CRF is an extension of the standard boosting methodology to structured and partially observed data. We demonstrate that AdaBoost.CRF is able to eliminate irrelevant features and, as a result, returns a very compact feature set without significant loss of accuracy. Parameter estimation of CRFs is generally intractable in arbitrary network structures. This thesis contributes to this area by proposing a learning method called AdaBoost.MRF (which stands for AdaBoosted Markov Random Forests). As learning proceeds, AdaBoost.MRF incrementally builds a tree ensemble (a forest) that covers the original network by selecting the best spanning tree at a time. As a result, we can approximately learn many rich classes of CRFs in linear time. The third theoretical work is on modelling recursive, sequential data in which each level of resolution is a Markov sequence and each state in the sequence is also a Markov sequence at the finer grain. One of the key contributions of this thesis is Hierarchical Conditional Random Fields (HCRF), which is an extension to the currently popular sequential CRF and the recent semi-Markov CRF (Sarawagi and Cohen, 2004). Unlike previous CRF work, the HCRF does not assume any fixed graphical structure. Rather, it treats structure as an uncertain aspect and can estimate the structure automatically from the data. The HCRF is motivated by the Hierarchical Hidden Markov Model (HHMM) (Fine et al., 1998). Importantly, the thesis shows that the HHMM is a special case of the HCRF with slight modification, and the semi-Markov CRF is essentially a flat version of the HCRF.
Central to our contribution in HCRF is a polynomial-time algorithm based on the Asymmetric Inside Outside (AIO) family developed in (Bui et al., 2004) for learning and inference. Another important contribution is to extend the AIO family to address learning with missing data and inference under partially observed labels. We also derive methods to deal with practical concerns associated with the AIO family, including numerical overflow and cubic-time complexity. Finally, we demonstrate good performance of HCRF against rivals on two applications: indoor video surveillance and noun-phrase chunking.
APA, Harvard, Vancouver, ISO, and other styles
19

Wang, Yun-Feng. "The selection of rapid prototyping processes based on feature extraction from STL models." Thesis, Kingston University, 2000. http://eprints.kingston.ac.uk/20668/.

Full text
Abstract:
Rapid Prototyping & Manufacturing has recently emerged as a new manufacturing technology that allows the rapid creation of three-dimensional models and prototypes. It automates the fabrication of solid objects directly from designs created by CAD systems, without part-specific tooling or human intervention. From visualising designs to generating production tooling, Rapid Prototyping & Manufacturing gives the advantages needed in today's competitive environment. There are many different rapid prototyping systems available, and this proliferation has, to some degree, created confusion in the marketplace. Whether the potential customer or user is thinking of using a rapid prototyping bureau or purchasing a rapid prototyping system, the increasing number of systems coming onto the market and the ever-improving capabilities of existing systems present a significant problem in choosing the optimum system for a particular need. The aim of this project is to develop an intelligent rapid prototyping system selector based on feature extraction from STL files to automatically select the most suitable rapid prototyping system for a given prototype. The combination of STL model feature extraction and expert system selection is an effective method of rapid prototyping process selection. By analysing the object's STL file, the object's feature representations are extracted. These features, together with the user's requirements, are used to determine the most suitable system on which to build, or the most suitable system to buy. Mathematical models for computing build time, accuracy, cost and mechanical properties are established. A knowledge-based system is developed for rapid prototyping system selection. An integrated software package for STL file feature extraction, rapid prototyping system simulation and knowledge-based rapid prototyping system selection has been developed.
APA, Harvard, Vancouver, ISO, and other styles
20

Tran, The Truyen. "On conditional random fields: applications, feature selection, parameter estimation and hierarchical modelling." Curtin University of Technology, Dept. of Computing, 2008. http://espace.library.curtin.edu.au:80/R/?func=dbin-jump-full&object_id=18614.

Full text
Abstract:
There has been a growing interest in stochastic modelling and learning with complex data, whose elements are structured and interdependent. One of the most successful methods to model data dependencies is graphical models, which combine graph theory and probability theory. This thesis focuses on a special type of graphical model known as Conditional Random Fields (CRFs) (Lafferty et al., 2001), in which the output state spaces, when conditioned on some observational input data, are represented by undirected graphical models. The contributions of this thesis involve both (a) broadening the current applicability of CRFs in the real world and (b) deepening the understanding of theoretical aspects of CRFs. On the application side, we empirically investigate the applications of CRFs in two real-world settings. The first application is on a novel domain of Vietnamese accent restoration, in which we need to restore the accents of an accent-less Vietnamese sentence. Experiments on half a million sentences of news articles show that the CRF-based approach is highly accurate. In the second application, we develop a new CRF-based movie recommendation system called Preference Network (PN). The PN jointly integrates various sources of domain knowledge into a large and densely connected Markov network. We obtained competitive results against well-established methods in the recommendation field. On the theory side, the thesis addresses three important theoretical issues of CRFs: feature selection, parameter estimation and modelling recursive sequential data. These issues are all addressed under a general setting of partial supervision, in that training labels are not fully available. For feature selection, we introduce a novel learning algorithm called AdaBoost.CRF that incrementally selects features out of a large feature pool as learning proceeds. AdaBoost.CRF is an extension of the standard boosting methodology to structured and partially observed data. We demonstrate that AdaBoost.CRF is able to eliminate irrelevant features and, as a result, returns a very compact feature set without significant loss of accuracy. Parameter estimation of CRFs is generally intractable in arbitrary network structures. This thesis contributes to this area by proposing a learning method called AdaBoost.MRF (which stands for AdaBoosted Markov Random Forests). As learning proceeds, AdaBoost.MRF incrementally builds a tree ensemble (a forest) that covers the original network by selecting the best spanning tree at a time. As a result, we can approximately learn many rich classes of CRFs in linear time. The third theoretical work is on modelling recursive, sequential data in which each level of resolution is a Markov sequence and each state in the sequence is also a Markov sequence at the finer grain. One of the key contributions of this thesis is Hierarchical Conditional Random Fields (HCRF), which is an extension to the currently popular sequential CRF and the recent semi-Markov CRF (Sarawagi and Cohen, 2004). Unlike previous CRF work, the HCRF does not assume any fixed graphical structure. Rather, it treats structure as an uncertain aspect and can estimate the structure automatically from the data. The HCRF is motivated by the Hierarchical Hidden Markov Model (HHMM) (Fine et al., 1998). Importantly, the thesis shows that the HHMM is a special case of the HCRF with slight modification, and the semi-Markov CRF is essentially a flat version of the HCRF.
Central to our contribution in HCRF is a polynomial-time algorithm based on the Asymmetric Inside Outside (AIO) family developed in (Bui et al., 2004) for learning and inference. Another important contribution is to extend the AIO family to address learning with missing data and inference under partially observed labels. We also derive methods to deal with practical concerns associated with the AIO family, including numerical overflow and cubic-time complexity. Finally, we demonstrate good performance of HCRF against rivals on two applications: indoor video surveillance and noun-phrase chunking.
APA, Harvard, Vancouver, ISO, and other styles
21

Sokolovska, Nataliya. "Contributions to the estimation of probabilistic discriminative models: semi-supervised learning and feature selection." Phd thesis, Télécom ParisTech, 2010. http://pastel.archives-ouvertes.fr/pastel-00006257.

Full text
Abstract:
In this thesis we study the estimation of discriminative probabilistic models, focusing on semi-supervised learning and feature selection. The goal of semi-supervised learning is to improve the efficiency of supervised learning by using unlabelled data, an objective that is difficult to achieve for discriminative models. Discriminative probabilistic models make it possible to handle rich linguistic representations in the form of very high-dimensional feature vectors. Working in high dimension raises problems, computational ones in particular, which are exacerbated for sequence models such as Conditional Random Fields (CRFs). Our contribution is twofold. We introduce an original and simple method for integrating unlabelled data into a semi-supervised objective function, and we show that the corresponding semi-supervised estimator is asymptotically optimal. The case of logistic regression is illustrated with experimental results. In this study, we also propose an estimation algorithm for CRFs that performs model selection through an L1 penalty. We present the results of experiments carried out on natural language processing tasks (chunking and named entity recognition), analysing generalisation performance and the selected features. Finally, we suggest several directions for improving the computational efficiency of this technique.
APA, Harvard, Vancouver, ISO, and other styles
22

Quinton, Clément. "Cloud environment selection and configuration : a software product lines-based approach." Thesis, Lille 1, 2014. http://www.theses.fr/2014LIL10079/document.

Full text
Abstract:
To benefit from the promise of the cloud computing paradigm, applications must be deployed on well-suited and properly configured cloud environments fulfilling the application's requirements. We consider that the selection and configuration of such environments can leverage Software Product Line (SPL) principles. SPLs were defined to take advantage of software commonalities through the definition of reusable artifacts. This thesis thus proposes an approach based on SPLs to select and configure cloud environments with respect to the requirements of the application to deploy. In particular, we introduce a variability model enabling the description of commonalities and variabilities between clouds as feature models. In addition, we extend this variability model with attributes and cardinalities, together with constraints over them. Then, we propose an approach to check the consistency of cardinality-based feature models when those models evolve. Our approach provides support to automatically detect and explain a cardinality inconsistency. Finally, we propose an automated platform to select and configure cloud environments, which generates configuration scripts according to the requirements of the application to deploy. This work has been done as part of the European PaaSage project. The experiments we conducted to evaluate our approach show that it is well suited to handle the configuration of cloud environments, being both scalable and practical while improving the reliability of the deployment.
APA, Harvard, Vancouver, ISO, and other styles
23

Ocloo, Isaac Xoese. "Energy Distance Correlation with Extended Bayesian Information Criteria for feature selection in high dimensional models." Bowling Green State University / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1625238661031258.

Full text
APA, Harvard, Vancouver, ISO, and other styles
24

Chida, Anjum A. "Protein Tertiary Model Assessment Using Granular Machine Learning Techniques." Digital Archive @ GSU, 2012. http://digitalarchive.gsu.edu/cs_diss/65.

Full text
Abstract:
The automatic prediction of protein three-dimensional structures from amino acid sequences has become one of the most important and most researched fields in bioinformatics. Because predicted models, unlike experimental structures, are not determined with known accuracy, it is vital to estimate model quality. We attempt to solve this problem using machine learning techniques and information from both the sequence and the structure of the protein. The goal is to build a machine that learns from PDB (Protein Data Bank) structures and, when given a new model, predicts whether it belongs to the same class as the PDB structures (correct or incorrect protein models). Different subsets of the PDB are considered for evaluating the prediction potential of the machine learning methods. Here we present two such machines, one using support vector machines (SVM) and another using fuzzy decision trees (FDT). With a preliminary encoding style, the SVM reached around 70% accuracy in protein model quality assessment, and an improved fuzzy decision tree (IFDT) reached above 80%. To reduce computational overhead, a multiprocessor environment and a basic feature selection method are used in the SVM-based machine learning algorithm. An enhanced scheme is then introduced using a new encoding style, in which information such as the amino acid substitution matrix, polarity, secondary structure and relative distance between alpha carbon atoms is collected by spatially traversing the 3D structure to form training vectors. This guarantees that the properties of alpha carbon atoms that are close together in 3D space, and thus interacting, are used in vector formation. With the fuzzy decision tree, we obtained a training accuracy of around 90%, a significant improvement over the previous encoding technique in both prediction accuracy and execution time. This outcome motivates continued exploration of effective machine learning algorithms for accurate protein model quality assessment. Finally, these machines are tested on CASP8 and CASP9 templates and compared with other CASP competitors, with promising results. We also discuss the importance of model quality assessment and other protein information that could be considered for the same purpose.
APA, Harvard, Vancouver, ISO, and other styles
25

Li, Pin. "A Systematic Methodology for Developing Robust Prognostic Models Suitable for Large-Scale Deployment." University of Cincinnati / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1593268220645085.

Full text
APA, Harvard, Vancouver, ISO, and other styles
26

Garg, Anushka. "Comparing Machine Learning Algorithms and Feature Selection Techniques to Predict Undesired Behavior in Business Processesand Study of Auto ML Frameworks." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-285559.

Full text
Abstract:
In recent years, the scope of machine learning algorithms and techniques has grown in every industry (for example, recommendation systems, user behaviour analytics, financial applications and many more). In practice, they play an important role in exploiting the vast amounts of data we currently generate on a daily basis in our digital world. In this study, we present a comprehensive comparison of different supervised machine learning algorithms and feature selection techniques in order to build the best predictive model as output; this predictive model helps companies predict unwanted behaviour in their business processes. In addition, we have studied the automation of all the steps involved in the complete machine learning pipeline (from understanding data to deploying models), also known as AutoML, and provide a comprehensive survey of the various frameworks introduced in this domain. These frameworks were introduced to solve the problem of CASH (combined algorithm selection and hyper-parameter optimization), which is essentially the automation of the various pipeline stages involved in building a machine learning predictive model.
APA, Harvard, Vancouver, ISO, and other styles
27

Ge, Esther. "The query based learning system for lifetime prediction of metallic components." Thesis, Queensland University of Technology, 2008. https://eprints.qut.edu.au/18345/4/Esther_Ting_Ge_Thesis.pdf.

Full text
Abstract:
This research project was a step forward in developing an efficient data mining method for estimating the service life of metallic components in Queensland school buildings. The developed method links together the different data sources of service life information and builds a model for the real situation in which users have information on only a limited set of inputs. A practical lifetime prediction system was developed for the industry partners of this project, including the Queensland Department of Public Works and the Queensland Department of Main Roads. The system provides high accuracy in practice even when not all inputs are available for querying the system.
APA, Harvard, Vancouver, ISO, and other styles
28

Ge, Esther. "The query based learning system for lifetime prediction of metallic components." Queensland University of Technology, 2008. http://eprints.qut.edu.au/18345/.

Full text
Abstract:
This research project was a step forward in developing an efficient data mining method for estimating the service life of metallic components in Queensland school buildings. The developed method links together the different data sources of service life information and builds a model for the real situation in which users have information on only a limited set of inputs. A practical lifetime prediction system was developed for the industry partners of this project, including the Queensland Department of Public Works and the Queensland Department of Main Roads. The system provides high accuracy in practice even when not all inputs are available for querying the system.
APA, Harvard, Vancouver, ISO, and other styles
29

Basu, Satrajit. "Developing Predictive Models for Lung Tumor Analysis." Scholar Commons, 2012. http://scholarcommons.usf.edu/etd/3963.

Full text
Abstract:
A CT-scan of lungs has become ubiquitous as a thoracic diagnostic tool. Thus, using CT-scan images in developing predictive models for tumor types and survival time of patients afflicted with Non-Small Cell Lung Cancer (NSCLC) would provide a novel approach to non-invasive tumor analysis. It can provide an alternative to histopathological techniques such as needle biopsy. Two major tumor analysis problems were addressed in the course of this study, tumor type classification and survival time prediction. CT-scan images of 109 patients with NSCLC were used in this study. The first involved classifying tumor types into two major classes of non-small cell lung tumors, Adenocarcinoma and Squamous-cell Carcinoma, each constituting 30% of all lung tumors. In a first-of-its-kind investigation, a large group of 2D and 3D image features, which were hypothesized to be useful, are evaluated for effectiveness in classifying the tumors. Classifiers including decision trees and support vector machines (SVM) were used along with feature selection techniques (wrappers and relief-F) to build models for tumor classification. Results show that over the large feature space for both 2D and 3D features it is possible to predict tumor classes with over 63% accuracy, showing new features may be of help. The accuracy achieved using 2D and 3D features is similar, with 3D easier to use. The tumor classification study was then extended by introducing the Bronchioalveolar Carcinoma (BAC) tumor type. Following up on the hypothesis that Bronchioalveolar Carcinoma is substantially different from other NSCLC tumor types, a two-class problem was created, where an attempt was made to differentiate BAC from the other two tumor types. To make the three-class problem a two-class problem, misclassifications between Adenocarcinoma and Squamous-cell Carcinoma were ignored. Using the same prediction models as the previous study and just 3D image features, tumor classes were predicted with around 77% accuracy. The final study involved predicting two-year survival time in patients suffering from NSCLC. Using a subset of the image features and a handful of clinical features, predictive models were developed to predict two-year survival time in 95 NSCLC patients. A support vector machine classifier, naive Bayes classifier and decision tree classifier were used to develop the predictive models. Using the Area Under the Curve (AUC) as a performance metric, different models were developed and analyzed for their effectiveness in predicting survival time. A novel feature selection method to group features based on a correlation measure has been proposed in this work along with feature space reduction using principal component analysis. The parameters for the support vector machine were tuned using grid search. A model based on a combination of image and clinical features achieved the best performance with an AUC of 0.69, using dimensionality reduction by means of principal component analysis along with grid search to tune the parameters of the SVM classifier. The study showed the effectiveness of a predominantly image feature space in predicting survival time. A comparison of the performance of the models from different classifiers also indicates that SVMs consistently outperformed or matched the other two classifiers for this data.
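A minimal sketch of the kind of pipeline the abstract describes for survival prediction, i.e. principal component analysis followed by an SVM tuned by grid search and scored with AUC; the synthetic data and parameter grid are assumptions for illustration, not the study's actual features or settings.

```python
# Hedged sketch: PCA + SVM tuned by grid search, evaluated with AUC.
# Synthetic data and the parameter grid are placeholders.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = make_classification(n_samples=95, n_features=50, n_informative=8,
                           random_state=0)

pipe = Pipeline([("pca", PCA(n_components=10)),
                 ("svm", SVC(probability=True))])
grid = GridSearchCV(pipe,
                    param_grid={"svm__C": [0.1, 1, 10],
                                "svm__gamma": ["scale", 0.01, 0.001]},
                    scoring="roc_auc", cv=5)

# Nested cross-validation: the grid search runs inside each outer fold.
auc = cross_val_score(grid, X, y, scoring="roc_auc", cv=5).mean()
print(f"cross-validated AUC: {auc:.2f}")
```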
APA, Harvard, Vancouver, ISO, and other styles
30

Yako, Mary. "Emotional Content in Novels for Literary Genre Prediction : And Impact of Feature Selection on Text Classification Models." Thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-447148.

Full text
Abstract:
Automatic literary genre classification presents a challenging task for Natural Language Processing (NLP) systems, mainly because literary texts have deeper levels of meaning, hold distinctive themes, and communicate certain messages and emotions. We conduct a study where we experiment with building literary genre classifiers based on emotions in novels, to investigate the effects that features pertinent to emotions have on models of genre prediction. We begin by performing an analysis of emotions describing emotional composition and density in the dataset. The experiments are carried out on a dataset consisting of novels categorized into eight different genres. Genre prediction models are built using three algorithms: Random Forest, Support Vector Machine, and k-Nearest Neighbor. We build models based on emotion-word counts and emotional words in a novel, and compare them to models using commonly used features, the bag-of-words and TF-IDF features. Moreover, we use a feature selection dimensionality reduction procedure on the TF-IDF feature set and study its impact on classification performance. Finally, we train and test the classifiers on a combination of the two most effective emotion-related feature sets, and compare them with classifiers trained and tested on a combination of bag-of-words and the reduced TF-IDF features. Our results confirm that using features of emotional content in novels improves classification performance, reaching 75% F1 compared to a bag-of-words baseline of 71% F1, and that the TF-IDF feature filtering method positively impacts genre classification performance on literary texts.
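A hedged sketch of the TF-IDF plus filter-based feature selection setup described above, feeding the three algorithms mentioned (Random Forest, SVM, k-NN); the two toy "novels" and the value of k are invented for illustration.

```python
# Hedged sketch: TF-IDF features, chi-squared filter selection, and the three
# classifiers named above. The toy texts and k are illustrative.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

texts = ["a dark and stormy tale of dread and ruin",
         "two hearts meet under the summer stars"]
labels = ["horror", "romance"]

for clf in (RandomForestClassifier(random_state=0),
            LinearSVC(),
            KNeighborsClassifier(n_neighbors=1)):
    model = Pipeline([("tfidf", TfidfVectorizer()),
                      ("select", SelectKBest(chi2, k=5)),  # keep 5 strongest terms
                      ("clf", clf)])
    model.fit(texts, labels)
    print(type(clf).__name__, model.predict(["a stormy summer night"]))
```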
APA, Harvard, Vancouver, ISO, and other styles
31

Jeon, Woojay. "Speech Analysis and Cognition Using Category-Dependent Features in a Model of the Central Auditory System." Diss., Georgia Institute of Technology, 2006. http://hdl.handle.net/1853/14061.

Full text
Abstract:
It is well known that machines perform far worse than humans in recognizing speech and audio, especially in noisy environments. One method of addressing this issue of robustness is to study physiological models of the human auditory system and to adopt some of its characteristics in computers. As a first step in studying the potential benefits of an elaborate computational model of the primary auditory cortex (A1) in the central auditory system, we qualitatively and quantitatively validate the model under existing speech processing recognition methodology. Next, we develop new insights and ideas on how to interpret the model, and reveal some of the advantages of its dimension-expansion that may be potentially used to improve existing speech processing and recognition methods. This is done by statistically analyzing the neural responses to various classes of speech signals and forming empirical conjectures on how cognitive information is encoded in a category-dependent manner. We also establish a theoretical framework that shows how noise and signal can be separated in the dimension-expanded cortical space. Finally, we develop new feature selection and pattern recognition methods to exploit the category-dependent encoding of noise-robust cognitive information in the cortical response. Category-dependent features are proposed as features that "specialize" in discriminating specific sets of classes, and as a natural way of incorporating them into a Bayesian decision framework, we propose methods to construct hierarchical classifiers that perform decisions in a two-stage process. Phoneme classification tasks using the TIMIT speech database are performed to quantitatively validate all developments in this work, and the results encourage future work in exploiting high-dimensional data with category(or class)-dependent features for improved classification or detection.
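The two-stage, category-dependent decision process described above can be illustrated with a small sketch; the coarse grouping, the per-category feature subsets and the logistic-regression classifiers are illustrative assumptions, not the thesis's auditory-model features.

```python
# Hedged sketch of a two-stage, category-dependent classifier: stage 1 picks a
# coarse category, stage 2 applies a classifier restricted to that category's
# feature subset. Groupings and feature subsets are invented.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                           n_classes=4, n_clusters_per_class=1, random_state=0)
coarse = y // 2                                   # classes {0,1} vs {2,3}
subsets = {0: slice(0, 10), 1: slice(10, 20)}     # category-dependent features

stage1 = LogisticRegression(max_iter=1000).fit(X, coarse)
stage2 = {g: LogisticRegression(max_iter=1000)
             .fit(X[coarse == g][:, subsets[g]], y[coarse == g])
          for g in (0, 1)}

def predict(x):
    g = stage1.predict(x.reshape(1, -1))[0]       # stage 1: coarse category
    return stage2[g].predict(x[subsets[g]].reshape(1, -1))[0]  # stage 2

print("predicted:", predict(X[0]), "true:", y[0])
```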
APA, Harvard, Vancouver, ISO, and other styles
32

Xiao, Ying. "New tools for unsupervised learning." Diss., Georgia Institute of Technology, 2014. http://hdl.handle.net/1853/52995.

Full text
Abstract:
In an unsupervised learning problem, one is given an unlabelled dataset and hopes to find some hidden structure; the prototypical example is clustering similar data. Such problems often arise in machine learning and statistics, but also in signal processing, theoretical computer science, and any number of quantitative scientific fields. The distinguishing feature of unsupervised learning is that there are no privileged variables or labels which are particularly informative, and thus the greatest challenge is often to differentiate between what is relevant or irrelevant in any particular dataset or problem. In the course of this thesis, we study a number of problems which span the breadth of unsupervised learning. We make progress in Gaussian mixtures, independent component analysis (where we solve the open problem of underdetermined ICA), and we formulate and solve a feature selection/dimension reduction model. Throughout, our goal is to give finite sample complexity bounds for our algorithms -- these are essentially the strongest type of quantitative bound that one can prove for such algorithms. Some of our algorithmic techniques turn out to be very efficient in practice as well. Our major technical tool is tensor spectral decomposition: tensors are generalisations of matrices, and often allow access to the "fine structure" of data. Thus, they are often the right tools for unravelling the hidden structure in an unsupervised learning setting. However, naive generalisations of matrix algorithms to tensors run into NP-hardness results almost immediately, and thus to solve our problems, we are obliged to develop two new tensor decompositions (with robust analyses) from scratch. Both of these decompositions are polynomial time, and can be viewed as efficient generalisations of PCA extended to tensors.
APA, Harvard, Vancouver, ISO, and other styles
33

Ali, Rozniza. "Ensemble classification and signal image processing for genus Gyrodactylus (Monogenea)." Thesis, University of Stirling, 2014. http://hdl.handle.net/1893/21734.

Full text
Abstract:
This thesis presents an investigation into Gyrodactylus species recognition, making use of machine learning classification and feature selection techniques, and explores image feature extraction to demonstrate proof of concept for an envisaged rapid, consistent and secure initial identification of pathogens by field workers and non-expert users. The proposed cognitively inspired framework is designed to provide confident discrimination of the pathogen from its non-pathogenic congeners, which is sought in order to assist diagnostics during periods of a suspected outbreak. Accurate identification of pathogens is key to their control in an aquaculture context, and the monogenean worm genus Gyrodactylus provides an ideal test-bed for the selected techniques. In the proposed algorithm, the concept of classification using a single model is extended to include more than one model. In classifying multiple species of Gyrodactylus, experiments using 557 specimens of nine different species, two classifiers and three feature sets were performed. To combine these models, an ensemble-based majority voting approach has been adopted. Experimental results with a database of Gyrodactylus species show the superior performance of the ensemble system. Comparison with single classification approaches indicates that the proposed framework produces a marked improvement in classification performance. The second contribution of this thesis is the exploration of image processing techniques. Active Shape Model (ASM) and Complex Network methods are applied to images of the attachment hooks of several species of Gyrodactylus to classify each species according to their true species type. ASM is used to provide landmark points to segment the contour of the image, while the Complex Network model is used to extract information from the contour of an image. The current system aims to confidently classify the species, a notifiable pathogen of Atlantic salmon, to its true class with a high degree of accuracy. Finally, some concluding remarks are made along with proposals for future work.
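A brief sketch of an ensemble-based majority-voting classifier of the kind adopted above; the synthetic data stand in for the 557 specimens and the member classifiers are assumptions for illustration.

```python
# Hedged sketch: majority-voting ensemble of heterogeneous classifiers.
# Synthetic data stand in for the 557 specimens of nine species.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=557, n_features=25, n_informative=12,
                           n_classes=9, n_clusters_per_class=1, random_state=0)

ensemble = VotingClassifier(
    estimators=[("svm", SVC()),
                ("knn", KNeighborsClassifier()),
                ("rf", RandomForestClassifier(random_state=0))],
    voting="hard")                     # hard voting = simple majority vote

print("10-fold CV accuracy:", cross_val_score(ensemble, X, y, cv=10).mean())
```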
APA, Harvard, Vancouver, ISO, and other styles
34

Ditzenberger, David A. "Selection and extraction of local geometric features for two dimensional model-based object recognition." Virtual Press, 1992. http://liblink.bsu.edu/uhtbin/catkey/834526.

Full text
Abstract:
A topic of computer vision that has been recently studied by a substantial number of scientists is the recognition of objects in digitized gray scale images. The primary goal of model-based object recognition research is the efficient and precise matching of features extracted from sensory data with the corresponding features in an object model database. A source of difficulty during the feature extraction is the determination and representation of pertinent attributes from the sensory data of the objects in the image. In addition, features which are visible from a single vantage point are not usually adequate for the unique identification of an object and its orientation. This paper will describe a regimen that can be used to address these problems. Image preprocessing such as edge detection, image thinning, thresholding, etc., will first be addressed. This will be followed by an in-depth discussion that will center upon the extraction of local geometric feature vectors and the hypothesis-verification model used for two dimensional object recognition.
APA, Harvard, Vancouver, ISO, and other styles
35

Ida, Yasutoshi. "Algorithms for Accelerating Machine Learning with Wide and Deep Models." Doctoral thesis, Kyoto University, 2021. http://hdl.handle.net/2433/263771.

Full text
APA, Harvard, Vancouver, ISO, and other styles
36

Nordling, Torbjörn E. M. "Robust inference of gene regulatory networks : System properties, variable selection, subnetworks, and design of experiments." Doctoral thesis, KTH, Reglerteknik, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-120830.

Full text
Abstract:
In this thesis, inference of biological networks from in vivo data generated by perturbation experiments is considered, i.e. deduction of causal interactions that exist among the observed variables. Knowledge of such regulatory influences is essential in biology. A system property, interampatteness, is introduced that explains why the variation in existing gene expression data is concentrated to a few "characteristic modes" or "eigengenes", and why previously inferred models have a large number of false positive and false negative links. An interampatte system is characterized by strong INTERactions enabling simultaneous AMPlification and ATTEnuation of different signals, and we show that perturbation of individual state variables, e.g. genes, typically leads to ill-conditioned data with both characteristic and weak modes. The weak modes are typically dominated by measurement noise due to poor excitation and their existence hampers network reconstruction. The excitation problem is solved by iterative design of correlated multi-gene perturbation experiments that counteract the intrinsic signal attenuation of the system. The next perturbation should be designed such that the expected response practically spans an additional dimension of the state space. The proposed design is numerically demonstrated for the Snf1 signalling pathway in S. cerevisiae. The impact of unperturbed and unobserved latent state variables, which exist in any real biological system, on the inferred network and the required set-up of the experiments for network inference is analysed. Their existence implies that a subnetwork of pseudo-direct causal regulatory influences, accounting for all environmental effects, is in general inferred. In principle, the number of latent states and different paths between the nodes of the network can be estimated, but their identity cannot be determined unless they are observed or perturbed directly. Network inference is recognized as a variable/model selection problem and solved by considering all possible models of a specified class that can explain the data at a desired significance level, and by classifying only the links present in all of these models as existing. As shown, these links can be determined without any parameter estimation by reformulating the variable selection problem as a robust rank problem. Solution of the rank problem enables assignment of confidence to individual interactions, without resorting to any approximation or asymptotic results. This is demonstrated by reverse engineering of the synthetic IRMA gene regulatory network from published data. A previously unknown activation of transcription of SWI5 by CBF1 in the IRMA strain of S. cerevisiae is proven to exist, which serves to illustrate that even the accumulated knowledge of well-studied genes is incomplete.
APA, Harvard, Vancouver, ISO, and other styles
37

Fineschi, Benedetta. "Selection of competent oocytes by morphological features. Can an artificial intelligence - based model predict oocyte quality?" Doctoral thesis, Università di Siena, 2022. http://hdl.handle.net/11365/1211555.

Full text
Abstract:
Infertility is a significant problem for humanity. Despite the many advances in the field of assisted reproductive techniques (ART), accurately predicting the outcome of an in vitro fertilization (IVF) cycle has yet to be achieved. The focus of a great deal of research is to improve on the current 30% success rate of IVF. Assessment of oocyte quality is probably the most important and difficult task in ART. Oocyte quality has a direct impact on fertilization and oocyte competence; the identification of oocyte quality markers is particularly important to select embryos with higher developmental potential and thus increase the success rates of IVF cycles. Nevertheless, the assessment of oocyte morphology is still performed rather arbitrarily. Over the past years, the ARTs have been accompanied by constant innovation; the use of artificial intelligence (AI) techniques has become increasingly popular in the medical field and is being leveraged in the embryology laboratory to help improve IVF outcomes. The aims of this study are to evaluate the influence of specific morphological characteristics of oocytes on the outcome of intracytoplasmic sperm injection (ICSI) and to develop an AI-based model that predicts oocyte quality and fertilization outcome.
APA, Harvard, Vancouver, ISO, and other styles
38

Zhang, Ligang. "Towards spontaneous facial expression recognition in real-world video." Thesis, Queensland University of Technology, 2012. https://eprints.qut.edu.au/53199/1/Ligang_Zhang_Thesis.pdf.

Full text
Abstract:
Facial expression is an important channel of human social communication. Facial expression recognition (FER) aims to perceive and understand emotional states of humans based on information in the face. Building robust and high-performance FER systems that can work in real-world video is still a challenging task, due to the various unpredictable facial variations and complicated exterior environmental conditions, as well as the difficulty of choosing a suitable type of feature descriptor for extracting discriminative facial information. Facial variations caused by factors such as pose, age, gender, race and occlusion can exert profound influence on the robustness, while a suitable feature descriptor largely determines the performance. Most attention in FER has been paid to addressing variations in pose and illumination. No approach has been reported on handling face localization errors and relatively few on overcoming facial occlusions, although the significant impact of these two variations on the performance has been proved and highlighted in many previous studies. Many texture and geometric features have been previously proposed for FER. However, few comparison studies have been conducted to explore the performance differences between different features and examine the performance improvement arising from fusion of texture and geometry, especially on data with spontaneous emotions. The majority of existing approaches are evaluated on databases with posed or induced facial expressions collected in laboratory environments, whereas little attention has been paid to recognizing naturalistic facial expressions on real-world data. This thesis investigates techniques for building robust and high-performance FER systems based on a number of established feature sets. It comprises contributions towards three main objectives: (1) Robustness to face localization errors and facial occlusions. An approach is proposed to handle face localization errors and facial occlusions using Gabor-based templates. Template extraction algorithms are designed to collect a pool of local template features and template matching is then performed to convert these templates into distances, which are robust to localization errors and occlusions. (2) Improvement of performance through feature comparison, selection and fusion. A comparative framework is presented to compare the performance between different features and different feature selection algorithms, and examine the performance improvement arising from fusion of texture and geometry. The framework is evaluated for both discrete and dimensional expression recognition on spontaneous data. (3) Evaluation of performance in the context of real-world applications. A system is selected and applied to discriminating posed versus spontaneous expressions and recognizing naturalistic facial expressions. A database is collected from real-world recordings and is used to explore feature differences between standard database images and real-world images, as well as between real-world images and real-world video frames. The performance evaluations are based on the JAFFE, CK, Feedtum, NVIE, Semaine and self-collected QUT databases. The results demonstrate high robustness of the proposed approach to the simulated localization errors and occlusions. Texture and geometry have different contributions to the performance of discrete and dimensional expression recognition, as well as posed versus spontaneous emotion discrimination.
These investigations provide useful insights into enhancing robustness and achieving high performance of FER systems, and putting them into real-world applications.
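A hedged sketch of Gabor-based template features converted into a localization-tolerant distance, loosely in the spirit of objective (1) above; the image, filter-bank parameters and patch sizes are placeholders, not the thesis implementation.

```python
# Hedged sketch: a small Gabor filter bank and a template converted into a
# localisation-tolerant distance. Image, parameters and patch sizes are
# placeholders, not the thesis implementation.
import cv2
import numpy as np

face = np.random.rand(64, 64).astype(np.float32)       # placeholder face image

thetas = np.linspace(0, np.pi, 4, endpoint=False)       # 4 orientations
kernels = [cv2.getGaborKernel((21, 21), 4.0, t, 10.0, 0.5, 0) for t in thetas]
responses = np.stack([cv2.filter2D(face, cv2.CV_32F, k) for k in kernels])

# A "template" is a patch of Gabor responses; matching it within a slightly
# larger window yields a distance that tolerates small localisation errors.
template = responses[:, 20:28, 20:28]
window = responses[:, 18:30, 18:30]
dists = [np.linalg.norm(window[:, i:i + 8, j:j + 8] - template)
         for i in range(5) for j in range(5)]
print("best template-matching distance:", min(dists))
```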
APA, Harvard, Vancouver, ISO, and other styles
39

Kabir, Mitra. "Prediction of mammalian essential genes based on sequence and functional features." Thesis, University of Manchester, 2017. https://www.research.manchester.ac.uk/portal/en/theses/prediction-of-mammalian-essential-genes-based-on-sequence-and-functional-features(cf8eeed5-c2b3-47c3-9a8f-2cc290c90d56).html.

Full text
Abstract:
Essential genes are those whose presence is imperative for an organism's survival, whereas the functions of non-essential genes may be useful but not critical. Abnormal functionality of essential genes may lead to defects or death at an early stage of life. Knowledge of essential genes is therefore key to understanding development, maintenance of major cellular processes and tissue-specific functions that are crucial for life. Existing experimental techniques for identifying essential genes are accurate, but most of them are time consuming and expensive. Predicting essential genes using computational methods, therefore, would be of great value as they circumvent experimental constraints. Our research is based on the hypothesis that mammalian essential (lethal) and non-essential (viable) genes are distinguishable by various properties. We examined a wide range of features of Mus musculus genes, including sequence, protein-protein interactions, gene expression and function, and found 75 features that were statistically discriminative between lethal and viable genes. These features were used as inputs to create a novel machine learning classifier, allowing the prediction of a mouse gene as lethal or viable with the cross-validation and blind test accuracies of ∼91% and ∼93%, respectively. The prediction results are promising, indicating that our classifier is an effective mammalian essential gene prediction method. We further developed the mouse gene essentiality study by analysing the association between essentiality and gene duplication. Mouse genes were labelled as singletons or duplicates, and their expression patterns over 13 developmental stages were examined. We found that lethal genes originating from duplicates are considerably lower in proportion than singletons. At all developmental stages a significantly higher proportion of singletons and lethal genes are expressed than duplicates and viable genes. Lethal genes were also found to be more ancient than viable genes. In addition, we observed that duplicate pairs with similar patterns of developmental co-expression are more likely to be viable; lethal gene duplicate pairs do not have such a trend. Overall, these results suggest that duplicate genes in mouse are less likely to be essential than singletons. Finally, we investigated the evolutionary age of mouse genes across development to see if the morphological hourglass pattern exists in the mouse. We found that in mouse embryos, genes expressed in early and late stages are evolutionarily younger than those expressed in mid-embryogenesis, thus yielding an hourglass pattern. However, the oldest genes are not expressed at the phylotypic stage stated in prior studies, but instead at an earlier time point - the egg cylinder stage. These results question the application of the hourglass model to mouse development.
APA, Harvard, Vancouver, ISO, and other styles
40

Camargo, Sandro da Silva. "Um modelo neural de aprimoramento progressivo para redução de dimensionalidade." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2010. http://hdl.handle.net/10183/26500.

Full text
Abstract:
In recent decades, advances in data generation, collection and storage technologies have contributed to increasing database sizes in different knowledge areas. This increase is seen not only in the number of samples, but mainly in dimensionality, i.e. the number of features describing each sample. Adding features increases the dimension of the mathematical space, leading to an exponential growth of the data hypervolume, a problem called the "curse of dimensionality". The curse of dimensionality has been a routine problem for scientists who, in order to understand and explain certain phenomena, have faced the need to find meaningful low-dimensional structures hidden in high-dimensional search spaces. This process is called data dimensionality reduction (DDR). From a computational viewpoint, the natural consequence of DDR is a reduction of the hypothesis search space, improving performance and simplifying the results of knowledge modelling in autonomous learning systems. Among the techniques currently used in autonomous learning systems, artificial neural networks (ANNs) have become particularly attractive for modelling complex systems, especially when modelling is hard or when the system dynamics does not allow on-line control. Despite being a powerful tool, ANNs have their performance affected by the curse of dimensionality: when the input space dimension is high, ANNs can use a significant part of their resources to represent irrelevant parts of the input space, making the learning process harder. Although ANNs, like other machine learning techniques, can identify the more informative features for a modelling process, DDR techniques often improve learning results. This thesis proposes a wrapper that implements a progressive enhancement neural model for DDR in supervised autonomous learning systems in order to optimize the modelling process. To validate the proposed approach, experiments were performed with private and public databases from different knowledge domains. The generalization ability of the developed models is evaluated by means of cross-validation techniques. The results obtained demonstrate that the proposed approach can identify the more informative features, enabling DDR and making it possible to create simpler and more accurate models. The implementation of the proposed approach and the related experiments were performed in the Matlab environment, using the ANN toolbox.
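As a hedged illustration of a wrapper-style selector built around a neural network, the sketch below uses scikit-learn's forward sequential feature selection with a small MLP; this stands in for, and is not, the progressive enhancement neural model itself.

```python
# Hedged sketch: wrapper-style feature selection around a small neural network
# using forward sequential selection; a stand-in for, not a reproduction of,
# the progressive enhancement neural model.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)

mlp = MLPClassifier(hidden_layer_sizes=(8,), max_iter=1000, random_state=0)
selector = SequentialFeatureSelector(mlp, n_features_to_select=5, cv=3)
selector.fit(X, y)          # wrapper: the MLP is refit for every candidate feature

X_reduced = selector.transform(X)
print("kept features:", selector.get_support(indices=True))
print("CV accuracy on reduced data:",
      cross_val_score(mlp, X_reduced, y, cv=5).mean())
```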
APA, Harvard, Vancouver, ISO, and other styles
41

Peterson, Ryan Andrew. "Ranked sparsity: a regularization framework for selecting features in the presence of prior informational asymmetry." Diss., University of Iowa, 2019. https://ir.uiowa.edu/etd/6834.

Full text
Abstract:
In this dissertation, we explore and illustrate the concept of ranked sparsity, a phenomenon that often occurs naturally in the presence of derived variables. Ranked sparsity arises in modeling applications when an expected disparity exists in the quality of information between different feature sets. Its presence can cause traditional model selection methods to fail because statisticians commonly presume that each potential parameter is equally worthy of entering into the final model; we call this principle "covariate equipoise". However, this presumption does not always hold, especially in the presence of derived variables. For instance, when all possible interactions are considered as candidate predictors, the presumption of covariate equipoise will often produce misclassified and opaque models. The sheer number of additional candidate variables grossly inflates the number of false discoveries in the interactions, resulting in unnecessarily complex and difficult-to-interpret models with many (truly spurious) interactions. We suggest a modeling strategy that requires a stronger level of evidence in order to allow certain variables (e.g. interactions) to be selected in the final model. This ranked sparsity paradigm can be implemented either with a modified Bayesian information criterion (RBIC) or with the sparsity-ranked lasso (SRL). In chapter 1, we provide a philosophical motivation for ranked sparsity by describing situations where traditional model selection methods fail. Chapter 1 also presents some of the relevant literature, and motivates why ranked sparsity methods are necessary in the context of interactions. Finally, we introduce RBIC and SRL as possible recourses. In chapter 2, we explore the performance of SRL relative to competing methods for selecting polynomials and interactions in a series of simulations. We show that the SRL is a very attractive method because it is fast, accurate, and does not tend to inflate the number of Type I errors in the interactions. We illustrate its utility in an application to predict the survival of lung cancer patients using a set of gene expression measurements and clinical covariates, searching in particular for gene-environment interactions, which are very difficult to find in practice. In chapter 3, we present three extensions of the SRL in very different contexts. First, we show how the method can be used to optimize for cost and prediction accuracy simultaneously when covariates have differing collection costs. In this setting, the SRL produces what we call "minimally invasive" models, i.e. models that can easily (and cheaply) be applied to new data. Second, we investigate the use of the SRL in the context of time series regression, where we evaluate our method against several other state-of-the-art techniques in predicting the hourly number of arrivals at the Emergency Department of the University of Iowa Hospitals and Clinics. Finally, we show how the SRL can be utilized to balance model stability and model adaptivity in an application which uses a rich new source of smartphone thermometer data to predict flu incidence in real time.
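A minimal sketch of the ranked-sparsity idea behind the SRL: interaction columns receive a larger lasso penalty than main effects, implemented here by the standard column-rescaling trick; the 3x penalty weight and the synthetic data are illustrative assumptions, not the weights derived in the dissertation.

```python
# Hedged sketch of ranked sparsity: interaction terms are penalised more
# heavily than main effects by rescaling their columns before an ordinary
# lasso. The 3x weight and the data are illustrative assumptions.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 2 * X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=200)   # no true interactions

poly = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
Z = StandardScaler().fit_transform(poly.fit_transform(X))
names = poly.get_feature_names_out()
is_interaction = np.array([" " in n for n in names])          # e.g. "x0 x1"

weights = np.where(is_interaction, 3.0, 1.0)   # interactions need 3x the evidence
model = Lasso(alpha=0.05).fit(Z / weights, y)
coef = model.coef_ / weights                   # undo the rescaling

print("selected terms:", list(names[np.abs(coef) > 1e-8]))
```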
APA, Harvard, Vancouver, ISO, and other styles
42

Kleynhans, Neil Taylor. "Automatic speech recognition for resource-scarce environments / N.T. Kleynhans." Thesis, North-West University, 2013. http://hdl.handle.net/10394/9668.

Full text
Abstract:
Automatic speech recognition (ASR) technology has matured over the past few decades and has made significant impacts in a variety of fields, from assistive technologies to commercial products. However, ASR system development is a resource-intensive activity and requires language resources in the form of text-annotated audio recordings and pronunciation dictionaries. Unfortunately, many languages found in the developing world fall into the resource-scarce category, and due to this resource scarcity the deployment of ASR systems in the developing world is severely inhibited. In this thesis we present research into developing techniques and tools to (1) harvest audio data, (2) rapidly adapt ASR systems and (3) select "useful" training samples in order to assist with resource-scarce ASR system development. We demonstrate an automatic audio harvesting approach which efficiently creates a speech recognition corpus by harvesting an easily available audio resource. We show that by starting with bootstrapped acoustic models, trained with language data obtained from a dialect, and then running through a few iterations of an alignment-filter-retrain phase, it is possible to create an accurate speech recognition corpus. As a demonstration we create a South African English speech recognition corpus by using our approach and harvesting an internet website which provides audio and approximate transcriptions. The acoustic models developed from harvested data are evaluated on independent corpora and show that the proposed harvesting approach provides a robust means to create ASR resources. As there are many acoustic model adaptation techniques which can be implemented by an ASR system developer, it becomes a costly endeavour to select the best adaptation technique. We investigate how various adaptation techniques depend on the amount of adaptation data by systematically varying the adaptation data amount and comparing the performance of the techniques. We establish a guideline which can be used by an ASR developer to choose the best adaptation technique given a size constraint on the adaptation data, for the scenario where adaptation between narrow- and wide-band corpora must be performed. In addition, we investigate the effectiveness of a novel channel normalisation technique and compare its performance with standard normalisation and adaptation techniques. Lastly, we propose a new data selection framework which can be used to design a speech recognition corpus. We show that for limited data sets, independent of language and bandwidth, the most effective strategy for data selection is frequency-matched selection, and that the widely used maximum entropy methods generally produced the least promising results. In our model, the frequency-matched selection method corresponds to a logarithmic relationship between accuracy and corpus size; we also investigated other model relationships, and found that a hyperbolic relationship (as suggested by simple asymptotic arguments in learning theory) may lead to somewhat better performance under certain conditions.
APA, Harvard, Vancouver, ISO, and other styles
43

Medeiros, Cláudio Marques de Sá. "Uma contribuição ao problema de seleção de modelos neurais usando o princípio de máxima correlação dos erros." Universidade Federal do Ceará, 2008. http://www.teses.ufc.br/tde_busca/arquivo.php?codArquivo=2132.

Full text
Abstract:
This thesis proposes a new pruning method which eliminates redundant weights in a multilayer perceptron (MLP). Conventional pruning techniques, like Optimal Brain Surgeon (OBS) and Optimal Brain Damage (OBD), are based on weight sensitivity analysis, which requires the inversion of the error Hessian matrix of the loss function (i.e. mean squared error). This inversion is especially susceptible to numerical problems due to poor conditioning of the Hessian matrix and demands great computational effort. Another kind of pruning method is based on the regularization of the loss function, but it requires the determination of the regularization parameter by trial and error. The proposed method is based on the "Maximum Correlation of Errors" principle (MAXCORE). The idea behind this principle is to evaluate the importance of each network connection by calculating the cross-correlation between the errors in a layer and the errors back-propagated to the preceding layer, starting from the output layer and working through the network until the input layer is reached. The connections which have larger correlations remain and the others are pruned from the network. The evident advantage of this procedure is its simplicity, since matrix inversion or parameter adjustment are not necessary. The performance of the proposed method is evaluated in pattern classification tasks and the results are compared to those achieved by the OBS/OBD techniques and also by a regularization-based method. For this purpose, artificial data sets are used to highlight some important characteristics of the proposed methodology. Furthermore, well-known benchmarking data sets, such as IRIS, WINE and DERMATOLOGY, are also used for the sake of evaluation. A real-world biomedical data set related to pathologies of the vertebral column is also used. The results obtained show that the proposed method achieves equivalent or superior performance compared to conventional pruning methods, with the additional advantages of low computational cost and simplicity. The proposed method also presents efficient behavior in pruning the input units, which suggests its use as a feature selection method.
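A hedged numerical sketch of the MAXCORE idea described above, scoring hidden-to-output connections of a one-hidden-layer network by the correlation between output-layer errors and the errors back-propagated to the hidden layer; the network, data and pruning threshold are invented for illustration and this is not the author's implementation.

```python
# Hedged sketch of MAXCORE-style scores for a one-hidden-layer network:
# correlate output-layer errors with errors back-propagated to the hidden
# layer and prune the weakest hidden-to-output connections. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))
y = (X[:, 0] - X[:, 1] > 0).astype(float).reshape(-1, 1)

W1 = rng.normal(size=(8, 6))            # weights would normally come from training
W2 = rng.normal(size=(6, 1))
h = np.tanh(X @ W1)                     # hidden activations
out = 1.0 / (1.0 + np.exp(-(h @ W2)))   # sigmoid output

err_out = out - y                       # output-layer errors
err_hid = (err_out @ W2.T) * (1 - h**2) # errors back-propagated to hidden layer

# Score connection (i, j) by |corr(hidden error_i, output error_j)|.
scores = np.abs(np.array([[np.corrcoef(err_hid[:, i], err_out[:, j])[0, 1]
                           for j in range(W2.shape[1])]
                          for i in range(W2.shape[0])]))
prune = scores < np.quantile(scores, 0.3)   # drop the 30% weakest connections
W2[prune] = 0.0
print("pruned", int(prune.sum()), "of", W2.size, "connections")
```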
APA, Harvard, Vancouver, ISO, and other styles
44

KAVOOSIFAR, MOHAMMAD REZA. "Data Mining and Indexing Big Multimedia Data." Doctoral thesis, Politecnico di Torino, 2019. http://hdl.handle.net/11583/2742526.

Full text
APA, Harvard, Vancouver, ISO, and other styles
45

Біланенко, Наталія Євгеніївна, та Nataliia Yevheniivna Bilanenko. "Особливості психологічного відбору біатлоністів до національних збірних команд". Master's thesis, СумДПУ імені А. С. Макаренка, 2021. http://repository.sspu.edu.ua/handle/123456789/12245.

Full text
Abstract:
In this master's thesis, the peculiarities of the psychological selection of biathletes for the youth, junior and adult national teams of Ukraine are investigated. Typological features of highly qualified biathletes are determined, such as plasticity, energy, pace, emotionality, extraversion-introversion and neuroticism. Based on the results obtained, a hypothetical psychological model of female and male athletes was developed, which allows selection to be carried out at a young age.
APA, Harvard, Vancouver, ISO, and other styles
46

Saigiridharan, Lakshidaa. "Dynamic prediction of repair costs in heavy-duty trucks." Thesis, Linköpings universitet, Statistik och maskininlärning, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-166133.

Full text
Abstract:
Pricing of repair and maintenance (R&M) contracts is one of the most important processes carried out at Scania. Predictions of repair costs at Scania are carried out using experience-based prediction methods which do not involve statistical methods for the computation of average repair costs for contracts terminated in the recent past. This method is difficult to apply to a reference population of rigid Scania trucks. Hence, the purpose of this study is to perform suitable statistical modelling to predict repair costs of four variants of rigid Scania trucks. The study gathers repair data from multiple sources and performs feature selection using the Akaike Information Criterion (AIC) to extract the most significant features that influence repair costs corresponding to each truck variant. The study showed that the inclusion of operational features as a factor could further influence the pricing of contracts. The hurdle Gamma model, which is widely used to handle zero inflation in Generalized Linear Models (GLMs), is used to fit the data, which consist of numerous zero and non-zero values. Due to the inherent hierarchical structure within the data, expressed by individual chassis, a hierarchical hurdle Gamma model is also implemented. These two statistical models are found to perform much better than the experience-based prediction method. This evaluation is done using the mean absolute error (MAE) and root mean square error (RMSE) statistics. A final model comparison is conducted using the AIC to draw conclusions based on the goodness of fit and predictive performance of the two statistical models. On assessing the models using these statistics, the hierarchical hurdle Gamma model was found to give the best predictions.
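A minimal sketch of a two-part (hurdle) Gamma model of the kind described above, using statsmodels: a logistic model for whether any repair cost occurs and a Gamma GLM with log link for the positive costs; the single covariate and the simulated costs are placeholders, not Scania's data.

```python
# Hedged sketch of a two-part ("hurdle") Gamma model with statsmodels:
# a logistic part for zero vs. non-zero cost, a Gamma GLM (log link) for the
# positive costs. The covariate and simulated costs are placeholders.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
mileage = rng.normal(size=n)                          # placeholder covariate
X = sm.add_constant(mileage)

p_any = 1 / (1 + np.exp(-(-0.5 + 0.8 * mileage)))     # simulated hurdle
any_cost = rng.binomial(1, p_any)
cost = np.where(any_cost == 1,
                rng.gamma(2.0, np.exp(5 + 0.3 * mileage) / 2.0), 0.0)

hurdle = sm.Logit(any_cost, X).fit(disp=0)            # part 1: is the cost > 0?
pos = cost > 0
gamma = sm.GLM(cost[pos], X[pos],
               family=sm.families.Gamma(link=sm.families.links.Log())).fit()

expected_cost = hurdle.predict(X) * gamma.predict(X)  # combined prediction
print("Gamma-part AIC:", round(gamma.aic, 1))
```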
APA, Harvard, Vancouver, ISO, and other styles
47

Bornelöv, Susanne. "Rule-based Models of Transcriptional Regulation and Complex Diseases : Applications and Development." Doctoral thesis, Uppsala universitet, Beräknings- och systembiologi, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-230159.

Full text
Abstract:
As we gain increased understanding of genetic disorders and gene regulation, more focus has turned towards complex interactions. Combinations of genes, or of genes and environmental factors, have been suggested to explain the missing heritability behind complex diseases. Furthermore, gene activation and splicing seem to be governed by a complex machinery of histone modification (HM), transcription factor (TF), and DNA sequence signals. This thesis aimed to apply and develop multivariate machine learning methods for such biological problems. Monte Carlo feature selection was combined with rule-based classification to identify interactions between HMs and to study the interplay of factors important for asthma and allergy. Firstly, publicly available ChIP-seq data (Paper I) for 38 HMs were studied. We trained a classifier for predicting exon inclusion levels based on the HM signals. We identified HMs important for splicing and illustrated that splicing could be predicted from the HM patterns. Next, we applied a similar methodology to data from two large birth cohorts describing asthma and allergy in children (Paper II). We identified genetic and environmental factors important for allergic diseases, which confirmed earlier results, and found candidate gene-gene and gene-environment interactions. In order to interpret and present the classifiers, we developed Ciruvis, a web-based tool for network visualization of classification rules (Paper III). We applied Ciruvis to classifiers trained on both simulated and real data and compared our tool to another classification-based methodology for interaction detection. Finally, we continued the earlier study on epigenetics by analyzing HM and TF signals in genes with or without evidence of bidirectional transcription (Paper IV). We identified several HMs and TFs with different signals between unidirectional and bidirectional genes. Among these, the CTCF TF was shown to have a well-positioned peak 60-80 bp upstream of the transcription start site in unidirectional genes.
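As a rough illustration of the Monte Carlo feature selection idea mentioned in the abstract (the thesis relied on dedicated rule-based classification tooling; the snippet below is only a schematic re-implementation on simulated data), features can be scored by the average accuracy of classifiers trained on many random feature subsets:

```python
# Schematic Monte Carlo feature selection: score each feature by the mean
# cross-validated accuracy of random subsets in which it appears.
# Illustrative re-implementation only, not the software used in the thesis.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

def mc_feature_scores(X, y, n_draws=300, subset_size=10, seed=0):
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    scores = np.zeros(n_features)
    counts = np.zeros(n_features)
    for _ in range(n_draws):
        subset = rng.choice(n_features, size=min(subset_size, n_features), replace=False)
        acc = cross_val_score(DecisionTreeClassifier(max_depth=3),
                              X[:, subset], y, cv=3).mean()
        scores[subset] += acc   # credit every feature in this subset
        counts[subset] += 1
    return scores / np.maximum(counts, 1)   # mean accuracy per feature

# Usage with simulated data standing in for, e.g., histone-modification signals:
X = np.random.default_rng(1).normal(size=(200, 40))
y = (X[:, 3] + X[:, 7] > 0).astype(int)
ranking = np.argsort(mc_feature_scores(X, y))[::-1]
print(ranking[:5])  # indices of the top-ranked features
```

Features that consistently raise accuracy across many random subsets would then be passed on to the rule-based classifier for interaction analysis.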
APA, Harvard, Vancouver, ISO, and other styles
48

Koniaris, Christos. "Perceptually motivated speech recognition and mispronunciation detection." Doctoral thesis, KTH, Tal-kommunikation, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-102321.

Full text
Abstract:
This doctoral thesis is the result of a research effort performed in two fields of speech technology, i.e., speech recognition and mispronunciation detection. Although the two areas are clearly distinguishable, the proposed approaches share a common hypothesis based on psychoacoustic processing of speech signals. The conjecture implies that the human auditory periphery provides a relatively good separation of different sound classes. Hence, it is possible to use recent findings from psychoacoustic perception together with mathematical and computational tools to model the auditory sensitivity to small speech signal changes. The performance of an automatic speech recognition system strongly depends on the representation used for the front-end. If the extracted features do not include all relevant information, the performance of the classification stage is inherently suboptimal. The work described in Papers A, B and C is motivated by the fact that humans perform better at speech recognition than machines, particularly in noisy environments. The goal is to make use of knowledge of human perception in the selection and optimization of speech features for speech recognition. These papers show that maximizing the similarity of the Euclidean geometry of the features to the geometry of the perceptual domain is a powerful tool to select or optimize features. Experiments with a practical speech recognizer confirm the validity of the principle. An approach to improving mel frequency cepstrum coefficients (MFCCs) through offline optimization is also presented. The method has three advantages: i) it is computationally inexpensive, ii) it does not use the auditory model directly, thus avoiding its computational cost, and iii) importantly, it provides better recognition performance than traditional MFCCs in both clean and noisy conditions. The second task concerns automatic pronunciation error detection. The research, described in Papers D, E and F, is motivated by the observation that almost all native speakers perceive, relatively easily, the acoustic characteristics of their own language when it is produced by speakers of the language. Small variations within a phoneme category, sometimes different for various phonemes, do not significantly change the perception of the language's own sounds. Several methods are introduced, based on similarity measures between the Euclidean space spanned by the acoustic representations of the speech signal and the Euclidean space spanned by an auditory model output, to identify the problematic phonemes for a given speaker. The methods are tested on groups of speakers from different languages and evaluated against a theoretical linguistic study, showing that they can capture many of the problematic phonemes that speakers from each language mispronounce. Finally, a listening test on the same dataset verifies the validity of these methods.
European Union FP6-034362 research project ACORNS; Computer-Animated Language Teachers (CALATea).
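To illustrate the geometry-matching principle in rough terms, the sketch below compares the pairwise Euclidean distances of MFCC features with those of a stand-in perceptual representation. It assumes librosa for feature extraction, and `auditory_model` is a placeholder function, not a real psychoacoustic library or the model used in the thesis.

```python
# Sketch: measure how well the Euclidean geometry of MFCC features agrees with
# the geometry of an auditory-model representation of the same frames.
# auditory_model() is a placeholder for a perceptual front-end, not a real API.
import numpy as np
import librosa
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

y, sr = librosa.load("speech.wav", sr=16000)          # hypothetical input file
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T  # frames x coefficients

def auditory_model(signal, sr):
    # Placeholder: a log-mel spectrogram stands in for a psychoacoustic model.
    mel = librosa.feature.melspectrogram(y=signal, sr=sr)
    return librosa.power_to_db(mel).T

perceptual = auditory_model(y, sr)
n = min(200, len(mfcc), len(perceptual))               # keep the distance matrices small

# Rank correlation of the two pairwise-distance structures: higher means the
# feature space better preserves the perceptual geometry.
score, _ = spearmanr(pdist(mfcc[:n]), pdist(perceptual[:n]))
print(score)
```

Feature selection or optimization would then adjust the features, or a transform of them, to increase this agreement; the actual auditory models and optimization procedures in the thesis are considerably more elaborate.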
APA, Harvard, Vancouver, ISO, and other styles
49

Guedes, Silva Ronaldo Rouvher. "Nonparametric Models for Dependent Functional Data." Doctoral thesis, Università degli studi di Padova, 2016. http://hdl.handle.net/11577/3427133.

Full text
Abstract:
In the framework of functional data analysis we propose two Bayesian nonparametric models. In the first model, motivated by an application in neuroimaging, functions are assumed to be spatially correlated and clustered together by an underlying Functional Dependent Dirichlet process, which encodes a conditional autoregressive dependence structure to guide the spatial selection. Spatial symmetries of the functional responses in the brain can be appropriately accounted for in our framework. Motivated by the Italian natural gas balancing platform, in the second model time dependence is induced in the weights of the underlying Functional Dependent Dirichlet process through a dynamic linear model defined over a partitioned function space. Typical shape characteristics of the functions are modeled by flexible spline-based curve estimates acting as atoms of the process. In both applications, Bayesian variable selection techniques are used to select significant sets of basis coefficients in each cluster. Gibbs sampling algorithms are developed for posterior computation; simulation studies and applications to real data assess the performance of our approaches.
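For readers unfamiliar with the building blocks named above, the following is a small, hedged sketch (with arbitrary hyperparameters) of truncated stick-breaking Dirichlet process weights paired with random B-spline atoms; the dependent, covariate-indexed constructions used in the thesis are substantially more involved.

```python
# Sketch: truncated stick-breaking weights for a Dirichlet process whose atoms
# are random cubic B-spline curves. Hyperparameters and basis choices are arbitrary.
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(1)
alpha, n_atoms, n_basis, degree = 1.0, 20, 8, 3

# Stick-breaking: w_k = v_k * prod_{j<k} (1 - v_j), with v_k ~ Beta(1, alpha).
v = rng.beta(1.0, alpha, size=n_atoms)
weights = v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))

# Each atom is a cubic B-spline on [0, 1] with random basis coefficients.
knots = np.concatenate(([0.0] * degree,
                        np.linspace(0, 1, n_basis - degree + 1),
                        [1.0] * degree))
grid = np.linspace(0, 1, 100)
atoms = [BSpline(knots, rng.normal(size=n_basis), degree)(grid) for _ in range(n_atoms)]

# One draw from the (truncated) random measure: pick an atom with the DP weights.
k = rng.choice(n_atoms, p=weights / weights.sum())
print(weights[:5], atoms[k][:5])
```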
APA, Harvard, Vancouver, ISO, and other styles
50

Lee, Seungwon Shawn. "Exploratory study of the impact of Information and Communication Technology (ICT)-based features in conference center selection/recommendation by meeting planners." Diss., Virginia Tech, 2009. http://hdl.handle.net/10919/37858.

Full text
Abstract:
This study examined the perceived importance of the availability of ICT-based features and technical support in meeting planners' recommendation/selection of a conference center. In addition, this study attempted to explain relationships between meeting planners' beliefs (perceived usefulness and perceived ease of use) toward ICT-based features and other factors: personal innovativeness of ICT (PIICT); perceived importance of the availability of technical support; self-efficacy; and result demonstrability. A conceptually integrated and expanded model of the Technology Acceptance Model (TAM) developed by Davis (1986, 1989) was used as a theoretical frame. The subjects of the study were meeting planners who used the two selected conference centers for their meetings or were considering them for future meetings. A total of 167 usable responses were gathered, and the proposed model was empirically examined using the data collected. The results of the model test revealed that the expanded TAM, with the integration of key factors, provided a systematic view of meeting planners' beliefs in the selection/recommendation of a conference center with ICT-based features. In addition, factor analysis of the fifteen ICT-based features revealed three underlying dimensions based on meeting planners' perceived importance of the availability of each feature for a conference center selection: 1) high-speed wireless Internet; 2) network backbone; and 3) ICT-based service outlet. Specifically, high-speed wireless Internet was the most important ICT-based determinant of conference center selection/recommendation for all types of meeting planners. Due to the exploratory nature of this study, the results provided limited facets of the impact of ICT-based features and technical support on meeting facility selection/recommendation. Nevertheless, this study is the first research effort of its kind to investigate which types of ICT-based features and technical support most strongly impact conference center selection/recommendation by different types of meeting planners. The results revealed that corporate meeting planners consider wireless Internet and a fast network more important in selection than other types of meeting planners do. The availability of ICT-based features was less important to association meeting planners when making a conference center selection. This study also identified a serious lack of knowledge of terms related to network backbones across all types of meeting planners. Technical support, especially on-site technical support, was perceived as very important by all types of meeting planners. This study also found that meeting planners with high PIICT possess stronger confidence in using and visualizing the advantages of ICT-based features; conference centers should therefore make efforts to measure meeting planners' PIICT and use the score effectively in marketing their ICT-based features. The study also identified result demonstrability, the visualizing of positive outcomes of using ICT-based features, as very important to meeting planners. PIICT and result demonstrability were positively related to perceived ease of use and perceived usefulness, which were identified as key antecedents of actual acceptance/usage of technology in previous studies. The results of the current study present an important step toward providing practical as well as theoretical implications for future technology impact studies in the context of meeting facility selection.
APA, Harvard, Vancouver, ISO, and other styles
