To see the other types of publications on this topic, follow the link: Scikit-learn.

Dissertations / Theses on the topic 'Scikit-learn'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 15 dissertations / theses for your research on the topic 'Scikit-learn.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Кулініч, Маргарита Миколаївна, Маргарита Николаевна Кулинич, and Marharyta Mykolaivna Kulinich. "Дослідження та розробка інтелектуальних систем керування проектами." Магістерська робота, ЗДІА, 2018. https://dspace.znu.edu.ua/jspui/handle/12345/357.

Full text
Abstract:
Kulinich, M. M. Research and development of intelligent project management systems [Electronic resource] : thesis for the master's degree ; specialty 121 – Software Engineering / M. M. Kulinich ; ZDIA ; supervisor V. H. Verbytskyi. – Zaporizhzhia, 2018. – 114 p.
Objective: to research methods for the automatic allocation of tasks and to create an automated system for distributing project tasks among performers, in order to allocate project implementation time optimally. Results: methods for the automatic allocation of tasks and the problems of modern project management systems were investigated, and the Python programming language was chosen. For development, the Django framework, as a frontend framework, and various machine learning methods were used. The principles of operation and the capabilities of the chosen technologies were investigated. The result of the work is a software product that will make it possible to automatically distribute projects and tasks among performers.
APA, Harvard, Vancouver, ISO, and other styles
2

Nguyen, John, and Kasper Lindén. "Creating a Back Stock to Increase Order Delivery and Pickup Availability." Thesis, KTH, Hälsoinformatik och logistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-252798.

Full text
Abstract:
Apotek Hjärtat wants to keep developing their e-commerce website and improve retrieval and delivery of orders to customers. Click and Collect and Click and Express are two options for retrieving e-commerce orders that are available only if all products in the order are present in the store. By implementing a back stock of popular e-commerce items in the stores, all products of an order will more often be present in the store. The back stock will in this way increase the availability of Click and Collect and Click and Express. The goals of the study were to conduct a pilot study and to compare methods and possible solutions for implementing a model that meets these goals. The pilot study examined previous work on mathematical statistics methods and machine learning methods. The statistical method was implemented with the analytical tool Statistical Package for the Social Sciences (SPSS) and Java. The machine learning method was implemented with Python and the Scikit-learn library, using a regression algorithm to find relations between category sales and pollen forecasts. The statistical and machine learning methods were compared to each other: both gave identical results, but the machine learning method was more functional and easier to develop further, and was consequently chosen. Several models were created for a few selected product categories. The categories that did not work for the models produced unrealistic amounts of sold products; these amounts could be negative or extremely high when unknown inputs were introduced. A simulation of the back stock was made to estimate how it would increase the availability of Click and Collect/Click and Express. The machine learning models would need more data for more accurate predictions. A conclusion could nevertheless be drawn that it is possible to predict the amount of sold products for certain categories, such as Allergy and Child Medicine, when pollen levels are taken into account.
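The regression approach the abstract outlines, relating category sales to pollen forecasts with a Scikit-learn regressor, can be sketched as follows; the synthetic pollen and sales figures are illustrative assumptions, not the thesis's data or its actual model choice.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
pollen = rng.uniform(0, 100, size=(365, 1))             # daily pollen forecast (assumed feature)
sales = 5 + 0.4 * pollen[:, 0] + rng.normal(0, 2, 365)  # synthetic allergy-category sales

X_train, X_test, y_train, y_test = train_test_split(pollen, sales, random_state=0)
model = LinearRegression().fit(X_train, y_train)
print(f"R^2 on held-out days: {r2_score(y_test, model.predict(X_test)):.2f}")

An unconstrained regressor like this will happily extrapolate to negative or extremely high sales for inputs outside its training range, which matches the failure mode the abstract reports for some categories.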
APA, Harvard, Vancouver, ISO, and other styles
3

Paulavets, Anastasiya. "Návrh systému pro doporučování pracovních příležitostí." Master's thesis, Vysoká škola ekonomická v Praze, 2014. http://www.nusl.cz/ntk/nusl-193343.

Full text
Abstract:
This thesis deals with recommender systems in the field of e-recruitment. The main objective is to design a job recommender system for career portal UNIjobs.cz. First, the theoretical background of recommender systems is provided. In the following part, specific properties of job recommender systems are discussed, as well as existing approaches to recommendation in the e-recruitment environment. The last part of the thesis is dedicated to designing a recommender system for career portal UNIjobs.cz. The output of that part is the main contribution of the thesis.
APA, Harvard, Vancouver, ISO, and other styles
4

Panopoulos, Vasileios. "Near Real-time Detection of Masquerade attacks in Web applications : catching imposters using their browsing behavior." Thesis, KTH, Kommunikationsnät, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-183777.

Full text
Abstract:
This thesis details research on Machine Learning techniques that are central to performing Anomaly and Masquerade attack detection. The main focus is put on Web Applications because of their immense popularity and ubiquity. This popularity has led to an increase in attacks, making them the most targeted entry point for violating a system. Specifically, a group of attacks that range from identity theft using social engineering to cross-site scripting attacks aim at exploiting and masquerading as users. Masquerading attacks are even harder to detect due to their resemblance to normal sessions, thus posing an additional burden. Concerning prevention, the diversity and complexity of those systems make it harder to define reliable protection mechanisms. Additionally, new and emerging attack patterns make manually configured and signature-based systems less effective, given the need to continuously update them with new rules and signatures. This leads to a situation where they eventually become obsolete if left unmanaged. Finally, the huge amount of traffic makes manual inspection of attacks and false alarms an impossible task. To tackle those issues, Anomaly Detection systems are proposed, using powerful and proven Machine Learning algorithms. Gravitating around the context of Anomaly Detection and Machine Learning, this thesis initially lays out several basic definitions such as user behavior, normality, and normal and anomalous behavior. Those definitions aim at setting the context in which the proposed method operates and at defining the theoretical premises. To ease the transition into the implementation phase, the underlying methodology is also explained in detail. Naturally, the implementation is also presented, where, starting from server logs, a method is described for pre-processing the data into a form suitable for classification. This pre-processing phase was constructed from several statistical analyses and normalization methods (Univariate Selection, ANOVA) to clean and transform the given logs and perform feature selection. Furthermore, given that the proposed detection method is based on the source and request URLs, a method of aggregation is proposed to limit user-privacy and classifier over-fitting issues. Subsequently, two popular classification algorithms (Multinomial Naive Bayes and Support Vector Machines) have been tested and compared to determine which one performs better in our given situations. Each of the implementation steps (pre-processing and classification) requires a number of different parameters to be set, and thus a method called hyper-parameter optimization is employed. This method searches for the parameters that improve the classification results. Moreover, the training and testing methodology is outlined alongside the experimental setup. The hyper-parameter optimization and training phases are the most computationally intensive steps, especially given a large number of samples/users. To overcome this obstacle, a scaling methodology is also defined and evaluated to demonstrate its ability to handle larger data sets. To complete this framework, several other options have been evaluated and compared to each other to challenge the method and implementation decisions. Examples of this are the "Transitions-vs-Pages" dilemma, the block restriction effect, the DR usefulness and the classification parameters optimization. Moreover, a Survivability Analysis is performed to demonstrate how the produced alarms could be correlated, affecting the resulting detection rates and interval times. The implementation of the proposed detection method and the outlined experimental setup lead to interesting results. The data set that has been used to produce this evaluation is also provided online to promote further investigation and research in this field.
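As a rough illustration of the classifier comparison the abstract describes, Multinomial Naive Bayes versus Support Vector Machines over request-URL features with hyper-parameter optimization, consider the following sketch. The toy sessions, labels and parameter grids are assumptions, not the thesis's data or search space.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# toy "sessions" of request URLs; the thesis derives its features from real server logs
sessions = ["/home /login /profile", "/home /search /cart",
            "/admin /etc/passwd /login", "/admin /debug /login"]
labels = ["legitimate", "legitimate", "masquerade", "masquerade"]

for clf, grid in [(MultinomialNB(), {"multinomialnb__alpha": [0.1, 1.0]}),
                  (LinearSVC(), {"linearsvc__C": [0.1, 1.0, 10.0]})]:
    pipe = make_pipeline(CountVectorizer(), clf)
    search = GridSearchCV(pipe, grid, cv=2)  # tiny fold count to suit the toy data
    search.fit(sessions, labels)
    print(type(clf).__name__, search.best_params_, f"{search.best_score_:.2f}")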
APA, Harvard, Vancouver, ISO, and other styles
5

Gustavsson, Vilhelm. "Machine Learning for a Network-based Intrusion Detection System : An application using Zeek and the CICIDS2017 dataset." Thesis, KTH, Hälsoinformatik och logistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-253273.

Full text
Abstract:
Cyber security is an emerging field in the IT sector. As more devices are connected to the internet, the attack surface for hackers is steadily increasing. Network-based Intrusion Detection Systems (NIDS) can be used to detect malicious traffic in networks, and Machine Learning is an up-and-coming approach for improving the detection rate. In this thesis the NIDS Zeek is used to extract features based on time and data size from network traffic. The features are then analyzed with Machine Learning in Scikit-learn in order to detect malicious traffic. A 98.58% Bayesian detection rate was achieved for the CICIDS2017 dataset, which is about the same level as the results from previous works on CICIDS2017 (without Zeek). The best performing algorithms were K-Nearest Neighbors, Random Forest and Decision Tree.
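The comparison the abstract reports, K-Nearest Neighbors, Random Forest and Decision Tree on time- and size-based flow features, can be sketched as follows; the synthetic features are a stand-in assumption for the Zeek-extracted CICIDS2017 flows.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# stand-ins for flow duration, byte counts, packet counts, ...
X, y = make_classification(n_samples=1000, n_features=8, random_state=0)

for clf in [KNeighborsClassifier(), RandomForestClassifier(random_state=0),
            DecisionTreeClassifier(random_state=0)]:
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{type(clf).__name__}: mean accuracy {scores.mean():.3f}")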
APA, Harvard, Vancouver, ISO, and other styles
6

Avena, Anna. "Tecniche di data mining applicate alla decodifica di dati neurali." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2017. http://amslaurea.unibo.it/14800/.

Full text
Abstract:
Studies on the decoding of neuronal activity make it possible to map the electrical impulses of the cerebral cortex into signals that can be sent to specific devices in order to monitor them. Scientific research is concentrating on this topic in order to help people affected by severe physical injuries gain a greater degree of autonomy in small everyday actions. In this work, neuronal activity data collected in experiments on non-human primates were analyzed, carried out by Professor Patrizia Fattori's research group in the Department of Pharmacy and Biotechnology of the University of Bologna. For this experiment, the animal was trained to perform a task consisting of grasping the proposed objects, one at a time, in random order. During the exercise, the animal's neuronal activity was recorded in vectors containing the spiking activity. What this thesis attempts to do is reconstruct the information relating to the activity of a population of neurons, given its spike vector. Several classification algorithms and features were tested in order to establish which configuration is most reliable for recognizing the motor activity performed by the animal during the experiment. To this end, a data mining process was implemented using the Python language and the Scikit-learn framework, which makes it possible to run multiple classifications and establish which provides the best performance. The results of the analysis show that some features provide high recognition rates and that, depending on the problem domain, one type of preprocessing is more suitable than another.
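The pipeline the abstract describes, testing several classifiers and features to decode motor activity from spike vectors, might look like the following minimal sketch; the Poisson spike counts and grasp labels are synthetic assumptions, not the recorded data.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
spike_counts = rng.poisson(5, size=(200, 64))  # 200 trials x 64 neurons (synthetic)
grasp_type = rng.integers(0, 4, size=200)      # which object was grasped (synthetic)

for clf in [SVC(), LogisticRegression(max_iter=1000),
            RandomForestClassifier(random_state=0)]:
    acc = cross_val_score(clf, spike_counts, grasp_type, cv=5).mean()
    print(f"{type(clf).__name__}: {acc:.3f}")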
APA, Harvard, Vancouver, ISO, and other styles
7

Valešová, Nikola. "Bioinformatický nástroj pro klasifikaci bakterií do taxonomických kategorií na základě sekvence genu 16S rRNA." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2019. http://www.nusl.cz/ntk/nusl-403138.

Full text
Abstract:
This thesis deals with the automated classification and recognition of bacteria after their DNA has been obtained by sequencing. Within this work, a new classification method based on the 16S rRNA segment is proposed and described. The presented principle follows the tree structure of taxonomic categories and uses well-known machine learning algorithms to classify bacteria into one of the classes at the lower taxonomic level. The work also includes an implementation of the described algorithm and an evaluation of its prediction accuracy. The classification accuracy of different classifier types and their settings is examined, and the configuration achieving the best results is identified. The accuracy of the implemented algorithm is also compared with several existing methods. During validation, the implemented KTC application achieved more than 45% accuracy in genus prediction on both the BLAST 16S and BLAST V4 datasets. Finally, several possibilities for improving and extending the current implementation of the algorithm are mentioned.
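The tree-structured idea the abstract describes, classifying at one taxonomic level and then descending to a per-class model at the next level, can be caricatured as follows. The k-mer features, toy sequences and two-level hierarchy are assumptions for illustration, not the thesis's KTC implementation.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

def kmer():
    # 3-mer counts as a simple sequence representation (an assumption)
    return CountVectorizer(analyzer="char", ngram_range=(3, 3))

seqs = ["ACGTACGTGC", "ACGGACGTGC", "TTGTACCAGC", "TTGAACCAGC"]
family = ["f1", "f1", "f2", "f2"]
genus = ["g1", "g2", "g3", "g4"]

family_clf = make_pipeline(kmer(), MultinomialNB()).fit(seqs, family)

# one genus-level classifier per family, trained only on that family's sequences
genus_clf = {}
for f in set(family):
    idx = [i for i, fam in enumerate(family) if fam == f]
    genus_clf[f] = make_pipeline(kmer(), MultinomialNB()).fit(
        [seqs[i] for i in idx], [genus[i] for i in idx])

query = "ACGTACGGGC"
f_pred = family_clf.predict([query])[0]
print(f_pred, genus_clf[f_pred].predict([query])[0])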
APA, Harvard, Vancouver, ISO, and other styles
8

Ramanayaka, Mudiyanselage Asanga. "Data Engineering and Failure Prediction for Hard Drive S.M.A.R.T. Data." Bowling Green State University / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1594957948648404.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Haglund, Robin. "Automated analysis of battery articles." Thesis, Uppsala universitet, Strukturkemi, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-403738.

Full text
Abstract:
Journal articles are the formal medium for the communication of results among scientists, and often contain valuable data. However, manually collecting article data from a large field like lithium-ion battery chemistry is tedious and time-consuming, which is an obstacle when searching for statistical trends and correlations to inform research decisions. To address this, a platform for the automatic retrieval and analysis of large numbers of articles is created and applied to the field of lithium-ion battery chemistry. Example data produced by the platform are presented and evaluated, and sources of error limiting this type of platform are identified, with problems related to text extraction and pattern matching being especially significant. Some solutions to these problems are presented and potential future improvements are proposed.
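A small example of why text extraction and pattern matching dominate the error budget in such a platform: a regular expression that captures specific-capacity values works on clean text but silently mis-captures on PDF-extraction artifacts. The pattern and sentences below are hypothetical.

import re

CAPACITY = re.compile(r"(\d+(?:\.\d+)?)\s*mA\s?h\s?/?\s?g", re.IGNORECASE)

clean = "The cathode delivered 154.2 mAh/g after 100 cycles."
noisy = "The cathode delivered 154 . 2 mA h g −1 after 100 cycles."  # PDF spacing artifact

print(CAPACITY.findall(clean))  # ['154.2']
print(CAPACITY.findall(noisy))  # ['2'] -- the garbled spacing truncates the value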
APA, Harvard, Vancouver, ISO, and other styles
10

Urbanczyk, Martin. "Webový simulátor fotbalových lig a turnajů." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2019. http://www.nusl.cz/ntk/nusl-403171.

Full text
Abstract:
This thesis describes the creation of a simulator of football leagues and championships. I studied the problem domain of football competitions and their systems, as well as the basics of machine learning. I also analyzed similar existing solutions and drew inspiration from them for my own design. I then designed the overall structure of the simulator and all of its key parts, after which the simulator was implemented and tested. The application allows simulating the top five competitions in the UEFA club coefficient ranking.
APA, Harvard, Vancouver, ISO, and other styles
11

Lanzarone, Lorenzo Biagio. "Manutenzione predittiva di macchinari industriali tramite tecniche di intelligenza artificiale: una valutazione sperimentale." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/22853/.

Full text
Abstract:
Society is undergoing a process of technological evolution that creates a connection between the physical and the digital environment in order to exchange data and information. In the context of Industry 4.0, this thesis explores the predictive maintenance of industrial machinery using artificial intelligence techniques, with the aim of anticipating an imminent failure and identifying it before it can occur. The thesis is divided into two complementary parts: the first covers the theoretical aspects of the context and the state of the art, while the second covers the practical and design aspects. In particular, the first part provides an overview of Industry 4.0 and one of its applications, predictive maintenance. It then addresses artificial intelligence and Data Science, through which predictive maintenance can be applied. The second part presents a practical project, namely the work I carried out during an internship at the software house Open Data in Funo di Argelato (Bologna). The goal of the project was the realization of a predictive maintenance system for industrial plastic injection molding machinery, using artificial intelligence techniques. The final aim is the integration of this system into the Opera MES software developed by the company.
APA, Harvard, Vancouver, ISO, and other styles
12

Giuliani, Luca. "Extending the Moving Targets Method for Injecting Constraints in Machine Learning." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/23885/.

Full text
Abstract:
Informed Machine Learning is an umbrella term that comprises a set of methodologies in which domain knowledge is injected into a data-driven system in order to improve its level of accuracy, satisfy some external constraint, and in general serve the purposes of explainability and reliability. The topic has been widely explored in the literature by means of many different techniques. Moving Targets is one such technique, particularly focused on constraint satisfaction: it is based on decomposition and bi-level optimization and proceeds by iteratively refining the target labels through a master step which is in charge of enforcing the constraints, while the training phase is delegated to a learner. In this work, we extend the algorithm in order to deal with semi-supervised learning and soft constraints. In particular, we focus our empirical evaluation on both regression and classification tasks involving monotonicity shape constraints. We demonstrate that our method is robust with respect to its hyperparameters, as well as being able to generalize very well while reducing the number of violations on the enforced constraints. Additionally, the method can even outperform, both in terms of accuracy and constraint satisfaction, other state-of-the-art techniques such as Lattice Models and Semantic-based Regularization with a Lagrangian Dual approach for automatic hyperparameter tuning.
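Without claiming to reproduce the authors' formulation, the alternating scheme the abstract sketches, a master step adjusting labels toward constraint satisfaction and a learner retrained on the adjusted labels, can be caricatured for a monotonicity constraint as follows. Here isotonic regression stands in for the master's constrained projection, and the fixed 0.5 blend stands in for the method's loss balancing; both are simplifying assumptions.

import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 100))
y = x + rng.normal(0, 0.3, 100)  # noisy targets that should be monotone in x

learner = DecisionTreeRegressor(max_depth=4)
targets = y.copy()
for _ in range(5):
    learner.fit(x.reshape(-1, 1), targets)                   # learner step
    pred = learner.predict(x.reshape(-1, 1))
    feasible = IsotonicRegression().fit_transform(x, pred)   # "master": project onto the monotone set
    targets = 0.5 * y + 0.5 * feasible                       # blend original labels with feasible ones

pred = learner.predict(x.reshape(-1, 1))
print("monotonicity violations:", int((np.diff(pred) < 0).sum()))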
APA, Harvard, Vancouver, ISO, and other styles
13

Konečný, Antonín. "Využití umělé inteligence v technické diagnostice." Master's thesis, Vysoké učení technické v Brně. Fakulta strojního inženýrství, 2021. http://www.nusl.cz/ntk/nusl-443221.

Full text
Abstract:
The diploma thesis is focused on the use of artificial intelligence methods for evaluating the fault condition of machinery. The evaluated data come from a vibrodiagnostic model for the simulation of static and dynamic unbalances. Machine learning methods are applied, specifically supervised learning. The thesis describes the Spyder software environment, its alternatives, and the Python programming language, in which the scripts are written. It contains an overview with a description of the libraries (Scikit-learn, SciPy, Pandas ...) and methods — K-Nearest Neighbors (KNN), Support Vector Machines (SVM), Decision Trees (DT) and Random Forest Classifiers (RF). The results of the classification are visualized in a confusion matrix for each method. The appendix includes the scripts written for feature engineering, hyperparameter tuning, evaluation of learning success, and classification with visualization of the result.
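The evaluation step the abstract mentions, visualizing each classifier's result as a confusion matrix, is compact in Scikit-learn; the synthetic three-class "unbalance condition" data below is an assumption standing in for the vibrodiagnostic features.

import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import ConfusionMatrixDisplay
from sklearn.model_selection import train_test_split

# stand-in for features of static / dynamic / combined unbalance states
X, y = make_classification(n_samples=600, n_features=10, n_classes=3,
                           n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
ConfusionMatrixDisplay.from_estimator(clf, X_te, y_te)  # repeat per method (KNN, SVM, DT, RF)
plt.show()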
APA, Harvard, Vancouver, ISO, and other styles
14

Mervin, Lewis. "Improved in silico methods for target deconvolution in phenotypic screens." Thesis, University of Cambridge, 2018. https://www.repository.cam.ac.uk/handle/1810/283004.

Full text
Abstract:
Target-based screening projects for bioactive (orphan) compounds have been shown in many cases to be insufficiently predictive for in vivo efficacy, leading to attrition in clinical trials. Partly for this reason, phenotypic screening has undergone a renaissance in both academia and the pharmaceutical industry. One key shortcoming of this paradigm shift is that the protein targets modulated need to be elucidated subsequently, which is often a costly and time-consuming procedure. In this work, we have explored both improved methods and real-world case studies of how computational methods can help in target elucidation of phenotypic screens. One limitation of previous methods has been the ability to assess the applicability domain of the models, that is, when the assumptions made by a model are fulfilled and which input chemicals are reliably appropriate for the models. Hence, a major focus of this work was to explore methods for calibration of machine learning algorithms using Platt Scaling, Isotonic Regression Scaling and Venn-Abers Predictors, since the probabilities from well-calibrated classifiers can be interpreted at a confidence level and predictions specified at an acceptable error rate. Additionally, many current protocols only offer probabilities for affinity, thus another key area for development was to expand the target prediction models with functional prediction (activation or inhibition). This extra level of annotation is important since the activation or inhibition of a target may positively or negatively impact the phenotypic response in a biological system. Furthermore, many existing methods do not utilize the wealth of bioactivity information held for orthologue species. We therefore also focused on an in-depth analysis of orthologue bioactivity data and its relevance and applicability towards expanding compound and target bioactivity space for predictive studies. The realized protocol was trained with 13,918,879 compound-target pairs and comprises 1,651 targets, and has been made available for public use on GitHub. Consequently, the methodology was applied to aid with the target deconvolution of AstraZeneca phenotypic readouts, in particular for the rationalization of cytotoxicity and cytostaticity in the High-Throughput Screening (HTS) collection. Results from this work highlighted which targets are frequently linked to the cytotoxicity and cytostaticity of chemical structures, and provided insight into which compounds to select or remove from the collection for future screening projects. Overall, this project has furthered the field of in silico target deconvolution, by improving the performance and applicability of current protocols and by rationalizing cytotoxicity, which has been shown to influence attrition in clinical trials.
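Two of the calibration methods discussed, Platt scaling and isotonic regression, are available in Scikit-learn through CalibratedClassifierCV (Venn-Abers predictors are not, and are omitted here); a minimal sketch on synthetic data:

from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=2000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for method in ["sigmoid", "isotonic"]:  # "sigmoid" is Platt scaling
    cal = CalibratedClassifierCV(LinearSVC(), method=method, cv=3)
    cal.fit(X_tr, y_tr)
    p = cal.predict_proba(X_te)[:, 1]   # calibrated probabilities
    print(method, f"Brier score: {brier_score_loss(y_te, p):.3f}")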
APA, Harvard, Vancouver, ISO, and other styles
15

Erickson, Joshua N. "Evaluation of computational methods for data prediction." Thesis, 2014. http://hdl.handle.net/1828/5662.

Full text
Abstract:
Given the overall increase in the availability of computational resources, and the importance of forecasting the future, it should come as no surprise that prediction is considered to be one of the most compelling and challenging problems for both academia and industry in the world of data analytics. But how is prediction done, what factors make it easier or harder to do, how accurate can we expect the results to be, and can we harness the available computational resources in meaningful ways? With efforts ranging from those designed to save lives in the moments before a near field tsunami to others attempting to predict the performance of Major League Baseball players, future generations need to have realistic expectations about prediction methods and analytics. This thesis takes a broad look at the problem, including motivation, methodology, accuracy, and infrastructure. In particular, a careful study involving experiments in regression, the prediction of continuous, numerical values, and classification, the assignment of a class to each sample, is provided. The results and conclusions of these experiments cover only the included data sets and the applied algorithms as implemented by the Python library. The evaluation includes accuracy and running time of different algorithms across several data sets to establish tradeoffs between the approaches, and determine the impact of variations in the size of the data sets involved. As scalability is a key characteristic required to meet the needs of future prediction problems, a discussion of some of the challenges associated with parallelization is included.
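The kind of accuracy-versus-running-time comparison the abstract describes can be sketched as follows; the datasets and models are stand-ins chosen for brevity, not the thesis's actual selection.

import time
from sklearn.datasets import load_diabetes, load_digits
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.model_selection import cross_val_score

tasks = [(load_digits(return_X_y=True),                    # classification task
          [LogisticRegression(max_iter=5000), RandomForestClassifier()]),
         (load_diabetes(return_X_y=True),                  # regression task
          [Ridge(), RandomForestRegressor()])]

for (X, y), models in tasks:
    for model in models:
        start = time.perf_counter()
        score = cross_val_score(model, X, y, cv=5).mean()  # accuracy or R^2
        print(f"{type(model).__name__}: score={score:.3f}, "
              f"time={time.perf_counter() - start:.2f}s")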
APA, Harvard, Vancouver, ISO, and other styles
