Dissertations / Theses on the topic 'Mining software engineering data'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the top 50 dissertations / theses for your research on the topic 'Mining software engineering data.'
Next to every source in the list of references there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.
Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.
Delorey, Daniel Pierce. "Observational Studies of Software Engineering Using Data from Software Repositories." Diss., 2007. http://contentdm.lib.byu.edu/ETD/image/etd1716.pdf.
Unterkalmsteiner, Michael. "Coordinating requirements engineering and software testing." Doctoral thesis, Blekinge Tekniska Högskola, Institutionen för programvaruteknik, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-663.
Santamaría, Diego, and Álvaro de Ramón. "Data Mining Web-Tool Prototype Using Monte Carlo Simulations." Thesis, Blekinge Tekniska Högskola, Avdelningen för programvarusystem, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-3164.
Waters, Robert Lee. "Obtaining Architectural Descriptions from Legacy Systems: The Architectural Synthesis Process (ASP)." Diss., Georgia Institute of Technology, 2004. http://etd.gatech.edu/theses/available/etd-10272004-160115/unrestricted/waters%5Frobert%5Fl%5F200412%5Fphd.pdf.
Rick Kazman, Committee Member; Colin Potts, Committee Member; Mike McCracken, Committee Member; Gregory Abowd, Committee Chair; Spencer Rugaber, Committee Member. Includes bibliographical references.
Matyja, Dariusz. "Applications of data mining algorithms to analysis of medical data." Thesis, Blekinge Tekniska Högskola, Avdelningen för programvarusystem, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-4253.
Imam, Ayad Tareq. "Relative-fuzzy : a novel approach for handling complex ambiguity for software engineering of data mining models." Thesis, De Montfort University, 2010. http://hdl.handle.net/2086/3909.
Thun, Julia, and Rebin Kadouri. "Automating debugging through data mining." Thesis, KTH, Data- och elektroteknik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-203244.
Today's systems generate large amounts of log messages. These messages can be efficiently stored, searched, and visualized using log management tools. Analysis of log messages provides insight into system behaviour such as performance, server status, and execution errors that can occur in web applications. iStone AB wants to investigate the possibility of automating debugging. Since iStone mostly performs its debugging manually, finding faults in the system takes time. The purpose was therefore to find solutions that reduce the time debugging takes. An analysis of log messages in access and console logs was performed in order to select the data mining techniques best suited to iStone's system. Data mining algorithms and log management tools were compared. The results of the comparisons showed that the ELK Stack, together with a combination of Eclat and a hybrid algorithm (Eclat and Apriori), were the most suitable choices. To demonstrate this, the ELK Stack and Eclat were implemented. The results show that data mining and the use of a log analysis platform can simplify debugging and reduce the time it takes.
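The Eclat algorithm chosen in the thesis above mines frequent itemsets from a vertical (item to transaction-id) representation. A minimal dependency-free sketch follows; the log-token transactions in the usage are invented for illustration and are not the thesis's actual iStone data or code:

```python
def eclat(transactions, min_support):
    """Mine frequent itemsets with the Eclat algorithm (vertical tid-lists).

    transactions: list of sets of items (e.g. tokenized log messages)
    min_support: minimum number of transactions an itemset must appear in
    Returns a dict mapping frozenset(itemset) -> support count.
    """
    # Build the vertical representation: item -> set of transaction ids
    tidlists = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidlists.setdefault(item, set()).add(tid)

    frequent = {}

    def recurse(prefix, prefix_tids, candidates):
        for i, (item, tids) in enumerate(candidates):
            # Intersecting tid-lists gives the support of the extended itemset
            new_tids = prefix_tids & tids if prefix else tids
            if len(new_tids) >= min_support:
                itemset = prefix | {item}
                frequent[frozenset(itemset)] = len(new_tids)
                # Extend only with items that come later (avoids duplicates)
                recurse(itemset, new_tids, candidates[i + 1:])

    recurse(set(), set(range(len(transactions))), sorted(tidlists.items()))
    return frequent
```

Calling `eclat` on tokenized log lines with a support threshold yields the co-occurring token sets (for example, an error code that always appears with a particular subsystem tag), which is the kind of pattern the thesis uses to speed up fault localization.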
Sobolewska, Katarzyna-Ewa. "Web links utility assessment using data mining techniques." Thesis, Blekinge Tekniska Högskola, Avdelningen för programvarusystem, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-2936.
Saltin, Joakim. "Interactive visualization of financial data : Development of a visual data mining tool." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-181225.
Allahyari, Hiva. "On the concept of Understandability as a Property of Data mining Quality." Thesis, Blekinge Tekniska Högskola, Sektionen för datavetenskap och kommunikation, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-6134.
Gupta, Shweta. "Software Development Productivity Metrics, Measurements and Implications." Thesis, University of Oregon, 2018. http://hdl.handle.net/1794/23816.
Güneş, Serkan. "Investment and Financial Forecasting : A Data Mining Approach on Port Industry." Thesis, Blekinge Tekniska Högskola, Sektionen för datavetenskap och kommunikation, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-5340.
Barysau, Mikalai. "Developers' performance analysis based on code review data : How to perform comparisons of different groups of developers." Thesis, Blekinge Tekniska Högskola, Institutionen för programvaruteknik, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-13335.
Polańska, Julia, and Michał Zyznarski. "Elaboration of a method for comparison of Business Intelligence Systems which support data mining process." Thesis, Blekinge Tekniska Högskola, Sektionen för datavetenskap och kommunikation, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-2078.
Aftarczuk, Kamila. "Evaluation of selected data mining algorithms implemented in Medical Decision Support Systems." Thesis, Blekinge Tekniska Högskola, Avdelningen för programvarusystem, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-6194.
Cruzes, Daniela Soares. "Analise secundaria de estudos experimentais em engenharia de software." [s.n.], 2007. http://repositorio.unicamp.br/jspui/handle/REPOSIP/260999.
Full textTese (doutorado) - Universidade Estadual de Campinas, Faculdade de Engenharia Eletrica e de Computação
Abstract: While it is clear that there are many sources of variation from one software development context to another, it is not clear a priori which specific variables will influence the effectiveness of a process, technique, or method in a given context. For this reason, we argue that knowledge about software engineering must be built from many studies, in which related studies are run within similar contexts as well as very different ones. Previous works have discussed how to design related studies so as to document as precisely as possible the values of context variables and to be able to compare them with those observed in new studies. While such a planned approach is important, we argue that an opportunistic approach is also practical. This approach combines results from multiple individual studies after the fact, enabling the expansion of empirical software engineering knowledge from large evidence bases. In this dissertation, we describe a process to build empirical knowledge about software engineering. It uses an approach based on encoding the information extracted from papers and experimental data into a structured base. This base can then be mined to extract new knowledge in a simple and flexible way.
Doctorate
Computer Engineering
Doctor of Electrical Engineering
Burji, Supreeth Jagadish. "Reverse Engineering of a Malware : Eyeing the Future of Computer Security." Akron, OH : University of Akron, 2009. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=akron1247447165.
"August, 2009." Title from electronic thesis title page (viewed 11/11/2009). Advisor, Kathy J. Liszka; Faculty Readers, Timothy W. O'Neil, Wolfgang Pelz; Department Chair, Chien-Chung Chan; Dean of the College, Chand Midha; Dean of the Graduate School, George R. Newkome. Includes bibliographical references.
Taylor, Quinn Carlson. "Analysis and Characterization of Author Contribution Patterns in Open Source Software Development." BYU ScholarsArchive, 2012. https://scholarsarchive.byu.edu/etd/2971.
Krüger, Franz David, and Mohamad Nabeel. "Hyperparameter Tuning Using Genetic Algorithms : A study of genetic algorithms impact and performance for optimization of ML algorithms." Thesis, Mittuniversitetet, Institutionen för informationssystem och –teknologi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-42404.
As machine learning (ML) becomes more and more frequent in the business world, information gathering through data mining (DM) is on the rise, and DM practitioners generally rely on several rules of thumb to avoid spending a large amount of time tuning the hyperparameters (the parameters that control the learning process) of an ML algorithm to reach a high accuracy score. This report proposes an approach that systematically optimizes ML algorithms using genetic algorithms (GA), and evaluates whether, and how, the model should be constructed to find global solutions for a specific data set. By applying a GA to two ML algorithms, k-nearest neighbors and random forest, on two numerical data sets, the Iris data set and the Wisconsin breast cancer data set, the model is evaluated by its accuracy score as well as by its computational time, which is then compared against a search method, specifically exhaustive search. The results suggest that GA finds good accuracy scores in a reasonable amount of time. There are some limitations, as a parameter's significance for an ML algorithm may vary.
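The selection-crossover-mutation loop that the thesis applies to hyperparameter grids can be sketched as follows. This is an illustrative, dependency-free GA over a discrete search space; the `genetic_search` function and the toy fitness surface in the usage are assumptions for demonstration, not the authors' implementation:

```python
import random

def genetic_search(fitness, space, pop_size=20, generations=30,
                   mutation_rate=0.2, seed=0):
    """Simple genetic algorithm over a discrete hyperparameter space.

    fitness: maps a dict of hyperparameters to a score (higher is better)
    space: dict mapping parameter name -> list of allowed values
    Returns the best hyperparameter dict found.
    """
    rng = random.Random(seed)
    names = sorted(space)

    def random_individual():
        return {n: rng.choice(space[n]) for n in names}

    def crossover(a, b):
        # Uniform crossover: each gene comes from either parent
        return {n: (a[n] if rng.random() < 0.5 else b[n]) for n in names}

    def mutate(ind):
        out = dict(ind)
        for n in names:
            if rng.random() < mutation_rate:
                out[n] = rng.choice(space[n])
        return out

    population = [random_individual() for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        parents = scored[: pop_size // 2]  # truncation selection
        population = [mutate(crossover(rng.choice(parents), rng.choice(parents)))
                      for _ in range(pop_size)]
        candidate = max(population, key=fitness)
        if fitness(candidate) > fitness(best):
            best = candidate
    return best
```

In the thesis's setting, `fitness` would be a cross-validated accuracy score of, say, k-nearest neighbors, and `space` the grid of candidate hyperparameter values; exhaustive search would instead evaluate every combination.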
Chu, Justin. "CONTEXT-AWARE DEBUGGING FOR CONCURRENT PROGRAMS." UKnowledge, 2017. https://uknowledge.uky.edu/cs_etds/61.
van Schaik, Sebastiaan Johannes. "A framework for processing correlated probabilistic data." Thesis, University of Oxford, 2014. http://ora.ox.ac.uk/objects/uuid:91aa418d-536e-472d-9089-39bef5f62e62.
Kamenieva, Iryna. "Research Ontology Data Models for Data and Metadata Exchange Repository." Thesis, Växjö University, School of Mathematics and Systems Engineering, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:vxu:diva-6351.
Full textFor researches in the field of the data mining and machine learning the necessary condition is an availability of various input data set. Now researchers create the databases of such sets. Examples of the following systems are: The UCI Machine Learning Repository, Data Envelopment Analysis Dataset Repository, XMLData Repository, Frequent Itemset Mining Dataset Repository. Along with above specified statistical repositories, the whole pleiad from simple filestores to specialized repositories can be used by researchers during solution of applied tasks, researches of own algorithms and scientific problems. It would seem, a single complexity for the user will be search and direct understanding of structure of so separated storages of the information. However detailed research of such repositories leads us to comprehension of deeper problems existing in usage of data. In particular a complete mismatch and rigidity of data files structure with SDMX - Statistical Data and Metadata Exchange - standard and structure used by many European organizations, impossibility of preliminary data origination to the concrete applied task, lack of data usage history for those or other scientific and applied tasks.
Now there are lots of methods of data miming, as well as quantities of data stored in various repositories. In repositories there are no methods of DM (data miming) and moreover, methods are not linked to application areas. An essential problem is subject domain link (problem domain), methods of DM and datasets for an appropriate method. Therefore in this work we consider the building problem of ontological models of DM methods, interaction description of methods of data corresponding to them from repositories and intelligent agents allowing the statistical repository user to choose the appropriate method and data corresponding to the solved task. In this work the system structure is offered, the intelligent search agent on ontological model of DM methods considering the personal inquiries of the user is realized.
For implementation of an intelligent data and metadata exchange repository the agent oriented approach has been selected. The model uses the service oriented architecture. Here is used the cross platform programming language Java, multi-agent platform Jadex, database server Oracle Spatial 10g, and also the development environment for ontological models - Protégé Version 3.4.
Shokat, Imran. "Computational Analyses of Scientific Publications Using Raw and Manually Curated Data with Applications to Text Visualization." Thesis, Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-78995.
Xiaojun, Chen, and Premlal Bhattrai. "A Method for Membership Card Generation Based on Clustering and Optimization Models in A Hypermarket." Thesis, Blekinge Tekniska Högskola, Sektionen för datavetenskap och kommunikation, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-2227.
Pietruszewski, Przemyslaw. "Association rules analysis for objects hierarchy." Thesis, Blekinge Tekniska Högskola, Avdelningen för programvarusystem, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-3512.
Full textp.pietruszewski@op.pl
Kurin, Erik, and Adam Melin. "Data-driven test automation : augmenting GUI testing in a web application." Thesis, Linköpings universitet, Programvara och system, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-96380.
Macedo, Charles Mendes de. "Aplicação de algoritmos de agrupamento para descoberta de padrões de defeito em software JavaScript." Universidade de São Paulo, 2018. http://www.teses.usp.br/teses/disponiveis/100/100131/tde-29012019-152129/.
Applications developed with the JavaScript language are increasing every day, not only for the client side, but also for the server side and for mobile devices. In this context, tools to identify faults are fundamental to assist developers during the evolution of their applications. Most of these tools use a list of predefined faults that are discovered from observation of programming best practices and developer intuition. To improve these tools, the automatic discovery of faults and code smells is important, because it makes it possible to identify which ones actually occur in practice, and how frequently. A tool that implements a semiautomatic strategy for discovering bug patterns, by clustering the changes made during project development, is BugAID. The objective of this work is to contribute to the BugAID tool, extending it with improvements in the extraction of the characteristics used by the clustering algorithm. The extended module that extracts the characteristics is called BE+. Additionally, an evaluation of the clustering algorithms used for discovering fault patterns in JavaScript software is performed.
Åström, Gustav. "Kognitiva tjänster på en myndighet : Förstudie om hur Lantmäteriet kan tillämpa IBM Watson." Thesis, Mittuniversitetet, Avdelningen för informationssystem och -teknologi, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-30902.
Many milestones have been passed in computer science, and we are currently passing another one: artificial intelligence. One of the capabilities that characterizes AI is the interpretation of so-called unstructured data, i.e. data that lacks structure. Unstructured data can be useful, and with the new AI tools it is possible to interpret it and then use it to solve problems. This has the potential to be useful in practical applications such as case handling and decision support. The work was carried out at the company Apendo AB, which has Lantmäteriet as a customer, and investigates how AI-driven cognitive services through IBM Watson can be applied at Lantmäteriet. The goal is to answer the following questions: Is it already possible to apply cognitive services through Watson's services to provide decision support for Lantmäteriet? In what ways can Watson's services be used to create decision support? How effective can the solution for Lantmäteriet be, i.e. how much time and cost could be saved by using Watson's services for the chosen concept? As a practical part of the study of AI, a perceptron was developed and evaluated. Through an agile approach, tests of and studies on IBM Watson were carried out in parallel with interviews with employees at Lantmäteriet. The tests were performed in the PaaS service IBM Bluemix, with both Node-RED and a custom-built web application. From the interviews, the Watson service Retrieve and Rank emerged as interesting and was examined more closely. With Retrieve and Rank, questions can be answered by ranking passages of a chosen corpus, which is then trained to give better answers. Uploading a corpus with associated questions resulted in 75% of the questions being answered correctly. The application for Lantmäteriet could thus be a trainable cognitive search function that helps case officers find information in handbooks and the law book.
Alsalama, Ahmed. "A Hybrid Recommendation System Based on Association Rules." TopSCHOLAR®, 2013. http://digitalcommons.wku.edu/theses/1250.
Taylor, Phillip. "Data mining of vehicle telemetry data." Thesis, University of Warwick, 2015. http://wrap.warwick.ac.uk/77645/.
Kanellopoulos, Yiannis. "Supporting software systems maintenance using data mining techniques." Thesis, University of Manchester, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.496254.
Maden, Engin. "Data Mining On Architecture Simulation." Master's thesis, METU, 2010. http://etd.lib.metu.edu.tr/upload/2/12611635/index.pdf.
Artchounin, Daniel. "Tuning of machine learning algorithms for automatic bug assignment." Thesis, Linköpings universitet, Programvara och system, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-139230.
Kagdi, Huzefa H. "Mining Software Repositories to Support Software Evolution." Kent State University / OhioLINK, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=kent1216149768.
Wang, Grant J. (Grant Jenhorn) 1979. "Algorithms for data mining." Thesis, Massachusetts Institute of Technology, 2006. http://hdl.handle.net/1721.1/38315.
Full textIncludes bibliographical references (p. 81-89).
Data of massive size are now available in a wide variety of fields and come with great promise. In theory, these massive data sets allow data mining and exploration on a scale previously unimaginable. However, in practice, it can be difficult to apply classic data mining techniques to such massive data sets due to their sheer size. In this thesis, we study three algorithmic problems in data mining with consideration to the analysis of massive data sets. Our work is both theoretical and experimental - we design algorithms and prove guarantees for their performance and also give experimental results on real data sets. The three problems we study are: 1) finding a matrix of low rank that approximates a given matrix, 2) clustering high-dimensional points into subsets whose points lie in the same subspace, and 3) clustering objects by pairwise similarities/distances.
by Grant J. Wang.
Ph.D.
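The first problem in the abstract above, finding a low-rank matrix that approximates a given matrix, has a classical solution via the truncated singular value decomposition (the Eckart-Young theorem). A minimal sketch using NumPy (illustrative; the thesis studies algorithms for this problem rather than this textbook construction):

```python
import numpy as np

def best_rank_k(A, k):
    """Best rank-k approximation of A in the Frobenius and spectral norms,
    obtained by keeping the k largest singular values (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Scale the first k left singular vectors by the singular values,
    # then project back onto the first k right singular vectors.
    return (U[:, :k] * s[:k]) @ Vt[:k, :]
```

For massive data sets of the kind the thesis considers, computing the full SVD is exactly what becomes impractical, which motivates the sampling-based approximation algorithms studied there.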
Bala, Saimir. "Mining Projects from Structured and Unstructured Data." Jens Gulden, Selmin Nurcan, Iris Reinhartz-Berger, Widet Guédria, Palash Bera, Sérgio Guerreiro, Michael Fellman, Matthias Weidlich, 2017. http://epub.wu.ac.at/7205/1/ProjecMining%2DCamera%2DReady.pdf.
Dai, Jianyong. "Detecting malicious software by dynamic execution." Orlando, Fla. : University of Central Florida, 2009. http://purl.fcla.edu/fcla/etd/CFE0002798.
Liebchen, Gernot Armin. "Data cleaning techniques for software engineering data sets." Thesis, Brunel University, 2010. http://bura.brunel.ac.uk/handle/2438/5951.
Gu, Zhuoer. "Mining previously unknown patterns in time series data." Thesis, University of Warwick, 2017. http://wrap.warwick.ac.uk/99207/.
Somaraki, Vassiliki. "A framework for trend mining with application to medical data." Thesis, University of Huddersfield, 2013. http://eprints.hud.ac.uk/id/eprint/23482/.
Full textDai, Jianyong. "DETECTING MALICIOUS SOFTWARE BY DYNAMICEXECUTION." Doctoral diss., University of Central Florida, 2009. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/2849.
Full textPh.D.
School of Electrical Engineering and Computer Science
Engineering and Computer Science
Computer Science PhD
Roberts, J. (Juho). "Iterative root cause analysis using data mining in software testing processes." Master's thesis, University of Oulu, 2016. http://urn.fi/URN:NBN:fi:oulu-201604271548.
To maintain a competitive advantage, companies must keep up with the latest developments in the market. Big data and its refinement, data mining, are currently buzzwords in the IT and marketing industries, among others. As the Internet of Things and the fifth-generation mobile network (5G) become more common, the importance of data mining will grow further. Companies must be able to recognize the significance of the big data they generate in their own operations, and consider how to apply data mining methods to create a competitive advantage. Fault analysis of mobile network base stations is challenging because of the complex nature of the base stations and the enormous volume of data they output. The goal of this study is to evaluate the suitability of data mining for supporting fault analysis. This master's thesis was carried out as action research at a company that manufactures mobile network base stations. It consists of a literature review, which examines how data mining has been applied to fault analysis in earlier studies, and an empirical part, which presents a novel iterative approach to fault analysis using data mining. The data mining was carried out with a tool called Splunk, but the theory presented in the study can be applied with other tools as well. Log files generated by a base station were mined to determine which software component of the base station prevented it from meeting its performance quality requirements. The results showed that data mining is an excellent approach to fault analysis, and a considerable efficiency gain compared with the current manual analysis.
Poyias, Andreas. "Engineering compact dynamic data structures and in-memory data mining." Thesis, University of Leicester, 2018. http://hdl.handle.net/2381/42282.
Kriukov, Illia. "Multi-version software quality analysis through mining software repositories." Thesis, Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-74424.
Kidwell, Billy R. "MiSFIT: Mining Software Fault Information and Types." UKnowledge, 2015. http://uknowledge.uky.edu/cs_etds/33.
Tibbetts, Kevin (Kevin Joseph). "Data mining for structure type prediction." Thesis, Massachusetts Institute of Technology, 2004. http://hdl.handle.net/1721.1/34413.
Full textIncludes bibliographical references (p. 41-42).
Determining the stable structure types of an alloy is critical to determining many properties of that material. This can be done through experiment or computation. Both methods can be expensive and time-consuming. Computational methods require energy calculations for hundreds of structure types. Computation time would be greatly improved if this large number of possible structure types were reduced. A method is discussed here to predict the stable structure types for an alloy based on compiled data, including experimentally observed stable structure types and calculated energies of structure types. In this paper I will describe the state of this technology, including an overview of past and current work. Curtarolo et al. showed a factor-of-three improvement in the number of calculations required to determine a given percentage of the ground-state structure types for an alloy system, by using correlations among a database of over 6000 calculated energies. I will show correlations among experimentally determined stable structure types appearing in the same alloy system, through statistics computed from the Pauling File Inorganic Materials Database, Binaries edition. I will compare a method to predict stable structure types based on correlations among pairs of structure types that appear in the same alloy system with a method based simply on the frequency of occurrence of each structure type, and I will show a factor-of-two improvement in the number of calculations required to determine the ground-state structure types between these two methods. This paper will also examine the potential market value for a software tool used to predict likely stable structure types; a timeline for introduction of this product and an analysis of the market for such a tool are included.
There is no established market for structure-type prediction software, but the market will be similar to that of materials database software and energy calculation software. The potential market is small, but the production and maintenance costs are also small. These small costs, combined with the potential of this tool to improve greatly over time, make this a potentially promising investment. These methods are still in development. The key to the value of this tool lies in the accuracy of the prediction methods developed over the next few years.
by Kevin Tibbetts.
M.Eng.
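The co-occurrence-based ranking that the thesis above compares against a plain frequency baseline can be sketched as follows. The structure-type labels and alloy systems in the usage are invented for illustration; the thesis's statistics come from the Pauling File database, not from this toy data:

```python
from collections import Counter
from itertools import combinations

def rank_candidates(known_systems, observed, candidates):
    """Rank candidate structure types for a new alloy system.

    known_systems: list of sets of structure types seen together in past systems
    observed: structure types already confirmed in the new system
    candidates: structure types to rank
    Each candidate is scored by how often it co-occurred with the observed
    types; ties are broken by overall frequency (the baseline method).
    """
    freq = Counter()
    cooc = Counter()
    for system in known_systems:
        freq.update(system)
        for a, b in combinations(sorted(system), 2):
            cooc[(a, b)] += 1
            cooc[(b, a)] += 1

    def score(c):
        return (sum(cooc[(c, o)] for o in observed), freq[c])

    return sorted(candidates, key=score, reverse=True)
```

Ranking candidates this way means the energies of the highest-ranked structure types can be computed first, which is the source of the reduction in calculations that the abstract reports.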
Hu, Weikun. "Overdue invoice forecasting and data mining." Thesis, Massachusetts Institute of Technology, 2016. http://hdl.handle.net/1721.1/104327.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 64-67).
Accounts receivable are one of the main challenges in business operations. With poor management of the invoice-to-cash collection process, overdue invoices may pile up, and the increasing amount of unpaid invoices may lead to cash flow problems. In this thesis, I address a proactive approach to improving accounts receivable management using predictive modeling. To complete the task, I built supervised learning models to identify delayed invoices in advance and made recommendations on improving the performance of the order-to-cash collection process. The main procedures of the research work are data cleaning and processing, statistical analysis, building machine learning models, and evaluating model performance. The analysis and modeling in this study are based on real-world invoice data from a Fortune 500 company. The thesis also discusses approaches to dealing with imbalanced data, including sampling techniques, performance measurements, and ensemble algorithms. The invoice data used in this thesis are imbalanced, because the on-time invoice and delayed invoice classes are not approximately equally represented. The cost-sensitive learning techniques demonstrate a favorable improvement in classification results. The results of the thesis reveal that supervised machine learning models can predict the potential late payment of invoices with high accuracy.
by Weikun Hu.
S.M. in Transportation
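Among the imbalanced-data remedies the thesis above discusses are sampling techniques. A minimal sketch of one of them, random oversampling of the minority (delayed-invoice) class; the function name and the tiny data set in the usage are illustrative assumptions, not the thesis's code or data:

```python
import random

def oversample_minority(X, y, seed=0):
    """Random oversampling: duplicate minority-class examples until every
    class has as many rows as the largest class (a common remedy when
    delayed invoices are rare relative to on-time ones)."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        X_out.extend(rows)
        y_out.extend([label] * len(rows))
        # Draw extra copies at random until this class reaches the target size
        for _ in range(target - len(rows)):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out
```

The balanced sample is then fed to an ordinary classifier; cost-sensitive learning, which the thesis found favorable, instead leaves the data unchanged and weights misclassification errors asymmetrically.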
Kim, Edward Soo. "Data-mining natural language materials syntheses." Thesis, Massachusetts Institute of Technology, 2019. https://hdl.handle.net/1721.1/122075.
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Materials Science and Engineering, 2019
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references.
Discovering, designing, and developing a novel material is an arduous task, involving countless hours of human effort and ingenuity. While some aspects of this process have been vastly accelerated by the advent of first-principles-based computational techniques and high throughput experimental methods, a vast ocean of untapped historical knowledge lies dormant in the scientific literature. Namely, the precise methods by which many inorganic compounds are synthesized are recorded only as text within journal articles. This thesis aims to realize the potential of this data for informing the syntheses of inorganic materials through the use of data-mining algorithms. Critically, the methods used and produced in this thesis are fully automated, thus maximizing the impact for accelerated synthesis planning by human researchers.
There are three primary objectives of this thesis: 1) aggregate and codify synthesis knowledge contained within scientific literature, 2) identify synthesis "driving factors" for different synthesis outcomes (e.g., phase selection) and 3) autonomously learn synthesis hypotheses from the literature and extend these hypotheses to predicted syntheses for novel materials. Towards the first goal of this thesis, a pipeline of algorithms is developed in order to extract and codify materials synthesis information from journal articles into a structured, machine readable format, analogous to existing databases for materials structures and properties. To efficiently guide the extraction of materials data, this pipeline leverages domain knowledge regarding the allowable relations between different types of information (e.g., concentrations often correspond to solutions).
Both unsupervised and supervised machine learning algorithms are also used to rapidly extract synthesis information from the literature. To examine the autonomous learning of driving factors for morphology selection during hydrothermal syntheses, TiO₂ nanotube formation is found to be correlated with NaOH concentrations and reaction temperatures, using models that are given no internal chemistry knowledge. Additionally, the capacity for transfer learning is shown by predicting phase symmetry in materials systems unseen by models during training, outperforming heuristic physically-motivated baseline strategies, and again with chemistry-agnostic models. These results suggest that synthesis parameters possess some intrinsic capability for predicting synthesis outcomes. The nature of this linkage between synthesis parameters and synthesis outcomes is then further explored by performing virtual synthesis parameter screening using generative models.
Deep neural networks (variational autoencoders) are trained to learn low-dimensional representations of synthesis routes on augmented datasets, created by aggregated synthesis information across materials with high structural similarity. This technique is validated by predicting ion-mediated polymorph selection effects in MnO₂, using only data from the literature (i.e., without knowledge of competing free energies). This method of synthesis parameter screening is then applied to suggest a new hypothesis for solvent-driven formation of the rare TiO₂ phase, brookite. To extend the capability of synthesis planning with literature-based generative models, a sequence-based conditional variational autoencoder (CVAE) neural network is developed. The CVAE allows a materials scientist to query the model for synthesis suggestions of arbitrary materials, including those that the model has not observed before.
In a demonstrative experiment, the CVAE suggests the correct precursors for literature-reported syntheses of two perovskite materials using training data published more than a decade prior to the target syntheses. Thus, the CVAE is used as an additional materials synthesis screening utility that is complementary to techniques driven by density functional theory calculations. Finally, this thesis provides a broad commentary on the status quo for the reporting of written materials synthesis methods, and suggests a new format which improves both human and machine readability. The thesis concludes with comments on promising future directions which may build upon the work described in this document.
by Edward Soo Kim.
Ph. D.
Ph.D. Massachusetts Institute of Technology, Department of Materials Science and Engineering
Wang, Jie. "MATRIX DECOMPOSITION FOR DATA DISCLOSURE CONTROL AND DATA MINING APPLICATIONS." UKnowledge, 2008. http://uknowledge.uky.edu/gradschool_diss/677.
Full textDondero, Robert Michael Jr Hislop Gregory W. "Predicting software change coupling /." Philadelphia, Pa. : Drexel University, 2008. http://hdl.handle.net/1860/2759.