Dissertations / Theses on the topic 'Panel data analysis and Exploratory Data analysis'

Consult the top 50 dissertations / theses for your research on the topic 'Panel data analysis and Exploratory Data analysis.'

1

Park, Jeong Il. "Foreign direct investment and sustainable local economic development: spatial patterns of manufacturing foreign direct investment and its impacts on middle class earnings." Diss., Georgia Institute of Technology, 2014. http://hdl.handle.net/1853/51851.

Abstract:
Foreign Direct Investment (FDI) in the United States, which predominately occurs in the manufacturing sector, remains critically important for a strong regional and local economy, due to the resulting increase in employment, wages, and tax revenue. Traditionally, local economic development strategies have focused on attracting external manufacturing plants or facilities as the primary route to economic growth, through the expansion of the tax base and/or an increase in employment. In comparison, Sustainable Local Economic Development (SLED) emphasizes the establishment of a minimum standard of living for all and an increase in this standard over time; a reduction in the steady growth in inequality among people; a reduction in spatial inequality; and the promotion and encouragement of sustainable resource use and production (Blakely & Leigh, 2010). These essential SLED principles motivate this study, which will seek to develop a better understanding of whether and how FDI contributes to SLED in terms of its spatial patterns and its impact on middle class earnings. By selecting Georgia as a case study area, this research specifically examines whether and how the location of manufacturing FDI has reduced (or increased) spatial inequality at the intra-state and intra-metropolitan levels. It also identifies whether and how manufacturing FDI has reduced (or increased) inequality among people, focusing on its impact on middle class earnings. This study finds a strong spatial concentration of manufacturing FDI employment in metropolitan areas, particularly in a large metropolitan area, at the intra-state spatial pattern analysis. The results of panel regression analysis suggest that presence of agglomeration economies in metropolitan areas has positively influenced the location of manufacturing FDI jobs. The study also finds a suburbanization pattern of manufacturing FDI employment in the intra-metropolitan spatial pattern analysis. 
This intra-metropolitan suburbanization of FDI in manufacturing jobs is associated with loss of urban industrial land in the central areas within a large metropolitan area. These uneven distribution patterns of manufacturing FDI jobs indicate increased spatial inequality at both intra-state and intra-metropolitan levels, but the implications of this finding are mixed. Using individual earnings data from the American Community Survey Public Use Microdata Sample files, this study also conducts a quantile regression to estimate the earnings distribution effects that a concentration of manufacturing FDI may have on different earnings groups. The findings from both the place-of-work and place-of-residence earnings analyses suggest that manufacturing FDI generally has reduced inequality among people. The concentration of manufacturing FDI in a certain area shows the largest distribution effects on area workers in the lower earnings group and residents in the middle earnings group.
2

Wu, Shaowen. "Nonstationary panel data analysis." Connect to resource, 1998. http://rave.ohiolink.edu/etdc/view.cgi?acc%5Fnum=osu1261321005.

3

He, Xin. "Semiparametric analysis of panel count data." Diss., Columbia, Mo. : University of Missouri-Columbia, 2007. http://hdl.handle.net/10355/4774.

Abstract:
Thesis (Ph. D.)--University of Missouri-Columbia, 2007.
The entire dissertation/thesis text is included in the research.pdf file; the official abstract appears in the short.pdf file (which also appears in the research.pdf); a non-technical general description, or public abstract, appears in the public.pdf file. Title from title screen of research.pdf file (viewed on November 27, 2007). Vita. Includes bibliographical references.
4

Romaniuk, Helena. "Analysis of product usage panel data." Thesis, University of Southampton, 1999. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.326798.

5

Laxminarayan, Parameshvyas. "Exploratory analysis of human sleep data." Worcester, Mass. : Worcester Polytechnic Institute, 2004. http://www.wpi.edu/Pubs/ETD/Available/etd-0119104-120134/.

Abstract:
Thesis (M.S.)--Worcester Polytechnic Institute.
Keywords: association rule mining; logistic regression; statistical significance of rules; window-based association rule mining; data mining; sleep data. Includes bibliographical references (leaves 166-167).
6

Walls, L. A. "The exploratory analysis of reliability data." Thesis, Nottingham Trent University, 1987. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.377574.

Abstract:
The thesis outlines the usual parametric analysis of field failure time data for repairable equipment. Due to shortcomings of this black-box approach, exploratory reliability analysis has been adopted to exploit the available data and so learn more about the physical failure process. Elements of exploratory analysis have appeared in recent statistical applications of point process, time series and multivariate methods in the area. These approaches are reviewed and investigated. Exploratory analysis of much field time between failure and limited repair time data for hardware equipment has been undertaken. Despite arising from different physical mechanisms, software failure interval data has the same underlying statistical point process as such hardware data and has been similarly investigated. Simple graphs, often with simulation bounds, inference procedures for nonhomogeneous Poisson processes and Box-Jenkins analysis have been used to search for and model aspects of structure expected in reliability data. The appropriateness of the methods is discussed. As well as revealing that (constant) failure rates are often unsuitable summaries, exploratory analysis has highlighted features previously unknown or ignored. The identified time structures, data irregularities and other complexities are described. Exploratory analysis indicated potential dependent failures. A simulation-based graphical tool for highlighting these important events is described. Applications to real data have shown this is a promising approach. Principal coordinates and cluster analyses have been used to explore multivariate field data for automatic fire detection systems in an attempt to identify circumstances leading to false alarms. Data problems limited this analysis. Exploratory analysis has revealed that it is common in reliability to assume too simplistic a model formulation compared with the true complex data structures. The implications of this for reliability data collection, storage and analysis are discussed. While an exploratory approach is generally successful, some specialisation of standard statistical methods for reliability is desirable.
7

Mamo, Fikirte. "Economic growth and Inflation : A panel data analysis." Thesis, Södertörns högskola, Institutionen för samhällsvetenskaper, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:sh:diva-17463.

Abstract:
One of the most important objectives for any country is to sustain high economic growth. Even though there are other main factors that affect economic growth, this paper is concerned only with inflation. The relationship between economic growth and inflation is debatable. The first objective of this study is to investigate the relationship between inflation and economic growth. The study uses panel data covering 13 Sub-Saharan African (SSA) countries from 1969 to 2009. To analyze the data, the model takes economic growth as the dependent variable and four variables (inflation, investment, population and initial GDP) as independent variables. The results indicate that there is a negative relationship between economic growth and inflation. The study also examines the causal relationship between economic growth and inflation using a panel Granger causality test. The panel Granger causality test shows that inflation can be used to predict growth for all countries in the sample, while the opposite is true only for Congo, Dem. Rep. and Zimbabwe.
8

Bun, Maurice Josephus Gerardus. "Accurate statistical analysis in dynamic panel data models." [Amsterdam : Amsterdam : Thela Thesis] ; Universiteit van Amsterdam [Host], 2001. http://dare.uva.nl/document/57690.

9

Müller, Werner, and Michaela Nettekoven. "A Panel Data Analysis: Research & Development Spillover." Department of Statistics and Mathematics, WU Vienna University of Economics and Business, 1998. http://epub.wu.ac.at/620/1/document.pdf.

Abstract:
Panel data analysis has become an important tool in applied econometrics and the respective statistical techniques are well described in several recent textbooks. However, for an analyst using these methods there remains the task of choosing a reasonable model for the behavior of the panel data. Of special importance is the choice between so-called fixed and random coefficient models. This choice can have a crucial effect on the interpretation of the analyzed phenomenon, which is demonstrated by an application on research and development spillover. (author's abstract)
Series: Forschungsberichte / Institut für Statistik
10

Karamancı, Kaan. "Exploratory data analysis for preemptive quality control." Thesis, Massachusetts Institute of Technology, 2009. http://hdl.handle.net/1721.1/53126.

Abstract:
Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009.
Includes bibliographical references (p. 113).
In this thesis, I proposed and implemented a methodology to perform preemptive quality control on low-tech industrial processes with abundant process data. This involves a four-stage process that includes understanding the process, interpreting and linking the available process parameter and quality control data, developing an exploratory data toolset, and presenting the findings in a visual and easily implementable fashion. In particular, the exploratory data techniques used rely on visual human pattern recognition through data projection and machine learning techniques for clustering. The presentation of findings is achieved via software that visualizes high-dimensional data with Chernoff faces. Performance is tested on both simulated and real industry data. The data obtained from a company were not suitable, but suggestions on how to collect suitable data were given.
by Kaan Karamancı.
M.Eng.
11

Hossain, Mahmud Shahriar. "Exploratory Data Analysis using Clusters and Stories." Diss., Virginia Tech, 2012. http://hdl.handle.net/10919/28085.

Abstract:
Exploratory data analysis aims to study datasets through the use of iterative, investigative, and visual analytic algorithms. Due to the difficulty in managing and accessing the growing volume of unstructured data, exploratory analysis of datasets has become harder than ever and a topic of interest to data mining researchers. In this dissertation, we study new algorithms for exploratory analysis of data collections using clusters and stories. Clustering brings together similar entities whereas stories connect dissimilar objects. The former helps organize datasets into regions of interest, and the latter explores latent information by connecting the dots between disjoint instances. This dissertation specifically focuses on five different research aspects to demonstrate the applicability and usefulness of clusters and stories as exploratory data analysis tools. In the area of clustering, we investigate whether clustering algorithms can be automatically "alternatized" and how they can be guided to obtain alternative results using flexible constraints as "scatter-gather" operations. We demonstrate the application of these ideas in many application domains, including studying the bat biosonar system and designing sustainable products. In the area of storytelling, we develop algorithms that can generate stories using distance, clique, and syntactic constraints. We explore the use of storytelling for studying document collections in the biomedical literature and intelligence analysis domain.
Ph. D.
12

Schröder, Martin. "Exploratory data analysis with non-linear and missing data in geochemistry." Thesis, Aston University, 2009. http://publications.aston.ac.uk/15384/.

Abstract:
Exploratory analysis of data seeks to find common patterns to gain insights into the structure and distribution of the data. In geochemistry it is a valuable means to gain insights into the complicated processes making up a petroleum system. Typically, linear visualisation methods like principal components analysis, linked plots, or brushing are used. These methods cannot be employed directly when dealing with missing data, and they struggle to capture global non-linear structures in the data, although they can do so locally. This thesis discusses a complementary approach based on a non-linear probabilistic model. The generative topographic mapping (GTM) enables the visualisation of the effects of very many variables on a single plot, which is able to incorporate more structure than a two-dimensional principal components plot. The model can deal with uncertainty and missing data, and allows for the exploration of the non-linear structure in the data. In this thesis a novel approach to initialise the GTM with arbitrary projections is developed. This makes it possible to combine GTM with algorithms like Isomap and fit complex non-linear structure like the Swiss-roll. Another novel extension is the incorporation of prior knowledge about the structure of the covariance matrix. This extension greatly enhances the modelling capabilities of the algorithm, resulting in a better fit to the data and better imputation capabilities for missing data. Additionally, an extensive benchmark study of the missing data imputation capabilities of GTM is performed. Further, a novel approach based on missing data is introduced to benchmark the fit of probabilistic visualisation algorithms on unlabelled data. Finally, the work is complemented by evaluating the algorithms on real-life datasets from geochemical projects.
13

Cancado, Luciana Pacheco. "Economic growth panel data evidence from Latin America /." Ohio : Ohio University, 2005. http://www.ohiolink.edu/etd/view.cgi?ohiou1127143858.

14

Lund, Christensen Mette. "Essays in empirical demand analysis : evidence from panel data /." Copenhagen, 2005. http://www.gbv.de/dms/zbw/500354073.pdf.

15

Salazar, Llano Lorena. "Portraying urban diversity patterns through exploratory data analysis." Doctoral thesis, Universitat Politècnica de Catalunya, 2019. http://hdl.handle.net/10803/668423.

Abstract:
This thesis analyzes the complexity of the urban system, described with multiple variables that represent the environmental, economic, and social characteristics of the city. The main motivation of the study is to portray the diversity of the city and its relationship with a better response to disturbances, and hence with its sustainability. The thesis aims to provide theoretical knowledge through the application of statistical and computational methodologies that are developed progressively in its chapters. The introduction presents the city as an abstract urban system and reviews the concepts and measures of diversity within the theoretical frameworks of sustainability, urban ecology, and complex systems theory. Afterward, the city of Barcelona is introduced as the case study: it is constituted by a set of districts and represented by an information system that contains temporal measurements of multiple environmental, economic, and social variables. A first approach to the sustainability of the city is made with the entropy of information as a measure of the urban system's diversity. The fundamental contribution of the thesis, however, is the application of Exploratory Multivariate Analysis (EMA) to the urban system: Principal Component Analysis (PCA), Multiple Factorial Analysis (MFA), and Hierarchical Cluster Analysis (HCA). From this EMA approach, diversity is analyzed by identifying the similarity (or dissimilarity) between the different parts that make up the urban system. Some other techniques based on computer science and physics are proposed to evaluate the temporal transformation of the urban system, understood as a three-dimensional data cloud that gradually deforms. Differentiated characteristics and distinctive functions of districts are identifiable in the EMA application to the case study. Moreover, the temporal dependency of the dataset reveals information about the districts' differentiation or homogenization trends. Finally, the conclusions of the most relevant results are presented and some future lines of research are proposed.
16

Tenev, Tichomir Gospodinov. "SpreadCube--a visualization tool for exploratory data analysis." Thesis, Massachusetts Institute of Technology, 1997. http://hdl.handle.net/1721.1/43924.

Abstract:
Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1997.
Includes bibliographical references (p. 153-154).
by Tichomir Gospodinov Tenev.
M.Eng.
17

Hopkins, Julie Anne. "Sampling designs for exploratory multivariate analysis." Thesis, University of Sheffield, 2000. http://etheses.whiterose.ac.uk/14798/.

Abstract:
This thesis is concerned with problems of variable selection, influence of sample size and related issues in the applications of various techniques of exploratory multivariate analysis (in particular, correspondence analysis, biplots and canonical correspondence analysis) to archaeology and ecology. Data sets (both published and new) are used to illustrate these methods and to highlight the problems that arise - these practical examples are returned to throughout as the various issues are discussed. Much of the motivation for the development of the methodology has been driven by the needs of the archaeologists providing the data, who were consulted extensively during the study. The first (introductory) chapter includes a detailed description of the data sets examined and the archaeological background to their collection. Chapters Two, Three and Four explain in detail the mathematical theory behind the three techniques. Their uses are illustrated on the various examples of interest, raising data-driven questions which become the focus of the later chapters. The main objectives are to investigate the influence of various design quantities on the inferences made from such multivariate techniques. Quantities such as the sample size (e.g. number of artefacts collected), the number of categories of classification (e.g. of sites, wares, contexts) and the number of variables measured compete for fixed resources in archaeological and ecological applications. Methods of variable selection and the assessment of the stability of the results are further issues of interest and are investigated using bootstrapping and procrustes analysis. Jack-knife methods are used to detect influential sites, wares, contexts, species and artefacts. Some existing methods of investigating issues such as those raised above are applied and extended to correspondence analysis in Chapters Five and Six. 
Adaptions of them are proposed for biplots in Chapters Seven and Eight and for canonical correspondence analysis in Chapter Nine. Chapter Ten concludes the thesis.
18

Chintagunta, Pradeep Kumar. "Issues in panel data analysis a theoretical and empirical investigation /." access full-text, 1990. http://libweb.cityu.edu.hk/cgi-bin/ezdb/umi-r.pl?9114533.pdf.

19

Bams, Wilhelmus Fransiscus Maria. "The term structure of interest rates a panel data analysis /." [Maastricht : Maastricht : Universiteit Maastricht] ; University Library, Maastricht University [Host], 1999. http://arno.unimaas.nl/show.cgi?fid=6719.

20

Fu, Bo, and 傅博. "Some topics in longitudinal data analysis and panel time series models." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2003. http://hub.hku.hk/bib/B31244166.

21

Zhang, Miao. "The comparison of stochastic frontier analysis with panel data models." Thesis, Loughborough University, 2012. https://dspace.lboro.ac.uk/2134/9643.

Abstract:
Since Koopmans raised the idea of efficiency in 1951, and since panel data were first introduced into efficiency analysis by Pitt and Lee (1981) and Schmidt and Sickles (1984), the techniques of stochastic frontier analysis have developed rapidly and stochastic frontier applications have become widely used in areas such as education, industry and hospitals. Most researchers, however, focus on only one aspect, either the development of new models or empirical applications. This thesis attempts to fill the gap by giving a general picture of the properties of different panel data stochastic frontier models, in both statistical and economic terms, through a comparison of different models applied to different production applications. The thesis also attempts to shed light on whether particular panel data stochastic frontier models are better suited to different data sets. The models selected range from the simplest case, with no heterogeneity or heteroscedasticity, to more complicated ones, with exogenous variables included in the models. The thesis examines not only classical models, such as Pitt and Lee (1981) and Battese and Coelli (1992, 1995), but also newly developed models, such as the latent class model and the fixed management model. On the economic side, the data selected cover both the microeconomic and macroeconomic levels, with applications to world GDP and the Italian manufacturing industry. The results show that the panel data stochastic frontier models perform better at the microeconomic level than at the macroeconomic level; that the classical models perform better than the newly developed ones; that some panel data stochastic frontier models make ideal assumptions whose requirements on the dataset are hard to meet; and that the influence of the exogenous variables is quite strong.
22

Vertua, Stefano. "DETERMINANTS OF THE PREVALENCE OF UNICORNS : A panel data analysis." Thesis, Umeå universitet, Nationalekonomi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-188020.

Abstract:
The objective of this study is to find out the determinants of the prevalence of unicorns per country. In order to do that, I have collected data from 45 countries in two time periods: from 2010 until 2018 and from 2019 until April 2021. The independent variables used are the GDP per capita, the ease of doing business, the network readiness index, the global innovation index and the expenditure on R&D as a percentage of the GDP. The analysis was carried out using panel data. The main results were that the expenditure on R&D is, by a large margin, the factor that contributes the most to the number of unicorns per capita. It is followed by the global innovation index and the GDP per capita.
23

Sakaguchi, Shosei. "Essays on Econometric Methods for Panel and Duration Data Analysis." Kyoto University, 2018. http://hdl.handle.net/2433/232205.

24

Nagaraj, Eashwar. "Skilled Immigration and the Great Recession: A Panel Data Analysis." Miami University / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=miami1578473970490175.

25

Ahunov, Husanboy, and Andreas Eriksson. "Sustainability Reporting by Swedish Family Firms : A Panel Data Analysis." Thesis, Uppsala universitet, Företagsekonomiska institutionen, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-387514.

Abstract:
Introduction - Sustainability reporting is becoming more and more important for businesses all around the world. Extant empirical literature investigating the relationship between family status and sustainability reporting provides inconclusive results. No previous studies investigated this association in the Swedish setting. Purpose - The purpose of this study is to investigate how family control and influence affects sustainability reporting behavior of Swedish listed firms. Theoretical framework – Sustainability disclosures are considered as effective means for companies to communicate with their stakeholders. Family firms are more concerned about their internal and external stakeholders in order to protect family’s socioemotional endowments. Methodology design – We use panel data on Swedish listed firms over the period of 2008-2015. We analyze data with random-effects ordered probit regression for panel data. Empirical findings - When we treat all family firms as homogenous, there are no statistically significant differences in the levels of reports of family and non-family firms. However, when we take into account internal contexts of family firms, we find that a family member(s) in top management or a family CEO make family firms more transparent about their sustainability performance. Conclusion – We document that presence of a family top manager(s) or of a family CEO is associated with higher level of details of sustainability reports. Family top managers are more likely to be concerned about internal and external stakeholders to preserve the family’s SEW.
26

Carter, Jason W. "Testing effectiveness of genetic algorithms for exploratory data analysis." Thesis, Monterey, California. Naval Postgraduate School, 1997. http://hdl.handle.net/10945/9065.

Abstract:
Approved for public release; distribution is unlimited
Heuristic methods of solving exploratory data analysis problems suffer from one major weakness: uncertainty regarding the optimality of the results. The developers of DaMI (Data Mining Initiative), a genetic algorithm designed to mine the CCEP (Comprehensive Clinical Evaluation Program) database in the search for a Persian Gulf War syndrome, proposed a method to overcome this weakness: reproducibility, the conjecture that consistent convergence on the same solutions is both necessary and sufficient to ensure a genetic algorithm has effectively searched an unknown solution space. We demonstrate the weakness of this conjecture in light of accepted genetic algorithm theory. We then test the conjecture by modifying the CCEP database with the insertion of an interesting solution of known quality and performing a discovery session using DaMI on this modified database. The necessity of reproducibility as a terminating condition is falsified by the algorithm finding the optimal solution without yielding strong reproducibility. The sufficiency of reproducibility as a terminating condition is analyzed by manual examination of the CCEP database in which strong reproducibility was experienced. Ex post facto knowledge of the solution space is used to prove that DaMI had not found the optimal solutions though it gave strong reproducibility, causing us to reject the conjecture that strong reproducibility is a sufficient terminating condition.
27

Abe, Hiroyasu (阿部 寛康). "Extensions of nonnegative matrix factorization for exploratory data analysis." Thesis, Doshisha University, 2017. https://doors.doshisha.ac.jp/opac/opac_link/bibid/BB13001149/?lang=0.

Abstract:
Nonnegative matrix factorization (NMF) is a matrix decomposition technique for analyzing nonnegative data matrices, that is, matrices whose elements are all nonnegative. In this thesis, we discuss extensions of NMF for exploratory data analysis that take into account common features of real nonnegative data matrices and ease of interpretation. In particular, we discuss probability distributions and divergences for zero-inflated data matrices and data matrices with outliers, the number of factor matrices (two-factor vs. three-factor decompositions), and orthogonality constraints on the factor matrices.
Doctor of Culture and Information Science
Doshisha University
28

Foley, Michael. "An Exploratory Statistical Analysis of NASDAQ Provided Trade Data." ScholarWorks @ UVM, 2014. http://scholarworks.uvm.edu/graddis/295.

Abstract:
Since Benoit Mandelbrot's discovery of the fractal nature of financial price series, the quantitative analysis of financial markets has been an area of increasing interest for scientists, traders, and regulators. Further, major technological advances over this time have facilitated not only financial innovations, but also the computational ability to analyze and model markets. The stylized facts are qualitative statistical signatures of financial market data that hold true across different stocks and over many different timescales. In pursuit of a mechanistic understanding of markets, we look to accurately quantify such statistics. With this quantification, we can test computational market models against the stylized facts and run controlled experiments. This requires both discovery of new stylized facts, and a persistent testing of old stylized facts on new data. Using NASDAQ provided data covering the years 2008-2009, we analyze the trades of 120 stocks. An analysis of the stylized facts guides our exploration of the data, where our results corroborate other findings in the existing body of literature. In addition, we search for statistical indicators of market instability in our data sets. We find promising areas for further study, and obtain three key results. Throughout our sample data, high frequency trading plays a larger role in rapid price changes of all sizes than would be randomly expected, but plays a smaller role than usual during rapid price changes of large magnitude. Our analysis also yields further evidence of the long term persistence in the autocorrelations of signed order flow, as well as evidence of long range dependence in price returns.
29

Serlenga, Laura. "Three essays on the panel data approach to an analysis of economics and financial data." Thesis, University of Edinburgh, 2004. http://hdl.handle.net/1842/25172.

Abstract:
This thesis provides an extension of panel data models to the analysis of economic and financial data, discusses methods of estimation and evaluation for such models and presents empirical applications. The thesis consists of three chapters. The first chapter proposes three alternative approaches to test the Permanent Income Hypothesis (PIH) in the context of dynamic panels: the aggregate consumption approach, the Euler equation approach and finally Friedman (1957)’s original characteristic tests. The empirical evidence, using the British Household Panel Survey (BHPS) data, strongly supports the PIH. Hence, the analysis presented can be considered as supporting the view that empirical tests of the PIH, based on aggregate time-series data, might suffer from misspecification or overlook some fundamental characteristics of micro data. The second chapter addresses the issue of testing for factor price misspecification via a panel data approach. A theoretically coherent framework based on panel data techniques has been constructed. This allows for both the homogeneous and heterogeneous parameters that are present when testing for anomalies in factor pricing models. The tests presented have a clear advantage over the traditional two-pass based tests because they do not suffer from the errors-in-variables problem and have all the usual desirable asymptotic properties associated with the maximum likelihood approach. The empirical illustration shows that the firm-specific characteristics of book-to-market equity and market value help explain asset returns in the UK over 1968-2002 even when all three of Fama and French’s factors are present. This finding, which is in contrast to much of the literature, may be due to the improved efficiency of our estimates and power of our tests relative to those based on the two-pass method predominant in the existing literature.
Finally, we find that the overall significance of firm size is attributable to its importance during the 1980s subsample, a period of relatively calm stock market growth. Lastly, the third chapter presents an application of Hausman-Taylor estimation in heterogeneous panels with time-specific common factors to gravity models of intra-EU trade. As an extension to the Hausman-Taylor procedure, an efficient instrumental variable estimation of a panel data model, which includes time-specific common factors and their heterogeneous individual parameters, is presented. The underlying econometric techniques are developed and an alternative source of instruments is suggested. This methodology is applied to gravity models for international flows of trade using data on fifteen European countries over 42 years (1960-2001). Following the most recent developments of the literature, a complete analysis of the sources of bilateral trade amongst European countries is presented using three different specifications. The final model is also extended in an original way to allow for observed common factors and their heterogeneous parameters. The empirical evidence confirms the effectiveness of the gravity model in explaining international trade flows. Results also encourage the use of our extended approach as a valid alternative to the basic time dummy specification.
30

Yuan, Yuan. "Bayesian Conjoint Analyses with Multi-Category Consumer Panel Data." University of Cincinnati / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=ucin162766827512258.

31

Dzansi, James Yao. "Human Capital and Economic Performance : Empirical evidence from Panel Data Analysis." Thesis, Jönköping University, Jönköping International Business School, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:hj:diva-273.

32

Akinc, Deniz. "Statistical Modelling Of Financial Statements Of Turkey: A Panel Data Analysis." Master's thesis, METU, 2008. http://etd.lib.metu.edu.tr/upload/2/12609824/index.pdf.

Abstract:
Financial failure is an important subject for both the economic development of the country and the self-evaluation of individual companies. An increase in the number of financially failed companies points to misuse of the country's resources. Recently, financial failure has threatened both small and large companies in Turkey. It is important to determine the factors that affect financial failure by analyzing models and to use these models for auditing the financial situation. In today's Turkey, the statistical methods used for this purpose involve single-level models applied to cross-sectional data. However, multilevel models applied to panel data are preferable, as they gather more information and make the calculated financial success probabilities more trustworthy. In this thesis, publicly available panel data collected from the Istanbul Stock Exchange are investigated. Mainly, the financial success of companies from two sectors, namely industry and services, is investigated. For the analysis of these panel data, data exploration methods, missing data imputation, possible solutions to the multicollinearity problem, single-level logistic regression models and multilevel models are used. With these models, financial success probabilities for each company are calculated; the factors related to financial failure are determined, and changes over time are observed. Models and early warning systems resulted in correct classification rates of up to 100%. In the services sector, the small number of companies with publicly available data results in a decline in the success of the models. It is concluded that sharing with academicians data on more subjects, observed over a longer time period and collected in the same format, will result in better justified outputs, which are useful for both academicians and managers.
33

Melku, Semere M. "Exchange Rate Volatility and Foreign Direct Investment : A Panel Data Analysis." Thesis, Södertörns högskola, Institutionen för samhällsvetenskaper, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:sh:diva-16995.

Abstract:
This thesis examines both the long-run and the short-run impact of exchange rate volatility on foreign direct investment using unbalanced panel data from three Sub-Saharan African countries: Kenya, Uganda and Tanzania. Exchange rate volatility figures are generated with the GARCH(1,1) methodology. The control variables are GDP, GDP growth, economic openness and the exchange rate. In order to capture the impact of economic openness on exchange rate volatility and thus foreign direct investment, different econometric specifications are adopted. The time period covered by the unbalanced panel differs across the countries considered.
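As a rough illustration of how a GARCH(1,1) volatility series is built, the sketch below applies the standard recursion sigma2[t] = omega + alpha * r[t-1]^2 + beta * sigma2[t-1] to a list of returns. It is only a sketch under stated assumptions: the parameter values, the toy returns, the function name, and the choice to start from the sample variance are hypothetical, and in applied work omega, alpha and beta would be estimated (typically by maximum likelihood) rather than fixed by hand.

```python
def garch11_variance(returns, omega, alpha, beta):
    """Conditional variances from the GARCH(1,1) recursion.

    sigma2[t] = omega + alpha * returns[t-1]**2 + beta * sigma2[t-1]
    """
    if not returns:
        return []
    # Initialise with the sample variance (a common but arbitrary choice).
    mean = sum(returns) / len(returns)
    sigma2 = [sum((r - mean) ** 2 for r in returns) / len(returns)]
    for t in range(1, len(returns)):
        sigma2.append(omega + alpha * returns[t - 1] ** 2 + beta * sigma2[t - 1])
    return sigma2


# Hypothetical exchange-rate returns and parameters, for illustration only.
returns = [0.01, -0.02, 0.015, -0.005, 0.03]
variances = garch11_variance(returns, omega=1e-5, alpha=0.1, beta=0.85)
volatility = [v ** 0.5 for v in variances]  # conditional standard deviations
```

The resulting `volatility` series is the kind of generated regressor that would then enter a panel model alongside the control variables.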
34

Son, Sang-ik. "Financial sector development and economic growth: evidence from panel data analysis." Thesis, University of Manchester, 2003. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.488435.

35

Adnan, Arisman. "Analysis of taste-panel data using ANOVA and ordinal logistic regression." Thesis, University of Newcastle Upon Tyne, 2002. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.402150.

36

Fernández-Val, Iván. "Three essays on nonlinear panel data models and quantile regression analysis." Thesis, Massachusetts Institute of Technology, 2005. http://hdl.handle.net/1721.1/32408.

Abstract:
Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Economics, 2005. Includes bibliographical references. This dissertation is a collection of three independent essays in theoretical and applied econometrics, organized in the form of three chapters. In the first two chapters, I investigate the properties of parametric and semiparametric fixed effects estimators for nonlinear panel data models. The first chapter focuses on fixed effects maximum likelihood estimators for binary choice models, such as the probit, logit, and linear probability models. These models are widely used in economics to analyze decisions such as labor force participation, union membership, migration, purchase of durable goods, marital status, or fertility. The second chapter looks at generalized method of moments estimation in panel data models with individual-specific parameters. An important example of these models is a random coefficients linear model with endogenous regressors. The third chapter (co-authored with Joshua Angrist and Victor Chernozhukov) studies the interpretation of quantile regression estimators when the linear model for the underlying conditional quantile function is possibly misspecified.
37

Akhter, Md Selim. "Financial soundness and development: a multi-country analysis using panel data." Thesis, University of Western Sydney, 2008. http://handle.uws.edu.au:8081/1959.7/41341.

Abstract:
Thesis (Ph.D.)--University of Western Sydney, 2008. A thesis submitted to the University of Western Sydney, College of Business, School of Economics and Finance, in fulfillment of the requirements for the degree of Doctor of Philosophy. Includes bibliographical references.
38

Hossain, Md Jobaer. "Factors Influencing FDI Inflows in South Asian Countries: A Panel Data Analysis." Thesis, KTH, Skolan för industriell teknik och management (ITM), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-263960.

Abstract:
Foreign direct investment (FDI) plays a vital role in boosting the economies of developing countries. Hence, it is necessary to know the factors that determine the flows of FDI into developing countries. This study investigates how different factors affect the inflow of foreign direct investment in South Asian countries. To this end, the study collected data on the relevant variables for seven countries over 45 years. The relationship between different economic variables and their overall impact on FDI inflows has been examined through various panel models: basic pooled OLS estimation, the entity fixed effects model, time fixed effects estimation and the random effects model. The outcome of this study is that a country's GDP is the main factor behind FDI inflows in South Asian countries.
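The difference between pooled OLS and the entity fixed effects model can be made concrete with a small sketch. The block below is an illustration under assumptions, not the study's code or data: it uses a single regressor, two hypothetical entities with made-up numbers, and hand-written helper names. Pooled OLS fits one line through all observations, while the fixed effects (within) estimator first subtracts each entity's own means, which removes any time-invariant entity effect.

```python
from collections import defaultdict


def pooled_ols_slope(x, y):
    """Slope of a single-regressor OLS fit that ignores the panel structure."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def within_slope(entity, x, y):
    """Entity fixed effects slope via the within (demeaning) transformation."""
    groups = defaultdict(list)
    for e, a, b in zip(entity, x, y):
        groups[e].append((a, b))
    xd, yd = [], []
    for rows in groups.values():
        mx = sum(a for a, _ in rows) / len(rows)
        my = sum(b for _, b in rows) / len(rows)
        xd += [a - mx for a, _ in rows]  # deviations from the entity mean
        yd += [b - my for _, b in rows]
    return sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)


# Made-up panel: y = 2 * x + entity effect (10 for "A", 0 for "B").
entity = ["A", "A", "A", "B", "B", "B"]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [12.0, 14.0, 16.0, 8.0, 10.0, 12.0]

pooled = pooled_ols_slope(x, y)      # distorted by the omitted entity effect
fixed = within_slope(entity, x, y)   # recovers the common slope of 2
```

In this toy panel the pooled slope even has the wrong sign, because the entity with the larger effect happens to have the smaller regressor values; this is the kind of omitted-heterogeneity bias that motivates comparing the specifications.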
39

Lam, Heidi Lap Mun. "Visual exploratory analysis of large data sets : evaluation and application." Thesis, University of British Columbia, 2008. http://hdl.handle.net/2429/839.

Abstract:
Large data sets are difficult to analyze. Visualization has been proposed to assist exploratory data analysis (EDA) as our visual systems can process signals in parallel to quickly detect patterns. Nonetheless, designing an effective visual analytic tool remains a challenge. This challenge is partly due to our incomplete understanding of how common visualization techniques are used by human operators during analyses, either in laboratory settings or in the workplace. This thesis aims to further understand how visualizations can be used to support EDA. More specifically, we studied techniques that display multiple levels of visual information resolutions (VIRs) for analyses using a range of methods. The first study is a summary synthesis conducted to obtain a snapshot of knowledge in multiple-VIR use and to identify research questions for the thesis: (1) low-VIR use and creation; (2) spatial arrangements of VIRs. The next two studies are laboratory studies to investigate the visual memory cost of image transformations frequently used to create low-VIR displays and overview use with single-level data displayed in multiple-VIR interfaces. For a more well-rounded evaluation, we needed to study these techniques in ecologically-valid settings. We therefore selected the application domain of web session log analysis and applied our knowledge from our first three evaluations to build a tool called Session Viewer. Taking the multiple coordinated view and overview + detail approaches, Session Viewer displays multiple levels of web session log data and multiple views of session populations to facilitate data analysis from the high-level statistical to the low-level detailed session analysis approaches. Our fourth and last study for this thesis is a field evaluation conducted at Google Inc. with seven session analysts using Session Viewer to analyze their own data with their own tasks. 
Study observations suggested that displaying web session logs at multiple levels using the overview + detail technique helped bridge between high-level statistical and low-level detailed session analyses, and the simultaneous display of multiple session populations at all data levels using multiple views allowed quick comparisons between session populations. We also identified design and deployment considerations to meet the needs of diverse data sources and analysis styles.
40

Wu, Ying. "Non-standard adaptation of linear projections for exploratory data analysis." Thesis, University of the West of Scotland, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.742771.

41

Evaneshko, Veronica. "Exploratory data analysis of type II diabetes among Navajo Indians." Thesis, The University of Arizona, 1988. http://hdl.handle.net/10150/276762.

Abstract:
This research explicated the use of exploratory data analysis in describing type II diabetes mellitus among the Navajo Indians. A sample of 98 diagnosed diabetics was obtained from a retrospective chart review and had a 1.3:1 female to male ratio, a median age of 58.6 years, and a mean duration for diabetes of 7.66 years. Other characteristics included a median age at diagnosis of 50 years, a median weight prior to diagnosis (expressed in percent desired weight) of 140%, and a median blood glucose value at time of diagnosis of 241 mg/dl. The distribution patterns for age, weight, and blood glucose revealed several asymmetry problems which had implications for the appropriateness of using parametric statistics in numerical summarizations. Bivariate analyses revealed a negative association between age at diagnosis and percent desired weight prior to diagnosis. This finding identifies the risk that obesity brings to the young and that aging brings to the non-obese, Navajo Indian.
42

Tekieh, Mohammad Hossein. "Analysis of Healthcare Coverage Using Data Mining Techniques." Thèse, Université d'Ottawa / University of Ottawa, 2012. http://hdl.handle.net/10393/20547.

Abstract:
This study explores healthcare coverage disparity using a quantitative analysis on a large dataset from the United States. One of the objectives is to build supervised models, including decision trees and neural networks, to study the efficient factors in healthcare coverage. We also discover groups of people with health coverage problems and inconsistencies by employing unsupervised modeling, including the K-Means clustering algorithm. Our modeling is based on the dataset retrieved from the Medical Expenditure Panel Survey, with 98,175 records in the original dataset. After pre-processing the data, including binning, cleaning, dealing with missing values, and balancing, it contains 26,932 records and 23 variables. We build 50 classification models in IBM SPSS Modeler employing decision trees and neural networks. The accuracy of the models varies between 76% and 81%. The models can predict the healthcare coverage for a new sample based on its significant attributes. We demonstrate that the decision tree models provide higher accuracy than the models based on neural networks. Also, having extensively analyzed the results, we discover the most efficient factors in healthcare coverage to be: access to care, age, poverty level of family, and race/ethnicity.
43

Yu, Jihai. "Essays on spatial dynamic panel data model theories and applications." Columbus, Ohio: Ohio State University, 2007. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1179767430.

44

Mabhena, Rejoice. "An application of synthetic panel data to poverty analysis in South Africa." Thesis, University of the Western Cape, 2019. http://hdl.handle.net/11394/7801.

Abstract:
Doctor Educationis. There is a wide-reaching consensus that data required for poverty analysis in developing countries are inadequate. Concerns have been raised about the accuracy and adequacy of household surveys, especially those emanating from Sub-Saharan Africa. Part of the debate has hinted at the existence of a statistical tragedy, but caution has also been voiced that African statistical offices are not all alike and that some have stronger statistical capacities than others. The use of generalizations therefore fails to capture these variations. This thesis argues that African statistical offices are facing data challenges, but not necessarily to the extent insinuated. In the post-1995 period, there has been an increase in the availability of household surveys from developing countries. This has also been accompanied by an expansion of poverty analysis efforts. Despite this surge in data availability, available household survey data remain inadequate to meet the demand for answers to poverty-related enquiries. What is also evident is that cross-sectional household surveys were conducted more extensively than panel surveys. As a result, the paucity of panel data in developing countries is more pronounced. In South Africa, a country classified as ‘data rich’ in this thesis, there are too few panel surveys that are nationally representative and cover a comprehensive period after 1995. Existing knowledge on poverty dynamics in the country has relied mostly on the National Income Dynamic Study, the KwaZulu Natal Dynamic Study and smaller cohort-based panels, such as the Birth to Twenty and Birth to Ten cohort studies, that have rarely been used in the analysis of poverty dynamics. Using mixed methods, this thesis engages these data issues. The qualitative component of this thesis uses key informants from Statistics South Africa and explores how the organization has measured poverty over the years.
A historical background on the context of statistical conduct in the period before 1995 shows the shaky foundation that characterised statistical conduct in the country at the inception of Statistics South Africa in 1995. The organization has since expanded its efforts in poverty measurement, partly as a result of the availability of more household survey data. Improvements within the organization are also evidenced by the emergence of a fully-fledged Poverty and Inequality division. The agency has managed to embrace the measurement of multidimensional poverty. Nevertheless, there are issues surrounding available poverty-related data. Issues of comparability affect poverty analysis, and these are discussed in this thesis. The informants agreed that there is a need for more analysis of poverty using available surveys in South Africa. Against this backdrop, the use of pseudo panels to analyse poverty dynamics becomes an attractive option. Given the high costs associated with the conduct of panel surveys, pseudo panels are not only cost effective, but they also enable the analysis of new research questions that would not be possible using existing data in its traditional forms. Elsewhere, pseudo panels have been used in the analysis of poverty dynamics in the absence of genuine panel data, and the results have proved their importance. The methodology used to generate the pseudo panel in this thesis borrows from previous works, including the work of Deaton, and generates 13 birth cohorts using the Living Conditions Surveys of 2008/9 and 2014/15 as well as the IES of 2010. The birth cohorts, under a set of given assumptions, are ‘tracked’ in these three time periods. The thesis then analyses the expenditure patterns and poverty rates of the birth cohorts. The findings suggest that in South Africa, expenditures are driven mostly by incomes from the labour market and social grants.
The data, however, did not have adequate and comparable variables on the types of employment to explore this debate further. It also emerged that birth cohorts with male headship, as well as birth cohorts in urban settlements and in White and Indian households, have a higher percentage share of their income coming from labour market sources. On the other hand, birth cohorts with female headship and those residing in rural, African and Coloured households are more reliant on social grants. The majority of recipients of social grants receive the Child Social Grant, and its minimal value partly explains why birth cohorts reporting social grants as their main source of income are more likely to be poor than birth cohorts who mostly earn their income from the labour market. Residing in a female-headed household, in a rural area, or in a Black African or Coloured household increases the chances of experiencing poverty. This supports existing knowledge on poverty in South Africa and confirms that these groups are deprived. The results of the pseudo panel analysis also show that poverty fell between 2006 and 2011 for most birth cohorts but increased in 2015. Policy recommendations to reduce poverty therefore lie in the labour market. However, given the high levels of unemployment in the country today, more rigorous labour incentives are required.
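The pseudo-panel idea described above, in which birth cohorts rather than individual households are followed across repeated cross-sections, can be sketched as a grouping step. The block below is an illustrative assumption, not the thesis's procedure: the five-year cohort width, the starting year, the poverty line, the record layout, and all the numbers are hypothetical.

```python
from collections import defaultdict


def cohort_of(birth_year, start=1930, width=5):
    """First birth year of the cohort band that contains birth_year."""
    return start + ((birth_year - start) // width) * width


def pseudo_panel(records, poverty_line):
    """Cohort-by-survey cell statistics from repeated cross-sections.

    records: iterable of (survey_year, birth_year_of_head, expenditure).
    """
    cells = defaultdict(list)
    for survey_year, birth_year, expenditure in records:
        cells[(cohort_of(birth_year), survey_year)].append(expenditure)
    return {
        key: {
            "mean_expenditure": sum(values) / len(values),
            "poverty_rate": sum(e < poverty_line for e in values) / len(values),
            "n": len(values),
        }
        for key, values in cells.items()
    }


# Made-up households from two hypothetical survey rounds.
records = [
    (2008, 1962, 900.0), (2008, 1964, 1100.0), (2008, 1971, 400.0),
    (2014, 1962, 1200.0), (2014, 1963, 500.0), (2014, 1972, 450.0),
]
panel = pseudo_panel(records, poverty_line=600.0)
```

Each cohort then has one row per survey round, so `panel[(1960, 2008)]` and `panel[(1960, 2014)]` can be compared over time even though different households were interviewed in each round. In practice cell sizes matter, since small cells make the cohort means noisy.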
45

Kang, HyeJung. "Consolidation and productivity in Korean agriculture: analysis of farm-level panel data." For electronic version search Digital dissertations database. Restricted to UC campuses. Access is free to UC campus dissertations, 2004. http://uclibs.org/PID/11984.

46

Conti, Gabriella, Sylvia Frühwirth-Schnatter, James J. Heckman, and Rémi Piatek. "Bayesian exploratory factor analysis." Elsevier, 2014. http://dx.doi.org/10.1016/j.jeconom.2014.06.008.

Abstract:
This paper develops and applies a Bayesian approach to Exploratory Factor Analysis that improves on ad hoc classical approaches. Our framework relies on dedicated factor models and simultaneously determines the number of factors, the allocation of each measurement to a unique factor, and the corresponding factor loadings. Classical identification criteria are applied and integrated into our Bayesian procedure to generate models that are stable and clearly interpretable. A Monte Carlo study confirms the validity of the approach. The method is used to produce interpretable low dimensional aggregates from a high dimensional set of psychological measurements. (authors' abstract)
47

Taylor, Daniel. "Evaluating Spatial Variability of Precipitation in Kentucky with Exploratory Data Analysis." TopSCHOLAR®, 2004. http://digitalcommons.wku.edu/theses/540.

Abstract:
Spatial variability of precipitation is examined over the state of Kentucky and surrounding areas. The study focuses on the analysis of monthly precipitation totals from the period of 1961-2000. The purpose of the study is to develop a set of indices to represent the spatial variability of the study area for a given month. Various exploratory data analysis methods such as variography, kriging, and cluster analysis were used. The study attempts to quantify the second order (local) effects of the spatial variation of precipitation as a means to provide insight into the prediction of precipitation randomness. This task can be a difficult one because the distinction between first and second order effects is somewhat arbitrary. The study proposes that a qualitative map of mean monthly precipitation can be classified through the use of a quantitative measure. This approach allowed for the unique classification of numerous months of precipitation through the use of a standard methodology. The researcher found that trying to capture the spatial variation of precipitation with two indices is an arduous task. Months were classified based on percentiles of the variogram cloud. The data were condensed into distance bins for analysis and used for calculation of the indices. A Short Range Index (SRI) and Long Range Index (LRI) were calculated for each month. The indices were then analyzed through the use of cluster analysis. The Partitioning Around Medoids (PAM) method was used for the analysis, providing an average silhouette value of 0.32. The study found that the methods applied did not efficiently capture the spatial variability of precipitation across the study area. However, this study has provided insight into the methodologies that can be applied to investigate spatial patterns of precipitation.
48

Emerton, Guy. "Data-driven methods for exploratory analysis in chemometrics and scientific experimentation." Thesis, Stellenbosch : Stellenbosch University, 2014. http://hdl.handle.net/10019.1/86366.

Abstract:
Thesis (MSc)--Stellenbosch University, 2014. Background: New methods to facilitate exploratory analysis in scientific data are in high demand. There is an abundance of available data used only for confirmatory analysis from which new hypotheses can be drawn. To this end, two new exploratory techniques are developed: one for chemometrics and another for visualisation of fundamental scientific experiments. The former transforms large-scale multiple raw HPLC/UV-vis data into a conserved set of putative features - something not often attempted outside of Mass-Spectrometry. The latter method ('StatNet') applies network techniques to the results of designed experiments to gain new perspective on variable relations. Results: The resultant data format from un-targeted chemometric processing was amenable to both chemical and statistical analysis. It proved to have integrity when machine-learning techniques were applied to infer attributes of the experimental set-up. The visualisation techniques were equally successful in generating hypotheses, and were easily extendible to three different types of experimental results. Conclusion: The overall aim was to create useful tools for hypothesis generation in a variety of data. This has been largely reached through a combination of novel and existing techniques. It is hoped that the methods here presented are further applied and developed.
49

Bennett, Daniel. "Exploratory Data Analysis of the Large Scale Gas Injection Test (Lasgit)." Thesis, Cardiff University, 2014. http://orca.cf.ac.uk/66203/.

Abstract:
This thesis presents an Exploratory Data Analysis (EDA) performed on the dataset arising from the operation of the Large Scale Gas Injection Test (Lasgit). Lasgit is a field scale experiment located approximately 420m underground at the Äspö Hard Rock Laboratory (HRL) in Sweden. The experiment is designed to study the impact of gas build-up and subsequent migration through the Engineered Barrier System (EBS) of a KBS-3 concept radioactive waste repository. Investigation of the smaller scale, or ‘second order’ features of the dataset are the focus of the EDA, with the study of such features intended to contribute to the understanding of the experiment. In order to investigate Lasgit’s substantial (26 million datum point) dataset, a bespoke computational toolkit, the Non-Uniform Data Analysis Toolkit (NUDAT), designed to expose and quantify difficult to observe phenomena in large, non-uniform datasets has been developed. NUDAT has been designed with capabilities including non-parametric trend detection, frequency domain analysis, and second order event candidate detection. The various analytical modules developed and presented in this thesis were verified against simulated data that possessed prescribed and quantified phenomena, before application to Lasgit’s dataset. The Exploratory Data Analysis of Lasgit’s dataset presented in this thesis reveals and quantifies a number of phenomena, for example: the tendency for spiking to occur within groups of sensor records; estimates for the long term trends; the temperature profile of the experiment with depth and time along with the approximate seasonal variation in stress/pore-water pressure; and, in particular, the identification of second order event candidates as small as 0.1% of the macro-scale behaviours in which they reside. A selection of the second order event candidates have been aggregated together into second order events using the event candidates’ mutual synchronicities. 
Interpretation of these events suggests the possibility of small scale discrete gas flow pathways forming, possibly via a dilatant flow mechanism. The interpreted events’ typical behaviours, in addition to the observed spiking tendency, also support the grouping of sensors by sensor type. The developed toolkit, NUDAT, and its subsequent application to Lasgit’s dataset have enabled an investigation into the small scale, or ‘second order’ features of the experiment’s results. The analysis presented in this thesis provides insight into Lasgit’s experimental behaviour, and as such, contributes to the understanding of the experiment.
50

Sousa, Joel Agostinho Nunes Pinto de. "Exploratory Analysis of Meteorological Data." Master's thesis, 2019. https://hdl.handle.net/10216/126679.
