To see the other types of publications on this topic, follow the link: Matrix regression.

Dissertations / Theses on the topic 'Matrix regression'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 dissertations / theses for your research on the topic 'Matrix regression.'

Next to every source in the list of references there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses from a wide variety of disciplines and organise your bibliography correctly.

1

Fischer, Manfred M., and Philipp Piribauer. "Model uncertainty in matrix exponential spatial growth regression models." WU Vienna University of Economics and Business, 2013. http://epub.wu.ac.at/4013/1/wp158.pdf.

Full text
Abstract:
This paper considers the problem of model uncertainty associated with variable selection and specification of the spatial weight matrix in spatial growth regression models in general and growth regression models based on the matrix exponential spatial specification in particular. A natural solution, supported by formal probabilistic reasoning, is the use of Bayesian model averaging which assigns probabilities on the model space and deals with model uncertainty by mixing over models, using the posterior model probabilities as weights. This paper proposes to adopt Bayesian information criterion model weights since they have computational advantages over fully Bayesian model weights. The approach is illustrated for both identifying model covariates and unveiling spatial structures present in pan-European growth data. (authors' abstract)
Series: Department of Economics Working Paper Series
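The Bayesian information criterion model weights that the paper adopts in place of fully Bayesian weights follow the standard approximation w_m ∝ exp(−BIC_m/2). A minimal sketch of that computation (the BIC values below are invented for illustration, not taken from the paper):

```python
import numpy as np

def bic_model_weights(bics):
    """Approximate posterior model probabilities from BIC values.

    Uses the standard approximation w_m proportional to exp(-BIC_m / 2),
    shifted by the minimum BIC for numerical stability.
    """
    bics = np.asarray(bics, dtype=float)
    delta = bics - bics.min()          # delta-BIC relative to the best model
    w = np.exp(-0.5 * delta)
    return w / w.sum()                 # normalize to probabilities

# Hypothetical BIC values for three candidate spatial specifications
weights = bic_model_weights([1020.3, 1024.1, 1035.8])
```

Model averaging then mixes posterior quantities over models using these weights.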
APA, Harvard, Vancouver, ISO, and other styles
2

Piribauer, Philipp, and Manfred M. Fischer. "Model uncertainty in matrix exponential spatial growth regression models." Wiley-Blackwell, 2015. http://dx.doi.org/10.1111/gean.12057.

Abstract:
This paper considers the most important aspects of model uncertainty for spatial regression models, namely the appropriate spatial weight matrix to be employed and the appropriate explanatory variables. We focus on the spatial Durbin model (SDM) specification in this study that nests most models used in the regional growth literature, and develop a simple Bayesian model averaging approach that provides a unified and formal treatment of these aspects of model uncertainty for SDM growth models. The approach expands on the work by LeSage and Fischer (2008) by reducing the computational costs through the use of Bayesian information criterion model weights and a matrix exponential specification of the SDM model. The spatial Durbin matrix exponential model has theoretical and computational advantages over the spatial autoregressive specification due to the ease of inversion, differentiation and integration of the matrix exponential. In particular, the matrix exponential has a simple matrix determinant, whose logarithm vanishes for the case of a spatial weight matrix with a trace of zero (LeSage and Pace 2007). This allows for a larger domain of spatial growth regression models to be analysed with this approach, including models based on different classes of spatial weight matrices. The working of the approach is illustrated for the case of 32 potential determinants and three classes of spatial weight matrices (contiguity-based, k-nearest neighbor and distance-based spatial weight matrices), using a dataset of income per capita growth for 273 European regions. (authors' abstract)
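The determinant property behind the computational advantage, det exp(αW) = exp(α·tr W), so the log-determinant vanishes whenever the spatial weight matrix has zero trace, can be checked numerically. A quick sketch with a made-up symmetric 3×3 weight matrix (not data from the paper):

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical spatial weight matrix with zero diagonal, hence zero trace
W = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
alpha = -0.7

S = expm(alpha * W)                 # matrix exponential spatial operator
det_S = np.linalg.det(S)            # equals exp(alpha * trace(W)) = 1 here
sign, logdet = np.linalg.slogdet(S)
```

The vanishing log-determinant is what removes the costly determinant term from the likelihood of spatial autoregressive-type models.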
3

Li, Yihua M. Eng Massachusetts Institute of Technology. "Blind regression : understanding collaborative filtering from matrix completion to tensor completion." Thesis, Massachusetts Institute of Technology, 2016. http://hdl.handle.net/1721.1/105983.

Abstract:
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 37-39).
Neighborhood-based collaborative filtering (CF) methods have proven to be successful in practice and are widely applied in commercial recommendation systems. Yet theoretical understanding of their performance is lacking. In this work, we introduce a new framework of Blind Regression which assumes that there are latent features associated with input variables, and we observe outputs of some Lipschitz continuous function over those unobserved features. We apply our framework to the problem of matrix completion and give a nonparametric method which, similar to CF, combines the local estimates according to the distance between the neighbors. We use the sample variance of the difference in ratings between neighbors as a proxy for that distance. Through error analysis, we show that the minimum sample variance is a good proxy of the prediction error in the estimates. Experiments on real-world datasets suggest that our matrix completion algorithm outperforms classic user-user and item-item CF approaches. Finally, our framework easily extends to the setting of higher-order tensors and we present our algorithm for tensor completion. Results from a real-world application of image inpainting demonstrate that our method is competitive with the state-of-the-art tensor factorization approaches in terms of predictive performance.
by Yihua Li.
M. Eng.
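The variance-based neighbor selection described in the abstract can be sketched as follows. This is an illustrative reading of the idea, not the thesis code; the ratings matrix and the variance threshold are invented:

```python
import numpy as np

def predict(R, u, i, var_threshold=1.0):
    """Predict R[u, i] from neighboring users, in the spirit of the abstract:
    the sample variance of rating differences on commonly rated items serves
    as the (dis)similarity between users. Hypothetical sketch."""
    n_users, _ = R.shape
    estimates, weights = [], []
    for v in range(n_users):
        if v == u or np.isnan(R[v, i]):
            continue
        common = ~np.isnan(R[u]) & ~np.isnan(R[v])
        common[i] = False                      # never peek at the target item
        diffs = R[u, common] - R[v, common]
        if diffs.size < 2:
            continue
        var = diffs.var(ddof=1)                # sample variance of differences
        if var <= var_threshold:               # keep only close neighbors
            estimates.append(R[v, i] + diffs.mean())
            weights.append(1.0)
    return np.average(estimates, weights=weights) if estimates else np.nan

R = np.array([[5.0, 4.0, np.nan],
              [5.0, 4.0, 3.0],
              [1.0, 2.0, 5.0]])
pred = predict(R, u=0, i=2)   # user 1 matches user 0 exactly on shared items
```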
4

Fallowfield, Jonathan Andrew. "The role of matrix metalloproteinase-13 in the regression of liver fibrosis." Thesis, University of Southampton, 2007. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.443059.

5

Albertson, K. V. "Pre-test estimation in a regression model with a mis-specified error covariance matrix." Thesis, University of Canterbury. Economics, 1993. http://hdl.handle.net/10092/4315.

Abstract:
This thesis considers some finite sample properties of a number of preliminary test (pre-test) estimators of the unknown parameters of a linear regression model that may have been mis-specified as a result of incorrectly assuming that the disturbance term has a scalar covariance matrix, and/or as a result of the exclusion of relevant regressors. The pre-test itself is a test for exact linear restrictions and is conducted using the usual Wald statistic, which provides a Uniformly Most Powerful Invariant test of the restrictions in a well specified model. The parameters to be estimated are the coefficient vector, the prediction vector (i.e. the expectation of the dependent variable conditional on the regressors), and the regression scale parameter. Note that while the problem of estimating the prediction vector is merely a special case of estimating the coefficient vector when the model is well specified, this is not the case when the model is mis-specified. The properties of each of these estimators in a well specified regression model have been examined in the literature, as have the effects of a number of different model mis-specifications, and we survey these results in Chapter Two. We will extend the existing literature by generalising the error covariance matrix in conjunction with allowing for possibly excluded regressors. To motivate the consideration of a nonscalar error covariance matrix in the context of a pre-test situation we briefly examine the literature on autoregressive and heteroscedastic error processes in Chapter Three. In Chapters Four, Five, Six, and Seven we derive the cumulative distribution function of the test statistic, and exact formulae for the bias and risk (under quadratic loss) of the unrestricted, restricted and pre-test estimators, in a model with a general error covariance matrix and possibly excluded relevant regressors. 
These formulae are data dependent and, to illustrate the results, are evaluated for a number of regression models and forms of error covariance matrix. In particular we determine the effects of autoregressive errors and heteroscedastic errors on each of the regression models under consideration. Our evaluations confirm the known result that the presence of a nonscalar error covariance matrix introduces a distortion into the pre-test power function, and we show the effects of this on the pre-test estimators. In addition we show that one effect of the mis-specification is that the pre-test and restricted estimators may be strictly dominated by the corresponding unrestricted estimator even if there are no relevant regressors excluded from the model. If there are relevant regressors excluded from the model it appears that the additional mis-specification of the error covariance matrix has little qualitative impact unless the coefficients on the excluded regressors are small in magnitude or the excluded regressors are not correlated with the included regressors. As one of the effects of the mis-specification is to introduce a distortion into the pre-test power function, in Chapter Eight we consider the problem of determining the optimal critical value (under the criterion of minimax regret) for the pre-test when estimating the regression coefficient vector. We show that the mis-specification of the error covariance matrix may have a substantial impact on the optimal critical value chosen for the pre-test under this criterion, although, generally, the actual size of the pre-test is relatively unaffected by increasing degrees of mis-specification. Chapter Nine concludes this thesis and provides a summary of the results obtained in the earlier chapters. In addition, we outline some possible future research topics in this general area.
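The pre-test estimator itself, in the well-specified scalar-covariance case, can be sketched as: run a Wald/F test of the exact linear restrictions and report the restricted least-squares estimate if the test does not reject, the unrestricted one otherwise. A minimal sketch with invented data (the thesis studies this estimator under a mis-specified covariance, which is not modeled here):

```python
import numpy as np
from scipy import stats

def pretest_estimator(X, y, R, r, alpha=0.05):
    """Pre-test estimator sketch: test H0: R beta = r with an F statistic;
    return the restricted OLS estimate if H0 is not rejected, else the
    unrestricted OLS estimate."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y                      # unrestricted OLS
    resid = y - X @ b
    s2 = resid @ resid / (n - k)               # error variance estimate
    q = R.shape[0]
    d = R @ b - r
    F = (d @ np.linalg.solve(R @ XtX_inv @ R.T, d)) / (q * s2)
    if F <= stats.f.ppf(1 - alpha, q, n - k):
        # restricted least squares: project b onto the set R beta = r
        lam = np.linalg.solve(R @ XtX_inv @ R.T, d)
        return b - XtX_inv @ R.T @ lam
    return b

rng = np.random.default_rng(0)
x = rng.normal(size=80)
X = np.column_stack([np.ones(80), x])
y = 2.0 + 5.0 * x + 0.1 * rng.normal(size=80)
est = pretest_estimator(X, y, R=np.array([[0.0, 1.0]]), r=np.array([0.0]))
# H0: slope = 0 is strongly violated here, so the unrestricted OLS survives
```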
6

Mei, Jiali. "Time series recovery and prediction with regression-enhanced nonnegative matrix factorization applied to electricity consumption." Thesis, Université Paris-Saclay (ComUE), 2017. http://www.theses.fr/2017SACLS578/document.

Abstract:
Nous sommes intéressés par la reconstitution et la prédiction des séries temporelles multivariées à partir des données partiellement observées et/ou agrégées. La motivation du problème vient des applications dans la gestion du réseau électrique. Nous envisageons des outils capables de résoudre le problème d'estimation de plusieurs domaines. Après avoir investigué le krigeage, qui est une méthode de la littérature de la statistique spatio-temporelle, et une méthode hybride basée sur le clustering des individus, nous proposons un cadre général de reconstitution et de prédiction basé sur la factorisation de matrice nonnégative. Ce cadre prend en compte de manière intrinsèque la corrélation entre les séries temporelles pour réduire drastiquement la dimension de l'espace de paramètres. Une fois que la problématique est formalisée dans ce cadre, nous proposons deux extensions par rapport à l'approche standard. La première extension prend en compte l'autocorrélation temporelle des individus. Cette information supplémentaire permet d'améliorer la précision de la reconstitution. La deuxième extension ajoute une composante de régression dans la factorisation de matrice nonnégative. Celle-ci nous permet d'utiliser dans l'estimation du modèle des variables exogènes liées à la consommation électrique, ainsi de produire des facteurs plus interprétables, et aussi d'améliorer la reconstitution. De plus, cette méthode nous donne la possibilité d'utiliser la factorisation de matrice nonnégative pour produire des prédictions. Sur le côté théorique, nous nous intéressons à l'identifiabilité du modèle, ainsi qu'à la propriété de convergence des algorithmes que nous proposons. La performance des méthodes proposées en reconstitution et en prédiction est testée sur plusieurs jeux de données de consommation électrique à niveaux d'agrégation différents.
We are interested in the recovery and prediction of multiple time series from partially observed and/or aggregate data. Motivated by applications in electricity network management, we investigate tools from multiple fields that are able to deal with such data issues. After examining kriging from spatio-temporal statistics and a hybrid method based on the clustering of individuals, we propose a general framework based on nonnegative matrix factorization. This framework takes advantage of the intrinsic correlation between the multivariate time series to greatly reduce the dimension of the parameter space. Once the estimation problem is formalized in the nonnegative matrix factorization framework, two extensions are proposed to improve the standard approach. The first extension takes into account the individual temporal autocorrelation of each of the time series. This increases the precision of the time series recovery. The second extension adds a regression layer into nonnegative matrix factorization. This allows exogenous variables that are known to be linked with electricity consumption to be used in estimation, hence makes the factors obtained by the method more interpretable, and also increases the recovery precision. Moreover, this extension makes the method applicable to prediction. We provide a theoretical analysis of the framework concerning the identifiability of the model and the convergence of the algorithms that are proposed. The performance of the proposed methods to recover and forecast time series is tested on several multivariate electricity consumption datasets at different aggregation levels.
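The recovery setting can be sketched with a plain masked NMF using multiplicative updates. This is only the baseline that the thesis builds on, without its autocorrelation or regression extensions, and the data below are synthetic:

```python
import numpy as np

def nmf_recover(V, mask, rank=2, n_iter=2000, eps=1e-9):
    """Masked multiplicative-update NMF (KL divergence) for recovering the
    unobserved entries of a nonnegative matrix. Observed cells are those
    where `mask` is True."""
    rng = np.random.default_rng(0)
    n, m = V.shape
    M = mask.astype(float)             # 1 where observed, 0 where missing
    Vm = np.where(mask, V, 0.0)        # zero out unobserved cells
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        R = M * (Vm / (W @ H + eps))   # observed ratios V / (WH)
        W *= (R @ H.T) / (M @ H.T + eps)
        R = M * (Vm / (W @ H + eps))
        H *= (W.T @ R) / (W.T @ M + eps)
    return W @ H

true = np.outer([1.0, 2.0, 3.0], [2.0, 1.0, 3.0])   # rank-one test matrix
mask = np.ones((3, 3), dtype=bool)
mask[2, 2] = False                                   # hide one entry
rec = nmf_recover(np.where(mask, true, np.nan), mask)
```

The low-rank factors fitted on the observed cells then fill in the hidden entry.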
7

Bownds, Christopher D. "Updating the Navy's recruit quality matrix : an analysis of educational credentials and the success of first-term sailors /." Thesis, Monterey, Calif. : Springfield, Va. : Naval Postgraduate School ; Available from National Technical Information Service, 2004. http://library.nps.navy.mil/uhtbin/hyperion/04Mar%5FBownds.pdf.

8

Bogren, Patrik, and Isak Kristola. "Exploring the use of call stack depth limits to reduce regression testing costs." Thesis, Mittuniversitetet, Institutionen för data- och systemvetenskap, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-43166.

Abstract:
Regression testing is performed after existing source code has been modified to verify that no new faults have been introduced by the changes. Test case selection can be used to reduce the effort of regression testing by selecting a smaller subset of the test suite for later execution. Several criteria and objectives can be used as constraints that should be satisfied by the selection process. One common criterion is function coverage, which can be represented by a coverage matrix that maps test cases to methods under test. The process of generating and evaluating these matrices can be very time consuming for large matrices since their complexity increases exponentially with the number of tests included. To the best of our knowledge, no techniques for reducing coverage matrix size have been proposed. This thesis develops a matrix-reduction technique based on analysis of call stack data. It studies the effects of limiting the call stack depth in terms of coverage accuracy, matrix size, and generation costs. Further, it uses a tool that can instrument Java projects using Java’s instrumentation API to collect coverage information on open-source Java projects for varying depth limits of the call stack. Our results show that the stack depth limit can be significantly reduced while retaining high coverage and that matrix size can be decreased by up to 50%. The metric we used to indicate the difficulty of splitting up the matrix closely resembled the curve for coverage. However, we did not see any significant differences in execution time for lower depth limits.
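The depth-limiting idea can be illustrated with a toy sketch (not the thesis's Java instrumentation tool; the test names and call stacks below are invented):

```python
from collections import defaultdict

def coverage_matrix(traces, depth_limit):
    """Build a test -> covered-methods mapping from call-stack traces,
    keeping only frames up to depth_limit (outermost frame first).
    Illustrative sketch of the matrix-reduction idea."""
    cov = defaultdict(set)
    for test, stacks in traces.items():
        for stack in stacks:
            cov[test].update(stack[:depth_limit])
    return cov

traces = {
    "testA": [["main", "parse", "tokenize", "readChar"]],
    "testB": [["main", "render", "layout"]],
}
full = coverage_matrix(traces, depth_limit=10)
shallow = coverage_matrix(traces, depth_limit=2)
# Limiting the depth shrinks the matrix while keeping top-level coverage
```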
9

Kuljus, Kristi. "Rank Estimation in Elliptical Models : Estimation of Structured Rank Covariance Matrices and Asymptotics for Heteroscedastic Linear Regression." Doctoral thesis, Uppsala universitet, Matematisk statistik, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-9305.

Abstract:
This thesis deals with univariate and multivariate rank methods in making statistical inference. It is assumed that the underlying distributions belong to the class of elliptical distributions. The class of elliptical distributions is an extension of the normal distribution and includes distributions with both lighter and heavier tails than the normal distribution. In the first part of the thesis the rank covariance matrices defined via the Oja median are considered. The Oja rank covariance matrix has two important properties: it is affine equivariant and it is proportional to the inverse of the regular covariance matrix. We employ these two properties to study the problem of estimating the rank covariance matrices when they have a certain structure. The second part, which is the main part of the thesis, is devoted to rank estimation in linear regression models with symmetric heteroscedastic errors. We are interested in asymptotic properties of rank estimates. Asymptotic uniform linearity of a linear rank statistic in the case of heteroscedastic variables is proved. The asymptotic uniform linearity property makes it possible to study the asymptotic behaviour of rank regression estimates and rank tests. Existing results are generalized and it is shown that the Jaeckel estimate is consistent and asymptotically normally distributed also for heteroscedastic symmetric errors.
10

Wang, Shuo. "An Improved Meta-analysis for Analyzing Cylindrical-type Time Series Data with Applications to Forecasting Problem in Environmental Study." Digital WPI, 2015. https://digitalcommons.wpi.edu/etd-theses/386.

Abstract:
This thesis provides a case study on how the wind direction plays an important role in the amount of rainfall in the village of Somió. The primary goal is to illustrate how a meta-analysis, together with circular data analytic methods, helps in analyzing certain environmental issues. The existing GLS meta-analysis combines the merits of the usual meta-analysis that yields a better precision and also accounts for covariance among coefficients. But it is quite limited, since information about the covariance among studies is not utilized. Hence, in my proposed meta-analysis, I take the correlations between adjacent studies into account when employing the GLS meta-analysis. Besides, I also fit a time series linear-circular regression as a comparable model. By comparing the confidence intervals of parameter estimates, covariance matrix, AIC, BIC and p-values, I discuss an improvement on the GLS meta-analysis model in its application to forecasting problems in environmental study.
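The fixed-effect, inverse-variance pooling that GLS meta-analysis generalizes (by additionally allowing covariance between studies) looks like this; the study estimates and variances below are invented:

```python
import numpy as np

def fixed_effect_meta(estimates, variances):
    """Inverse-variance (fixed-effect) pooling: the basic building block
    that GLS meta-analysis extends with between-study covariance.
    Hypothetical inputs."""
    w = 1.0 / np.asarray(variances)            # weight = 1 / variance
    est = np.average(estimates, weights=w)     # pooled estimate
    var = 1.0 / w.sum()                        # variance of pooled estimate
    return est, var

est, var = fixed_effect_meta([0.30, 0.10, 0.25], [0.04, 0.02, 0.05])
```

Pooling always yields a smaller variance than any single study, which is the "better precision" the abstract refers to.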
11

Kim, Jingu. "Nonnegative matrix and tensor factorizations, least squares problems, and applications." Diss., Georgia Institute of Technology, 2011. http://hdl.handle.net/1853/42909.

Abstract:
Nonnegative matrix factorization (NMF) is a useful dimension reduction method that has been investigated and applied in various areas. NMF is considered for high-dimensional data in which each element has a nonnegative value, and it provides a low-rank approximation formed by factors whose elements are also nonnegative. The nonnegativity constraints imposed on the low-rank factors not only enable natural interpretation but also reveal the hidden structure of data. Extending the benefits of NMF to multidimensional arrays, nonnegative tensor factorization (NTF) has been shown to be successful in analyzing complicated data sets. Despite the success, NMF and NTF have been actively developed only in the recent decade, and algorithmic strategies for computing NMF and NTF have not been fully studied. In this thesis, computational challenges regarding NMF, NTF, and related least squares problems are addressed. First, efficient algorithms of NMF and NTF are investigated based on a connection from the NMF and the NTF problems to the nonnegativity-constrained least squares (NLS) problems. A key strategy is to observe typical structure of the NLS problems arising in the NMF and the NTF computation and design a fast algorithm utilizing the structure. We propose an accelerated block principal pivoting method to solve the NLS problems, thereby significantly speeding up the NMF and NTF computation. Implementation results with synthetic and real-world data sets validate the efficiency of the proposed method. In addition, a theoretical result on the classical active-set method for rank-deficient NLS problems is presented. Although the block principal pivoting method appears generally more efficient than the active-set method for the NLS problems, it is not applicable for rank-deficient cases. 
We show that the active-set method with a proper starting vector can actually solve the rank-deficient NLS problems without ever running into rank-deficient least squares problems during iterations. Going beyond the NLS problems, it is shown that a block principal pivoting strategy can also be applied to the l1-regularized linear regression. The l1-regularized linear regression, also known as the Lasso, has been very popular due to its ability to promote sparse solutions. Solving this problem is difficult because the l1-regularization term is not differentiable. A block principal pivoting method and its variant, which overcome a limitation of previous active-set methods, are proposed for this problem with successful experimental results. Finally, a group-sparsity regularization method for NMF is presented. A recent challenge in data analysis for science and engineering is that data are often represented in a structured way. In particular, many data mining tasks have to deal with group-structured prior information, where features or data items are organized into groups. Motivated by an observation that features or data items that belong to a group are expected to share the same sparsity pattern in their latent factor representations, we propose mixed-norm regularization to promote group-level sparsity. Efficient convex optimization methods for dealing with the regularization terms are presented along with computational comparisons between them. Application examples of the proposed method in factor recovery, semi-supervised clustering, and multilingual text analysis are presented.
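The NLS subproblem at the heart of NMF and NTF computation, min ‖Ax − b‖ subject to x ≥ 0, can be illustrated with SciPy's classical active-set (Lawson-Hanson) solver; the accelerated block principal pivoting method proposed in the thesis is not shown here. Toy data chosen so the constraint actually binds:

```python
import numpy as np
from scipy.optimize import nnls

# The unconstrained least-squares fit of a line through (1,3),(2,2),(3,1)
# has a negative slope, so nonnegativity is active at the NNLS solution.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([3.0, 2.0, 1.0])

x_uncon = np.linalg.lstsq(A, b, rcond=None)[0]   # slope component is negative
x_nn, resid = nnls(A, b)                         # active-set NNLS solve

# KKT check: gradient A^T(Ax - b) is nonnegative, and zero on the
# coordinates where x_nn > 0 (complementary slackness).
grad = A.T @ (A @ x_nn - b)
```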
12

Nasseri, Sahand. "Application of an Improved Transition Probability Matrix Based Crack Rating Prediction Methodology in Florida’s Highway Network." Scholar Commons, 2008. https://scholarcommons.usf.edu/etd/424.

Abstract:
With the growing need to maintain roadway systems for the provision of safety and comfort for travelers, network-level decision-making becomes more vital than ever. In order to keep pace with this fast-evolving trend, highway authorities must maintain extremely effective databases to keep track of their highway maintenance needs. The Florida Department of Transportation (FDOT), as a leader in transportation innovations in the U.S., maintains a Pavement Condition Survey (PCS) database of cracking, rutting, and ride information that is updated annually. Crack rating is an important parameter used by FDOT for making maintenance decisions and budget appropriation. By establishing a crack rating threshold below which traveler comfort is not assured, authorities can screen the pavement sections which are in need of Maintenance and Rehabilitation (M&R). Hence, accurate and reliable prediction of crack thresholds is essential to optimize the rehabilitation budget and manpower. Transition Probability Matrices (TPMs) can be utilized to accurately predict the deterioration of crack ratings leading to the threshold. Such TPMs are usually developed from historical data or the opinions of expert or experienced maintenance engineers. When historical data are used to develop TPMs, deterioration trends have been used indiscriminately, i.e. with no discrimination made between pavements that degrade at different rates. However, a more discriminatory method is used in this thesis to develop TPMs based on first classifying pavements into two groups: pavements with relatively high traffic, and pavements with a history of excessive degradation due to delayed rehabilitation. The new approach uses a multiple non-linear regression process to separately optimize TPMs for the two groups selected by prior screening of the database. The developed TPMs are shown to have minimal prediction errors with respect to crack ratings in the database that were not used in the TPM formation.
It is concluded that the above two groups are statistically different from each other with respect to the rate of cracking. The observed significant differences in the deterioration trends would provide a valuable tool for the authorities in making critical network-level decisions. The same methodology can be applied in other transportation agencies based on the corresponding databases.
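The role of a TPM in predicting crack-rating deterioration can be sketched as repeated multiplication of a state distribution by the matrix. The 4-state matrix below is hypothetical, not FDOT data, with states ordered best to worst and the worst state absorbing:

```python
import numpy as np

# Hypothetical one-year transition probability matrix for crack ratings
P = np.array([[0.80, 0.15, 0.05, 0.00],
              [0.00, 0.75, 0.20, 0.05],
              [0.00, 0.00, 0.70, 0.30],
              [0.00, 0.00, 0.00, 1.00]])

state = np.array([1.0, 0.0, 0.0, 0.0])   # all sections start in the best state
for _ in range(5):                        # propagate five years ahead
    state = state @ P

# Suppose the two worst states fall below the M&R comfort threshold:
share_below_threshold = state[2:].sum()
```

Authorities can then budget M&R work from the predicted share of sections crossing the threshold each year.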
13

Torp, Emil, and Patrik Önnegren. "Driving Cycle Generation Using Statistical Analysis and Markov Chains." Thesis, Linköpings universitet, Fordonssystem, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-94147.

Abstract:
A driving cycle is a velocity profile over time. Driving cycles can be used for environmental classification of cars and to evaluate vehicle performance. The benefit of using stochastic driving cycles instead of predefined driving cycles, e.g. the New European Driving Cycle, is for instance that the risk of cycle beating is reduced. Different methods to generate stochastic driving cycles based on real-world data have been used around the world, but the representativeness of the generated driving cycles has been difficult to ensure. The possibility to generate stochastic driving cycles that capture specific features from a set of real-world driving cycles is studied. Data from more than 500 real-world trips have been processed and categorized. The driving cycles are merged into several transition probability matrices (TPMs), where each element corresponds to a specific state defined by its velocity and acceleration. The TPMs are used with Markov chain theory to generate stochastic driving cycles. The driving cycles are validated using percentile limits on a set of characteristic variables that are obtained from statistical analysis of real-world driving cycles. The distribution of the generated driving cycles is investigated and compared to the distribution of real-world driving cycles. The generated driving cycles prove to represent the original set of real-world driving cycles in terms of key variables determined through statistical analysis. Four different methods are used to determine which statistical variables describe the features of the provided driving cycles. Two of the methods use regression analysis. Hierarchical clustering of statistical variables is proposed as a third alternative, and the last method combines the cluster analysis with the regression analysis. The entire process is automated and a graphical user interface is developed in Matlab to facilitate the use of the software.
En körcykel är en beskrivning av hur hastigheten för ett fordon ändras under en körning. Körcykler används bland annat till att miljöklassa bilar och för att utvärdera fordonsprestanda. Olika metoder för att generera stokastiska körcykler baserade på verklig data har använts runt om i världen, men det har varit svårt att efterlikna naturliga körcykler. Möjligheten att generera stokastiska körcykler som representerar en uppsättning naturliga körcykler studeras. Data från över 500 körcykler bearbetas och kategoriseras. Dessa används för att skapa övergångsmatriser där varje element motsvarar ett visst tillstånd, med hastighet och acceleration som tillståndsvariabler. Matrisen tillsammans med teorin om Markovkedjor används för att generera stokastiska körcykler. De genererade körcyklerna valideras med hjälp av percentilgränser för ett antal karaktäristiska variabler som beräknats för de naturliga körcyklerna. Hastighets- och accelerationsfördelningen hos de genererade körcyklerna studeras och jämförs med de naturliga körcyklerna för att säkerställa att de är representativa. Statistiska egenskaper jämfördes och de genererade körcyklerna visade sig likna den ursprungliga uppsättningen körcykler. Fyra olika metoder används för att bestämma vilka statistiska variabler som beskriver de naturliga körcyklerna. Två av metoderna använder regressionsanalys. Hierarkisk klustring av statistiska variabler föreslås som ett tredje alternativ. Den sista metoden kombinerar klusteranalysen med regressionsanalysen. Hela processen är automatiserad och ett grafiskt användargränssnitt har utvecklats i Matlab för att underlätta användningen av programmet.
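The Markov-chain generation step can be sketched as follows, with velocity-only states for brevity (the thesis uses combined velocity-acceleration states, and the TPM below is invented):

```python
import numpy as np

def generate_cycle(tpm, states, n_steps, seed=0):
    """Sample a velocity profile from a transition probability matrix.
    `states` holds the velocity (m/s) of each discrete state. A sketch of
    the Markov-chain step, not the full thesis state space."""
    rng = np.random.default_rng(seed)
    idx = 0                        # start in the idle state
    cycle = [states[idx]]
    for _ in range(n_steps):
        idx = rng.choice(len(states), p=tpm[idx])   # one Markov transition
        cycle.append(states[idx])
    return np.array(cycle)

tpm = np.array([[0.7, 0.3, 0.0],       # idle  -> idle/urban
                [0.2, 0.6, 0.2],       # urban -> any
                [0.0, 0.4, 0.6]])      # rural -> urban/rural
states = np.array([0.0, 8.0, 16.0])    # idle, urban, rural speeds
cycle = generate_cycle(tpm, states, n_steps=200)
```

Because the hypothetical TPM forbids direct idle-to-rural jumps, consecutive velocities never differ by more than one state.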
14

Deshpande, Seemantini R. "Evaluation of PM2.5 Components and Source Apportionment at a Rural Site in the Ohio River Valley Region." Ohio University / OhioLINK, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1187123906.

15

VIZCAINO, Lelio Alejandro Arias. "Um novo resíduo para classes de modelos de regressão na família exponencial." Universidade Federal de Pernambuco, 2016. https://repositorio.ufpe.br/handle/123456789/18636.

Abstract:
FACEPE
Entre as principais metodologias estatísticas, a análise de regressão é uma das formas mais efetivas para modelar dados. Neste sentido, a análise de diagnóstico é imprescindível para determinar o que pode ter acontecido no processo gerador dos dados caso os pressupostos impostos a este não sejam plausíveis. Uma das ferramentas mais úteis em diagnóstico é a avaliação dos resíduos. Neste trabalho, propomos um novo resíduo para as classes de modelos de regressão linear e não linear baseados na família exponencial com dispersão variável (Smyth (1989)). A proposta permite incorporar de forma simultânea informações relativas aos submodelos da média e da dispersão sem fazer uso de matrizes de projeção para sua padronização. Resultados de simulação e de aplicações a dados reais mostram que o novo resíduo é altamente competitivo em relação aos resíduos amplamente usados e consolidados na literatura.
In statistical methodologies, regression analysis can be a very effective way to model data. In this sense, diagnostic analysis is needed to try to determine what might have happened in the data-generating process if the conditions imposed on it are not true. One of the most useful techniques for assessing the goodness of fit of the model is the evaluation of residuals. In this work, we propose a new residual for the class of linear and nonlinear regression models based on the exponential family with variable dispersion (Smyth (1989)). The proposal incorporates simultaneously information from the sub-models of the mean and the dispersion without using projection matrices for its standardization. Simulation results and applications to real data show that the new residual is highly competitive with respect to residuals widely used and established in the literature.
APA, Harvard, Vancouver, ISO, and other styles
16

Shrestha, Prabha. "Application of Influence Function in Sufficient Dimension Reduction Models." Ohio University / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1595278335591108.

Full text
APA, Harvard, Vancouver, ISO, and other styles
17

Savas, Berkant. "Algorithms in data mining using matrix and tensor methods." Doctoral thesis, Linköpings universitet, Beräkningsvetenskap, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-11597.

Full text
Abstract:
In many fields of science, engineering, and economics large amounts of data are stored and there is a need to analyze these data in order to extract information for various purposes. Data mining is a general concept involving different tools for performing this kind of analysis. The development of mathematical models and efficient algorithms is of key importance. In this thesis we discuss algorithms for the reduced rank regression problem and algorithms for the computation of the best multilinear rank approximation of tensors. The first two papers deal with the reduced rank regression problem, which is encountered in the field of state-space subspace system identification. More specifically the problem is \[ \min_{\operatorname{rank}(X) = k} \det \left[ (B - XA)(B - XA)^{T} \right], \] where $A$ and $B$ are given matrices and we want to find $X$ under a certain rank condition that minimizes the determinant. This problem is not properly stated since it involves implicit assumptions on $A$ and $B$ so that $(B - XA)(B - XA)^{T}$ is never singular. This deficiency of the determinant criterion is fixed by generalizing the minimization criterion to rank reduction and volume minimization of the objective matrix. The volume of a matrix is defined as the product of its nonzero singular values. We give an algorithm that solves the generalized problem and identify properties of the input and output signals causing a singular objective matrix. Classification problems occur in many applications. The task is to determine the label or class of an unknown object. The third paper concerns classification of handwritten digits in the context of tensors or multidimensional data arrays. Tensor and multilinear algebra is an area that attracts more and more attention because of the multidimensional structure of the collected data in various applications. Two classification algorithms are given based on the higher order singular value decomposition (HOSVD).
The main algorithm makes a data reduction of 98--99% using HOSVD prior to the construction of the class models. The models are computed as a set of orthonormal bases spanning the dominant subspaces for the different classes. An unknown digit is expressed as a linear combination of the basis vectors. The resulting algorithm achieves a 5% classification error with a fairly low amount of computation. The remaining two papers discuss computational methods for the best multilinear rank approximation problem \[ \min_{\mathcal{B}} \| \mathcal{A} - \mathcal{B} \|, \] where $\mathcal{A}$ is a given tensor and we seek the best low multilinear rank approximation tensor $\mathcal{B}$. This is a generalization of the best low rank matrix approximation problem. It is well known that for matrices the solution is given by truncating the singular values in the singular value decomposition (SVD) of the matrix. But for tensors in general the truncated HOSVD does not give an optimal approximation. For example, a third order tensor $\mathcal{B} \in \mathbb{R}^{I \times J \times K}$ with $\operatorname{rank}(\mathcal{B}) = (r_1, r_2, r_3)$ can be written as the product \[ \mathcal{B} = (X, Y, Z) \cdot \mathcal{C}, \qquad b_{ijk} = \sum_{\lambda,\mu,\nu} x_{i\lambda} y_{j\mu} z_{k\nu} c_{\lambda\mu\nu}, \] where $\mathcal{C} \in \mathbb{R}^{r_1 \times r_2 \times r_3}$ and $X \in \mathbb{R}^{I \times r_1}$, $Y \in \mathbb{R}^{J \times r_2}$, and $Z \in \mathbb{R}^{K \times r_3}$ are matrices of full column rank. Since it is no restriction to assume that $X$, $Y$, and $Z$ have orthonormal columns, and due to these constraints, the approximation problem can be considered as a nonlinear optimization problem defined on a product of Grassmann manifolds. We introduce novel techniques for multilinear algebraic manipulations enabling means for theoretical analysis and algorithmic implementation. These techniques are used to solve the approximation problem using Newton and quasi-Newton methods specifically adapted to operate on products of Grassmann manifolds.
The presented algorithms are suited for small, large and sparse problems and, when applied to difficult problems, they clearly outperform alternating least squares methods, which are standard in the field.
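The matrix fact the abstract leans on — that the best rank-$k$ approximation comes from truncating the SVD (the Eckart–Young theorem) — is easy to verify numerically; the tensor case is precisely where this shortcut stops being optimal. A small sketch with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(8, 6))
k = 2

# Best rank-k approximation by truncating the SVD (Eckart-Young)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] * s[:k] @ Vt[:k, :]

# Frobenius error equals the l2 norm of the discarded singular values
err = np.linalg.norm(A - A_k)
expected = np.sqrt(np.sum(s[k:] ** 2))

# Any other rank-k matrix does at least as badly (checked on a random one)
B_rand = rng.normal(size=(8, k)) @ rng.normal(size=(k, 6))
err_rand = np.linalg.norm(A - B_rand)
```

For third-order tensors, truncating the HOSVD gives a good but in general suboptimal starting point, which is why the thesis turns to optimization on Grassmann manifolds.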
APA, Harvard, Vancouver, ISO, and other styles
18

Lind, Nilsson Rasmus. "Machine learning in logistics : Increasing the performance of machine learning algorithms on two specific logistic problems." Thesis, Luleå tekniska universitet, Institutionen för system- och rymdteknik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-64761.

Full text
Abstract:
Data Ductus, a multinational IT-consulting company, wants to develop an AI that monitors a logistic system and looks for errors. Once trained enough, this AI will suggest a correction and automatically fix issues as they arise. This project presents how one works with machine learning problems and provides a deeper insight into how cross-validation and regularisation, among other techniques, are used to improve the performance of machine learning algorithms on the defined problem. These techniques are tested and evaluated in our logistic system on three different machine learning algorithms, namely Naïve Bayes, Logistic Regression and Random Forest. The evaluation of the algorithms leads us to conclude that Random Forest, using cross-validated parameters, gives the best performance on our specific problems, with the other two falling behind in each tested category. It became clear to us that cross-validation is a simple, yet powerful tool for increasing the performance of machine learning algorithms.
Data Ductus, ett multinationellt IT-konsultföretag vill utveckla en AI som övervakar ett logistiksystem och uppmärksammar fel. När denna AI är tillräckligt upplärd ska den föreslå korrigering eller automatiskt korrigera problem som uppstår. Detta projekt presenterar hur man arbetar med maskininlärningsproblem och ger en djupare inblick i hur kors-validering och regularisering, bland andra tekniker, används för att förbättra prestandan av maskininlärningsalgoritmer på det definierade problemet. Dessa tekniker testas och utvärderas i vårt logistiksystem på tre olika maskininlärnings algoritmer, nämligen Naïve Bayes, Logistic Regression och Random Forest. Utvärderingen av algoritmerna leder oss till att slutsatsen är att Random Forest, som använder korsvaliderade parametrar, ger bästa prestanda på våra specifika problem, medan de andra två faller bakom i varje testad kategori. Det blev klart för oss att kors-validering är ett enkelt, men kraftfullt verktyg för att öka prestanda hos maskininlärningsalgoritmer.
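As a hedged illustration of why cross-validated parameters help (the study's models are Naïve Bayes, Logistic Regression and Random Forest; plain ridge regression is used below only to keep the sketch dependency-free), a minimal k-fold cross-validation loop for picking a regularisation strength:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 120, 5
X = rng.normal(size=(n, p))
beta = np.array([3.0, 0.0, 0.0, -2.0, 0.0])
y = X @ beta + rng.normal(scale=0.5, size=n)

def ridge_fit(X, y, lam):
    """Closed-form ridge regression coefficients."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def cv_mse(X, y, lam, k=5):
    """Mean held-out squared error over k folds."""
    idx = np.arange(len(y))
    folds = np.array_split(idx, k)
    errs = []
    for f in folds:
        train = np.setdiff1d(idx, f)
        b = ridge_fit(X[train], y[train], lam)
        errs.append(np.mean((y[f] - X[f] @ b) ** 2))
    return float(np.mean(errs))

lams = [0.01, 0.1, 1.0, 10.0, 100.0]
scores = [cv_mse(X, y, lam) for lam in lams]
best_lam = lams[int(np.argmin(scores))]
```

The same loop generalises directly to tuning, say, the tree depth or number of estimators of a random forest.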
APA, Harvard, Vancouver, ISO, and other styles
19

Chun, Yongwan. "Behavioral specifications of network autocorrelation in migration modeling an analysis of migration flows by spatial filtering /." Columbus, Ohio : Ohio State University, 2007. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1187188476.

Full text
APA, Harvard, Vancouver, ISO, and other styles
20

Carpenter, Lee Wyatt. "Valuing Natural Space and Landscape Fragmentation in Richmond, VA." VCU Scholars Compass, 2016. http://scholarscompass.vcu.edu/etd/4645.

Full text
Abstract:
Hedonic pricing methods and GIS (Geographic Information Systems) were used to evaluate relationships between the sale price of single family homes and landscape fragmentation and natural land cover. Spatial regression analyses found that sale prices increase as landscapes become less fragmented and the amount of natural land cover around a home increases. The projected growth in population and employment in the Richmond, Virginia region, and subsequent increases in land development and landscape fragmentation, present a challenge to sustaining intact healthy ecosystems in the Richmond region. Spatial regression analyses helped illuminate how land cover patterns influence sale prices and identify landscape patterns that are economically and ecologically advantageous.
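A minimal sketch of the hedonic idea: regress (log) sale price on structural and landscape covariates and read the amenity value off the coefficients. The covariates, coefficients and functional form below are invented for illustration, and the thesis additionally uses spatial regression to handle spatial autocorrelation, which this ordinary least-squares sketch omits.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
area = rng.uniform(80, 300, n)            # living area (m^2), hypothetical
natural = rng.uniform(0.0, 1.0, n)        # share of natural land cover nearby
frag = rng.uniform(0.0, 1.0, n)           # landscape fragmentation index

# Hypothetical hedonic form: log price rises with area and natural cover,
# falls with fragmentation
log_price = (11.0 + 0.004 * area + 0.30 * natural - 0.20 * frag
             + rng.normal(scale=0.05, size=n))

X = np.column_stack([np.ones(n), area, natural, frag])
coef, *_ = np.linalg.lstsq(X, log_price, rcond=None)
```

With a log-price response, each coefficient is read as an approximate percentage price change per unit of the covariate.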
APA, Harvard, Vancouver, ISO, and other styles
21

Hassani, Mujtaba. "CONSTRUCTION EQUIPMENT FUEL CONSUMPTION DURING IDLING : Characterization using multivariate data analysis at Volvo CE." Thesis, Mälardalens högskola, Akademin för ekonomi, samhälle och teknik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-49007.

Full text
Abstract:
Human activities have increased the concentration of CO2 in the atmosphere and thus caused global warming. Construction equipment are semi-stationary machines that spend at least 30% of their lifetime idling. The majority of construction equipment is diesel powered and emits toxic emissions into the environment. In this work, idling is investigated by adopting several statistical regression models to quantify the fuel consumption of construction equipment during idling. The regression models studied in this work are: Multivariate Linear Regression (ML-R), Support Vector Machine Regression (SVM-R), Gaussian Process Regression (GP-R), Artificial Neural Network (ANN), Partial Least Squares Regression (PLS-R) and Principal Components Regression (PC-R). Findings show that pre-processing has a significant impact on the goodness of prediction of the exploratory data analysis in this field. Moreover, through mean centering and application of min-max scaling, the accuracy of the models increased remarkably. ANN and GP-R had the highest accuracy (99%), PLS-R was the third most accurate model (98% accuracy), ML-R was the fourth best (97% accuracy), SVM-R was the fifth best (73% accuracy) and the lowest accuracy was recorded for PC-R (83% accuracy). The second part of this project estimated the CO2 emission based on the fuel used, adopting the NONROAD2008 model.
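The two preprocessing steps the abstract credits with the accuracy gains — mean centering and min-max scaling — are one-liners; a sketch on synthetic feature data:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(loc=50.0, scale=10.0, size=(100, 4))   # raw sensor-like features

# Mean centering: each column gets zero mean
X_centered = X - X.mean(axis=0)

# Min-max scaling: each column is mapped onto [0, 1]
X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
```

Both transforms put features on comparable scales, which matters for scale-sensitive learners (e.g. neural networks or kernel methods) far more than for tree ensembles.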
APA, Harvard, Vancouver, ISO, and other styles
22

Shrewsbury, John Stephen. "Calibration of trip distribution by generalised linear models." Thesis, University of Canterbury. Department of Civil and Natuaral Resources Engineering, 2012. http://hdl.handle.net/10092/7685.

Full text
Abstract:
Generalised linear models (GLMs) provide a flexible and sound basis for calibrating gravity models for trip distribution, for a wide range of deterrence functions (from steps to splines), with K factors and geographic segmentation. The Tanner function fitted Wellington Transport Strategy Model data as well as more complex functions and was insensitive to the formulation of intrazonal and external costs. Weighting from variable expansion factors and interpretation of the deviance under sparsity are addressed. An observed trip matrix is disaggregated and fitted at the household, person and trip levels with consistent results. Hierarchical GLMs (HGLMs) are formulated to fit mixed logit models, but were unable to reproduce the coefficients of simple nested logit models. Geospatial analysis by HGLM showed no evidence of spatial error patterns, either as random K factors or as correlations between them. Equivalence with hierarchical mode choice, duality with trip distribution, regularisation, lorelograms, and the modifiable areal unit problem are considered. Trip distribution is calibrated from aggregate data by the MVESTM matrix estimation package, incorporating period and direction factors in the intercepts. Counts across four screenlines showed a significance similar to a thousand-household travel survey. Calibration was possible only in conjunction with trip end data. Criteria for validation against screenline counts were met, but only if allowance was made for error in the trip end data.
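The Tanner deterrence function mentioned above, f(c) = c^α e^(βc), is log-linear in its parameters, which is what makes GLM calibration natural. A sketch recovering assumed parameters from noiseless synthetic trip data by ordinary least squares on the log scale (a real calibration, as in the thesis, would fit a Poisson GLM to trip counts; all parameter values here are invented):

```python
import numpy as np

rng = np.random.default_rng(5)
c = rng.uniform(1.0, 60.0, 200)                 # travel costs (e.g. minutes)
alpha, beta_, k0 = 1.2, -0.08, 4.0              # assumed Tanner parameters
trips = np.exp(k0) * c ** alpha * np.exp(beta_ * c)

# Log-linearized Tanner model: log T = k + alpha*log(c) + beta*c
X = np.column_stack([np.ones_like(c), np.log(c), c])
coef, *_ = np.linalg.lstsq(X, np.log(trips), rcond=None)
```

The same design matrix, fed to a Poisson GLM with a log link, is the standard gravity-model calibration.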
APA, Harvard, Vancouver, ISO, and other styles
23

McDonald, Timothy Myles. "Making sense of genotype x environment interaction of Pinus radiata in New Zealand." Thesis, University of Canterbury. School of Forestry, 2009. http://hdl.handle.net/10092/3222.

Full text
Abstract:
In New Zealand, a formal tree improvement and breeding programme for Pinus radiata (D.Don) commenced in 1952. A countrywide series of progeny trials was progressively established on over seventy sites, and is managed by the Radiata Pine Breeding Company (RPBC). Diameter at breast height data from the series were used to investigate genotype x environment interaction with a view to establishing the need for partitioning breeding and deployment efforts for P. radiata. Nearly 300,000 measurements made this study one of the largest for genotype x environment interaction ever done. Bivariate analyses were conducted between all pairs of sites to determine genetic correlations between sites. Genetic correlations were used to construct a proximity matrix by subtracting each correlation from unity. The process of constructing the matrix highlighted issues of low connectivity between sites, whereby meaningful correlations were established for just 5% of the pairs. However, nearly two-thirds of these genetic correlations were between -1.0 and 0.6, indicating the presence of strong genotype x environment interactions. A technique known as multiple regression on resemblance matrices was carried out by regressing a number of environmental correlation matrices on the diameter at breast height correlation matrix. Genotype x environment interactions were found to be driven by extreme maximum temperatures (t-statistic of 2.03 against a critical t-value of 1.96 at the 95% confidence level). When tested on its own, altitude was significantly associated with genetic correlations between sites at the 90% confidence level (t-statistic of 1.92 against a critical t-value of 1.645). In addition, a method from graph theory using proximity thresholds was utilised as a form of clustering. However, this study highlighted the existence of high internal cohesion within trial series, and high external isolation between trial series.
That is, grouping of sites (in terms of diameter) was observed to be a reflection of the series of trials for which each site was established. This characteristic is particularly unhelpful for partitioning sites into regions of similar propensity to genotype x environment interaction, as the genotype x environment effect is effectively over-ridden by the genotype effect. Better cohesion between past, present and future trial series, and more accurate bioclimatic data should allow more useful groupings of sites to be extracted from the data. Given this, however, it is clear that there are a large number of interactive families contained in the RPBC dataset. It is concluded that partitioning of New Zealand’s P. radiata breeding programme cannot be ruled out as an advantageous option.
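The clustering step described above — turn genetic correlations into a proximity matrix by subtracting each correlation from unity, threshold the proximities to get a graph, and read off connected components as site groups — can be sketched directly. The correlation values below are made up for illustration; the study worked with many more sites and sparse connectivity.

```python
import numpy as np

# Hypothetical genetic correlations between five trial sites
r = np.array([
    [1.0, 0.9, 0.8, 0.1, 0.0],
    [0.9, 1.0, 0.7, 0.2, 0.1],
    [0.8, 0.7, 1.0, 0.0, 0.2],
    [0.1, 0.2, 0.0, 1.0, 0.85],
    [0.0, 0.1, 0.2, 0.85, 1.0],
])

# Proximity (dissimilarity) matrix: subtract each correlation from unity
d = 1.0 - r

# Threshold graph: connect sites closer than t, then take connected components
t = 0.5
adj = (d < t) & ~np.eye(len(d), dtype=bool)

def components(adj):
    """Connected components of an undirected adjacency matrix (DFS)."""
    n = len(adj)
    seen, comps = set(), []
    for i in range(n):
        if i in seen:
            continue
        stack, comp = [i], set()
        while stack:
            j = stack.pop()
            if j in comp:
                continue
            comp.add(j)
            stack.extend(np.flatnonzero(adj[j]))
        seen |= comp
        comps.append(sorted(int(v) for v in comp))
    return comps

groups = components(adj)
```

With this toy matrix the threshold graph splits the sites into two groups; in the study, the analogous groups mirrored the trial series rather than environmental regions.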
APA, Harvard, Vancouver, ISO, and other styles
24

Allan, Michelle L. "Measuring Skill Importance in Women's Soccer and Volleyball." Diss., CLICK HERE for online access, 2009. http://contentdm.lib.byu.edu/ETD/image/etd2809.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
25

Bilal, Mustafa. "Relationships Between Felt Intensity And Recorded Ground Motion Parameters For Turkey." Master's thesis, METU, 2013. http://etd.lib.metu.edu.tr/upload/12615426/index.pdf.

Full text
Abstract:
Earthquakes are among the natural disasters with significant damage potential; however, it is possible to reduce the losses by taking several remedies. Reduction of seismic losses starts with identifying and estimating the expected damage to some accuracy. Since both design styles and construction defects exhibit mostly local properties all over the world, damage estimations should be performed at regional levels. Another important issue in disaster mitigation is to determine a robust measure of ground motion intensity parameters. As of now, well-built correlations between shaking intensity and instrumental ground motion parameters have not yet been studied in detail for Turkish data. In the first part of this thesis, regional empirical Damage Probability Matrices (DPMs) are formed for Turkey. As the input data, the detailed damage database of the 17 August 1999 Kocaeli earthquake (Mw=7.4) is used. The damage probability matrices are derived for Sakarya, Bolu and Kocaeli, for both reinforced concrete and masonry buildings. Results are compared with previous similar studies and the differences are discussed. After validation with future data, these DPMs can be used in the calculation of earthquake insurance premiums. In the second part of this thesis, two relationships between felt intensity and peak ground motion parameters are generated using the linear least-squares regression technique. The first one correlates Modified Mercalli Intensity (MMI) to Peak Ground Acceleration (PGA) whereas the second does the same for Peak Ground Velocity (PGV). Old damage reports and isoseismal maps are employed to derive the 92 data pairs of MMI, PGA and PGV used in the regression analyses. These local relationships can be used in the future for ShakeMap applications in rapid response and disaster management activities.
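Intensity–ground-motion regressions of this kind classically take the form MMI = a + b·log10(PGA), fitted by linear least squares. A sketch on synthetic data (92 pairs, matching the study's sample size; the coefficients and scatter below are invented, not the thesis's results):

```python
import numpy as np

rng = np.random.default_rng(6)
pga = 10 ** rng.uniform(0.5, 3.0, 92)            # peak ground acceleration values

# Assumed linear model: MMI = a + b*log10(PGA), plus observational scatter
a_true, b_true = 1.5, 2.3
mmi = a_true + b_true * np.log10(pga) + rng.normal(scale=0.3, size=pga.size)

# Linear least-squares regression of MMI on log10(PGA)
X = np.column_stack([np.ones_like(pga), np.log10(pga)])
(a_hat, b_hat), *_ = np.linalg.lstsq(X, mmi, rcond=None)
```

An analogous fit with log10(PGV) as the regressor gives the second relationship the abstract describes.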
APA, Harvard, Vancouver, ISO, and other styles
26

Somé, Sobom Matthieu. "Estimations non paramétriques par noyaux associés multivariés et applications." Thesis, Besançon, 2015. http://www.theses.fr/2015BESA2030/document.

Full text
Abstract:
Dans ce travail, l'approche non-paramétrique par noyaux associés mixtes multivariés est présentée pour les fonctions de densités, de masse de probabilité et de régressions à supports partiellement ou totalement discrets et continus. Pour cela, quelques aspects essentiels des notions d'estimation par noyaux continus (dits classiques) multivariés et par noyaux associés univariés (discrets et continus) sont d'abord rappelés. Les problèmes de supports sont alors révisés ainsi qu'une résolution des effets de bords dans les cas des noyaux associés univariés. Le noyau associé multivarié est ensuite défini et une méthode de leur construction dite mode-dispersion multivarié est proposée. Il s'ensuit une illustration dans le cas continu utilisant le noyau bêta bivarié avec ou sans structure de corrélation de type Sarmanov. Les propriétés des estimateurs telles que les biais, les variances et les erreurs quadratiques moyennes sont également étudiées. Un algorithme de réduction du biais est alors proposé et illustré sur ce même noyau avec structure de corrélation. Des études par simulations et applications avec le noyau bêta bivarié avec structure de corrélation sont aussi présentées. Trois formes de matrices des fenêtres, à savoir, pleine, Scott et diagonale, y sont utilisées puis leurs performances relatives sont discutées. De plus, des noyaux associés multiples ont été efficaces dans le cadre de l'analyse discriminante. Pour cela, on a utilisé les noyaux univariés binomial, catégoriel, triangulaire discret, gamma et bêta. Par la suite, les noyaux associés avec ou sans structure de corrélation ont été étudiés dans le cadre de la régression multiple. En plus des noyaux univariés ci-dessus, les noyaux bivariés avec ou sans structure de corrélation ont été aussi pris en compte. Les études par simulations montrent l'importance et les bonnes performances du choix des noyaux associés multivariés à matrice de lissage pleine ou diagonale. 
Puis, les noyaux associés continus et discrets sont combinés pour définir les noyaux associés mixtes univariés. Les travaux ont aussi donné lieu à la création d'un package R pour l'estimation de fonctions univariés de densités, de masse de probabilité et de régression. Plusieurs méthodes de sélections de fenêtres optimales y sont implémentées avec une interface facile d'utilisation. Tout au long de ce travail, la sélection des matrices de lissage se fait généralement par validation croisée et parfois par les méthodes bayésiennes. Enfin, des compléments sur les constantes de normalisations des estimateurs à noyaux associés des fonctions de densité et de masse de probabilité sont présentés
This work is about a nonparametric approach using multivariate mixed associated kernels for estimating densities, probability mass functions and regressions whose supports are partially or totally discrete and continuous. Some key aspects of kernel estimation using multivariate continuous (classical) and (discrete and continuous) univariate associated kernels are recalled. Problems of support are also revisited, as well as a resolution of boundary effects for univariate associated kernels. The multivariate associated kernel is then defined and a construction by the multivariate mode-dispersion method is provided. This leads to an illustration on the bivariate beta kernel with Sarmanov's correlation structure in the continuous case. Properties of these estimators are studied, such as the bias, variances and mean squared errors. An algorithm for reducing the bias is proposed and illustrated on this bivariate beta kernel. Simulation studies and applications are then performed with the bivariate beta kernel. Three types of bandwidth matrices, namely full, Scott and diagonal, are used. Furthermore, appropriate multiple associated kernels are used in a practical discriminant analysis task. These are the binomial, categorical, discrete triangular, gamma and beta. Thereafter, associated kernels with or without correlation structure are used in multiple regression. In addition to the previous univariate associated kernels, bivariate beta kernels with or without correlation structure are taken into account. Simulation studies show the performance of the choice of associated kernels with full or diagonal bandwidth matrices. Then, (discrete and continuous) associated kernels are combined to define mixed univariate associated kernels. Using the tools of unification of discrete and continuous analysis, the properties of the mixed associated kernel estimators are shown.
This is followed by an R package, created for the univariate case, for density, probability mass function and regression estimation. Several smoothing parameter selection methods are implemented via an easy-to-use interface. Throughout the work, bandwidth matrix selection is generally obtained using cross-validation and sometimes Bayesian methods. Finally, some additional information on the normalizing constants of associated kernel estimators is presented for densities and probability mass functions.
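A minimal sketch of one multivariate kernel idea from the abstract, in the continuous case: a bivariate product Gaussian kernel density estimator with a diagonal bandwidth matrix (the thesis additionally treats full bandwidth matrices, Sarmanov correlation structures, beta kernels with bounded support, and discrete kernels, none of which appear here):

```python
import numpy as np

rng = np.random.default_rng(7)
data = rng.normal(size=(300, 2)) * [1.0, 0.5]     # sample with unequal spreads

def kde_diag(x, data, h):
    """Product Gaussian kernel density estimate with bandwidth matrix diag(h)."""
    u = (x[None, :] - data) / h                    # standardized gaps, shape (n, 2)
    k = np.exp(-0.5 * np.sum(u ** 2, axis=1)) / (2 * np.pi * h[0] * h[1])
    return k.mean()

h = np.array([0.3, 0.15])                          # one bandwidth per coordinate
dens_center = kde_diag(np.array([0.0, 0.0]), data, h)

# Sanity check: the estimate should integrate to about 1 over a wide grid
xs = np.linspace(-5, 5, 81)
ys = np.linspace(-3, 3, 61)
total = sum(kde_diag(np.array([x, y]), data, h) for x in xs for y in ys) \
        * (xs[1] - xs[0]) * (ys[1] - ys[0])
```

A diagonal bandwidth matrix smooths each coordinate independently; the full-matrix case the thesis studies also rotates the smoothing to follow correlations in the data.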
APA, Harvard, Vancouver, ISO, and other styles
27

Balmand, Samuel. "Quelques contributions à l'estimation de grandes matrices de précision." Thesis, Paris Est, 2016. http://www.theses.fr/2016PESC1024/document.

Full text
Abstract:
Sous l'hypothèse gaussienne, la relation entre indépendance conditionnelle et parcimonie permet de justifier la construction d'estimateurs de l'inverse de la matrice de covariance -- également appelée matrice de précision -- à partir d'approches régularisées. Cette thèse, motivée à l'origine par la problématique de classification d'images, vise à développer une méthode d'estimation de la matrice de précision en grande dimension, lorsque le nombre $n$ d'observations est petit devant la dimension $p$ du modèle. Notre approche repose essentiellement sur les liens qu'entretiennent la matrice de précision et le modèle de régression linéaire. Elle consiste à estimer la matrice de précision en deux temps. Les éléments non diagonaux sont tout d'abord estimés en considérant $p$ problèmes de minimisation du type racine carrée des moindres carrés pénalisés par la norme $\ell_1$. Les éléments diagonaux sont ensuite obtenus à partir du résultat de l'étape précédente, par analyse résiduelle ou maximum de vraisemblance. Nous comparons ces différents estimateurs des termes diagonaux en fonction de leur risque d'estimation. De plus, nous proposons un nouvel estimateur, conçu de sorte à tenir compte de la possible contamination des données par des \emph{outliers}, grâce à l'ajout d'un terme de régularisation en norme mixte $\ell_2/\ell_1$. L'analyse non-asymptotique de la convergence de notre estimateur souligne la pertinence de notre méthode.
Under the Gaussian assumption, the relationship between conditional independence and sparsity allows to justify the construction of estimators of the inverse of the covariance matrix -- also called precision matrix -- from regularized approaches. This thesis, originally motivated by the problem of image classification, aims at developing a method to estimate the precision matrix in high dimension, that is, when the sample size $n$ is small compared to the dimension $p$ of the model. Our approach relies essentially on the connection of the precision matrix to the linear regression model. It consists of estimating the precision matrix in two steps. The off-diagonal elements are first estimated by solving $p$ minimization problems of the $\ell_1$-penalized square-root least-squares type. The diagonal entries are then obtained from the result of the previous step, by residual analysis or likelihood maximization. These various estimators of the diagonal entries are compared in terms of estimation risk. Moreover, we propose a new estimator, designed to account for the possible contamination of the data by outliers, thanks to the addition of an $\ell_2/\ell_1$ mixed norm regularization term. The nonasymptotic analysis of the consistency of our estimator points out the relevance of our method.
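The regression-to-precision connection the thesis builds on can be sketched in the well-conditioned regime n ≫ p, with plain least squares standing in for the thesis's ℓ1-penalized square-root estimator: regressing each coordinate on the others gives Ω_jj = 1/σ_j² and Ω_jk = -β_k/σ_j², where σ_j² is the residual variance.

```python
import numpy as np

rng = np.random.default_rng(8)
# Known sparse (tridiagonal) precision matrix, p = 4
omega = np.array([
    [2.0, 0.6, 0.0, 0.0],
    [0.6, 2.0, 0.6, 0.0],
    [0.0, 0.6, 2.0, 0.6],
    [0.0, 0.0, 0.6, 2.0],
])
cov = np.linalg.inv(omega)
n = 20000
X = rng.multivariate_normal(np.zeros(4), cov, size=n)

# Column-wise regressions of X_j on the remaining columns:
# Omega_jj = 1/sigma_j^2 and Omega_jk = -beta_k / sigma_j^2
p = X.shape[1]
omega_hat = np.zeros((p, p))
for j in range(p):
    others = [k for k in range(p) if k != j]
    beta, *_ = np.linalg.lstsq(X[:, others], X[:, j], rcond=None)
    resid = X[:, j] - X[:, others] @ beta
    s2 = resid @ resid / n
    omega_hat[j, j] = 1.0 / s2
    omega_hat[j, others] = -beta / s2

omega_hat = 0.5 * (omega_hat + omega_hat.T)   # symmetrize
```

In the high-dimensional regime the thesis targets (small n, large p), OLS breaks down and the ℓ1 penalty becomes essential for recovering the zero pattern.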
APA, Harvard, Vancouver, ISO, and other styles
28

Žiupsnys, Giedrius. "Klientų duomenų valdymas bankininkystėje." Master's thesis, Lithuanian Academic Libraries Network (LABT), 2011. http://vddb.laba.lt/obj/LT-eLABa-0001:E.02~2010~D_20110709_152442-86545.

Full text
Abstract:
Darbas apima banko klientų kredito istorinių duomenų dėsningumų tyrimą. Pirmiausia nagrinėjamos banko duomenų saugyklos, siekiant kuo geriau perprasti bankinius duomenis. Vėliau naudojant banko duomenų imtis, kurios apima kreditų grąžinimo istoriją, siekiama įvertinti klientų nemokumo riziką. Tai atliekama adaptuojant algoritmus bei programinę įrangą duomenų tyrimui, kuris pradedamas nuo informacijos apdorojimo ir paruošimo. Paskui pritaikant įvairius klasifikavimo algoritmus, sudarinėjami modeliai, kuriais siekiama kuo tiksliau suskirstyti turimus duomenis, nustatant nemokius klientus. Taip pat siekiant įvertinti kliento vėluojamų mokėti paskolą dienų skaičių pasitelkiami regresijos algoritmai bei sudarinėjami prognozės modeliai. Taigi darbo metu atlikus numatytus tyrimus, pateikiami duomenų vitrinų modeliai, informacijos srautų schema. Taip pat nurodomi klasifikavimo ir prognozavimo modeliai bei algoritmai, geriausiai įvertinantys duotas duomenų imtis.
This work analyses regularities in the historical credit data of bank clients. First, the bank's information repositories are examined in order to understand the data. Then, using data mining algorithms and software on bank data sets describing credit repayment history, the insolvency risk of clients is estimated. The first step of the analysis is information preprocessing for data mining. Various classification algorithms are then applied to build models which classify the data sets and help identify insolvent clients as accurately as possible. Besides classification, regression algorithms are analysed and prediction models are created; these models estimate how many days a client will be late in repaying a loan. Once the research was complete, data mart models and a data flow schema are presented, together with the classification and regression algorithms and models that best fit the given data sets.
APA, Harvard, Vancouver, ISO, and other styles
29

Augusto, Taize Machado. "A regressão da prostata ventral de ratos pos-castração envolve alterações no conteudo de heparam sulfato e da expressão da heparanase." [s.n.], 2007. http://repositorio.unicamp.br/jspui/handle/REPOSIP/317566.

Full text
Abstract:
Orientador: Hernandes Faustino de Carvalho
Dissertação (mestrado) - Universidade Estadual de Campinas, Instituto de Biologia
Previous issue date: 2007
Resumo: O crescimento e fisiologia da próstata são dependentes de andrógenos e sua privação resulta numa regressão acentuada na glândula, com uma redução a 10% do tamanho original após 21 dias de castração. Esta redução no tamanho é causada pela perda de ...
Abstract: The growth and physiology of the prostate are dependent on androgens and androgen deprivation results in marked regression of the organ, which is reduced to 10% of the original size 21 days after castration. This reduction in size is caused by t...
Mestrado
Biologia Celular
Mestre em Biologia Celular e Estrutural
APA, Harvard, Vancouver, ISO, and other styles
30

Söderberg, Max Joel, and Axel Meurling. "Feature selection in short-term load forecasting." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-259692.

Full text
Abstract:
This paper investigates the correlation between energy consumption 24 hours ahead and the features used for predicting energy consumption. The features originate from three categories: weather, time and previous energy. The correlations are calculated using Pearson correlation and mutual information. The highest correlated features turned out to be those representing previous energy consumption, followed by temperature and month. Two identical feature sets containing all attributes were obtained by ranking the features according to correlation. Three feature sets were created manually. The first set contained seven attributes representing previous energy consumption over the course of the seven days prior to the day of prediction. The second set consisted of weather and time attributes. The third set consisted of all attributes from the first and second sets. These sets were then compared on different machine learning models. It was found that the set containing all attributes and the set containing previous energy attributes yielded the best performance for each machine learning model. (In this report, the words "attribute" and "feature" are used interchangeably.)
I denna rapport undersöks korrelation och betydelsen av olika attribut för att förutspå energiförbrukning 24 timmar framåt. Attributen härstammar från tre kategorier: väder, tid och tidigare energiförbrukning. Korrelationerna tas fram genom att utföra Pearson Correlation och Mutual Information. Detta resulterade i att de högst korrelerade attributen var de som representerar tidigare energiförbrukning, följt av temperatur och månad. Två identiska attributmängder erhölls genom att ranka attributen över korrelation. Tre attributmängder skapades manuellt. Den första mängden innehåll sju attribut som representerade tidigare energiförbrukning, en för varje dag, sju dagar innan datumet för prognosen av energiförbrukning. Den andra mängden bestod av väderoch tidsattribut. Den tredje mängden bestod av alla attribut från den första och andra mängden. Dessa mängder jämfördes sedan med hjälp av olika maskininlärningsmodeller. Resultaten visade att mängden med alla attribut och den med tidigare energiförbrukning gav bäst resultat för samtliga modeller.
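The ranking step is a plain Pearson correlation between each candidate feature and the load 24 hours ahead. A sketch on synthetic data reproducing the qualitative finding (previous consumption dominates); all feature names and coefficients below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(9)
n = 500
prev_energy = rng.normal(size=n)                  # consumption 24h earlier
temperature = rng.normal(size=n)
month = rng.integers(1, 13, n).astype(float)

# Target: mostly driven by previous consumption, weakly by temperature
load = 0.9 * prev_energy + 0.2 * temperature + 0.1 * rng.normal(size=n)

features = {"prev_energy": prev_energy, "temperature": temperature, "month": month}

def pearson(a, b):
    """Sample Pearson correlation coefficient."""
    a, b = a - a.mean(), b - b.mean()
    return (a @ b) / np.sqrt((a @ a) * (b @ b))

# Rank features by absolute correlation with the target
ranking = sorted(features, key=lambda k: -abs(pearson(features[k], load)))
```

Mutual information, the paper's second criterion, would additionally pick up nonlinear and seasonal dependencies that Pearson correlation misses (e.g. a U-shaped month effect).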
APA, Harvard, Vancouver, ISO, and other styles
31

Chen, I.-Chen. "Improved Methods and Selecting Classification Types for Time-Dependent Covariates in the Marginal Analysis of Longitudinal Data." UKnowledge, 2018. https://uknowledge.uky.edu/epb_etds/19.

Full text
Abstract:
Generalized estimating equations (GEE) are popularly utilized for the marginal analysis of longitudinal data. In order to obtain consistent regression parameter estimates, these estimating equations must be unbiased. However, when certain types of time-dependent covariates are present, these equations can be biased unless an independence working correlation structure is employed. Moreover, in this case regression parameter estimation can be very inefficient because not all valid moment conditions are incorporated within the corresponding estimating equations. Therefore, approaches using the generalized method of moments or quadratic inference functions have been proposed for utilizing all valid moment conditions. However, we have found that such methods will not always provide valid inference and can also be improved upon in terms of finite-sample regression parameter estimation. Therefore, we propose a modified GEE approach and a selection method that will both ensure the validity of inference and improve regression parameter estimation. In addition, these modified approaches assume the data analyst knows the type of time-dependent covariate, although this likely is not the case in practice. Whereas hypothesis testing has been used to determine covariate type, we propose a novel strategy to select a working covariate type in order to avoid potentially high type II error rates with these hypothesis testing procedures. Parameter estimates resulting from our proposed method are consistent and have overall improved mean squared error relative to hypothesis testing approaches. Finally, for some real-world examples the use of mean regression models may be sensitive to skewness and outliers in the data. Therefore, we extend our approaches to marginal quantile regression, modeling the conditional quantiles of the response variable. Existing and proposed methods are compared in simulation studies and application examples.
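For a linear marginal model, the GEE point estimate under an independence working correlation reduces to pooled least squares, paired with a cluster-robust sandwich covariance. A sketch on synthetic longitudinal data (the dissertation's modified GEE and moment-selection machinery are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(10)
m, t = 200, 4                                  # m subjects, t repeated measures
subj = np.repeat(np.arange(m), t)
x = rng.normal(size=m * t)
b_subj = np.repeat(rng.normal(scale=0.8, size=m), t)   # within-subject correlation
y = 1.0 + 0.5 * x + b_subj + rng.normal(scale=0.5, size=m * t)

X = np.column_stack([np.ones(m * t), x])

# Independence working correlation: the point estimate is pooled least squares
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cluster-robust (sandwich) covariance, summing scores within subjects
bread = np.linalg.inv(X.T @ X)
resid = y - X @ beta_hat
meat = np.zeros((2, 2))
for i in range(m):
    sel = subj == i
    g = X[sel].T @ resid[sel]                  # per-subject score contribution
    meat += np.outer(g, g)
cov_beta = bread @ meat @ bread
se = np.sqrt(np.diag(cov_beta))
```

The sandwich keeps the standard errors valid despite the ignored within-subject correlation; the efficiency loss relative to richer moment conditions is the gap the dissertation's methods target.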
APA, Harvard, Vancouver, ISO, and other styles
32

Průša, Petr. "Multi-label klasifikace textových dokumentů." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2012. http://www.nusl.cz/ntk/nusl-412872.

Full text
Abstract:
The master's thesis deals with automatic classification of text documents. It explains basic terms and problems of text mining, covers term clustering, and shows some basic clustering algorithms. The thesis also presents several methods of classification and deals with matrix regression in detail. An application using matrix regression for classification was designed and developed. Experiments focused on normalization and thresholding.
APA, Harvard, Vancouver, ISO, and other styles
33

Casadiego, Jose, Mor Nitzan, Sarah Hallerberg, and Marc Timme. "Model-free inference of direct network interactions from nonlinear collective dynamics." Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2018. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-232175.

Full text
Abstract:
The topology of interactions in network dynamical systems fundamentally underlies their function. Accelerating technological progress creates massively available data about collective nonlinear dynamics in physical, biological, and technological systems. Detecting direct interaction patterns from those dynamics still constitutes a major open problem. In particular, current nonlinear dynamics approaches mostly require to know a priori a model of the (often high dimensional) system dynamics. Here we develop a model-independent framework for inferring direct interactions solely from recording the nonlinear collective dynamics generated. Introducing an explicit dependency matrix in combination with a block-orthogonal regression algorithm, the approach works reliably across many dynamical regimes, including transient dynamics toward steady states, periodic and non-periodic dynamics, and chaos. Together with its capabilities to reveal network (two point) as well as hypernetwork (e.g., three point) interactions, this framework may thus open up nonlinear dynamics options of inferring direct interaction patterns across systems where no model is known.
APA, Harvard, Vancouver, ISO, and other styles
34

Mercado, Salazar Jorge Anibal, and S. M. Masud Rana. "A Confirmatory Analysis for Automating the Evaluation of Motivation Letters to Emulate Human Judgment." Thesis, Högskolan Dalarna, Institutionen för information och teknik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:du-37469.

Full text
Abstract:
Manually reading, evaluating, and scoring motivation letters as part of the admissions process is a time-consuming and tedious task for Dalarna University's program managers. An automated scoring system would provide them with relief as well as the ability to make much faster decisions when selecting applicants for admission. The aim of this thesis was to analyse current human judgment and attempt to emulate it using machine learning techniques. We used various topic modelling methods, such as Latent Dirichlet Allocation and Non-Negative Matrix Factorization, to find the most interpretable topics, build a bridge between topics and human-defined factors, and finally evaluate model performance by predicting scoring values and finding accuracy using logistic regression, discriminant analysis, and other classification algorithms. Although we were able to discover the meaning of almost all human factors on our own, the topic models' accuracy in predicting the overall score was unexpectedly low. Setting a threshold on the overall score to select applicants for admission yielded good overall accuracy, but did not yield consistently good precision or recall. During our investigation, we attempted to determine the possible causes of these unexpected results and discovered that not only are the limitations of topic modelling to blame, but human bias also plays a role.
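A minimal sketch of the kind of pipeline the abstract describes, factoring a toy document-term matrix with Non-Negative Matrix Factorization via Lee-Seung multiplicative updates (the data and dimensions are invented; the actual work used full topic-modelling and classification toolchains):

```python
import numpy as np

# Toy document-term matrix: 8 "documents" x 6 "terms" (invented data).
rng = np.random.default_rng(1)
V = rng.random((8, 6))
k = 2                                   # number of latent topics
W = rng.random((8, k)) + 0.1            # document-topic weights
H = rng.random((k, 6)) + 0.1            # topic-term weights
eps = 1e-9
for _ in range(200):                    # Lee-Seung multiplicative updates
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)
reconstruction_error = np.linalg.norm(V - W @ H)
# The rows of W would then serve as features for a downstream
# classifier (e.g. logistic regression) predicting the letter score.
```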
APA, Harvard, Vancouver, ISO, and other styles
35

Mai, Xiaoyi. "Méthodes des matrices aléatoires pour l’apprentissage en grandes dimensions." Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLC078/document.

Full text
Abstract:
The BigData challenge induces a need for machine learning algorithms to evolve towards large dimensional and more efficient learning engines. Recently, a new direction of research has emerged that consists in analyzing learning methods in the modern regime where the number n and the dimension p of data samples are commensurately large. Compared to the conventional regime where n >> p, the regime with large and comparable n, p is particularly interesting, as the learning performance in this regime remains sensitive to the tuning of hyperparameters, thus opening a path into the understanding and improvement of learning techniques for large dimensional datasets. The technical approach employed in this thesis draws on several advanced tools of high dimensional statistics, allowing us to conduct analyses beyond the state of the art. The first part of this dissertation is devoted to the study of semi-supervised learning on high dimensional data. Motivated by our theoretical findings, we propose a superior alternative to the standard semi-supervised method of Laplacian regularization. The methods involving implicit optimizations, such as SVMs and logistic regression, are next investigated under realistic mixture models, providing exhaustive details on the learning mechanism. Several important consequences are thus revealed, some of which even contradict common belief.
APA, Harvard, Vancouver, ISO, and other styles
36

Casadiego, Jose, Mor Nitzan, Sarah Hallerberg, and Marc Timme. "Model-free inference of direct network interactions from nonlinear collective dynamics." Nature Publishing Group, 2017. https://tud.qucosa.de/id/qucosa%3A30728.

Full text
Abstract:
The topology of interactions in network dynamical systems fundamentally underlies their function. Accelerating technological progress creates massively available data about collective nonlinear dynamics in physical, biological, and technological systems. Detecting direct interaction patterns from those dynamics still constitutes a major open problem. In particular, current nonlinear dynamics approaches mostly require to know a priori a model of the (often high dimensional) system dynamics. Here we develop a model-independent framework for inferring direct interactions solely from recording the nonlinear collective dynamics generated. Introducing an explicit dependency matrix in combination with a block-orthogonal regression algorithm, the approach works reliably across many dynamical regimes, including transient dynamics toward steady states, periodic and non-periodic dynamics, and chaos. Together with its capabilities to reveal network (two point) as well as hypernetwork (e.g., three point) interactions, this framework may thus open up nonlinear dynamics options of inferring direct interaction patterns across systems where no model is known.
APA, Harvard, Vancouver, ISO, and other styles
37

Hanusek, Lubomír. "Míry kvality klasifikačních modelů a jejich převod." Doctoral thesis, Vysoká škola ekonomická v Praze, 2003. http://www.nusl.cz/ntk/nusl-77091.

Full text
Abstract:
Predictive power of classification models can be evaluated by various measures. The most popular measures in data mining (DM) are the Gini coefficient, the Kolmogorov-Smirnov statistic and lift. These measures are each based on a completely different way of calculation. If an analyst is used to one of these measures, it can be difficult to assess the predictive power of a model evaluated by another measure. The aim of this thesis is to develop a method to convert one performance measure into another. Even though this thesis focuses mainly on the above-mentioned measures, it also deals with other measures like sensitivity, specificity, total accuracy and area under the ROC curve. During development of DM models, you may need to work with a sample that is stratified by values of the target variable Y instead of working with the whole population containing millions of observations. If you evaluate a model developed on stratified data, you may need to convert these measures to the whole population. This thesis describes how to carry out this conversion. A software application (CPM) enabling all these conversions forms part of this thesis. With this application you can not only convert one performance measure to another, but also convert measures calculated on a stratified sample to the whole population. Besides the above-mentioned performance measures (sensitivity, specificity, total accuracy, Gini coefficient, Kolmogorov-Smirnov statistic), CPM will also generate the confusion matrix and performance charts (lift chart, gains chart, ROC chart and KS chart). This thesis comprises the user manual for this application as well as the web address where the application can be downloaded. The theory described in this thesis was verified on real data.
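Two of the measures the thesis converts between have simple closed-form relationships; a hedged sketch with invented scores (the function names are mine, not from the thesis or the CPM application):

```python
import numpy as np

def auc_rank(scores_pos, scores_neg):
    # Mann-Whitney form of the area under the ROC curve: the probability
    # that a random positive outscores a random negative (ties count 1/2).
    pos = np.asarray(scores_pos, dtype=float)[:, None]
    neg = np.asarray(scores_neg, dtype=float)[None, :]
    wins = (pos > neg).sum() + 0.5 * (pos == neg).sum()
    return wins / (pos.size * neg.size)

def ks_stat(scores_pos, scores_neg):
    # Kolmogorov-Smirnov statistic: maximum gap between the two
    # class-conditional empirical CDFs of the scores.
    grid = np.union1d(scores_pos, scores_neg)
    cdf_p = np.searchsorted(np.sort(scores_pos), grid, side="right") / len(scores_pos)
    cdf_n = np.searchsorted(np.sort(scores_neg), grid, side="right") / len(scores_neg)
    return float(np.max(np.abs(cdf_p - cdf_n)))

pos = [0.9, 0.8, 0.7, 0.6]              # scores of actual positives
neg = [0.5, 0.4, 0.8, 0.2]              # scores of actual negatives
auc = auc_rank(pos, neg)
gini = 2 * auc - 1                      # one conversion: Gini = 2*AUC - 1
```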
APA, Harvard, Vancouver, ISO, and other styles
38

Cavalcanti, Alexsandro Bezerra. "Aperfeiçoamento de métodos estatísticos em modelos de regressão da família exponencial." Universidade de São Paulo, 2009. http://www.teses.usp.br/teses/disponiveis/45/45133/tde-05082009-170043/.

Full text
Abstract:
In this work, we develop three topics related to regression models in the exponential family. First, we obtain the asymptotic covariance matrix of order $n^{-2}$, where $n$ is the sample size, of the maximum likelihood estimators corrected by the bias of order $n^{-1}$ in generalized linear models, considering the precision parameter known. Second, we calculate the asymptotic skewness coefficient of order $n^{-1/2}$ of the distribution of the maximum likelihood estimators of the mean parameters and of the precision and dispersion parameters in exponential family nonlinear models, considering that the dispersion parameter, although unknown, is the same for all observations. Finally, we obtain Bartlett-type correction factors for the score test in exponential family nonlinear models, considering covariates to model the dispersion parameter. The results obtained in the three topics are evaluated through Monte Carlo simulation studies.
APA, Harvard, Vancouver, ISO, and other styles
39

NÓBREGA, Caio Santos Bezerra. "Uma estratégia para predição da taxa de aprendizagem do gradiente descendente para aceleração da fatoração de matrizes." Universidade Federal de Campina Grande, 2014. http://dspace.sti.ufcg.edu.br:8080/jspui/handle/riufcg/362.

Full text
Abstract:
Capes
Suggesting the most suitable products to different types of consumers is not a trivial task, despite being a key factor for increasing their satisfaction and loyalty. Due to this fact, recommender systems have become an important tool for many applications, such as e-commerce, personalized websites and social networks. Recently, matrix factorization has become the most successful technique for implementing recommendation systems. The parameters of this model are typically learned by means of numerical methods, like gradient descent. The performance of gradient descent is directly related to the configuration of the learning rate, which is typically set to small values in order not to miss a local minimum. As a consequence, the algorithm may take several iterations to converge. Ideally, one wants a learning rate that leads to a local minimum in the early iterations, but this is very difficult to achieve given the high complexity of the search space. Starting with an exploratory study on several recommendation system datasets, we observed that there is an overall linear relationship between the learning rate and the number of iterations needed until convergence. From this, we propose to use simple linear regression models to predict, for an unknown dataset, a good value for the initial learning rate. The idea is to estimate a learning rate that drives gradient descent as close as possible to a local minimum in the first iterations. We evaluate our technique on 8 real-world recommender datasets and compare it with the standard matrix factorization learning algorithm, which uses a fixed value for the learning rate over all iterations, and with techniques from the literature that adapt the learning rate. We show that we can reduce the number of iterations by up to 40% compared to the standard approach.
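The core proposal can be sketched in a few lines: regress observed (learning rate, iterations-to-convergence) pairs and invert the fit to suggest an initial rate for a target iteration budget. The numbers below are illustrative, not measurements from the thesis:

```python
import numpy as np

# Hypothetical observations from previous factorization runs.
lrs = np.array([0.01, 0.02, 0.04, 0.08])        # learning rates tried
iters = np.array([900.0, 450.0, 230.0, 120.0])  # iterations until convergence

# Simple linear regression iters ~ a + b * lr via least squares.
A = np.vstack([np.ones_like(lrs), lrs]).T
a, b = np.linalg.lstsq(A, iters, rcond=None)[0]

# Invert the fit to suggest a learning rate for a target budget.
target_iters = 300.0
suggested_lr = (target_iters - a) / b
```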
APA, Harvard, Vancouver, ISO, and other styles
40

Ferreira, do Nascimento Melo da Silva Tatiane. "Estimação do posto da matriz dos parâmetros do modelo de regressão Dirichlet." Universidade Federal de Pernambuco, 2004. https://repositorio.ufpe.br/handle/123456789/6586.

Full text
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
The Dirichlet regression model is useful, for example, for modelling rates and proportions, where the components of each observation vector sum to one. The coefficients of this regression model form a matrix. If this matrix does not have full rank, that is, if some of its elements can be written as linear combinations of others, then the number of model parameters to be estimated is smaller. Our objective is to estimate the rank of this parameter matrix, using a test statistic proposed by Ratsimalahelo (2003), through a sequential testing procedure and the information criteria BIC (Bayesian Information Criterion) and HQIC (Hannan-Quinn Information Criterion). We then evaluate the performance of the estimators of the rank of the coefficient matrix based on these procedures. In this work we consider two Dirichlet regression models. Monte Carlo simulation results show that when the sequential testing procedure is used to estimate the rank of the coefficient matrix, the estimators generally perform better in terms of bias and mean squared error than when the BIC and HQIC information criteria are used.
APA, Harvard, Vancouver, ISO, and other styles
41

Cardoso, Alexandre Bruni. "Envolvimento de metaloproteinases de matriz no desenvolvimento e na regressão da prostata ventral de roedores." [s.n.], 2010. http://repositorio.unicamp.br/jspui/handle/REPOSIP/317558.

Full text
Abstract:
Advisor: Hernandes Faustino de Carvalho
Doctoral thesis - Universidade Estadual de Campinas, Instituto de Biologia
Abstract: The prostate is an important gland of the reproductive tract of mammals and a target of several benign and malignant diseases affecting the elderly. Both postnatal prostate development and prostate regression after androgen ablation are characterized by intense modification in cell behavior and remodeling of the extracellular matrix (ECM). MMPs constitute a family of endopeptidases which preferentially cleave ECM components. Thus, it seems reasonable that these enzymes play a crucial role in the tissue remodeling that occurs during ventral prostate (VP) morphogenesis and in prostate regression after castration. In this study, we aimed to define the involvement of MMP-2 in the postnatal prostate development of rodents and the involvement of MMP-2, -7 and -9 in rat ventral prostate regression after castration. To this end, we used molecular, biochemical and morphological approaches. siRNA specific for MMP-2 compromised rat VP growth, branching, lumen formation and epithelial cell proliferation, besides leading to an accumulation of collagen fibers in the stroma. The VP of MMP-2-/- mice showed reduced relative weight and epithelial volume, resulting from decreased epithelial proliferation and branching and a stabilization of the collagen matrix during the first postnatal week. In prostate regression after castration, we found multiple waves of cell death and a direct association between the activity and expression of MMP-2, -7 and -9 and the apoptotic peak that occurs on the 11th day after castration. In conclusion, the results presented here show that both postnatal prostate development and prostate regression after castration are dependent on the expression and activity of MMPs.
Doctorate
Cell Biology
Doctor in Cell and Structural Biology
APA, Harvard, Vancouver, ISO, and other styles
42

Fridgeirsdottir, Gudrun A. "The development of a multiple linear regression model for aiding formulation development of solid dispersions." Thesis, University of Nottingham, 2018. http://eprints.nottingham.ac.uk/52176/.

Full text
Abstract:
As poor solubility continues to be a problem for new chemical entities (NCEs) in medicines development, the use of and interest in solid dispersions as a formulation-based solution has grown. Solid dispersions, where a drug is typically dispersed in a molecular state within an amorphous water-soluble polymer, present a good strategy to significantly enhance the effective drug solubility and hence bioavailability of drugs. The main drawback of this formulation strategy is the inherent instability of the amorphous form. With the right choice of polymer and manufacturing method, sufficient stability can be accomplished. However, finding the right combination of carrier and manufacturing method can be challenging, being costly in labour, time and materials. Therefore, a knowledge-based support tool, built on a statistically significant data set, to help with the formulation process would be of great value in the pharmaceutical industry. Here, 60 solid dispersion formulations were produced using ten poorly soluble, chemically diverse APIs, three commonly used polymers and two manufacturing methods (spray drying and hot-melt extrusion). A long-term stability study, up to one year, was performed on all formulations under accelerated conditions. Samples were regularly checked for the onset of crystallisation during the period, using mainly polarised light microscopy. The stability data showed a large variance in stability between methods, polymers and APIs. No obvious trends could be observed. Using statistical modelling, the experimental data in combination with calculated and predicted physicochemical properties of the APIs, several multiple linear regression (MLR) models were built. These had a good adjusted R2 and most showed good predictability in leave-one-out cross validations. Additionally, a validation of half of the models (e.g. those based on spray drying) using an external dataset showed excellent predictability, with the correct ranking of formulations and accurate prediction of stability. In conclusion, this work has provided important insight into the complex correlations between the physical stability of amorphous solid dispersions and factors such as manufacturing method, carrier and properties of the API. Due to the expansive number of formulations studied here, which is far greater than previously published in the literature in a single study, more general conclusions can be drawn about these correlations than has previously been possible. This thesis has shown the potential of using well-founded statistical models in the formulation development of solid dispersions and given more insight into the complexity of these systems and how their stability depends on multiple factors.
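The validation scheme named in the abstract, leave-one-out cross-validation of a multiple linear regression, can be sketched as follows (synthetic stand-ins for the formulation descriptors and stability response; none of the data are from the thesis):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])  # intercept + 3 descriptors
beta = np.array([2.0, 1.0, -1.0, 0.5])
y = X @ beta + rng.normal(scale=0.1, size=n)                # synthetic response

def loo_predictions(X, y):
    # Refit the MLR n times, each time holding one observation out.
    preds = np.empty_like(y)
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        b = np.linalg.lstsq(X[mask], y[mask], rcond=None)[0]
        preds[i] = X[i] @ b
    return preds

preds = loo_predictions(X, y)
press = np.sum((y - preds) ** 2)                 # predictive residual sum of squares
q2 = 1 - press / np.sum((y - y.mean()) ** 2)     # cross-validated R^2 analogue
```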
APA, Harvard, Vancouver, ISO, and other styles
43

Reynaldo, Cristiane. "Regressão "Ridge" : um metodo alternativo para o mal condicionamento da matriz das regressoras." [s.n.], 1997. http://repositorio.unicamp.br/jspui/handle/REPOSIP/306421.

Full text
Abstract:
Advisor: Reinaldo Charnet
Master's dissertation - Universidade Estadual de Campinas, Instituto de Matematica, Estatistica e Computação Cientifica
Abstract: In multiple linear regression analysis there are many situations in which ill-conditioning of the regressor matrix is present. In general, the usual remedy is to eliminate one of the variables from the regression model. Here, however, we suppose that this has already been done and the ill-conditioning still remains. This situation is not contrived, since there are many examples in economic data. We therefore suggest ridge regression as an alternative method. There are several ways of obtaining ridge estimators, and we present some of them. The aim of this work is thus to compare ridge estimators and to show their advantages over least-squares estimators when the data are ill-conditioned.
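A minimal numerical sketch of the estimator discussed: the ridge estimator adds a constant k to the diagonal of X'X before inversion, beta(k) = (X'X + kI)^{-1} X'y. The data below are artificial, built to be ill-conditioned:

```python
import numpy as np

rng = np.random.default_rng(4)
x1 = rng.normal(size=50)
x2 = x1 + 1e-3 * rng.normal(size=50)     # nearly collinear regressors
X = np.column_stack([x1, x2])
y = x1 + x2 + 0.1 * rng.normal(size=50)

def ridge(X, y, k):
    # beta(k) = (X'X + k I)^{-1} X'y; k = 0 gives ordinary least squares.
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

b_ols = ridge(X, y, 0.0)
b_ridge = ridge(X, y, 1.0)
condition_number = np.linalg.cond(X.T @ X)   # large: ill-conditioned
```

The ridge solution trades a small bias for a large reduction in variance along the near-degenerate direction of X'X.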
Master's degree
Master in Statistics
APA, Harvard, Vancouver, ISO, and other styles
44

MIGLIAVACCA, ELDER. "Modelagem do desempenho separativo de ultracentrifugas por regressao multivariada com matriz de covariancia." reponame:Repositório Institucional do IPEN, 2004. http://repositorio.ipen.br:8080/xmlui/handle/123456789/11155.

Full text
Abstract:
Dissertation (Master's)
IPEN/D
Instituto de Pesquisas Energeticas e Nucleares - IPEN/CNEN-SP
APA, Harvard, Vancouver, ISO, and other styles
45

Stecenková, Marina. "Srovnání vybraných klasifikačních metod pro vícerozměrná data." Master's thesis, Vysoká škola ekonomická v Praze, 2012. http://www.nusl.cz/ntk/nusl-124516.

Full text
Abstract:
The aim of this thesis is a comparison of selected classification methods: logistic regression (binary and multinomial), the multilayer perceptron, and the classification trees CHAID and CRT. The first part reviews the theoretical basis of these methods and explains the nature of the model parameters. The next section applies these classification methods to six data sets and then compares their outputs. Particular emphasis is placed on rating the discriminatory power of the models, to which a separate chapter is devoted. The discriminatory power of a model is rated using overall accuracy, the F-measure and the area under the ROC curve. The benefit of this work is not only a comparison of the selected classification methods based on statistical measures of discriminatory power, but also an overview of the strengths and weaknesses of each method.
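The discriminatory-power measures named above are simple functions of the binary confusion matrix; a small sketch with illustrative counts (not from the thesis data):

```python
# Illustrative binary confusion-matrix counts.
tp, fp, fn, tn = 40, 10, 5, 45

overall_accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
# F-measure: harmonic mean of precision and recall.
f_measure = 2 * precision * recall / (precision + recall)
```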
APA, Harvard, Vancouver, ISO, and other styles
46

Kalender, Emre. "Parametric Estimation Of Clutter Autocorrelation Matrix For Ground Moving Target Indication." Master's thesis, METU, 2013. http://etd.lib.metu.edu.tr/upload/12615313/index.pdf.

Full text
Abstract:
In airborne radar systems with Ground Moving Target Indication (GMTI) mode, it is desired to detect the presence of targets in the interference consisting of noise, ground clutter, and jamming signals. These interference components usually mask the target return signal, such that the detection requires suppression of the interference signals. Space-time adaptive processing is a widely used interference suppression technique which uses temporal and spatial information to eliminate the effects of clutter and jamming and enables the detection of moving targets with small radial velocity. However, adaptive estimation of the interference requires high computational capacity as well as large secondary sample data support. The available secondary range cells may be fewer than required due to non-homogeneity problems, and the computational capacity of the radar system may not be sufficient for the computations required. In order to reduce the computational load and the required number of secondary data for estimation, parametric methods use a priori information on the structure of the clutter covariance matrix. Space Time Auto-regressive (STAR) filtering, which is a parametric adaptive method, and full parametric model-based approaches for interference suppression are proposed as alternatives to STAP in the literature. In this work, space time auto-regressive filtering and model-based GMTI approaches are investigated. The performance of these approaches is evaluated on both simulated and flight test data and compared with the performance of sample matrix inversion space time adaptive processing.
APA, Harvard, Vancouver, ISO, and other styles
47

Tomek, Peter. "Approximation of Terrain Data Utilizing Splines." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2012. http://www.nusl.cz/ntk/nusl-236488.

Full text
Abstract:
For the optimization of flight trajectories at very low altitude, terrain features must be taken into account very accurately. Fast and efficient evaluation of terrain data is therefore essential, since the time required for the optimization must be as short as possible. Moreover, gradient-based methods are used for flight trajectory optimization, so the function approximating the terrain data must be continuous up to a certain order of derivatives. A very promising method for approximating terrain data is the application of multivariate simplex polynomials. The aim of this thesis is to implement a function that evaluates the given terrain data at specified points, together with the gradient, using multivariate splines. The program should evaluate multiple points at once and should work in n-dimensional space.
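The kind of evaluation described above, terrain height together with its gradient for many query points at once, can be illustrated with SciPy's tensor-product splines (a simplified sketch on a toy surface; the thesis itself targets multivariate simplex splines, which SciPy does not implement):

```python
import numpy as np
from scipy.interpolate import RectBivariateSpline

# Toy terrain sampled on a regular grid (stand-in for real elevation data)
x = np.linspace(0.0, 10.0, 50)
y = np.linspace(0.0, 10.0, 50)
X, Y = np.meshgrid(x, y, indexing="ij")
Z = np.sin(X) * np.cos(Y)

# Cubic spline surface: smooth enough that gradients are well defined
terrain = RectBivariateSpline(x, y, Z, kx=3, ky=3)

# Evaluate height and gradient at several query points in one call
qx = np.array([1.3, 4.7, 8.2])
qy = np.array([2.1, 5.5, 9.0])
h = terrain.ev(qx, qy)              # heights
dhdx = terrain.ev(qx, qy, dx=1)     # partial derivative in x
dhdy = terrain.ev(qx, qy, dy=1)     # partial derivative in y
```

Because the toy surface is analytic, the spline values and derivatives can be checked against `sin(x)cos(y)` and its gradient; on real elevation grids the same vectorized `ev` calls keep per-query cost low, which is the speed requirement stated above.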
APA, Harvard, Vancouver, ISO, and other styles
48

Aoki, Reiko. "Uma possivel solução para o problema de mal condicionamento da matriz do modelo de regressão." [s.n.], 1992. http://repositorio.unicamp.br/jspui/handle/REPOSIP/307058.

Full text
Abstract:
Advisor: Euclydes Custodio Lima Filho
Master's thesis - Universidade Estadual de Campinas, Instituto de Matemática, Estatística e Computação Científica
Abstract: Not provided.
Master's degree in Statistics
APA, Harvard, Vancouver, ISO, and other styles
49

Tomaya, Lorena Yanet Cáceres. "Inferência em modelos de regressão com erros de medição sob enfoque estrutural para observações replicadas." Universidade Federal de São Carlos, 2014. https://repositorio.ufscar.br/handle/ufscar/4584.

Full text
Abstract:
Financiadora de Estudos e Projetos
The usual regression model fits data under the assumption that the explanatory variable is measured without error. However, in many situations the explanatory variable is observed with measurement errors. In these cases, measurement error models are recommended. We study a structural measurement error model for replicated observations. Estimation of parameters of the proposed models was obtained by the maximum likelihood and maximum pseudolikelihood methods. The behavior of the estimators was assessed in a simulation study with different numbers of replicates. Moreover, we proposed the likelihood ratio test, Wald test, score test, gradient test, Neyman's C test and pseudolikelihood ratio test in order to test hypotheses of interest related to the parameters. The proposed test statistics are assessed through a simulation study. Finally, the model was fitted to a real data set comprising measurements of concentrations of chemical elements in samples of Egyptian pottery. The computational implementation was developed in R language.
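The attenuation effect that motivates such models, and how replicated observations make it correctable, can be sketched with a simple moment-based correction (a simulated toy example; the thesis studies maximum likelihood and pseudolikelihood estimators, not this estimator):

```python
import numpy as np

rng = np.random.default_rng(1)

n, m = 2000, 3                # subjects, replicates per subject
alpha, beta = 1.0, 2.0        # true regression parameters
sigma_x, sigma_u, sigma_e = 1.0, 0.8, 0.3

x = rng.normal(0.0, sigma_x, n)                     # true (latent) covariate
w = x[:, None] + rng.normal(0.0, sigma_u, (n, m))   # replicated noisy measurements
y = alpha + beta * x + rng.normal(0.0, sigma_e, n)

wbar = w.mean(axis=1)

# Within-subject spread estimates the measurement-error variance sigma_u^2,
# which is only identifiable because each subject is measured m > 1 times
s2_u = np.sum((w - wbar[:, None]) ** 2) / (n * (m - 1))

# Naive slope (attenuated toward zero) vs. moment-corrected slope
s_ww = np.var(wbar, ddof=1)
s_wy = np.cov(wbar, y, ddof=1)[0, 1]
beta_naive = s_wy / s_ww
beta_corrected = s_wy / (s_ww - s2_u / m)
alpha_corrected = y.mean() - beta_corrected * wbar.mean()
```

Here `beta_naive` underestimates the true slope because `wbar` still carries error variance `sigma_u**2 / m`; subtracting the replicate-based estimate of that variance from the denominator removes the attenuation, which is the same identification role the replicates play in the likelihood-based methods above.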
APA, Harvard, Vancouver, ISO, and other styles
50

Durif, Ghislain. "Multivariate analysis of high-throughput sequencing data." Thesis, Lyon, 2016. http://www.theses.fr/2016LYSE1334/document.

Full text
Abstract:
The statistical analysis of Next-Generation Sequencing data raises many computational challenges regarding modeling and inference, especially because of the high dimensionality of genomic data. The research work in this manuscript concerns hybrid dimension reduction methods that rely on both compression (representation of the data in a lower-dimensional space) and variable selection. Developments are made concerning the sparse Partial Least Squares (PLS) regression framework for supervised classification, and the sparse matrix factorization framework for unsupervised exploration. In both situations, our main purpose is the reconstruction and visualization of the data. First, we present a new sparse PLS approach, based on an adaptive sparsity-inducing penalty, that is suitable for logistic regression to predict the label of a discrete outcome. Such a method can be used, for instance, for prediction (fate of patients or specific type of unidentified single cells) based on gene expression profiles. The main issue in this framework is to account for the response in order to discard irrelevant variables. We highlight the direct link between the derivation of the algorithms and the reliability of the results. Then, motivated by questions regarding single-cell data analysis, we propose a flexible model-based approach for the factorization of count matrices that accounts for over-dispersion as well as zero-inflation (both characteristic of single-cell data), for which we derive an estimation procedure based on variational inference. In this scheme, we consider probabilistic variable selection based on a spike-and-slab model suitable for count data. The interest of our procedure for data reconstruction, visualization, and clustering is illustrated by simulation experiments and by preliminary results on single-cell data analysis. All proposed methods were implemented in two R packages, "plsgenomics" and "CMF", built on high-performance computing.
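The core idea behind sparse PLS, soft-thresholding the covariance between predictors and response to obtain a sparse weight vector, can be sketched as follows (a first-component-only toy with an arbitrary fixed threshold; the thesis develops an adaptive penalty and a full logistic-PLS algorithm):

```python
import numpy as np

rng = np.random.default_rng(2)

n, p = 100, 200                         # samples x (high-dimensional) features
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 2.0                          # only the first 5 features are relevant
y = X @ beta + rng.standard_normal(n)

# The first PLS weight vector is proportional to X^T y; sparsity is induced
# by soft-thresholding that covariance vector
Xc = X - X.mean(axis=0)
yc = y - y.mean()
c = Xc.T @ yc

lam = 0.5 * np.max(np.abs(c))           # threshold level (a tuning parameter)
w = np.sign(c) * np.maximum(np.abs(c) - lam, 0.0)
w = w / np.linalg.norm(w)

t = Xc @ w                              # first sparse latent component
support = np.nonzero(w)[0]              # selected features
```

The soft-threshold zeroes out features whose covariance with the response is small, so the latent component `t` is built from a short list of candidate predictors rather than all `p` columns, which is the variable-selection side of the hybrid dimension reduction described above.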
APA, Harvard, Vancouver, ISO, and other styles
