
Dissertations / Theses on the topic 'Variable sample size'



Consult the top 31 dissertations / theses for your research on the topic 'Variable sample size.'




1

Krklec Jerinkić, Nataša. "Line search methods with variable sample size." PhD thesis, Univerzitet u Novom Sadu, Prirodno-matematički fakultet u Novom Sadu, 2014. http://dx.doi.org/10.2298/NS20140117KRKLEC.

Full text
Abstract:
The problem under consideration is an unconstrained optimization problem with the objective function in the form of a mathematical expectation. The expectation is with respect to the random variable that represents the uncertainty. Therefore, the objective function is in fact deterministic. However, finding the analytical form of that objective function can be very difficult or even impossible. This is the reason why the sample average approximation is often used. In order to obtain a reasonably good approximation of the objective function, we have to use a relatively large sample size. We assume that the sample is generated at the beginning of the optimization process and therefore we can consider this sample average objective function as deterministic. However, applying some deterministic method to that sample average function from the start can be very costly. The number of evaluations of the function under expectation is a common way of measuring the cost of an algorithm. Therefore, methods that vary the sample size throughout the optimization process have been developed. Most of them try to determine the optimal dynamics of increasing the sample size. The main goal of this thesis is to develop a class of methods that can decrease the cost of an algorithm by decreasing the number of function evaluations. The idea is to decrease the sample size whenever it seems reasonable - roughly speaking, we do not want to impose a large precision, i.e. a large sample size, when we are far away from the solution we search for. A detailed description of the new methods is presented in Chapter 4, together with the convergence analysis. It is shown that the approximate solution is of the same quality as the one obtained by dealing with the full sample from the start. Another important characteristic of the methods proposed here is the line search technique used for obtaining the subsequent iterates. The idea is to find a suitable direction and to search along it until we obtain a sufficient decrease in the function value. The sufficient decrease is determined through the line search rule. In Chapter 4, that rule is supposed to be monotone, i.e. we impose a strict decrease of the function value. In order to decrease the cost of the algorithm even more and to enlarge the set of suitable search directions, we use nonmonotone line search rules in Chapter 5. Within that chapter, these rules are modified to fit the variable sample size framework. Moreover, the conditions for global convergence and the R-linear rate are presented. In Chapter 6, numerical results are presented. The test problems are various - some of them are academic and some are real-world problems. The academic problems are here to give us more insight into the behavior of the algorithms. On the other hand, data that come from real-world problems are here to test the real applicability of the proposed algorithms. In the first part of that chapter, the focus is on the variable sample size techniques. Different implementations of the proposed algorithm are compared to each other and to other sampling schemes as well. The second part is mostly devoted to the comparison of the various line search rules combined with different search directions in the variable sample size framework. The overall numerical results show that using the variable sample size can improve the performance of the algorithms significantly, especially when nonmonotone line search rules are used. The first chapter of this thesis provides the background material for the subsequent chapters. In Chapter 2, the basics of nonlinear optimization are presented with the focus on line search, while Chapter 3 deals with the stochastic framework. These chapters provide a review of the relevant known results, while the rest of the thesis represents the original contribution.
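The variable sample size idea summarized in this abstract can be illustrated with a minimal sketch, not the author's exact algorithm: a sample average approximation of an expected loss whose sample size grows or shrinks with a crude progress measure, combined with a monotone Armijo backtracking line search. The toy objective, the size-update heuristic, and all names below are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
mu_true = np.array([3.0, -1.0, 2.0, 0.5, -2.0])
xi = rng.normal(loc=mu_true, size=(10_000, 5))   # pre-generated full sample of the random vector

def sample_avg(x, n):
    # sample average approximation of E[ ||x - xi||^2 ] using the first n realizations
    return np.mean(np.sum((x - xi[:n]) ** 2, axis=1))

def sample_avg_grad(x, n):
    return 2.0 * (x - xi[:n].mean(axis=0))

x = np.zeros(5)
n, n_min, n_max = 100, 100, xi.shape[0]          # current, minimum and full sample sizes
for k in range(100):
    g = sample_avg_grad(x, n)
    f0 = sample_avg(x, n)
    # monotone Armijo backtracking line search along the negative gradient
    alpha, c1 = 1.0, 1e-4
    while sample_avg(x - alpha * g, n) > f0 - c1 * alpha * g @ g and alpha > 1e-12:
        alpha *= 0.5
    x = x - alpha * g
    # heuristic sample-size update: demand high precision only when the gradient is small
    if np.linalg.norm(g) < 1.0 / np.sqrt(n):
        n = min(2 * n, n_max)                    # near a stationary point: increase the sample
    else:
        n = max(n // 2, n_min)                   # far from it: a rough, cheap estimate suffices

print(np.round(x, 2), "vs sample mean", np.round(xi.mean(axis=0), 2))
```

Increasing the sample only when the gradient of the current sample average is small mirrors the reasoning above: far from a solution a rough, cheap estimate of the objective is enough.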
APA, Harvard, Vancouver, ISO, and other styles
2

Oymak, Okan. "Sample size determination for estimation of sensor detection probabilities based on a test variable." Thesis, Monterey, Calif. : Naval Postgraduate School, 2007. http://bosun.nps.edu/uhtbin/hyperion-image.exe/07Jun%5FOymak.pdf.

Full text
Abstract:
Thesis (M.S. in Operations Research)--Naval Postgraduate School, June 2007. Thesis Advisor(s): Lyn R. Whitaker. "June 2007." Includes bibliographical references (p. 95-96). Also available in print.
APA, Harvard, Vancouver, ISO, and other styles
3

Rožnjik, Andrea. "Optimizacija problema sa stohastičkim ograničenjima tipa jednakosti – kazneni metodi sa promenljivom veličinom uzorka." PhD thesis, Univerzitet u Novom Sadu, Prirodno-matematički fakultet u Novom Sadu, 2019. https://www.cris.uns.ac.rs/record.jsf?recordId=107819&source=NDLTD&language=en.

Full text
Abstract:
A stochastic programming problem with equality constraints is considered in this thesis. More precisely, the problem is a minimization problem with constraints in the form of mathematical expectation. We propose two iterative methods for solving the considered problem. Both procedures, in each iteration, use a sample average function instead of the mathematical expectation function, and employ the advantages of the variable sample size method based on adaptive updating of the sample size. That means the sample size is determined at every iteration using information from the current iteration. Concretely, the current precision of the approximation of the expectation and the quality of the approximation of the solution determine the sample size for the next iteration. Both iterative procedures are based on the line search technique as well as on the quadratic penalty method adapted to the stochastic environment, since the considered problem has constraints. The procedures rely on the same ideas but differ in approach. In the first approach, the algorithm is created for solving an SAA reformulation of the stochastic programming problem, i.e., for solving the approximation of the original problem. That means the sample size is determined before the iterative procedure, so the convergence analysis is deterministic. We show that, under standard assumptions, the proposed algorithm generates a subsequence whose accumulation point is a KKT point of the SAA problem. The algorithm formed by the second approach solves the stochastic programming problem itself, and therefore the convergence analysis is stochastic. It generates a subsequence whose accumulation point is, under the standard assumptions for stochastic optimization, almost surely a KKT point of the original problem. The proposed algorithms are implemented on the same test problems. Numerical results demonstrate their efficiency in solving the considered problems in comparison with procedures in which the sample size update is based on a predefined scheme. The number of function evaluations is used as the measure of efficiency. The results obtained on the set of tested problems suggest that adaptive sample size scheduling can reduce the number of function evaluations in the case of constrained problems, too. Since the considered problem is deterministic, but the formulated procedures are stochastic, the first three chapters of the thesis contain basic notions of deterministic and stochastic optimization, as well as a short overview of definitions and theorems from other fields needed to follow the analysis of the original results. The rest of the thesis presents the proposed algorithms, their convergence analysis and their numerical implementation.
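For orientation, a hedged sketch of the sample average approximation (SAA) and quadratic penalty construction that the abstract refers to; the notation is assumed here, and the thesis's exact penalty and sample-size updates may differ.

```latex
% original problem: expectation-valued objective and equality constraints
\min_{x}\; \mathbb{E}[F(x,\xi)] \quad \text{s.t.} \quad \mathbb{E}[H(x,\xi)] = 0
% SAA with sample size N_k at iteration k
\hat{f}_{N_k}(x) = \frac{1}{N_k}\sum_{i=1}^{N_k} F(x,\xi_i), \qquad
\hat{h}_{N_k}(x) = \frac{1}{N_k}\sum_{i=1}^{N_k} H(x,\xi_i)
% quadratic penalty function minimized approximately at each iteration
\Phi_{k}(x) = \hat{f}_{N_k}(x) + \frac{\mu_k}{2}\,\bigl\|\hat{h}_{N_k}(x)\bigr\|^{2}
```

Here the penalty parameter μ_k is nondecreasing, while N_k is chosen adaptively from the current precision and progress, as described above.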
APA, Harvard, Vancouver, ISO, and other styles
4

Marsal, Mora Josep Ramon. "Estimación del tamaño muestral requerido en el uso de variables de respuesta combinadas: nuevas aportaciones." Doctoral thesis, Universitat Autònoma de Barcelona, 2019. http://hdl.handle.net/10803/666768.

Full text
Abstract:
We define a Composite Endpoint (CE) as the combination of two or more clinically relevant events into a single event. The CE will be used as a primary endpoint in a clinical trial (CT). The events to be combined must share characteristics such as similar incidence, similar magnitude of the intervention effect, and importance for patients. The main advantage of the use of a CE is the potential reduction in the Sample Size Requirement (SSR) resulting from an increase in statistical power (i.e. an increase in the net number of patients with one or more events). On the other hand, the main disadvantage of the use of CEs is an increase in the complexity of both the statistical analysis and the interpretation of results. The quantification of the SSR depends mainly on the incidence and the effect of each of the combined events, but also on the degree of association between the combined events. The impact of the incidence and the magnitude of the effect of the combined events on the SSR quantification is well known. However, the impact of the association between events has not been fully assessed. Using a pragmatic approach, we have created a web application that quantifies the SSR when using a binary CE, available to the professionals who design CTs (i.e. clinicians, trialists, and biostatisticians). As a preliminary step in the development of the tool, we studied in depth the concept of strength of association between two binary variables. We also studied the impact of the association between two binary events forming a CE on the SSR. In the first section of this Thesis, we define and characterize the concept of association between binary events. We list different ways of quantifying the association and different metrics used to compare them. We conclude that Pearson's correlation is not the best indicator of association between two binary variables; the joint probability (the probability of both events) or the overlap show better characteristics. In the second section, we define, using simulation, the scenarios where the use of a binary CE is better than a single relevant endpoint in terms of SSR reduction. We evaluate the impact of the incidence, of the magnitude of the intervention effect, and of the magnitude of the association between the two binary events on the SSR. We conclude that the magnitude of the association determines whether combining two endpoints in a CE is efficient in terms of SSR reduction. Finally, we develop Bin-CE, a free tool that calculates the SSR of a CE when combining a set of binary events. This tool identifies the combination of events which minimizes the SSR. It has been built from a clinical point of view. Bin-CE is accessible at: https://uesca-apps.shinyapps.io/bincep.
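A hedged sketch of the standard quantities behind this discussion (textbook formulas with assumed notation, not taken from the thesis): for two binary events A and B with incidences p_A and p_B and joint probability p_AB, the composite endpoint incidence and the usual two-group sample size approximation are

```latex
p_{CE} = p_A + p_B - p_{AB}, \qquad
n_{\text{per group}} \approx
\frac{\bigl(z_{1-\alpha/2} + z_{1-\beta}\bigr)^{2}\,
      \bigl[\,p_1(1-p_1) + p_2(1-p_2)\,\bigr]}{(p_1 - p_2)^{2}}
```

so a stronger association between the events (a larger p_AB) lowers p_CE and, for a fixed relative effect, can erode the sample size advantage of combining them.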
APA, Harvard, Vancouver, ISO, and other styles
5

Saha, Dibakar. "Improved Criteria for Estimating Calibration Factors for Highway Safety Manual (HSM) Applications." FIU Digital Commons, 2014. http://digitalcommons.fiu.edu/etd/1701.

Full text
Abstract:
The Highway Safety Manual (HSM) estimates roadway safety performance based on predictive models that were calibrated using national data. Calibration factors are then used to adjust these predictive models to local conditions for local applications. The HSM recommends that local calibration factors be estimated using 30 to 50 randomly selected sites that experienced at least a total of 100 crashes per year. It also recommends that the factors be updated every two to three years, preferably on an annual basis. However, these recommendations are primarily based on expert opinions rather than data-driven research findings. Furthermore, most agencies do not have data for many of the input variables recommended in the HSM. This dissertation is aimed at determining the best way to meet three major data needs affecting the estimation of calibration factors: (1) the required minimum sample sizes for different roadway facilities, (2) the required frequency for calibration factor updates, and (3) the influential variables affecting calibration factors. In this dissertation, statewide segment and intersection data were first collected for most of the HSM recommended calibration variables using a Google Maps application. In addition, eight years (2005-2012) of traffic and crash data were retrieved from existing databases from the Florida Department of Transportation. With these data, the effect of sample size criterion on calibration factor estimates was first studied using a sensitivity analysis. The results showed that the minimum sample sizes not only vary across different roadway facilities, but they are also significantly higher than those recommended in the HSM. In addition, results from paired sample t-tests showed that calibration factors in Florida need to be updated annually. To identify influential variables affecting the calibration factors for roadway segments, the variables were prioritized by combining the results from three different methods: negative binomial regression, random forests, and boosted regression trees. Only a few variables were found to explain most of the variation in the crash data. Traffic volume was consistently found to be the most influential. In addition, roadside object density, major and minor commercial driveway densities, and minor residential driveway density were also identified as influential variables.
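For context, the calibration factor the dissertation estimates is conventionally computed, following the HSM Part C procedure, as the ratio of total observed to total predicted crashes over the calibration sites (a standard relation stated from general knowledge, not quoted from the dissertation):

```latex
C = \frac{\sum_{\text{all sites}} N_{\text{observed}}}{\sum_{\text{all sites}} N_{\text{predicted}}}
```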
APA, Harvard, Vancouver, ISO, and other styles
6

Caltabiano, Ana Maria de Paula. "Gráficos de controle com tamanho de amostra variável : classificando sua estratégia conforme sua destinação por intermédio de um estudo bibliométrico /." Guaratinguetá, 2018. http://hdl.handle.net/11449/180553.

Full text
Abstract:
Advisor: Antonio Fernando Branco Costa. Abstract: Control charts were created by Shewhart around 1924. Since then, many strategies have been proposed to improve the performance of such statistical tools. Among them, the adaptive-parameters strategy stands out, having given rise to a very fertile line of research. One of its branches is devoted to charts whose sample size is variable and depends on the position of the current sample point: if the point is close to the center line, the next sample will be small; if it is far from it, but not yet in the action region, the next sample will be large. This sampling scheme with a variable sample size became known as the VSS (variable sample size) scheme. This dissertation reviews the process-monitoring literature whose main focus is VSS sampling schemes. A systematic literature review was carried out by means of a bibliometric analysis of the period from 1980 to 2018, with the aim of classifying the VSS strategy according to its application, for example, charts with known parameters and independent observations. The applications were divided into ten classes: I – type of VSS; II – type of monitoring; III – number of variables under monitoring; IV – type of chart; V – process parameters; VI – signaling rules; VII – nature of the process; VIII – type of optimization; IX – mathematical model of the chart properties; X – type of production. The main conclusion of this study was that in the class... (full abstract available through the electronic link below). Master's.
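A minimal sketch of the VSS logic described in the abstract, for a Shewhart-type mean chart with a warning limit w and a control limit k; the limits, sample sizes, and in-control parameters below are illustrative assumptions, not values from the dissertation.

```python
import numpy as np

rng = np.random.default_rng(1)
mu0, sigma0 = 0.0, 1.0          # assumed in-control mean and standard deviation
n_small, n_large = 3, 12        # small and large sample sizes of the VSS scheme
w, k = 1.0, 3.0                 # warning and control limits (in standard-error units)

n_next = n_small
for t in range(50):
    sample = rng.normal(mu0, sigma0, size=n_next)        # replace with real measurements
    z = (sample.mean() - mu0) / (sigma0 / np.sqrt(n_next))
    if abs(z) >= k:
        print(f"t={t}: signal (|z|={abs(z):.2f}, n={n_next})")
        break
    # VSS rule: near the center line take a small sample next, in the warning zone a large one
    n_next = n_small if abs(z) < w else n_large
```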
APA, Harvard, Vancouver, ISO, and other styles
7

You, Zhiying. "Power and sample size of cluster randomized trials." Thesis, Birmingham, Ala. : University of Alabama at Birmingham, 2008. https://www.mhsl.uab.edu/dt/2009r/you.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Fernandes, Jessica Katherine de Sousa. "Estudo de algoritmos de otimização estocástica aplicados em aprendizado de máquina." Universidade de São Paulo, 2017. http://www.teses.usp.br/teses/disponiveis/45/45134/tde-28092017-182905/.

Full text
Abstract:
In different machine learning applications we may be interested in minimizing the expected value of a certain loss function. For solving this problem, stochastic optimization and sample size selection play an important role. In the present work, we present the theoretical analysis of some algorithms from these two areas, including some variations that consider variance reduction. In the practical examples we can observe the advantage of Stochastic Gradient Descent with respect to processing time and memory; however, considering the accuracy of the obtained solution together with the cost of minimization, the variance reduction methodologies obtain the best solutions. Although the Dynamic Sample Size Gradient and Line Search with variable sample size selection algorithms obtain better solutions than Stochastic Gradient Descent, their disadvantage lies in their high computational cost.
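A minimal sketch of plain stochastic gradient descent on a toy expected-loss problem, to make the processing-time argument above concrete: each iteration touches a single data point instead of the full sample. The data, step size, and iteration count are illustrative assumptions; the dynamic sample size and line search variants discussed in the dissertation are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(20_000, 10))                 # synthetic data: rows are random feature vectors
x_true = rng.normal(size=10)
b = A @ x_true + 0.1 * rng.normal(size=20_000)    # noisy linear responses

def full_loss(x):
    # deterministic sample average loss over the whole data set (expensive to evaluate often)
    return 0.5 * np.mean((A @ x - b) ** 2)

x = np.zeros(10)
step = 0.05                                       # illustrative constant step size
for k in range(2000):
    i = rng.integers(A.shape[0])                  # one randomly drawn data point per iteration
    g = (A[i] @ x - b[i]) * A[i]                  # stochastic gradient of 0.5*(a_i^T x - b_i)^2
    x -= step * g
print(f"full-sample loss after SGD: {full_loss(x):.4f} (noise floor about 0.005)")
```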
APA, Harvard, Vancouver, ISO, and other styles
9

Pfister, Mark. "Distribution of a Sum of Random Variables when the Sample Size is a Poisson Distribution." Digital Commons @ East Tennessee State University, 2018. https://dc.etsu.edu/etd/3459.

Full text
Abstract:
A probability distribution is a statistical function that describes the probability of possible outcomes in an experiment or occurrence. There are many different probability distributions that give the probability of an event happening, given some sample size n. An important question in statistics is to determine the distribution of the sum of independent random variables when the sample size n is fixed. For example, it is known that the sum of n independent Bernoulli random variables with success probability p is a Binomial distribution with parameters n and p. However, this is not true when the sample size is not fixed but a random variable. The goal of this thesis is to determine the distribution of the sum of independent random variables when the sample size is randomly distributed as a Poisson distribution. We will also discuss the mean and the variance of this unconditional distribution.
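A hedged worked example of the kind of result the abstract points to, using standard compound-Poisson facts rather than the thesis's own notation: let N ~ Poisson(λ) be independent of the i.i.d. summands X_i and S = X_1 + ... + X_N (with S = 0 when N = 0); then

```latex
\mathbb{E}[S] = \lambda\,\mathbb{E}[X], \qquad
\operatorname{Var}(S) = \lambda\,\mathbb{E}\!\left[X^{2}\right]
```

and in the Bernoulli(p) case the Poisson thinning property gives S ~ Poisson(λp), in contrast with the Binomial(n, p) distribution obtained when the sample size n is fixed.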
APA, Harvard, Vancouver, ISO, and other styles
10

Kim, Keunpyo. "Process Monitoring with Multivariate Data:Varying Sample Sizes and Linear Profiles." Diss., Virginia Tech, 2003. http://hdl.handle.net/10919/29741.

Full text
Abstract:
Multivariate control charts are used to monitor a process when more than one quality variable associated with the process is being observed. The multivariate exponentially weighted moving average (MEWMA) control chart is one of the most commonly recommended tools for multivariate process monitoring. The standard practice, when using the MEWMA control chart, is to take samples of fixed size at regular sampling intervals for each variable. In the first part of this dissertation, MEWMA control charts based on sequential sampling schemes with two possible stages are investigated. When sequential sampling with two possible stages is used, observations at a sampling point are taken in two groups, and the number of groups actually taken is a random variable that depends on the data. The basic idea is that sampling starts with a small initial group of observations, and no additional sampling is done at this point if there is no indication of a problem with the process. But if there is some indication of a problem with the process then an additional group of observations is taken at this sampling point. The performance of the sequential sampling (SS) MEWMA control chart is compared to the performance of standard control charts. It is shown that the SS MEWMA chart is substantially more efficient in detecting changes in the process mean vector than standard control charts that do not use sequential sampling. Also the situation is considered where different variables may have different measurement costs. MEWMA control charts with unequal sample sizes based on differing measurement costs are investigated in order to improve the performance of process monitoring. Sequential sampling plans are applied to MEWMA control charts with unequal sample sizes and compared to the standard MEWMA control charts with a fixed sample size. The steady-state average time to signal (SSATS) is computed using simulation and compared for some selected sets of sample sizes. When different variables have significantly different measurement costs, using unequal sample sizes can be more cost effective than using the same fixed sample size for each variable. In the second part of this dissertation, control chart methods are proposed for process monitoring when the quality of a process or product is characterized by a linear function. In the historical analysis of Phase I data, methods including the use of a bivariate T² chart to check for stability of the regression coefficients in conjunction with a univariate Shewhart chart to check for stability of the variation about the regression line are recommended. The use of three univariate control charts in Phase II is recommended. These three charts are used to monitor the Y-intercept, the slope, and the variance of the deviations about the regression line, respectively. A simulation study shows that this type of Phase II method can detect sustained shifts in the parameters better than competing methods in terms of average run length (ARL) performance. The monitoring of linear profiles is also related to the control charting of regression-adjusted variables and other methods. Ph.D.
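A minimal sketch of the MEWMA statistic underlying the charts studied above (the smoothing constant, control limit, and in-control parameters are illustrative assumptions; the sequential two-stage sampling and unequal-cost layers investigated in the dissertation are not shown):

```python
import numpy as np

rng = np.random.default_rng(2)
p, r, h = 3, 0.1, 12.0                    # dimension, smoothing constant, illustrative control limit
mu0, Sigma = np.zeros(p), np.eye(p)       # assumed in-control mean vector and covariance matrix
Sz_inv = np.linalg.inv(r / (2.0 - r) * Sigma)   # inverse of the asymptotic MEWMA covariance

z = np.zeros(p)
for t in range(1, 300):
    shift = 0.5 if t > 150 else 0.0       # simulated mean shift after sample 150
    x = rng.multivariate_normal(mu0 + shift, Sigma)
    z = r * (x - mu0) + (1.0 - r) * z     # MEWMA smoothing of the deviation from target
    t2 = z @ Sz_inv @ z                   # plotted quadratic-form statistic
    if t2 > h:
        print(f"signal at sample {t}, statistic = {t2:.2f}")
        break
```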
APA, Harvard, Vancouver, ISO, and other styles
11

Forgo, Vincent Z. Mr. "A Distribution of the First Order Statistic When the Sample Size is Random." Digital Commons @ East Tennessee State University, 2017. https://dc.etsu.edu/etd/3181.

Full text
Abstract:
Statistical distributions, also known as probability distributions, are used to model a random experiment. Probability distributions consist of probability density functions (pdf) and cumulative distribution functions (cdf). Probability distributions are widely used in the area of engineering, actuarial science, computer science, biological science, physics, and other applicable areas of study. Statistics are used to draw conclusions about the population through probability models. Sample statistics such as the minimum, first quartile, median, third quartile, and maximum, referred to as the five-number summary, are examples of order statistics. The minimum and maximum observations are important in extreme value theory. This paper will focus on the probability distribution of the minimum observation, also known as the first order statistic, when the sample size is random.
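A hedged sketch of the standard identity behind this question (the thesis's exact assumptions on N may differ): for i.i.d. X_i with survival function S(x) = 1 - F(x), a fixed sample size n gives the first relation below; for a random size N ≥ 1 with probability generating function G_N, the second follows,

```latex
P\!\left(X_{(1)} > x \mid N = n\right) = S(x)^{n}, \qquad
P\!\left(X_{(1)} > x\right) = \mathbb{E}\!\left[S(x)^{N}\right] = G_{N}\!\left(S(x)\right)
```

which for a zero-truncated Poisson(λ) sample size reduces to (e^{λ S(x)} - 1)/(e^{λ} - 1).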
APA, Harvard, Vancouver, ISO, and other styles
12

Chen, Mei-Kuang. "Who Are the Cigarette Smokers in Arizona." Thesis, The University of Arizona, 2007. http://hdl.handle.net/10150/193268.

Full text
Abstract:
The purpose of this study was to investigate the relationship between cigarette smoking and socio-demographic variables based on the empirical literature and the primitive theories in the field. Two regression approaches, logistic regression and linear multiple regression, were conducted on the two most recent Arizona Adult Tobacco Surveys to test the hypothesized models. The results showed that cigarette smokers in Arizona are mainly residents who have not completed a four-year college degree, who are unemployed, White, non-Hispanic, or young to middle-aged adults. Among the socio-demographic predictors of interest, education is the most important variable in identifying cigarette smokers, even though the predictive power of these socio-demographic variables is small. Practical and methodological implications of these findings are discussed.
APA, Harvard, Vancouver, ISO, and other styles
13

Tʻang, Min. "Extention of evaluating the operating characteristics for dependent mixed variables-attributes sampling plans to large first sample size /." Online version of thesis, 1991. http://hdl.handle.net/1850/11208.

Full text
APA, Harvard, Vancouver, ISO, and other styles
14

Widman, Tracy. "Factors that Influence Cross-validation of Hierarchical Linear Models." Digital Archive @ GSU, 2011. http://digitalarchive.gsu.edu/eps_diss/71.

Full text
Abstract:
While use of hierarchical linear modeling (HLM) to predict an outcome is reasonable and desirable, employing the model for prediction without first establishing the model’s predictive validity is ill-advised. Estimating the predictive validity of a regression model by cross-validation has been thoroughly researched, but there is a dearth of research investigating the cross-validation of hierarchical linear models. One of the major obstacles in cross-validating HLM is the lack of a measure of explained variance similar to the squared multiple correlation coefficient in regression analysis. The purpose of this Monte Carlo simulation study is to explore the impact of sample size, centering, and predictor-criterion correlation magnitudes on potential cross-validation measurements for hierarchical linear modeling. This study considered the impact of 64 simulated conditions across three explained variance approaches: Raudenbush and Bryk’s (2002) proportional reduction in error variance, Snijders and Bosker’s (1994) modeled variance, and a measure of explained variance proposed by Gagné and Furlow (2009). For each of the explained variance approaches, a cross-validation measurement, shrinkage, was obtained. The results indicate that sample size, predictor-criterion correlations, and centering impact the cross-validation measurement. The degree and direction of the impact differs with the explained variance approach employed. Under some explained variance approaches, shrinkage decreased with larger level-2 sample sizes and increased in others. Likewise, in comparing group- and grand-mean centering, with some approaches grand-mean centering resulted in higher shrinkage estimates but smaller estimates in others. Larger total sample sizes yielded smaller shrinkage estimates, as did the predictor-criterion correlation combination in which the group-level predictor had a stronger correlation. The approaches to explained variance differed substantially in their usability for cross-validation. The Snijders and Bosker approach provided relatively large shrinkage estimates, and, depending on the predictor-criterion correlation, shrinkage under both Raudenbush and Bryk approaches could be sizable to the degree that the estimate begins to lack meaning. Researchers seeking to cross-validate HLM need to be mindful of the interplay between the explained variance approach employed and the impact of sample size, centering, and predictor-criterion correlations on shrinkage estimates when making research design decisions.
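For orientation, a hedged sketch of one of the explained-variance measures mentioned above and of the shrinkage quantity built on it; the notation is assumed here, and the exact definitions follow the cited sources (Snijders and Bosker, 1994; Raudenbush and Bryk, 2002).

```latex
% Snijders-Bosker style level-1 explained variance for a two-level model
R^{2}_{1} = 1 - \frac{\hat{\sigma}^{2}_{\text{model}} + \hat{\tau}^{2}_{\text{model}}}
                     {\hat{\sigma}^{2}_{\text{null}} + \hat{\tau}^{2}_{\text{null}}}
% cross-validation shrinkage: loss of explained variance from the calibration
% sample to an independent validation sample
\text{shrinkage} = R^{2}_{\text{calibration}} - R^{2}_{\text{validation}}
```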
APA, Harvard, Vancouver, ISO, and other styles
15

Liv, Per. "Efficient strategies for collecting posture data using observation and direct measurement." Doctoral thesis, Umeå universitet, Yrkes- och miljömedicin, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-59132.

Full text
Abstract:
Relationships between occupational physical exposures and risks of contracting musculoskeletal disorders are still not well understood; exposure-response relationships are scarce in the musculoskeletal epidemiology literature, and many epidemiological studies, including intervention studies, fail to reach conclusive results. Insufficient exposure assessment has been pointed out as a possible explanation for this deficiency. One important aspect of assessing exposure is the selected measurement strategy; this includes issues related to the necessary number of data required to give sufficient information, and to allocation of measurement efforts, both over time and between subjects in order to achieve precise and accurate exposure estimates. These issues have been discussed mainly in the occupational hygiene literature considering chemical exposures, while the corresponding literature on biomechanical exposure is sparse. The overall aim of the present thesis was to increase knowledge on the relationship between data collection design and the resulting precision and accuracy of biomechanical exposure assessments, represented in this thesis by upper arm postures during work, data which have been shown to be relevant to disorder risk. Four papers are included in the thesis. In papers I and II, non-parametric bootstrapping was used to investigate the statistical efficiency of different strategies for distributing upper arm elevation measurements between and within working days into different numbers of measurement periods of differing durations. Paper I compared the different measurement strategies with respect to the eventual precision of estimated mean exposure level. The results showed that it was more efficient to use a higher number of shorter measurement periods spread across a working day than to use a smaller number for longer uninterrupted measurement periods, in particular if the total sample covered only a small part of the working day. Paper II evaluated sampling strategies for the purpose of determining posture variance components with respect to the accuracy and precision of the eventual variance component estimators. The paper showed that variance component estimators may be both biased and imprecise when based on sampling from small parts of working days, and that errors were larger with continuous sampling periods. The results suggest that larger posture samples than are conventionally used in ergonomics research and practice may be needed to achieve trustworthy estimates of variance components. Papers III and IV focused on method development. Paper III examined procedures for estimating statistical power when testing for a group difference in postures assessed by observation. Power determination was based either on a traditional analytical power analysis or on parametric bootstrapping, both of which accounted for methodological variance introduced by the observers to the exposure data. The study showed that repeated observations of the same video recordings may be an efficient way of increasing the power in an observation-based study, and that observations can be distributed between several observers without loss in power, provided that all observers contribute data to both of the compared groups, and that the statistical analysis model acknowledges observer variability. 
Paper IV discussed calibration of an inferior exposure assessment method against a superior “golden standard” method, with a particular emphasis on calibration of observed posture data against postures determined by inclinometry. The paper developed equations for bias correction of results obtained using the inferior instrument through calibration, as well as for determining the additional uncertainty of the eventual exposure value introduced through calibration. In conclusion, the results of the present thesis emphasize the importance of carefully selecting a measurement strategy on the basis of statistically well informed decisions. It is common in the literature that postural exposure is assessed from one continuous measurement collected over only a small part of a working day. In paper I, this was shown to be highly inefficient compared to spreading out the corresponding sample time across the entire working day, and the inefficiency was also obvious when assessing variance components, as shown in paper II. The thesis also shows how a well thought-out strategy for observation-based exposure assessment can reduce the effects of measurement error, both for random methodological variance (paper III) and systematic observation errors (bias) (paper IV).
APA, Harvard, Vancouver, ISO, and other styles
16

Hagen, Clinton Ernest. "Comparing the performance of four calculation methods for estimating the sample size in repeated measures clinical trials where difference in treatment groups means is of interest." Oklahoma City : [s.n.], 2008.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
17

Thiebaut, Nicolene Magrietha. "Statistical properties of forward selection regression estimators." Diss., University of Pretoria, 2011. http://hdl.handle.net/2263/29520.

Full text
APA, Harvard, Vancouver, ISO, and other styles
18

Martin, Thomas Newton. "Modelo estocástico para estimação da produtividade de soja no Estado de São Paulo utilizando simulação normal bivariada." Universidade de São Paulo, 2007. http://www.teses.usp.br/teses/disponiveis/11/11136/tde-20032007-161308/.

Full text
Abstract:
The availability of resources, both financial and of human labor, is scarce. Therefore, regional planning that minimizes the use of resources must be encouraged. Harvest forecasting through modelling techniques must thus be carried out beforehand on the basis of regional characteristics, indicating the basic directions of research as well as of regional planning. The aims of this work are: (i) to characterize the climatic variables through different probability distributions; (ii) to verify the spatial and temporal homogeneity of the climatic variables; (iii) to use the bivariate normal distribution to simulate parameters used to estimate soybean crop productivity; and (iv) to propose a model to estimate the order of magnitude of the potential productivity of soybean (dependent on the genotype, air temperature, photosynthetically active radiation and photoperiod) and of the depleted productivity (dependent on the potential productivity, rainfall and soil water storage), based on daily values of temperature, insolation and rainfall, for the State of São Paulo. The variables used in this study were the minimum, maximum and average air temperature, insolation, solar radiation, photosynthetically active radiation and rainfall, on a daily scale, obtained from 27 stations located in the State of São Paulo and six stations located in neighboring states. First, the fit of the variables to five probability distributions (normal, log-normal, exponential, gamma and Weibull) was verified through the Kolmogorov-Smirnov test. The spatial and temporal homogeneity of the data was verified through cluster analysis by Ward's method, and the sample size (number of years) was estimated for each variable. Random numbers were generated by the Monte Carlo method. The simulation of the photosynthetically active radiation and temperature data was carried out through three cases: (i) asymmetric triangular distribution, (ii) normal distribution truncated at 1.96 standard deviations from the mean, and (iii) bivariate normal distribution. The simulated data were evaluated through Bartlett's test of homogeneity of variance, the F test, the t test, Willmott's agreement index, the slope of the regression line, Camargo's performance index (C) and the fit to the (univariate) normal distribution. The model proposed to estimate the potential productivity of the soybean crop was developed based on the de Wit concepts, including contributions from Van Heenst, Driessen, Konijn, de Vries and other researchers. The computation of the depleted productivity depended on the potential, crop and actual evapotranspiration and on the water-deficit sensitivity coefficient. The rainfall data were sampled through the normal distribution. Thus, the daily carbohydrate production was depleted as a function of water stress and the daily hours of insolation. The interpolation of the data, in order to cover the whole State of São Paulo, was carried out through the kriging method. It was verified that most of the variables follow the normal distribution. Moreover, the variables present spatial and temporal variability, and the number of years required (sample size) for each of them varies considerably. The simulation using the bivariate normal distribution is the most appropriate, as it better represents the climate variables. The model for estimating the potential and depleted productivities of the soybean crop produces results consistent with those reported in the literature.
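A minimal sketch of the bivariate normal sampling step on which the proposed simulation rests, here for correlated daily temperature and photosynthetically active radiation; the means, standard deviations, and correlation below are illustrative assumptions, not values estimated in the thesis.

```python
import numpy as np

rng = np.random.default_rng(4)
mu = np.array([24.0, 9.5])                  # assumed means: temperature (°C), PAR (MJ m-2 day-1)
sd = np.array([3.0, 2.0])                   # assumed standard deviations
rho = 0.6                                   # assumed temperature-PAR correlation
cov = np.array([[sd[0] ** 2, rho * sd[0] * sd[1]],
                [rho * sd[0] * sd[1], sd[1] ** 2]])

daily = rng.multivariate_normal(mu, cov, size=365)   # one simulated year of daily values
temp, par = daily[:, 0], daily[:, 1]
print(np.corrcoef(temp, par)[0, 1])         # sample correlation close to the assumed 0.6
```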
APA, Harvard, Vancouver, ISO, and other styles
19

Ruengvirayudh, Pornchanok. "A Monte Carlo Study of Parallel Analysis, Minimum Average Partial, Indicator Function, and Modified Average Roots for Determining the Number of Dimensions with Binary Variables in Test Data: Impact of Sample Size and Factor Structure." Ohio University / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou151516919677091.

Full text
APA, Harvard, Vancouver, ISO, and other styles
20

Högberg, Hans. "Some properties of measures of disagreement and disorder in paired ordinal data." Doctoral thesis, Örebro universitet, Handelshögskolan vid Örebro universitet, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:oru:diva-12350.

Full text
Abstract:
The measures studied in this thesis were a measure of disorder, D, and a measure of the individual part of the disagreement, the measure of relative rank variance, RV, proposed by Svensson in 1993. The measure of disorder is a useful measure of order consistency in paired assessments of scales with a different number of possible values. The measure of relative rank variance is a useful measure in evaluating reliability and for evaluating change in qualitative outcome variables. In Paper I an overview of methods used in the analysis of dependent ordinal data and a comparison of the methods regarding the assumptions, specifications, applicability, and implications for use were made. In Paper II an application and a comparison of the results of some standard models, tests, and measures to two different research problems were made. The sampling distribution of the measure of disorder was studied both analytically and by a simulation experiment in Paper III. The asymptotic normal distribution was shown by the theory of U-statistics, and the simulation experiments for finite sample sizes and various amounts of disorder showed that the sampling distribution was approximately normal for sample sizes of about 40 to 60 for moderate sizes of D and for smaller sample sizes for substantial sizes of D. The sampling distribution of the relative rank variance was studied in a simulation experiment in Paper IV. The simulation experiment showed that the sampling distribution was approximately normal for sample sizes of 60-100 for moderate size of RV, and for smaller sample sizes for substantial size of RV. In Paper V a procedure for inference regarding relative rank variances from two or more samples was proposed. Pair-wise comparison by the jackknife technique for variance estimation and the use of the normal distribution as an approximation in inference for parameters in independent samples, based on the results in Paper IV, were demonstrated. Moreover, an application of the Kruskal-Wallis test for independent samples and of Friedman's test for dependent samples was conducted. Statistical methods for ordinal data.
APA, Harvard, Vancouver, ISO, and other styles
21

Moreira, Ana Sofia Pereira. "Study of modifications induced by thermal and oxidative treatment in oligo and polysaccharides of coffee by mass spectrometry." Doctoral thesis, Universidade de Aveiro, 2016. http://hdl.handle.net/10773/17074.

Full text
Abstract:
Doctoral programme in Biochemistry. Polysaccharides are the major components of green and roasted coffee beans and of the coffee brew. The most abundant are the galactomannans, followed by the arabinogalactans. During the roasting process, galactomannans and arabinogalactans undergo structural modifications that are far from completely elucidated, owing to their diversity and to the complexity of the compounds formed. During roasting, galactomannans and arabinogalactans also react with proteins, chlorogenic acids, and sucrose, originating high-molecular-weight brown compounds containing nitrogen, known as melanoidins. Several biological activities and beneficial health effects have been attributed to coffee melanoidins.
However, their exact structures and the mechanisms involved in their formation remain unknown, as does the relationship between structure and biological activity. The use of model systems and mass spectrometry analysis provides an overall and, at the same time, detailed view of the structural modifications promoted by roasting in coffee polysaccharides, contributing to the elucidation of the structures and formation mechanisms of melanoidins. In this thesis, oligosaccharides structurally related to the backbone of galactomannans, (β1→4)-D-mannotriose, and to the side chains of arabinogalactans, (α1→5)-L-arabinotriose, alone or in mixtures with 5-O-caffeoylquinic acid, the most abundant chlorogenic acid in green coffee beans, and with dipeptides composed of tyrosine and leucine, used as models of proteins, were submitted to dry thermal treatments mimicking the coffee roasting process. The oxidation induced by hydroxyl radicals (HO•) was also studied, since these radicals seem to be involved in the modification of the polysaccharides during roasting. The identification of the structural modifications induced by thermal and oxidative treatment of the model compounds was performed mostly by mass spectrometry-based analytical strategies, but also using liquid chromatography. Gas chromatography was used in the analysis of neutral sugars and glycosidic linkages. To validate the conclusions achieved with the model compounds, coffee polysaccharide samples obtained from spent coffee grounds and instant coffee were also analysed. The results obtained from the model oligosaccharides when submitted to thermal treatment (dry) or to oxidation induced by HO• (in solution) indicate the occurrence of depolymerization, which is in line with previous studies reporting the depolymerization of coffee galactomannans and arabinogalactans during roasting. Compounds resulting from sugar ring cleavage were also formed during the thermal and oxidative treatment of Ara3. On the other hand, the dry thermal treatment of the model oligosaccharides (alone or when mixed) promoted the formation of oligosaccharides with a higher degree of polymerization, and also of polysaccharides with new types of glycosidic linkages, evidencing the occurrence of polymerization via non-enzymatic transglycosylation reactions induced by dry thermal treatment. The transglycosylation reactions induced by dry thermal treatment can occur between sugar residues from the same origin, but also from different origins, forming hybrid structures that contain arabinose and mannose in the case of the model compounds used. The results obtained from spent coffee grounds and instant coffee samples suggest the presence of hybrid polysaccharides in these processed coffee samples, corroborating the occurrence of transglycosylation during the roasting process. Furthermore, the study of mixtures containing different proportions of each model oligosaccharide, mimicking coffee bean regions with distinct polysaccharide composition, subjected to different periods of thermal treatment, allowed us to infer that different hybrid and non-hybrid structures may be formed from arabinogalactans and galactomannans, depending on their distribution in the bean cell walls and on the roasting conditions. These results may explain the heterogeneity of melanoidin structures formed during coffee roasting.
The results obtained from model mixtures containing an oligosaccharide (Ara3 or Man3) and 5-CQA and subjected to dry thermal treatment, as well as from samples derived from spent coffee grounds, showed the formation of hybrid compounds composed of CQA molecules covalently linked to a variable number of sugar residues. Moreover, the results obtained from the mixture containing Man3 and 5-CQA showed that CQA acts as a catalyst of transglycosylation reactions. On the other hand, in the model mixtures containing a peptide, even when 5-CQA was also present and the same treatment was applied, a decrease in the extent of the transglycosylation reactions was observed. This outcome can explain the low extent of non-enzymatic transglycosylation reactions during roasting in coffee bean regions enriched in proteins, even though polysaccharides are the major components of the coffee beans. The decrease of transglycosylation reactions in the presence of peptides/proteins can be related to the preferential reaction of reducing sugar residues with the amino groups of peptides/proteins via the Maillard reaction, which decreases the number of reducing residues available to take part directly in the transglycosylation reactions. In addition to the compounds already described, a diversity of other compounds was formed from the model systems, namely dehydrated derivatives formed during dry thermal treatment. In conclusion, the identification of the structural modifications promoted by roasting in coffee polysaccharides paves the way to understanding the mechanisms of melanoidin formation and the structure-activity relationship of these compounds.
22

Huang, Guo-Tai, and 黃國泰. "A Study of Control Charts with Variable Sample Size." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/51993886110283599484.

Full text
Abstract:
Master's thesis, National Sun Yat-sen University, Department of Applied Mathematics, academic year 92. Shewhart X-bar control charts with estimated control limits are widely used in practice. When the sample size is not fixed, we propose seven statistics for estimating the standard deviation sigma. These estimators are then used to estimate the control limits of the Shewhart X-bar control chart. Simulation results for the estimators are presented and discussed. Finally, we investigate the performance of the Shewhart X-bar control charts based on the seven estimators of sigma via the simulated average run length (ARL).
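The kind of study described above lends itself to a compact simulation. The sketch below is only an illustration, not the thesis's seven estimators: it approximates the in-control average run length (ARL) of a Shewhart X-bar chart whose limits use a pooled estimate of sigma and whose subgroup size varies between two values. The subgroup sizes, the number of Phase I samples, and the pooled estimator are assumptions made for this example.

import numpy as np

rng = np.random.default_rng(0)

def pooled_sigma(samples):
    # Pooled standard deviation across preliminary samples of unequal size.
    num = sum((len(s) - 1) * np.var(s, ddof=1) for s in samples)
    den = sum(len(s) - 1 for s in samples)
    return np.sqrt(num / den)

def simulate_arl(mu=0.0, sigma=1.0, sizes=(3, 7), n_runs=1000, m_calib=30):
    run_lengths = []
    for _ in range(n_runs):
        # Phase I: estimate mu and sigma from m_calib subgroups of random size.
        calib = [rng.normal(mu, sigma, rng.choice(sizes)) for _ in range(m_calib)]
        mu_hat = np.mean(np.concatenate(calib))
        sigma_hat = pooled_sigma(calib)
        # Phase II: monitor in-control data until the first (false) signal.
        t = 0
        while True:
            t += 1
            n = rng.choice(sizes)                  # variable sample size
            xbar = rng.normal(mu, sigma, n).mean()
            limit = 3 * sigma_hat / np.sqrt(n)     # 3-sigma limits for this subgroup
            if abs(xbar - mu_hat) > limit:
                run_lengths.append(t)
                break
    return np.mean(run_lengths)

print("simulated in-control ARL:", round(simulate_arl(), 1))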
23

Yang, Hau-Yu, and 楊濠宇. "The study of Variable Sample Size Cpm Control Chart." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/4ksa47.

Full text
24

"Sample size determination for Poisson regression when the exposure variable contains misclassification errors." Tulane University, 1994.

Find full text
Abstract:
Sample size calculation methods are developed for Poisson regression to detect a linear trend in the logarithm of incidence rates (multiplicative models) and in the incidence rates themselves (additive models) over ordered exposure groups. These methods parallel those of Bull (1993) for logistic regression. Moreover, when reliable ancillary misclassification information is available, a slight modification of these calculation methods can be used to determine the required sample size based on correcting the estimate of the trend parameter at the analysis stage, where the correction method is adapted from Reade-Christopher and Kupper (1991). We find that, as would be expected, the gradient of incidence rates over the exposure groups and the misclassification rate strongly affect the sample size requirements. In a one-year study with no misclassification in the exposure variable, the required sample size varies from 5,054 to 64,534, depending on the gradient. When a misclassification rate of 30% is assumed, these numbers are multiplied by approximately 1.3 for all gradients. The distribution of subjects across exposure groups also affects the sample size requirements. In environmental and occupational studies, subjects may be grouped according to a continuous exposure, and the groups chosen are often terciles, quartiles, or quintiles, i.e., an even distribution over the exposure groups. We find that a smaller sample size is required for this type of distribution. Finally, although the use of the correction method reduces the bias of the estimates, it always yields a greater variance in the estimate than when no correction is used. It appears that when the gradient of incidence rates is small and the misclassification is not severe, then, based on the percentage of the time the true parameter is included in the 95% confidence interval, use of the correction method may not be necessary.
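As a rough illustration of the kind of calculation described above (not the dissertation's formulas), one can estimate by Monte Carlo the power of a Poisson-regression trend test over ordered exposure groups and increase the total sample size until a target power is reached. The baseline rate, trend, follow-up time, and even distribution over groups used below are assumptions, and the statsmodels GLM interface merely stands in for the analysis model.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)

def power_for_n(n_total, base_rate=0.01, log_trend=0.15, n_groups=4,
                followup=1.0, n_sim=300, alpha=0.05):
    scores = np.arange(n_groups)            # ordered exposure scores 0, 1, 2, 3
    n_per = n_total // n_groups             # even distribution over the groups
    rejections = 0
    for _ in range(n_sim):
        x = np.repeat(scores, n_per)
        rates = base_rate * np.exp(log_trend * x)     # multiplicative (log-linear) model
        y = rng.poisson(rates * followup)
        X = sm.add_constant(x.astype(float))
        fit = sm.GLM(y, X, family=sm.families.Poisson(),
                     exposure=np.full(x.shape, followup)).fit()
        if fit.pvalues[1] < alpha:                    # trend coefficient significant?
            rejections += 1
    return rejections / n_sim

for n in (2000, 5000, 10000):
    print(n, round(power_for_n(n), 2))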
25

"Sample size approximation for estimating the arithmetic mean of a lognormally distributed random variable with and without Type I censoring." Tulane University, 1995.

Find full text
Abstract:
This research presents the approximate sample size needed to estimate the true arithmetic mean of a lognormally distributed random variable to within a specified accuracy, expressed as a 100$\pi$ percent difference from the true arithmetic mean, with geometric standard deviations (GSDs) up to 4.0 and with a specified level of confidence. Exact minimum required sample sizes are based on the confidence interval width of Land's exact interval. For the non-censored case, sample size tables and nomograms are presented. Box-Cox transformations were used to derive formulae for approximating these exact sample sizes. In addition, new formulae adjusting the classical central limit approach were derived. Each of these formulae, as well as other existing formulae (the classical central limit approach and Hewett's (1995) formula), was compared to the exact sample size to determine the conditions under which it performs optimally. These comparisons lead to the following recommendations for the 95% confidence level: the Box-Cox transformation formula is recommended for GSD = 1.5 and 100$\pi$ > 20% levels; for GSDs of 2, 2.5 and 3 and 100$\pi$ > 20% levels; for GSD = 3 and 100$\pi$ > 30% levels; and for GSD = 4 and 100$\pi$ > 40% levels. The adjusted classical formula is recommended for GSD = 1.1 and all 100$\pi$ levels; for GSD = 1.5 and 100$\pi$ > 25% levels; and for GSDs of 2, 2.5 and 100$\pi$ ≤ 20% levels. The classical formula is recommended for GSD = 3 and 100$\pi$ ≤ 20% levels; for GSD = 3.5 and 100$\pi$ ≤ 30% levels; and for GSD = 4 and 100$\pi$ ≤ 40% levels. Exact sample size requirements are also presented for samples in which 10%, 20%, and 50% of the observations are Type I left-censored. These sample sizes are based on Land's exact confidence interval width applied to the bias-corrected maximum likelihood estimates. The accuracy of the confidence intervals based on the proposed sample sizes is evaluated using Monte Carlo simulations for both the non-censored and censored cases. These simulations showed conservative results with respect to the targeted significance levels.
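For orientation, the classical central-limit-style approximation mentioned above can be sketched in a few lines. This is neither Land's exact method nor the adjusted or Box-Cox formulae of the study; it simply turns the GSD into a coefficient of variation and applies the usual normal-theory sample size formula for a relative error of 100π percent.

import numpy as np
from scipy.stats import norm

def n_classical(gsd, rel_error, conf=0.95):
    # Required n to estimate the arithmetic mean of a lognormal variable
    # to within 100*rel_error percent, using the CLT approximation.
    sigma = np.log(gsd)                       # log-scale standard deviation
    cv = np.sqrt(np.exp(sigma**2) - 1.0)      # coefficient of variation of the lognormal
    z = norm.ppf(0.5 + conf / 2.0)
    return int(np.ceil((z * cv / rel_error) ** 2))

for gsd in (1.5, 2.0, 3.0, 4.0):
    print("GSD =", gsd, "->", n_classical(gsd, rel_error=0.20), "observations")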
26

Hung, Sheng-Wei, and 洪昇偉. "Testing Reliability Assurance Using Capability Indices CPU and CPL Based on Multiple Samples with Variable Sample Sizes." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/20363968918264245796.

Full text
Abstract:
Master's thesis, Tamkang University, Graduate Institute of Management Sciences, academic year 92. For stable normal processes with one-sided specification limits, the capability indices CPU and CPL have been used to provide numerical measures of product reliability assurance from a manufacturing perspective. Statistical properties of the estimators of CPU and CPL have been investigated extensively for the single-sample case. In this paper, we consider testing product reliability assurance for cases of multiple samples with variable sample sizes. We obtain the uniformly minimum variance unbiased estimators (UMVUEs) of CPU and CPL, and develop a powerful test for that purpose. We also implement Fortran programs to compute the p-values and critical values for testing product reliability assurance. A practical procedure using the UMVUEs is provided to assist practitioners in judging whether their processes are capable of reproducing reliable products. An example of a voltage limiting amplifier (VLA) is presented to illustrate the practicality of our approach on actual data collected from real-world applications.
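To make the setting concrete, the sketch below computes a simple plug-in estimate of CPU from multiple samples with unequal sizes, using the pooled mean and pooled standard deviation. It deliberately omits the bias-correction factor of the exact UMVUE derived in the thesis, and the data, sample sizes, and specification limit are illustrative assumptions only.

import numpy as np

def cpu_plug_in(samples, usl):
    # Plug-in CPU = (USL - mean) / (3 * pooled sigma) over several samples.
    all_x = np.concatenate(samples)
    xbar = all_x.mean()
    num = sum((len(s) - 1) * np.var(s, ddof=1) for s in samples)
    den = sum(len(s) - 1 for s in samples)
    s_pooled = np.sqrt(num / den)
    return (usl - xbar) / (3.0 * s_pooled)

rng = np.random.default_rng(2)
samples = [rng.normal(10.0, 0.5, n) for n in (8, 12, 15, 10)]   # variable sample sizes
print("estimated CPU:", round(cpu_plug_in(samples, usl=12.0), 3))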
27

Pin, Cheng Chun, and 程俊賓. "A Study on the Comparison of Some Variable Control Charts with Variable Sample Sizes." Thesis, 2003. http://ndltd.ncl.edu.tw/handle/36157137808806820287.

Full text
Abstract:
Master's thesis, Fu Jen Catholic University, Department of Mathematics, academic year 91. In 2000, R. Sivasamy et al. proposed a control chart called the MDS control chart. In their paper, they emphasized only its detecting ability and said nothing about the possible false alarm problem of such a chart. In this research, we study this problem by comparing the relative false alarm rates of three variable control charts with variable sample sizes, namely two VSS-type charts and the MDS chart.
28

Tsai, Jerry, and 蔡傑瑞. "Influence of Sample Size and Number of Variables on Exploratory Factor Analysis." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/ugj3nm.

Full text
Abstract:
Master's thesis, National Yang-Ming University, Institute of Public Health, academic year 106. Factor analysis is a statistical method for constructing unobserved latent factors that are related to correlated observed variables. The method is useful for verifying theoretical models and for reducing the dimension of correlated data. In biomedical and clinical research, collecting samples is often a major difficulty, and previous studies and rules of thumb on how the overall sample size and the number of variables influence the performance of factor analysis are inconsistent. The purpose of this study is to evaluate the effects of sample size and the number of observed variables on factor analysis through statistical simulation. Our study simulated many scenarios involving varying sample sizes; high and moderate factor loadings; different loading estimation methods, including maximum likelihood and principal component methods; and different criteria for extracting factors, including eigenvalues greater than one, proportion of explained variation greater than 0.8, and a fixed number of factors. For each of these scenarios, varying numbers of observed variables were also considered. Our results showed that as the overall sample size increased, the performance of all indexes improved. When the overall sample size was small, both the number of observed variables and the true factor loadings affected performance. When the true factor loadings are high (0.8), an overall sample size greater than 54 and more than 9 observed variables are enough for factor analysis to perform well. On the other hand, when the loadings are moderate (0.6) and the sample size is greater than 54, 12 observed variables are needed for good results. In general, the criterion of extracting factors with eigenvalues greater than 1 performs best. Given that the correct number of latent factors is selected, a sample size greater than 144 for moderate factor loadings and greater than 54 for high factor loadings is enough for good performance.
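A minimal version of the kind of simulation described above can be written directly: generate data from a one-factor model with a chosen loading, number of variables, and sample size, and check how often the eigenvalue-greater-than-one rule recovers exactly one factor. The loading, sample sizes, and number of variables below echo values mentioned in the abstract but are otherwise illustrative assumptions.

import numpy as np

rng = np.random.default_rng(3)

def recovery_rate(n_obs, n_vars, loading, n_sim=500):
    hits = 0
    for _ in range(n_sim):
        f = rng.normal(size=(n_obs, 1))                         # latent factor scores
        e = rng.normal(scale=np.sqrt(1 - loading**2), size=(n_obs, n_vars))
        x = loading * f + e                                     # observed variables, unit variance
        eigvals = np.linalg.eigvalsh(np.corrcoef(x, rowvar=False))
        hits += int(np.sum(eigvals > 1.0) == 1)                 # exactly one factor retained
    return hits / n_sim

for n in (54, 144):
    print("n =", n, "recovery rate:", recovery_rate(n_obs=n, n_vars=9, loading=0.8))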
29

林正民. "Sample Size Calculation of Confidence Intervals for Standardized Linear Contrasts of Means-Used Average Variance." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/14756018361846323935.

Full text
Abstract:
Master's thesis, National Chiao Tung University, Department of Management Science, academic year 100. Sample size determination is one of the most important aspects of study design: how large must a sample be to yield meaningful information? In this thesis, sample size formulas based on the average variance for linear contrasts of means are examined in two settings, one under the assumption of equal population variances and the other under heteroscedasticity. Both settings are examined from the two-population case (k = 2) to the multi-population case (k = 4). By fine-tuning the parameters, the expected interval width and the expected interval coverage probability are used to assess whether these sample size formulas are consistent with the originally specified targets. The SAS software is used to build the models and run simulations with 10,000 Monte Carlo replications for each condition.
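The assessment idea, checking a candidate sample size against the expected interval width and the coverage probability, can be sketched as follows. For simplicity the sketch uses an unstandardized contrast of means with a pooled variance and equal group sizes, whereas the thesis works with standardized contrasts and an average variance; the group means, contrast, and sample size are assumptions for illustration.

import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

def assess(n_per_group, means=(0.0, 0.3, 0.6), sds=(1.0, 1.0, 1.0),
           contrast=(-1.0, 0.0, 1.0), conf=0.95, n_sim=5000):
    c = np.array(contrast)
    true_value = float(c @ np.array(means))
    k = len(means)
    widths, covered = [], 0
    for _ in range(n_sim):
        groups = [rng.normal(m, s, n_per_group) for m, s in zip(means, sds)]
        est = sum(ci * g.mean() for ci, g in zip(c, groups))
        sp2 = np.mean([g.var(ddof=1) for g in groups])          # pooled variance (equal n)
        se = np.sqrt(sp2 * np.sum(c**2) / n_per_group)
        df = k * (n_per_group - 1)
        half = stats.t.ppf(0.5 + conf / 2, df) * se
        widths.append(2 * half)
        covered += int(abs(est - true_value) <= half)
    return np.mean(widths), covered / n_sim

width, coverage = assess(n_per_group=30)
print("expected width:", round(width, 3), "coverage:", round(coverage, 3))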
30

Pham, Tung Huy. "Some problems in high dimensional data analysis." 2010. http://repository.unimelb.edu.au/10187/8399.

Full text
Abstract:
The boom of economics and technology has had an enormous impact on society. Along with these developments, human activities nowadays produce massive amounts of data that can be collected easily and at relatively low cost with the aid of new technologies. Many examples can be mentioned here, including web term-document data, sensor arrays, gene expression, finance data, imaging, and hyperspectral analysis. Because of the enormous amount of data from various new and different sources, more and more challenging scientific problems appear, and these have changed the types of problems on which mathematical scientists work.
In traditional statistics, the dimension of the data, p say, is low, with many observations, n say. In this case, classical rules such as the Central Limit Theorem are often applied to obtain some understanding from the data. A new challenge for statisticians today is a different setting, in which the data dimension is very large and the number of observations is small. The mathematical assumption now could be p > n, or even p going to infinity with n fixed; for example, there may be few patients but many genes. In these cases, classical methods fail to produce a good understanding of the nature of the problem, so new methods need to be found and mathematical explanations are needed to generalize these cases.
The research presented in this thesis addresses two problems in the case where the dimension is very large: variable selection and classification. The work on variable selection, in particular the Adaptive Lasso, was completed by June 2007, and the research on classification was carried out throughout 2008 and 2009. The research on the Dantzig selector and the Lasso was finished in July 2009. Therefore, this thesis is divided into two parts. In the first part of the thesis we study the Adaptive Lasso, the Lasso, and the Dantzig selector. In particular, in Chapter 2 we present some results for the Adaptive Lasso, and Chapter 3 provides two examples showing that neither the Dantzig selector nor the Lasso is definitely better than the other. The second part of the thesis is organized as follows. In Chapter 5, we construct the model setting. In Chapter 6, we summarize the results for the scaled centroid-based classifier and prove some results about it. Because there are similarities between the Support Vector Machine (SVM) and Distance Weighted Discrimination (DWD) classifiers, Chapter 8 introduces a class of distance-based classifiers that can be considered a generalization of the SVM and DWD classifiers. Chapters 9 and 10 are about the SVM and DWD classifiers. Chapter 11 demonstrates the performance of these classifiers on simulated data sets and on some cancer data sets.
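As a small illustration of the p > n variable-selection setting discussed in the first part of the thesis, the sketch below runs the ordinary Lasso from scikit-learn on simulated data with many more variables than observations. It is not the thesis's Adaptive Lasso or Dantzig selector implementation, and the dimensions and true coefficients are arbitrary assumptions.

import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(5)
n, p = 50, 500                                   # far more variables than observations
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = (3.0, -2.0, 1.5, -1.0, 2.5)           # only five truly active predictors
y = X @ beta + rng.normal(scale=1.0, size=n)

fit = LassoCV(cv=5).fit(X, y)                    # cross-validated penalty choice
selected = np.flatnonzero(fit.coef_)
print("selected variable indices:", selected)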
31

"Determining Appropriate Sample Sizes and Their Effects on Key Parameters in Longitudinal Three-Level Models." Doctoral diss., 2016. http://hdl.handle.net/2286/R.I.40260.

Full text
Abstract:
Doctoral dissertation, Educational Psychology, 2016. Through a two-study simulation design with different design conditions (the level-1 (L1) sample size was set to 3, the level-2 (L2) sample size ranged from 10 to 75, the level-3 (L3) sample size ranged from 30 to 150, the intraclass correlation (ICC) ranged from 0.10 to 0.50, and model complexity ranged from one to three predictors), this study intends to provide general guidelines on adequate sample sizes at the three levels under varying ICC conditions for a viable three-level HLM analysis (e.g., reasonably unbiased and accurate parameter estimates). The data-generating parameters were obtained from a large-scale longitudinal data set from North Carolina, provided by the National Center on Assessment and Accountability for Special Education (NCAASE). I discuss ranges of sample sizes that are inadequate or adequate with respect to convergence, absolute bias, relative bias, root mean squared error (RMSE), and coverage of individual parameter estimates. With the help of a detailed two-part simulation design covering various sample sizes, model complexities, and ICCs, the study provides several options for adequate sample sizes under different conditions, and it emphasizes that adequate sample sizes at L1, L2, and L3 can be adjusted according to the parameter estimates of interest and to the acceptable ranges of absolute bias, relative bias, RMSE, and coverage. Under different model complexities and varying ICC conditions, the study aims to help researchers identify the L1, L2, or L3 sample size, or their combination, as the source of variation in absolute bias, relative bias, RMSE, or coverage for a given parameter estimate, assisting researchers in making better decisions when selecting sample sizes for a three-level HLM analysis. A limitation of the study is the use of only a single distribution for the dependent and explanatory variables; different types of distributions might result in different sample size recommendations.
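A minimal data-generation sketch for the simulation design described above is given below: three-level data (observations within units within clusters) with variance components chosen to match a target intraclass correlation. How the between-cluster variance is split between levels 2 and 3, and the particular sample sizes, are assumptions made only for illustration; fitting the three-level HLM itself is not shown.

import numpy as np

rng = np.random.default_rng(6)

def generate(n3=30, n2=10, n1=3, icc=0.30, total_var=1.0):
    # Split the variance implied by the ICC evenly over levels 2 and 3 (an assumption).
    var_l3 = var_l2 = icc * total_var / 2.0
    var_l1 = (1.0 - icc) * total_var
    rows = []
    for c in range(n3):                      # level-3 clusters
        u3 = rng.normal(0.0, np.sqrt(var_l3))
        for u in range(n2):                  # level-2 units within each cluster
            u2 = rng.normal(0.0, np.sqrt(var_l2))
            for t in range(n1):              # level-1 observations (e.g., occasions)
                y = u3 + u2 + rng.normal(0.0, np.sqrt(var_l1))
                rows.append((c, u, t, y))
    return np.array(rows)                    # columns: cluster, unit, occasion, outcome

data = generate()
print(data.shape)                            # (n3 * n2 * n1, 4)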
