
Dissertations / Theses on the topic 'Variable sample size methods'


Consult the top 29 dissertations / theses for your research on the topic 'Variable sample size methods.'


1

Krklec Jerinkić, Nataša. "Line search methods with variable sample size." PhD thesis, Univerzitet u Novom Sadu, Prirodno-matematički fakultet u Novom Sadu, 2014. http://dx.doi.org/10.2298/NS20140117KRKLEC.

Full text
Abstract:
The problem under consideration is an unconstrained optimization problem with the objective function in the form of a mathematical expectation. The expectation is with respect to the random variable that represents the uncertainty. Therefore, the objective function is in fact deterministic. However, finding the analytical form of that objective function can be very difficult or even impossible. This is the reason why the sample average approximation is often used. In order to obtain a reasonably good approximation of the objective function, we have to use a relatively large sample size. We assume that the sample is generated at the beginning of the optimization process, and therefore we can consider this sample average objective function as a deterministic one. However, applying some deterministic method to that sample average function from the start can be very costly. The number of evaluations of the function under expectation is a common way of measuring the cost of an algorithm. Therefore, methods that vary the sample size throughout the optimization process have been developed. Most of them try to determine the optimal dynamics of increasing the sample size.

The main goal of this thesis is to develop a class of methods that can decrease the cost of an algorithm by decreasing the number of function evaluations. The idea is to decrease the sample size whenever it seems reasonable: roughly speaking, we do not want to impose a large precision, i.e. a large sample size, when we are far away from the solution we search for. A detailed description of the new methods is presented in Chapter 4 together with the convergence analysis. It is shown that the approximate solution is of the same quality as the one obtained by dealing with the full sample from the start.

Another important characteristic of the methods proposed here is the line search technique used for obtaining the subsequent iterates. The idea is to find a suitable direction and to search along it until we obtain a sufficient decrease in the function value. The sufficient decrease is determined through the line search rule. In Chapter 4, that rule is supposed to be monotone, i.e. we impose a strict decrease of the function value. In order to decrease the cost of the algorithm even more and to enlarge the set of suitable search directions, we use nonmonotone line search rules in Chapter 5. Within that chapter, these rules are modified to fit the variable sample size framework. Moreover, the conditions for global convergence and the R-linear rate are presented.

In Chapter 6, numerical results are presented. The test problems are varied: some of them are academic and some of them are real-world problems. The academic problems give us more insight into the behavior of the algorithms, while data that come from real-world problems test the practical applicability of the proposed algorithms. In the first part of that chapter, the focus is on the variable sample size techniques. Different implementations of the proposed algorithm are compared to each other and to other sampling schemes as well. The second part is mostly devoted to the comparison of various line search rules combined with different search directions in the variable sample size framework. The overall numerical results show that using the variable sample size can improve the performance of the algorithms significantly, especially when the nonmonotone line search rules are used.

The first chapter of this thesis provides the background material for the subsequent chapters. In Chapter 2, the basics of nonlinear optimization are presented with a focus on line search, while Chapter 3 deals with the stochastic framework. These chapters provide a review of the relevant known results, while the rest of the thesis represents the original contribution.
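The mechanism described above (a sample average objective, a monotone Armijo line search, and a sample size kept small while far from the solution) can be illustrated with a short sketch. The toy least-squares objective, the growth/shrink trigger based on 1/sqrt(n), and all tolerances below are illustrative assumptions, not the schedule analyzed in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)
xi = rng.normal(loc=[2.0, -1.0], size=(5000, 2))      # full sample of the random variable

def f(x, s):                                          # sample average objective
    return np.mean((s @ x - 1.0) ** 2)

def grad(x, s):
    return 2.0 * s.T @ (s @ x - 1.0) / len(s)

x, n = np.zeros(2), 100                               # start cheap: a small sample
for k in range(300):
    s = xi[:n]
    g = grad(x, s)
    # monotone Armijo line search along the negative gradient
    alpha, f_old = 1.0, f(x, s)
    while f(x - alpha * g, s) > f_old - 1e-4 * alpha * (g @ g):
        alpha *= 0.5
    x = x - alpha * g
    # variable sample size rule: the sample only needs to be as precise as the progress requires
    eps = 1.0 / np.sqrt(n)                            # rough sampling error of the current sample
    if np.linalg.norm(g) < eps:                       # near a solution of the current SAA: refine
        n = min(2 * n, len(xi))
    elif np.linalg.norm(g) > 10 * eps and n > 100:    # far from a solution: a cheap sample suffices
        n //= 2
    if n == len(xi) and np.linalg.norm(g) < 1e-6:
        break

print(np.round(x, 4), "final sample size:", n)
```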
2

Fernandes, Jessica Katherine de Sousa. "Estudo de algoritmos de otimização estocástica aplicados em aprendizado de máquina." Universidade de São Paulo, 2017. http://www.teses.usp.br/teses/disponiveis/45/45134/tde-28092017-182905/.

Full text
Abstract:
In different Machine Learning applications we may be interested in minimizing the expected value of a certain loss function. Stochastic optimization and sample size selection play an important role in solving this problem. This work presents the theoretical analysis of some algorithms from these two areas, including variations that consider variance reduction. In the practical examples we can observe the advantage of Stochastic Gradient Descent with respect to processing time and memory but, considering the accuracy of the obtained solution together with the cost of minimization, the variance reduction methodologies obtain the best solutions. Although the Dynamic Sample Size Gradient and Line Search with variable sample size selection algorithms obtain better solutions than Stochastic Gradient Descent, their disadvantage lies in their high computational cost.
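As a rough illustration of the trade-off discussed above, the sketch below contrasts plain Stochastic Gradient Descent with a dynamic-sample-size gradient method on a stand-in least-squares problem. The geometric batch-growth factor, step sizes and data are arbitrary assumptions for illustration, not the settings studied in the dissertation.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(10000, 5))
w_true = np.array([1.0, -2.0, 0.5, 3.0, -1.0])
b = A @ w_true + 0.1 * rng.normal(size=10000)

def grad(w, idx):                          # mini-batch gradient of 0.5*||Aw - b||^2 / m
    return A[idx].T @ (A[idx] @ w - b[idx]) / len(idx)

# plain SGD: fixed small batch, decaying step size
w_sgd = np.zeros(5)
for t in range(1, 2001):
    idx = rng.integers(0, len(b), size=32)
    w_sgd -= (1.0 / t) * grad(w_sgd, idx)

# dynamic sample size gradient: the batch grows geometrically, the step size stays constant
w_dyn, m = np.zeros(5), 32
for t in range(200):
    idx = rng.integers(0, len(b), size=m)
    w_dyn -= 0.1 * grad(w_dyn, idx)
    m = min(int(1.1 * m) + 1, len(b))      # geometric growth controls the gradient noise

print("SGD:    ", np.round(w_sgd, 2))
print("dynamic:", np.round(w_dyn, 2))
```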
3

Rožnjik, Andrea. "Optimizacija problema sa stohastičkim ograničenjima tipa jednakosti – kazneni metodi sa promenljivom veličinom uzorka." PhD thesis, Univerzitet u Novom Sadu, Prirodno-matematički fakultet u Novom Sadu, 2019. https://www.cris.uns.ac.rs/record.jsf?recordId=107819&source=NDLTD&language=en.

Full text
Abstract:
A stochastic programming problem with equality constraints is considered in this thesis, that is, a minimization problem with constraints in the form of mathematical expectation. We propose two iterative methods for solving the considered problem. Both procedures use, in each iteration, a sample average function instead of the mathematical expectation function, and employ the advantages of the variable sample size method based on adaptive updating of the sample size. This means the sample size is determined at every iteration using information from the current iteration; concretely, the current precision of the approximation of the expectation and the quality of the approximation of the solution determine the sample size for the next iteration. Both iterative procedures are based on the line search technique as well as on the quadratic penalty method adapted to the stochastic environment, since the considered problem has constraints. The procedures rely on the same ideas, but the approaches differ.

In the first approach, the algorithm is created for solving an SAA reformulation of the stochastic programming problem, i.e., for solving the approximation of the original problem. This means the sample size is determined before the iterative procedure, so the convergence analysis is deterministic. We show that, under standard assumptions, the proposed algorithm generates a subsequence whose accumulation point is a KKT point of the SAA problem. The algorithm formed by the second approach solves the stochastic programming problem itself, and therefore the convergence analysis is stochastic. Under standard assumptions for stochastic optimization, it generates a subsequence whose accumulation point is almost surely a KKT point of the original problem.

The proposed algorithms are implemented on the same test problems. The numerical results show their efficiency in solving the considered problems in comparison with procedures in which the sample size update follows a predefined scheme. The number of function evaluations is used as the measure of efficiency. The results obtained on the set of tested problems suggest that adaptive sample size scheduling can reduce the number of function evaluations in the case of constrained problems, too. Since the considered problem is deterministic but the proposed procedures are stochastic, the first three chapters of the thesis contain basic notions of deterministic and stochastic optimization, as well as a short overview of definitions and theorems from other fields needed to follow the analysis of the original results. The rest of the thesis consists of the presented algorithms, their convergence analysis and the numerical implementation.
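A compact sketch of the ingredients named above: a sample average approximation, a quadratic penalty for the expectation-type equality constraint, an Armijo line search, and a sample size enlarged only when the current precision demands it. The toy problem, the penalty update and the 1/sqrt(n) triggers are illustrative assumptions, not the algorithms proposed in the thesis.

```python
import numpy as np

rng = np.random.default_rng(2)
xi = rng.normal(loc=1.0, scale=0.5, size=20000)              # sample of the random variable

# toy problem: minimize E[(x0 - xi)^2] + x1^2  subject to  E[x0 * xi] - 2 = 0
def f_hat(x, s):  return np.mean((x[0] - s) ** 2) + x[1] ** 2
def c_hat(x, s):  return np.mean(x[0] * s) - 2.0

def penalty_grad(x, s, rho):
    gf = np.array([2.0 * (x[0] - np.mean(s)), 2.0 * x[1]])   # gradient of the SAA objective
    gc = np.array([np.mean(s), 0.0])                         # gradient of the SAA constraint
    return gf + rho * c_hat(x, s) * gc

x, n, rho = np.array([0.0, 1.0]), 200, 1.0
for k in range(200):
    s = xi[:n]
    phi = lambda z: f_hat(z, s) + 0.5 * rho * c_hat(z, s) ** 2   # quadratic penalty function
    g = penalty_grad(x, s, rho)
    alpha, p0 = 1.0, phi(x)
    while phi(x - alpha * g) > p0 - 1e-4 * alpha * (g @ g):      # Armijo backtracking
        alpha *= 0.5
    x = x - alpha * g
    # adaptive updates: tighten the penalty, and enlarge the sample only when precision demands it
    if abs(c_hat(x, s)) > 1.0 / np.sqrt(n):
        rho = min(10.0 * rho, 1e6)
    if np.linalg.norm(g) < 1.0 / np.sqrt(n):
        n = min(2 * n, len(xi))

print(np.round(x, 3), "sample size:", n, "penalty:", rho)
```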
4

Hagen, Clinton Ernest. "Comparing the performance of four calculation methods for estimating the sample size in repeated measures clinical trials where difference in treatment groups means is of interest." Oklahoma City : [s.n.], 2008.

Find full text
5

Suen, Wai-sing Alan, and 孫偉盛. "Sample size planning for clinical trials with repeated measurements." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2004. http://hub.hku.hk/bib/B31972172.

Full text
6

Winkelried, Diego. "Methods to improve the finite sample behaviour of instrumental variable estimators." Thesis, University of Cambridge, 2011. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.609238.

Full text
7

Oymak, Okan. "Sample size determination for estimation of sensor detection probabilities based on a test variable." Thesis, Monterey, Calif. : Naval Postgraduate School, 2007. http://bosun.nps.edu/uhtbin/hyperion-image.exe/07Jun%5FOymak.pdf.

Full text
Abstract:
Thesis (M.S. in Operations Research)--Naval Postgraduate School, June 2007. Thesis Advisor(s): Lyn R. Whitaker. "June 2007." Includes bibliographical references (p. 95-96). Also available in print.
8

Tan, Say Beng. "Bayesian decision theoretic methods for clinical trials." Thesis, Imperial College London, 1999. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.312988.

Full text
9

Bofill, Roig Marta. "Statistical methods and software for clinical trials with binary and survival endpoints : efficiency, sample size and two-sample comparison." Doctoral thesis, Universitat Politècnica de Catalunya, 2020. http://hdl.handle.net/10803/670371.

Full text
Abstract:
Defining the scientific question is the starting point for any clinical study. However, even though the main objective is generally clear, how it is addressed is not usually straightforward. Clinical studies very often encompass several questions, defined as primary and secondary hypotheses and measured through different endpoints. In clinical trials with multiple endpoints, composite endpoints, defined as the union of several endpoints, are widely used as primary endpoints. The use of composite endpoints is mainly motivated by the expectation that they increase the number of observed events and capture more information than a single endpoint. Besides, it is generally thought that the power of the study will increase if composite endpoints are used and that the treatment effect on the composite endpoint will be similar to the average effect of its components. However, these assertions are not necessarily true, and the design of a trial with a composite endpoint might be difficult.

Different types of endpoints might be chosen for different research stages. This is the case for cancer trials, where short-term binary endpoints based on the tumor response are common in early-phase trials, whereas overall survival is the gold standard in late-phase trials. In recent years, there has been a growing interest in designing seamless trials with both an early response outcome and later event times. Considering these two endpoints together could provide a wider characterization of the treatment effect and may also reduce the duration of clinical trials and their costs.

In this thesis, we provide novel methodologies to design clinical trials with composite binary endpoints and to compare two treatment groups based on binary and time-to-event endpoints. In addition, we present the implementation of the methodologies by means of different statistical tools. Specifically, in Chapter 2, we propose a general strategy for sizing a trial with a composite binary endpoint as primary endpoint based on previous information on its components. In Chapter 3, we present the ARE (Asymptotic Relative Efficiency) method to choose between a composite binary endpoint or one of its components as the primary endpoint of a trial. In Chapter 4, we propose a class of two-sample nonparametric statistics for testing the equality of proportions and the equality of survival functions. In Chapter 5, we describe the software developed to implement the methods proposed in this thesis. In particular, we present CompARE, a web-based tool for designing clinical trials with composite endpoints, together with its corresponding R package, and the R package SurvBin, in which we have implemented the class of statistics presented in Chapter 4. We conclude this dissertation with general conclusions and some directions for future research in Chapter 6.
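To make the sizing idea of Chapter 2 concrete, the sketch below derives the event probability of a composite of two binary components from their marginal probabilities and an assumed correlation, and plugs the result into the standard normal-approximation sample size formula for two proportions. The numbers and the correlation value are invented for illustration, and this is not the CompARE implementation itself.

```python
from math import sqrt
from scipy.stats import norm

def p_composite(p1, p2, rho):
    """P(E1 or E2) for two binary events with marginals p1, p2 and Pearson correlation rho."""
    p_joint = p1 * p2 + rho * sqrt(p1 * (1 - p1) * p2 * (1 - p2))
    return p1 + p2 - p_joint

def n_per_arm(p_control, p_treat, alpha=0.05, power=0.80):
    """Classical two-proportion sample size per arm (normal approximation)."""
    za, zb = norm.ppf(1 - alpha / 2), norm.ppf(power)
    pbar = (p_control + p_treat) / 2
    num = (za * sqrt(2 * pbar * (1 - pbar))
           + zb * sqrt(p_control * (1 - p_control) + p_treat * (1 - p_treat))) ** 2
    return num / (p_control - p_treat) ** 2

# component 1 (e.g. death): 5% vs 3.5%; component 2 (e.g. hospitalization): 12% vs 9%
pc = p_composite(0.05, 0.12, rho=0.2)
pt = p_composite(0.035, 0.09, rho=0.2)
print("n per arm, component 1:", round(n_per_arm(0.05, 0.035)))
print("n per arm, component 2:", round(n_per_arm(0.12, 0.09)))
print("n per arm, composite:  ", round(n_per_arm(pc, pt)))
```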
10

Matsouaka, Roland Albert. "Contributions to Imputation Methods Based on Ranks and to Treatment Selection Methods in Personalized Medicine." Thesis, Harvard University, 2012. http://dissertations.umi.com/gsas.harvard:10078.

Full text
Abstract:
The chapters of this thesis focus on two different issues that arise in clinical trials and propose novel methods to address them. The first issue arises in the analysis of data with non-ignorable missing observations. The second issue concerns the development of methods that provide physicians with better tools to understand and treat diseases efficiently by using each patient's characteristics and personal biomedical profile.

Inherent to most clinical trials is the issue of missing data, especially data that arise when patients drop out of the study without further measurements. Proper handling of missing data is crucial in all statistical analyses because disregarding missing observations can lead to biased results. In the first two chapters of this thesis, we deal with the "worst-rank score" missing data imputation technique in pretest-posttest clinical trials. Subjects are randomly assigned to two treatments and the response is recorded at baseline prior to treatment (pretest response) and after a pre-specified follow-up period (posttest response). The treatment effect is then assessed on the change in response from baseline to the end of follow-up. Subjects with missing response at the end of follow-up are assigned values that are worse than any observed response (worst-rank score). Data analysis is then conducted using the Wilcoxon-Mann-Whitney test.

In the first chapter, we derive explicit closed-form formulas for power and sample size calculations using both tied and untied worst-rank score imputation, where the worst-rank scores are either a fixed value (tied score) or depend on the time of withdrawal (untied score). We use simulations to demonstrate the validity of these formulas. In addition, we examine and compare four different simplification approaches to estimating sample sizes. These approaches depend on whether data from the literature or a pilot study are available. In the second chapter, we introduce the weighted Wilcoxon-Mann-Whitney test on the untied worst-rank score (composite) outcome. First, we demonstrate that the weighted test is exactly the ordinary Wilcoxon-Mann-Whitney test when the weights are equal. Then, we derive optimal weights that maximize the power of the corresponding weighted Wilcoxon-Mann-Whitney test. We show, using simulations, that the weighted test is more powerful than the ordinary test. Furthermore, we propose two different step-wise procedures to analyze data using the weighted test and assess their performance through simulation studies. Finally, we illustrate the new approach using data from a recent randomized clinical trial of normobaric oxygen therapy in patients with acute ischemic stroke.

The third and last chapter of this thesis concerns the development of robust methods for identifying treatment groups in personalized medicine. Physicians often have to use a trial-and-error approach to find the most effective medication for their patients. Personalized medicine methods aim at tailoring strategies for disease prevention, detection or treatment by using each individual subject's personal characteristics and medical profile. This would result in (1) better diagnosis and earlier interventions, (2) maximum therapeutic benefits and reduced adverse events, (3) more effective therapy, and (4) more efficient drug development. Novel methods have been proposed to identify subgroups of patients who would benefit from a given treatment. In the last chapter of this thesis, we develop a robust method for treatment assignment for future patients based on the expected total outcome. In addition, we provide a method to assess the incremental value of new covariates in improving treatment assignment. We evaluate the accuracy of our methods through simulation studies and illustrate them with two examples using data from two HIV/AIDS clinical trials.
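A small sketch of the tied worst-rank imputation described above, assuming that a larger change score is better: dropouts receive a common score worse than any observed value, and the two arms are then compared with the Wilcoxon-Mann-Whitney test. The simulated data and dropout rates are placeholders; an untied variant would additionally order dropouts by their time of withdrawal.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(3)

# change from baseline in two arms; NaN marks dropouts (non-ignorable missingness)
control = rng.normal(0.0, 1.0, size=60)
treated = rng.normal(0.5, 1.0, size=60)
control[rng.random(60) < 0.15] = np.nan
treated[rng.random(60) < 0.10] = np.nan

# tied worst-rank score: one fixed value worse (here: smaller) than any observed change
worst = np.nanmin(np.concatenate([control, treated])) - 1.0
control_imp = np.where(np.isnan(control), worst, control)
treated_imp = np.where(np.isnan(treated), worst, treated)

u, p = mannwhitneyu(treated_imp, control_imp, alternative="two-sided")
print(f"U = {u:.1f}, p = {p:.4f}")
```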
11

Heard, Astrid. "APPLICATION OF STATISTICAL METHODS IN RISK AND RELIABILITY." Doctoral diss., University of Central Florida, 2005. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/2602.

Full text
Abstract:
The dissertation considers construction of confidence intervals for a cumulative distribution function F(z) and its inverse at some fixed points z and u on the basis of an i.i.d. sample where the sample size is relatively small. The sample is modeled as having the flexible Generalized Gamma distribution with all three parameters being unknown. This approach can be viewed as an alternative to nonparametric techniques, which do not specify the distribution of X and lead to less efficient procedures. The confidence intervals are constructed by objective Bayesian methods and use the Jeffreys noninformative prior. Performance of the resulting confidence intervals is studied via Monte Carlo simulations and compared to the performance of nonparametric confidence intervals based on the binomial proportion. In addition, techniques for change point detection are analyzed and further evaluated via Monte Carlo simulations. The effect of a change point on the interval estimators is studied both analytically and via Monte Carlo simulations. (Ph.D., Department of Mathematics, Arts and Sciences.)
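For reference, the nonparametric comparator mentioned in the abstract can be sketched in a few lines: an exact (Clopper-Pearson) binomial interval for F(z) = P(X <= z) based on the observed proportion of sample values at or below z. The Bayesian Generalized Gamma intervals studied in the dissertation are not reproduced here; the gamma test data below are only a stand-in for a small skewed sample.

```python
import numpy as np
from scipy.stats import beta

def cdf_confidence_interval(sample, z, level=0.95):
    """Clopper-Pearson (exact binomial) interval for F(z) = P(X <= z)."""
    x = np.asarray(sample)
    n, k = len(x), int(np.sum(x <= z))
    a = (1 - level) / 2
    lo = beta.ppf(a, k, n - k + 1) if k > 0 else 0.0
    hi = beta.ppf(1 - a, k + 1, n - k) if k < n else 1.0
    return k / n, lo, hi

rng = np.random.default_rng(4)
sample = rng.gamma(shape=2.0, scale=1.5, size=30)      # small, skewed sample
print(cdf_confidence_interval(sample, z=3.0))
```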
12

Walters, Stephen John. "The use of bootstrap methods for estimating sample size and analysing health-related quality of life outcomes (particularly the SF-36)." Thesis, University of Sheffield, 2003. http://etheses.whiterose.ac.uk/6053/.

Full text
Abstract:
Health-Related Quality of Life (HRQoL) measures are becoming increasingly used in clinical trials and health services research, both as primary and secondary outcome measures. Investigators are now asking statisticians for advice on how to plan (e.g. sample size) and analyse studies using HRQoL outcomes. HRQoL outcomes like the SF-36 are usually measured on an ordinal scale. However, most investigators assume that there exists an underlying continuous latent variable that measures HRQoL, and that the actual measured outcomes (the ordered categories) reflect contiguous intervals along this continuum. The ordinal scaling of HRQoL measures means they tend to generate data that have discrete, bounded and skewed distributions. Thus, standard methods of analysis such as the t-test and linear regression that assume Normality and constant variance may not be appropriate. For this reason, non-parametric methods are often used to analyse HRQoL data. The bootstrap is one such computer-intensive non-parametric method for estimating sample sizes and analysing data.

From a review of the literature, I found five methods of estimating sample sizes for two-group cross-sectional comparisons of HRQoL outcomes. All five methods (amongst other factors) require the specification of an effect size, which varies according to the method of sample size estimation. The empirical effect sizes calculated from the various datasets suggested that large differences in HRQoL (as measured by the SF-36) between groups are unlikely, particularly for the RCT comparisons. Most of the observed effect sizes were in the 'small' to 'moderate' range (0.2 to 0.5) using Cohen's (1988) criteria. I compared the power of various methods of sample size estimation for two-group cross-sectional study designs via bootstrap simulation. The results showed that under the location shift alternative hypothesis, conventional methods of sample size estimation performed well, particularly Whitehead's (1993) method. Whitehead's method is recommended if the HRQoL outcome has a limited number of discrete values (< 7) and/or the expected proportion of cases at either of the bounds is high. If a pilot dataset is readily available (to estimate the shape of the distribution), then bootstrap simulation may provide a more accurate and reliable estimate than conventional methods.

Finally, I used the bootstrap for hypothesis testing and the estimation of standard errors and confidence intervals for parameters in four datasets (which illustrate the different aspects of study design). I then compared and contrasted the bootstrap with standard methods of analysing HRQoL outcomes as described in Fayers and Machin (2000). Overall, in the datasets studied with the SF-36 outcome, the use of the bootstrap for estimating sample sizes and analysing HRQoL data appears to produce results similar to conventional statistical methods. Therefore, the results of this thesis suggest that bootstrap methods are not more appropriate for analysing HRQoL outcome data than standard methods. This result requires confirmation with other HRQoL outcome measures, interventions and populations.
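The bootstrap sample size idea can be sketched as follows: resample a pilot dataset, impose a target location shift on one arm, and estimate power for candidate sample sizes. The simulated "pilot" SF-36-like scores, the 5-point target difference and the Mann-Whitney test are assumptions for illustration; a real application would plug in the actual pilot data and the planned analysis.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(5)
# stand-in pilot data: bounded (0-100), discrete (multiples of 5) and skewed, like many SF-36 scales
pilot = np.clip(np.round(rng.normal(75, 20, size=80) / 5) * 5, 0, 100)

def bootstrap_power(pilot, n_per_group, shift, n_sim=1000, alpha=0.05):
    """Estimate power by resampling the pilot data and adding a location shift to one arm."""
    hits = 0
    for _ in range(n_sim):
        a = rng.choice(pilot, size=n_per_group, replace=True)
        b = np.clip(rng.choice(pilot, size=n_per_group, replace=True) + shift, 0, 100)
        if mannwhitneyu(a, b, alternative="two-sided").pvalue < alpha:
            hits += 1
    return hits / n_sim

for n in (50, 100, 150, 200):
    print(n, bootstrap_power(pilot, n, shift=5))
```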
13

Brungard, Colby W. "Alternative Sampling and Analysis Methods for Digital Soil Mapping in Southwestern Utah." DigitalCommons@USU, 2009. http://digitalcommons.usu.edu/etd/472.

Full text
Abstract:
Digital soil mapping (DSM) relies on quantitative relationships between easily measured environmental covariates and field and laboratory data. We applied innovative sampling and inference techniques to predict the distribution of soil attributes, taxonomic classes, and dominant vegetation across a 30,000-ha complex Great Basin landscape in southwestern Utah. This arid rangeland was characterized by rugged topography, diverse vegetation, and intricate geology. Environmental covariates calculated from digital elevation models (DEM) and spectral satellite data were used to represent factors controlling soil development and distribution. We investigated optimal sample size and sampled the environmental covariates using conditioned Latin Hypercube Sampling (cLHS). We demonstrated that cLHS, a type of stratified random sampling, closely approximated the full range of variability of environmental covariates in feature and geographic space with small sample sizes. Site and soil data were collected at 300 locations identified by cLHS. Random forests was used to generate spatial predictions and associated probabilities of site and soil characteristics. Balanced random forests and balanced and weighted random forests were investigated for their use in producing an overall soil map. Overall and class errors (referred to as out-of-bag [OOB] error) were within acceptable levels. Quantitative covariate importance was useful in determining what factors were important for soil distribution. Random forest spatial predictions were evaluated based on the conceptual framework developed during field sampling.
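The prediction step described above can be sketched with a balanced random forest whose out-of-bag (OOB) error doubles as the accuracy estimate. The covariates below are random stand-ins for DEM-derived and spectral layers, and the cLHS site selection itself is not shown.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(6)
# stand-ins for environmental covariates at 300 cLHS-selected sites
X = rng.normal(size=(300, 6))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.8, size=300) > 0).astype(int)  # soil class

# class weights approximate a balanced random forest; OOB error replaces a held-out test set
rf = RandomForestClassifier(n_estimators=500, oob_score=True,
                            class_weight="balanced", random_state=0)
rf.fit(X, y)

print("OOB error:", round(1 - rf.oob_score_, 3))
print("covariate importance:", np.round(rf.feature_importances_, 3))
```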
14

Araújo, Elton Gean. "Métodos de amostragem e tamanho de amostra para avaliar o estado de maturação da uva Niágara Rosada." Universidade de São Paulo, 2008. http://www.teses.usp.br/teses/disponiveis/11/11134/tde-19032008-144154/.

Full text
Abstract:
São Paulo state is the main table grape producer in Brazil, the Niágara Rosada (Vitis labrusca) being the predominant cultivar. To offer quality products to the market, producers need to determine periodically the maturation state of the grapes, the content of soluble solids being the main variable measured. To determine this content, a sample of fruits is collected in the cultivated area. This work addresses the random and stratified sampling methods and the appropriate individual-berry sample size to evaluate the maturation state of the Niágara Rosada based on the content of soluble solids. The appropriate individual-berry sample size was obtained for the two sampling methods separately, using the Maximum Curvature, Modified Maximum Curvature and Variance Curve methods. The sampling methods were compared using a univariate analysis for repeated measures data with the SAS GLM and MIXED procedures; two different procedures were used in order to attain reliable results. The minimum berry sample sizes required for the stratified and random methods were approximately 30 and 27 berries per area, respectively. The sampling methods investigated presented significantly different results, and the random method presented high maximum and minimum variation per plant and should therefore be avoided for this kind of study.
15

Leander, Aggeborn Noah, and Kristian Norgren. "An Empirical Study of Students’ Performance at Assessing Normality of Data Through Graphical Methods." Thesis, Uppsala universitet, Statistiska institutionen, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-385507.

Full text
Abstract:
When applying statistical methods for analyzing data with normality as an assumption, there are different procedures for determining whether a sample is drawn from a normally distributed population. Because normality is such a central assumption, the reliability of these procedures is of utmost importance. Much research focuses on how good formal tests of normality are, while the performance of statisticians using graphical methods is far less examined. Therefore, the aim of the study was to examine empirically, through a web survey, how good statistics students are at assessing whether samples are drawn from normally distributed populations using graphical methods. The results of the study indicate that the students become distinctly better at correctly identifying normality in data drawn from a normally distributed population as the sample size increases. Further, the students are very good at correctly rejecting normality when the sample is drawn from a symmetrical non-normal population and fairly good when the sample is drawn from an asymmetrical distribution. In comparison with some common formal tests of normality, the students' performance is superior at correctly rejecting normality for small sample sizes and inferior for large ones, when samples are drawn from a non-normal population.
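The formal-test baseline the students were compared against can be reproduced in outline: simulate rejection rates of the Shapiro-Wilk test for a normal, a skewed and a symmetric heavy-tailed population at several sample sizes. The particular distributions and sample sizes are assumptions, not the exact design of the survey study.

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(7)

def rejection_rate(sampler, n, n_sim=1000, alpha=0.05):
    """Proportion of simulated samples for which Shapiro-Wilk rejects normality."""
    return np.mean([shapiro(sampler(n)).pvalue < alpha for _ in range(n_sim)])

for n in (10, 30, 100):
    print(n,
          "normal:", rejection_rate(lambda m: rng.normal(size=m), n),
          "skewed:", rejection_rate(lambda m: rng.exponential(size=m), n),
          "heavy-tailed:", rejection_rate(lambda m: rng.standard_t(3, size=m), n))
```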
16

Woo, Hin Kyeol. "Multiscale fractality with application and statistical modeling and estimation for computer experiment of nano-particle fabrication." Diss., Georgia Institute of Technology, 2012. http://hdl.handle.net/1853/45819.

Full text
Abstract:
The first chapter proposes multifractal analysis to measure the inhomogeneity of regularity of a 1H-NMR spectrum using wavelet-based multifractal tools. The geometric summaries of the multifractal spectrum are informative and, as such, are employed to discriminate 1H-NMR spectra associated with different treatments. The methodology is applied to evaluate the effect of sulfur amino acids. The second part of this thesis provides essential material for understanding the engineering background of a nano-particle fabrication process. The third chapter introduces a constrained random effects model. Since certain combinations of process variables result in unproductive process outcomes, a logistic model is used to characterize such process behavior. For the cases with productive outcomes, a normal regression serves as the second part of the model. Additionally, random effects are included in both the logistic and normal regression models to describe the potential spatial correlation among the data. This chapter investigates a way to approximate the likelihood function and to find estimates maximizing the approximated likelihood. The last chapter presents a method to decide the sample size under a multi-layer system, a series of layers that become smaller and smaller; the focus is on deciding the sample size in each layer. The sample size decision has several objectives, the most important being that the sample size should be large enough to point the next layer in the right direction. Specifically, the bottom layer, which is the smallest neighborhood around the optimum, should meet the tolerance requirement. Performing a hypothesis test of whether the next layer includes the optimum gives the required sample size.
17

Saha, Dibakar. "Improved Criteria for Estimating Calibration Factors for Highway Safety Manual (HSM) Applications." FIU Digital Commons, 2014. http://digitalcommons.fiu.edu/etd/1701.

Full text
Abstract:
The Highway Safety Manual (HSM) estimates roadway safety performance based on predictive models that were calibrated using national data. Calibration factors are then used to adjust these predictive models to local conditions for local applications. The HSM recommends that local calibration factors be estimated using 30 to 50 randomly selected sites that experienced at least a total of 100 crashes per year. It also recommends that the factors be updated every two to three years, preferably on an annual basis. However, these recommendations are primarily based on expert opinions rather than data-driven research findings. Furthermore, most agencies do not have data for many of the input variables recommended in the HSM. This dissertation is aimed at determining the best way to meet three major data needs affecting the estimation of calibration factors: (1) the required minimum sample sizes for different roadway facilities, (2) the required frequency for calibration factor updates, and (3) the influential variables affecting calibration factors. In this dissertation, statewide segment and intersection data were first collected for most of the HSM recommended calibration variables using a Google Maps application. In addition, eight years (2005-2012) of traffic and crash data were retrieved from existing databases from the Florida Department of Transportation. With these data, the effect of sample size criterion on calibration factor estimates was first studied using a sensitivity analysis. The results showed that the minimum sample sizes not only vary across different roadway facilities, but they are also significantly higher than those recommended in the HSM. In addition, results from paired sample t-tests showed that calibration factors in Florida need to be updated annually. To identify influential variables affecting the calibration factors for roadway segments, the variables were prioritized by combining the results from three different methods: negative binomial regression, random forests, and boosted regression trees. Only a few variables were found to explain most of the variation in the crash data. Traffic volume was consistently found to be the most influential. In addition, roadside object density, major and minor commercial driveway densities, and minor residential driveway density were also identified as influential variables.
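The quantity being estimated here is the HSM calibration factor, the ratio of total observed to total predicted crashes over the sampled sites. The sketch below uses synthetic site data with an assumed true local factor of 1.3 and a simple bootstrap interval to show why the precision of the estimate depends strongly on the number of sampled sites; it is not the dissertation's analysis of the Florida data.

```python
import numpy as np

rng = np.random.default_rng(8)
# hypothetical site-level data: HSM-predicted and observed crash counts for 400 segments
predicted = rng.gamma(shape=2.0, scale=1.5, size=400)
observed = rng.poisson(1.3 * predicted)               # assumed 'true' local factor of 1.3

def calibration_factor(obs, pred):
    return obs.sum() / pred.sum()                     # C = sum(observed) / sum(predicted)

def bootstrap_interval(n_sites, n_boot=2000):
    idx = rng.choice(len(predicted), size=n_sites, replace=False)
    reps = []
    for _ in range(n_boot):
        b = rng.choice(idx, size=n_sites, replace=True)
        reps.append(calibration_factor(observed[b], predicted[b]))
    return np.percentile(reps, [2.5, 97.5])

for n in (30, 50, 100, 200):                          # the HSM suggests 30 to 50 sites
    lo, hi = bootstrap_interval(n)
    print(f"{n:4d} sites: 95% interval [{lo:.2f}, {hi:.2f}]")
```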
18

Schintler, Laurie A., and Manfred M. Fischer. "The Analysis of Big Data on Cities and Regions - Some Computational and Statistical Challenges." WU Vienna University of Economics and Business, 2018. http://epub.wu.ac.at/6637/1/2018%2D10%2D28_Big_Data_on_cities_and_regions_untrack_changes.pdf.

Full text
Abstract:
Big Data on cities and regions bring new opportunities and challenges to data analysts and city planners. On the one hand, they hold great promise for combining increasingly detailed data on each citizen with critical infrastructures to plan, govern and manage cities and regions, improve their sustainability, optimize processes and maximize the provision of public and private services. On the other hand, the massive sample size and high dimensionality of Big Data and their geo-temporal character introduce unique computational and statistical challenges. This chapter provides an overview of the salient characteristics of Big Data and of how these features drive a paradigm change in data management and analysis, as well as in the computing environment. Series: Working Papers in Regional Science.
19

Holzchuh, Ricardo. "Estudo da reprodutibilidade do exame de microscopia especular de córnea em amostras com diferentes números de células." Universidade de São Paulo, 2011. http://www.teses.usp.br/teses/disponiveis/5/5149/tde-01122011-111704/.

Full text
Abstract:
INTRODUCTION: The corneal endothelium plays an important role in the physiology of the cornea. Morphological data generated by the specular microscope, such as endothelial cell density (CD), average cell area (ACA), coefficient of variation (CV) and percentage of hexagonal cells (HEX), are important for analyzing corneal status. For a standard and reproducible analysis of the morphological data, a sampling statistics software called Cells Analyzer PAT. REC (CA) was used. PURPOSE: To determine normal reference values of CD, ACA, CV and HEX; to analyze the percentage of marked and excluded cells when the examiner counted 40, 100 or 150 cells in one endothelial image; to analyze the percentage of marked and excluded cells according to the statistical software; and to determine the confidence intervals of these morphological data. METHODS: Transversal study of 122 endothelial specular microscope images (Konan non-contact NONCON ROBO® SP-8000 Specular Microscope) of 61 human individuals with cataract (63.97 ± 8.15 years old), analyzed statistically using CA. Each image was submitted to standard cell counting: 40 cells were counted in Group 40, 100 cells in Group 100 and 150 cells in Group 150. In Group CA, the number of counted cells was determined by the statistical analysis software in order to achieve the most reliable clinical information (planned relative error < 0.05). The relative error of the morphological data generated by the specular microscope was then analyzed with the CA software. RESULTS: The average normal reference value of CD was 2395.37 ± 294.34 cells/mm2, ACA was 423.64 ± 51.09 µm2, CV was 0.40 ± 0.04 and HEX was 54.77 ± 4.19%. The percentage of cells excluded from analysis was 51.20% in Group 40, 35.07% in Group 100 and 29.83% in Group 150. The average number of cells calculated initially by the statistical software was 247.48 ± 51.61, and the average number of cells included in the final sampling process was 425.25 ± 102.24. The average relative error was 0.157 ± 0.031 for Group 40, 0.093 ± 0.024 for Group 100, 0.075 ± 0.010 for Group 150 and 0.037 ± 0.005 for Group CA. Increasing the number of marked cells decreased the width of the confidence interval (right and left eyes, respectively) by 75.79% and 77.39% for CD, 75.95% and 77.37% for ACA, 72.72% and 76.92% for CV, and 75.93% and 76.71% for HEX. CONCLUSION: The average normal reference value of CD was 2395.37 ± 294.34 cells/mm2, ACA was 423.64 ± 51.09 µm2, CV was 0.40 ± 0.04 and HEX was 54.77 ± 4.19%. The percentage of excluded cells was 51.20% in Group 40, 35.07% in Group 100 and 29.83% in Group 150. The CA software considered the data reliable when 425.25 ± 102.24 cells were marked by the examiner across two to five specular images (calculated relative error of 0.037 ± 0.005). Increasing the number of marked cells decreased the width of the confidence interval for all morphological data generated by the specular microscope.
20

Marsal, Mora Josep Ramon. "Estimación del tamaño muestral requerido en el uso de variables de respuesta combinadas: nuevas aportaciones." Doctoral thesis, Universitat Autònoma de Barcelona, 2019. http://hdl.handle.net/10803/666768.

Full text
Abstract:
We define a Composite Endpoint (CE) as the combination of two or more clinically relevant events into a single event that will be used as the primary endpoint in a clinical trial (CT). The events to be combined should have similar importance for the patient, similar incidences and a similar magnitude of the intervention effect. The main advantage of using a CE is the potential reduction of the Sample Size Requirement (SSR) needed to demonstrate the effect of an intervention, resulting from an increase in statistical power (i.e. an increase in the net number of patients with one or more events). On the other hand, the main disadvantage of using CEs is an increase in the complexity of both the statistical analysis and the interpretation of the results. The quantification of the SSR depends mainly on the incidence of each combined event and the effect of the intervention on it, but also on the degree of association between the combined events. The impact of the incidences and of the magnitude of the effects on the SSR is well known; however, the impact of the association between events has hardly been explored. Using a pragmatic approach, this Thesis develops tools that objectively help the professionals involved in designing randomized clinical trials (clinicians, trialists and biostatisticians) to build binary CEs that are efficient in terms of SSR. As a preliminary step, the concept of strength of association between two binary variables is studied in depth, together with the way this association affects the SSR when a CE combines only two binary events. In the first part, the association between binary variables is defined and characterized: different ways of estimating the association are listed and metrics for comparing them are defined. We conclude that Pearson's correlation, although a good estimator of the degree of association, is not optimal in the context of binary variables; the joint probability (the probability of both events) and the relative degree of overlap show better characteristics. In the second part, simulation is used to identify the scenarios in which a binary CE is preferable to a single relevant endpoint in terms of SSR reduction, evaluating the direction and magnitude of the impact of the incidences, of the magnitude of the intervention effect and, especially, of the degree of association between the events. The degree of association can determine whether combining two events is advisable or not for reducing the SSR. Finally, we develop Bin-CE, a free tool that, given a set of binary events, identifies the combination that minimizes the SSR. The tool was built with the clinical profile of its users in mind, programmed with free software and freely accessible at https://uesca-apps.shinyapps.io/bincep.
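As a rough illustration of how the association between two events feeds into the sample size of a composite endpoint, the sketch below combines the standard normal-approximation formula for comparing two proportions with the inclusion-exclusion identity P(E1 or E2) = p1 + p2 − p12. It is not the Bin-CE implementation: the incidences, relative risks and joint probabilities are made-up values, and the treated-arm joint probability is a crude assumption.

```python
from scipy.stats import norm

def sample_size_two_proportions(p_control, p_treat, alpha=0.05, power=0.8):
    """Approximate per-arm sample size for comparing two proportions (normal approximation)."""
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    var = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    return (z_a + z_b) ** 2 * var / (p_control - p_treat) ** 2

def composite_probability(p1, p2, p_joint):
    """P(E1 or E2) by inclusion-exclusion; p_joint encodes the association between the events."""
    return p1 + p2 - p_joint

# Hypothetical control-arm incidences of the two candidate events and their joint probability.
p1_c, p2_c, p_joint_c = 0.10, 0.08, 0.02
rr1, rr2 = 0.75, 0.80                         # assumed relative risks of the intervention
p1_t, p2_t = rr1 * p1_c, rr2 * p2_c
p_joint_t = p_joint_c * min(rr1, rr2)         # crude assumption for the treated joint probability

pc = composite_probability(p1_c, p2_c, p_joint_c)
pt = composite_probability(p1_t, p2_t, p_joint_t)

print("SSR per arm, event 1 alone:", round(sample_size_two_proportions(p1_c, p1_t)))
print("SSR per arm, composite    :", round(sample_size_two_proportions(pc, pt)))
```

With these placeholder numbers the composite requires fewer patients than the single event, but a larger joint probability (stronger association) shrinks the composite incidence and can reverse that conclusion, which is the point the thesis makes.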
APA, Harvard, Vancouver, ISO, and other styles
21

Caltabiano, Ana Maria de Paula. "Gráficos de controle com tamanho de amostra variável : classificando sua estratégia conforme sua destinação por intermédio de um estudo bibliométrico /." Guaratinguetá, 2018. http://hdl.handle.net/11449/180553.

Full text
Abstract:
Advisor: Antonio Fernando Branco Costa. Control charts were created by Shewhart around 1924. Since then, many strategies have been proposed to improve the performance of these statistical tools. Among them, the adaptive-parameter strategy stands out, having given rise to a very fertile line of research. One of its branches is devoted to the chart with variable sample size, in which the size of the next sample depends on the position of the current sample point: if the point is close to the center line, the next sample is small; if it is far from the center line but not yet in the action region, the next sample is large. This sampling scheme became known as the VSS (variable sample size) scheme. This dissertation reviews the process-monitoring literature whose main focus is VSS sampling schemes. A systematic literature review was carried out through a bibliometric analysis covering the period from 1980 to 2018, with the objective of classifying the VSS strategy according to its intended application, for example charts with known parameters and independent observations. The applications were divided into ten classes: I – type of VSS; II – type of monitoring; III – number of variables under monitoring; IV – type of chart; V – process parameters; VI – signaling rules; VII – nature of the process; VIII – type of optimization; IX – mathematical model of the chart properties; X – type of production. The main conclusion of this study was that in the class... (full abstract: follow the electronic access link below). Master's degree.
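The VSS rule described above can be sketched in a few lines: the next sample is small when the current point falls in the central region, large when it falls in the warning region, and the chart signals in the action region. The sketch below is a generic illustration with arbitrary sample sizes and limits, not the parameters or classifications studied in the dissertation.

```python
import numpy as np

rng = np.random.default_rng(0)

MU0, SIGMA = 0.0, 1.0        # in-control mean and (known) standard deviation
N_SMALL, N_LARGE = 3, 12     # hypothetical VSS sample sizes
W, K = 1.0, 3.0              # warning and action limits in standard-error units

def vss_xbar_run(shift=0.5, max_samples=10_000):
    """Run a VSS X-bar chart until it signals; return the number of samples taken."""
    n = N_SMALL
    for t in range(1, max_samples + 1):
        xbar = rng.normal(MU0 + shift, SIGMA / np.sqrt(n))
        z = (xbar - MU0) / (SIGMA / np.sqrt(n))
        if abs(z) >= K:                       # action region: signal
            return t
        # central region -> small next sample; warning region -> large next sample
        n = N_SMALL if abs(z) <= W else N_LARGE
    return max_samples

runs = [vss_xbar_run(shift=0.5) for _ in range(2000)]
print("simulated ARL under a 0.5-sigma shift:", np.mean(runs))
```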
APA, Harvard, Vancouver, ISO, and other styles
22

Vong, Camille. "Model-Based Optimization of Clinical Trial Designs." Doctoral thesis, Uppsala universitet, Institutionen för farmaceutisk biovetenskap, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-233445.

Full text
Abstract:
High attrition rates in the drug development pipeline have been recognized as a reason to shift gears towards new methodologies that allow earlier and correct decisions, and the optimal use of all information accrued throughout the process. The quantitative science of pharmacometrics, using pharmacokinetic-pharmacodynamic models, was identified as one of the core strategies of this renaissance. Coupled with Optimal Design (OD), they together constitute an attractive toolkit to usher new agents more rapidly and successfully to marketing approval. The general aim of this thesis was to investigate how the use of novel pharmacometric methodologies can improve the design and analysis of clinical trials within drug development. The implementation of a Monte-Carlo Mapped Power method made it possible to rapidly generate multiple hypotheses and to compute the corresponding sample size within 1% of the time usually necessary for more traditional model-based power assessment. By allowing statistical inference across all available data and the integration of mechanistic interpretation of the models, the performance of this new methodology in proof-of-concept and dose-finding trials highlighted the possibility of drastically reducing the number of healthy volunteers and patients exposed to experimental drugs. This thesis furthermore addressed the benefits of OD in planning trials with bioanalytical limits and toxicity constraints, through the development of novel optimality criteria that foremost pinpoint information and safety aspects. The use of these methodologies showed better estimation properties and robustness for the ensuing data analysis and reduced the number of patients exposed to severe toxicity by 7-fold. Finally, predictive tools for maximum tolerated dose selection in Phase I oncology trials were explored for a combination therapy characterized by a main dose-limiting hematological toxicity. In this example, Bayesian and model-based approaches provided the incentive for a paradigm change away from the traditional rule-based "3+3" design algorithm. Throughout this thesis, several examples have shown the possibility of streamlining clinical trials with more model-based design and analysis support. Ultimately, efficient use of the data can elevate the probability of a successful trial and strengthen ethical conduct.
APA, Harvard, Vancouver, ISO, and other styles
23

Huang, Guo-Tai, and 黃國泰. "A Study of Control Charts with Variable Sample Size." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/51993886110283599484.

Full text
Abstract:
Master's thesis, National Sun Yat-sen University, Department of Applied Mathematics, academic year 92. Shewhart X-bar control charts with estimated control limits are widely used in practice. When the sample size is not fixed, we propose seven statistics to estimate the standard deviation sigma. These estimators are applied to estimate the control limits of the Shewhart X-bar control chart. The estimation results obtained through simulation are presented and discussed. Finally, we investigate the performance of the Shewhart X-bar control charts based on the seven estimators of sigma via their simulated average run length (ARL).
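The seven estimators of sigma are not listed in the abstract. As a hedged illustration of the kind of study described, the sketch below uses one common choice for subgroups of unequal size, the pooled standard deviation (without bias correction), to set estimated limits and then approximates the in-control ARL by simulation; the subgroup sizes and limits are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def pooled_sigma(subgroups):
    """Pooled standard deviation from subgroups of unequal size.
    One common estimator; not necessarily one of the thesis's seven."""
    num = sum((len(g) - 1) * np.var(g, ddof=1) for g in subgroups)
    den = sum(len(g) - 1 for g in subgroups)
    return np.sqrt(num / den)

# Phase I: subgroups with varying sizes, used to estimate the limits.
sizes = rng.choice([3, 5, 7], size=30)
phase1 = [rng.normal(0.0, 1.0, size=n) for n in sizes]
sigma_hat = pooled_sigma(phase1)
mu_hat = np.mean(np.concatenate(phase1))

# Phase II: estimated 3-sigma limits; in-control ARL estimated by simulation.
def run_length(n=5):
    for t in range(1, 100_000):
        xbar = rng.normal(0.0, 1.0 / np.sqrt(n))
        if abs(xbar - mu_hat) > 3 * sigma_hat / np.sqrt(n):
            return t
    return 100_000

print("estimated sigma:", round(sigma_hat, 3))
print("simulated in-control ARL:", np.mean([run_length() for _ in range(2000)]))
```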
APA, Harvard, Vancouver, ISO, and other styles
24

Yang, Hau-Yu, and 楊濠宇. "The study of Variable Sample Size Cpm Control Chart." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/4ksa47.

Full text
APA, Harvard, Vancouver, ISO, and other styles
25

"Sample size determination for Poisson regression when the exposure variable contains misclassification errors." Tulane University, 1994.

Find full text
Abstract:
Sample size calculation methods for Poisson regression to detect a linear trend in the logarithm of incidence rates (multiplicative models) and in incidence rates (additive models) over ordered exposure groups are developed. These methods parallel those of Bull (1993) for logistic regression. Moreover, when reliable ancillary misclassification information is available, a slight modification of these calculation methods can be used to determine the required sample size based on a correction of the estimate of the trend parameter in the analysis stage, where the correction method is adapted from Reade-Christopher and Kupper (1991). We find that, as would be expected, the gradient of incidence rates over these groups and the misclassification rate strongly affect the sample size requirements. In a one-year study, when the exposure variable contains no misclassification, the required sample size varies from 5,054 to 64,534, according to different gradients. Moreover, when a misclassification rate of 30% is assumed, these numbers are multiplied by approximately 1.3 for all gradients. The distribution of subjects across exposure groups also affects the sample size requirements. In environmental and occupational studies, subjects may be grouped according to a continuous exposure, and the groups chosen are often terciles, quartiles or quintiles, i.e., an even distribution over the exposure groups. We find that a smaller sample size is required for this type of distribution. Finally, although the use of correction methods reduces the bias of the estimates, there is always greater variance in the estimate than when no correction is used. It appears that when the gradient of incidence rates is small and the misclassification is not severe, then, based on the percentage of the true parameter included in the 95% confidence interval, use of the correction method may not be necessary.
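A simulation-based version of the sample size calculation sketched in the abstract is shown below: Poisson counts are generated across ordered exposure groups under a log-linear trend, a Poisson GLM is fitted, and power is estimated as the rejection rate of the trend test. The baseline rate, log rate ratio and person-time values are hypothetical, and the code uses an ordinary Poisson GLM rather than the authors' misclassification correction.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)

def power_for_trend(person_time_per_group, base_rate=0.01, log_rate_ratio=0.2,
                    n_groups=4, n_sim=500, alpha=0.05):
    """Simulated power of a Poisson log-linear trend test across ordered exposure groups."""
    scores = np.arange(n_groups)                         # ordered exposure scores 0..k-1
    rates = base_rate * np.exp(log_rate_ratio * scores)  # multiplicative (log-linear) trend
    hits = 0
    for _ in range(n_sim):
        y = rng.poisson(rates * person_time_per_group)   # events per group
        X = sm.add_constant(scores.astype(float))
        offset = np.log(np.full(n_groups, person_time_per_group, dtype=float))
        fit = sm.GLM(y, X, family=sm.families.Poisson(), offset=offset).fit()
        if fit.pvalues[1] < alpha:                       # Wald test on the trend coefficient
            hits += 1
    return hits / n_sim

# Crude search for the smallest per-group person-time giving roughly 80% power.
for pt in (500, 1000, 2000, 4000, 8000):
    print(pt, power_for_trend(pt))
```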
APA, Harvard, Vancouver, ISO, and other styles
26

LAI, CHEN-YU, and 賴振鈺. "Comparison of the Variable Step-Size Maximum Power Point Tracking Methods for Solar Cells." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/29136618900403216579.

Full text
Abstract:
Master's thesis, National Taiwan University of Science and Technology, Department of Electrical Engineering, academic year 103. Conventional fixed step-size maximum power point tracking (MPPT) methods exhibit a trade-off between tracking speed and tracking accuracy. By contrast, variable step-size MPPT techniques can achieve satisfactory tracking accuracy and speed while maintaining the advantage of simple calculation. This thesis studies the performance of three different variable step-size Perturb and Observe MPPT techniques. First, a MATLAB-based photovoltaic generation system (PGS) model is developed and validated. Then, the variable step-size MPPT methods are compared in terms of various indices, such as steady-state power, rise time, settling time and steady-state tracking accuracy, under various operating conditions. In addition, a tuning procedure for the parameters of each method is proposed. Finally, the advantages and disadvantages of each method are summarized and relevant design suggestions are given.
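One commonly cited variable step-size rule scales the perturbation by |dP/dV|, so the step is large on the steep part of the P–V curve and shrinks near the maximum power point. The sketch below illustrates that idea on a toy P–V curve; the gains, limits and curve are placeholders, not the three techniques or the tuned parameters studied in the thesis.

```python
def variable_step_po(v, p, v_prev, p_prev, step_gain=0.05, step_max=0.5):
    """One iteration of a variable step-size perturb-and-observe rule.
    The perturbation is proportional to |dP/dV|; returns the next reference voltage."""
    dp, dv = p - p_prev, v - v_prev
    if dv == 0:
        return v + step_gain                   # avoid division by zero: small fixed step
    step = min(step_gain * abs(dp / dv), step_max)
    # Classic P&O direction logic: keep going if power increased, reverse otherwise.
    direction = 1.0 if (dp > 0) == (dv > 0) else -1.0
    return v + direction * step

def pv_power(v):
    """Toy P-V curve with a single maximum; illustrative only."""
    return max(0.0, v * (8.0 - 0.4 * v ** 2) if v < 4.5 else 0.0)

v_prev, v = 1.0, 1.2
p_prev = pv_power(v_prev)
for _ in range(30):
    p = pv_power(v)
    v_next = variable_step_po(v, p, v_prev, p_prev)
    v_prev, p_prev, v = v, p, v_next
print("approximate MPP voltage:", round(v, 3), "power:", round(pv_power(v), 3))
```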
APA, Harvard, Vancouver, ISO, and other styles
27

Daniyal and 單尼爾. "A guideline to determine the training sample size when applying data mining methods in clinical decision making." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/4g499k.

Full text
Abstract:
Master's thesis, National Central University, Department of Biomedical Sciences and Engineering, academic year 107. Background: Biomedicine is a field rich in heterogeneous, evolving, complex and unstructured data coming from autonomous sources (i.e. the heterogeneous, autonomous, complex and evolving (HACE) theorem). Acquisition of biomedical data takes time and human effort, and is usually very expensive, so it is difficult to work with populations and researchers work with samples instead. In recent years, two growing concerns have dominated the healthcare area: the use of small sample sizes in experiments and the extraction of useful information from massive medical data (big data). Researchers have argued that overfitting causes false positives (type I errors) or false negatives (type II errors) in small-sample studies in biomedicine, producing exaggerated results that do not represent a true effect. On the other hand, in the last few years the volume of data has grown and become more complicated, due to the continuous generation of data from sources such as functional magnetic resonance imaging (fMRI), computed tomography (CT) scans, positron emission tomography (PET)/single-photon emission computed tomography (SPECT) and electroencephalography (EEG). Big data mining has become a fascinating and fast-growing area that enables the selection, exploration and modelling of vast amounts of medical data to support clinical decision making, prevent medication errors, and improve patient outcomes. However, big data presents challenges, such as missing values, the heterogeneous nature of the data and the complexity of managing it, that may affect the outcome. It is therefore essential to find an appropriate process and algorithm for big data mining to extract useful information from massive data. To date, however, there is no guideline for this, especially regarding a fair sample size that contains enough information for reliable results. Purpose: The goal of this study is to explore the relationship among sample size, statistical parameters and the performance of machine learning (ML) methods in order to determine an optimal sample size. The study also examines the impact of standard deviations on sample sizes by analyzing the performance of machine learning methods. Method: Two kinds of data were used: experimental data and simulated data. The experimental data comprise two datasets: the first contains brain signals from 63 stroke patients (continuous data), and the second consists of 120 sleep diaries (discrete categorical data), each diary recording one person's data. To find an optimal sample size, each experimental dataset was first divided into multiple sample sizes by taking 10% proportions of the dataset. These sample sizes were then used with four widely used machine learning methods: support vector machine (SVM), decision tree, naive Bayes, and logistic regression. Ten-fold cross-validation was used to evaluate classification accuracy. The grand variance, eigenvalue and proportion among the samples of each sample size were also measured. In addition, an artificial dataset was generated by averaging the real data so that it mimicked the real data; it was used to examine the effect of the standard deviation on classifier accuracy when sample sizes were systematically increased from small to large.
Lastly, the classifiers' results for both experimental datasets were plotted on a receiver operating characteristic (ROC) graph to find an appropriate sample size and to assess the influence of sample size, from small to large, on classifier performance. Results: The results showed a significant effect of sample size on classifier accuracy, data variances, eigenvalues and proportion in all datasets. The stroke and sleep datasets showed an intrinsic property in the performance of the ML classifiers, the data variances (parameter-wise and subject-wise), the eigenvalues and the proportion of variance. This intrinsic property was used to design two criteria for deciding an appropriate sample size. According to criterion I, a sample size is considered optimal when the performance of the classifiers reaches its intrinsic behaviour simultaneously with the data variation. Criterion II uses performance, eigenvalue and proportion: when these factors show a simultaneous intrinsic property at a specific sample size, that sample size is considered effective. In this study, both criteria suggested a similar optimal sample size of 250 for the sleep dataset, although the eigenvalues showed slight variation relative to the variance between sample sizes of 250 and 500; this variation decreased after 500 samples, so criterion II suggested 500 as an effective sample size. It should be noted that if criteria I and II recommend different sample sizes, the sample size that first achieves the simultaneous intrinsic property, either between performance and variance or among performance, eigenvalue and proportion, should be chosen. Lastly, a third criterion based on the receiver operating characteristic curve was designed. The ROC graph shows that the classifiers perform well at large sample sizes, which lie above the diagonal line, whereas small sample sizes perform worse and lie below the diagonal line; classifier performance improves as the sample size increases. Conclusion: All the results indicate that sample size has a dramatic impact on the performance of ML methods and on data variance. Increasing the sample size yields a steady outcome from the machine learning methods once the data variation shows negligible fluctuation. In addition, the intrinsic property of the sample size helps to identify an optimal sample size at the point where accuracy, eigenvalue, proportion and variance become independent of further increases in sample size.
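The core experiment, evaluating 10-fold cross-validated accuracy of SVM, decision tree, naive Bayes and logistic regression on nested subsets of increasing size, can be sketched as follows. Synthetic data stands in for the stroke and sleep datasets, which are not publicly available, so the numbers only illustrate the shape of a learning curve, not the thesis's results.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the clinical data.
X, y = make_classification(n_samples=1200, n_features=20, n_informative=8, random_state=0)

classifiers = {
    "SVM": SVC(),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
    "Logistic regression": LogisticRegression(max_iter=1000),
}

# Evaluate 10-fold cross-validated accuracy on nested subsets of increasing size.
rng = np.random.default_rng(0)
order = rng.permutation(len(y))
for frac in (0.1, 0.2, 0.3, 0.5, 0.8, 1.0):
    idx = order[: int(frac * len(y))]
    scores = {name: cross_val_score(clf, X[idx], y[idx], cv=10).mean()
              for name, clf in classifiers.items()}
    print(f"n = {len(idx):4d}:", {k: round(v, 3) for k, v in scores.items()})
```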
APA, Harvard, Vancouver, ISO, and other styles
28

Huang, Ching-Ting, and 黃靖婷. "A Study on the Subspace LDA Methods for Solving the Small Sample Size Problem in Face Recognition." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/37350366836829966832.

Full text
Abstract:
Master's thesis, National Tsing Hua University, Department of Computer Science, academic year 102. In face recognition, LDA often encounters the so-called "small sample size" (SSS) problem, also known as the "curse of dimensionality". This problem occurs when the dimensionality of the data is very large compared to the number of available training images. One approach for handling this situation is subspace LDA, a two-stage framework: a PCA-based method is first used for dimensionality reduction, and then an LDA-based method is applied for classification. In this thesis, we investigate four popular subspace LDA methods, "Fisherface", "complete PCA plus LDA", "IDAface" and "BDPCA plus LDA", and compare their effectiveness in handling the SSS problem in face recognition. Extensive experiments have been performed on three publicly available face databases: the JAFFE, ORL and FEI databases. Experimental results show that, among the subspace LDA methods under investigation, the BDPCA plus LDA method performs best for solving the SSS problem in face recognition.
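A minimal sketch of the two-stage subspace LDA idea (PCA for dimensionality reduction, then LDA for classification) is shown below, in the spirit of the Fisherface recipe of keeping at most N − c principal components so that the within-class scatter matrix stays nonsingular. The Olivetti faces dataset is used as a stand-in for the JAFFE, ORL and FEI databases, and the split and component counts are arbitrary choices rather than the thesis's settings.

```python
from sklearn.datasets import fetch_olivetti_faces
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

# Olivetti faces: 400 images, 40 subjects, downloaded on first use.
faces = fetch_olivetti_faces()
X_train, X_test, y_train, y_test = train_test_split(
    faces.data, faces.target, test_size=0.5, stratify=faces.target, random_state=0)

# PCA keeps at most N - c components so the within-class scatter is nonsingular,
# then LDA classifies in that reduced subspace.
n_classes = len(set(y_train))
n_components = min(X_train.shape[0] - n_classes, 100)
model = make_pipeline(PCA(n_components=n_components),
                      LinearDiscriminantAnalysis())
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```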
APA, Harvard, Vancouver, ISO, and other styles
29

"Sample size approximation for estimating the arithmetic mean of a lognormally distributed random variable with and without Type I censoring." Tulane University, 1995.

Find full text
Abstract:
This research presents the approximate sample size needed to estimate the true arithmetic mean of a lognormally distributed random variable to within a specified accuracy (a 100$\pi$ percent difference from the true arithmetic mean), with geometric standard deviations (GSDs) up to 4.0 and with a specified level of confidence. Exact minimum required sample sizes are based on the confidence interval width of Land's exact interval. For the non-censored case, sample size tables and nomograms are presented. Box-Cox transformations were used to derive formulae for approximating these exact sample sizes. In addition, new formulae adjusting the classical central limit approach were derived. Each of these formulae, as well as other existing formulae (the classical central limit approach and Hewett's (1995) formula), was compared to the exact sample size to determine under which conditions it performs optimally. These comparisons lead to the following recommendations for the 95% confidence level. The Box-Cox transformation formula is recommended for GSD = 1.5 and 100$\pi$ > 20% levels; for GSDs of 2, 2.5 and 3 and 100$\pi$ > 20% levels; for GSD = 3 and 100$\pi$ > 30% levels; and for GSD = 4 and 100$\pi$ > 40% levels. The adjusted classical formula is recommended for GSD = 1.1 and all 100$\pi$ levels; for GSD = 1.5 and 100$\pi$ > 25% levels; and for GSDs of 2 and 2.5 and 100$\pi$ ≤ 20% levels. The classical formula is recommended for GSD = 3 and 100$\pi$ ≤ 20% levels; for GSD = 3.5 and 100$\pi$ ≤ 30% levels; and for GSD = 4 and 100$\pi$ ≤ 40% levels. Exact sample size requirements are presented for samples in which 10%, 20% and 50% of the observations are Type I left-censored. These sample sizes are based on the width of Land's exact confidence interval for the bias-corrected maximum likelihood estimates. The accuracy of the confidence intervals based on these proposed sample sizes is evaluated using Monte Carlo simulations for both the non-censored and censored cases. These simulations showed conservative results with regard to the targeted significance levels.
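Of the formulae compared in the abstract, only the classical central limit approximation can be sketched without the thesis itself: for a lognormal variable the coefficient of variation is sqrt(exp(sigma^2) − 1) with sigma = ln(GSD), and the usual relative-precision formula gives n ≈ (z · CV / π)², where π is the allowed relative difference (the abstract's 100π percent). The sketch below implements only that approximation; it does not reproduce Land's exact intervals, the Box-Cox-based formulae or the censored-data results.

```python
import numpy as np
from scipy.stats import norm

def classical_lognormal_n(gsd, rel_error, conf=0.95):
    """Classical central-limit sample-size approximation for estimating the arithmetic
    mean of a lognormal variable to within 100*rel_error percent of its true value."""
    z = norm.ppf(1 - (1 - conf) / 2)
    sigma = np.log(gsd)                         # sigma of the underlying normal
    cv = np.sqrt(np.exp(sigma ** 2) - 1.0)      # coefficient of variation of the lognormal
    return int(np.ceil((z * cv / rel_error) ** 2))

for gsd in (1.5, 2.0, 3.0, 4.0):
    print(f"GSD = {gsd}:", [classical_lognormal_n(gsd, e) for e in (0.2, 0.3, 0.5)])
```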
APA, Harvard, Vancouver, ISO, and other styles
