
Dissertations / Theses on the topic 'Sample size'

Consult the top 50 dissertations / theses for your research on the topic 'Sample size.'

1. Salar, Kemal. "Sample size for correlation estimates." Thesis, Monterey, California. Naval Postgraduate School, 1989. http://hdl.handle.net/10945/27248.
2. Denne, Jonathan S. "Sequential procedures for sample size estimation." Thesis, University of Bath, 1996. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.320460.
3. Jinks, R. C. "Sample size for multivariable prognostic models." Thesis, University College London (University of London), 2012. http://discovery.ucl.ac.uk/1354112/.
Abstract:
Prognosis is one of the central principles of medical practice; useful prognostic models are vital if clinicians wish to predict patient outcomes with any success. However, prognostic studies are often performed retrospectively, which can result in poorly validated models that do not become valuable clinical tools. One obstacle to planning prospective studies is the lack of sample size calculations for developing or validating multivariable models. The often used 5 or 10 events per variable (EPV) rule (Peduzzi and Concato, 1995) can result in small sample sizes which may lead to overfitting and optimism. This thesis investigates the issue of sample size in prognostic modelling, and develops calculations and recommendations which may improve prognostic study design. In order to develop multivariable prediction models, their prognostic value must be measurable and comparable. This thesis focuses on time-to-event data analysed with the Cox proportional hazards model, for which there are many proposed measures of prognostic ability. A measure of discrimination, the D statistic (Royston and Sauerbrei, 2004), is chosen for use in this work, as it has an appealing interpretation and direct relationship with a measure of explained variation. Real datasets are used to investigate how estimates of D vary with number of events. Seeking a better alternative to EPV rules, two sample size calculations are developed and tested for use where a target value of D is estimated: one based on significance testing and one on confidence interval width. The calculations are illustrated using real datasets; in general the sample sizes required are quite large. Finally, the usability of the new calculations is considered. To use the sample size calculations, researchers must estimate a target value of D, but this can be difficult if no previous study is available. To aid this, published D values from prognostic studies are collated into a ‘library’, which could be used to obtain plausible values of D to use in the calculations. To expand the library further an empirical conversion is developed to transform values of the more widely-used C-index (Harrell et al., 1984) to D.
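A rough illustration of the confidence-interval-width idea mentioned in this abstract: the sketch below sizes the number of events so that a 95% interval for a normally distributed estimate reaches a target width. The variance-per-event figure and target width are hypothetical, and the formula is the generic normal-approximation one, not the thesis's D-statistic calculation.

```python
# Minimal sketch (not the thesis's method): events needed so that a 95% CI for a
# normally distributed estimate has a prescribed total width, assuming the estimator's
# variance scales as var_per_event / events.
from math import ceil
from scipy.stats import norm

def events_for_ci_width(var_per_event, width, conf=0.95):
    z = norm.ppf(1 - (1 - conf) / 2)
    # width = 2 * z * sqrt(var_per_event / events)  =>  events = var_per_event * (2 * z / width)^2
    return ceil(var_per_event * (2 * z / width) ** 2)

# Hypothetical pilot variance per event and target CI width for the prognostic measure.
print(events_for_ci_width(var_per_event=2.5, width=0.4))
```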
4. Callan, Peggy Ann. "Developmental sentence scoring sample size comparison." PDXScholar, 1990. https://pdxscholar.library.pdx.edu/open_access_etds/4170.
Abstract:
In 1971, Lee and Canter developed a systematic tool for assessing children's expressive language: Developmental Sentence Scoring (DSS). It provides normative data against which a child's delayed or disordered language development can be compared with the normal language of children the same age. A specific scoring system is used to analyze children's use of standard English grammatical rules from a tape-recorded sample of their spontaneous speech during conversation with a clinician. The corpus of sentences for the DSS is obtained from a sample of 50 complete, different, consecutive, intelligible, non-echolalic sentences elicited from a child in conversation with an adult using stimulus materials in which the child is interested. There is limited research on the reliability of language samples smaller and larger than 50 utterances for DSS analysis. The purpose of this study was to determine if there is a significant difference among the scores obtained from language samples of 25, 50, and 75 utterances when using the DSS procedure for children aged 6.0 to 6.6 years. Twelve children, selected on the basis of chronological age, normal receptive vocabulary skills, normal hearing, and a monolingual background, were chosen as subjects.
5. Cámara, Hagen Luis Tomás. "A consensus based Bayesian sample size criterion." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2000. http://www.collectionscanada.ca/obj/s4/f2/dsk2/ftp03/MQ64329.pdf.
6. Ahn, Jeongyoun Marron James Stephen. "High dimension, low sample size data analysis." Chapel Hill, N.C.: University of North Carolina at Chapel Hill, 2006. http://dc.lib.unc.edu/u?/etd,375.
Abstract:
Thesis (Ph. D.)--University of North Carolina at Chapel Hill, 2006. Title from electronic title page (viewed Oct. 10, 2007). "... in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Statistics and Operations Research." Discipline: Statistics and Operations Research; Department/School: Statistics and Operations Research.
7. Krklec Jerinkić, Nataša. "Line search methods with variable sample size." PhD thesis, Univerzitet u Novom Sadu, Prirodno-matematički fakultet u Novom Sadu, 2014. http://dx.doi.org/10.2298/NS20140117KRKLEC.
Abstract:
The problem under consideration is an unconstrained optimization problem with the objective function in the form of a mathematical expectation. The expectation is with respect to the random variable that represents the uncertainty. Therefore, the objective function is in fact deterministic. However, finding the analytical form of that objective function can be very difficult or even impossible. This is the reason why the sample average approximation is often used. In order to obtain a reasonably good approximation of the objective function, we have to use a relatively large sample size. We assume that the sample is generated at the beginning of the optimization process and therefore we can consider this sample average objective function as a deterministic one. However, applying some deterministic method on that sample average function from the start can be very costly. The number of evaluations of the function under expectation is a common way of measuring the cost of an algorithm. Therefore, methods that vary the sample size throughout the optimization process have been developed. Most of them try to determine the optimal dynamics of increasing the sample size. The main goal of this thesis is to develop a class of methods that can decrease the cost of an algorithm by decreasing the number of function evaluations. The idea is to decrease the sample size whenever it seems to be reasonable - roughly speaking, we do not want to impose a large precision, i.e. a large sample size, when we are far away from the solution we search for. The detailed description of the new methods is presented in Chapter 4 together with the convergence analysis. It is shown that the approximate solution is of the same quality as the one obtained by dealing with the full sample from the start. Another important characteristic of the methods proposed here is the line search technique which is used for obtaining the subsequent iterates. The idea is to find a suitable direction and to search along it until we obtain a sufficient decrease in the function value. The sufficient decrease is determined through the line search rule. In Chapter 4, that rule is supposed to be monotone, i.e. we are imposing strict decrease of the function value. In order to decrease the cost of the algorithm even more and to enlarge the set of suitable search directions, we use nonmonotone line search rules in Chapter 5. Within that chapter, these rules are modified to fit the variable sample size framework. Moreover, the conditions for the global convergence and the R-linear rate are presented. In Chapter 6, numerical results are presented. The test problems are various - some of them are academic and some of them are real world problems. The academic problems are here to give us more insight into the behavior of the algorithms. On the other hand, data that come from the real world problems are here to test the real applicability of the proposed algorithms. In the first part of that chapter, the focus is on the variable sample size techniques. Different implementations of the proposed algorithm are compared to each other and to other sample schemes as well. The second part is mostly devoted to the comparison of the various line search rules combined with different search directions in the variable sample size framework. The overall numerical results show that using the variable sample size can improve the performance of the algorithms significantly, especially when the nonmonotone line search rules are used. The first chapter of this thesis provides the background material for the subsequent chapters. In Chapter 2, basics of nonlinear optimization are presented and the focus is on the line search, while Chapter 3 deals with the stochastic framework. These chapters provide a review of the relevant known results, while the rest of the thesis represents the original contribution.
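A toy sketch of the general idea described above (a sample-average objective, a monotone Armijo line search, and a sample size that is adjusted during the run); it is not one of the thesis's algorithms, and the data, tolerances and growth rule are invented.

```python
# Toy sketch (not the thesis's methods): steepest descent on a sample-average objective
# with a monotone Armijo line search, growing the subsample only when the gradient on
# the current subsample is nearly zero.
import numpy as np

rng = np.random.default_rng(0)
full_sample = rng.normal(loc=3.0, scale=2.0, size=10_000)  # synthetic data behind E[f(x, xi)]

def f_and_grad(x, sample):
    # f(x) = mean of 0.5 * (x - xi)^2 over the subsample; the gradient is mean of (x - xi)
    r = x - sample
    return 0.5 * np.mean(r ** 2), np.mean(r)

def armijo_step(x, sample, g, c1=1e-4, beta=0.5):
    # backtrack along the negative gradient until the sufficient decrease condition holds
    f0, _ = f_and_grad(x, sample)
    t = 1.0
    while f_and_grad(x - t * g, sample)[0] > f0 - c1 * t * g * g:
        t *= beta
    return t

x, n = 0.0, 100  # start far from the solution with a small subsample
for _ in range(200):
    sample = full_sample[:n]
    _, g = f_and_grad(x, sample)
    if abs(g) < 1e-3:
        if n == len(full_sample):
            break
        n = min(2 * n, len(full_sample))  # tighten precision only when it seems needed
        continue
    x -= armijo_step(x, sample, g) * g

print(f"x ~ {x:.3f} (full-sample mean {full_sample.mean():.3f}), final subsample size {n}")
```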
8. Serra Puertas, Jorge. "Shrinkage corrections of sample linear estimators in the small sample size regime." Doctoral thesis, Universitat Politècnica de Catalunya, 2016. http://hdl.handle.net/10803/404386.
Abstract:
We are living in a data deluge era where the dimensionality of the data gathered by inexpensive sensors is growing at a fast pace, whereas the availability of independent samples of the observed data is limited. Thus, classical statistical inference methods relying on the assumption that the sample size is large, compared to the observation dimension, are suffering a severe performance degradation. Within this context, this thesis focus on a popular problem in signal processing, the estimation of a parameter, observed through a linear model. This inference is commonly based on a linear filtering of the data. For instance, beamforming in array signal processing, where a spatial filter steers the beampattern of the antenna array towards a direction to obtain the signal of interest (SOI). In signal processing the design of the optimal filters relies on the optimization of performance measures such as the Mean Square Error (MSE) and the Signal to Interference plus Noise Ratio (SINR). When the first two moments of the SOI are known, the optimization of the MSE leads to the Linear Minimum Mean Square Error (LMMSE). When such statistical information is not available one may force a no distortion constraint towards the SOI in the optimization of the MSE, which is equivalent to maximize the SINR. This leads to the Minimum Variance Distortionless Response (MVDR) method. The LMMSE and MVDR are optimal, though unrealizable in general, since they depend on the inverse of the data correlation, which is not known. The common approach to circumvent this problem is to substitute it for the inverse of the sample correlation matrix (SCM), leading to the sample LMMSE and sample MVDR. This approach is optimal when the number of available statistical samples tends to infinity for a fixed observation dimension. This large sample size scenario hardly holds in practice and the sample methods undergo large performance degradations in the small sample size regime, which may be due to short stationarity constraints or to a system with a high observation dimension. The aim of this thesis is to propose corrections of sample estimators, such as the sample LMMSE and MVDR, to circumvent their performance degradation in the small sample size regime. To this end, two powerful tools are used, shrinkage estimation and random matrix theory (RMT). Shrinkage estimation introduces a structure on the filters that forces some corrections in small sample size situations. They improve sample based estimators by optimizing a bias variance tradeoff. As direct optimization of these shrinkage methods leads to unrealizable estimators, then a consistent estimate of these optimal shrinkage estimators is obtained, within the general asymptotics where both the observation dimension and the sample size tend to infinity, but at a fixed rate. That is, RMT is used to obtain consistent estimates within an asymptotic regime that deals naturally with the small sample size. This RMT approach does not require any assumptions about the distribution of the observations. The proposed filters deal directly with the estimation of the SOI, which leads to performance gains compared to related work methods based on optimizing a metric related to the data covariance estimate or proposing rather ad-hoc regularizations of the SCM. Compared to related work methods which also treat directly the estimation of the SOI and which are based on a shrinkage of the SCM, the proposed filter structure is more general. 
It contemplates corrections of the inverse of the SCM and considers the related work methods as particular cases. This leads to performance gains which are notable when there is a mismatch in the signature vector of the SOI. This mismatch and the small sample size are the main sources of degradation of the sample LMMSE and MVDR. Thus, in the last part of this thesis, unlike the previously proposed filters and the related work, we propose a filter which treats directly both sources of degradation.
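For orientation only, the sketch below contrasts a sample MVDR beamformer with a simple fixed diagonal-loading ("shrinkage") correction of the SCM in a small sample size setting. The scenario (array size, directions, powers, loading factor) is invented, and the correction shown is the elementary one, not the RMT-based estimators developed in the thesis.

```python
# Minimal numpy sketch (not the thesis's estimators): sample MVDR versus a fixed
# diagonally loaded variant when the number of snapshots is close to the dimension.
import numpy as np

rng = np.random.default_rng(1)
M, N = 20, 30                      # observation dimension, number of snapshots
theta_soi, theta_int = 0.0, 0.35   # SOI and interferer (in sin(angle))

def steering(theta, m):
    return np.exp(1j * np.pi * np.arange(m) * theta) / np.sqrt(m)

a_s, a_i = steering(theta_soi, M), steering(theta_int, M)
s = rng.normal(size=N) + 1j * rng.normal(size=N)             # SOI waveform
i = 3.0 * (rng.normal(size=N) + 1j * rng.normal(size=N))     # strong interferer
noise = 0.5 * (rng.normal(size=(M, N)) + 1j * rng.normal(size=(M, N)))
X = np.outer(a_s, s) + np.outer(a_i, i) + noise

R_hat = X @ X.conj().T / N                                   # sample correlation matrix (SCM)

def mvdr(R, a):
    w = np.linalg.solve(R, a)
    return w / (a.conj() @ w)                                # distortionless towards a

w_sample = mvdr(R_hat, a_s)
loading = 0.1 * np.trace(R_hat).real / M                     # hand-picked loading level
w_loaded = mvdr(R_hat + loading * np.eye(M), a_s)

for name, w in [("sample MVDR", w_sample), ("diagonally loaded", w_loaded)]:
    print(name, "output power:", round(float(np.mean(np.abs(w.conj() @ X) ** 2)), 3))
```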
9. Banton, Dwaine Stephen. "A Bayesian Decision Theoretic Approach to Fixed Sample Size Determination and Blinded Sample Size Re-estimation for Hypothesis Testing." Diss., Temple University Libraries, 2016. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/369007.
Abstract:
This thesis considers two related problems that have application in the field of experimental design for clinical trials: fixed sample size determination for parallel arm, double-blind survival data analysis to test the hypothesis of no difference in survival functions, and blinded sample size re-estimation for the same. For the first problem of fixed sample size determination, a method is developed generally for testing of hypotheses, then applied particularly to survival analysis; for the second problem of blinded sample size re-estimation, a method is developed specifically for survival analysis. In both problems, the exponential survival model is assumed. The approach we propose for sample size determination is Bayesian decision theoretical, using explicitly a loss function and a prior distribution. The loss function used is the intrinsic discrepancy loss function introduced by Bernardo and Rueda (2002), and further expounded upon in Bernardo (2011). We use a conjugate prior, and investigate the sensitivity of the calculated sample sizes to specification of the hyper-parameters. For the second problem of blinded sample size re-estimation, we use prior predictive distributions to facilitate calculation of the interim test statistic in a blinded manner while controlling the Type I error. The determination of the test statistic in a blinded manner continues to be a nettling problem for researchers. The first problem is typical of traditional experimental designs, while the second problem extends into the realm of adaptive designs. To the best of our knowledge, the approaches we suggest for both problems have never been attempted hitherto, and extend the current research on both topics. The advantages of our approach, as far as we see it, are unity and coherence of statistical procedures, systematic and methodical incorporation of prior knowledge, and ease of calculation and interpretation.
10. Timberlake, Allison M. "Sample Size in Ordinal Logistic Hierarchical Linear Modeling." Digital Archive @ GSU, 2011. http://digitalarchive.gsu.edu/eps_diss/72.
Abstract:
Most quantitative research is conducted by randomly selecting members of a population on which to conduct a study. When statistics are run on a sample, and not the entire population of interest, they are subject to a certain amount of error. Many factors can impact the amount of error, or bias, in statistical estimates. One important factor is sample size; larger samples are more likely to minimize bias than smaller samples. Therefore, determining the necessary sample size to obtain accurate statistical estimates is a critical component of designing a quantitative study. Much research has been conducted on the impact of sample size on simple statistical techniques such as group mean comparisons and ordinary least squares regression. Less sample size research, however, has been conducted on complex techniques such as hierarchical linear modeling (HLM). HLM, also known as multilevel modeling, is used to explain and predict an outcome based on knowledge of other variables in nested populations. Ordinal logistic HLM (OLHLM) is used when the outcome variable has three or more ordered categories. While there is a growing body of research on sample size for two-level HLM utilizing a continuous outcome, there is no existing research exploring sample size for OLHLM. The purpose of this study was to determine the impact of sample size on statistical estimates for ordinal logistic hierarchical linear modeling. A Monte Carlo simulation study was used to investigate this research query. Four variables were manipulated: level-one sample size, level-two sample size, sample outcome category allocation, and predictor-criterion correlation. Statistical estimates explored include bias in level-one and level-two parameters, power, and prediction accuracy. Results indicate that, in general, holding other conditions constant, bias decreases as level-one sample size increases. However, bias increases or remains unchanged as level-two sample size increases, holding other conditions constant. Power to detect the independent variable coefficients increased as both level-one and level-two sample size increased, holding other conditions constant. Overall, prediction accuracy is extremely poor. The overall prediction accuracy rate across conditions was 47.7%, with little variance across conditions. Furthermore, there is a strong tendency to over-predict the middle outcome category.
11. Medeiros, José António Amaro Correia. "Optimal sample size for assessing bacterioneuston structural diversity." Master's thesis, Universidade de Aveiro, 2011. http://hdl.handle.net/10773/10901.
Abstract:
The surface microlayer (SML) is located at the interface atmosphere-hydrosphere and is theoretically defined as the top millimeter of the water column. However, the SML is operationally defined according to the sampling method used, and its thickness varies with weather conditions and organic matter content, among other factors. The SML is a very dynamic compartment of the water column involved in the process of transport of materials between the hydrosphere and the atmosphere. Bacterial communities inhabiting the SML (bacterioneuston) are expected to be adapted to the particular SML environment, which is characterized by physical and chemical stress associated with surface tension, high exposure to solar radiation and accumulation of hydrophobic compounds, some of which are pollutants. However, the small volumes of SML water obtained with the different sampling methods reported in the literature make the sampling procedure laborious and time-consuming. Sample size becomes even more critical when microcosm experiments are designed. The objective of this work was to determine the smallest sample size that could be used to assess bacterioneuston diversity by culture-independent methods without compromising representativeness and therefore ecological significance. For that, two extraction methods were tested on samples of 0.5 mL, 5 mL and 10 mL of natural SML obtained at the estuarine system Ria de Aveiro. After DNA extraction, community structure was assessed by DGGE profiling of rRNA gene sequences. The CTAB-extraction procedure was selected as the most efficient extraction method and was later used with larger samples (1 mL, 20 mL and 50 mL). The DNA obtained was once more analyzed by DGGE and the results showed that the estimated diversity of the communities does not increase proportionally with increasing sample size and that a good estimate of the structural diversity of bacterioneuston communities can be obtained with very small samples.
12. You, Zhiying. "Power and sample size of cluster randomized trials." Thesis, Birmingham, Ala.: University of Alabama at Birmingham, 2008. https://www.mhsl.uab.edu/dt/2009r/you.pdf.
13. Kang, Qing. "Nonparametric tests of median for a size-biased sample." 2005. http://wwwlib.umi.com/cr/ksu/main.
14. Suen, Wai-sing Alan. "Sample size planning for clinical trials with repeated measurements." E-thesis via HKUTO, 2004. http://sunzi.lib.hku.hk/hkuto/record/B31972172.
15. Tse, Kwok Ho. "Sample size calculation: influence of confounding and interaction effects." 2006. http://library.ust.hk/cgi/db/thesis.pl?MATH%202006%20TSE.
16. Suen, Wai-sing Alan (孫偉盛). "Sample size planning for clinical trials with repeated measurements." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2004. http://hub.hku.hk/bib/B31972172.
17. Islam, A. F. M. Saiful. "Loss functions, utility functions and Bayesian sample size determination." Thesis, Queen Mary, University of London, 2011. http://qmro.qmul.ac.uk/xmlui/handle/123456789/1259.
Abstract:
This thesis consists of two parts. The purpose of the first part of the research is to obtain Bayesian sample size determination (SSD) using a loss or utility function together with a linear cost function. A number of researchers have studied the Bayesian SSD problem. One group has considered utility (loss) functions and cost functions in the SSD problem and others have not. Among the former, most of the SSD problems are based on a symmetric squared error (SE) loss function. On the other hand, in a situation where underestimation is more serious than overestimation, or vice versa, an asymmetric loss function should be used. For such a loss function, how many observations do we need to take to estimate the parameter under study? We consider different types of asymmetric loss functions and a linear cost function for sample size determination. For the purposes of comparison, we first discuss the SSD for a symmetric squared error loss function. Then we consider the SSD under different types of asymmetric loss functions found in the literature. We also introduce a new bounded asymmetric loss function and obtain the SSD under this loss function. In addition, to estimate a parameter following a particular model, we present some theoretical results for the optimum SSD problem under a particular choice of loss function. We also develop computer programs to obtain the optimum SSD where analytic results are not possible. In the two parameter exponential family it is difficult to estimate the parameters when both are unknown. The aim of the second part is to obtain an optimum decision for the two parameter exponential family under the two parameter conjugate utility function. In this case we discuss Lindley's (1976) optimum decision for the one parameter exponential family under the conjugate utility function and then extend the results to the two parameter exponential family. We propose a two parameter conjugate utility function and then lay out the approximation procedure to make decisions on the two parameters. We also offer a few examples (normal, trinomial and inverse Gaussian distributions) and provide the optimum decisions on both parameters of these distributions under the two parameter conjugate utility function.
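The baseline case this abstract starts from (a symmetric squared error loss plus a linear sampling cost) can be sketched in a few lines for a normal mean with known variance and a conjugate normal prior; the variances and unit cost below are invented, and the asymmetric-loss results of the thesis are not reproduced here.

```python
# Minimal sketch (illustration only): Bayesian SSD for a normal mean with known sampling
# variance under squared error loss and a linear cost, minimizing
#   posterior variance + c * n  =  1 / (1/tau2 + n/sigma2) + c * n.
import numpy as np

sigma2 = 4.0    # known sampling variance
tau2 = 9.0      # prior variance of the mean
c = 0.0005      # cost per observation, in the same units as the squared error loss

def total_cost(n):
    posterior_var = 1.0 / (1.0 / tau2 + n / sigma2)  # Bayes risk under SE loss
    return posterior_var + c * n

n_grid = np.arange(1, 2001)
n_opt = n_grid[np.argmin([total_cost(n) for n in n_grid])]
print("optimal n on the grid:", n_opt)

# Closed form from setting the derivative to zero: n* = sqrt(sigma2 / c) - sigma2 / tau2
print("continuous optimum:", round(np.sqrt(sigma2 / c) - sigma2 / tau2, 1))
```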
18. Cunningham, Tina. "Power and Sample Size for Three-Level Cluster Designs." VCU Scholars Compass, 2010. http://scholarscompass.vcu.edu/etd/148.
Abstract:
Over the past few decades, Cluster Randomized Trials (CRT) have become a design of choice in many research areas. One of the most critical issues in planning a CRT is to ensure that the study design is sensitive enough to capture the intervention effect. The assessment of power and sample size in such studies is often faced with many challenges due to several methodological difficulties. While studies on power and sample size for cluster designs with one and two levels are abundant, the evaluation of required sample size for three-level designs has been generally overlooked. First, the nesting effect introduces more than one intracluster correlation into the model. Second, the variance structure of the estimated treatment difference is more complicated. Third, sample size results required for several levels are needed. In this work, we developed sample size and power formulas for the three-level data structures based on the generalized linear mixed model approach. We derived explicit and general power and sample size equations for detecting a hypothesized effect on continuous Gaussian outcomes and binary outcomes. To confirm the accuracy of the formulas, we conducted several simulation studies and compared the results. To establish a connection between the theoretical formulas and their applications, we developed a SAS user-interface macro that allowed the researchers to estimate sample size for a three-level design for different scenarios. These scenarios depend on which randomization level is assigned and whether or not there is an interaction effect.
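As a point of reference for the cluster-design theme above, the sketch below applies the standard two-level design effect 1 + (m - 1) * ICC to an ordinary two-sample comparison of means; the effect size, cluster size and ICC are made up, and the thesis's three-level GLMM formulas are not reproduced here.

```python
# Simplified illustration (two levels only, not the thesis's three-level formulas):
# inflate the individually randomized sample size by the design effect and convert
# it into a number of clusters per arm.
from math import ceil
from scipy.stats import norm

def n_per_arm_individual(delta, sd, alpha=0.05, power=0.8):
    z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)
    return (z_a + z_b) ** 2 * 2 * sd ** 2 / delta ** 2

def clusters_per_arm(delta, sd, m, icc, alpha=0.05, power=0.8):
    deff = 1 + (m - 1) * icc                # design effect for clusters of size m
    return ceil(n_per_arm_individual(delta, sd, alpha, power) * deff / m)

# Hypothetical inputs: detect a 0.3 SD difference with clusters of 20 and ICC 0.05.
print(clusters_per_arm(delta=0.3, sd=1.0, m=20, icc=0.05))
```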
19. Gibbons, Christopher. "Determination of power and sample size for Levene's test." 2007. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:1447667.
20. McGrath, Neill. "Effective sample size in order statistics of correlated data." [Boise, Idaho]: Boise State University, 2009. http://scholarworks.boisestate.edu/td/32/.
21. Chang, Yu-Wei. "Sample Size Determination for a Three-arm Biosimilar Trial." Diss., Temple University Libraries, 2014. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/298932.
Abstract:
The equivalence assessment usually consists of three tests and is often conducted through a three-arm clinical trial. The first two tests are to demonstrate the superiority of the test treatment and the reference treatment to placebo, and they are followed by the equivalence test between the test treatment and the reference treatment. The equivalence is commonly defined in terms of mean difference, mean ratio or ratio of mean differences, i.e. the ratio of the mean difference of the test and placebo to the mean difference of the reference and placebo. In this dissertation, the equivalence assessment for both continuous data and discrete data is discussed. For the continuous case, the test of the ratio of mean differences is applied. The advantage of this test is that it combines a superiority test of the test treatment over the placebo and an equivalence test through one hypothesis. For the discrete case, the two-step equivalence assessment approach is studied for both Poisson and negative binomial data. While a Poisson distribution implies that the population mean and variance are the same, the advantage of applying a negative binomial model is that it accounts for overdispersion, which is a common phenomenon with count medical endpoints. The test statistics, power function, and required sample size examples for a three-arm equivalence trial are given for both continuous and discrete cases. In addition, discussions on power comparisons are complemented with numerical results.
22. Che, Huiwen. "Cutoff sample size estimation for survival data: a simulation study." Thesis, Uppsala universitet, Statistiska institutionen, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-234982.
Abstract:
This thesis demonstrates the possible cutoff sample size point that balances goodness of estimation and study expenditure using a practical cancer case. As it is crucial to determine the sample size in designing an experiment, researchers attempt to find the suitable sample size that achieves the desired power and budget efficiency at the same time. The thesis shows how simulation can be used for sample size and precision calculations with survival data. The presentation concentrates on the simulation involved in carrying out the estimates and precision calculations. The Kaplan-Meier estimator and the Cox regression coefficient are chosen as point estimators, and the precision measurements focus on the mean square error and the standard error.
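A stripped-down version of the simulation logic described above: for each candidate sample size, simulate many trials and track the sampling variability of the estimate, then look for the point where extra subjects stop buying a meaningful gain in precision. To keep it short, the sketch uses an exponential rate MLE under uniform censoring rather than the Kaplan-Meier and Cox estimators of the thesis, and every number in it is invented.

```python
# Minimal sketch (not the thesis's study): precision of a censored-exponential rate MLE
# as a function of sample size, estimated by simulation.
import numpy as np

rng = np.random.default_rng(2)
true_rate, n_rep = 0.2, 2000

def simulated_se(n):
    estimates = []
    for _ in range(n_rep):
        t = rng.exponential(1 / true_rate, size=n)    # event times
        c = rng.uniform(0, 15, size=n)                # censoring times
        obs, event = np.minimum(t, c), t <= c
        estimates.append(event.sum() / obs.sum())     # rate MLE with censoring
    return np.std(estimates)

for n in [25, 50, 100, 200, 400]:
    print(n, round(simulated_se(n), 4))
# A "cutoff" sample size is where the SE curve flattens relative to the cost of subjects.
```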
23. Norfleet, David Matthew. "Sample size effects related to nickel, titanium and nickel-titanium at the micron size scale." Columbus, Ohio: Ohio State University, 2007. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1187038020.
24. Norfleet, David M. "Sample size effects related to nickel, titanium and nickel-titanium at the micron size scale." The Ohio State University, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=osu1187038020.
25. Guan, Tianyuan. "Sample Size Calculations in Simple Linear Regression: A New Approach." University of Cincinnati / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1627667392849137.
26. Cheng, Dunlei Stamey James D. "Topics in Bayesian sample size determination and Bayesian model selection." Waco, Tex.: Baylor University, 2007. http://hdl.handle.net/2104/5039.
27. Lee, Myung Hee Marron James Stephen. "Continuum direction vectors in high dimensional low sample size data." Chapel Hill, N.C.: University of North Carolina at Chapel Hill, 2007. http://dc.lib.unc.edu/u?/etd,1132.
Abstract:
Thesis (Ph. D.)--University of North Carolina at Chapel Hill, 2007. Title from electronic title page (viewed Mar. 27, 2008). "... in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Statistics and Operations Research Statistics." Discipline: Statistics and Operations Research; Department/School: Statistics and Operations Research.
28. M'lan, Cyr Emile. "Bayesian sample size calculations for cohort and case-control studies." Thesis, McGill University, 2002. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=82923.
Abstract:
Sample size determination is one of the most important statistical issues in the early stages of any investigation that anticipates statistical analyses. In this thesis, we examine Bayesian sample size determination methodology for interval estimation. Four major epidemiological study designs, cohort, case-control, cross-sectional and matched pair, are the focus. We study three Bayesian sample size criteria: the average length criterion (ALC), the average coverage criterion (ACC) and the worst outcome criterion (WOC), as well as various extensions of these criteria. In addition, a simple cost function is included as part of our sample size calculations for cohort and case-control studies. We also examine the important design issue of the choice of the optimal ratio of controls per case in case-control settings or non-exposed to exposed in cohort settings. The main difficulties with Bayesian sample size calculation problems are often at the computational level. Thus, this thesis is concerned, to a considerable extent, with presenting sample size methods that are computationally efficient.
29. Meganathan, Karthikeyan. "Sample Size Determination in Simple Logistic Regression: Formula versus Simulation." University of Cincinnati / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1627663458916666.
30. Gilkey, Justin Michael. "The Effects of Sample Size on Measures of Subjective Correlation." Bowling Green State University / OhioLINK, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1211901739.
31. Fernandes, Jessica Katherine de Sousa. "Estudo de algoritmos de otimização estocástica aplicados em aprendizado de máquina." Universidade de São Paulo, 2017. http://www.teses.usp.br/teses/disponiveis/45/45134/tde-28092017-182905/.
Abstract:
In different Machine Learning applications we may be interested in the minimization of the expected value of a certain loss function. For the resolution of this problem, stochastic optimization and sample size selection have an important role. In the present work, the theoretical analysis of some algorithms from these two areas is presented, including some variations that consider variance reduction. In the practical examples we can observe the advantage of Stochastic Gradient Descent in relation to processing time and memory but, considering the accuracy of the obtained solution together with the cost of minimization, the variance reduction methodologies obtain the best solutions. The algorithms Dynamic Sample Size Gradient and Line Search with variable sample size selection, despite obtaining better solutions than Stochastic Gradient Descent, have the disadvantage of high computational cost.
32. Pedersen, Kristen E. "Sample Size Determination in Auditing Accounts Receivable Using a Zero-Inflated Poisson Model." Digital WPI, 2010. https://digitalcommons.wpi.edu/etd-theses/421.
Abstract:
In the practice of auditing, a sample of accounts is chosen to verify whether the accounts are materially misstated, as opposed to auditing all accounts, which would be too expensive. This paper seeks a method for choosing a sample size of accounts that will give a more accurate estimate than the methods currently used for sample size determination. A review of methods to determine sample size is presented under both the frequentist and Bayesian settings, and then our method using the Zero-Inflated Poisson (ZIP) model is introduced, which explicitly considers zero versus non-zero errors. This model is favorable due to the excess zeros that are present in auditing data, which the standard Poisson model does not account for, and it could easily be extended to data similar to accounting populations.
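The zero-inflated Poisson model named in this abstract can be written down and fitted by maximum likelihood in a few lines; the sketch below simulates misstatement counts with excess zeros and recovers the mixing proportion and Poisson mean. All parameter values are invented and the auditing-specific sample size procedure of the thesis is not reproduced.

```python
# Minimal sketch (illustration only): maximum likelihood fit of a plain ZIP model.
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

rng = np.random.default_rng(3)
pi_true, lam_true, n = 0.8, 2.0, 500        # P(structural zero), Poisson mean, sample size
structural_zero = rng.random(n) < pi_true
y = np.where(structural_zero, 0, rng.poisson(lam_true, size=n))

def neg_loglik(params):
    logit_pi, log_lam = params
    pi, lam = 1 / (1 + np.exp(-logit_pi)), np.exp(log_lam)
    ll_zero = np.log(pi + (1 - pi) * np.exp(-lam))                     # contribution of y = 0
    ll_pos = np.log(1 - pi) - lam + y * np.log(lam) - gammaln(y + 1)   # contribution of y > 0
    return -np.sum(np.where(y == 0, ll_zero, ll_pos))

res = minimize(neg_loglik, x0=[0.0, 0.0], method="Nelder-Mead")
pi_hat, lam_hat = 1 / (1 + np.exp(-res.x[0])), np.exp(res.x[1])
print(f"pi ~ {pi_hat:.3f}, lambda ~ {lam_hat:.3f}")
```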
33. Chen, Yanran. "Influence of Correlation and Missing Data on Sample Size Determination in Mixed Models." Bowling Green State University / OhioLINK, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1370448410.
34. Wang, Xiangrong. "Effect of Sample Size on IRT Equating of Uni-Dimensional Tests in Common Item Non-Equivalent Group Design: A Monte Carlo Simulation Study." Diss., Virginia Tech, 2012. http://hdl.handle.net/10919/37555.
Abstract:
Test equating is important to large-scale testing programs for two reasons: strict test security is a key concern for high-stakes tests, and fairness of test equating is important for test takers. The question of adequacy of sample size often arises in test equating. However, most recommendations in the existing literature are based on classical test equating. Very few research studies have systematically investigated the minimal sample size which leads to reasonably accurate equating results based on item response theory (IRT). The main purpose of this study was to examine the minimal sample size for desired IRT equating accuracy for the common-item nonequivalent groups design under various conditions. Accuracy was determined by examining the relative magnitude of six accuracy statistics. Two IRT equating methods were carried out on simulated tests with combinations of test length, test format, group ability difference, similarity of the form difficulty, and parameter estimation methods for 14 sample sizes using Monte Carlo simulations with 1,000 replications per cell. Observed score equating and true score equating were compared to the criterion equating to obtain the accuracy statistics. The results suggest that different sample size requirements exist for different test lengths, test formats and parameter estimation methods. Additionally, the results show the following: first, the results for true score equating and observed score equating are very similar. Second, the longer test has less accurate equating than the shorter one at the same sample size level, and the gap grows as the sample size decreases. Third, the concurrent parameter estimation method produced less equating error than separate estimation at the same sample size level, and the difference increases as the sample size decreases. Fourth, the cases with different group ability have larger and less stable error compared to the base case and the cases with different test difficulty, especially when using the separate parameter estimation method with a sample size less than 750. Last, the mixed-format test is more accurate than the single-format one at the same sample size level.
35. Song, Juhee. "Bootstrapping in a high dimensional but very low sample size problem." Texas A&M University, 2003. http://hdl.handle.net/1969.1/3853.
Abstract:
High Dimension, Low Sample Size (HDLSS) problems have received much attention recently in many areas of science. Analysis of microarray experiments is one such area. Numerous studies are on-going to investigate the behavior of genes by measuring the abundance of mRNA (messenger RiboNucleic Acid), i.e. gene expression. HDLSS data investigated in this dissertation consist of a large number of data sets each of which has only a few observations. We assume a statistical model in which measurements from the same subject have the same expected value and variance. All subjects have the same distribution up to location and scale. Information from all subjects is shared in estimating this common distribution. Our interest is in testing the hypothesis that the mean of measurements from a given subject is 0. Commonly used tests of this hypothesis, the t-test, sign test and traditional bootstrapping, do not necessarily provide reliable results since there are only a few observations for each data set. We motivate a mixture model having C clusters and 3C parameters to overcome the small sample size problem. Standardized data are pooled after assigning each data set to one of the mixture components. To get reasonable initial parameter estimates when density estimation methods are applied, we apply clustering methods including agglomerative and K-means. The Bayes Information Criterion (BIC) and a new criterion, WMCV (Weighted Mean of within Cluster Variance estimates), are used to choose an optimal number of clusters. Density estimation methods including a maximum likelihood unimodal density estimator and kernel density estimation are used to estimate the unknown density. Once the density is estimated, a bootstrapping algorithm that selects samples from the estimated density is used to approximate the distribution of test statistics. The t-statistic and an empirical likelihood ratio statistic are used, since their distributions are completely determined by the distribution common to all subjects. A method to control the false discovery rate is used to perform simultaneous tests on all small data sets. Simulated data sets and a set of cDNA (complementary DeoxyriboNucleic Acid) microarray experiment data are analyzed by the proposed methods.
36. Serrano, Daniel Curran Patrick J. "Error of estimation and sample size in the linear mixed model." Chapel Hill, N.C.: University of North Carolina at Chapel Hill, 2008. http://dc.lib.unc.edu/u?/etd,1653.
Abstract:
Thesis (M.A.)--University of North Carolina at Chapel Hill, 2008. Title from electronic title page (viewed Sep. 16, 2008). "... in partial fulfillment of the requirements for the degree of Master of Arts in the Department of Psychology." Discipline: Psychology; Department/School: Psychology.
37. McIntosh, Matthew J. "Sample size when the alternative is ordered and other multivariate results." 1998. http://wwwlib.umi.com/cr/mo/fullcit?p9924907.
38. Tongur, Can. "Small sample performances of two tests for overidentifying restrictions." Thesis, Uppsala University, Department of Economics, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-6367.
Abstract:
Two new specification tests for overidentifying restrictions proposed by Hahn and Hausman (2002b) are tested here and compared to the classical Sargan test. Power properties are found to be very similar in overall performance, while Sargan generally has better size than the new tests. Also, size is distorted for one of the new tests, so a tendency to reject prevails. In addition, sometimes severe bias is found which affects the tests' performance, something that differs from earlier studies.
39. Bofill Roig, Marta. "Statistical methods and software for clinical trials with binary and survival endpoints: efficiency, sample size and two-sample comparison." Doctoral thesis, Universitat Politècnica de Catalunya, 2020. http://hdl.handle.net/10803/670371.
Abstract:
Defining the scientific question is the starting point for any clinical study. However, even though the main objective is generally clear, how this is addressed is not usually straightforward. Clinical studies very often encompass several questions, defined as primary and secondary hypotheses, and measured through different endpoints. In clinical trials with multiple endpoints, composite endpoints, defined as the union of several endpoints, are widely used as primary endpoints. The use of composite endpoints is mainly motivated because they are expected to increase the number of observed events and to capture more information than by only considering one endpoint. Besides, it is generally thought that the power of the study will increase if using composite endpoints and that the treatment effect on the composite endpoint will be similar to the average effect of its components. However, these assertions are not necessarily true and the design of a trial with a composite endpoint might be difficult. Different types of endpoints might be chosen for different research stages. This is the case for cancer trials, where short-term binary endpoints based on the tumor response are common in early-phase trials, whereas overall survival is the gold standard in late-phase trials. In the recent years, there has been a growing interest in designing seamless trials with both early response outcome and later event times. Considering these two endpoints together could provide a wider characterization of the treatment effect and also may reduce the duration of clinical trials and their costs. In this thesis, we provide novel methodologies to design clinical trials with composite binary endpoints and to compare two treatment groups based on binary and time-to-event endpoints. In addition, we present the implementation of the methodologies by means of different statistical tools. Specifically, in Chapter 2, we propose a general strategy for sizing a trial with a composite binary endpoint as primary endpoint based on previous information on its components. In Chapter 3, we present the ARE (Asymptotic Relative Efficiency) method to choose between a composite binary endpoint or one of its components as the primary endpoint of a trial. In Chapter 4, we propose a class of two-sample nonparametric statistics for testing the equality of proportions and the equality of survival functions. In Chapter 5, we describe the software developed to implement the methods proposed in this thesis. In particular, we present CompARE, a web-based tool for designing clinical trials with composite endpoints and its corresponding R package, and the R package SurvBin in which we have implemented the class of statistics presented in Chapter 4. We conclude this dissertation with general conclusions and some directions for future research in Chapter 6.<br>La evaluación de la eficacia de los tratamientos es uno de los mayores retos en el diseño de ensayos clínicos. La variable principal cuantifica la respuesta clínica y define, en gran medida, el ensayo. Los ensayos clínicos generalmente abarcan varias cuestiones de interés. En estos casos, se establecen hipótesis primarias y secundarias, que son evaluadas a través de diferentes variables. Los ensayos clínicos con múltiples variables de interés utilizan frecuentemente las llamadas variables compuestas. Una variable compuesta se define como la unión de diversas variables de interés. 
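For readers planning such designs, the sizing strategy summarised above for Chapter 2 can be illustrated with a rough sketch (my own, with hypothetical inputs; it is not the CompARE implementation): build the composite event probability from the components' event probabilities and their correlation, then apply a standard pooled two-proportion sample size formula.

# A hedged sketch (not the CompARE implementation) of sizing a trial whose
# primary endpoint is the union of two binary components, combining the
# components' event probabilities and correlation into a composite event
# probability and plugging it into a pooled two-proportion formula.
from math import sqrt
from scipy.stats import norm

def composite_prob(p1, p2, rho):
    # P(A or B), with P(A and B) recovered from the phi (Pearson) correlation
    p12 = p1 * p2 + rho * sqrt(p1 * (1 - p1) * p2 * (1 - p2))
    return p1 + p2 - p12

def n_per_arm(p_control, p_treat, alpha=0.05, power=0.80):
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    p_bar = (p_control + p_treat) / 2
    return z ** 2 * 2 * p_bar * (1 - p_bar) / (p_control - p_treat) ** 2

# hypothetical component risks, correlation 0.3, 30% relative risk reduction
pc = composite_prob(0.15, 0.10, 0.30)
pt = composite_prob(0.15 * 0.7, 0.10 * 0.7, 0.30)
print(round(pc, 3), round(pt, 3), round(n_per_arm(pc, pt)))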
APA, Harvard, Vancouver, ISO, and other styles
40

Knowlton, Nicholas Scott. "Robust estimation of inter-chip variability to improve microarray sample size calculations." Oklahoma City : [s.n.], 2005.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
41

Xu, Yanzhi. "Effective GPS-based panel survey sample size for urban travel behavior studies." Diss., Georgia Institute of Technology, 2010. http://hdl.handle.net/1853/33843.

Full text
Abstract:
This research develops a framework to estimate the effective sample size of Global Positioning System (GPS) based panel surveys in urban travel behavior studies for a variety of planning purposes. Recent advances in GPS monitoring technologies have made it possible to implement panel surveys lasting weeks, months or even years. The many advantageous features of GPS-based panel surveys make them attractive for travel behavior studies, but their higher cost compared with conventional one-day or two-day paper diary surveys requires scrutiny at the sample size planning stage to ensure cost-effectiveness. The sample size analysis in this dissertation focuses on three major aspects of travel behavior studies: 1) obtaining reliable means for key travel behavior variables, 2) conducting regression analysis of key travel behavior variables against explanatory variables such as demographic characteristics and seasonal factors, and 3) examining the impacts of a policy measure on travel behavior through before-and-after studies. The sample size analyses are based on the GPS data collected in the multi-year Commute Atlanta study. The analysis concerning reliable means for key travel behavior variables uses Monte Carlo re-sampling techniques to assess the trend of the means across various sample size and survey length combinations. The framework and methods of sample size estimation for regression analysis and before-and-after studies are derived from sample size procedures based on the generalized estimating equation (GEE) method, originally proposed for longitudinal studies in biomedical research. This dissertation adapts these procedures to the design of panel surveys for urban travel behavior studies with the information made available from the Commute Atlanta study. The findings indicate that the required sample sizes should be much larger than those used in existing GPS-based panel surveys. This research recommends a desired range of sample sizes based on the objectives and survey lengths of urban travel behavior studies.
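The Monte Carlo re-sampling idea mentioned above can be illustrated with a small sketch (illustrative only; it uses a synthetic panel rather than the Commute Atlanta data): repeatedly subsample travellers and survey days and watch how the spread of the estimated mean daily travel shrinks as both grow.

# A minimal sketch of the re-sampling idea, on a hypothetical panel of daily
# travel distances: subsample n travellers and d survey days many times and
# report the standard deviation of the resulting mean estimates.
import numpy as np

rng = np.random.default_rng(0)
panel = rng.gamma(shape=2.0, scale=15.0, size=(500, 365))  # hypothetical km/day

def mean_spread(n_persons, n_days, reps=2000):
    means = []
    for _ in range(reps):
        persons = rng.choice(panel.shape[0], size=n_persons, replace=False)
        days = rng.choice(panel.shape[1], size=n_days, replace=False)
        means.append(panel[np.ix_(persons, days)].mean())
    return np.std(means)

for n, d in [(50, 7), (100, 7), (100, 30), (200, 30)]:
    print(n, d, round(mean_spread(n, d), 3))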
APA, Harvard, Vancouver, ISO, and other styles
42

Huh, Seungho. "SAMPLE SIZE DETERMINATION AND STATIONARITY TESTING IN THE PRESENCE OF TREND BREAKS." NCSU, 2001. http://www.lib.ncsu.edu/theses/available/etd-20010222-121906.

Full text
Abstract:
Traditionally it is believed that most macroeconomic time series represent stationary fluctuations around a deterministic trend. However, simple applications of the Dickey-Fuller test have, in many cases, been unable to show that major macroeconomic variables have a stationary univariate time series structure. One possible reason for non-rejection of unit roots is that the simple mean or linear trend function used by the tests is not sufficient to describe the deterministic part of the series. To address this possibility, unit root tests in the presence of trend breaks have been studied by several researchers. In our work, we deal with some issues associated with unit root testing in time series with a trend break. The performance of various unit root test statistics is compared with respect to the break-induced size distortion problem. We examine the effectiveness of tests based on symmetric estimators as compared to those based on the least squares estimator. In particular, we show that tests based on the weighted symmetric estimator not only eliminate the spurious rejection problem but also have reasonably good power properties when modified to allow for a break. We suggest alternative test statistics for testing the unit root null hypothesis in the presence of a trend break. Our new test procedure, which we call the "bisection" method, is based on the idea of subgrouping; it is simpler than other methods since the need to search for the break is avoided. Using stream flow data from the US Geological Survey, we perform a temporal analysis of some hydrologic variables. We first show that the time series for the target variables are stationary, then focus on finding the sample size necessary to detect a mean change if one occurs. Three different approaches are used to solve this problem: OLS, GLS and a frequency domain method. A cluster analysis of stations is also performed using these sample sizes as data, and we investigate whether available geographic variables can be used to predict cluster membership.
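For the mean-change question in the hydrologic part of this work, a rough illustration follows (my own simplification under an independence assumption; the thesis also treats GLS and frequency domain approaches): the number of observations needed per segment to detect a mean shift with a two-sample z-type comparison of the pre- and post-break means.

# Illustrative only: per-segment sample size to detect a mean shift of size
# delta, assuming independent observations with standard deviation sigma.
from scipy.stats import norm

def n_per_segment(delta, sigma, alpha=0.05, power=0.80):
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return 2 * (sigma * z / delta) ** 2

print(round(n_per_segment(delta=0.5, sigma=1.0)))  # about 63 per segment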
APA, Harvard, Vancouver, ISO, and other styles
43

Lin, Tzu-Yu, and 林滋堉. "Sample Size Requirements for Pharmacogenetic Studies." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/82428278428524332702.

Full text
Abstract:
Master's thesis, Graduate Institute of Epidemiology, National Taiwan University. Pharmacogenetic studies investigate the inter-individual variability in drug response due to genetic effects. They can be classified into experimental studies and observational studies, with or without randomization. In this thesis, we aim to investigate useful designs, from simple to elaborate, for pharmacogenetic studies. In our framework, we use a case-control design to dissect the association between genetic effects and drug response when there is a single drug treatment. Furthermore, when there are two or more treatment groups, we assess the effects of treatments, genotypes, and gene-treatment interactions on drug response under a trial design. For each design we present the analysis methods for detecting association and the methods for calculating the required sample size. The family-based (sib pairs) design is discussed in particular, with the intention of providing a design that is robust against population stratification.
APA, Harvard, Vancouver, ISO, and other styles
44

Jian, Yu-Jhih, and 簡玉芝. "Sample Size Algorithm for Exact Test." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/20022865767094911676.

Full text
Abstract:
Master's thesis, Graduate Institute of Applied Mathematics, Chung Yuan Christian University. We develop an algorithm with which medical researchers can accurately estimate the sample size for a specified significance level and statistical power. For most test statistics, the sample size equations do not have a closed-form expression; sample size is usually calculated with asymptotic methods, but these may not be reliable in small-sample cases. The proposed algorithm requires an initial guess of the sample size, which is then substituted into the alpha equation and the beta equation repeatedly until the procedure converges. The successive approximations produced by the algorithm tend toward the required sample sizes for the two-sample t-test, the ANOVA F-test, the contingency-table chi-square test, and the binomial test.
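The iteration described here can be sketched for the two-sample t-test as follows (assumed details, not the authors' code): start from the normal-approximation sample size, then re-check the alpha and beta equations with the t and noncentral t distributions until the target power is reached.

# A minimal sketch of the fixed-point idea for the two-sample t-test.
import math
from scipy.stats import norm, t, nct

def n_two_sample_t(effect_size, alpha=0.05, power=0.80):
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    n = math.ceil(2 * (z / effect_size) ** 2)    # normal-approximation start
    while True:
        df = 2 * n - 2
        crit = t.ppf(1 - alpha / 2, df)          # alpha equation
        delta = effect_size * math.sqrt(n / 2)   # noncentrality parameter
        achieved = 1 - nct.cdf(crit, df, delta)  # beta equation (power)
        if achieved >= power:
            return n
        n += 1

print(n_two_sample_t(0.5))  # about 64 per group for a medium effect size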
APA, Harvard, Vancouver, ISO, and other styles
45

Bedo, Justin. "Small sample size learning in bioinformatics." Phd thesis, 2009. http://hdl.handle.net/1885/151603.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

Huang, Dong-Si, and 黃東溪. "Sample Size Determination in a Microarray Experiment." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/37217728505587250306.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Yang, Yu-chun, and 楊喻淳. "Sample Size Algorithm for Fisher’s Exact Test." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/58638879957275722119.

Full text
Abstract:
Master's thesis, Graduate Institute of Applied Mathematics, Chung Yuan Christian University. We develop an algorithm that computes the sample size exactly for a specified significance level and expected power. The methods of computing sample size in the statistical literature vary; traditional approaches rely on complicated formulas or lengthy simulation, which can introduce error and take a long time to run. This thesis therefore develops an algorithm that avoids enormous computation, finishes in a short time, and does not suffer from such error. We work directly with the exact distributions and iterate the corresponding equations, controlling the type I error and increasing the sample size until the expected power is reached, at which point the procedure stops. With an algorithm of this kind, we can successfully compute the sample size for Fisher's exact test.
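A simplified version of such an exact computation might look as follows (my own sketch; the thesis' actual procedure may differ): enumerate all 2x2 tables for a given per-group size, reject when Fisher's exact p-value falls below alpha, weight each table by its binomial probability, and increase the sample size until the target power is met.

# Illustrative brute-force exact power for Fisher's exact test; exact power is
# not strictly monotone in n, so neighbouring sizes are worth checking, and
# the search is slow for small effects.
from scipy.stats import binom, fisher_exact

def exact_power(n, p1, p2, alpha=0.05):
    power = 0.0
    for x1 in range(n + 1):
        for x2 in range(n + 1):
            table = [[x1, n - x1], [x2, n - x2]]
            if fisher_exact(table)[1] <= alpha:
                power += binom.pmf(x1, n, p1) * binom.pmf(x2, n, p2)
    return power

def n_fisher(p1, p2, alpha=0.05, target=0.80, start=10):
    n = start
    while exact_power(n, p1, p2, alpha) < target:
        n += 1
    return n

print(n_fisher(0.2, 0.6))  # hypothetical proportions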
APA, Harvard, Vancouver, ISO, and other styles
48

Luo, Chao-Wei, and 羅兆為. "Using virtual sample and linear independence to solve small sample size problem." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/89244692498377091129.

Full text
Abstract:
Master's thesis, Department of Information Engineering, Chaoyang University of Technology. This research proposes a novel algorithm for the small sample size problem. The problem is difficult because, with few training samples, statistical methods cannot reliably estimate the distribution of the data, so conventional methods designed for large-sample problems do not apply. The proposed approach generates virtual samples to increase the number of training samples, then estimates the probability that each sample is noise in order to filter the data. After filtering, linear independence is used to select the support vectors. The experimental results indicate that the proposed method is effective.
APA, Harvard, Vancouver, ISO, and other styles
49

Hsiao, Ching-Lin, and 蕭敬霖. "Pseudo Sample Size Calculation of Two-sample T-test in A Small Area." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/87921696747600696064.

Full text
Abstract:
Master's thesis, Department of Statistics, National Taipei University. The traditional approach to calculating the conventional sample size for the two-sample t-test and the multiple-sample F-test is difficult because the degrees of freedom of the distribution of the test statistic are tied to the non-central parameter. In this thesis, a new approach to calculating the conventional sample size is given: the lower bound of the non-central parameter under Ha is pushed to reach the upper bound under H0. Interestingly, the sample size calculated in this intuitive way is proved to coincide with the conventional sample size formula in large samples. In addition, a sample size calculation for bridging studies, referred to as the pseudo sample size, is introduced. The pseudo sample size is obtained by forcing the non-central parameter of the second trial, run in a small region, to reach the lower bound of the non-central parameter from the first trial. A basic property of the pseudo sample size is that the larger the effect size concluded from the first trial, or the larger the sample size used in it, the larger the pseudo sample size allowed in the second trial. In a simulation study, the test statistic based on the pseudo sample sizes provides much greater power than the one based on the original sample sizes in the second trial.
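One plausible reading of this construction can be sketched as follows (assumed details throughout; this is not the thesis' derivation): invert the noncentral t distribution at the first trial's observed t statistic to obtain a lower confidence bound on its noncentrality, then take the smallest second-trial per-group size whose noncentrality, at the planned effect size, reaches that bound.

# Hedged sketch with hypothetical first-trial numbers; all details assumed.
import math
from scipy.stats import nct
from scipy.optimize import brentq

def ncp_lower_bound(t_obs, df, level=0.95):
    # noncentrality lambda with P(T >= t_obs | lambda) = 1 - level,
    # i.e. the one-sided lower confidence bound for the noncentrality
    f = lambda lam: nct.cdf(t_obs, df, lam) - level
    return brentq(f, 1e-6, t_obs + 10)

def pseudo_n(t_obs, n1, effect_size, level=0.95):
    lam_low = ncp_lower_bound(t_obs, 2 * n1 - 2, level)
    return math.ceil(2 * (lam_low / effect_size) ** 2)

print(pseudo_n(t_obs=3.2, n1=100, effect_size=0.45))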
APA, Harvard, Vancouver, ISO, and other styles
50

Wu, Chao-Hsien, and 吳昭賢. "ON SAMPLE SIZE SIMULATION IN CENTRAL LIMIT THEOREM." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/19723785767261131687.

Full text
Abstract:
Doctoral dissertation, Graduate Institute of Management Sciences, Tamkang University. The central limit theorem is of crucial importance in many statistical applications. Given a large enough sample size, it enables us to make inferences about the population mean in cases where we do not know the specific shape of the population distribution, and even in cases where we know that the population is not normally distributed. According to the central limit theorem, when the sample size is sufficiently large, the distribution of the sample mean is approximately normal. How large is large enough? Researchers often use the rule of thumb that, in practical applications, the distribution of the sample mean may be assumed normal if the sample size is larger than 30. But probability distributions come in many shapes: single-peaked and multi-peaked, symmetric and asymmetric, highly and mildly skewed, as well as the uniform distribution, which is symmetric but has no peak, no skewness and no tails. Furthermore, some distributions are similar to the normal distribution while others are vastly different. The purpose of this thesis is to examine the above criterion by simulation. In Chapters 2 through 5 we consider four continuous distributions (uniform, triangular, gamma and Weibull) and provide regression models and tables of the sample size required for using the central limit theorem. In survey interviews on sensitive subjects, interviewees fear that their privacy will be disclosed, so they often refuse to answer the sensitive questions or answer them untruthfully. In Chapter 6, some indirect randomized response techniques are proposed which maintain efficiency while protecting confidentiality. The interviewee is only required to report a positive or negative integer, something every individual participating in a survey is expected to be capable of. Since the information provided to the interviewer is not sufficient to determine whether an individual possesses the sensitive characteristic, the respondents' privacy is well protected, and respondents are therefore perhaps more willing to cooperate and report truthfully. Because of the simplicity of the survey process, the proposed procedure seems more practicable than the Christofides (2003) procedure. In Section 6.2, we also consider the determination of sample size when using these indirect randomized response techniques. Three sets of finite discrete distributions are used to demonstrate the application of the central limit theorem.
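The kind of simulation described here can be sketched briefly (illustrative only; the thesis' distributions and criteria are more elaborate): draw repeated samples from a skewed population and test the normality of the resulting sample means at several sample sizes.

# Minimal illustration: Shapiro-Wilk p-values for simulated sample means from
# a strongly right-skewed gamma population, at several sample sizes.
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(1)

def mean_normality_pvalue(sample_size, reps=500, shape=0.5):
    means = rng.gamma(shape, size=(reps, sample_size)).mean(axis=1)
    return shapiro(means).pvalue

for n in (5, 30, 100, 300):
    print(n, round(mean_normality_pvalue(n), 4))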
APA, Harvard, Vancouver, ISO, and other styles