Dissertations / Theses on the topic 'Item response theory – Statistical methods'

Author: Grafiati

Published: 4 June 2021

Last updated: 2 February 2022

Consult the top 35 dissertations / theses for your research on the topic 'Item response theory – Statistical methods.'

1

Combs, Adam. "Bayesian Model Checking Methods for Dichotomous Item Response Theory and Testlet Models." Bowling Green State University / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1394808820.

2

Kopf, Julia. "Model-based Recursive Partitioning Meets Item Response Theory: New Statistical Methods for the Detection of Differential Item Functioning and Appropriate Anchor Selection." München: Verlag Dr. Hut, 2013. http://d-nb.info/1045988804/34.

3

Kopf, Julia, and Carolin Strobl (academic supervisor). "Model-based recursive partitioning meets item response theory: new statistical methods for the detection of differential item functioning and appropriate anchor selection." München: Universitätsbibliothek der Ludwig-Maximilians-Universität, 2013. http://d-nb.info/1046503235/34.

4

Carter, Nathan T. "Applications of Differential Functioning Methods to the Generalized Graded Unfolding Model." Bowling Green State University / OhioLINK, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1290885927.

5

Ueckert, Sebastian. "Novel Pharmacometric Methods for Design and Analysis of Disease Progression Studies." Doctoral thesis, Uppsala universitet, Institutionen för farmaceutisk biovetenskap, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-216537.

Abstract:

With societies aging all around the world, the global burden of degenerative diseases is expected to increase exponentially. From the perspective of drug development, degenerative diseases represent an especially challenging class. Clinical trials, in this context often termed disease progression studies, are long and costly, require many individuals, and have low success rates. Therefore, it is crucial to use informative study designs and to analyze the obtained trial data efficiently. The aim of this thesis was to develop novel approaches that facilitate both the design and the analysis of disease progression studies. This aim was pursued in three stages: (i) the characterization and extension of pharmacometric software, (ii) the development of new methodology around statistical power, and (iii) the demonstration of application benefits. The optimal design software PopED was extended to simplify the application of optimal design methodology when planning a disease progression study. The performance of non-linear mixed effect estimation algorithms for trial data analysis was evaluated in terms of bias, precision, robustness with respect to initial estimates, and runtime. A novel statistic allowing explicit optimization of study design for statistical power was derived and found to outperform existing methods. Monte-Carlo power studies were accelerated through parametric power estimation, delivering full power-versus-sample-size curves from a few hundred Monte-Carlo samples. Optimal design and explicit optimization for statistical power were applied to the planning of a study in Alzheimer's disease, resulting in a 30% smaller study size when targeting 80% power. The analysis of ADAS-cog score data was improved through item response theory, yielding a more exact description of the assessment score, increased statistical power, and enhanced insight into the assessment properties.
In conclusion, this thesis presents novel pharmacometric methods that can help address the challenges of designing and planning disease progression studies.

6

Jiang, Jing. "Regularization Methods for Detecting Differential Item Functioning:." Thesis, Boston College, 2019. http://hdl.handle.net/2345/bc-ir:108404.

Abstract:

Thesis advisor: Zhushan Mandy Li
Differential item functioning (DIF) occurs when examinees of equal ability from different groups have different probabilities of correctly responding to certain items. DIF analysis aims to identify potentially biased items to ensure the fairness and equity of instruments, and has become a routine procedure in developing and improving assessments. This study proposed a DIF detection method using regularization techniques, which allows simultaneous investigation of all items on a test for both uniform and nonuniform DIF. To evaluate the performance of the proposed DIF detection models and understand the factors that influence it, comprehensive simulation studies and empirical data analyses were conducted. Under various conditions, including test length, sample size, sample size ratio, percentage of DIF items, DIF type, and DIF magnitude, the operating characteristics of three kinds of regularized logistic regression models (lasso, elastic net, and adaptive lasso, each characterized by its penalty function) were examined and compared. Selection of the optimal tuning parameter was investigated using two well-known information criteria, AIC and BIC, and cross-validation. The results revealed that BIC outperformed the other model selection criteria: it not only flagged high-impact DIF items precisely but also prevented over-identification of DIF items, producing few false alarms. Among the regularization models, the adaptive lasso outperformed the other two models in most conditions. The performance of the regularized DIF detection model using the adaptive lasso was then compared with two commonly used DIF detection approaches, the logistic regression method and the likelihood ratio test. The proposed model was applied to empirical datasets to demonstrate the applicability of the method in real settings.
Thesis (PhD) — Boston College, 2019
Submitted to: Boston College. Lynch School of Education
Discipline: Educational Research, Measurement and Evaluation
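
For readers new to the setup, the kind of logistic-regression DIF model that regularized approaches typically build on can be sketched as below. This is illustrative only and not the thesis's implementation: the coefficient names b0 to b3, the 0/1 group coding, and the helper functions are assumptions. The soft-thresholding function is the lasso's proximal step, the operation that lets a lasso-type penalty set small DIF coefficients exactly to zero.

```python
import math


def dif_logit_prob(theta, group, b0, b1, b2, b3):
    """Probability of a correct response under the logistic-regression DIF model.

    group is coded 0 for the reference group and 1 for the focal group.
    b2 (group main effect) carries uniform DIF; b3 (theta-by-group
    interaction) carries nonuniform DIF.
    """
    z = b0 + b1 * theta + b2 * group + b3 * theta * group
    return 1.0 / (1.0 + math.exp(-z))


def soft_threshold(x, lam):
    """Lasso proximal operator: shrink x toward 0 by lam; exactly 0 if |x| <= lam."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


# Two examinees of equal ability (theta = 0.5) from different groups.
theta = 0.5
p_ref = dif_logit_prob(theta, 0, b0=-0.2, b1=1.1, b2=0.4, b3=0.0)
p_foc = dif_logit_prob(theta, 1, b0=-0.2, b1=1.1, b2=0.4, b3=0.0)
print(f"reference: {p_ref:.3f}, focal: {p_foc:.3f}")  # unequal, so uniform DIF

# With a penalty of 0.5, a small DIF coefficient is zeroed out,
# while a larger one survives but is shrunk by 0.5.
print(soft_threshold(0.3, 0.5), soft_threshold(-1.2, 0.5))
```

Setting b2 = b3 = 0 recovers a DIF-free item, which is why a penalty that drives those two coefficients to zero doubles as an item-level DIF test.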

7

Peterson, Jaime Leigh. "Multidimensional item response theory observed score equating methods for mixed-format tests." Diss., University of Iowa, 2014. https://ir.uiowa.edu/etd/1379.

Abstract:

The purpose of this study was to build upon the existing MIRT equating literature by introducing a full multidimensional item response theory (MIRT) observed score equating method for mixed-format exams, because no such methods currently exist. At this time, the MIRT equating literature is limited to full MIRT observed score equating methods for multiple-choice-only exams and Bifactor observed score equating methods for mixed-format exams. Given how frequently mixed-format exams are used and the accumulating evidence that some tests are not purely unidimensional, it was important to present a full MIRT equating method for mixed-format tests. The performance of the full MIRT observed score method was compared with the traditional equipercentile method, the unidimensional IRT (UIRT) observed score method, and the Bifactor observed score method. With the Bifactor methods, group-specific factors were defined according to item format or content subdomain. With the full MIRT methods, two- and four-dimensional models were included, and correlations between latent abilities were either freely estimated or set to zero. All equating procedures were carried out using three end-of-course exams: Chemistry, Spanish Language, and English Language and Composition. For all subjects, two separate datasets were created using pseudo-groups in order to have two separate equating criteria. The specific equating criteria that served as baselines for comparisons with all other methods were the theoretical Identity and the traditional equipercentile procedures. Several important conclusions were reached. In general, the multidimensional methods performed better for datasets that showed more multidimensionality, whereas unidimensional methods worked better for unidimensional datasets. In addition, the scale on which scores are reported influenced the comparative conclusions drawn among the studied methods.
For performance classifications, which are most important to examinees, there typically were no large discrepancies among the UIRT, Bifactor, and full MIRT methods. However, this study was limited by its sole reliance on real data, which were not strongly multidimensional and for which the true equating relationship was not known. Therefore, plans for improvement, including the addition of a simulation study introducing a variety of dimensional data structures, are also discussed.

8

Morse, Brendan J. "Controlling Type 1 errors in moderated multiple regression: an application of item response theory for applied psychological research." Ohio University, 2009. http://www.ohiolink.edu/etd/view.cgi?ohiou1247063796.

9

Choi, Jiwon. "Comparison of MIRT observed score equating methods under the common-item nonequivalent groups design." Diss., University of Iowa, 2019. https://ir.uiowa.edu/etd/6716.

Abstract:

For equating tests that measure several distinct proficiencies, procedures that reflect the multidimensional structure of the data are needed. Although there exist a few equating procedures developed under the multidimensional item response theory (MIRT) framework, there is a need for further research in this area. Therefore, the primary objectives of this dissertation are to consolidate and expand MIRT observed score equating research with a specific focus on the common-item nonequivalent groups (CINEG) design, which requires scale linking. Content areas and item types are two focal points of dimensionality. This dissertation uses two studies with different data types and comparison criteria to address the research objectives. In general, a comparison between unidimensional item response theory (UIRT) and MIRT methods suggested a better performance of the MIRT methods over UIRT. The simple structure (SS) and full MIRT methods showed more accurate equating results compared to UIRT. In terms of calibration methods, concurrent calibration outperformed separate calibration for all equating methods under most of the studied conditions.
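
Scale linking under the CINEG design can be illustrated with the mean/sigma method, one simple way to place separately calibrated item parameters on a common scale. The abstract does not specify which linking method was used, so treat this as an assumed example; the function names and the difficulty values are hypothetical.

```python
import statistics


def mean_sigma_linking(b_new, b_old):
    """Mean/sigma linking constants from common-item difficulty estimates.

    b_new: common-item b estimates from the new form's calibration.
    b_old: the same items' b estimates on the old (target) scale.
    Returns (A, B) such that theta_old = A * theta_new + B.
    """
    A = statistics.pstdev(b_old) / statistics.pstdev(b_new)
    B = statistics.mean(b_old) - A * statistics.mean(b_new)
    return A, B


def rescale_item(a, b, A, B):
    """Place one item's a and b on the old scale (a 3PL c parameter is unchanged)."""
    return a / A, A * b + B


# Hypothetical common-item difficulties from two separate calibrations.
b_new = [-1.0, -0.3, 0.4, 1.2]
b_old = [1.5 * x + 0.2 for x in b_new]
A, B = mean_sigma_linking(b_new, b_old)
print(round(A, 3), round(B, 3))
print(rescale_item(1.2, 0.4, A, B))
```

Concurrent calibration, which the abstract found generally preferable, avoids this separate linking step by estimating all forms' items in a single run.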

10

Chen, Keyu. "A comparison of fixed item parameter calibration methods and reporting score scales in the development of an item pool." Diss., University of Iowa, 2019. https://ir.uiowa.edu/etd/6923.

Abstract:

The purposes of the study were to compare the relative performance of three fixed item parameter calibration (FIPC) methods in item and ability parameter estimation and to examine how the ability estimates obtained from these methods affect interpretations using reported scales of different lengths. Using a simulation design, the study was divided into two stages. The first was the calibration stage, in which the parameters of pretest items were estimated. This stage investigated the accuracy of item parameter estimates and the recovery of the underlying ability distributions for different sample sizes, different numbers of pretest items, and different types of ability distributions under the three-parameter logistic (3PL) model. The second was the operational stage, in which the estimated parameters of the pretest items were placed on operational forms and used to score examinees. This stage investigated the effect that item parameter estimation had on ability estimation and on reported scores for the new test forms. The item parameters estimated by the three FIPC methods showed subtle differences: the results of the DeMars method were closer to those of separate calibration with linking than to those of FIPC with simple prior update and FIPC with iterative prior update, while the latter two methods performed similarly. Regarding the experimental factors manipulated in the simulation, sample size influenced the estimation of item parameters. The effect of the number of pretest items on item parameter estimation was strong but ambiguous, likely because it was confounded with changes in both the number and the characteristics of the pretest items across item sets.
The effect of ability distributions on item parameter estimation was less evident than the effects of the other two factors. After the pretest items were calibrated, their parameter estimates were put into operational use. The examinees' abilities were then estimated from their responses to the existing operational items and the new (formerly pretest) items, whose parameters had been estimated under different conditions. Ability estimates correlated highly with the examinees' true abilities when forms contained pretest items calibrated with any of the three FIPC methods. The results suggested that all three FIPC methods were similarly competent in estimating item parameters, leading to satisfactory determination of the examinees' abilities. Because the estimated abilities were very similar, there were only small differences among scaled scores on the same scale; the relative frequency of examinees classified into performance categories and the classification consistency index also showed that interpretations of reported scores were similar across scales. The study provided a comprehensive comparison of the use of FIPC methods in parameter estimation, and it was hoped that it would help practitioners choose among the methods according to the needs of their testing programs. When ability estimates were linearly transformed into scale scores, scale length did not affect the statistical properties of the scores; however, it may affect how the scores are subjectively perceived by stakeholders, so scale length should be selected carefully.
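
The three-parameter logistic (3PL) model used in this simulation gives the probability of a correct response as a function of ability. A minimal sketch of its item response function follows; the function name and the default scaling constant D = 1.7 are conventions chosen here, not details taken from the thesis.

```python
import math


def p_3pl(theta, a, b, c, D=1.7):
    """3PL item response function.

    a: discrimination, b: difficulty, c: lower asymptote (pseudo-guessing),
    D: scaling constant (1.7 approximates the normal-ogive metric;
    use 1.0 for the plain logistic metric).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


# At theta == b the probability lies halfway between c and 1.
print(p_3pl(0.0, a=1.2, b=0.0, c=0.2))
# Very low ability approaches the guessing floor c.
print(p_3pl(-6.0, a=1.2, b=0.0, c=0.2))
```

Calibration methods such as FIPC estimate a, b, and c for the pretest items; scoring then plugs those estimates into this function to compute each examinee's likelihood.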

11

Pfleger, Phillip Isaac. "Designing Software to Unify Person-Fit Assessment." BYU ScholarsArchive, 2020. https://scholarsarchive.byu.edu/etd/8776.

Abstract:

Item response theory (IRT) assumes that the model fits the data. One commonly overlooked aspect of model-fit assessment is the examination of person fit, or person-fit assessment (PFA). One reason PFA lacks popularity among psychometricians is that comprehensive software is not available. This dissertation outlines the development and testing of a new software package, called wizirt, that will begin to meet this need. The package provides a wide range of tools but is currently limited to unidimensional, dichotomous, and parametric models. wizirt is built in the open-source language R, where it combines the capabilities of several other R packages under a single syntax. In addition to the wizirt package, I have created a number of resources to help users learn the package, including support for individuals who have never used R before as well as for more experienced R users.
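
As an illustration of what person-fit assessment computes, the sketch below implements the standardized log-likelihood statistic l_z, one widely used person-fit index for dichotomous items. It does not reproduce wizirt's interface; the function name and inputs are assumptions. It takes one examinee's model-implied probabilities and 0/1 responses; large negative values flag aberrant patterns, and the usual standard-normal reference is only approximate when ability is estimated.

```python
import math


def lz_statistic(probs, responses):
    """Standardized log-likelihood person-fit statistic l_z.

    probs: model-implied probabilities of a correct response for one
    examinee, each strictly between 0 and 1.
    responses: the examinee's observed 0/1 item scores.
    """
    if len(probs) != len(responses):
        raise ValueError("probs and responses must have the same length")
    l0 = 0.0        # observed log-likelihood of the response pattern
    expected = 0.0  # its expectation under the model
    variance = 0.0  # its variance under the model
    for p, u in zip(probs, responses):
        log_p, log_q = math.log(p), math.log(1.0 - p)
        l0 += u * log_p + (1 - u) * log_q
        expected += p * log_p + (1.0 - p) * log_q
        variance += p * (1.0 - p) * (log_p - log_q) ** 2
    return (l0 - expected) / math.sqrt(variance)


probs = [0.9, 0.7, 0.5, 0.3, 0.1]  # easy to hard items
consistent = [1, 1, 1, 0, 0]       # passes easy items, fails hard ones
aberrant = [0, 0, 0, 1, 1]         # the reverse pattern
print(round(lz_statistic(probs, consistent), 2), round(lz_statistic(probs, aberrant), 2))
```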

12

Sheng, Yanyan. "Bayesian analysis of hierarchical IRT models: comparing and combining the unidimensional & multi-unidimensional IRT models." Diss., University of Missouri-Columbia, 2005. http://hdl.handle.net/10355/4153.

Abstract:

Thesis (Ph. D.)--University of Missouri-Columbia, 2005.

13

Martin, Dale Frederick Hosking. "Improving the Detection of Narcissistic Transformational Leaders with the Multifactor Leadership Questionnaire: An Item Response Theory Analysis." ScholarWorks, 2011. https://scholarworks.waldenu.edu/dissertations/849.

Abstract:

Narcissistic transformational leaders have inflicted severe physical, psychological, and financial damage on individuals, institutions, and society. The Multifactor Leadership Questionnaire (MLQ) has shown promise for the early detection of narcissistic leadership tendencies, but selection criteria have not been established. The purpose of this quantitative research was to determine whether item response theory (IRT) could advance the detection of narcissistic leadership tendencies using an item-level analysis of the 20 transformational leadership items of the MLQ. Three archival samples of subordinates from Israeli corporate and athletic organizations were combined (N = 1,703) to assess IRT data assumptions, the comparative fit of competing IRT models, item discrimination and difficulty, and theta reliabilities within the trait range. Compared with the generalized graded unfolding model, the graded response model had slightly more category points within the 95% confidence interval and consistently lower χ²/df item fit indices. Items tended to be easier yet more discriminating than average, and five items were identified as candidates for modification. IRT item marginal reliability was .94 (slightly better than the classical test theory reliability of .93), and IRT ability prediction had a reliability of .96 within a trait range from -1.7 to 1.3 theta. Based on 8 invariant item parameters, the suggested selection criteria were a response of "fairly often" (3) or above on attributed idealized influence items and "sometimes" (2) or below on individual consideration items. A test case demonstrated how narcissistic tendencies could be detected with these criteria. The study can contribute to positive social change by informing improved selection processes that more effectively screen candidates for key leadership roles that directly affect the well-being of individuals and organizations.

14

Meng, Huijuan; Vispoel, Walter P.; Lee, Won-Chan. "A comparison study of IRT calibration methods for mixed-format tests in vertical scaling." Iowa City: University of Iowa, 2007. http://ir.uiowa.edu/etd/338.

15

Whorton, Skyler. "Can a computer adaptive assessment system determine, better than traditional methods, whether students know mathematics skills?" Digital WPI, 2013. https://digitalcommons.wpi.edu/etd-theses/224.

Abstract:

Schools use commercial systems specifically for mathematics benchmarking and longitudinal assessment. However, these systems are expensive, and their results often fail to indicate a clear path for teachers to differentiate instruction based on students' individual strengths and weaknesses in specific skills. ASSISTments is a web-based Intelligent Tutoring System used by educators to drive real-time, formative assessment in their classrooms. The software is used primarily by mathematics teachers to deliver homework, classwork, and exams to their students. We have developed a computer adaptive test called PLACEments as an extension of ASSISTments that allows teachers to perform individual student assessment and, by extension, school-wide benchmarking. PLACEments uses a form of graph-based knowledge representation through which the exam results identify the specific mathematics skills that each student lacks. The system additionally provides differentiated practice determined by the students' performance on the adaptive test. In this project, we describe the design and implementation of PLACEments as a skill assessment method and evaluate it in comparison with a fixed-item benchmark.

16

Mair, Patrick, Eva Hofmann, Kathrin Gruber, Reinhold Hatzinger, Achim Zeileis, and Kurt Hornik. "What Drives Package Authors to Participate in the R Project for Statistical Computing? Exploring Motivation, Values, and Work Design." National Academy of Sciences, 2015. http://epub.wu.ac.at/4702/1/cranpnas.pdf.

Abstract:

One of the cornerstones of the R system for statistical computing is the multitude of packages contributed by numerous package authors. This makes an extremely broad range of statistical techniques and other quantitative methods freely available. So far no empirical study has investigated psychological factors that drive authors to participate in the R project. This article presents a study of R package authors, collecting data on different types of participation (number of packages, participation in mailing lists, participation in conferences), three psychological scales (types of motivation, psychological values, and work design characteristics), as well as various sociodemographic factors. The data are analyzed using item response models and subsequent generalized linear models, showing that the most important determinants for participation are a hybrid form of motivation and the social characteristics of the work design. Other factors are found to have less impact or influence only specific aspects of participation. (authors' abstract)

17

Hou, Jianlin; Vispoel, Walter P. "Effectiveness of the hybrid Levine equipercentile and modified frequency estimation equating methods under the common-item nonequivalent groups design." Iowa City: University of Iowa, 2007. http://ir.uiowa.edu/etd/339.

18

Lopez, Gabriel E. "Detection and Classification of DIF Types Using Parametric and Nonparametric Methods: A comparison of the IRT-Likelihood Ratio Test, Crossing-SIBTEST, and Logistic Regression Procedures." Scholar Commons, 2012. http://scholarcommons.usf.edu/etd/4131.

Abstract:

The purpose of this investigation was to compare the efficacy of three methods for detecting differential item functioning (DIF). The performance of the crossing simultaneous item bias test (CSIBTEST), the item response theory likelihood ratio test (IRT-LR), and logistic regression (LOGREG) was examined across a range of experimental conditions, including different test lengths, sample sizes, DIF and differential test functioning (DTF) magnitudes, and mean differences in the underlying trait distributions of the comparison groups, herein referred to as the reference and focal groups. In addition, each procedure was implemented using both an all-other anchor approach, in which the IRT-LR baseline model, CSIBTEST matching subtest, and LOGREG trait estimate were based on all test items except the one under study, and a constant anchor approach, in which the baseline model, matching subtest, and trait estimate were based on a predefined subset of DIF-free items. Response data for the reference and focal groups were generated using known item parameters based on the three-parameter logistic item response theory model (3-PLM). Various types of DIF were simulated by shifting the generating item parameters of selected items to achieve the desired DIF and DTF magnitudes, based on the area between the groups' item response functions. Power, Type I error, and Type III error rates were computed for each experimental condition based on 100 replications, and effects were analyzed via ANOVA. Results indicated that the procedures varied in efficacy, with LOGREG implemented using an all-other approach providing the best balance of power and Type I error rate. However, none of the procedures was effective at identifying the type of DIF that was simulated.
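
The abstract sizes the simulated DIF by the area between the reference and focal groups' item response functions. The sketch below approximates that area numerically for two 3PL items with the trapezoidal rule; the integration range [-6, 6], the grid size, and the helper names are assumptions made here. For items that share the same a and c parameters, the signed area should come out close to (1 - c)(b_F - b_R), which gives a simple check.

```python
import math


def p_3pl(theta, a, b, c, D=1.7):
    """3PL item response function."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def irf_areas(ref, foc, lo=-6.0, hi=6.0, n=4000):
    """Signed and unsigned areas between two 3PL IRFs via the trapezoidal rule.

    ref, foc: (a, b, c) tuples for the reference and focal groups.
    The signed area integrates P_ref - P_foc; the unsigned area integrates
    |P_ref - P_foc|, so crossing (nonuniform) DIF does not cancel out.
    """
    h = (hi - lo) / n
    signed = unsigned = 0.0
    for i in range(n + 1):
        theta = lo + i * h
        diff = p_3pl(theta, *ref) - p_3pl(theta, *foc)
        weight = 0.5 if i in (0, n) else 1.0  # trapezoid end-point weights
        signed += weight * diff
        unsigned += weight * abs(diff)
    return signed * h, unsigned * h


# Uniform DIF: same a and c, item is 0.5 harder (on the theta scale) for the focal group.
signed, unsigned = irf_areas(ref=(1.0, 0.0, 0.2), foc=(1.0, 0.5, 0.2))
print(round(signed, 3), round(unsigned, 3))  # both close to (1 - 0.2) * 0.5
```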

19

Duncan, Kristin A. "Case and covariate influence: implications for model assessment." Ohio State University / OhioLINK, 2004. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1095357183.

Abstract:

Thesis (Ph. D.)--Ohio State University, 2004.
Contains xi, 123 pages, with graphics (some in color); includes bibliographical references (pp. 120-123).

20

Santos, José Roberto Silva dos. "Um modelo de resposta ao item para grupos múltiplos com distribuições normais assimétricas centralizadas" [An item response model for multiple groups with centred skew-normal distributions]. Universidade Estadual de Campinas, 2012. http://repositorio.unicamp.br/jspui/handle/REPOSIP/306791.

Abstract:

Advisor: Caio Lucidius Naberezny Azevedo
Master's dissertation, Universidade Estadual de Campinas, Instituto de Matemática, Estatística e Computação Científica
Abstract: A common assumption for parameter estimation in item response models (IRMs) is that the latent traits are random variables following a normal distribution. However, many works, for example Micceri (1989) and Bazán et al. (2006), suggest that this assumption does not hold in many cases. Azevedo et al. (2011) recently proposed an IRM with a skew-normal distribution under the centred parametrization for the latent traits, considering a single group of examinees. In the present work, we developed an extension of this model to multiple groups. We developed two MCMC algorithms for parameter estimation, using the augmented data structure to represent the item response function (IRF); see Albert (1992). The first is a Metropolis-Hastings within Gibbs sampler. In the second, we use stochastic representations (creating a hierarchical structure) of the prior distributions of the latent traits and population parameters, which yields known full conditional distributions and thus enabled us to develop the full Gibbs sampler. We compared these algorithms using the effective sample size criterion; see Sahu (2002). The full Gibbs sampler performed best. We also evaluated the impact of the number of examinees per group, the number of items per group, the number of common items, the priors, and the asymmetry of the reference group on parameter recovery. The results indicated that our approach recovers all parameters properly, especially when the Jeffreys prior is used. Furthermore, the number of items per group and the number of examinees per group had a strong impact on the recovery of the true latent traits and item parameters, respectively. We analyzed a real data set in which we found evidence of asymmetry in the distribution of the latent traits of some groups. The results obtained with our model confirmed the presence of asymmetry in most groups.
We studied some diagnostic measures based on the predictive distribution of appropriate discrepancy measures. Finally, we compared the symmetric and asymmetric models using the criteria suggested by Spiegelhalter et al. (2002). The asymmetric model fits better according to all criteria.
Master's degree
Statistics
Master in Statistics
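
The centred parametrization describes a skew-normal latent-trait distribution by its mean, standard deviation, and skewness instead of by location, scale, and shape. The sketch below converts centred parameters to direct ones and draws latent traits through the stochastic representation Z = δ|U0| + sqrt(1 - δ²)·U1 with independent standard normals U0 and U1. It is an illustration using Python's standard library, not the dissertation's MCMC code, and the function names are hypothetical.

```python
import math
import random

B = math.sqrt(2.0 / math.pi)  # E|U| for a standard normal U


def centred_to_direct(mu, sigma, gamma):
    """Map centred (mean, sd, skewness) to direct (xi, omega, delta) parameters."""
    if abs(gamma) >= 0.995:  # skew-normal skewness is bounded by about 0.9953
        raise ValueError("skewness out of range for the skew-normal family")
    x = 2.0 * gamma / (4.0 - math.pi)
    r = math.copysign(abs(x) ** (1.0 / 3.0), x)  # real cube root
    delta = r / (B * math.sqrt(1.0 + r * r))
    omega = sigma / math.sqrt(1.0 - (B * delta) ** 2)
    xi = mu - omega * B * delta
    return xi, omega, delta


def sample_centred_skew_normal(n, mu, sigma, gamma, rng):
    """Draw n values via the skew-normal stochastic representation."""
    xi, omega, delta = centred_to_direct(mu, sigma, gamma)
    scale = math.sqrt(1.0 - delta * delta)
    out = []
    for _ in range(n):
        z = delta * abs(rng.gauss(0.0, 1.0)) + scale * rng.gauss(0.0, 1.0)
        out.append(xi + omega * z)
    return out


rng = random.Random(2012)
draws = sample_centred_skew_normal(100_000, mu=0.0, sigma=1.0, gamma=0.5, rng=rng)
m = sum(draws) / len(draws)
sd = math.sqrt(sum((d - m) ** 2 for d in draws) / len(draws))
skew = sum(((d - m) / sd) ** 3 for d in draws) / len(draws)
print(round(m, 2), round(sd, 2), round(skew, 2))  # roughly 0, 1, and 0.5
```

Because the centred parameters are the mean and standard deviation themselves, fixing them (for example 0 and 1 for a reference group) identifies the scale while skewness is estimated freely, which is one practical appeal of this parametrization.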

21

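Padilla Gómez, Juan Leonardo. "Modelos da teoria de resposta ao item multidimensionais assimétricos de grupos múltiplos para respostas dicotômicas sob um enfoque bayesiano" [Multiple-group asymmetric multidimensional item response theory models for dichotomous responses under a Bayesian approach]. Universidade Estadual de Campinas, 2014. http://repositorio.unicamp.br/jspui/handle/REPOSIP/306792.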

Abstract:

Advisors: Caio Lucidius Naberezny Azevedo, Dalton Francisco de Andrade
Master's dissertation, Universidade Estadual de Campinas, Instituto de Matemática, Estatística e Computação Científica
Abstract: This work proposes a new class of Multidimensional Item Response Theory (MIRT) models for dichotomous or dichotomized responses with a multiple-group structure. For the latent-trait distribution of each group, a new parametrization of the centered multivariate skew-normal distribution is proposed, combining the proposals of Lachos (2004) and Arellano-Valle et al. (2008). This parametrization not only ensures the identifiability of the proposed models but also simplifies the interpretation and estimation of their parameters. Hence, the model is an important alternative for solving the identifiability problems found in the single-group skewed multidimensional models proposed by Matos (2010) and Nojosa (2008). Simulation studies covering several situations of practical interest were conducted to evaluate the potential of the triad: modeling, estimation methods and diagnostic tools. The results indicate that models with a skewed latent-trait component generally produced more accurate estimates than the symmetric models. For model selection, the deviance information criterion (DIC) and the expected values of the Akaike information criterion (EAIC) and of the Bayesian information criterion (EBIC) were used. To assess model fit, posterior predictive checking methods were explored; these allow evaluation of the quality of the measurement instrument as well as of the model fit, both globally and with respect to specific assumptions such as test dimensionality. Regarding estimation, several MCMC algorithms proposed in the literature for other models were adapted and implemented, including the convergence-acceleration proposal of González (2004), and they were compared on convergence quality using the effective sample size of Sahu (2002).
The analysis of a real data set from the first stage of the 2013 UNICAMP admission exam was also performed.
Master's
Statistics
Master's degree in Statistics
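The abstract names DIC, EAIC and EBIC as the model-selection criteria. As a minimal sketch, assuming the commonly used definitions computed from MCMC output (posterior mean deviance Dbar, effective number of parameters pD = Dbar - D(posterior mean), DIC = Dbar + pD, EAIC = Dbar + 2p, EBIC = Dbar + p log n), the three criteria can be computed as follows. How the dissertation counts p and n for a multiple-group MIRT model is not stated here, so those inputs are assumptions, and the function name is hypothetical.

```python
import math
from statistics import fmean


def bayesian_criteria(deviance_draws, deviance_at_mean, n_params, n_obs):
    """Return DIC, EAIC, EBIC and pD computed from MCMC output.

    deviance_draws   -- D = -2 log L evaluated at each retained posterior draw
    deviance_at_mean -- D evaluated at the posterior mean of the parameters
    n_params         -- number of model parameters p (assumed count)
    n_obs            -- sample size n used in the BIC-type penalty (assumed)
    """
    d_bar = fmean(deviance_draws)        # posterior mean deviance
    p_d = d_bar - deviance_at_mean       # effective number of parameters
    return {
        "DIC": d_bar + p_d,
        "EAIC": d_bar + 2 * n_params,
        "EBIC": d_bar + n_params * math.log(n_obs),
        "pD": p_d,
    }
```

Lower values of each criterion favor a model; the criteria are compared across candidate models fitted to the same data.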

APA, Harvard, Vancouver, ISO, and other styles

22

Sunny, Cijy Elizabeth. "Stakeholders’ Conceptualization of Students’ Attitudes and Persistence towards STEM: A Mixed Methods Instrument Development and Validation Study." University of Cincinnati / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1521190666039014.

Full text

APA, Harvard, Vancouver, ISO, and other styles

23

Bush, Joan Spooner. "A Comparison of Traditional Norming and Rasch Quick Norming Methods." Thesis, University of North Texas, 1993. https://digital.library.unt.edu/ark:/67531/metadc277818/.

Full text

Abstract:

The simplicity and ease of use of the Rasch procedure is a decided advantage. The test user needs only two numbers: the frequency of persons who answered each item correctly and the Rasch-calibrated item difficulty, usually part of an existing item bank. Norms can be computed quickly for any specific group of interest. In addition, once the selected items from the calibrated bank are normed, any test built from the item bank is automatically norm-referenced. Thus, it was concluded that the Rasch quick norm procedure is a meaningful alternative to traditional classical true-score norming for test users who desire normative data.
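Under the Rasch model the raw score is a sufficient statistic for ability, which is why a bank of calibrated difficulties is enough to score any form built from it. The sketch shows the standard maximum-likelihood raw-score-to-ability conversion; it is not necessarily Bush's quick-norm computation, and the function names and the Newton-Raphson solver are illustrative choices.

```python
import math


def rasch_p(theta, b):
    """Rasch probability of a correct response for ability theta and difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def ability_for_raw_score(raw, difficulties, tol=1e-8, max_iter=100):
    """Maximum-likelihood ability for a raw score, solved by Newton-Raphson.

    The ML estimate satisfies sum_i P_i(theta) = raw. Zero and perfect scores
    have no finite solution, so they are rejected.
    """
    n = len(difficulties)
    if not 0 < raw < n:
        raise ValueError("raw score must be strictly between 0 and the number of items")
    theta = math.log(raw / (n - raw))  # log-odds of the proportion correct as a start
    for _ in range(max_iter):
        probs = [rasch_p(theta, b) for b in difficulties]
        expected = sum(probs)                      # expected raw score at theta
        info = sum(p * (1 - p) for p in probs)     # derivative of the expected score
        step = (raw - expected) / info
        theta += step
        if abs(step) < tol:
            break
    return theta
```

Calling ability_for_raw_score(r, difficulties) for r = 1, ..., n - 1 yields a raw-score-to-ability table for a hypothetical form; norms would then come from the distribution of those abilities in the reference group.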

APA, Harvard, Vancouver, ISO, and other styles

24

Silva, Wellington. "Eficácia dos processos de linkagem na avaliação educacional em larga escala." [Effectiveness of linking processes in large-scale educational assessment.] Universidade Federal de Juiz de Fora (UFJF), 2010. https://repositorio.ufjf.br/jspui/handle/ufjf/2699.

Full text

Abstract:

Previous issue date: 2010-09-14
In 1997, the National System of Basic Education Evaluation (SAEB) defined the proficiency scale for Brazil. Since then, almost all large-scale assessments carried out by Brazilian states have sought to keep their results comparable with this scale by means of Item Response Theory (IRT) methodology. However, a variety of situations is observed when the different assessments carried out by Brazilian states, or even by SAEB itself, are analyzed. This work presents some of the technical aspects needed to ensure comparability in assessment linking procedures, as well as the characteristics of the SAEB assessments and of some Brazilian states' assessments over time.
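The abstract does not say which linking method SAEB or the state programs use. As one common illustration, assuming a common-item (anchor) design with 2PL or 3PL item parameters, mean/sigma linking places a new calibration on the reference scale via theta_ref = A * theta_new + B, where A and B come from the anchor items' difficulty estimates on both scales. The function names are hypothetical.

```python
from statistics import fmean, pstdev


def mean_sigma_constants(b_new, b_ref):
    """Mean/sigma linking constants from anchor-item difficulties.

    b_new -- anchor difficulties estimated on the new calibration's scale
    b_ref -- the same anchors' difficulties on the reference scale
    Returns (A, B) such that theta_ref = A * theta_new + B.
    """
    if len(b_new) != len(b_ref):
        raise ValueError("anchor lists must have the same length")
    A = pstdev(b_ref) / pstdev(b_new)
    B = fmean(b_ref) - A * fmean(b_new)
    return A, B


def transform_item(a, b, A, B):
    """Put an item's discrimination and difficulty on the reference scale.

    The lower asymptote c of a 3PL item is unaffected by a linear scale change.
    """
    return a / A, A * b + B
```

Because A is a ratio of standard deviations, population and sample SDs give the same value as long as both scales use the same one. Characteristic-curve methods such as Stocking-Lord are a common alternative that uses all item parameters rather than only the difficulties.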

APA, Harvard, Vancouver, ISO, and other styles

25

Azevedo, Caio Lucidius Naberezny. "Modelos longitudinais de grupos múltiplos multiníveis na teoria da resposta ao item: métodos de estimação e seleção estrutural sob uma perspectiva bayesiana." [Multilevel multiple-group longitudinal models in item response theory: estimation methods and structural selection from a Bayesian perspective.] Universidade de São Paulo, 2008. http://www.teses.usp.br/teses/disponiveis/45/45133/tde-15042008-165256/.

Full text

Abstract:

In this work we propose a Bayesian framework, using an augmented-data scheme, to analyze longitudinal multiple-group models (LMGMIRT) in Item Response Theory (IRT). The framework consists of the triad: modeling, estimation methods and diagnostic tools for the LMGMIRT. For the modeling, we exploited multivariate and multilevel structures to represent the hierarchical nature of longitudinal multiple-group data. This approach accommodates several submodels, such as multiple-group models and single-group longitudinal models. We studied some positive and negative aspects of both approaches. The multivariate approach represents many dependence structures directly, and many of them can easily be incorporated into the estimation process. This allows, for example, an unstructured covariance matrix to be considered, which in turn gives information about the most appropriate dependence structure. The multilevel approach, on the other hand, provides a more direct interpretation of the model, univariate full conditional distributions, an easy way to include auxiliary information, and the incorporation of within- and between-subject (group) sources of variability, among other advantages. For estimation, we developed a procedure based on Markov chain Monte Carlo (MCMC) simulation. We showed that the full conditional distributions have known analytical forms and are easy to sample from. Although this approach is computationally demanding, it circumvents many problems found in other procedures, such as limits on the number of groups, limits on the number of measurement occasions, the choice of covariance structure, latent-trait asymmetry, and data imputation. Furthermore, within the MCMC methodology, we developed a procedure to select covariance matrices using Reversible Jump MCMC (RJMCMC).
Simulation studies show that the model, the estimation method and the model-selection procedure produce quite satisfactory results. The studies also indicate that the methodology is robust to the choice of priors and of initial values. The estimation methods can be extended to other situations in a straightforward way. Some of the diagnostic techniques studied assess model fit globally; others indicate departures from specific assumptions, such as latent-trait normality. The methodology also provides ways to assess the quality of the test or questionnaire used to measure the latent traits. Finally, the analysis of a real data set with some of the models developed illustrates the potential of the methodology, and the results indicate advantages in using longitudinal IRT models for educational repeated-measurement data instead of assuming independence.
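An augmented-data scheme of this kind typically introduces a continuous latent response Z_ij for each binary answer so that the remaining full conditionals become normal. A minimal sketch of that single Gibbs step, assuming a normal-ogive model P(Y_ij = 1) = Phi(a_j * theta_i - b_j) in the style of Albert (1992), is shown here. The dissertation's longitudinal multiple-group structure, the draws for theta, a, b and the covariance matrices, and the RJMCMC step are all omitted, and the function names are hypothetical.

```python
import random
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def draw_truncated_normal(mean, positive, rng):
    """Draw Z ~ N(mean, 1) restricted to Z > 0 (positive=True) or Z <= 0.

    Inverse-CDF method: draw a uniform between the CDF values of the
    truncation bounds and map it back through the normal quantile function.
    """
    cut = _STD.cdf(-mean)  # P(Z <= 0) when Z ~ N(mean, 1)
    u = rng.uniform(cut, 1.0) if positive else rng.uniform(0.0, cut)
    u = min(max(u, 1e-12), 1 - 1e-12)  # inv_cdf is undefined at exactly 0 or 1
    return mean + _STD.inv_cdf(u)


def augment_responses(y, theta, a, b, rng):
    """One data-augmentation step: a latent Z_ij for every observed 0/1 response y[i][j]."""
    return [[draw_truncated_normal(a[j] * theta[i] - b[j], y[i][j] == 1, rng)
             for j in range(len(a))]
            for i in range(len(theta))]
```

Given the augmented Z and normal priors, the item parameters and latent traits have normal full conditionals, which is what makes this kind of sampler straightforward to extend.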

APA, Harvard, Vancouver, ISO, and other styles

26

Buatois, Simon. "Novel pharmacometric methods to improve clinical drug development in progressive diseases." Thesis, Sorbonne Paris Cité, 2018. http://www.theses.fr/2018USPCC133.

Full text

Abstract:

In the mid-1990s, model-based approaches were used mainly as supporting tools for drug development. Restricted to a "rescue mode" after drug-development failures, their impact was relatively limited. Today, the merits of these approaches are widely recognized by healthcare stakeholders, and they play a crucial role in drug development for progressive diseases. Despite their numerous advantages, model-based approaches have important drawbacks that limit their use in confirmatory trials. Traditional pharmacometric (PMX) analyses rely on model selection and consequently ignore model-structure uncertainty when generating statistical inference. This can lead to over-optimistic confidence intervals and type I error inflation. Two projects of this thesis investigated the value of innovative PMX approaches for addressing part of these shortcomings in a hypothetical dose-finding study for a progressive disorder. The model-averaging approach coupled with a combined likelihood ratio test showed promising results and represents a further step toward using PMX for the primary analysis of dose-finding studies. In the learning phase, PMX is a key discipline applied at every stage of drug development to gain insight into drug, mechanism and disease characteristics, with the ultimate goal of aiding efficient drug development. In this thesis, the merits of PMX analysis were evaluated in the context of Parkinson's disease. An item response theory longitudinal model was developed that precisely describes disease progression in Parkinson's disease patients while acknowledging the composite nature of a patient-reported outcome. In conclusion, this thesis advances the use of PMX to support efficient drug development and regulatory decisions.
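The abstract does not detail the averaging scheme. As a hedged illustration only, one common way to average over candidate models is to weight them by Akaike weights, w_k proportional to exp(-0.5 * (AIC_k - AIC_min)), and combine their predictions. The thesis's combined likelihood ratio test is not reproduced, and the function names are hypothetical.

```python
import math


def akaike_weights(aics):
    """Model weights w_k proportional to exp(-0.5 * (AIC_k - min AIC)), summing to 1."""
    best = min(aics)
    raw = [math.exp(-0.5 * (aic - best)) for aic in aics]  # subtracting min avoids underflow
    total = sum(raw)
    return [r / total for r in raw]


def model_averaged_prediction(predictions, aics):
    """Weighted average of per-model predictions (e.g., predicted effect at one dose)."""
    weights = akaike_weights(aics)
    return sum(w * p for w, p in zip(weights, predictions))
```

Averaging in this way keeps every candidate model in play instead of conditioning on a single selected one, which is the source of the over-optimistic intervals the abstract describes.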

APA, Harvard, Vancouver, ISO, and other styles

27

Fugita, Felipe. "Avaliação educacional : um olhar matemático." [Educational assessment: a mathematical perspective.] Repositório Institucional da UFABC, 2018.

Find full text

Abstract:

Advisor: Prof. Dr. Daniel Miranda Machado
Master's dissertation - Universidade Federal do ABC, Programa de Pós-Graduação em Mestrado Profissional em Matemática em Rede Nacional - PROFMAT, Santo André, 2017.
One of the goals of this work is to explain Item Response Theory, known as IRT, emphasizing the Three-Parameter Logistic model and describing its main characteristics. Another objective is to demonstrate how educators can use statistical tools within a spreadsheet to: verify the quality and reliability of test questions; examine whether there is a correlation between two assessment tools; use a student's school average to predict his or her performance in entrance examinations; among other possibilities. To explain IRT and its method of parameter estimation by maximum likelihood, this work presents the mathematical, probabilistic and statistical models that are the pillars of the theory. It also describes how the large-scale educational assessment programs of various countries use IRT to monitor the performance of their education systems. Then, this work presents a selection of statistical tools, specifically the correlation coefficient, the least squares method and the point-biserial correlation, which can contribute to routine educational assessment. Also provided are illustrated examples of spreadsheets with step-by-step descriptions of their construction and the commands used. Thus, the work hopes to contribute to the understanding of IRT and, consequently, of the educational indicators produced by large-scale assessment programs, as well as to benefit educators in their practice and reflection on methods of educational evaluation.
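The dissertation works in a spreadsheet; the two ingredients it emphasizes, the three-parameter logistic (3PL) item response function and the point-biserial correlation for item analysis, can be sketched in Python. The scaling constant D = 1.7 is a common convention assumed here, and the point-biserial uses the uncorrected total score (the item itself is not removed from the total). Function names are illustrative.

```python
import math
from statistics import fmean, pstdev


def p_3pl(theta, a, b, c, D=1.7):
    """3PL probability of a correct answer: c + (1 - c) / (1 + exp(-D*a*(theta - b)))."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def point_biserial(item_scores, total_scores):
    """Point-biserial correlation between a 0/1 item and the total score.

    r_pb = (M1 - M0) / s * sqrt(p * q), where M1 and M0 are the mean totals of
    examinees who got the item right and wrong, s is the population SD of the
    totals, and p is the proportion correct. This equals the Pearson correlation.
    """
    ones = [t for x, t in zip(item_scores, total_scores) if x == 1]
    zeros = [t for x, t in zip(item_scores, total_scores) if x == 0]
    p = len(ones) / len(item_scores)
    s = pstdev(total_scores)
    return (fmean(ones) - fmean(zeros)) / s * math.sqrt(p * (1 - p))
```

At theta = b the 3PL probability is halfway between the guessing floor c and 1; an item whose point-biserial is near zero or negative is a candidate for review.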

APA, Harvard, Vancouver, ISO, and other styles

28

"Stability and sensitivity of a model-based person-fit index in detecting item pre-knowledge in computerized adaptive test." Thesis, 2008. http://library.cuhk.edu.hk/record=b6074790.

Full text

Abstract:

Item response theory is a modern test theory that focuses on the performance of each item. Under this framework, test takers' performance on a test item can be predicted from a set of abilities. The relationship between test takers' item performances and the abilities underlying them can be described by a monotonically increasing function called an item characteristic curve. For various personal reasons, test takers' performances may depart from the response patterns predicted by the underlying test model. To quantify the extent of departure of these aberrant response patterns, a number of methods have been developed under the heading of "person-fit statistics"; the degree of aberration is expressed as a person-fit index. In computerized adaptive testing (CAT), test takers with different abilities answer different numbers of questions, and the difficulties of the items administered to them are usually clustered around their abilities. For this reason, applying person-fit indices to measure misfit in the CAT environment is difficult.
Once frequent access to the item bank became feasible, test takers could memorize blocks of test items and share them with future test takers. Individuals with prior knowledge of some items may use that information to obtain high scores, so their test scores are artificially inflated. FLOR is a posterior log-odds ratio index for detecting the use of item pre-knowledge. It can be applied both to fixed-item, fixed-length tests and in the CAT environment. It is a model-based index in which aberrant models are defined for the situation of item pre-knowledge, and it describes the likelihood that a response pattern arises from the aberrant models.
The present study used the hf plot to assess the sensitivity of the person-fit indices. An hf plot is a plot of hit rate against false-alarm rate; a higher hit rate usually comes with a higher false-alarm rate. The hf plot is a good tool for comparing indices by inspecting how quickly the curves rise: a sensitive index gives a faster-rising curve. In this study, the sensitivity of an index was defined as the speed of rise of its hf plot, represented by a parameter, hftau, estimated from the hf plot data.
The present study assessed the stability of FLOR over other variables unrelated to item pre-knowledge. FLOR was stable over the discrimination and difficulty parameters of test items, over the positions of the exposed items in the test, and over the initial assignment of the prior probability of item pre-knowledge. However, the asymptotes (guessing factor) and the probabilities of item exposure seriously affected the final values of FLOR.
The present study also found that FLOR has much better sensitivity than other indices in detecting item pre-knowledge. Regarding sensitivity across abilities, in fixed-length, fixed-item tests FLOR was most sensitive among low-ability test takers and least sensitive among high-ability test takers. However, its sensitivity became the same across ability levels when items with difficulties matching the test takers' abilities were used. The number of beneficiaries among the test takers did not affect the sensitivity of FLOR. Moreover, in a simulation testing its differentiating power, FLOR effectively distinguished item pre-knowledge from other causes of person misfit (test anxiety, player, random response and challenger).
After the stability and sensitivity of FLOR had been investigated, its application in the CAT environment became the main concern. The studies found that both the test length and the number of exposed items affect the final value of FLOR. In fixed-length CAT, FLOR is much more sensitive than lz and CUSUM in detecting item pre-knowledge, and its sensitivity there was the same as in fixed-length, fixed-item tests. If the test length was allowed to vary, the sensitivity of FLOR in CAT was slightly weakened; the Adjusted FLOR index could increase the sensitivity. Regarding the effect of ability, the abilities of the test takers in CAT did not affect the sensitivity of FLOR or Adjusted FLOR.
Hui Hing-fai.
Adviser: Kit-tai Hau.
Source: Dissertation Abstracts International, Volume: 70-09, Section: A, page: .
Thesis (Ed.D.)--Chinese University of Hong Kong, 2008.
Includes bibliographical references (leaves 108-111).
Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Abstracts in English and Chinese.
School code: 1307.
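FLOR itself depends on aberrant models that this abstract does not specify, so it is not reproduced here. As a point of reference, the sketch computes the lz person-fit statistic that the study compares against (the standardized log-likelihood of a 0/1 response pattern), plus a helper returning one (hit rate, false-alarm rate) pair; sweeping the threshold traces an hf plot. Flagging low lz values follows the usual convention; for an index such as FLOR, where large values indicate aberrance, the comparison would be reversed. Function names are hypothetical.

```python
import math


def lz_statistic(responses, probs):
    """Standardized log-likelihood person-fit index lz for 0/1 responses.

    probs are the model-implied probabilities of a correct answer for this person.
    Large negative values flag patterns that are unlikely under the model.
    """
    l0 = expected = variance = 0.0
    for u, p in zip(responses, probs):
        log_p, log_q = math.log(p), math.log(1 - p)
        l0 += u * log_p + (1 - u) * log_q
        expected += p * log_p + (1 - p) * log_q
        variance += p * (1 - p) * (log_p - log_q) ** 2
    return (l0 - expected) / math.sqrt(variance)


def hit_and_false_alarm(flag_scores, is_aberrant, threshold):
    """Hit rate and false-alarm rate when scores <= threshold are flagged.

    Each threshold gives one point of the hf plot.
    """
    flagged = [s <= threshold for s in flag_scores]
    n_aberrant = sum(is_aberrant)
    n_normal = len(is_aberrant) - n_aberrant
    hits = sum(f and a for f, a in zip(flagged, is_aberrant))
    false_alarms = sum(f and not a for f, a in zip(flagged, is_aberrant))
    return hits / n_aberrant, false_alarms / n_normal
```

A pattern that answers hard items correctly while missing easy ones, as might happen with pre-knowledge of difficult items, receives a strongly negative lz.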

APA, Harvard, Vancouver, ISO, and other styles

29

Deng, Nina. "Evaluating IRT- and CTT- based methods of estimating classification consistency and accuracy indices from single administrations." 2011. https://scholarworks.umass.edu/dissertations/AAI3482610.

Full text

Abstract:

Three decision consistency and accuracy (DC/DA) methods, the Livingston and Lewis (LL) method, the LEE method, and the Hambleton and Han (HH) method, were evaluated. The purposes of the study were: (1) to evaluate the accuracy and robustness of these methods, especially when their assumptions were not well satisfied, (2) to investigate the "true" DC/DA indices under various conditions, and (3) to assess the impact of the choice of reliability estimate on the LL method. Four simulation studies were conducted: Study 1 looked at various test lengths, Study 2 focused on local item dependency (LID), Study 3 examined the consequences of IRT model-data misfit, and Study 4 examined the impact of using different scoring metrics. Finally, a real-data study was conducted in which no advantage was given to any model or assumption. The results showed that LID and model misfit had a negative impact on the "true" DA index and made all selected methods overestimate it. In contrast, the DC estimates were minimally affected by these factors, although the LL method gave poorer estimates for short tests and the LEE and HH methods were less robust for tests with a high level of LID. Among the selected methods, LEE and HH gave nearly identical results across all conditions, while HH offered more flexibility with complex scoring metrics. The LL method was sensitive to the choice of test reliability estimate: LL with Cronbach's alpha consistently underestimated DC, while LL with stratified alpha performed noticeably better, with smaller bias and greater robustness across conditions. Lastly, the author hopes to make software available soon to permit wider use of the HH method; the other methods in the study are already well supported by easy-to-use software.
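The abstract does not spell out the LL, LEE or HH algorithms. A generic IRT-based Monte Carlo estimate of decision consistency and accuracy, in the same spirit but not the exact HH procedure, can be sketched as follows, assuming known 2PL item parameters, abilities treated as fixed, and raw-score cut points. Function names and parameters are hypothetical.

```python
import math
import random


def p_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def classify(score, cuts):
    """Index of the performance category for a raw score (cuts in ascending order)."""
    return sum(score >= c for c in cuts)


def simulate_dc_da(thetas, items, cuts, n_reps=200, seed=0):
    """Monte Carlo estimates of decision consistency (DC) and accuracy (DA).

    thetas -- abilities treated as known (e.g., estimates from one administration)
    items  -- list of (a, b) 2PL item parameters
    cuts   -- raw-score cut points, ascending
    DC: two simulated parallel administrations assign the same category.
    DA: a simulated administration matches the category of the expected (true) score.
    """
    rng = random.Random(seed)
    agree = accurate = total = 0
    for theta in thetas:
        probs = [p_2pl(theta, a, b) for a, b in items]
        true_cat = classify(sum(probs), cuts)
        for _ in range(n_reps):
            s1 = sum(rng.random() < p for p in probs)
            s2 = sum(rng.random() < p for p in probs)
            c1, c2 = classify(s1, cuts), classify(s2, cuts)
            agree += c1 == c2
            accurate += c1 == true_cat
            total += 1
    return agree / total, accurate / total
```

Examinees far from a cut point are almost always classified consistently; DC and DA drop as more of the ability distribution sits near the cuts or as the test gets shorter.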

APA, Harvard, Vancouver, ISO, and other styles

30

Meng, Yu. "Comparison of kernel equating and item response theory equating methods." 2012. https://scholarworks.umass.edu/dissertations/AAI3518262.

Full text

Abstract:

The kernel method of test equating is a unified approach to test equating with some advantages over traditional equating methods. It is therefore important to evaluate comprehensively the usefulness and appropriateness of the kernel equating (KE) method, as well as its advantages and disadvantages relative to several popular item response theory (IRT) equating techniques. The purpose of this study was to evaluate the accuracy and stability of KE and IRT true-score equating by manipulating several common factors known to influence equating results. Three equating methods (kernel post-stratification equating, Stocking-Lord and Mean/Sigma) were compared against an established equating criterion. A wide variety of conditions was simulated to match realistic situations, reflecting differences in sample size, anchor test length, and group ability. The systematic and random errors of equating were summarized with bias statistics and the standard error of equating (SEE) and compared across methods. The methods that performed better overall under specific conditions were recommended on the basis of the root mean squared error (RMSE). As expected, equating error generally decreased across all methods as the number of anchor items and the sample size increased. Apart from method effects, group differences in ability had the greatest impact on equating error in this study. The accuracy and stability of each method depended on the region of the score scale where comparisons were made. Overall, kernel equating was more stable in most situations but less accurate than IRT equating under the conditions studied. The interactions between pairs of factors investigated in this study seemed more influential, and more beneficial, for IRT equating than for KE.
Further practical recommendations were made for future study: for example, using alternative methods of data simulation to remove the advantage of the IRT equating methods.
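A minimal sketch of the kernel equating idea: each discrete score distribution is continuized with a Gaussian kernel that preserves its mean and variance, and a score x on form X is equated as e_Y(x) = G^-1(F(x)). This assumes an equivalent-groups design; the post-stratification (NEAT) weighting used in the study, presmoothing, and bandwidth selection are omitted, and the bandwidth 0.6 is an arbitrary placeholder. Function names are hypothetical.

```python
import math
from statistics import NormalDist

_PHI = NormalDist().cdf  # standard normal CDF


def continuized_cdf(scores, probs, h):
    """Gaussian-kernel continuization of a discrete score distribution.

    F_h(x) = sum_j r_j * Phi((x - a*x_j - (1 - a)*mu) / (a*h)),
    with a = sqrt(var / (var + h^2)), which keeps the mean and variance.
    """
    mu = sum(r * x for x, r in zip(scores, probs))
    var = sum(r * (x - mu) ** 2 for x, r in zip(scores, probs))
    a = math.sqrt(var / (var + h * h))

    def cdf(x):
        return sum(r * _PHI((x - a * xj - (1 - a) * mu) / (a * h))
                   for xj, r in zip(scores, probs))
    return cdf


def kernel_equate(x, scores_x, probs_x, scores_y, probs_y, h_x=0.6, h_y=0.6):
    """Equate score x from form X to form Y via G^-1(F(x)), inverting G by bisection."""
    F = continuized_cdf(scores_x, probs_x, h_x)
    G = continuized_cdf(scores_y, probs_y, h_y)
    target = F(x)
    lo, hi = min(scores_y) - 10.0, max(scores_y) + 10.0  # generous bracket for the inverse
    for _ in range(100):
        mid = (lo + hi) / 2
        if G(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

If form Y's score distribution is form X's shifted up by one point, this sketch equates every x to x + 1, as one would expect.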

APA, Harvard, Vancouver, ISO, and other styles

31

Kuo, Hsiu-Fen, and 郭秀芬. "The Performance of Different Estimation Methods under Multidimensional Item Response Theory." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/95824695563257046316.

Full text

Abstract:

Master's
National Taichung University of Education
Graduate Institute of Educational Measurement and Statistics
101
Current research on plausible values is based on unidimensional IRT (UIRT); research on how auxiliary variables influence the estimation of population statistics and item parameters under MIRT is rare. The purpose of this study is to explore, using simulated data, the performance of different estimation methods under MIRT. The study is based on the multidimensional random coefficients multinomial logit model (MRCMLM). EAP, EAP_AV, MLE, WLE, PV_noAV and PV are compared in terms of the estimation efficiency of individual abilities and population statistics, both with and without auxiliary variables.
The results show that as the number of items increases, estimation error decreases, especially for individual ability estimates. In multidimensional item response theory, increasing the correlation between dimensions does not improve estimation accuracy. When auxiliary variables are used, the impact of the correlation between the auxiliary variables and ability is limited by the item parameter settings.
When auxiliary variables are incorporated, the parameters are estimated well, even when the item parameters and ability parameters have different distributions. When auxiliary variables are not incorporated, plausible values still perform well in recovering the standard deviation.
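To make the compared estimators concrete, here is a unidimensional Rasch simplification, not the MRCMLM used in the study: EAP as the posterior mean on a grid, an EAP_AV-style variant obtained by shifting the prior mean using auxiliary information, and plausible values as random draws from the posterior. The grid approximation, the normal prior and the function names are assumptions for illustration; operational plausible-value methods also propagate uncertainty in the latent regression parameters, which this sketch ignores.

```python
import math
import random
from statistics import NormalDist


def rasch_likelihood(responses, difficulties, theta):
    """Likelihood of a 0/1 response vector under the Rasch model."""
    like = 1.0
    for u, b in zip(responses, difficulties):
        p = 1.0 / (1.0 + math.exp(-(theta - b)))
        like *= p if u == 1 else 1.0 - p
    return like


def posterior_grid(responses, difficulties, prior_mean=0.0, prior_sd=1.0, n_nodes=81):
    """Discretized posterior of theta on an equally spaced grid over prior_mean +/- 4 SD."""
    prior = NormalDist(prior_mean, prior_sd)
    nodes = [prior_mean + prior_sd * (-4 + 8 * k / (n_nodes - 1)) for k in range(n_nodes)]
    weights = [prior.pdf(t) * rasch_likelihood(responses, difficulties, t) for t in nodes]
    total = sum(weights)
    return nodes, [w / total for w in weights]


def eap(responses, difficulties, **prior):
    """Expected a posteriori (EAP) ability estimate."""
    nodes, post = posterior_grid(responses, difficulties, **prior)
    return sum(t * w for t, w in zip(nodes, post))


def plausible_values(responses, difficulties, n_draws=5, seed=0, **prior):
    """Random draws from the grid-approximated posterior of theta."""
    nodes, post = posterior_grid(responses, difficulties, **prior)
    return random.Random(seed).choices(nodes, weights=post, k=n_draws)
```

For example, eap(responses, difficulties, prior_mean=0.4) mimics conditioning on an auxiliary variable that predicts an ability 0.4 SD above average; the value 0.4 is hypothetical.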

APA, Harvard, Vancouver, ISO, and other styles

32

Sabouri, Pooneh 1980. "Alternative estimation approaches for some common Item Response Theory models." Thesis, 2010. http://hdl.handle.net/2152/ETD-UT-2010-08-1841.

Full text

Abstract:

In this report we give a brief introduction to Item Response Theory models and multilevel models. The general assumptions of two classical Item Response Theory models, the 1PL and 2PL, are discussed. We follow this discussion by introducing a multilevel framework for these two models. We explain Bock and Aitkin's (1981) approach to estimating item parameters for these two models. Finally, we illustrate these models with LSAT exam data using two statistical software packages, R and Stata.
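The Bock and Aitkin (1981) approach is marginal maximum likelihood via EM: abilities are integrated out over quadrature nodes, and the E-step produces "artificial data", namely expected examinee counts and expected correct counts at each node. The sketch shows that E-step for the 2PL (the 1PL is the special case with equal discriminations). It uses equally spaced nodes with normalized N(0,1) weights, a simplifying assumption in place of Gauss-Hermite quadrature, and function names are hypothetical.

```python
import math
from statistics import NormalDist


def quadrature(n_nodes=41, lo=-4.0, hi=4.0):
    """Equally spaced nodes with N(0,1) density weights normalized to sum to 1."""
    nodes = [lo + (hi - lo) * k / (n_nodes - 1) for k in range(n_nodes)]
    dens = [NormalDist().pdf(x) for x in nodes]
    total = sum(dens)
    return nodes, [d / total for d in dens]


def p_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def e_step(data, items, nodes, weights):
    """Bock-Aitkin E-step.

    data  -- list of 0/1 response patterns
    items -- list of (a, b) 2PL item parameters
    Returns the marginal log-likelihood, expected examinee counts n_k at each
    node, and expected correct counts r[j][k] for item j at node k.
    """
    n_k = [0.0] * len(nodes)
    r = [[0.0] * len(nodes) for _ in items]
    loglik = 0.0
    for pattern in data:
        joint = []
        for x, w in zip(nodes, weights):
            like = w
            for u, (a, b) in zip(pattern, items):
                p = p_2pl(x, a, b)
                like *= p if u == 1 else 1.0 - p
            joint.append(like)
        marginal = sum(joint)
        loglik += math.log(marginal)
        for k, jk in enumerate(joint):
            post = jk / marginal  # posterior weight of node k for this pattern
            n_k[k] += post
            for j, u in enumerate(pattern):
                r[j][k] += post * u
    return loglik, n_k, r
```

In the M-step, each item's (a, b) maximizes sum_k [r_jk log P_j(x_k) + (n_k - r_jk) log(1 - P_j(x_k))], a weighted logistic regression on the nodes; alternating the two steps until the marginal log-likelihood stabilizes gives the item estimates.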

33

Sohn, Youngsoon. "A comparison of methods for item analysis and DIF using classical test theory, item response theory, and generalized linear model." 2009. http://purl.galileo.usg.edu/uga%5Fetd/sohn%5Fyoungsoon%5F200905%5Fma.

34

Taljaard, Monica. "Non-response error in surveys." Diss., 1997. http://hdl.handle.net/10500/16167.

Abstract:

Non-response is an error common to most surveys. In this dissertation, the error of non-response is described in terms of its sources and its contribution to the Mean Square Error of survey estimates. Various response and completion rates are defined. Techniques are examined that can be used to identify the extent of nonresponse bias in surveys. Methods to identify auxiliary variables for use in nonresponse adjustment procedures are described. Strategies for dealing with nonresponse are classified into two types, namely preventive strategies and post hoc adjustments of data. Preventive strategies discussed include the use of call-backs and follow-ups and the selection of a probability sub-sample of non-respondents for intensive follow-ups. Post hoc adjustments discussed include population and sample weighting adjustments and raking ratio estimation to compensate for unit non-response as well as various imputation methods to compensate for item non-response.
Mathematical Sciences
M. Com. (Statistics)
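Raking ratio estimation, listed in the abstract as a post hoc adjustment, can be carried out by iterative proportional fitting. The minimal sketch below assumes a two-way table of weighted sample counts with strictly positive cells and known population margins whose totals agree. It alternately rescales rows and columns until both sets of margins match. The table and targets in the usage lines are hypothetical.

```python
def rake(table, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Iterative proportional fitting of a 2-D table of weighted counts
    to known population row and column totals (raking)."""
    if abs(sum(row_targets) - sum(col_targets)) > 1e-9:
        raise ValueError("row and column targets must have the same total")
    t = [list(map(float, row)) for row in table]
    for _ in range(max_iter):
        # Scale each row to its population total.
        for i, target in enumerate(row_targets):
            s = sum(t[i])
            t[i] = [x * target / s for x in t[i]]
        # Scale each column to its population total.
        for j, target in enumerate(col_targets):
            s = sum(row[j] for row in t)
            for row in t:
                row[j] *= target / s
        # Columns now match exactly; stop once the rows also match.
        if all(abs(sum(t[i]) - r) < tol for i, r in enumerate(row_targets)):
            break
    return t

if __name__ == "__main__":
    sample = [[20.0, 30.0], [25.0, 25.0]]  # hypothetical weighted counts
    raked = rake(sample, row_targets=[60.0, 40.0], col_targets=[55.0, 45.0])
    # Per-unit adjustment factor for each cell = raked count / original count.
    print([[r / s for r, s in zip(rr, sr)] for rr, sr in zip(raked, sample)])
```

In a survey setting, each respondent's weight would be multiplied by the adjustment factor of the cell they fall in. This compensates for unit non-response relative to the known margins.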

35

Tělupil, Dominik. "Adaptivní testování pro odhad znalostí" [Adaptive testing for knowledge estimation]. Master's thesis, 2018. http://www.nusl.cz/ntk/nusl-386945.

Abstract:

In this thesis, we describe and analyze computerized adaptive tests (CAT), the class of psychometric tests in which items are selected based on the current estimate of the respondent's ability. We focus on tests based on dichotomous IRT (item response theory) models. We present criteria for item selection, methods for ability estimation and termination criteria, as well as methods for exposure rate control and content balancing. In the analytical part, the effect of CAT settings on the average test length and on the absolute bias of ability estimates is investigated using linear regression models. We provide a post hoc analysis of real data from an admission test with unknown true ability values, as well as a simulation study based on simulated answers of respondents with known true ability values. In the last chapter of the thesis, we investigate the possibilities of analysing adaptive tests in R and of creating a real CAT.
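As an illustration of the item selection and termination criteria mentioned in the abstract, here is a minimal sketch. It assumes two common choices: the maximum Fisher information rule under a 2PL model, and a stopping rule based on a standard-error threshold or a maximum test length. These are not necessarily the settings studied in the thesis. The item pool and the thresholds `se_target` and `max_items` are hypothetical, and ability estimation and exposure control are omitted.

```python
import math

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def select_item(theta_hat, pool, administered):
    """Maximum-information rule: the unused item most informative at theta_hat."""
    candidates = [j for j in range(len(pool)) if j not in administered]
    if not candidates:
        raise ValueError("item pool exhausted")
    return max(candidates, key=lambda j: info_2pl(theta_hat, *pool[j]))

def standard_error(theta_hat, pool, administered):
    """Approximate SE of theta_hat: 1 / sqrt(test information)."""
    total = sum(info_2pl(theta_hat, *pool[j]) for j in administered)
    return float("inf") if total == 0 else 1.0 / math.sqrt(total)

def should_stop(theta_hat, pool, administered, se_target=0.3, max_items=20):
    """Terminate when the SE falls below se_target or the length cap is reached."""
    return (len(administered) >= max_items
            or standard_error(theta_hat, pool, administered) < se_target)

if __name__ == "__main__":
    pool = [(1.0, -2.0), (1.0, 0.0), (1.0, 2.0), (2.0, 0.1)]  # hypothetical (a, b)
    administered = set()
    theta_hat = 0.0  # in a real CAT this is re-estimated after each response
    while not should_stop(theta_hat, pool, administered, max_items=len(pool)):
        administered.add(select_item(theta_hat, pool, administered))
    print(sorted(administered))
```

Because the maximum-information rule favors highly discriminating items, it is usually combined in practice with the exposure rate control and content balancing methods the abstract mentions.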
